Grubbs' Test Calculator

Identify anomalies with statistical confidence. Detect significant outliers in your normally distributed datasets using the industry-standard Grubbs method.

Outlier Detection Visualization

Grubbs' Test Calculator: Fast

Detect outliers in your dataset using Grubbs' test. Our calculator provides the G statistic, critical value, and step-by-step mathematical computations.

Enter your data points


How to Use This Grubbs' Test Calculator

Outlier Detection

Identifies statistically significant outliers in univariate datasets using Grubbs' method.

Input Your Data

Enter comma-separated numbers and select a significance level (α = 0.05 or 0.01).

Instant Results

Get G statistic, critical value, p-value, and a clear outlier verdict.

Best For

Lab data, quality control, sensor readings, and any normally distributed sample.

Always verify your data is approximately normal before applying Grubbs' test — non-normal data can produce misleading results.

What Is Grubbs' Test?

📊 Grubbs' test (also called the maximum normed residual test or the Extreme Studentized Deviate method for a single outlier) is a statistical procedure used to detect a single outlier in a univariate dataset that follows an approximately normal distribution. It was developed by Frank E. Grubbs, an American statistician who published the test in his landmark 1969 paper "Procedures for Detecting Outlying Observations in Samples" in Technometrics. Grubbs' work built on earlier contributions by Thompson (1935) and others, but his systematic treatment of the problem — including critical value tables for various sample sizes and significance levels — made the test accessible and widely adopted.

📊 The test evaluates the null hypothesis H₀: there are no outliers in the data against the alternative hypothesis H₁: there is exactly one outlier. It works by identifying the data point that is furthest from the sample mean, computing a test statistic G that measures how extreme that point is relative to the overall spread of the data, and comparing G against a critical value derived from the t-distribution.

📊 If G exceeds the critical value at the chosen significance level α, the null hypothesis is rejected and the extreme point is declared a statistical outlier. Grubbs' test is particularly popular in quality control, analytical chemistry, and laboratory medicine, where a single contaminated sample, instrument malfunction, or transcription error can produce an observation that is dramatically different from the rest. The test assumes that the underlying data (excluding the potential outlier) follows a normal distribution, which means it should only be applied after verifying normality — you can use our Regression Assumptions Checker to assess normality on your dataset. When multiple outliers are suspected, Grubbs' test should be applied iteratively (remove one outlier at a time and re-run), though this inflates the overall Type I error rate. For detecting multiple outliers simultaneously, consider the Generalized ESD (Extreme Studentized Deviate) test or Rosner's test instead.

Grubbs Test Formula Calculator Explained

📊 The Grubbs test statistic is defined as G = max|xᵢ − x̄| / s, where x̄ is the sample mean, s is the sample standard deviation (using the divisor n − 1), and the numerator selects the largest absolute deviation from the mean among all n observations.

📊 This ratio measures how many standard deviations the most extreme data point lies from the center of the distribution — conceptually similar to a z-score, but computed using sample estimates rather than known population parameters. A large G value indicates that the most extreme point is far from the bulk of the data relative to the overall spread, suggesting it may be an outlier.
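As a quick sketch, the G statistic from the formula above can be computed in a few lines of Python (the dataset here is a hypothetical example):

```python
import numpy as np

def grubbs_statistic(data):
    """Return the Grubbs G statistic and the index of the most extreme point."""
    x = np.asarray(data, dtype=float)
    mean = x.mean()
    s = x.std(ddof=1)                   # sample standard deviation (divisor n - 1)
    deviations = np.abs(x - mean)
    idx = int(np.argmax(deviations))    # observation furthest from the mean
    return deviations[idx] / s, idx

G, idx = grubbs_statistic([9.8, 10.1, 10.3, 9.9, 10.0, 10.2, 14.5])
# The extreme point 14.5 yields the largest standardized deviation, G ≈ 2.26.
```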

📊 The critical value Gcrit against which G is compared is derived from the t-distribution. For a two-tailed test (where either the maximum or the minimum could be an outlier), the critical value is Gcrit = (n − 1) / √n × √(t²α/(2n), n−2 / (n − 2 + t²α/(2n), n−2)), where tα/(2n), n−2 is the critical value from the t-distribution with n − 2 degrees of freedom at a Bonferroni-adjusted significance level of α/(2n).

📐 The Bonferroni adjustment accounts for the fact that we are simultaneously testing both tails of the distribution. For a one-tailed test (where you suspect the outlier is specifically the maximum or specifically the minimum), the formula uses α/n instead of α/(2n). If G > Gcrit, you reject the null hypothesis and conclude that the extreme point is a statistical outlier at the α significance level.
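Under these definitions, the critical value can be computed with SciPy's t-distribution quantile function. This is an illustrative sketch, not the calculator's own implementation:

```python
from scipy import stats

def grubbs_critical(n, alpha=0.05, two_tailed=True):
    """Grubbs critical value G_crit derived from the t-distribution."""
    # Bonferroni-adjusted tail probability: alpha/(2n) two-tailed, alpha/n one-tailed
    tail = alpha / (2 * n) if two_tailed else alpha / n
    t = stats.t.ppf(1 - tail, df=n - 2)   # t quantile with n - 2 degrees of freedom
    return (n - 1) / n**0.5 * (t**2 / (n - 2 + t**2))**0.5

# For n = 10 at alpha = 0.05, the two-tailed critical value is about 2.29.
```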

📊 The p-value can be approximated from the relationship between G and the t-distribution: p ≈ 2n × (1 − Tn−2(tG)) for a two-tailed test, where Tn−2 is the cumulative distribution function of the t-distribution with n − 2 degrees of freedom and tG = √(n(n − 2)G² / ((n − 1)² − nG²)) is obtained by inverting the critical-value formula (the approximation is capped at 1).
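The same approximation can be sketched in code by inverting the critical-value relationship to recover a t value from G (illustrative only, assuming SciPy):

```python
from scipy import stats

def grubbs_p_value(G, n, two_tailed=True):
    """Approximate p-value for a Grubbs G statistic with sample size n."""
    # Invert the critical-value formula to recover the corresponding t value
    num = n * (n - 2) * G**2
    den = (n - 1)**2 - n * G**2
    if den <= 0:
        return 0.0        # G at or beyond its attainable bound: p is effectively zero
    t = (num / den)**0.5
    mult = 2 * n if two_tailed else n
    return min(1.0, mult * stats.t.sf(t, df=n - 2))   # sf(t) = 1 - CDF(t)

# A G of 2.29 with n = 10 sits right at the 5% critical value, so p is near 0.05.
```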

⚠️ Common significance levels are α = 0.05 (5% chance of falsely declaring an outlier) and α = 0.01 (1% chance). A lower α makes the test more conservative: fewer points are flagged as outliers and there is less risk of removing a genuine data point, but a real outlier is more likely to be missed.

Component | Symbol | Description
Test Statistic | G | Maximum absolute deviation from the mean divided by the standard deviation
Sample Mean | x̄ | Sum of all values divided by the number of observations n
Standard Deviation | s | Square root of the sum of squared deviations divided by (n − 1)
Critical Value | Gcrit | Derived from the t-distribution at the chosen significance level α
Significance Level | α | Type I error rate, typically 0.05 or 0.01
p-value | p | Probability of observing a G statistic this extreme under the null hypothesis
1. Grubbs test statistic: G = max|xᵢ − x̄| / s
2. Critical value (two-tailed): Gcrit = (n − 1) / √n × √(t²/(n − 2 + t²)) where t = tα/(2n), n−2
3. Critical value (one-tailed): Gcrit = (n − 1) / √n × √(t²/(n − 2 + t²)) where t = tα/n, n−2
4. Sample mean: x̄ = Σxᵢ / n
5. Sample standard deviation: s = √[Σ(xᵢ − x̄)² / (n − 1)]
6. Decision rule: Reject H₀ if G > Gcrit (the extreme point is a statistical outlier)

How Grubbs' Test Works

1. Calculate the sample mean (x̄) and standard deviation (s).
2. Find the observation furthest from the mean: max|xᵢ − x̄|.
3. Compute the G statistic: G = max|xᵢ − x̄| / s.
4. Compare G to the critical value from the t-distribution.
5. If G > critical value, the extreme point is an outlier.

Assumptions

1. The data comes from a normally distributed population.
2. Only one outlier may be present (test one at a time).
3. The test is applied to a single variable (univariate).
4. The sample size should be at least 3.

When to Use Grubbs' Test

📊 Grubbs' test is the appropriate tool when you need a formal, objective statistical test to determine whether a single data point in a univariate dataset is an outlier. It is most effective when the data (excluding the suspected outlier) is approximately normally distributed, when you have at least 6 to 8 observations, and when you suspect that exactly one extreme value is present.

📊 Common scenarios include validating regression assumptions by checking for influential outliers, auditing data for entry errors, and quality-checking measurements before computing a Pearson correlation or fitting a regression model. Grubbs' test provides a rigorous alternative to informal rules like "remove points more than 3 standard deviations from the mean" because it accounts for sample size through the critical value. The scenarios below summarize when Grubbs' test is the right choice.

You suspect a single outlier in normally distributed data

You want a formal statistical test rather than visual inspection

You need to decide whether to remove an extreme data point

You are quality-checking measurements for anomalies

Frequently Asked Questions

What does Grubbs' test detect?

📊 Grubbs' test detects a single outlier in a univariate dataset that is assumed to follow an approximately normal distribution. It tests whether the most extreme data point (the observation furthest from the sample mean) is statistically significantly different from the rest of the data.

📊 The null hypothesis is that no outliers exist; the alternative is that exactly one outlier is present. If the test statistic G exceeds the critical value, the extreme point is declared an outlier. Grubbs' test does not identify which direction the outlier lies in — it simply flags the most extreme point regardless of whether it is the maximum or minimum value.

What is the difference between one-tailed and two-tailed Grubbs' test?

⚠️ A two-tailed Grubbs' test tests whether the most extreme value in either direction (the maximum or the minimum) is an outlier. This is the default and most common form, because typically you do not know in advance whether the outlier will be unusually high or unusually low. A one-tailed Grubbs' test tests whether only the maximum value is an outlier (upper-tailed) or only the minimum value is an outlier (lower-tailed).

📐 Use the one-tailed version when you have a priori knowledge about the direction of the potential outlier — for example, if you are specifically concerned about contamination inflating a measurement upward, you would use the upper-tailed test. The one-tailed test is more powerful for detecting outliers in the specified direction because it uses α/n rather than α/(2n) in the critical value calculation, but it will completely miss an outlier in the opposite direction.

Why does Grubbs' test assume normality?

📊 Grubbs' test derives its critical values from the assumption that the data (excluding the outlier) follows a normal distribution. The critical value depends on the t-distribution, which itself assumes normally distributed data. If the underlying data is heavily skewed or has heavy tails (e.g., from a log-normal, exponential, or Cauchy distribution), the test may produce too many false positives — flagging legitimate extreme values from the tails of the non-normal distribution as outliers. Conversely, with light-tailed distributions, the test may fail to detect genuine outliers.

📊 Always verify normality before applying Grubbs' test using a histogram, Q-Q plot, or formal normality test. If normality is violated, consider using Dixon's Q-test (for small samples), the median absolute deviation (MAD) method, or a robust outlier detection approach that does not assume normality.

What sample size do I need for Grubbs' test?

🔍 The mathematical minimum is n = 3, but Grubbs' test is unreliable with very small samples because the critical values become extremely high — with n = 3 at α = 0.05, the critical value is approximately 1.155, meaning an observation must deviate by more than 1.155 standard deviations to be flagged, which is barely distinguishable from normal variation.

🔍 With n = 6, Gcrit ≈ 1.887, and with n = 10, Gcrit ≈ 2.29. For practical reliability, use at least n ≥ 8 observations. With larger samples the critical value keeps growing, but only slowly (around 2.9 at n = 30 for α = 0.05), and the test becomes more powerful at detecting moderate outliers. However, very large samples can also make the test overly sensitive — detecting deviations that are statistically significant but practically unimportant.

Can I use Grubbs' test iteratively for multiple outliers?

⚠️ Yes, Grubbs' test can be applied iteratively: run the test, remove the identified outlier if one is found, then re-run the test on the remaining n − 1 data points, and repeat until no more outliers are detected. However, this procedure has an important limitation: each iteration inflates the overall Type I error rate.

❌ If you run the test k times at α = 0.05, the probability of falsely declaring at least one non-outlier point as an outlier across all iterations can be substantially higher than 0.05. To control this, apply a Bonferroni correction by using α/k for the k-th iteration, or use the Generalized Extreme Studentized Deviate (ESD) test (also called Rosner's test), which is specifically designed to detect multiple outliers while controlling the overall error rate.

💻 Our calculator performs a single Grubbs' test; for iterative testing, you would need to manually remove the outlier and re-run the test.
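To illustrate the iterative procedure (without the Bonferroni correction discussed above), here is a hypothetical sketch that repeatedly applies a two-tailed Grubbs' test and strips one point per pass, assuming SciPy:

```python
from scipy import stats

def iterative_grubbs(data, alpha=0.05):
    """Repeatedly apply Grubbs' test, removing one outlier per pass.

    Caution: running k passes at a fixed alpha inflates the overall Type I
    error rate; the Generalized ESD test is the corrected alternative.
    """
    x = list(map(float, data))
    removed = []
    while len(x) >= 3:
        n = len(x)
        mean = sum(x) / n
        s = (sum((v - mean)**2 for v in x) / (n - 1))**0.5
        value = max(x, key=lambda v: abs(v - mean))
        G = abs(value - mean) / s
        t = stats.t.ppf(1 - alpha / (2 * n), df=n - 2)
        G_crit = (n - 1) / n**0.5 * (t**2 / (n - 2 + t**2))**0.5
        if G <= G_crit:
            break                      # no further outliers detected
        x.remove(value)
        removed.append(value)
    return removed, x

outliers, cleaned = iterative_grubbs(
    [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 10.3, 9.7, 20.0, 30.0])
# Two passes flag 30.0 and then 20.0; the remaining eight points pass the test.
```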

How does Grubbs' test compare to Dixon's Q-test?

📊 Both Grubbs' test and Dixon's Q-test detect single outliers in univariate data, but they differ in their approach and applicable sample sizes. Grubbs' test uses the mean and standard deviation of the entire dataset, making it sensitive to the outlier itself (because the outlier inflates both statistics).

🔍 Dixon's Q-test uses only the gap between the suspected outlier and its nearest neighbor, divided by the overall range, making it less influenced by the outlier on the denominator. Dixon's Q-test is preferred for very small samples (n = 3 to 7) because its critical values are well-established for these sizes and the test is more powerful when n is tiny.

📊 Grubbs' test is preferred for larger samples (n ≥ 8) because it makes fuller use of the data and has better statistical power. Both tests assume normality. For datasets with n > 25, Grubbs' test is generally the better choice.

What happens if I remove an outlier and recalculate?

📊 When you remove an outlier identified by Grubbs' test, the sample mean and standard deviation change — typically, the mean shifts away from the outlier and the standard deviation decreases, because the extreme value that was inflating the variance has been removed.

⚠️ This means the remaining data will appear more tightly clustered, which can sometimes reveal a second outlier that was previously masked by the first. However, removing data points should never be done lightly — always investigate the cause of the outlier before deleting it. If the outlier arose from a measurement error, equipment malfunction, or data-entry mistake, removal is justified.

📊 If the outlier represents a genuine rare event, removing it may obscure important information about the population. Document every removal with a justification in your analysis report.

What is the Grubbs test statistic G?

📊 The Grubbs test statistic G = max|xᵢ − x̄| / s is the ratio of the largest absolute deviation from the sample mean to the sample standard deviation. Conceptually, it tells you how many standard deviations the most extreme observation lies from the center of the data — similar to a z-score, but computed from sample statistics rather than known population parameters.

📊 In a normal distribution without outliers, G typically ranges from about 1.5 to 2.5 depending on sample size. Values significantly above the critical value indicate that the most extreme point is implausibly far from the rest of the data, given the assumed normal distribution. The larger G is relative to the critical value, the stronger the evidence that the extreme point is an outlier.

How is the critical value for Grubbs' test determined?

📐 The critical value Gcrit is derived from the t-distribution and depends on two parameters: the sample size n and the significance level α. For a two-tailed test, the formula is Gcrit = (n − 1) / √n × √(t²α/(2n), n−2 / (n − 2 + t²α/(2n), n−2)), where tα/(2n), n−2 is the (1 − α/(2n)) quantile of the t-distribution with n − 2 degrees of freedom.

🔍 The α/(2n) term reflects a Bonferroni adjustment for testing both tails simultaneously. As n increases, the critical value increases because larger samples are more likely to contain extreme observations by chance alone, so the threshold for declaring an outlier must be higher. As α decreases (e.g., from 0.05 to 0.01), the critical value also increases, making the test more conservative.

Can Grubbs' test be used for non-normal data?

📊 Grubbs' test is not recommended for non-normal data because its critical values and p-values are derived under the assumption of normality. If your data is skewed (e.g., income data, reaction times), the test may flag legitimate tail values as outliers, producing false positives. If your data has heavy tails (e.g., financial returns), the test may also over-detect.

📊 For non-normal data, consider these alternatives: (1) Transform the data (e.g., log transformation for right-skewed data) to achieve approximate normality, then apply Grubbs' test on the transformed values. (2) Use the median absolute deviation (MAD) method, which uses the median and MAD instead of the mean and standard deviation and is robust to non-normality. (3) Use a boxplot rule (1.5 × IQR), which does not assume normality. (4) Use robust z-scores based on the median and MAD.
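As an illustration of alternative (2), here is a robust modified z-score sketch based on the median and MAD (the 0.6745 constant and the conventional 3.5 cutoff follow Iglewicz and Hoaglin; the data is made up):

```python
import statistics

def mad_outliers(data, cutoff=3.5):
    """Flag points whose modified z-score |0.6745 * (x - median) / MAD| > cutoff."""
    med = statistics.median(data)
    mad = statistics.median(abs(x - med) for x in data)  # median absolute deviation
    if mad == 0:
        return []         # degenerate case: more than half the points are identical
    return [x for x in data if abs(0.6745 * (x - med) / mad) > cutoff]

flagged = mad_outliers([9.8, 10.1, 10.3, 9.9, 10.0, 10.2, 14.5])
# Only 14.5 exceeds the modified z-score cutoff.
```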

What is the difference between Grubbs' test and the Generalized ESD test?

Grubbs' test detects exactly one outlier per application. The Generalized Extreme Studentized Deviate (ESD) test, also known as Rosner's test, is an extension designed to detect up to k outliers simultaneously while controlling the overall Type I error rate.

📊 It works by running Grubbs' test k times iteratively, each time removing the most extreme point, and then comparing each test statistic to a progressively adjusted critical value. The key advantage over simple iterative Grubbs' testing is that the Generalized ESD test uses pre-specified critical values that account for the multiple testing, so the overall error rate remains at α.

🔍 If you suspect multiple outliers, the Generalized ESD test is statistically more rigorous than running Grubbs' test repeatedly with the same α level.
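A sketch of the Generalized ESD procedure, following the critical-value formulation given in the NIST/SEMATECH e-Handbook (λᵢ built from a t quantile at a progressively adjusted level); the example data is hypothetical:

```python
from scipy import stats

def generalized_esd(data, max_outliers, alpha=0.05):
    """Return the values declared outliers by the Generalized ESD (Rosner) test."""
    x = list(map(float, data))
    n = len(x)
    removed, num_outliers = [], 0
    for i in range(1, min(max_outliers, n - 2) + 1):
        m = len(x)
        mean = sum(x) / m
        s = (sum((v - mean)**2 for v in x) / (m - 1))**0.5
        value = max(x, key=lambda v: abs(v - mean))
        R_i = abs(value - mean) / s                  # i-th test statistic
        x.remove(value)
        removed.append(value)
        # Critical value lambda_i (NIST/SEMATECH formulation)
        p = 1 - alpha / (2 * (n - i + 1))
        t = stats.t.ppf(p, df=n - i - 1)
        lam = (n - i) * t / (((n - i - 1 + t**2) * (n - i + 1))**0.5)
        if R_i > lam:
            num_outliers = i          # the largest i with R_i > lambda_i wins
    return removed[:num_outliers]

outliers = generalized_esd(
    [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 10.3, 9.7, 20.0, 30.0], max_outliers=3)
# The procedure declares exactly the two extreme points outliers.
```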

What significance level should I use for Grubbs' test?

📊 The choice of significance level α balances two risks: α = 0.05 is the most common choice and provides a 5% chance of falsely declaring a legitimate data point as an outlier (Type I error). α = 0.01 is more conservative — only a 1% chance of a false positive, but a higher chance of missing a real outlier (Type II error).

📊 Use α = 0.05 when the cost of missing an outlier is high (e.g., undetected contamination in pharmaceutical manufacturing could endanger patients). Use α = 0.01 when the cost of incorrectly removing a data point is high (e.g., in clinical trials where removing a patient's data could bias the results). Some practitioners recommend α = 0.10 for exploratory analysis to cast a wider net, then investigating flagged points carefully before deciding on removal.

Does Grubbs' test work for paired or multivariate data?

📊 The standard Grubbs' test is designed for univariate data only — a single variable measured on n observations. It cannot be directly applied to paired (X, Y) data or multivariate datasets. For paired data, you can apply Grubbs' test separately to each variable, but this misses outliers that are unusual in the joint distribution (e.g., a point with individually normal X and Y values but an extreme combination).

📊 For multivariate outlier detection, use Mahalanobis distance, which measures how far each observation is from the multivariate mean accounting for correlations between variables. Outliers are identified by comparing Mahalanobis distances to a chi-squared distribution. For residuals from a regression analysis, apply Grubbs' test to the residual values, which are univariate, to detect outlier residuals that may indicate influential data points.
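For the multivariate case, a minimal Mahalanobis-distance sketch with NumPy (the bivariate data is invented; the last point is unusual only in its X-Y combination):

```python
import numpy as np

def mahalanobis_distances(points):
    """Squared Mahalanobis distance of each row from the multivariate mean."""
    X = np.asarray(points, dtype=float)
    mean = X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))   # inverse covariance matrix
    diffs = X - mean
    # (x - mu)^T  Sigma^-1  (x - mu) for each row
    return np.einsum('ij,jk,ik->i', diffs, cov_inv, diffs)

pts = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.8), (5, 5.1),
       (6, 6.0), (7, 6.9), (8, 8.2), (9, 8.8), (5, 9.0)]
d2 = mahalanobis_distances(pts)
# The last point sits far off the joint trend, so its distance is the largest
# even though its X and Y values are individually unremarkable.
```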

Can I use Grubbs' test on residuals from regression?

📊 Yes, applying Grubbs' test to regression residuals is a valid way to detect a single outlier residual — an observation whose vertical distance from the regression line is unusually large. This can identify influential data points that may be distorting the regression model. However, be aware that the residuals from OLS regression are not fully independent (they sum to zero and have n − p − 1 degrees of freedom), so the nominal p-value from Grubbs' test is only approximate.

📊 For simple linear regression with one predictor, the effective degrees of freedom are n − 2 rather than n − 1, which slightly affects the critical value. For practical purposes, this approximation is usually adequate, especially with moderate sample sizes (n > 10). You can use our Regression Assumptions Checker to complement Grubbs' test with a full assessment of regression assumptions.
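A sketch of that workflow: fit a simple regression with NumPy, then apply the Grubbs statistic to the residuals (illustrative data with one planted outlier; SciPy supplies the t quantile):

```python
import numpy as np
from scipy import stats

# Hypothetical (x, y) data with one point pulled well off the line
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 20.0, 16.1])

slope, intercept = np.polyfit(x, y, 1)      # ordinary least squares fit
residuals = y - (slope * x + intercept)     # vertical distances from the line

n = len(residuals)
G = np.max(np.abs(residuals - residuals.mean())) / residuals.std(ddof=1)
t = stats.t.ppf(1 - 0.05 / (2 * n), df=n - 2)
G_crit = (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))
# If G > G_crit, the point with the largest residual is flagged (here, x = 7).
```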

What is the maximum normed residual test?

📊 The maximum normed residual test is another name for Grubbs' test. The term "normed" refers to the fact that the absolute deviation is divided by (normalized by) the standard deviation s, producing a dimensionless ratio G.

📊 The term "maximum" indicates that the test focuses on the single largest such ratio — the observation with the greatest standardized distance from the mean. This alternative name is commonly used in engineering standards, particularly in ASTM E178 — the American Society for Testing and Materials standard for dealing with outlying observations. The mathematical formulation and interpretation are identical regardless of which name is used.

Related Calculators

Discover more specialized statistical tools.

Looking for more tools? Try our Regression Equation Calculator to build predictive models from your data.