{
  "linear-regression-basics": {
    "en": {
      "id": "en/linear-regression-basics",
      "data": {
        "title": "The Ultimate Guide to Linear Regression: From Theory to Application",
        "description": "Master linear regression with this 2000+ word guide. Explore the history, mathematical derivation, real-world case studies, and advanced FAQs. Learn how to use a regression equation calculator with steps to optimize your analysis.",
        "image": "/blog/linear-regression.svg",
        "date": "2025-01-15T00:00:00.000Z",
        "author": "Regression Equation Calculator",
        "category": "Statistics"
      },
      "body": "import InteractiveImage from '../../../theme/components/InteractiveImage.astro';\nimport MathFormula from '../../../theme/components/MathFormula.astro';\nimport Callout from '../../../theme/components/Callout.astro';\n\nImagine you're a business owner trying to predict next month's revenue based on your advertising spend. Or perhaps you're a student wondering how study hours translate into exam scores, or a farmer estimating crop yield from rainfall.\n\nIn each case, you have two variables: one you can measure or control (ad spending, study time, rainfall) and one you want to predict (revenue, exam score, yield).\n\n**Linear regression** is the simplest and most fundamental tool for making these predictions. It draws the best-fitting straight line through your data points, giving you an equation you can use to forecast, explain, and understand the relationship between your variables.\n\nIn this guide, we'll cover everything you need to know about simple linear regression: the equation, the math behind it, how to interpret every part of the output, the assumptions you must verify, real-world applications, and common mistakes.\n\n<Callout type=\"conceptual\" title=\"The Core Idea\">\nLinear regression finds the straight line that best describes how a dependent variable changes as an independent variable changes. Think of it as summarizing a cloud of data points into a single, predictable mathematical rule. For complex problems, you might eventually need [multiple regression analysis](/blog/multiple-regression-explained) or a [step-by-step mathematical walkthrough](/blog/simple-linear-regression-step-by-step) to see the inner workings.\n</Callout>\n\n---\n\n## What Is Linear Regression?\n\nLinear regression is a statistical method that models the relationship between **one dependent (response) variable** and **one independent (predictor) variable** by fitting a straight line through the data. \n\nThe \"simple\" in \"simple linear regression\" means there is exactly one predictor. This distinguishes it from [multiple regression](/multiple-regression-calculator/), which handles two or more predictors to explain a single outcome.\n\nThe goal is to find the line that minimizes the total prediction error across all data points. This line — called the **regression line** or **line of best fit** — becomes your model. \n\nOnce you have it, you can plug in any value of x and get a predicted value of y. For a deeper technical dive, you can explore the [Simple Linear Regression entry on Wikipedia](https://en.wikipedia.org/wiki/Simple_linear_regression).\n\n### Why \"Linear\"?\n\nThe term \"linear\" refers to the fact that the relationship is modeled as a straight line. If the true relationship between your variables is curved — for example, diminishing returns — a simple linear model may not capture it well. \n\nIn such cases, you might need to use a [regression curve calculator](/regression-curve-calculator/) or explore polynomial regression to better fit your data's unique shape.\n\n### Correlation vs. Regression\n\nPeople often confuse correlation with regression. **Correlation** measures the *strength and direction* of a linear relationship (a single number between −1 and +1). You can calculate this using our [Pearson correlation calculator](/pearson-correlation-calculator/).\n\n**Regression** goes further — it gives you the *equation* of the line, so you can make predictions. 
Correlation tells you \"these variables are related\"; regression tells you \"here's exactly how they're related.\"\n\n---\n\n## The Linear Regression Equation\n\nThe simple linear regression equation takes the familiar form:\n\n<MathFormula formula=\"y = mx + b\" description=\"Common Algebraic Form\" />\n\nOr, using statistical notation:\n\n<MathFormula formula=\"y = b₀ + b₁x\" description=\"Statistical Form\" />\n\nWhere:\n- **y** is the dependent (response) variable — the outcome you're predicting.\n- **x** is the independent (predictor) variable — the input you're using to make the prediction.\n- **b₁** (or **m**) is the **slope** — the change in y for every one-unit increase in x.\n- **b₀** (or **b**) is the **y-intercept** — the predicted value of y when x equals zero.\n\n### Interpreting the Slope\n\nThe slope is the heart of the regression equation. It tells you the **rate of change**: for every one-unit increase in x, y changes by **b₁** units.\n\n**Example:** If you model exam scores (y) against study hours (x) and get a slope of 5, it means each additional hour of study is associated with a 5-point increase in exam score, on average.\n\n- A **positive slope** means y increases as x increases (positive relationship).\n- A **negative slope** means y decreases as x increases (negative relationship).\n- A **slope of zero** means there is no linear relationship — the line is flat.\n\n### Interpreting the Intercept\n\nThe intercept **b₀** is the predicted value of y when x = 0. Sometimes this has a clear real-world meaning (e.g., predicted revenue with zero ad spend), but often it's just a mathematical anchor.\n\nIf x = 0 is far outside the range of your data (e.g., predicting salary from years of experience), the intercept may not have a meaningful interpretation. In such cases, what matters is the **predictions within your data's range**, not the extrapolated value at x = 0.\n\n### The Predicted vs. Actual Equation\n\nAn important distinction: the regression equation predicts the **average** value of y for a given x. Any individual observation will typically differ from the prediction. The full model is:\n\n<MathFormula formula=\"y = b₀ + b₁x + ε\" description=\"Full Population Model\" />\n\nWhere **ε** (epsilon) is the **error term**: the difference between an actual value and the value the true population line predicts. (Its sample counterpart, measured from your fitted line, is the **residual**.) This error term captures everything the model doesn't explain: random variation, measurement error, and the influence of variables not included in the model.\n\n---\n\n## How Linear Regression Works: The Least Squares Method\n\n<InteractiveImage src=\"/blog/linear-regression-how.svg\" alt=\"How linear regression works — the regression line minimizes the sum of squared residuals\" />\n\nLinear regression uses the [method of least squares](https://www.statology.org/least-squares-method/) (also called ordinary least squares, or OLS) to find the best-fitting line. But what does \"best-fitting\" actually mean?\n\n### Residuals: The Distance from the Line\n\nFor each data point, the regression line makes a prediction. The difference between the actual y value and the predicted y value is called the **residual** (or error):\n\n**Residual = yᵢ − ŷᵢ**\n\nWhere **yᵢ** is the actual value and **ŷᵢ** (y-hat) is the predicted value from the line.\n\nSome residuals are positive (the point is above the line), some are negative (below the line). If you simply added them up, positive and negative residuals would cancel out, making a terrible line look good. To avoid this, we square each residual before summing.
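\n\nHere's a tiny Python sketch (plain standard library, invented data) that makes the cancellation problem visible: a deliberately terrible flat line can post nearly the same raw residual sum as a good line, while the squared sums give it away instantly:\n\n```python\n# Toy data, made up for illustration\nxs = [1, 2, 3, 4, 5]\nys = [2.1, 3.9, 6.2, 7.8, 10.1]\n\ndef residuals(slope, intercept):\n    # Residual = actual y minus predicted y, for each point\n    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]\n\nfor label, slope, intercept in [('good line', 2.0, 0.0), ('awful flat line', 0.0, 6.0)]:\n    res = residuals(slope, intercept)\n    print(label, '| sum:', round(sum(res), 2), '| sum of squares:', round(sum(r * r for r in res), 2))\n```\n\nBoth lines sum their residuals to roughly zero; only the squared total exposes the bad fit.\n\n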
### Minimizing the Sum of Squared Residuals\n\nThe least squares method finds the values of **b₀** and **b₁** that minimize:\n\n<MathFormula formula=\"SSR = Σ(yᵢ − ŷᵢ)²\" description=\"Sum of Squared Residuals\" />\n\nThis is a calculus optimization problem. By taking partial derivatives with respect to b₀ and b₁, setting them equal to zero, and solving, we get closed-form formulas:\n\n<MathFormula formula=\"b₁ = \\frac{Σ(xᵢ − x̄)(yᵢ − ȳ)}{Σ(xᵢ − x̄)²}\" description=\"Slope Calculation\" />\n\n<MathFormula formula=\"b₀ = ȳ − b₁x̄\" description=\"Intercept Calculation\" />\n\nWhere **x̄** and **ȳ** are the means (averages) of x and y respectively.\n\nThis is why you'll often see the slope described as the ratio of the **covariance of x and y** to the **variance of x**. The numerator captures how x and y move together; the denominator normalizes by how much x varies on its own.\n\n### Why Squared Residuals?\n\nYou might wonder: why square the residuals instead of using absolute values? Three reasons:\n\n1. **Squaring penalizes large errors more heavily** — a residual of 4 contributes 16 to the sum, while two residuals of 2 contribute only 8 total. This discourages any single large miss, though it also makes the line more sensitive to outliers, so extreme points deserve extra scrutiny.\n2. **Squaring makes all values positive**, preventing positive and negative residuals from canceling out.\n3. **Squaring produces differentiable functions**, which means calculus can be used to find the exact minimum. The sum of absolute values has a \"kink\" at zero that makes optimization harder.\n\n---\n\n## Step-by-Step: Calculating a Linear Regression\n\n<InteractiveImage src=\"/blog/linear-regression-steps.svg\" alt=\"Four steps to calculate a linear regression equation\" />\n\nLet's walk through a complete example with real numbers. Suppose we have the following data showing study hours (x) and exam scores (y):\n\n| Student | Study Hours (x) | Exam Score (y) |\n|---|---|---|\n| A | 2 | 65 |\n| B | 4 | 75 |\n| C | 6 | 80 |\n| D | 8 | 90 |\n| E | 10 | 95 |\n\n### Step 1: Calculate the Means\n\n**x̄ = (2 + 4 + 6 + 8 + 10) / 5 = 6**\n\n**ȳ = (65 + 75 + 80 + 90 + 95) / 5 = 81**\n\n### Step 2: Compute the Deviations and Their Products\n\n| xᵢ | yᵢ | (xᵢ − x̄) | (yᵢ − ȳ) | (xᵢ − x̄)(yᵢ − ȳ) | (xᵢ − x̄)² |\n|---|---|---|---|---|---|\n| 2 | 65 | −4 | −16 | 64 | 16 |\n| 4 | 75 | −2 | −6 | 12 | 4 |\n| 6 | 80 | 0 | −1 | 0 | 0 |\n| 8 | 90 | 2 | 9 | 18 | 4 |\n| 10 | 95 | 4 | 14 | 56 | 16 |\n| **Sum** | | | | **150** | **40** |\n\n### Step 3: Calculate the Slope and Intercept\n\n**b₁ = 150 / 40 = 3.75**\n\n**b₀ = 81 − 3.75 × 6 = 81 − 22.5 = 58.5**\n\n### Step 4: Write the Regression Equation\n\n**y = 58.5 + 3.75x**\n\n**Interpretation:** A student who studies 0 hours is predicted to score 58.5 points. Each additional hour of study is associated with a 3.75-point increase in the exam score. A student studying 7 hours would be predicted to score: 58.5 + 3.75 × 7 = **84.75**.\n\nWant to skip the manual math? Our [Regression Equation Calculator](/) performs all these calculations instantly — with a full step-by-step breakdown so you can see exactly how each number was derived.\n\n---\n\n## Key Statistics: Measuring How Good the Model Is\n\nGetting the regression equation is only half the job. You also need to know how well the line actually fits the data.
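\n\nBefore moving on, here's a short Python sketch (standard library only) you can use to sanity-check the worked example above; it reproduces the slope, intercept, and 7-hour prediction from Steps 1–4:\n\n```python\nhours = [2, 4, 6, 8, 10]       # x: study hours\nscores = [65, 75, 80, 90, 95]  # y: exam scores\n\nx_bar = sum(hours) / len(hours)    # 6.0\ny_bar = sum(scores) / len(scores)  # 81.0\n\n# Slope: covariance-style numerator over variance-style denominator\nnum = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))  # 150.0\nden = sum((x - x_bar) ** 2 for x in hours)                           # 40.0\nb1 = num / den            # 3.75\nb0 = y_bar - b1 * x_bar   # 58.5\n\nprint(f'y = {b0} + {b1}x')  # y = 58.5 + 3.75x\nprint(b0 + b1 * 7)          # 84.75\n```\n\n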
Several statistics help answer this question.\n\n### R-Squared (R²)\n\n**R²** (the coefficient of determination) measures the proportion of variance in y that is explained by x. It ranges from **0 to 1**:\n\n- **R² = 0** — the model explains none of the variation in y (the line is flat)\n- **R² = 1** — the model explains all the variation (every point falls exactly on the line)\n- **R² = 0.85** — 85% of the variation in y is explained by x\n\nThe formula:\n\n<MathFormula formula=\"R² = 1 − \\frac{SS_{res}}{SS_{tot}}\" description=\"Coefficient of Determination\" />\n\nWhere **SSᵣₑₛ** is the sum of squared residuals (unexplained variation) and **SSₜₒₜ** is the total sum of squares (total variation in y).\n\nAs a rule of thumb:\n- **R² > 0.7** — strong relationship (common in physical sciences)\n- **0.3 < R² < 0.7** — moderate relationship (common in social sciences)\n- **R² < 0.3** — weak relationship (the model may not be very useful)\n\n### Correlation Coefficient (r)\n\nThe **Pearson correlation coefficient r** measures the strength and direction of the linear relationship between x and y. It ranges from **−1 to +1**:\n\n- **r = +1** — perfect positive linear relationship\n- **r = 0** — no linear relationship\n- **r = −1** — perfect negative linear relationship\n\nFor simple linear regression, **R² = r²**. The sign of r tells you the direction of the relationship, and squaring it gives you the proportion of variance explained.\n\n### Standard Error of the Estimate\n\nThe **standard error of the estimate (Sₑ)** measures the average distance that the observed values fall from the regression line. It's in the same units as y, making it intuitive:\n\n<MathFormula formula=\"S_e = \\sqrt{\\frac{SS_{res}}{n−2}}\" description=\"Standard Error\" />\n\nA smaller Sₑ means the data points are closer to the line — the predictions are more precise. Roughly, about 95% of actual values will fall within **±2Sₑ** of the predicted value.\n\n### P-Values and Statistical Significance\n\nThe **p-value** for the slope tests the null hypothesis that the true slope is zero (i.e., no linear relationship). A small p-value (typically < 0.05) means the observed relationship is unlikely to be due to chance, and the slope is **statistically significant**.\n\nKey points about p-values:\n- **p < 0.05** — statistically significant at the 5% level\n- **p < 0.01** — statistically significant at the 1% level (stronger evidence)\n- **p > 0.05** — not statistically significant; the relationship may be due to random variation\n- A significant p-value does **not** mean the model is good — it only means the slope is probably not zero. Always check R² and residual plots too.\n\n---\n\n## The Four Key Assumptions of Linear Regression\n\nBefore you can trust the results of a linear regression, you must verify that the data meets four critical assumptions. Violating these can produce misleading coefficients and unreliable predictions.\n\nYou can use our [regression assumptions checker](/regression-assumptions-checker/) to test your data automatically against these criteria.\n\n### 1. Linearity\nThe relationship between x and y must be approximately linear. If the true relationship is curved, a straight line will systematically mispredict in certain ranges.\n\n**How to check:** Create a scatter plot of x vs. y. If the points follow a curve, the linearity assumption is violated. Also check residual plots — a random scatter of residuals supports linearity.\n\n### 2. Independence of Errors\nResiduals must be independent of each other. 
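\n\nOne quick, informal check is the lag-1 autocorrelation of the residuals: values far from zero suggest that neighboring errors are related. Here's a minimal Python sketch (invented residuals, standard library only):\n\n```python\ndef lag1_autocorr(residuals):\n    # Correlation-style ratio between each residual and the next one\n    n = len(residuals)\n    mean = sum(residuals) / n\n    num = sum((residuals[i] - mean) * (residuals[i + 1] - mean) for i in range(n - 1))\n    den = sum((r - mean) ** 2 for r in residuals)\n    return num / den\n\n# These made-up residuals alternate sign, a classic dependence pattern\nprint(round(lag1_autocorr([0.5, -0.3, 0.8, -0.6, 0.2, -0.4]), 2))  # -0.7\n```\n\n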
This is especially important for time-series data, where consecutive observations are often correlated.\n\n### 3. Homoscedasticity (Constant Variance)\nThe variance of residuals should be roughly constant across all predicted values. If the spread of residuals increases or decreases with the predicted value (a \"funnel\" shape), the model has **heteroscedasticity**.\n\n### 4. Normality of Residuals\nThe residuals should be approximately normally distributed. This assumption matters for valid hypothesis tests (p-values) and confidence intervals.\n\n\n---\n\n## Interpreting the Output: What to Look For\n\nWhen you run a linear regression (whether using our calculator, R, Python, or Excel), the output typically includes several key numbers. Here's what each one tells you:\n\n### Slope (b₁) and Its Confidence Interval\n\n- **Sign** — positive or negative relationship direction\n- **Magnitude** — the size of the effect in original units (e.g., \"3.75 points per hour\")\n- **95% confidence interval** — the range within which the true slope likely falls. If the interval doesn't contain zero, the relationship is statistically significant at the 5% level.\n\n### R² Value\n\n- Indicates what percentage of the variation in y your model explains\n- **High R²** doesn't automatically mean the model is good — it could reflect overfitting or an outlier driving the result\n- **Low R²** doesn't mean the model is useless — in some fields (social sciences, biology), even modest R² values can be meaningful\n\n### p-Value for the Slope\n\n- Tests whether the observed relationship could be due to random chance\n- **p < 0.05** is the conventional threshold for statistical significance\n- A very small p-value with a low R² means: \"the relationship is real, but weak\"\n\n### Residual Standard Error\n\n- The typical size of prediction errors, in the same units as y\n- Useful for constructing prediction intervals: roughly, 95% of observations fall within ±2 × Sₑ of the predicted value\n\n---\n\n## Real-World Applications of Linear Regression\n\n<InteractiveImage src=\"/blog/linear-regression-applications.svg\" alt=\"Real-world applications of linear regression across industries\" />\n\nLinear regression is one of the most widely used statistical techniques across virtually every field. Here are some of the most common applications:\n\n### Business and Economics\n\nCompanies use linear regression to forecast sales from advertising spend, predict demand from price changes, and estimate costs from production volume. Economists model the relationship between consumer spending and income, or between unemployment and inflation (the Phillips Curve).\n\n### Science and Engineering\n\nPhysicists use linear regression to calibrate instruments (e.g., converting sensor readings to physical quantities). Biologists model the relationship between temperature and reaction rates. Engineers predict material fatigue from stress cycles.\n\n### Healthcare and Medicine\n\nResearchers examine how dosage relates to drug efficacy, how BMI correlates with blood pressure, or how exercise frequency relates to cholesterol levels. Clinical trials often use regression to quantify treatment effects while controlling for baseline characteristics.\n\n### Education\n\nSchools and researchers predict student performance from study time, attendance, or socioeconomic indicators. 
This helps identify at-risk students early and allocate resources effectively.\n\n### Sports Analytics\n\nTeams and analysts use linear regression to predict player performance from training metrics, estimate win probability from game statistics, and evaluate the impact of coaching changes.\n\n### Environmental Science\n\nScientists model the relationship between CO₂ concentration and temperature, between pollutants and health outcomes, or between rainfall and crop yield — informing policy and resource management.\n\n---\n\n## Common Mistakes and How to Avoid Them\n\nEven experienced analysts can fall into these traps when using linear regression:\n\n### Extrapolation\n\nDon't predict outside the range of your data. If your data covers study times from 2 to 10 hours, predicting the score for someone who studies 50 hours is unreliable — the relationship may not remain linear at extreme values. \n<Callout type=\"warning\" title=\"Prevention\">Only make predictions within (or very close to) the range of your training data, and always report that range.</Callout>\n\n### Confusing Correlation with Causation\n\nA significant regression slope means x and y are **associated** — it does **not** mean x *causes* y. Ice cream sales and drowning deaths are both correlated with temperature, but eating ice cream doesn't cause drowning. \n<Callout type=\"conceptual\" title=\"Prevention\">Use domain knowledge, consider confounding variables, and design experiments when possible.</Callout>\n\n### Ignoring Outliers\n\nA single extreme data point can dramatically shift the regression line, especially in small datasets. This is because least squares penalizes large errors quadratically — an outlier with a large residual exerts disproportionate influence. \n<Callout type=\"warning\" title=\"Prevention\">Always visualize your data first. Identify outliers and investigate whether they're genuine observations or data errors. Consider robust regression methods if outliers are legitimate but influential.</Callout>\n\n### Assuming Linearity Without Checking\n\nNot all relationships are linear. Diminishing returns, threshold effects, and exponential growth all produce curved relationships that a straight line will poorly represent. \n<Callout type=\"info\" title=\"Prevention\">Always plot your data before fitting a regression. If you see curvature, try transformations or polynomial terms.</Callout>\n\n### Over-relying on R²\n\nA high R² doesn't guarantee a good model. You can have a high R² with violated assumptions, meaningless relationships, or poor predictive performance on new data. Conversely, a low R² in some contexts (like predicting human behavior) may be perfectly acceptable. \n<Callout type=\"warning\" title=\"Prevention\">Always check residual plots, consider the practical significance of the effect size, and validate predictions on new data.</Callout>\n\n### Small Sample Sizes\n\nWith very few data points, the regression line is highly sensitive to individual observations. The slope might look impressive but be statistically non-significant due to the small sample. \n<Callout type=\"info\" title=\"Prevention\">Aim for at least 10–15 observations for simple regression. 
Report confidence intervals to show the uncertainty in your estimates.</Callout>\n\n---\n\n## When to Use (and Not Use) Linear Regression\n\n**Use linear regression when:**\n- You have **one predictor** and one outcome variable.\n- The relationship appears **approximately linear**.\n- Your dependent variable is **continuous** (numeric).\n- You want a **simple, interpretable** model.\n\n**Avoid linear regression when:**\n- Your outcome is **categorical** (e.g., pass/fail).\n- The relationship is **clearly curved** — in this case, try a [quadratic regression calculator](/quadratic-regression-calculator/) or [exponential regression calculator](/exponential-regression-calculator/).\n- You have **multiple predictors** — use [multiple regression](/multiple-regression-calculator/) instead.\n- Your data has severe **autocorrelation**.\n\n---\n\n## Linear Regression vs. Other Techniques\n\nLinear regression is just one tool in the statistical toolbox. Here's how it compares to alternatives:\n\n| Technique | When to Use | Key Difference |\n|---|---|---|\n| **Multiple Linear Regression** | 2+ predictors | Handles [multiple inputs](/multiple-regression-calculator/) simultaneously. |\n| **Logistic Regression** | Categorical outcome | Models probability (e.g., 0 to 1 range). |\n| **Polynomial Regression** | Curved relationships | Fits curves using [quadratic](/quadratic-regression-calculator/) or higher powers. |\n| **Exponential Regression** | Growth/Decay models | Models data using [exponential](/exponential-regression-calculator/) functions. |\n\nFor a broader perspective on how these techniques are used in finance and research, see the [Investopedia guide to Linear Regression](https://www.investopedia.com/terms/l/linearregression.asp).\n\n---\n\n## Try It Yourself: Use Our Calculator\n\nOur **Regression Equation Calculator** lets you enter data points and instantly get the regression equation, slope, intercept, R² value, and a detailed step-by-step breakdown.\n\n<a href=\"/\" class=\"not-prose my-8 flex items-center justify-center\">\n  <span class=\"inline-flex items-center gap-2 px-8 py-3.5 text-base font-semibold text-white bg-gradient-to-r from-primary-600 to-accent-600 rounded-xl shadow-lg shadow-primary-600/25 hover:shadow-xl hover:shadow-primary-600/35 hover:brightness-110 hover:scale-[1.03] active:scale-[0.98] transition-all duration-200\">\n    <svg class=\"w-5 h-5\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 7h6m0 10v-3m-3 3h.01M9 17h.01M9 14h.01M12 14h.01M15 11h.01M12 11h.01M9 11h.01M7 21h10a2 2 0 002-2V5a2 2 0 00-2-2H7a2 2 0 00-2 2v14a2 2 0 002 2z\" /></svg>\n    Try the Regression Equation Calculator — It's Free\n  </span>\n</a>\n\n---\n\n## Key Takeaways\n\n1. **Linear regression** models the relationship between one predictor and one outcome using a straight line.\n2. The **least squares method** finds the \"best\" fit by minimizing squared errors.\n3. The **slope** and **intercept** define the line; **R²** defines how well it fits.\n4. Always verify **linearity, independence, homoscedasticity, and normality**.\n5. Association is not causation — use domain knowledge to interpret findings.\n\n---\n\n## A Brief History of Linear Regression\n\nThe roots of linear regression stretch back to the late 18th and early 19th centuries, born from the needs of astronomers to navigate the stars. 
\n\n### The Method of Least Squares\nIn 1805, French mathematician **Adrien-Marie Legendre** published the first description of the method of least squares. However, **Carl Friedrich Gauss**, arguably the greatest mathematician of his era, claimed he had been using the method since 1795 to predict the orbits of celestial bodies like Ceres. This dispute between two giants of mathematics highlights how critical this tool was for scientific progress.\n\n### Sir Francis Galton and \"Regression\"\nThe term \"regression\" itself wasn't coined until 1886 by **Sir Francis Galton**. While studying the relationship between the heights of parents and their children, Galton observed that children of very tall parents tended to be shorter than their parents, while children of very short parents tended to be taller. He described this phenomenon as \"regression toward mediocrity\" (now known as regression to the mean).\n\nWhat started as a tool for tracking planets and measuring height has since evolved into the backbone of modern machine learning and econometrics.\n\n---\n\n## Deep Dive: Mathematical Derivation of OLS\n\nTo truly understand our [regression equation calculator with steps](/), one must peek under the hood at the calculus that drives it. We want to minimize the sum of squared residuals:\n\n<MathFormula formula=\"S(b₀, b₁) = Σ(yᵢ - (b₀ + b₁xᵢ))²\" description=\"Objective Function\" />\n\nTo find the minimum, we take the partial derivatives with respect to $b₀$ and $b₁$ and set them to zero.\n\n### Derivative with respect to $b₀$:\n<MathFormula formula=\"∂S/∂b₀ = -2Σ(yᵢ - b₀ - b₁xᵢ) = 0\" description=\"Partial b₀\" />\n\n### Derivative with respect to $b₁$:\n<MathFormula formula=\"∂S/∂b₁ = -2Σxᵢ(yᵢ - b₀ - b₁xᵢ) = 0\" description=\"Partial b₁\" />\n\nSolving these \"normal equations\" yields the formulas we use today:\n1. $b₁ = Cov(x,y) / Var(x)$\n2. $b₀ = ȳ - b₁x̄$\n\nThis derivation ensures that the resulting line is mathematically guaranteed to be the \"best\" fit for your data under the OLS framework.
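\n\nAs a quick numerical sanity check on that claim, here's a small Python sketch (standard library, toy numbers) showing that nudging the least-squares slope in either direction only makes the sum of squared residuals worse:\n\n```python\nxs = [1, 2, 3, 4, 5]\nys = [2, 4, 5, 8, 9]\n\n# Closed-form OLS estimates: b1 = Cov(x, y) / Var(x), b0 = ȳ − b1·x̄\nx_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)\nb1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)\nb0 = y_bar - b1 * x_bar\n\ndef ssr(slope, intercept):\n    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))\n\nbest = ssr(b1, b0)\nprint(best <= ssr(b1 + 0.05, b0), best <= ssr(b1 - 0.05, b0))  # True True\n```\n\n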
---\n\n## Case Study: Predicting Real Estate Energy Costs\n\nLet's look at a practical application. A property management firm wants to predict the monthly heating cost ($y$) of a building based on the outside temperature ($x$).\n\n**The Data:** Over 12 months, they record the average temperature and the heating bill.\n\n**The Model:** They use our [free regression equation calculator](/) and find:\n<MathFormula formula=\"\\text{Cost} = 250 - 4.5(\\text{Temperature})\" description=\"Heating Cost Model\" />\n\n**Interpretation:**\n- **The Intercept (250):** If it were 0 degrees outside, the predicted heating cost would be $250.\n- **The Slope (-4.5):** For every 1-degree increase in temperature, the heating bill drops by $4.50.\n\n**The Result:** The firm uses this equation to budget for the winter months. When the forecast calls for a 10-degree drop, they know to set aside an extra $45 per building.\n\n---\n\n## Advanced Frequently Asked Questions (FAQ)\n\n### 1. Is linear regression a machine learning algorithm?\nYes. While it is rooted in statistics, linear regression is considered a \"supervised learning\" algorithm in machine learning. It is often the first model taught in data science because it is highly interpretable and serves as a baseline for more complex models like neural networks.\n\n### 2. What happens if my residuals are not normally distributed?\nIf your sample size is large (typically >30), the Central Limit Theorem keeps your p-values and confidence intervals approximately valid, and the coefficient estimates are unbiased regardless of normality. With a small sample, however, non-normal residuals can make your p-values and confidence intervals unreliable. You might need to transform your data (e.g., using a log transform) or use a non-parametric regression method.\n\n### 3. What is the difference between R² and Adjusted R²?\nR² tells you how much variance is explained by your model. However, R² will always stay the same or increase as you add more variables, even if they are useless. [Multiple regression analysis](/blog/multiple-regression-explained) uses **Adjusted R²**, which penalizes you for adding \"junk\" variables, providing a more honest assessment of the model's quality.\n\n### 4. Can linear regression handle categorical data?\nYes, using **dummy variables**. For example, if you want to include \"Gender\" in a model, you can code \"Male\" as 1 and \"Female\" as 0. The coefficient then represents the average difference in the outcome between the two groups.\n\n### 5. Why is it called \"Simple\" Linear Regression?\nIt's \"simple\" because it only involves one predictor variable. If you add a second or third predictor (like predicting weight from both height and age), it becomes \"Multiple\" Linear Regression.\n\n### 6. How do I know if my model is \"good\"?\nA \"good\" model depends on the field. In physics, an R² of 0.99 might be expected. In psychology or social sciences, an R² of 0.30 could be considered a breakthrough. Always look at the **Residual Standard Error** to see the average error in the same units as your $y$ variable.\n\n### 7. What is \"Regression to the Mean\"?\nThis is the tendency for extreme scores to be followed by scores that are closer to the average. For example, a student who gets a perfect score on one test is likely to get a slightly lower score on the next, simply because luck usually balances out over time.\n\n### 8. Does a high R² prove causation?\nAbsolutely not. You could find a 0.99 R² between the number of pool drownings and the number of Nicolas Cage movies released in a year. This is a **spurious correlation**. Causation requires a logical mechanism and often experimental data.\n\n### 9. When should I use a Quadratic vs. Linear model?\nIf you plot your data and see a \"U\" shape or an inverted \"U\", a straight line will fail. In these cases, you should use a [quadratic regression calculator](/quadratic-regression-calculator/) which adds an $x²$ term to the equation.\n\n### 10. Can I use linear regression for time-series forecasting?\nYou can, but you must be careful about **autocorrelation** (where today's value depends on yesterday's). If your errors are correlated over time, the standard linear regression assumptions are violated, and you may need specialized time-series models like ARIMA.
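\n\nTo close out the FAQ, here's a small Python simulation of question 7 (invented numbers, standard library only). The students who ace the first test drift back toward the average on the retest, purely because luck rarely strikes twice:\n\n```python\nimport random\n\nrandom.seed(1)\n\n# Score = stable ability + one-off luck\nability = [random.gauss(70, 8) for _ in range(10_000)]\ntest1 = [a + random.gauss(0, 10) for a in ability]\ntest2 = [a + random.gauss(0, 10) for a in ability]\n\n# Follow the students who scored above 90 on test 1\ntop = [i for i, score in enumerate(test1) if score > 90]\navg1 = sum(test1[i] for i in top) / len(top)\navg2 = sum(test2[i] for i in top) / len(top)\nprint(round(avg1, 1), '->', round(avg2, 1))  # the retest average falls back toward 70\n```\n\n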
---\n\n## Pro-Tips for Using the Regression Equation Calculator\n\n1. **Clean Your Data:** One typo in your data table can ruin your entire model. Always double-check your inputs.\n2. **Standardize Units:** Ensure all your $x$ values use the same units (e.g., don't mix inches and centimeters).\n3. **Check the Range:** Only trust predictions that stay within the minimum and maximum values of your input data.\n4. **Use Visuals:** Always look at the scatter plot generated by the [regression equation calculator with steps](/) to ensure a line is actually appropriate.\n\nBy following these principles, you can turn raw data into powerful insights using the world's most trusted statistical tool.",
      "filePath": "src/blog/en/linear-regression-basics.mdx",
      "digest": "53cc3622af3436db",
      "deferredRender": true,
      "collection": "blog"
    }
  },
  "multiple-regression-explained": {
    "en": {
      "id": "en/multiple-regression-explained",
      "data": {
        "title": "Multiple Regression Analysis: The Ultimate Guide for Data Scientists",
        "description": "Master multiple regression analysis with this 2000+ word deep dive. Learn about partial coefficients, multicollinearity, Adjusted R², and matrix notation. Use our multiple regression calculator to solve complex problems with ease.",
        "image": "/blog/multiple-regression.svg",
        "date": "2025-02-10T00:00:00.000Z",
        "author": "Regression Equation Calculator",
        "category": "Statistics"
      },
      "body": "import MultiRegDemo from '../../../theme/components/MultiRegDemo.astro';\nimport InteractiveImage from '../../../theme/components/InteractiveImage.astro';\nimport MathFormula from '../../../theme/components/MathFormula.astro';\nimport Callout from '../../../theme/components/Callout.astro';\n\nImagine you're trying to predict house prices. Square footage alone gives you a rough estimate — but what about the number of bedrooms, the age of the property, or the neighborhood crime rate? \n\nWhen a single independent variable cannot adequately explain the variation in your dependent variable, **multiple regression analysis** comes into play.\n\nUnlike [simple linear regression](/blog/linear-regression-basics), which models the relationship between one predictor and one outcome, multiple regression lets you account for **two or more predictors simultaneously**. The result is a far more accurate, nuanced, and actionable model of what truly drives your outcome variable. If you want to see the math in action, check our [step-by-step guide](/blog/simple-linear-regression-step-by-step).\n\nIn this guide, we'll cover everything you need to know about multiple regression: the equation, how to interpret coefficients, key assumptions, real-world applications, and common pitfalls. For a high-level strategic perspective, see this [refresher on regression analysis by Harvard Business Review](https://hbr.org/2015/11/a-refresher-on-regression-analysis).\n\n---\n\n## What Is Multiple Regression Analysis?\n\nMultiple regression analysis is a statistical technique used to model the relationship between **one dependent (response) variable** and **two or more independent (predictor) variables**. \n\nIt extends simple linear regression to situations where multiple factors jointly influence an outcome. You can use our [multiple regression calculator](/multiple-regression-calculator/) to perform these analyses instantly.\n\n### The Hyperplane Concept\n\nThe core idea is straightforward: instead of fitting a line through data in two dimensions (x and y), multiple regression fits a **hyperplane** through data in three or more dimensions. \n\nEach predictor gets its own coefficient, telling you how much the outcome changes per unit change in that predictor, **holding all other predictors constant**. For more technical details, check the [Multiple Linear Regression entry on Wikipedia](https://en.wikipedia.org/wiki/Linear_regression#Multiple_linear_regression).\n\n### Why \"Holding Constant\" Matters\n\nThis \"holding constant\" property is what makes multiple regression so valuable. It lets you isolate the effect of each individual variable — something you simply cannot do with separate simple regressions.\n\n<InteractiveImage src=\"/blog/multiple-regression-3d.png\" alt=\"3D visualization of multiple regression with multiple predictors\" />\n\n---\n\n## The Multiple Regression Equation\n\nThe multiple regression equation extends the familiar **y = mx + b** form:\n\n<MathFormula formula=\"y = b₀ + b₁x₁ + b₂x₂ + ... 
+ bₙxₙ\" description=\"Multiple Regression Equation\" />\n\nWhere:\n- **y** is the predicted value of the dependent variable.\n- **b₀** is the y-intercept (the predicted value when all predictors are zero).\n- **b₁, b₂, ..., bₙ** are the **partial regression coefficients**.\n- **x₁, x₂, ..., xₙ** are the independent variables (predictors).\n\n### Interpreting the Coefficients\n\nEach coefficient **bᵢ** represents the change in y for a one-unit increase in **xᵢ**, assuming all other predictors remain constant. This is often called the **\"partial effect\"**.\n\n**Example Model for Home Prices:**\n<MathFormula formula=\"Price = 50,000 + 150(SqFt) + 20,000(Beds) − 1,000(Age)\" description=\"Example Practical Model\" />\n\n- **150**: Each additional square foot adds $150 to the price, holding bedrooms and age constant.\n- **20,000**: Each additional bedroom adds $20,000, holding size and age constant.\n- **−1,000**: Each additional year of age reduces the price by $1,000, holding size and bedrooms constant.\n\n---\n\n## Multiple vs. Simple Regression\n\nUnderstanding the distinction is essential for choosing the right approach:\n\n| Feature | Simple Linear Regression | Multiple Linear Regression |\n|---|---|---|\n| **Predictors** | Exactly 1 | 2 or more |\n| **Equation** | [y = mx + b](/) | y = b₀ + b₁x₁ + b₂x₂ + ... |\n| **Effect** | Total effect of x | Partial effect (controlled) |\n| **Risk** | High omitted variable bias | Reduced bias (if specified well) |\n| **Tool** | [Linear Calculator](/) | [Multiple Calculator](/multiple-regression-calculator/) |\n\n### Adjusted R²: The Proper Metric\n\nAdding more predictors to a model will **always** increase raw R². **Adjusted R²** corrects for this by penalizing the addition of predictors that don't genuinely improve the model. Always use adjusted R² when comparing models with different numbers of predictors.\n\n---\n\n## The Five Key Assumptions\n\nBefore you can trust your results, you must verify five critical assumptions. You can use our [regression assumptions checker](/regression-assumptions-checker/) to validate your data.\n\n### 1. Linearity\nThe relationship between each predictor and the outcome must be approximately linear.\n\n### 2. Independence of Errors\nResiduals must be independent of each other (crucial for time-series data).\n\n### 3. Homoscedasticity\nThe variance of residuals should be constant across all predicted values.\n\n### 4. Normality of Residuals\nThe errors (residuals) should be approximately normally distributed.\n\n### 5. No Multicollinearity\nPredictors should not be too highly correlated with each other. If they are, individual coefficients become unstable. You can check initial correlations with our [Pearson correlation calculator](/pearson-correlation-calculator/).\n\n---\n\n## Real-World Applications\n\n<InteractiveImage src=\"/blog/regression-applications.svg\" alt=\"Real-world applications of multiple regression analysis\" />\n\n- **Real Estate**: Estimating home values based on size, location, and age.\n- **Finance**: Explaining stock returns using market risk and company size.\n- **Marketing**: Quantifying the impact of TV, digital, and print ads on total sales.\n- **Healthcare**: Predicting patient recovery time based on age, dosage, and comorbidities.\n\n---\n\n## Common Pitfalls to Avoid\n\n1. **Overfitting**: Adding too many predictors relative to your sample size.\n2. **Ignoring Multicollinearity**: Using two highly correlated variables (like height in inches and height in cm) in the same model.\n3. 
---\n\n## The Five Key Assumptions\n\nBefore you can trust your results, you must verify five critical assumptions. You can use our [regression assumptions checker](/regression-assumptions-checker/) to validate your data.\n\n### 1. Linearity\nThe relationship between each predictor and the outcome must be approximately linear.\n\n### 2. Independence of Errors\nResiduals must be independent of each other (crucial for time-series data).\n\n### 3. Homoscedasticity\nThe variance of residuals should be constant across all predicted values.\n\n### 4. Normality of Residuals\nThe errors (residuals) should be approximately normally distributed.\n\n### 5. No Multicollinearity\nPredictors should not be too highly correlated with each other. If they are, individual coefficients become unstable. You can check initial correlations with our [Pearson correlation calculator](/pearson-correlation-calculator/).\n\n---\n\n## Real-World Applications\n\n<InteractiveImage src=\"/blog/regression-applications.svg\" alt=\"Real-world applications of multiple regression analysis\" />\n\n- **Real Estate**: Estimating home values based on size, location, and age.\n- **Finance**: Explaining stock returns using market risk and company size.\n- **Marketing**: Quantifying the impact of TV, digital, and print ads on total sales.\n- **Healthcare**: Predicting patient recovery time based on age, dosage, and comorbidities.\n\n---\n\n## Common Pitfalls to Avoid\n\n1. **Overfitting**: Adding too many predictors relative to your sample size.\n2. **Ignoring Multicollinearity**: Using two highly correlated variables (like height in inches and height in cm) in the same model.\n3. **Extrapolation**: Predicting values far outside the range of your original data.\n4. **Confusing Correlation with Causation**: Just because variables move together doesn't mean one causes the other.\n\n---\n\n## Try It Yourself: Interactive Demo\n\nAdjust the sliders below to see how different factors influence a predicted house price in real time:\n\n<MultiRegDemo />\n\n---\n\n## Ready to Calculate?\n\nOur **Multiple Regression Calculator** handles complex data sets and provides the full equation, partial coefficients, and significance levels.\n\n<a href=\"/multiple-regression-calculator/\" class=\"not-prose my-8 flex items-center justify-center\">\n  <span class=\"inline-flex items-center gap-2 px-8 py-3.5 text-base font-semibold text-white bg-gradient-to-r from-primary-600 to-accent-600 rounded-xl shadow-lg shadow-primary-600/25 hover:shadow-xl hover:shadow-primary-600/35 hover:brightness-110 hover:scale-[1.03] active:scale-[0.98] transition-all duration-200\">\n    <svg class=\"w-5 h-5\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 7h6m0 10v-3m-3 3h.01M9 17h.01M9 14h.01M12 14h.01M15 11h.01M12 11h.01M9 11h.01M7 21h10a2 2 0 002-2V5a2 2 0 00-2-2H7a2 2 0 00-2 2v14a2 2 0 002 2z\" /></svg>\n    Try the Multiple Regression Calculator\n  </span>\n</a>\n\n---\n\n## Key Takeaways\n\n1. **Multiple regression** handles two or more predictors to explain a single outcome.\n2. **Partial coefficients** isolate the effect of one variable while holding others constant.\n3. **Adjusted R²** is the gold standard for model comparison.\n4. **Multicollinearity** is a unique risk in multiple regression — always check for redundant predictors.\n5. Use **simple regression** to build intuition, then scale up to multiple regression as your questions grow more complex.\n\n---\n\n## Under the Hood: Matrix Notation\n\nWhile simple regression can be solved with basic algebra, multiple regression is almost always expressed using **Matrix Algebra**. This allows us to represent dozens or even hundreds of variables in a single, elegant equation.\n\nThe model is written as:\n<MathFormula formula=\"\\mathbf{Y} = \\mathbf{X}\\beta + \\epsilon\" description=\"Matrix Regression Form\" />\n\nWhere:\n- $Y$ is a vector of $n$ observations of the dependent variable.\n- $X$ is a matrix (often called the **Design Matrix**) of $n$ observations of $k$ independent variables, plus a column of 1s for the intercept.\n- $\\beta$ is a vector of $k+1$ coefficients to be estimated.\n- $\\epsilon$ is a vector of $n$ error terms.\n\nThe \"Normal Equation\" used to find the best-fitting coefficients is:\n<MathFormula formula=\"\\hat{\\beta} = (\\mathbf{X}^T\\mathbf{X})^{-1}\\mathbf{X}^T\\mathbf{Y}\" description=\"The OLS Solution\" />\n\nThis formula is what our [multiple regression calculator](/multiple-regression-calculator/) computes instantly, handling the complex matrix inversion that would take a human hours to complete by hand.
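\n\nHere's a compact numpy sketch of that Normal Equation (made-up numbers; real-world code usually prefers a numerically stabler routine such as `numpy.linalg.lstsq` over an explicit inverse):\n\n```python\nimport numpy as np\n\n# Five invented observations, two predictors\nX = np.array([[1.0, 2.0, 50.0],   # leading column of 1s -> intercept b0\n              [1.0, 3.0, 40.0],\n              [1.0, 5.0, 30.0],\n              [1.0, 7.0, 20.0],\n              [1.0, 9.0, 10.0]])\nY = np.array([12.0, 14.0, 17.0, 21.0, 24.0])\n\n# beta_hat = (X^T X)^(-1) X^T Y\nbeta_hat = np.linalg.inv(X.T @ X) @ X.T @ Y\nprint(beta_hat)  # [b0, b1, b2]\n```\n\n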
---\n\n## Advanced Feature Selection: Which Variables Belong?\n\nOne of the hardest parts of multiple regression isn't the math — it's deciding which variables to include. Include too few, and you have **Omitted Variable Bias**. Include too many, and you have **Overfitting**.\n\n### 1. Forward Selection\nYou start with no variables and add the one that is most statistically significant. You continue adding variables one by one until no new variable improves the model.\n\n### 2. Backward Elimination\nYou start with *all* possible variables and remove the least significant one (highest p-value). You repeat this until only significant variables remain.\n\n### 3. Stepwise Regression\nA combination of both. You add variables like Forward Selection but also re-evaluate existing variables to see if they should be removed.\n\n---\n\n## The Danger of Multicollinearity\n\nMulticollinearity occurs when two or more of your independent variables are highly correlated with each other.\n\n**Why it's a problem:** If $x_1$ and $x_2$ move perfectly together, the regression model can't tell which one is actually causing the change in $y$. This leads to:\n- Inflated standard errors (making variables look \"not significant\" when they actually are).\n- Unstable coefficients (small changes in data cause huge swings in coefficients).\n- Counter-intuitive signs (e.g., a positive effect showing up as negative).\n\n**How to detect it:** Check the **Variance Inflation Factor (VIF)**. A VIF > 5 or 10 usually indicates a problem. You can start by running a [Pearson correlation calculator](/pearson-correlation-calculator/) on all your predictors to see if any are redundant.\n\n---\n\n## Interaction Effects: When Variables Work Together\n\nSometimes, the effect of one variable depends on the value of another. This is called an **Interaction Effect**.\n\n**Example:** Suppose you are predicting plant growth based on both \"Water\" and \"Sunlight\".\n- Water helps growth.\n- Sunlight helps growth.\n- But Water + Sunlight together might provide a massive boost that neither provides alone.\n\nTo model this, you add an interaction term:\n<MathFormula formula=\"y = b₀ + b₁(\\text{Water}) + b₂(\\text{Sunlight}) + b₃(\\text{Water} \\times \\text{Sunlight})\" description=\"Interaction Model\" />\n\n---\n\n## Advanced Frequently Asked Questions (FAQ)\n\n### 1. How many observations do I need for multiple regression?\nA common rule of thumb is at least **10 to 20 observations per predictor variable**. If you have 5 predictors, you should aim for at least 50 to 100 data points to ensure stable estimates.\n\n### 2. Can I compare coefficients to see which variable is \"most important\"?\nOnly if the variables are on the same scale. If one variable is \"Income in Dollars\" and another is \"Age in Years,\" you cannot compare their coefficients. To compare importance, use **Standardized Coefficients** (Beta weights), which our [multiple regression calculator](/multiple-regression-calculator/) can provide.\n\n### 3. What is \"Overfitting\"?\nOverfitting happens when your model is so complex that it starts modeling the \"noise\" in your specific dataset rather than the actual underlying trend. An overfitted model will look great on your current data but fail miserably when used on new data.\n\n### 4. What is the difference between R² and Adjusted R²?\nR² never goes down when you add variables, even useless ones. Adjusted R² only goes up if the new variable improves the model more than would be expected by chance. In multiple regression, **always report Adjusted R²**.\n\n### 5. Can I use multiple regression for binary (Yes/No) outcomes?\nNo. For binary outcomes, you should use **Logistic Regression**. Linear regression can produce predictions above 1 or below 0, which doesn't make sense for probabilities.\n\n### 6. What is a \"Dummy Variable\"?\nA dummy variable is a numeric variable used to represent categorical data. For example, to include \"Region\" (North, South, East, West), you would create three dummy variables. You always need one fewer dummy variable than there are categories.
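\n\nHere's a brief pandas sketch of that encoding (invented data; `pandas.get_dummies` with `drop_first=True` enforces the one-fewer rule automatically):\n\n```python\nimport pandas as pd\n\ndf = pd.DataFrame({'Region': ['North', 'South', 'East', 'West', 'South']})\n\n# drop_first=True keeps 3 dummies for 4 categories; the dropped\n# category ('East', alphabetically first) becomes the baseline\ndummies = pd.get_dummies(df['Region'], drop_first=True)\nprint(dummies.columns.tolist())  # ['North', 'South', 'West']\n```\n\n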
### 7. How do I handle missing data?\nYou can either delete the rows with missing data (Listwise Deletion) or use \"Imputation\" to fill in the gaps. Be careful, as both methods can introduce bias if not handled correctly.\n\n### 8. What is \"Endogeneity\"?\nEndogeneity occurs when a predictor variable is correlated with the error term. This often happens if there is a \"feedback loop\" (where $y$ also affects $x$) or if a critical variable is missing from the model. It is a serious problem that requires advanced techniques like \"Instrumental Variables.\"\n\n### 9. How do I interpret a log-transformed predictor?\nIf you take the natural log of a predictor, the coefficient divided by 100 gives (approximately) the change in $y$ for a **1% change** in $x$. This is common in economics when dealing with things like income or population.\n\n### 10. When should I stop adding variables?\nStop when the Adjusted R² stops increasing significantly, or when your p-values for new variables are high (> 0.05). Use \"Parsimony\" as your guide: the simplest model that explains the data is usually the best.\n\n---\n\n## Summary Case Study: Marketing Mix Modeling\n\nA global retailer wants to understand which advertising channels are most effective. They collect weekly data on:\n- **Total Sales ($y$)**\n- **TV Spend ($x_1$)**\n- **Social Media Spend ($x_2$)**\n- **In-Store Promotions ($x_3$)**\n\nUsing our [free multiple regression calculator](/multiple-regression-calculator/), they find:\n<MathFormula formula=\"\\text{Sales} = 100,000 + 5(TV) + 12(\\text{Social}) + 20(\\text{Promos})\" description=\"Marketing Model\" />\n\n**Insights:**\n1. **Social Media** ($12) is more than twice as effective as **TV** ($5) per dollar spent.\n2. **In-Store Promotions** have the highest immediate impact ($20).\n3. The retailer shifts 20% of their TV budget to Social Media, resulting in a predicted 15% increase in total ROI.\n\nThis level of strategic insight is only possible with multiple regression, allowing you to see the \"big picture\" of how multiple forces interact to drive success.",
      "filePath": "src/blog/en/multiple-regression-explained.mdx",
      "digest": "7b5ef21eba27b4b3",
      "deferredRender": true,
      "collection": "blog"
    },
    "ko": {
      "id": "ko/multiple-regression-explained",
      "data": {
        "title": "다중 회귀 분석: 하나의 예측 변수로 충분하지 않을 때",
        "description": "다중 회귀 분석을 마스터하세요. 방정식, 주요 가정, 다중공선성, 실제 응용 및 편 회귀 계수 해석을 배우세요.",
        "image": "/blog/multiple-regression.svg",
        "date": "2025-02-10T00:00:00.000Z",
        "author": "회귀 방정식 계산기",
        "category": "통계학"
      },
      "body": "import MultiRegDemo from '../../../theme/components/MultiRegDemo.astro';\nimport InteractiveImage from '../../../theme/components/InteractiveImage.astro';\nimport MathFormula from '../../../theme/components/MathFormula.astro';\nimport Callout from '../../../theme/components/Callout.astro';\n\n집값을 예측하려 한다고 가정해 보세요. 평수만으로는 대략적인 추정치를 얻을 수 있지만, 침실 수, 건물 연식, 동네 범죄율은 어떨까요?\n\n단일 독립 변수로 종속 변수의 변동을 충분히 설명할 수 없을 때, **다중 회귀 분석**이 필요합니다.\n\n하나의 예측 변수와 하나의 결과 간의 관계를 모델링하는 [단순 선형 회귀](/)와 달리, 다중 회귀는 **두 개 이상의 예측 변수를 동시에** 고려할 수 있습니다. 그 결과, 결과 변수를 실제로 이끄는 요인에 대해 훨씬 더 정확하고 세밀하며 실행 가능한 모델을 얻을 수 있습니다.\n\n이 가이드에서는 다중 회귀에 대해 알아야 할 모든 것을 다룹니다: 방정식, 계수 해석, 주요 가정, 실제 응용, 그리고 흔한 함정. 전략적 관점의 개요는 [하버드 비즈니스 리뷰의 회귀 분석 복습](https://hbr.org/2015/11/a-refresher-on-regression-analysis)을 참조하세요.\n\n---\n\n## 다중 회귀 분석이란?\n\n다중 회귀 분석은 **하나의 종속(반응) 변수**와 **두 개 이상의 독립(예측) 변수** 간의 관계를 모델링하는 통계 기법입니다.\n\n이는 여러 요인이 공동으로 결과에 영향을 미치는 상황으로 단순 선형 회귀를 확장한 것입니다. [다중 회귀 계산기](/multiple-regression-calculator/)를 사용하여 즉시 분석을 수행할 수 있습니다.\n\n### 초평면의 개념\n\n핵심 아이디어는 간단합니다: 2차원(x와 y)에서 데이터에 선을 맞추는 대신, 다중 회귀는 3차원 이상에서 데이터에 **초평면**을 맞춥니다.\n\n각 예측 변수는 고유한 계수를 가지며, **다른 모든 예측 변수를 일정하게 유지한 상태에서** 해당 예측 변수의 단위 변화당 결과가 얼마나 변하는지 알려줍니다. 더 자세한 기술적 내용은 [위키백과의 다중 선형 회귀 항목](https://en.wikipedia.org/wiki/Linear_regression#Multiple_linear_regression)을 참조하세요.\n\n### \"일정하게 유지\"가 중요한 이유\n\n이 \"일정하게 유지\" 속성이 다중 회귀를 그토록 가치 있게 만듭니다. 개별 변수의 효과를 분리할 수 있게 해주며, 이는 별도의 단순 회귀로는 할 수 없는 일입니다.\n\n<InteractiveImage src=\"/blog/multiple-regression-3d.png\" alt=\"여러 예측 변수를 사용한 다중 회귀의 3D 시각화\" />\n\n---\n\n## 다중 회귀 방정식\n\n다중 회귀 방정식은 익숙한 **y = mx + b** 형태를 확장합니다:\n\n<MathFormula formula=\"y = b₀ + b₁x₁ + b₂x₂ + ... + bₙxₙ\" description=\"다중 회귀 방정식\" />\n\n여기서:\n- **y**는 종속 변수의 예측값입니다.\n- **b₀**는 y절편(모든 예측 변수가 0일 때의 예측값)입니다.\n- **b₁, b₂, ..., bₙ**은 **편 회귀 계수**입니다.\n- **x₁, x₂, ..., xₙ**은 독립 변수(예측 변수)입니다.\n\n### 계수 해석하기\n\n각 계수 **bᵢ**는 다른 모든 예측 변수가 일정하게 유지된다고 가정할 때, **xᵢ**가 한 단위 증가할 때 y의 변화량을 나타냅니다. 이를 종종 **\"편효과\"**라고 부릅니다.\n\n**주택 가격 예시 모델:**\n<MathFormula formula=\"Price = 50,000 + 150(SqFt) + 20,000(Beds) − 1,000(Age)\" description=\"실용적인 예시 모델\" />\n\n- **150**: 평수가 1 증가할 때마다 가격이 $150 증가하며, 침실 수와 연식은 일정하게 유지됩니다.\n- **20,000**: 침실이 하나 추가될 때마다 가격이 $20,000 증가하며, 평수와 연식은 일정하게 유지됩니다.\n- **−1,000**: 연식이 1년 증가할 때마다 가격이 $1,000 감소하며, 평수와 침실 수는 일정하게 유지됩니다.\n\n---\n\n## 다중 회귀와 단순 회귀 비교\n\n올바른 접근법을 선택하기 위해 이 구별을 이해하는 것이 필수적입니다:\n\n| 특징 | 단순 선형 회귀 | 다중 선형 회귀 |\n|---|---|---|\n| **예측 변수** | 정확히 1개 | 2개 이상 |\n| **방정식** | [y = mx + b](/) | y = b₀ + b₁x₁ + b₂x₂ + ... |\n| **효과** | x의 총효과 | 편효과(통제됨) |\n| **위험** | 높은 누락 변수 편향 | 감소된 편향(잘 지정된 경우) |\n| **도구** | [선형 계산기](/) | [다중 회귀 계산기](/multiple-regression-calculator/) |\n\n### 수정 R²: 올바른 지표\n\n모델에 더 많은 예측 변수를 추가하면 원시 R²가 **항상** 증가합니다. **수정 R²**는 모델을 진정으로 개선하지 않는 예측 변수의 추가에 페널티를 부여하여 이를 보정합니다. 예측 변수 수가 다른 모델을 비교할 때는 항상 수정 R²를 사용하세요.\n\n---\n\n## 다섯 가지 주요 가정\n\n결과를 신뢰하기 전에 다섯 가지 중요한 가정을 검증해야 합니다. [회귀 가정 검사기](/regression-assumptions-checker/)를 사용하여 데이터를 검증할 수 있습니다.\n\n### 1. 선형성\n각 예측 변수와 결과 간의 관계는 대략적으로 선형이어야 합니다.\n\n### 2. 오차의 독립성\n잔차는 서로 독립적이어야 합니다(시계열 데이터에 특히 중요).\n\n### 3. 등분산성\n잔차의 분산은 모든 예측값에서 일정해야 합니다.\n\n### 4. 잔차의 정규성\n오차(잔차)는 대략적으로 정규 분포를 따라야 합니다.\n\n### 5. 다중공선성 없음\n예측 변수 간에 상관관계가 너무 높으면 안 됩니다. 그렇지 않으면 개별 계수가 불안정해집니다. 
[피어슨 상관 계수 계산기](/pearson-correlation-calculator/)로 초기 상관관계를 확인할 수 있습니다.\n\n---\n\n## 실제 응용 분야\n\n<InteractiveImage src=\"/blog/regression-applications.svg\" alt=\"다중 회귀 분석의 실제 응용\" />\n\n- **부동산**: 크기, 위치, 연식을 기반으로 주택 가치 추정.\n- **금융**: 시장 위험과 기업 규모를 사용하여 주식 수익률 설명.\n- **마케팅**: TV, 디지털, 인쇄 광고가 총 매출에 미치는 영향 정량화.\n- **의료**: 연령, 투여량, 합병증을 기반으로 환자 회복 시간 예측.\n\n---\n\n## 피해야 할 흔한 함정\n\n1. **과적합**: 표본 크기에 비해 너무 많은 예측 변수를 추가함.\n2. **다중공선성 무시**: 동일한 모델에서 높은 상관관계를 가진 두 변수(예: 인치 단위 키와 cm 단위 키)를 사용함.\n3. **외삽**: 원본 데이터의 범위를 크게 벗어난 값을 예측함.\n4. **상관관계와 인과관계 혼동**: 변수들이 함께 움직인다고 해서 하나가 다른 하나의 원인이라는 의미는 아님.\n\n---\n\n## 직접 해보세요: 대화형 데모\n\n아래 슬라이더를 조정하여 다양한 요인이 예측된 주택 가격에 어떻게 영향을 미치는지 실시간으로 확인하세요:\n\n<MultiRegDemo />\n\n---\n\n## 계산할 준비가 되셨나요?\n\n**다중 회귀 계산기**는 복잡한 데이터 세트를 처리하고 전체 방정식, 편 회귀 계수, 유의성 수준을 제공합니다.\n\n<a href=\"/multiple-regression-calculator/\" class=\"not-prose my-8 flex items-center justify-center\">\n  <span class=\"inline-flex items-center gap-2 px-8 py-3.5 text-base font-semibold text-white bg-gradient-to-r from-primary-600 to-accent-600 rounded-xl shadow-lg shadow-primary-600/25 hover:shadow-xl hover:shadow-primary-600/35 hover:brightness-110 hover:scale-[1.03] active:scale-[0.98] transition-all duration-200\">\n    <svg class=\"w-5 h-5\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 7h6m0 10v-3m-3 3h.01M9 17h.01M9 14h.01M12 14h.01M15 11h.01M12 11h.01M9 11h.01M7 21h10a2 2 0 002-2V5a2 2 0 00-2-2H7a2 2 0 00-2 2v14a2 2 0 002 2z\" /></svg>\n    다중 회귀 계산기 사용하기\n  </span>\n</a>\n\n---\n\n## 핵심 요약\n\n1. **다중 회귀**는 두 개 이상의 예측 변수를 처리하여 단일 결과를 설명합니다.\n2. **편 계수**는 다른 변수를 일정하게 유지하면서 하나의 변수 효과를 분리합니다.\n3. **수정 R²**는 모델 비교의 황금 표준입니다.\n4. **다중공선성**은 다중 회귀의 고유한 위험입니다 — 항상 중복 예측 변수를 확인하세요.\n5. **단순 회귀**로 직관을 쌓은 후, 질문이 복잡해지면 다중 회귀로 확장하세요.",
      "filePath": "src/blog/ko/multiple-regression-explained.mdx",
      "digest": "351f08537fd8cab1",
      "deferredRender": true,
      "collection": "blog"
    },
    "ms": {
      "id": "ms/multiple-regression-explained",
      "data": {
        "title": "Analisis Regresi Berganda: Apabila Satu Peramal Tidak Mencukupi",
        "description": "Kuasai analisis regresi berganda. Pelajari persamaan, andaian utama, multikolineariti, aplikasi dunia sebenar dan mentafsir pekali separa.",
        "image": "/blog/multiple-regression.svg",
        "date": "2025-02-10T00:00:00.000Z",
        "author": "Kalkulator Persamaan Regresi",
        "category": "Statistik"
      },
      "body": "import MultiRegDemo from '../../../theme/components/MultiRegDemo.astro';\nimport InteractiveImage from '../../../theme/components/InteractiveImage.astro';\nimport MathFormula from '../../../theme/components/MathFormula.astro';\nimport Callout from '../../../theme/components/Callout.astro';\n\nBayangkan anda cuba meramal harga rumah. Kaki persegi sahaja memberikan anggaran kasar — tetapi bagaimana pula dengan bilangan bilik tidur, usia harta, atau kadar jenayah kejiranan?\n\nApabila satu pemboleh ubah tak bebas tidak dapat menjelaskan variasi dalam pemboleh ubah bersubah anda dengan memadai, **analisis regresi berganda** memainkan peranannya.\n\nBerbeza dengan [regresi linear ringkas](/) yang memodelkan hubungan antara satu peramal dan satu hasil, regresi berganda membolehkan anda mengambil kira **dua atau lebih peramal secara serentak**. Hasilnya adalah model yang jauh lebih tepat, bernuansa, dan boleh dilaksanakan tentang apa yang sebenarnya memacu pemboleh ubah hasil anda.\n\nDalam panduan ini, kami akan merangkumi semua yang perlu anda ketahui tentang regresi berganda: persamaan, cara mentafsir pekali, andaian utama, aplikasi dunia sebenar, dan kesilapan biasa. Untuk perspektif strategik peringkat tinggi, lihat [segar semula analisis regresi oleh Harvard Business Review](https://hbr.org/2015/11/a-refresher-on-regression-analysis).\n\n---\n\n## Apakah Analisis Regresi Berganda?\n\nAnalisis regresi berganda ialah teknik statistik yang digunakan untuk memodelkan hubungan antara **satu pemboleh ubah bersubah (tindak balas)** dan **dua atau lebih pemboleh ubah tak bebas (peramal)**.\n\nIa meluaskan regresi linear ringkas kepada situasi di mana pelbagai faktor secara bersama mempengaruhi hasil. Anda boleh menggunakan [kalkulator regresi berganda](/multiple-regression-calculator/) kami untuk menjalankan analisis ini dengan serta-merta.\n\n### Konsep Hipersatah\n\nIdea terasnya mudah: bukannya memuatkan garis melalui data dalam dua dimensi (x dan y), regresi berganda memuatkan **hipersatah** melalui data dalam tiga atau lebih dimensi.\n\nSetiap peramal mendapat pekali tersendiri, memberitahu anda berapa banyak hasil berubah per unit perubahan dalam peramal tersebut, **mengekalkan semua peramal lain secara malar**. Untuk butiran teknikal lanjut, semak [entri Regresi Linear Berganda di Wikipedia](https://en.wikipedia.org/wiki/Linear_regression#Multiple_linear_regression).\n\n### Mengapa \"Mengekalkan Malar\" Penting\n\nSifat \"mengekalkan malar\" inilah yang menjadikan regresi berganda sangat berharga. Ia membolehkan anda mengasingkan kesan setiap pemboleh ubah individu — sesuatu yang anda tidak dapat lakukan dengan regresi ringkas berasingan.\n\n<InteractiveImage src=\"/blog/multiple-regression-3d.png\" alt=\"Visualisasi 3D regresi berganda dengan pelbagai peramal\" />\n\n---\n\n## Persamaan Regresi Berganda\n\nPersamaan regresi berganda meluaskan bentuk **y = mx + b** yang biasa:\n\n<MathFormula formula=\"y = b₀ + b₁x₁ + b₂x₂ + ... + bₙxₙ\" description=\"Persamaan Regresi Berganda\" />\n\nDi mana:\n- **y** ialah nilai ramalan pemboleh ubah bersubah.\n- **b₀** ialah pintasan-y (nilai ramalan apabila semua peramal adalah sifar).\n- **b₁, b₂, ..., bₙ** ialah **pekali regresi separa**.\n- **x₁, x₂, ..., xₙ** ialah pemboleh ubah tak bebas (peramal).\n\n### Mentafsir Pekali\n\nSetiap pekali **bᵢ** mewakili perubahan dalam y bagi peningkatan satu unit dalam **xᵢ**, dengan anggapan semua peramal lain kekal malar. 
\n\n---\n\n## Regresi Berganda lwn. Regresi Ringkas\n\nMemahami perbezaannya adalah penting untuk memilih pendekatan yang betul:\n\n| Ciri | Regresi Linear Ringkas | Regresi Linear Berganda |\n|---|---|---|\n| **Peramal** | Tepat 1 | 2 atau lebih |\n| **Persamaan** | [y = mx + b](/) | y = b₀ + b₁x₁ + b₂x₂ + ... |\n| **Kesan** | Kesan jumlah x | Kesan separa (dikawal) |\n| **Risiko** | Pincang pemboleh ubah tertinggal tinggi | Pincang berkurangan (jika dinyatakan dengan baik) |\n| **Alat** | [Kalkulator Linear](/) | [Kalkulator Berganda](/multiple-regression-calculator/) |\n\n### R² Larasan: Metrik Yang Betul\n\nMenambah lebih banyak peramal kepada model akan **sentiasa** meningkatkan R² mentah. **R² larasan** membetulkan ini dengan mengenakan penalti atas penambahan peramal yang tidak benar-benar meningkatkan model. Sentiasa gunakan R² larasan apabila membandingkan model dengan bilangan peramal yang berbeza.\n\n---\n\n## Lima Andaian Utama\n\nSebelum anda boleh mempercayai keputusan anda, anda mesti mengesahkan lima andaian kritikal. Anda boleh menggunakan [pemeriksa andaian regresi](/regression-assumptions-checker/) kami untuk mengesahkan data anda.\n\n### 1. Keberlinearan\nHubungan antara setiap peramal dan hasil mesti lebih kurang linear.\n\n### 2. Kebebasan Ralat\nResidu mesti bebas antara satu sama lain (penting untuk data siri masa).\n\n### 3. Homoskedastisiti\nVarians residu hendaklah malar merentasi semua nilai ramalan.\n\n### 4. Kenormalan Residu\nRalat (residu) hendaklah lebih kurang tertabur secara normal.\n\n### 5. Tiada Multikolineariti\nPeramal tidak boleh berkorelasi terlalu tinggi antara satu sama lain. Jika ya, pekali individu menjadi tidak stabil. Anda boleh menyemak korelasi awal dengan [kalkulator korelasi Pearson](/pearson-correlation-calculator/) kami.\n\n---\n\n## Aplikasi Dunia Sebenar\n\n<InteractiveImage src=\"/blog/regression-applications.svg\" alt=\"Aplikasi dunia sebenar analisis regresi berganda\" />\n\n- **Hartanah**: Menganggar nilai rumah berdasarkan saiz, lokasi dan usia.\n- **Kewangan**: Menerangkan pulangan saham menggunakan risiko pasaran dan saiz syarikat.\n- **Pemasaran**: Mengukur kesan iklan TV, digital dan cetak ke atas jumlah jualan.\n- **Penjagaan Kesihatan**: Meramal masa pemulihan pesakit berdasarkan usia, dos dan komorbiditi.\n\n---\n\n## Kesilapan Biasa yang Perlu Dielakkan\n\n1. **Padanan lampau (overfitting)**: Menambah terlalu banyak peramal berbanding saiz sampel anda.\n2. **Mengabaikan Multikolineariti**: Menggunakan dua pemboleh ubah yang sangat berkorelasi (seperti tinggi dalam inci dan tinggi dalam cm) dalam model yang sama.\n3. **Ekstrapolasi**: Meramal nilai jauh di luar julat data asal anda.\n4. 
**Mencampuradukkan Korelasi dengan Kausaliti**: Hanya kerana pemboleh ubah bergerak bersama tidak bermakna satu menyebabkan yang lain.\n\n---\n\n## Cuba Sendiri: Demo Interaktif\n\nLaraskan peluncur di bawah untuk melihat bagaimana faktor berbeza mempengaruhi harga rumah ramalan dalam masa nyata:\n\n<MultiRegDemo />\n\n---\n\n## Sedia untuk Mengira?\n\n**Kalkulator Regresi Berganda** kami mengendalikan set data kompleks dan menyediakan persamaan lengkap, pekali separa dan aras keertian.\n\n<a href=\"/multiple-regression-calculator/\" class=\"not-prose my-8 flex items-center justify-center\">\n  <span class=\"inline-flex items-center gap-2 px-8 py-3.5 text-base font-semibold text-white bg-gradient-to-r from-primary-600 to-accent-600 rounded-xl shadow-lg shadow-primary-600/25 hover:shadow-xl hover:shadow-primary-600/35 hover:brightness-110 hover:scale-[1.03] active:scale-[0.98] transition-all duration-200\">\n    <svg class=\"w-5 h-5\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 7h6m0 10v-3m-3 3h.01M9 17h.01M9 14h.01M12 14h.01M15 11h.01M12 11h.01M9 11h.01M7 21h10a2 2 0 002-2V5a2 2 0 00-2-2H7a2 2 0 00-2 2v14a2 2 0 002 2z\" /></svg>\n    Cuba Kalkulator Regresi Berganda\n  </span>\n</a>\n\n---\n\n## Ringkasan Utama\n\n1. **Regresi berganda** mengendalikan dua atau lebih peramal untuk menjelaskan satu hasil.\n2. **Pekali separa** mengasingkan kesan satu pemboleh ubah sambil mengekalkan yang lain malar.\n3. **R² larasan** ialah standard emas untuk perbandingan model.\n4. **Multikolineariti** ialah risiko unik dalam regresi berganda — sentiasa periksa peramal berlebihan.\n5. Gunakan **regresi ringkas** untuk membina intuisi, kemudian tingkatkan kepada regresi berganda apabila soalan anda menjadi lebih kompleks.",
      "filePath": "src/blog/ms/multiple-regression-explained.mdx",
      "digest": "471cf5d107017b70",
      "deferredRender": true,
      "collection": "blog"
    }
  },
  "simple-linear-regression-step-by-step": {
    "en": {
      "id": "en/simple-linear-regression-step-by-step",
      "data": {
        "title": "Simple Linear Regression Step-by-Step: The Definitive Mathematical Guide",
        "description": "Learn how to calculate simple linear regression by hand with this 2000+ word walkthrough. Master the slope formula, intercept, R², and residual analysis. Use our regression equation calculator with steps to verify your manual work.",
        "image": "/blog/linear-regression-steps.svg",
        "date": "2025-03-20T00:00:00.000Z",
        "author": "Regression Equation Calculator",
        "category": "Statistics"
      },
      "body": "import Callout from '../../../theme/components/Callout.astro';\nimport MathFormula from '../../../theme/components/MathFormula.astro';\n\nEvery statistical journey starts with a single line. Simple linear regression is that line — the most fundamental predictive model in data science, and the foundation upon which every advanced regression technique is built. \n\nIf you want to predict a dependent variable from a single independent variable, our [free regression equation calculator](/) will give you the answer in seconds. However, understanding *how* that answer is derived is what separates a data practitioner from someone who merely pushes buttons. For more complex datasets, you might eventually need [multiple regression analysis](/blog/multiple-regression-explained) or a broader [understanding of regression basics](/blog/linear-regression-basics).\n\n<Callout type=\"conceptual\" title=\"What You'll Learn\">\nBy the end of this article, you will be able to calculate the regression equation **y = mx + b** from raw data, interpret the results, and verify that your data meets required assumptions.\n</Callout>\n\n---\n\n## What Is Simple Linear Regression?\n\nSimple linear regression models the relationship between **one independent variable (x)** and **one dependent variable (y)** by fitting a straight line through the data. \n\nThe word \"simple\" distinguishes it from [multiple regression](/multiple-regression-calculator/), which uses two or more predictors. The fitted line is chosen to minimize the sum of squared vertical distances — a method called **ordinary least squares (OLS)**.\n\n### When to Use (and When to Avoid)\n\n**Use it when:**\n- You have one continuous predictor and one continuous outcome.\n- Your scatter plot shows an **approximately linear pattern**.\n- You want to quantify how much y changes per unit of x.\n\n**Avoid it when:**\n- The scatter plot shows a **clear curve** — try our [quadratic regression calculator](/quadratic-regression-calculator/) instead.\n- You have **multiple predictors** — use [multiple linear regression](/multiple-regression-calculator/).\n- Your data contains **extreme outliers** that could skew the entire model.\n\n---\n\n## The Dataset\n\nSuppose a tutoring company tracks study hours (x) and resulting test scores (y):\n\n| Student | Study Hours (x) | Test Score (y) |\n|---------|-----------------|----------------|\n| 1       | 2               | 65             |\n| 2       | 4               | 75             |\n| 3       | 6               | 80             |\n| 4       | 8               | 90             |\n| 5       | 10              | 95             |\n\n---\n\n## Step 1: Calculate the Means\n\nThe first step is to compute the arithmetic mean of both variables.\n\n**Mean of x (x̄):** (2 + 4 + 6 + 8 + 10) / 5 = **6.0**\n**Mean of y (ȳ):** (65 + 75 + 80 + 90 + 95) / 5 = **81.0**\n\nThe regression line will always pass through the point (**6.0, 81.0**).\n\n---\n\n## Step 2: Compute Deviations and Products\n\nNext, we calculate how far each point is from the mean and multiply the results.\n\n| Student | x − x̄ | y − ȳ | (x − x̄)(y − ȳ) | (x − x̄)² |\n|---------|--------|--------|-----------------|----------|\n| 1       | −4     | −16    | 64              | 16       |\n| 2       | −2     | −6     | 12              | 4        |\n| 3       | 0      | −1     | 0               | 0        |\n| 4       | 2      | 9      | 18              | 4        |\n| 5       | 4      | 14     | 56              | 16       |\n| **Sum** | | | **150** | **40** 
|\n\n---\n\n## Step 3: Calculate the Slope (b₁)\n\nThe slope tells you how much y changes for each one-unit increase in x. \n\n**b₁ = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²**\n**b₁ = 150 / 40 = 3.75**\n\n**Interpretation**: For every additional hour of study, the predicted test score increases by **3.75 points**.\n\n---\n\n## Step 4: Calculate the Intercept (b₀)\n\nThe intercept is the predicted y when x = 0. \n\n**b₀ = ȳ − b₁ × x̄**\n**b₀ = 81.0 − 3.75 × 6.0 = 58.5**\n\n**Interpretation**: A student studying zero hours is predicted to score **58.5**.\n\n---\n\n## Step 5: Write the Final Equation\n\nCombining the two:\n**y = 58.5 + 3.75x**\n\nThis model lets you make predictions. For example, studying for **7 hours** yields:\n58.5 + 3.75(7) = **84.75**.\n\n<Callout type=\"warning\" title=\"Extrapolation Danger\">\nPredicting outside your data's range (e.g., studying 50 hours) is called **extrapolation**. It often yields nonsensical results and should be avoided.\n</Callout>\n\n---\n\n## Step 6: Measure the Fit (R² and r)\n\n**R²** measures how much of the variation in y is explained by the model. \n**r** (Pearson correlation) measures the strength and direction of the linear relationship. \n\nFor this dataset, our [Pearson correlation calculator](/pearson-correlation-calculator/) would yield an **r of 0.9934**, which indicates a very strong positive relationship. Learn more about the [Pearson Correlation Coefficient on Statology](https://www.statology.org/pearson-correlation-coefficient/).\n\n---\n\n## Step 7: Verify Assumptions\n\nBefore trusting your results, you must satisfy the four OLS assumptions. Our [regression assumptions checker](/regression-assumptions-checker/) can help you automate this:\n\n1. **Linearity**: The relationship follows a straight-line pattern.\n2. **Independence**: Observations are not dependent on one another.\n3. **Homoscedasticity**: Residuals (errors) have constant variance.\n4. **Normality**: Residuals are approximately normally distributed.\n\n---\n\n## Beyond Simple Regression\n\nOnce you've mastered the basics, you might need more advanced tools:\n\n- **Multiple Predictors**: Use [multiple linear regression](/multiple-regression-calculator/) for complex scenarios.\n- **Curved Patterns**: Use our [quadratic regression calculator](/quadratic-regression-calculator/).\n- **Growth Models**: Explore the [exponential regression calculator](/exponential-regression-calculator/).\n\n---\n\n## Key Takeaways\n\n1. **The Slope** represents the rate of change.\n2. **The Intercept** provides the baseline value at x=0.\n3. **R²** defines the model's explanatory power.\n4. **Extrapolation** is risky — stay within your data's range.\n5. **Correlation is not causation** — statistics show association, not necessarily cause-and-effect.\n\nReady to test your own data? Head over to our [free regression calculator](/) and get started today!\n\n---\n\n## Deep Dive: Residual Analysis Step-by-Step\n\nOnce you have your equation ($y = 58.5 + 3.75x$), the job isn't finished. 
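\n\nAs a quick sanity check, every number in the walkthrough above can be reproduced in a few lines of plain Python. The sketch below uses only the standard library, and the variable names are our own choices for clarity rather than part of any particular package:\n\n```python\n# Study hours (x) and test scores (y) from the worked example above\nx = [2, 4, 6, 8, 10]\ny = [65, 75, 80, 90, 95]\n\nn = len(x)\nx_bar = sum(x) / n  # 6.0\ny_bar = sum(y) / n  # 81.0\n\n# OLS slope and intercept from the deviation sums\nsxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # 150.0\nsxx = sum((xi - x_bar) ** 2 for xi in x)  # 40.0\nb1 = sxy / sxx  # 3.75\nb0 = y_bar - b1 * x_bar  # 58.5\n\n# Predictions, residuals, and R-squared\ny_hat = [b0 + b1 * xi for xi in x]\nresiduals = [yi - yh for yi, yh in zip(y, y_hat)]  # sums to zero\nsse = sum(e ** 2 for e in residuals)  # 7.5\nsst = sum((yi - y_bar) ** 2 for yi in y)  # 570.0\nr_squared = 1 - sse / sst  # ~0.9868, so r ~ 0.9934\n\nprint(f'y = {b0} + {b1}x, R^2 = {r_squared:.4f}')\n```\n\n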
You must check the **residuals** (the errors) to ensure your model is valid.\n\n### What is a Residual?\nA residual ($e$) is the difference between the observed value ($y$) and the value predicted by your equation ($ŷ$).\n<MathFormula formula=\"e = y - \\hat{y}\" description=\"Residual Formula\" />\n\n### Let's Calculate the Residuals for our Dataset:\n\n| Student | Hours ($x$) | Actual Score ($y$) | Predicted Score ($ŷ$) | Residual ($e$) |\n|---------|-------------|-------------------|-----------------------------|----------------|\n| 1       | 2           | 65                | 58.5 + 3.75(2) = 66.0       | 65 - 66.0 = -1.0 |\n| 2       | 4           | 75                | 58.5 + 3.75(4) = 73.5       | 75 - 73.5 = +1.5 |\n| 3       | 6           | 80                | 58.5 + 3.75(6) = 81.0       | 80 - 81.0 = -1.0 |\n| 4       | 8           | 90                | 58.5 + 3.75(8) = 88.5       | 90 - 88.5 = +1.5 |\n| 5       | 10          | 95                | 58.5 + 3.75(10) = 96.0      | 95 - 96.0 = -1.0 |\n\n### What to Look for in Residuals:\n1. **Sum of Residuals:** In OLS regression, the sum of residuals should always be **zero** (allowing for tiny rounding errors). $-1.0 + 1.5 - 1.0 + 1.5 - 1.0 = 0$.\n2. **Randomness:** If you plot these residuals, they should look like random \"noise\". If they show a pattern (like a \"U\" shape), your relationship might not be linear, and you should consider a [quadratic regression calculator](/quadratic-regression-calculator/).\n3. **Outliers:** A residual that is much larger than the others (e.g., a residual of 20 in this dataset) would indicate an outlier that might be skewing your results.\n\n---\n\n## Confidence Intervals vs. Prediction Intervals\n\nA common point of confusion in our [regression equation calculator with steps](/) is the difference between these two intervals.\n\n### 1. Confidence Interval (for the Mean)\nThis tells you where the **average** $y$ for a given $x$ likely falls. \n*Example:* \"We are 95% confident that the average score for all students who study 6 hours is between 78 and 84.\"\n\n### 2. Prediction Interval (for an Individual)\nThis tells you where a **single new observation** likely falls. This interval is always wider than the confidence interval because individuals are more unpredictable than averages.\n*Example:* \"We are 95% confident that **John**, who studied 6 hours, will score between 70 and 92.\"\n\n---\n\n## Frequently Asked Questions (FAQ)\n\n### 1. Can the slope be negative?\nYes. A negative slope means that as $x$ increases, $y$ decreases. For example, the more miles you drive a car, the lower its resale value. The math remains the same, but your $b₁$ will be a negative number.\n\n### 2. What if my $x$ and $y$ are swapped?\nThe regression of $y$ on $x$ is **not** the same as the regression of $x$ on $y$. Linear regression minimizes the vertical distances ($y$ errors). If you swap them, you are minimizing horizontal distances, which will result in a different equation unless the correlation is perfect (+1 or -1).\n\n### 3. Why do we square the errors in Least Squares?\n- It makes all errors positive (so they don't cancel out).\n- It penalizes large errors more than small ones (a 10-point error is 100 times worse than a 1-point error).\n- It makes the math (calculus) easier to solve for a single \"best\" answer.\n\n### 4. What is a \"good\" R² value?\nThere is no universal \"good\" value. In a lab experiment, you might want 0.99. In social science, 0.20 might be impressive. 
What matters more is the **context** and whether the model helps you make better decisions than a simple average would.\n\n### 5. Can I use this for non-linear data?\nOnly if you transform it first. Many researchers take the log of $x$ or $y$ to turn a curve into a line. If the curve is a simple parabola, use a [quadratic regression calculator](/quadratic-regression-calculator/).\n\n### 6. What is the \"Standard Error of the Estimate\"?\nIt's roughly the average distance that data points fall from the regression line. If your Standard Error is 5, it means your predictions are usually off by about 5 units.\n\n### 7. How do I know if the slope is \"Significant\"?\nCheck the **p-value**. If the p-value is less than 0.05, we reject the idea that the true slope is zero. This means there is likely a real relationship between $x$ and $y$ that isn't just due to luck.\n\n### 8. What is the difference between Correlation and Regression?\n- **Correlation** ($r$) is a single number that tells you how tightly the points cluster around a line.\n- **Regression** ($y = mx + b$) is the equation that describes that line and allows you to make predictions.\n\n### 9. What is \"Overfitting\"?\nOverfitting is when you try to make your model too perfect for your small dataset, often by adding too many variables. In simple regression, this is less common, but you can still \"overfit\" by ignoring that a single outlier is driving your entire line.\n\n### 10. Is the intercept always meaningful?\nNo. Often the intercept (where $x=0$) is far outside your data range. For example, if you are predicting weight from height, the intercept is the predicted weight of a person with 0 height. This is physically impossible, so the intercept is just a mathematical anchor.\n\n---\n\n## Practical Checklist for Manual Calculation\n\nIf you are performing these calculations for a class or a research project, follow this checklist:\n1. **Scatter Plot:** Always look at the data first. Is it a line?\n2. **Mean Check:** Ensure your means ($x̄, ȳ$) are accurate to at least 2 decimal places.\n3. **Table Method:** Use the table format shown above to avoid losing track of negative signs.\n4. **Equation Check:** Plug your $x̄$ into your final equation. It **must** result in $ȳ$.\n5. **Prediction Check:** Pick a point from your data and see how close your equation gets to the actual $y$.\n\nBy mastering these steps, you gain a superpower: the ability to turn a messy cloud of data into a clear, predictive mathematical law. Use our [regression equation calculator with steps](/) to verify your manual work and explore even deeper insights!",
      "filePath": "src/blog/en/simple-linear-regression-step-by-step.mdx",
      "digest": "38e55719a9265c7b",
      "deferredRender": true,
      "collection": "blog"
    },
    "es": {
      "id": "es/simple-linear-regression-step-by-step",
      "data": {
        "title": "Regresión Lineal Simple: Una Guía Matemática Paso a Paso",
        "description": "Domina la regresión lineal simple paso a paso. Aprende a calcular pendiente, intercepto, R² y correlación a mano con ejemplos reales.",
        "image": "/blog/linear-regression-steps.svg",
        "date": "2025-03-20T00:00:00.000Z",
        "author": "Calculadora de Ecuaciones de Regresión",
        "category": "Estadística"
      },
      "body": "import Callout from '../../../theme/components/Callout.astro';\n\nTodo viaje estadístico comienza con una sola línea. La regresión lineal simple es esa línea — el modelo predictivo más fundamental en la ciencia de datos, y la base sobre la que se construye cada técnica de regresión avanzada.\n\nSi deseas predecir una variable dependiente a partir de una sola variable independiente, nuestra [Calculadora de Regresión Lineal](/) te dará la respuesta en segundos. Sin embargo, comprender *cómo* se obtiene esa respuesta es lo que distingue a un profesional de datos de alguien que simplemente presiona botones.\n\nEsta guía te acompaña por la regresión lineal simple desde los primeros principios. Tomaremos un conjunto de datos pequeño, calcularemos cada valor intermedio a mano y llegaremos juntos a la ecuación final. Para una excelente introducción visual, consulta la [guía de Líneas de Tendencia de Khan Academy](https://www.khanacademy.org/math/statistics-probability/describing-relationships-quantitative-data/introduction-to-trend-lines/v/fitting-a-line-to-data).\n\n<Callout type=\"conceptual\" title=\"Lo Que Aprenderás\">\nAl finalizar este artículo, serás capaz de calcular la ecuación de regresión **y = mx + b** a partir de datos brutos, interpretar los resultados y verificar que tus datos cumplen con los supuestos requeridos.\n</Callout>\n\n---\n\n## ¿Qué Es la Regresión Lineal Simple?\n\nLa regresión lineal simple modela la relación entre **una variable independiente (x)** y **una variable dependiente (y)** ajustando una línea recta a través de los datos.\n\nLa palabra \"simple\" la distingue de la [regresión múltiple](/multiple-regression-calculator/), que utiliza dos o más predictores. La línea ajustada se elige para minimizar la suma de las distancias verticales al cuadrado — un método llamado **mínimos cuadrados ordinarios (OLS)**.\n\n### Cuándo Usarla (y Cuándo Evitarla)\n\n**Úsala cuando:**\n- Tienes un predictor continuo y un resultado continuo.\n- Tu diagrama de dispersión muestra un **patrón aproximadamente lineal**.\n- Deseas cuantificar cuánto cambia y por cada unidad de x.\n\n**Evítala cuando:**\n- El diagrama de dispersión muestra una **curva clara** — prueba nuestra [calculadora de regresión cuadrática](/quadratic-regression-calculator/) en su lugar.\n- Tienes **múltiples predictores** — usa [regresión lineal múltiple](/multiple-regression-calculator/).\n- Tus datos contienen **valores atípicos extremos** que podrían sesgar todo el modelo.\n\n---\n\n## El Conjunto de Datos\n\nSupongamos que una empresa de tutorías registra las horas de estudio (x) y las calificaciones de examen resultantes (y):\n\n| Estudiante | Horas de Estudio (x) | Calificación del Examen (y) |\n|------------|----------------------|-----------------------------|\n| 1          | 2                    | 65                          |\n| 2          | 4                    | 75                          |\n| 3          | 6                    | 80                          |\n| 4          | 8                    | 90                          |\n| 5          | 10                   | 95                          |\n\n---\n\n## Paso 1: Calcular las Medias\n\nEl primer paso es calcular la media aritmética de ambas variables.\n\n**Media de x (x̄):** (2 + 4 + 6 + 8 + 10) / 5 = **6.0**\n**Media de y (ȳ):** (65 + 75 + 80 + 90 + 95) / 5 = **81.0**\n\nLa línea de regresión siempre pasará por el punto (**6.0, 81.0**).\n\n---\n\n## Paso 2: Calcular Desviaciones y Productos\n\nA continuación, calculamos qué tan lejos 
está cada punto de la media y multiplicamos los resultados.\n\n| Estudiante | x − x̄ | y − ȳ | (x − x̄)(y − ȳ) | (x − x̄)² |\n|------------|--------|--------|-----------------|----------|\n| 1          | −4     | −16    | 64              | 16       |\n| 2          | −2     | −6     | 12              | 4        |\n| 3          | 0      | −1     | 0               | 0        |\n| 4          | 2      | 9      | 18              | 4        |\n| 5          | 4      | 14     | 56              | 16       |\n| **Suma**   |        |        | **150**         | **40**   |\n\n---\n\n## Paso 3: Calcular la Pendiente (b₁)\n\nLa pendiente te dice cuánto cambia y por cada aumento de una unidad en x.\n\n**b₁ = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²**\n**b₁ = 150 / 40 = 3.75**\n\n**Interpretación**: Por cada hora adicional de estudio, la calificación predicha del examen aumenta en **3.75 puntos**.\n\n---\n\n## Paso 4: Calcular el Intercepto (b₀)\n\nEl intercepto es el valor predicho de y cuando x = 0.\n\n**b₀ = ȳ − b₁ × x̄**\n**b₀ = 81.0 − 3.75 × 6.0 = 58.5**\n\n**Interpretación**: Un estudiante que estudia cero horas tiene una calificación predicha de **58.5**.\n\n---\n\n## Paso 5: Escribir la Ecuación Final\n\nCombinando ambos:\n**y = 58.5 + 3.75x**\n\nEste modelo te permite hacer predicciones. Por ejemplo, estudiar **7 horas** produce:\n58.5 + 3.75(7) = **84.75**.\n\n<Callout type=\"warning\" title=\"Peligro de Extrapolación\">\nPredecir fuera del rango de tus datos (por ejemplo, estudiar 50 horas) se llama **extrapolación**. A menudo produce resultados sin sentido y debe evitarse.\n</Callout>\n\n---\n\n## Paso 6: Medir el Ajuste (R² y r)\n\n**R²** mide cuánta de la variación en y es explicada por el modelo.\n**r** (correlación de Pearson) mide la fuerza y dirección de la relación lineal.\n\nPara este conjunto de datos, nuestra [calculadora de correlación de Pearson](/pearson-correlation-calculator/) arrojaría un **r de 0.9934**, lo que indica una relación positiva muy fuerte. Obtén más información sobre el [Coeficiente de Correlación de Pearson en Statology](https://www.statology.org/pearson-correlation-coefficient/).\n\n---\n\n## Paso 7: Verificar los Supuestos\n\nAntes de confiar en tus resultados, debes satisfacer los cuatro supuestos de OLS. Nuestro [verificador de supuestos de regresión](/regression-assumptions-checker/) puede ayudarte a automatizar esto:\n\n1. **Linealidad**: La relación sigue un patrón de línea recta.\n2. **Independencia**: Las observaciones no dependen unas de otras.\n3. **Homocedasticidad**: Los residuos (errores) tienen varianza constante.\n4. **Normalidad**: Los residuos están aproximadamente distribuidos de forma normal.\n\n---\n\n## Más Allá de la Regresión Simple\n\nUna vez que domines lo básico, podrías necesitar herramientas más avanzadas:\n\n- **Múltiples Predictores**: Usa [regresión lineal múltiple](/multiple-regression-calculator/) para escenarios complejos.\n- **Patrón Curvo**: Usa nuestra [calculadora de regresión cuadrática](/quadratic-regression-calculator/).\n- **Modelos de Crecimiento**: Explora la [calculadora de regresión exponencial](/exponential-regression-calculator/).\n\n---\n\n## Conclusiones Clave\n\n1. **La Pendiente** representa la tasa de cambio.\n2. **El Intercepto** proporciona el valor base cuando x=0.\n3. **R²** define el poder explicativo del modelo.\n4. **La Extrapolación** es riesgosa — mantente dentro del rango de tus datos.\n5. 
**Correlación no implica causalidad** — la estadística muestra asociación, no necesariamente causa y efecto.\n\n¿Listo para probar tus propios datos? Visita nuestra [calculadora de regresión gratuita](/) ¡y comienza hoy mismo!",
      "filePath": "src/blog/es/simple-linear-regression-step-by-step.mdx",
      "digest": "dd7440c7f6ce96b4",
      "deferredRender": true,
      "collection": "blog"
    },
    "fr": {
      "id": "fr/simple-linear-regression-step-by-step",
      "data": {
        "title": "Régression Linéaire Simple : Un Guide Mathématique Étape par Étape",
        "description": "Maîtrisez la régression linéaire simple étape par étape. Apprenez à calculer pente, ordonnée, R² et corrélation à la main avec des exemples concrets.",
        "image": "/blog/linear-regression-steps.svg",
        "date": "2025-03-20T00:00:00.000Z",
        "author": "Calculateur d'Équations de Régression",
        "category": "Statistiques"
      },
      "body": "import Callout from '../../../theme/components/Callout.astro';\n\nTout voyage statistique commence par une seule ligne. La régression linéaire simple est cette ligne — le modèle prédictif le plus fondamental en science des données, et la base sur laquelle repose chaque technique de régression avancée.\n\nSi vous souhaitez prédire une variable dépendante à partir d'une seule variable indépendante, notre [Calculateur de Régression Linéaire](/) vous donnera la réponse en quelques secondes. Cependant, comprendre *comment* cette réponse est obtenue est ce qui distingue un praticien des données de quelqu'un qui appuie simplement sur des boutons.\n\nCe guide vous accompagne dans la régression linéaire simple depuis les premiers principes. Nous prendrons un petit jeu de données, calculerons chaque valeur intermédiaire à la main et parviendrons ensemble à l'équation finale. Pour une excellente introduction visuelle, consultez le [guide des Lignes de Tendance de Khan Academy](https://www.khanacademy.org/math/statistics-probability/describing-relationships-quantitative-data/introduction-to-trend-lines/v/fitting-a-line-to-data).\n\n<Callout type=\"conceptual\" title=\"Ce Que Vous Apprendrez\">\nÀ la fin de cet article, vous serez capable de calculer l'équation de régression **y = mx + b** à partir de données brutes, d'interpréter les résultats et de vérifier que vos données satisfont les hypothèses requises.\n</Callout>\n\n---\n\n## Qu'Est-ce Que la Régression Linéaire Simple ?\n\nLa régression linéaire simple modélise la relation entre **une variable indépendante (x)** et **une variable dépendante (y)** en ajustant une droite à travers les données.\n\nLe mot « simple » la distingue de la [régression multiple](/multiple-regression-calculator/), qui utilise deux prédicteurs ou plus. 
La droite ajustée est choisie pour minimiser la somme des distances verticales au carré — une méthode appelée **moindres carrés ordinaires (OLS)**.\n\n### Quand L'utiliser (et Quand L'éviter)\n\n**Utilisez-la lorsque :**\n- Vous avez un prédicteur continu et un résultat continu.\n- Votre nuage de points montre un **motif approximativement linéaire**.\n- Vous souhaitez quantifier de combien y change par unité de x.\n\n**Évitez-la lorsque :**\n- Le nuage de points montre une **courbe évidente** — essayez notre [calculateur de régression quadratique](/quadratic-regression-calculator/) à la place.\n- Vous avez **plusieurs prédicteurs** — utilisez la [régression linéaire multiple](/multiple-regression-calculator/).\n- Vos données contiennent des **valeurs aberrantes extrêmes** qui pourraient fausser l'ensemble du modèle.\n\n---\n\n## Le Jeu de Données\n\nSupposons qu'une entreprise de tutorat enregistre les heures d'étude (x) et les scores de test résultants (y) :\n\n| Étudiant | Heures d'Étude (x) | Score au Test (y) |\n|----------|---------------------|-------------------|\n| 1        | 2                   | 65                |\n| 2        | 4                   | 75                |\n| 3        | 6                   | 80                |\n| 4        | 8                   | 90                |\n| 5        | 10                  | 95                |\n\n---\n\n## Étape 1 : Calculer les Moyennes\n\nLa première étape consiste à calculer la moyenne arithmétique des deux variables.\n\n**Moyenne de x (x̄) :** (2 + 4 + 6 + 8 + 10) / 5 = **6.0**\n**Moyenne de y (ȳ) :** (65 + 75 + 80 + 90 + 95) / 5 = **81.0**\n\nLa droite de régression passera toujours par le point (**6.0, 81.0**).\n\n---\n\n## Étape 2 : Calculer les Écarts et les Produits\n\nEnsuite, nous calculons la distance de chaque point par rapport à la moyenne et multiplions les résultats.\n\n| Étudiant | x − x̄ | y − ȳ | (x − x̄)(y − ȳ) | (x − x̄)² |\n|----------|--------|--------|-----------------|----------|\n| 1        | −4     | −16    | 64              | 16       |\n| 2        | −2     | −6     | 12              | 4        |\n| 3        | 0      | −1     | 0               | 0        |\n| 4        | 2      | 9      | 18              | 4        |\n| 5        | 4      | 14     | 56              | 16       |\n| **Somme** |       |        | **150**         | **40**   |\n\n---\n\n## Étape 3 : Calculer la Pente (b₁)\n\nLa pente indique de combien y change pour chaque augmentation d'une unité de x.\n\n**b₁ = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²**\n**b₁ = 150 / 40 = 3.75**\n\n**Interprétation** : Pour chaque heure supplémentaire d'étude, le score prédit au test augmente de **3.75 points**.\n\n---\n\n## Étape 4 : Calculer l'Ordonnée à l'Origine (b₀)\n\nL'ordonnée à l'origine est la valeur prédite de y lorsque x = 0.\n\n**b₀ = ȳ − b₁ × x̄**\n**b₀ = 81.0 − 3.75 × 6.0 = 58.5**\n\n**Interprétation** : Un étudiant qui étudie zéro heure obtiendrait un score prédit de **58.5**.\n\n---\n\n## Étape 5 : Écrire l'Équation Finale\n\nEn combinant les deux :\n**y = 58.5 + 3.75x**\n\nCe modèle vous permet de faire des prédictions. Par exemple, étudier **7 heures** donne :\n58.5 + 3.75(7) = **84.75**.\n\n<Callout type=\"warning\" title=\"Danger d'Extrapolation\">\nPrédire en dehors de la plage de vos données (par exemple, étudier 50 heures) s'appelle **l'extrapolation**. 
Cela donne souvent des résultats absurdes et doit être évité.\n</Callout>\n\n---\n\n## Étape 6 : Mesurer l'Ajustement (R² et r)\n\n**R²** mesure quelle part de la variation de y est expliquée par le modèle.\n**r** (corrélation de Pearson) mesure la force et la direction de la relation linéaire.\n\nPour ce jeu de données, notre [calculateur de corrélation de Pearson](/pearson-correlation-calculator/) donnerait un **r de 0.9934**, ce qui indique une relation positive très forte. En savoir plus sur le [Coefficient de Corrélation de Pearson sur Statology](https://www.statology.org/pearson-correlation-coefficient/).\n\n---\n\n## Étape 7 : Vérifier les Hypothèses\n\nAvant de faire confiance à vos résultats, vous devez satisfaire les quatre hypothèses OLS. Notre [vérificateur d'hypothèses de régression](/regression-assumptions-checker/) peut vous aider à automatiser cela :\n\n1. **Linéarité** : La relation suit un motif en ligne droite.\n2. **Indépendance** : Les observations ne dépendent pas les unes des autres.\n3. **Homoscédasticité** : Les résidus (erreurs) ont une variance constante.\n4. **Normalité** : Les résidus sont approximativement distribués normalement.\n\n---\n\n## Au-delà de la Régression Simple\n\nUne fois les bases maîtrisées, vous pourriez avoir besoin d'outils plus avancés :\n\n- **Plusieurs Prédicteurs** : Utilisez la [régression linéaire multiple](/multiple-regression-calculator/) pour des scénarios complexes.\n- **Motifs Courbés** : Utilisez notre [calculateur de régression quadratique](/quadratic-regression-calculator/).\n- **Modèles de Croissance** : Explorez le [calculateur de régression exponentielle](/exponential-regression-calculator/).\n\n---\n\n## Points Clés à Retenir\n\n1. **La Pente** représente le taux de changement.\n2. **L'Ordonnée à l'Origine** fournit la valeur de référence à x=0.\n3. **R²** définit le pouvoir explicatif du modèle.\n4. **L'Extrapolation** est risquée — restez dans la plage de vos données.\n5. **Corrélation n'est pas causalité** — les statistiques montrent une association, pas nécessairement une cause à effet.\n\nPrêt à tester vos propres données ? Rendez-vous sur notre [calculateur de régression gratuit](/) et commencez dès aujourd'hui !",
      "filePath": "src/blog/fr/simple-linear-regression-step-by-step.mdx",
      "digest": "f804bdd9ae290485",
      "deferredRender": true,
      "collection": "blog"
    },
    "hi": {
      "id": "hi/simple-linear-regression-step-by-step",
      "data": {
        "title": "साधारण रैखिक प्रतिगमन: एक चरण-दर-चरण गणितीय वॉकथ्रू",
        "description": "साधारण रैखिक प्रतिगमन चरण-दर-चरण सीखें। ढाल, अवरोध, R² और सहसंबंध हाथ से गणना करें वास्तविक उदाहरणों के साथ।",
        "image": "/blog/linear-regression-steps.svg",
        "date": "2025-03-20T00:00:00.000Z",
        "author": "प्रतिगमन समीकरण कैलकुलेटर",
        "category": "सांख्यिकी"
      },
      "body": "import Callout from 'theme/components/Callout.astro';\n\nप्रत्येक सांख्यिकीय यात्रा एक एकल रेखा से शुरू होती है। साधारण रैखिक प्रतिगमन वह रेखा है — डेटा विज्ञान में सबसे मौलिक भविष्य कहने वाला मॉडल, और वह आधार जिस पर प्रत्येक उन्नत प्रतिगमन तकनीक बनाई गई है।\n\nयदि आप एकल स्वतंत्र चर से आश्रित चर की भविष्यवाणी करना चाहते हैं, तो हमारा [रैखिक प्रतिगमन कैलकुलेटर](/) आपको सेकंडों में उत्तर दे देगा। हालाँकि, यह समझना कि वह उत्तर *कैसे* प्राप्त किया गया है, एक डेटा अभ्यासी को उस व्यक्ति से अलग करता है जो केवल बटन दबाता है।\n\nयह मार्गदर्शिका आपको प्रथम सिद्धांतों से साधारण रैखिक प्रतिगमन के माध्यम से ले जाती है। हम एक छोटा डेटासेट लेंगे, प्रत्येक मध्यवर्ती मान की गणना हाथ से करेंगे, और एक साथ अंतिम समीकरण तक पहुँचेंगे। एक महान दृश्य परिचय के लिए, [खान अकादमी की ट्रेंड लाइन्स की मार्गदर्शिका](https://www.khanacademy.org/math/statistics-probability/describing-relationships-quantitative-data/introduction-to-trend-lines/v/fitting-a-line-to-data) देखें।\n\n<Callout type=\"conceptual\" title=\"आप क्या सीखेंगे\">\nइस लेख के अंत तक, आप कच्चे डेटा से प्रतिगमन समीकरण **y = mx + b** की गणना करने, परिणामों की व्याख्या करने और यह सत्यापित करने में सक्षम होंगे कि आपका डेटा आवश्यक मान्यताओं को पूरा करता है।\n</Callout>\n\n---\n\n## साधारण रैखिक प्रतिगमन क्या है?\n\nसाधारण रैखिक प्रतिगमन डेटा के माध्यम से एक सीधी रेखा फिट करके **एक स्वतंत्र चर (x)** और **एक आश्रित चर (y)** के बीच संबंध को मॉडल करता है।\n\n\"साधारण\" शब्द इसे [बहुचर प्रतिगमन](/multiple-regression-calculator/) से अलग करता है, जो दो या अधिक भविष्यवक्ताओं का उपयोग करता है। फिट की गई रेखा को वर्ग ऊर्ध्वाधर दूरियों के योग को न्यूनतम करने के लिए चुना जाता है — एक विधि जिसे **साधारण न्यूनतम वर्ग (OLS)** कहा जाता है।\n\n### कब उपयोग करें (और कब बचें)\n\n**इसका उपयोग तब करें जब:**\n- आपके पास एक सतत भविष्यवक्ता और एक सतत परिणाम हो।\n- आपका स्कैटर प्लॉट एक **लगभग रैखिक पैटर्न** दिखाता है।\n- आप यह मात्रा निर्धारित करना चाहते हैं कि प्रति इकाई x पर y कितना बदलता है।\n\n**इससे तब बचें जब:**\n- स्कैटर प्लॉट एक **स्पष्ट वक्र** दिखाता है — इसके बजाय हमारे [द्विघात प्रतिगमन कैलकुलेटर](/quadratic-regression-calculator/) को आज़माएं।\n- आपके पास **कई भविष्यवक्ता** हैं — [बहुचर रैखिक प्रतिगमन](/multiple-regression-calculator/) का उपयोग करें।\n- आपके डेटा में **चरम बाह्य बिंदु** हैं जो पूरे मॉडल को विकृत कर सकते हैं।\n\n---\n\n## डेटासेट\n\nमान लीजिए कि एक ट्यूशन कंपनी अध्ययन के घंटों (x) और परिणामी परीक्षण अंकों (y) को ट्रैक करती है:\n\n| छात्र | अध्ययन के घंटे (x) | परीक्षण अंक (y) |\n|-------|-------------------|----------------|\n| 1     | 2                 | 65             |\n| 2     | 4                 | 75             |\n| 3     | 6                 | 80             |\n| 4     | 8                 | 90             |\n| 5     | 10                | 95             |\n\n---\n\n## चरण 1: माध्यों की गणना करें\n\nपहला चरण दोनों चरों के अंकगणितीय माध्य की गणना करना है।\n\n**x का माध्य (x̄):** (2 + 4 + 6 + 8 + 10) / 5 = **6.0**\n**y का माध्य (ȳ):** (65 + 75 + 80 + 90 + 95) / 5 = **81.0**\n\nप्रतिगमन रेखा हमेशा बिंदु (**6.0, 81.0**) से गुजरेगी।\n\n---\n\n## चरण 2: विचलन और उत्पादों की गणना करें\n\nइसके बाद, हम गणना करते हैं कि प्रत्येक बिंदु माध्य से कितना दूर है और परिणामों को गुणा करते हैं।\n\n| छात्र | x − x̄ | y − ȳ | (x − x̄)(y − ȳ) | (x − x̄)² |\n|-------|--------|--------|-----------------|----------|\n| 1     | −4     | −16    | 64              | 16       |\n| 2     | −2     | −6     | 12              | 4        |\n| 3     | 0      | −1     | 0               | 0        |\n| 4     | 2      | 9      | 18              | 4        |\n| 5     | 4      | 
\n\n---\n\n## चरण 3: ढाल (b₁) की गणना करें\n\nढाल आपको बताती है कि x में प्रत्येक एक-इकाई वृद्धि के लिए y कितना बदलता है।\n\n**b₁ = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²**\n**b₁ = 150 / 40 = 3.75**\n\n**व्याख्या**: अध्ययन के प्रत्येक अतिरिक्त घंटे के लिए, अनुमानित परीक्षण स्कोर **3.75 अंक** बढ़ जाता है।\n\n---\n\n## चरण 4: अवरोध (b₀) की गणना करें\n\nअवरोध अनुमानित y है जब x = 0 हो।\n\n**b₀ = ȳ − b₁ × x̄**\n**b₀ = 81.0 − 3.75 × 6.0 = 58.5**\n\n**व्याख्या**: शून्य घंटे अध्ययन करने वाले छात्र का स्कोर **58.5** होने का अनुमान है।\n\n---\n\n## चरण 5: अंतिम समीकरण लिखें\n\nदोनों को मिलाकर:\n**y = 58.5 + 3.75x**\n\nयह मॉडल आपको भविष्यवाणियाँ करने देता है। उदाहरण के लिए, **7 घंटे** अध्ययन करने पर परिणाम मिलता है:\n58.5 + 3.75(7) = **84.75**।\n\n<Callout type=\"warning\" title=\"बहिर्वेशन का खतरा\">\nअपने डेटा की सीमा के बाहर भविष्यवाणी करना (जैसे, 50 घंटे अध्ययन करना) **बहिर्वेशन (extrapolation)** कहलाता है। यह अक्सर निरर्थक परिणाम देता है और इससे बचना चाहिए।\n</Callout>\n\n---\n\n## चरण 6: फिट को मापें (R² और r)\n\n**R²** मापता है कि मॉडल द्वारा y में कितनी भिन्नता स्पष्ट की गई है।\n**r** (पियर्सन सहसंबंध) रैखिक संबंध की प्रबलता और दिशा को मापता है।\n\nइस डेटासेट के लिए, हमारा [पियर्सन सहसंबंध कैलकुलेटर](/pearson-correlation-calculator/) **0.9934 का r** देगा, जो एक बहुत मजबूत सकारात्मक संबंध इंगित करता है। [Statology पर पियर्सन सहसंबंध गुणांक](https://www.statology.org/pearson-correlation-coefficient/) के बारे में और जानें।\n\n---\n\n## चरण 7: मान्यताओं को सत्यापित करें\n\nअपने परिणामों पर भरोसा करने से पहले, आपको चार OLS मान्यताओं को पूरा करना होगा। हमारा [प्रतिगमन मान्यता परीक्षक](/regression-assumptions-checker/) इसे स्वचालित करने में आपकी सहायता कर सकता है:\n\n1. **रैखिकता**: संबंध एक सीधी-रेखा पैटर्न का अनुसरण करता है।\n2. **स्वतंत्रता**: अवलोकन एक-दूसरे पर निर्भर नहीं हैं।\n3. **समप्रसरणता**: अवशिष्टों (त्रुटियों) में नियत विचरण होता है।\n4. **सामान्यता**: अवशिष्ट लगभग सामान्य रूप से वितरित होते हैं।\n\n---\n\n## साधारण प्रतिगमन से आगे\n\nएक बार जब आप मूल बातें सीख लेते हैं, तो आपको अधिक उन्नत टूल की आवश्यकता हो सकती है:\n\n- **बहु भविष्यवक्ता**: जटिल परिदृश्यों के लिए [बहुचर रैखिक प्रतिगमन](/multiple-regression-calculator/) का उपयोग करें।\n- **वक्र पैटर्न**: हमारे [द्विघात प्रतिगमन कैलकुलेटर](/quadratic-regression-calculator/) का उपयोग करें।\n- **विकास मॉडल**: [घातांकीय प्रतिगमन कैलकुलेटर](/exponential-regression-calculator/) का अन्वेषण करें।\n\n---\n\n## मुख्य निष्कर्ष\n\n1. **ढाल** परिवर्तन की दर का प्रतिनिधित्व करती है।\n2. **अवरोध** x=0 पर आधारभूत मान प्रदान करता है।\n3. **R²** मॉडल की व्याख्यात्मक शक्ति को परिभाषित करता है।\n4. **बहिर्वेशन** जोखिम भरा है — अपने डेटा की सीमा के भीतर रहें।\n5. **सहसंबंध कार्य-कारण नहीं है** — सांख्यिकी साहचर्य दिखाती है, जरूरी नहीं कि कारण-और-प्रभाव।\n\nअपना स्वयं का डेटा परीक्षण करने के लिए तैयार हैं? हमारे [मुफ्त प्रतिगमन कैलकुलेटर](/) पर जाएं और आज ही शुरू करें!",
      "filePath": "src/blog/hi/simple-linear-regression-step-by-step.mdx",
      "digest": "fb8edca47f621ecf",
      "deferredRender": true,
      "collection": "blog"
    },
    "ko": {
      "id": "ko/simple-linear-regression-step-by-step",
      "data": {
        "title": "단순 선형 회귀: 단계별 수학적 해설 가이드",
        "description": "단순 선형 회귀를 단계별로 마스터하세요. 실제 예제로 기울기, 절편, R², 상관계수(r)를 직접 계산하는 방법을 배우세요.",
        "image": "/blog/linear-regression-steps.svg",
        "date": "2025-03-20T00:00:00.000Z",
        "author": "회귀 방정식 계산기",
        "category": "통계학"
      },
      "body": "import Callout from '../../../theme/components/Callout.astro';\n\n모든 통계적 여정은 하나의 선에서 시작됩니다. 단순 선형 회귀가 바로 그 선입니다 — 데이터 과학에서 가장 기본적인 예측 모델이자, 모든 고급 회귀 기법의 기반이 되는 것입니다.\n\n하나의 독립 변수로 종속 변수를 예측하고 싶다면, 우리의 [선형 회귀 계산기](/)가 몇 초 만에 답을 제공할 것입니다. 하지만 그 답이 *어떻게* 도출되는지 이해하는 것이 단순히 버튼만 누르는 사람과 데이터 실무자를 구분 짓습니다.\n\n이 가이드는 최초 원리부터 단순 선형 회귀를 안내합니다. 작은 데이터셋을 사용하여 모든 중간 값을 직접 계산하고, 최종 방정식을 함께 도출할 것입니다. 시각적 소개를 원하신다면 [칸아카데미의 추세선 가이드](https://www.khanacademy.org/math/statistics-probability/describing-relationships-quantitative-data/introduction-to-trend-lines/v/fitting-a-line-to-data)를 확인해 보세요.\n\n<Callout type=\"conceptual\" title=\"배울 내용\">\n이 글을 마치면 원시 데이터에서 회귀 방정식 **y = mx + b**를 계산하고, 결과를 해석하며, 데이터가 필요한 가정을 충족하는지 확인할 수 있게 됩니다.\n</Callout>\n\n---\n\n## 단순 선형 회귀란 무엇인가?\n\n단순 선형 회귀는 데이터에 직선을 맞추어 **하나의 독립 변수(x)**와 **하나의 종속 변수(y)** 사이의 관계를 모델링합니다.\n\n\"단순\"이라는 단어는 두 개 이상의 예측 변수를 사용하는 [다중 회귀](/multiple-regression-calculator/)와 구별하기 위한 것입니다. 적합된 선은 제곱된 수직 거리의 합을 최소화하도록 선택됩니다 — 이를 **최소제곱법(OLS)**이라고 합니다.\n\n### 사용해야 할 때 (피해야 할 때)\n\n**사용해야 할 때:**\n- 하나의 연속형 예측 변수와 하나의 연속형 결과 변수가 있을 때\n- 산점도가 **대략 선형 패턴**을 보일 때\n- x 단위당 y가 얼마나 변하는지 정량화하고 싶을 때\n\n**피해야 할 때:**\n- 산점도가 **명확한 곡선**을 보일 때 — 대신 우리의 [이차 회귀 계산기](/quadratic-regression-calculator/)를 사용해 보세요.\n- **여러 예측 변수**가 있을 때 — [다중 선형 회귀](/multiple-regression-calculator/)를 사용하세요.\n- 데이터에 전체 모델을 왜곡할 수 있는 **극단적 이상치**가 있을 때\n\n---\n\n## 데이터셋\n\n어느 과외 회사가 학습 시간(x)과 그에 따른 시험 점수(y)를 기록했다고 가정해 봅시다:\n\n| 학생 | 학습 시간 (x) | 시험 점수 (y) |\n|------|---------------|---------------|\n| 1    | 2             | 65            |\n| 2    | 4             | 75            |\n| 3    | 6             | 80            |\n| 4    | 8             | 90            |\n| 5    | 10            | 95            |\n\n---\n\n## 1단계: 평균 계산하기\n\n첫 번째 단계는 두 변수의 산술 평균을 계산하는 것입니다.\n\n**x의 평균 (x̄):** (2 + 4 + 6 + 8 + 10) / 5 = **6.0**\n**y의 평균 (ȳ):** (65 + 75 + 80 + 90 + 95) / 5 = **81.0**\n\n회귀선은 항상 점 (**6.0, 81.0**)을 지납니다.\n\n---\n\n## 2단계: 편차와 곱 계산하기\n\n다음으로 각 점이 평균에서 얼마나 떨어져 있는지 계산하고 결과를 곱합니다.\n\n| 학생 | x − x̄ | y − ȳ | (x − x̄)(y − ȳ) | (x − x̄)² |\n|------|--------|--------|-----------------|----------|\n| 1    | −4     | −16    | 64              | 16       |\n| 2    | −2     | −6     | 12              | 4        |\n| 3    | 0      | −1     | 0               | 0        |\n| 4    | 2      | 9      | 18              | 4        |\n| 5    | 4      | 14     | 56              | 16       |\n| **합계** | | | **150** | **40** |\n\n---\n\n## 3단계: 기울기 계산하기 (b₁)\n\n기울기는 x가 1단위 증가할 때 y가 얼마나 변하는지 알려줍니다.\n\n**b₁ = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²**\n**b₁ = 150 / 40 = 3.75**\n\n**해석**: 학습 시간이 1시간 추가될 때마다 예측 시험 점수가 **3.75점** 증가합니다.\n\n---\n\n## 4단계: 절편 계산하기 (b₀)\n\n절편은 x = 0일 때 예측되는 y값입니다.\n\n**b₀ = ȳ − b₁ × x̄**\n**b₀ = 81.0 − 3.75 × 6.0 = 58.5**\n\n**해석**: 학습 시간이 0시간인 학생은 **58.5점**을 받을 것으로 예측됩니다.\n\n---\n\n## 5단계: 최종 방정식 작성하기\n\n둘을 결합하면:\n**y = 58.5 + 3.75x**\n\n이 모델로 예측을 할 수 있습니다. 예를 들어, **7시간**을 학습하면:\n58.5 + 3.75(7) = **84.75**.\n\n<Callout type=\"warning\" title=\"외삽의 위험\">\n데이터 범위를 벗어나 예측하는 것(예: 50시간 학습)을 **외삽**이라고 합니다. 이는 종종 무의미한 결과를 내며 피해야 합니다.\n</Callout>\n\n---\n\n## 6단계: 적합도 측정하기 (R²와 r)\n\n**R²**는 y의 변동 중 모델이 설명하는 비율을 측정합니다.\n**r**(피어슨 상관계수)는 선형 관계의 강도와 방향을 측정합니다.\n\n이 데이터셋에 대해 우리의 [피어슨 상관계수 계산기](/pearson-correlation-calculator/)는 **r이 0.9934**임을 보여주며, 이는 매우 강한 양의 관계를 나타냅니다. [Statology의 피어슨 상관계수](https://www.statology.org/pearson-correlation-coefficient/)에 대해 더 알아보세요.\n\n---\n\n## 7단계: 가정 확인하기\n\n결과를 신뢰하기 전에 네 가지 OLS 가정을 충족해야 합니다. 
우리의 [회귀 가정 검사기](/regression-assumptions-checker/)가 이를 자동화하는 데 도움을 줄 수 있습니다:\n\n1. **선형성**: 관계가 직선 패턴을 따릅니다.\n2. **독립성**: 관측치가 서로 종속적이지 않습니다.\n3. **등분산성**: 잔차(오차)가 일정한 분산을 가집니다.\n4. **정규성**: 잔차가 대략 정규 분포를 따릅니다.\n\n---\n\n## 단순 회귀 그 이상\n\n기본기를 마스터한 후에는 더 고급 도구가 필요할 수 있습니다:\n\n- **여러 예측 변수**: 복잡한 시나리오에 [다중 선형 회귀](/multiple-regression-calculator/)를 사용하세요.\n- **곡선 패턴**: 우리의 [이차 회귀 계산기](/quadratic-regression-calculator/)를 사용하세요.\n- **성장 모델**: [지수 회귀 계산기](/exponential-regression-calculator/)를 탐색해 보세요.\n\n---\n\n## 핵심 요약\n\n1. **기울기**는 변화율을 나타냅니다.\n2. **절편**은 x=0에서의 기준값을 제공합니다.\n3. **R²**는 모델의 설명력을 정의합니다.\n4. **외삽**은 위험합니다 — 데이터 범위 내에 머무르세요.\n5. **상관관계는 인과관계가 아닙니다** — 통계는 연관성을 보여줄 뿐, 반드시 인과관계를 의미하지는 않습니다.\n\n직접 데이터를 테스트할 준비가 되셨나요? 우리의 [무료 회귀 계산기](/)로 오늘 바로 시작해 보세요!",
      "filePath": "src/blog/ko/simple-linear-regression-step-by-step.mdx",
      "digest": "88a737d9b61eb2ba",
      "deferredRender": true,
      "collection": "blog"
    },
    "ms": {
      "id": "ms/simple-linear-regression-step-by-step",
      "data": {
        "title": "Regresi Linear Ringkas: Panduan Matematik Langkah Demi Langkah",
        "description": "Kuasai regresi linear ringkas langkah demi langkah. Belajar mengira kecerunan, pintasan, R² dan korelasi (r) secara manual dengan contoh sebenar.",
        "image": "/blog/linear-regression-steps.svg",
        "date": "2025-03-20T00:00:00.000Z",
        "author": "Kalkulator Persamaan Regresi",
        "category": "Statistik"
      },
      "body": "import Callout from '../../../theme/components/Callout.astro';\n\nSetiap perjalanan statistik bermula dengan satu garis. Regresi linear ringkas ialah garis tersebut — model ramalan paling asas dalam sains data, dan asas yang membina setiap teknik regresi lanjutan.\n\nJika anda ingin meramal pembolehubah bersandar daripada satu pembolehubah bebas, [Kalkulator Regresi Linear](/) kami akan memberikan jawapan dalam beberapa saat. Walau bagaimanapun, memahami *bagaimana* jawapan itu diperoleh membezakan pengamal data daripada seseorang yang hanya menekan butang.\n\nPanduan ini membawa anda melalui regresi linear ringkas dari prinsip asas. Kami akan mengambil set data kecil, mengira setiap nilai perantara secara manual, dan tiba pada persamaan akhir bersama-sama. Untuk pengenalan visual yang baik, layari [panduan Garis Trend Khan Academy](https://www.khanacademy.org/math/statistics-probability/describing-relationships-quantitative-data/introduction-to-trend-lines/v/fitting-a-line-to-data).\n\n<Callout type=\"conceptual\" title=\"Apa yang Akan Anda Pelajari\">\nMenjelang akhir artikel ini, anda akan dapat mengira persamaan regresi **y = mx + b** daripada data mentah, mentafsir keputusan, dan mengesahkan bahawa data anda memenuhi anggapan yang diperlukan.\n</Callout>\n\n---\n\n## Apa Itu Regresi Linear Ringkas?\n\nRegresi linear ringkas memodelkan hubungan antara **satu pembolehubah bebas (x)** dan **satu pembolehubah bersandar (y)** dengan memasang garis lurus melalui data.\n\nPerkataan \"ringkas\" membezakannya daripada [regresi berganda](/multiple-regression-calculator/), yang menggunakan dua atau lebih peramal. Garis yang dipasang dipilih untuk meminimumkan jumlah jarak menegak kuasa dua — kaedah yang dipanggil **kuasa dua terkecil biasa (OLS)**.\n\n### Bila Menggunakannya (dan Bila Mengelaknya)\n\n**Gunakannya apabila:**\n- Anda mempunyai satu peramal berterusan dan satu hasil berterusan.\n- Plot serakan menunjukkan **corak lebih kurang linear**.\n- Anda ingin mengukur berapa banyak y berubah per unit x.\n\n**Elakkan apabila:**\n- Plot serakan menunjukkan **lengkungan jelas** — cuba [kalkulator regresi kuadratik](/quadratic-regression-calculator/) kami sebaliknya.\n- Anda mempunyai **berbilang peramal** — gunakan [regresi linear berganda](/multiple-regression-calculator/).\n- Data anda mengandungi **pencilan ekstrem** yang boleh memesongkan seluruh model.\n\n---\n\n## Set Data\n\nKatakan sebuah syarikat tuisyen mengesan jam belajar (x) dan skor ujian yang dihasilkan (y):\n\n| Pelajar | Jam Belajar (x) | Skor Ujian (y) |\n|---------|------------------|-----------------|\n| 1       | 2                | 65              |\n| 2       | 4                | 75              |\n| 3       | 6                | 80              |\n| 4       | 8                | 90              |\n| 5       | 10               | 95              |\n\n---\n\n## Langkah 1: Kira Min\n\nLangkah pertama ialah mengira min aritmetik kedua-dua pembolehubah.\n\n**Min x (x̄):** (2 + 4 + 6 + 8 + 10) / 5 = **6.0**\n**Min y (ȳ):** (65 + 75 + 80 + 90 + 95) / 5 = **81.0**\n\nGaris regresi akan sentiasa melalui titik (**6.0, 81.0**).\n\n---\n\n## Langkah 2: Kirakan Sisihan dan Hasil Darab\n\nSeterusnya, kami mengira sejauh mana setiap titik daripada min dan mendarab keputusan tersebut.\n\n| Pelajar | x − x̄ | y − ȳ | (x − x̄)(y − ȳ) | (x − x̄)² |\n|---------|--------|--------|-----------------|----------|\n| 1       | −4     | −16    | 64              | 16       |\n| 2       | −2     | −6     | 12              | 4        |\n| 
3       | 0      | −1     | 0               | 0        |\n| 4       | 2      | 9      | 18              | 4        |\n| 5       | 4      | 14     | 56              | 16       |\n| **Jumlah** | | | **150** | **40** |\n\n---\n\n## Langkah 3: Kira Kecerunan (b₁)\n\nKecerunan memberitahu anda berapa banyak y berubah bagi setiap peningkatan satu unit dalam x.\n\n**b₁ = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²**\n**b₁ = 150 / 40 = 3.75**\n\n**Tafsiran**: Bagi setiap jam belajar tambahan, skor ujian ramalan meningkat sebanyak **3.75 mata**.\n\n---\n\n## Langkah 4: Kira Pintasan (b₀)\n\nPintasan ialah y yang diramal apabila x = 0.\n\n**b₀ = ȳ − b₁ × x̄**\n**b₀ = 81.0 − 3.75 × 6.0 = 58.5**\n\n**Tafsiran**: Pelajar yang belajar sifar jam diramal mendapat skor **58.5**.\n\n---\n\n## Langkah 5: Tulis Persamaan Akhir\n\nMenggabungkan kedua-duanya:\n**y = 58.5 + 3.75x**\n\nModel ini membolehkan anda membuat ramalan. Contohnya, belajar selama **7 jam** menghasilkan:\n58.5 + 3.75(7) = **84.75**.\n\n<Callout type=\"warning\" title=\"Bahaya Ekstrapolasi\">\nMeramal di luar julat data anda (contohnya, belajar 50 jam) dipanggil **ekstrapolasi**. Ia sering menghasilkan keputusan yang tidak masuk akal dan harus dielakkan.\n</Callout>\n\n---\n\n## Langkah 6: Ukur Kesesuaian (R² dan r)\n\n**R²** mengukur berapa banyak variasi dalam y yang diterangkan oleh model.\n**r** (korelasi Pearson) mengukur kekuatan dan arah hubungan linear.\n\nUntuk set data ini, [kalkulator korelasi Pearson](/pearson-correlation-calculator/) kami akan menghasilkan **r sebanyak 0.9934**, menunjukkan hubungan positif yang sangat kuat. Ketahui lebih lanjut tentang [Pekali Korelasi Pearson di Statology](https://www.statology.org/pearson-correlation-coefficient/).\n\n---\n\n## Langkah 7: Sahkan Anggapan\n\nSebelum mempercayai keputusan anda, anda mesti memenuhi empat anggapan OLS. [Pemeriksa anggapan regresi](/regression-assumptions-checker/) kami boleh membantu mengautomasikan ini:\n\n1. **Kelinearan**: Hubungan mengikut corak garis lurus.\n2. **Kebebasan**: Pemerhatian tidak bergantung antara satu sama lain.\n3. **Homoskedastisiti**: Residual (ralat) mempunyai varians malar.\n4. **Kenormalan**: Residual adalah lebih kurang bertaburan normal.\n\n---\n\n## Melangkah Melebihi Regresi Ringkas\n\nSebaik sahaja anda menguasai asas, anda mungkin memerlukan alat yang lebih lanjut:\n\n- **Berbilang Peramal**: Gunakan [regresi linear berganda](/multiple-regression-calculator/) untuk senario kompleks.\n- **Corak Melengkung**: Gunakan [kalkulator regresi kuadratik](/quadratic-regression-calculator/) kami.\n- **Model Pertumbuhan**: Terokai [kalkulator regresi eksponen](/exponential-regression-calculator/).\n\n---\n\n## Ringkasan Utama\n\n1. **Kecerunan** mewakili kadar perubahan.\n2. **Pintasan** menyediakan nilai asas pada x=0.\n3. **R²** mentakrifkan kuasa penjelasan model.\n4. **Ekstrapolasi** berisiko — kekal dalam julat data anda.\n5. **Korelasi bukan kausaliti** — statistik menunjukkan perkaitan, bukan semestinya sebab-akibat.\n\nSedia menguji data anda sendiri? Pergi ke [kalkulator regresi percuma](/) kami dan mulakan hari ini!",
      "filePath": "src/blog/ms/simple-linear-regression-step-by-step.mdx",
      "digest": "82fa5287245baab2",
      "deferredRender": true,
      "collection": "blog"
    },
    "ru": {
      "id": "ru/simple-linear-regression-step-by-step",
      "data": {
        "title": "Простая линейная регрессия: Пошаговое математическое руководство",
        "description": "Освойте простую линейную регрессию шаг за шагом. Научитесь вычислять наклон, пересечение, R² и корреляцию вручную с реальными примерами.",
        "image": "/blog/linear-regression-steps.svg",
        "date": "2025-03-20T00:00:00.000Z",
        "author": "Калькулятор Уравнений Регрессии",
        "category": "Статистика"
      },
      "body": "import Callout from '../../../theme/components/Callout.astro';\n\nКаждое статистическое путешествие начинается с одной линии. Простая линейная регрессия — это та линия, самая фундаментальная прогностическая модель в анализе данных и основа, на которой строится каждый продвинутый метод регрессии.\n\nЕсли вы хотите предсказать зависимую переменную по одной независимой переменной, наш [Калькулятор Линейной Регрессии](/) даст вам ответ за секунды. Однако понимание *того, как* получен этот ответ, отличает специалиста по данным от того, кто просто нажимает кнопки.\n\nЭто руководство проведёт вас через простую линейную регрессию от первых принципов. Мы возьмём небольшой набор данных, вычислим каждое промежуточное значение вручную и вместе придём к итоговому уравнению. Для отличного визуального введения ознакомьтесь с [руководством по Линиям Тренда от Khan Academy](https://www.khanacademy.org/math/statistics-probability/describing-relationships-quantitative-data/introduction-to-trend-lines/v/fitting-a-line-to-data).\n\n<Callout type=\"conceptual\" title=\"Что Вы Узнаете\">\nК концу этой статьи вы сможете вычислить уравнение регрессии **y = mx + b** из исходных данных, интерпретировать результаты и убедиться, что ваши данные удовлетворяют необходимым предположениям.\n</Callout>\n\n---\n\n## Что Такое Простая Линейная Регрессия?\n\nПростая линейная регрессия моделирует связь между **одной независимой переменной (x)** и **одной зависимой переменной (y)**, проводя прямую линию через данные.\n\nСлово «простая» отличает её от [множественной регрессии](/multiple-regression-calculator/), которая использует два или более предиктора. Подогнанная линия выбирается так, чтобы минимизировать сумму квадратов вертикальных расстояний — метод, называемый **обычным методом наименьших квадратов (OLS)**.\n\n### Когда Использовать (и Когда Избегать)\n\n**Используйте, когда:**\n- У вас один непрерывный предиктор и один непрерывный результат.\n- Диаграмма рассеяния показывает **приблизительно линейный паттерн**.\n- Вы хотите количественно определить, насколько y изменяется на единицу x.\n\n**Избегайте, когда:**\n- Диаграмма рассеяния показывает **явную кривую** — попробуйте наш [калькулятор квадратичной регрессии](/quadratic-regression-calculator/).\n- У вас **множество предикторов** — используйте [множественную линейную регрессию](/multiple-regression-calculator/).\n- Ваши данные содержат **экстремальные выбросы**, которые могут исказить всю модель.\n\n---\n\n## Набор Данных\n\nПредположим, компания репетиторов отслеживает часы учёбы (x) и итоговые оценки за тест (y):\n\n| Студент | Часы Учёбы (x) | Оценка за Тест (y) |\n|---------|-----------------|---------------------|\n| 1       | 2               | 65                  |\n| 2       | 4               | 75                  |\n| 3       | 6               | 80                  |\n| 4       | 8               | 90                  |\n| 5       | 10              | 95                  |\n\n---\n\n## Шаг 1: Вычислить Средние Значения\n\nПервый шаг — вычислить среднее арифметическое обеих переменных.\n\n**Среднее x (x̄):** (2 + 4 + 6 + 8 + 10) / 5 = **6.0**\n**Среднее y (ȳ):** (65 + 75 + 80 + 90 + 95) / 5 = **81.0**\n\nЛиния регрессии всегда проходит через точку (**6.0, 81.0**).\n\n---\n\n## Шаг 2: Вычислить Отклонения и Произведения\n\nДалее вычислим, насколько каждая точка удалена от среднего, и перемножим результаты.\n\n| Студент | x − x̄ | y − ȳ | (x − x̄)(y − ȳ) | (x − x̄)² |\n|---------|--------|--------|-----------------|----------|\n| 1       | −4     | 
−16    | 64              | 16       |\n| 2       | −2     | −6     | 12              | 4        |\n| 3       | 0      | −1     | 0               | 0        |\n| 4       | 2      | 9      | 18              | 4        |\n| 5       | 4      | 14     | 56              | 16       |\n| **Сумма** |      |        | **150**         | **40**   |\n\n---\n\n## Шаг 3: Вычислить Наклон (b₁)\n\nНаклон показывает, насколько изменяется y при увеличении x на одну единицу.\n\n**b₁ = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²**\n**b₁ = 150 / 40 = 3.75**\n\n**Интерпретация**: За каждый дополнительный час учёбы предсказанная оценка за тест увеличивается на **3.75 балла**.\n\n---\n\n## Шаг 4: Вычислить Пересечение (b₀)\n\nПересечение — это предсказанное значение y при x = 0.\n\n**b₀ = ȳ − b₁ × x̄**\n**b₀ = 81.0 − 3.75 × 6.0 = 58.5**\n\n**Интерпретация**: Студент, не занимающийся вообще, получит предсказанную оценку **58.5**.\n\n---\n\n## Шаг 5: Записать Итоговое Уравнение\n\nОбъединяя оба значения:\n**y = 58.5 + 3.75x**\n\nЭта модель позволяет делать прогнозы. Например, при учёбе **7 часов** получаем:\n58.5 + 3.75(7) = **84.75**.\n\n<Callout type=\"warning\" title=\"Опасность Экстраполяции\">\nПрогнозирование за пределами диапазона ваших данных (например, 50 часов учёбы) называется **экстраполяцией**. Это часто даёт бессмысленные результаты, и этого следует избегать.\n</Callout>\n\n---\n\n## Шаг 6: Оценить Качество Подгонки (R² и r)\n\n**R²** измеряет, какая доля вариации y объясняется моделью.\n**r** (корреляция Пирсона) измеряет силу и направление линейной связи.\n\nДля этого набора данных наш [калькулятор корреляции Пирсона](/pearson-correlation-calculator/) даст **r = 0.9934**, что указывает на очень сильную положительную связь. Узнайте больше о [Коэффициенте Корреляции Пирсона на Statology](https://www.statology.org/pearson-correlation-coefficient/).\n\n---\n\n## Шаг 7: Проверить Предположения\n\nПрежде чем доверять результатам, вы должны удовлетворить четырём предположениям OLS. Наш [проверщик предположений регрессии](/regression-assumptions-checker/) поможет автоматизировать это:\n\n1. **Линейность**: Связь следует прямолинейному паттерну.\n2. **Независимость**: Наблюдения не зависят друг от друга.\n3. **Гомоскедастичность**: Остатки (ошибки) имеют постоянную дисперсию.\n4. **Нормальность**: Остатки приблизительно нормально распределены.\n\n---\n\n## За Пределами Простой Регрессии\n\nОсвоив основы, вам могут понадобиться более продвинутые инструменты:\n\n- **Множественные Предикторы**: Используйте [множественную линейную регрессию](/multiple-regression-calculator/) для сложных сценариев.\n- **Криволинейные Паттерны**: Используйте наш [калькулятор квадратичной регрессии](/quadratic-regression-calculator/).\n- **Модели Роста**: Исследуйте [калькулятор экспоненциальной регрессии](/exponential-regression-calculator/).\n\n---\n\n## Ключевые Выводы\n\n1. **Наклон** отражает скорость изменения.\n2. **Пересечение** даёт базовое значение при x=0.\n3. **R²** определяет объяснительную силу модели.\n4. **Экстраполяция** рискованна — оставайтесь в пределах диапазона данных.\n5. **Корреляция — это не причинность** — статистика показывает ассоциацию, а не обязательно причину и следствие.\n\nГотовы проверить свои данные? Перейдите к нашему [бесплатному калькулятору регрессии](/) и начните прямо сейчас!",
      "filePath": "src/blog/ru/simple-linear-regression-step-by-step.mdx",
      "digest": "68b58d0bd0ec9097",
      "deferredRender": true,
      "collection": "blog"
    }
  }
}