Simple Linear Regression

Yaser Rahmati | یاسر رحمتی

Simple linear regression models the relationship between a dependent variable y and a single independent variable x using a linear equation:

y = \beta_{0} + \beta_{1}x + \epsilon

where:

  • y is the dependent variable.

  • x is the independent variable.

  • \beta_{0} is the intercept (the value of y when x = 0).

  • \beta_{1} is the slope.

  • \epsilon is the error term (residual); the sketch below illustrates its role.
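
To make the role of the error term concrete, here is a minimal simulation sketch in Python (the coefficient values 1.3 and 0.9 match the worked example later in this article; the noise scale is an arbitrary assumption):

import numpy as np

rng = np.random.default_rng(seed=0)   # fixed seed for reproducibility

beta_0, beta_1 = 1.3, 0.9             # illustrative coefficients
x = np.linspace(1, 5, 50)             # independent variable
epsilon = rng.normal(0, 0.5, x.size)  # error term: Gaussian noise (assumed scale)
y = beta_0 + beta_1 * x + epsilon     # dependent variable generated by the model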

Objective

The goal is to estimate the coefficients \beta_{0} and \beta_{1} that minimize the sum of squared residuals (errors). The residual for each observation is the difference between the observed value y_{i} and the predicted value \widehat{y_{i}}:

RSS = \sum_{i=1}^{n}(y_{i} - \widehat{y_{i}})^{2} = \sum_{i=1}^{n}\bigl(y_{i} - (\beta_{0} + \beta_{1}x_{i})\bigr)^{2}

Estimating Coefficients

To find the estimates of \beta_{0} and \beta_{1}, we use the least squares method:

1. Slope

\beta_{1} = \frac{\sum_{i=1}^{n}(x_{i} - \overline{x})(y_{i} - \overline{y})}{\sum_{i=1}^{n}(x_{i} - \overline{x})^{2}}

2. Intercept

\beta_{0} = \overline{y} - \beta_{1}\overline{x}
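
These closed-form estimates are not arbitrary: they follow from setting the partial derivatives of RSS with respect to \beta_{0} and \beta_{1} to zero (the normal equations). A brief sketch:

\frac{\partial RSS}{\partial \beta_{0}} = -2\sum_{i=1}^{n}\bigl(y_{i} - \beta_{0} - \beta_{1}x_{i}\bigr) = 0 \qquad \frac{\partial RSS}{\partial \beta_{1}} = -2\sum_{i=1}^{n}x_{i}\bigl(y_{i} - \beta_{0} - \beta_{1}x_{i}\bigr) = 0

Solving these two equations simultaneously yields the slope and intercept formulas above.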

Numerical Example

Let's consider a simple dataset with five observations to illustrate the calculation.

Dataset

| Observation | x | y |
| --- | --- | --- |
| 1 | 1 | 2 |
| 2 | 2 | 3 |
| 3 | 3 | 5 |
| 4 | 4 | 4 |
| 5 | 5 | 6 |

Step-by-Step Calculation

1. Calculate the means

\overline{x} = \frac{1+2+3+4+5}{5} = 3 \qquad \overline{y} = \frac{2+3+5+4+6}{5} = 4

2. Calculate the slope

\beta_{1} = \frac{\sum_{i=1}^{5}(x_{i} - \overline{x})(y_{i} - \overline{y})}{\sum_{i=1}^{5}(x_{i} - \overline{x})^{2}} = \frac{4 + 1 + 0 + 0 + 4}{4 + 1 + 0 + 1 + 4} = \frac{9}{10} = 0.9

3. Calculate the intercept

\beta_{0} = \overline{y} - \beta_{1}\overline{x} = 4 - (0.9 \times 3) = 1.3

Regression Line

The regression line is:

\widehat{y} = 1.3 + 0.9x
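
As a quick visual check (a minimal sketch using matplotlib, an added dependency not required elsewhere in this article), the five observations can be plotted against this line:

import numpy as np
import matplotlib.pyplot as plt

# Dataset from the example above
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 5, 4, 6])

# Scatter the observations and overlay the fitted line
plt.scatter(x, y, label="observations")
plt.plot(x, 1.3 + 0.9 * x, color="red", label="y_hat = 1.3 + 0.9x")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()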

Predictions and Residuals

Let's compute the predicted values and residuals for each observation:

| Observation | x | y | ŷ = 1.3 + 0.9x | Residual (y - ŷ) |
| --- | --- | --- | --- | --- |
| 1 | 1 | 2 | 2.2 | -0.2 |
| 2 | 2 | 3 | 3.1 | -0.1 |
| 3 | 3 | 5 | 4.0 | 1.0 |
| 4 | 4 | 4 | 4.9 | -0.9 |
| 5 | 5 | 6 | 5.8 | 0.2 |

Sum of Squared Residuals (RSS)

RSS = (-0.2)^{2} + (-0.1)^{2} + (1.0)^{2} + (-0.9)^{2} + (0.2)^{2} = 0.04 + 0.01 + 1.00 + 0.81 + 0.04 = 1.90

Python Code for Simple Linear Regression

Manual Calculation Using NumPy

import numpy as np

# Dataset
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 5, 4, 6])

# Calculate the means
x_mean = np.mean(x)
y_mean = np.mean(y)

# Calculate the slope (beta_1)
numerator = np.sum((x - x_mean) * (y - y_mean))
denominator = np.sum((x - x_mean) ** 2)
beta_1 = numerator / denominator

# Calculate the intercept (beta_0)
beta_0 = y_mean - beta_1 * x_mean

# Display the coefficients
print(f"Intercept (beta_0): {beta_0}")
print(f"Slope (beta_1): {beta_1}")

# Predicting y values
y_pred = beta_0 + beta_1 * x

# Calculate residuals and RSS
residuals = y - y_pred
rss = np.sum(residuals ** 2)

print(f"Predicted values: {y_pred}")
print(f"Residuals: {residuals}")
print(f"Sum of Squared Residuals (RSS): {rss}")

And the output is:

Intercept (beta_0): 1.2999999999999998
Slope (beta_1): 0.9
Predicted values: [2.2 3.1 4.  4.9 5.8]
Residuals: [-0.2 -0.1  1.  -0.9  0.2]
Sum of Squared Residuals (RSS): 1.9000000000000004
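
As a sanity check (an optional sketch, separate from the manual calculation above), NumPy's built-in polyfit performs the same degree-1 least-squares fit and should reproduce these coefficients:

import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 5, 4, 6])

# np.polyfit returns coefficients from highest degree to lowest:
# for degree 1, that is [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)
print(f"Intercept: {intercept}")  # ~1.3
print(f"Slope: {slope}")          # ~0.9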
