Linear Curve Fitting

Published On:
Jun 26, 2018
Last Updated:
Aug 30, 2024

Fitting A Linear Curve (Line)

Fitting a linear curve (a line!) to a set of data is called linear regression. Typically, we want to minimize the sum of the squared vertical errors between each point and the line (the least squares approach). The following graph shows four data points in green, and the calculated line of best fit in blue:

A line fitted to four data points using the least squares approach.

We can write an equation for the error as follows:

$$
\begin{align}
err & = \sum d_i^2 \\
    & = (y_1 - f(x_1))^2 + (y_2 - f(x_2))^2 + \dots + (y_n - f(x_n))^2 \\
    & = \sum_{i = 1}^{n} (y_i - f(x_i))^2
\end{align}
$$

where:
$n$ is the number of data points (each data point is an $x, y$ pair)
$f(x)$ is the function which describes our line of best fit

Since we want to fit a straight line, we can write $f(x)$ as:

$$
f(x) = ax + b
$$

Substituting this into the error equation above:

$$
err = \sum_{i = 1}^{n} (y_i - (ax_i + b))^2
$$
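To make the error function concrete, here is a minimal Python sketch that evaluates it for a candidate line. The function name `squared_error` and the data values are hypothetical, chosen only for illustration.

```python
def squared_error(xs, ys, a, b):
    """Sum of squared vertical errors between the points and the line y = a*x + b."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, chosen so that every point lies exactly on y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

perfect = squared_error(xs, ys, a=2.0, b=0.0)  # line passes through every point
worse = squared_error(xs, ys, a=1.0, b=0.0)    # line misses every point

print(perfect)  # 0.0
print(worse)    # 30.0
```

Different values of `a` and `b` give different errors. The fitting problem is to find the pair that makes this number as small as possible.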

How do we find the minimum of this error function? We use the derivative. Differentiating $err$ gives an equation for its slope, and the slope is 0 where the error is at a minimum. (Because $err$ is a sum of squares, this stationary point is a minimum rather than a maximum.)

Because we are solving for two unknowns, $a$ and $b$, we take the partial derivative of $err$ with respect to each and set both to zero:

$$
\frac{\partial err}{\partial a} = -2 \sum_{i=1}^{n} x_i(y_i - ax_i - b) = 0
$$

$$
\frac{\partial err}{\partial b} = -2 \sum_{i=1}^{n} (y_i - ax_i - b) = 0
$$
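One way to gain confidence in these two partial derivatives is to compare them against a numerical estimate. The sketch below does this with central finite differences. The function names, data and step size `h` are all assumptions made for the example.

```python
def err(xs, ys, a, b):
    """Sum of squared vertical errors for the line y = a*x + b."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


def grad_err(xs, ys, a, b):
    """Analytic partial derivatives (d err / d a, d err / d b)."""
    d_a = -2 * sum(x * (y - a * x - b) for x, y in zip(xs, ys))
    d_b = -2 * sum(y - a * x - b for x, y in zip(xs, ys))
    return d_a, d_b


# Hypothetical data and an arbitrary (non-optimal) candidate line.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 4.0, 8.0]
a, b, h = 0.5, 0.25, 1e-6

# Central finite differences approximate each partial derivative numerically.
num_a = (err(xs, ys, a + h, b) - err(xs, ys, a - h, b)) / (2 * h)
num_b = (err(xs, ys, a, b + h) - err(xs, ys, a, b - h)) / (2 * h)

d_a, d_b = grad_err(xs, ys, a, b)
print(d_a, d_b)      # analytic values
print(num_a, num_b)  # numerical estimates, which should agree closely
```

Neither derivative is zero here, which tells us this candidate line is not the line of best fit.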

We now have two equations and two unknowns, so we can solve this! Let's re-write the equations in the form $C_1 a + C_2 b = C_3$ (all sums run from $i = 1$ to $n$):

$$
\begin{align}
& (\sum x_i^2) a & + (\sum x_i) b &= \sum x_i y_i \\
& (\sum x_i) a & + (n) b &= \sum y_i
\end{align}
$$

We will put this into matrix form so we can easily solve it:

$$
\begin{bmatrix} \sum x_i^2 & \sum x_i \\ \sum x_i & n \end{bmatrix}
\begin{bmatrix} a \\ b \end{bmatrix}
=
\begin{bmatrix} \sum x_i y_i \\ \sum y_i \end{bmatrix}
$$

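Calling the 2x2 matrix $\mathbf{A}$, the vector of unknowns $\mathbf{x} = \begin{bmatrix} a & b \end{bmatrix}^T$ and the right-hand side $\mathbf{B}$, this is $\mathbf{A} \mathbf{x} = \mathbf{B}$. We solve for $\mathbf{x}$ by re-arranging (which involves taking the inverse of $\mathbf{A}$):

$$
\mathbf{x} = \mathbf{A}^{-1} \mathbf{B}
$$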
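For a 2x2 system like this one, the inverse can also be written out by hand, which gives closed-form expressions for $a$ and $b$. The following is a worked expansion, assuming the data contains at least two distinct $x$ values so that the determinant $n \sum x_i^2 - (\sum x_i)^2$ is non-zero:

$$
\mathbf{A}^{-1} = \frac{1}{n \sum x_i^2 - (\sum x_i)^2}
\begin{bmatrix} n & -\sum x_i \\ -\sum x_i & \sum x_i^2 \end{bmatrix}
$$

$$
a = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - (\sum x_i)^2}
\qquad
b = \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{n \sum x_i^2 - (\sum x_i)^2}
$$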

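Thus the linear curve of best fit is:

$$
y = \mathbf{x}[0] \cdot x + \mathbf{x}[1]
$$

where $\mathbf{x}[0] = a$ is the slope and $\mathbf{x}[1] = b$ is the y-intercept.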
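The whole procedure can be sketched in a few lines of plain Python. This is an illustrative version only, not the code from the repository linked below. The function name `fit_line` and the sample data are hypothetical, and the 2x2 inverse is written out explicitly rather than computed with a linear algebra library.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b by solving the 2x2 system A x = B."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    # A = [[sum_xx, sum_x], [sum_x, n]] and B = [sum_xy, sum_y].
    det = sum_xx * n - sum_x * sum_x
    if det == 0:
        raise ValueError("need at least two distinct x values")

    # x = A^-1 B, with the inverse of the 2x2 matrix expanded by hand.
    a = (n * sum_xy - sum_x * sum_y) / det
    b = (sum_xx * sum_y - sum_x * sum_xy) / det
    return a, b


# Hypothetical data lying exactly on y = 2x + 1.
a, b = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(a, b)  # 2.0 1.0
```

If NumPy is available, building `A` and `B` as arrays and calling `numpy.linalg.solve(A, B)` would be a reasonable alternative to expanding the inverse by hand.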

See https://github.com/gbmhunter/BlogAssets/tree/master/Mathematics/CurveFitting/linear for Python code which performs these calculations.

Worked Example