Guide
This demo lets you place a set of data points \((x_i, y_i)\) on a coordinate plane and compute a regression model that best fits the data. A regression model is a function that approximates the relationship between variables; here, its parameters are chosen to minimize the total squared error between the observed values and the predicted values.
A linear regression model has the form:
\( y = mx + b \)
where \(m\) is the slope and \(b\) is the y-intercept. These values are chosen using the method of least squares, which minimizes the sum of squared residuals between each observed value \(y_i\) and the predicted value \(\hat{y}_i = mx_i + b\):
\( \text{SS}_{res} = \sum (y_i - \hat{y}_i)^2 \)
To compute \(m\) and \(b\), we first calculate the following summations from the data:
\( n = \text{number of points} \)
\( \sum x,\ \sum y,\ \sum x^2,\ \sum xy \)
The slope is given by:
\( m = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} \)
The y-intercept is:
\( b = \frac{\sum y - m\sum x}{n} \)
This produces the line of best fit, which minimizes the total squared vertical distance between the data points and the line.
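The summation formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the demo's actual code: the function name `linear_fit` and the list-of-pairs input format are assumptions made for the example.

```python
def linear_fit(points):
    """Least-squares line y = m*x + b through a list of (x, y) pairs."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_x2 = sum(x * x for x, _ in points)
    sum_xy = sum(x * y for x, y in points)

    # Denominator of the slope formula; it is zero when every x is the same.
    # (Exact-zero check for simplicity; a small tolerance may be safer with floats.)
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("need at least two distinct x-values")

    m = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_y - m * sum_x) / n
    return m, b


# Three points that lie exactly on y = 2x + 1.
m, b = linear_fit([(1, 3), (2, 5), (3, 7)])
print(m, b)
```

The guard on the denominator reflects that a vertical stack of points has no finite least-squares slope.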
A quadratic regression model has the form:
\( y = ax^2 + bx + c \)
The coefficients \(a\), \(b\), and \(c\) are found by solving the following system of equations derived from minimizing squared error:
\( \sum y = a\sum x^2 + b\sum x + cn \)
\( \sum xy = a\sum x^3 + b\sum x^2 + c\sum x \)
\( \sum x^2y = a\sum x^4 + b\sum x^3 + c\sum x^2 \)
These equations are constructed from the dataset and solved simultaneously to determine the best-fit parabola.
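One way to build and solve this system is sketched below. It is an illustration under assumptions, not the demo's implementation: the names `solve_3x3` and `quadratic_fit` are hypothetical, Gaussian elimination with partial pivoting is just one of several suitable solvers, and the `1e-12` singularity threshold is an arbitrary choice.

```python
def solve_3x3(A, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    size = 3
    # Augmented matrix [A | rhs], copied so the inputs are not modified.
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for col in range(size):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, size), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:  # assumed tolerance
            raise ValueError("singular system: need at least three distinct x-values")
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate this column from the rows below.
        for r in range(col + 1, size):
            factor = M[r][col] / M[col][col]
            for c in range(col, size + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    v = [0.0] * size
    for r in reversed(range(size)):
        s = sum(M[r][c] * v[c] for c in range(r + 1, size))
        v[r] = (M[r][size] - s) / M[r][r]
    return v


def quadratic_fit(points):
    """Least-squares parabola y = a*x^2 + b*x + c through (x, y) pairs."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sx2 = sum(x ** 2 for x, _ in points)
    sx3 = sum(x ** 3 for x, _ in points)
    sx4 = sum(x ** 4 for x, _ in points)
    sy = sum(y for _, y in points)
    sxy = sum(x * y for x, y in points)
    sx2y = sum(x ** 2 * y for x, y in points)
    # Rows follow the three equations above; unknowns are ordered (a, b, c).
    A = [[sx2, sx, n],
         [sx3, sx2, sx],
         [sx4, sx3, sx2]]
    rhs = [sy, sxy, sx2y]
    a, b, c = solve_3x3(A, rhs)
    return a, b, c


# Four points that lie exactly on y = x^2 - 2x + 3.
a, b, c = quadratic_fit([(0, 3), (1, 2), (2, 3), (3, 6)])
print(a, b, c)
```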
An exponential regression model has the form:
\( y = ae^{bx} \)
This model is transformed into a linear form by taking the natural logarithm of both sides:
\( \ln(y) = \ln(a) + bx \)
Let \(Y = \ln(y)\). Then:
\( Y = bx + \ln(a) \)
This allows us to apply linear regression to the transformed data \((x, \ln(y))\): the fitted slope is \(b\) and the fitted intercept is \(\ln(a)\). Writing the intercept as \(B = \ln(a)\), we then recover:
\( a = e^{B} \)
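The log-transform approach can be sketched as follows. The function name `exponential_fit` is an assumption, and the linear fit is written inline so the block runs on its own. Note that this fits a line to \((x, \ln(y))\), so it minimizes squared error in \(\ln(y)\) rather than in \(y\) itself.

```python
import math


def exponential_fit(points):
    """Fit y = a * exp(b * x) by linear regression on (x, ln y)."""
    if any(y <= 0 for _, y in points):
        raise ValueError("all y-values must be positive to take ln(y)")

    # Transform the data: Y = ln(y).
    transformed = [(x, math.log(y)) for x, y in points]

    # Ordinary least-squares line Y = b*x + ln(a) on the transformed data.
    n = len(transformed)
    sum_x = sum(x for x, _ in transformed)
    sum_Y = sum(Y for _, Y in transformed)
    sum_x2 = sum(x * x for x, _ in transformed)
    sum_xY = sum(x * Y for x, Y in transformed)
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("need at least two distinct x-values")
    b = (n * sum_xY - sum_x * sum_Y) / denom
    ln_a = (sum_Y - b * sum_x) / n

    # Undo the logarithm to recover a.
    a = math.exp(ln_a)
    return a, b


# Points generated from y = 2 * e^(0.5 x).
sample = [(x, 2 * math.exp(0.5 * x)) for x in range(4)]
a, b = exponential_fit(sample)
print(a, b)
```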
The quality of a regression is measured using the coefficient of determination:
\( R^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2} \)
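where \( \bar{y} \) is the mean of the observed values and \( \hat{y}_i \) is the model's prediction at \(x_i\). An \(R^2\) value closer to 1 indicates a better fit; \(R^2 = 1\) means the model reproduces every observed value exactly.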
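As a rough sketch, the formula translates directly into code. The name `r_squared` and the idea of passing the model as a `predict` callable are assumptions made for illustration; the demo may organize this differently.

```python
def r_squared(points, predict):
    """Coefficient of determination for a model given as a function x -> y_hat."""
    ys = [y for _, y in points]
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - predict(x)) ** 2 for x, y in points)  # residual sum of squares
    ss_tot = sum((y - y_bar) ** 2 for y in ys)              # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all y-values are equal")
    return 1 - ss_res / ss_tot


# Score the hypothetical model y = 2x against three points.
data = [(1, 2), (2, 4), (3, 7)]
r2 = r_squared(data, lambda x: 2 * x)
print(r2)
```

Because the model is passed in as a function, the same helper can score a linear, quadratic, or exponential fit.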
Click on the graph to place up to 10 data points. Points snap to the nearest tenth. Exponential regression requires all \(y\)-values to be positive, because \(\ln(y)\) is undefined for \(y \le 0\).
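The input rules above could be modeled roughly as follows. Everything here is hypothetical: the names `snap_to_tenth`, `add_point`, and `can_fit_exponential` are invented for the sketch, and the tie-breaking rule for values exactly halfway between tenths is an assumption, since the guide does not state one.

```python
import math

MAX_POINTS = 10  # cap stated in the guide


def snap_to_tenth(value):
    """Round to the nearest tenth (assumed: ties go toward positive infinity)."""
    return math.floor(value * 10 + 0.5) / 10


def add_point(points, x, y):
    """Append a snapped point; return False if the cap is already reached."""
    if len(points) >= MAX_POINTS:
        return False
    points.append((snap_to_tenth(x), snap_to_tenth(y)))
    return True


def can_fit_exponential(points):
    """Exponential regression needs every y-value to be strictly positive."""
    return all(y > 0 for _, y in points)


pts = []
add_point(pts, 1.234, -0.96)
print(pts, can_fit_exponential(pts))
```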