Regressions

Overview

This demo lets you place a set of data points \((x_i, y_i)\) on a coordinate plane and compute the regression model that best fits them. A regression model is a function that approximates the relationship between the variables; here it is chosen to minimize the total squared error between the observed values \(y_i\) and the predicted values \(\hat{y}_i\).

Linear Regression (Detailed)

A linear regression model has the form:

\( y = mx + b \)

where \(m\) is the slope and \(b\) is the y-intercept. These values are chosen using the method of least squares, which minimizes the sum of squared residuals, where \(\hat{y}_i = mx_i + b\) is the value the line predicts at \(x_i\):

\( \text{SS}_{res} = \sum (y_i - \hat{y}_i)^2 \)

To compute \(m\) and \(b\), we first calculate the following summations from the data:

\( n = \text{number of points} \)

\( \sum x,\ \sum y,\ \sum x^2,\ \sum xy \)

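The slope is given by the following formula, which is defined as long as the data contain at least two distinct \(x\)-values (otherwise the denominator is zero):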

\( m = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} \)

The y-intercept is:

\( b = \frac{\sum y - m\sum x}{n} \)

This produces the line of best fit, which minimizes the total squared vertical distance between the data points and the line.
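The slope and intercept formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the demo's actual source; the function name `linear_fit` and the representation of the data as a list of `(x, y)` tuples are assumptions made for the example.

```python
def linear_fit(points):
    """Return (m, b) for the least-squares line y = m*x + b.

    `points` is a list of (x, y) tuples (an assumed representation).
    """
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_x2 = sum(x * x for x, _ in points)
    sum_xy = sum(x * y for x, y in points)

    # The denominator is zero when every x-value is the same (vertical line).
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("need at least two distinct x-values")

    m = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_y - m * sum_x) / n
    return m, b
```

For example, the points (0, 1), (1, 3), and (2, 5) lie exactly on a line, and `linear_fit([(0, 1), (1, 3), (2, 5)])` returns a slope of 2 and an intercept of 1.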

Quadratic Regression

A quadratic regression model has the form:

\( y = ax^2 + bx + c \)

The coefficients \(a\), \(b\), and \(c\) are found by solving the following system of equations derived from minimizing squared error:

\( \sum y = a\sum x^2 + b\sum x + cn \)

\( \sum xy = a\sum x^3 + b\sum x^2 + c\sum x \)

\( \sum x^2y = a\sum x^4 + b\sum x^3 + c\sum x^2 \)

These equations are constructed from the dataset and solved simultaneously to determine the best-fit parabola.
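One way to solve this system is Gauss-Jordan elimination on the 3×3 augmented matrix. The sketch below is an illustration under assumptions: the names `solve_linear_system` and `quadratic_fit`, the `(x, y)` tuple representation, and the singularity tolerance `1e-12` are all hypothetical choices, and the demo itself may solve the system differently.

```python
def solve_linear_system(aug):
    """Solve a square linear system given as an augmented matrix.

    Uses Gauss-Jordan elimination with partial pivoting.
    """
    size = len(aug)
    aug = [[float(v) for v in row] for row in aug]  # work on a copy
    for col in range(size):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:  # assumed tolerance
            raise ValueError("singular system; need three distinct x-values")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from every other row.
        for r in range(size):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for k in range(col, size + 1):
                    aug[r][k] -= factor * aug[col][k]
    return [aug[i][size] / aug[i][i] for i in range(size)]


def quadratic_fit(points):
    """Return (a, b, c) for the least-squares parabola y = a*x^2 + b*x + c."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sx2 = sum(x ** 2 for x, _ in points)
    sx3 = sum(x ** 3 for x, _ in points)
    sx4 = sum(x ** 4 for x, _ in points)
    sy = sum(y for _, y in points)
    sxy = sum(x * y for x, y in points)
    sx2y = sum(x ** 2 * y for x, y in points)

    # Each row is one of the three equations, with unknowns ordered (a, b, c).
    rows = [
        [sx2, sx, n, sy],
        [sx3, sx2, sx, sxy],
        [sx4, sx3, sx2, sx2y],
    ]
    a, b, c = solve_linear_system(rows)
    return a, b, c
```

With points sampled from \(y = x^2 - 2x + 3\), such as (-1, 6), (0, 3), (1, 2), and (2, 3), the function recovers coefficients close to \(a = 1\), \(b = -2\), \(c = 3\), up to floating-point rounding.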

Exponential Regression

An exponential regression model has the form:

\( y = ae^{bx} \)

This model is transformed into a linear form by taking the natural logarithm of both sides:

\( \ln(y) = \ln(a) + bx \)

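Let \(Y = \ln(y)\) and \(A = \ln(a)\). Then:

\( Y = bx + A \)

This is a line with slope \(b\) and intercept \(A\), so applying linear regression to the transformed data \((x_i, \ln(y_i))\) yields \(b\) and \(A\). Note that this minimizes the squared error in \(\ln(y)\), not in \(y\) itself. The coefficient \(a\) is then recovered from the intercept:

\( a = e^{A} \)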
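The log-transform approach can be sketched as follows: fit a line to \((x, \ln(y))\), then exponentiate the fitted intercept \(\ln(a)\) to obtain \(a\). The function name `exponential_fit` and the `(x, y)` tuple representation are assumptions for illustration, not the demo's actual code.

```python
import math


def exponential_fit(points):
    """Return (a, b) for y = a * e^(b*x), fitted via linear regression on ln(y)."""
    if any(y <= 0 for _, y in points):
        raise ValueError("all y-values must be positive to take ln(y)")

    # Transform the data: (x, y) -> (x, ln y).
    logged = [(x, math.log(y)) for x, y in points]

    n = len(logged)
    sum_x = sum(x for x, _ in logged)
    sum_Y = sum(Y for _, Y in logged)
    sum_x2 = sum(x * x for x, _ in logged)
    sum_xY = sum(x * Y for x, Y in logged)

    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("need at least two distinct x-values")

    b = (n * sum_xY - sum_x * sum_Y) / denom   # slope of the fitted line
    ln_a = (sum_Y - b * sum_x) / n             # intercept of the fitted line
    return math.exp(ln_a), b
```

For data generated exactly from \(y = 2e^{0.5x}\), the function returns values close to \(a = 2\) and \(b = 0.5\), up to floating-point rounding.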

Goodness of Fit

The quality of a regression is measured using the coefficient of determination:

\( R^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2} \)

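where \( \bar{y} \) is the mean of the observed values and \( \hat{y}_i \) is the model's prediction at \(x_i\). An \(R^2\) of 1 means the model passes through every data point, and values closer to 1 indicate a better fit.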
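The \(R^2\) formula translates directly into code. In this sketch, `r_squared` and its `predict` argument (any function mapping \(x\) to a predicted \(y\)) are hypothetical names chosen for the example; the check for identical \(y\)-values is an added safeguard, since the denominator is zero in that case.

```python
def r_squared(points, predict):
    """Return the coefficient of determination for a fitted model.

    `points` is a list of (x, y) tuples; `predict` maps x to the model's y.
    """
    mean_y = sum(y for _, y in points) / len(points)
    ss_res = sum((y - predict(x)) ** 2 for x, y in points)  # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for _, y in points)      # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all y-values are equal")
    return 1 - ss_res / ss_tot
```

As a usage example, `r_squared([(0, 1), (1, 3), (2, 5)], lambda x: 2 * x + 1)` gives 1.0 because the line passes through every point, while a model that always predicts the mean (`lambda x: 3`) gives 0.0 on the same data.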

Usage Notes

Click on the graph to place up to 10 data points. Points snap to the nearest tenth. Exponential regression requires all \(y\)-values to be positive due to the logarithmic transformation.
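The point limit and snapping behavior described above might look like the following. This is only a guess at the logic for illustration; `MAX_POINTS`, `snap`, and `add_point` are hypothetical names, and the demo's own rounding of values that fall exactly between two tenths may differ from Python's `round`.

```python
MAX_POINTS = 10  # limit stated in the usage notes


def snap(value):
    """Round a coordinate to the nearest tenth."""
    return round(value * 10) / 10


def add_point(points, x, y):
    """Append a snapped point if there is room; return True if it was added."""
    if len(points) >= MAX_POINTS:
        return False
    points.append((snap(x), snap(y)))
    return True
```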