Please read and follow these instructions to try these past workshops on your own.
Linear regression in depth
I wrote about much of this in a blog post
when I wanted to get a better understanding of linear regression. Check out the
video to hear more about it.
So, the simple formula for linear regression is:
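$$y = \beta_0 + \beta_1 x + \epsilon$$

where $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\epsilon$ is the error term.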
Written another way, with common alphabetic letters, it is:
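(Assuming the usual convention of $a$ for the intercept and $b$ for the slope:)

$$y = a + bx$$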
In the case of multiple linear regression, it is expanded to:
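$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \epsilon$$

with one coefficient $\beta_j$ for each predictor $x_j$.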
In R, this is done using `lm()`. First, let's get a dataset and see its contents.
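A minimal sketch, using the built-in iris dataset (the same one the ANOVA example below relies on):

```r
# Load the built-in iris dataset and inspect it
data(iris)
head(iris) # first six rows
str(iris)  # four numeric measurements plus the Species factor
```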
Here, we run the linear regression and see the results.
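For example, predicting petal length from sepal length (the choice of variables here is an illustrative assumption):

```r
# Simple linear regression: one outcome (y), one predictor (x)
fit <- lm(Petal.Length ~ Sepal.Length, data = iris)
summary(fit) # coefficients, standard errors, R-squared
```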
If we convert this into the equation, it is:
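For the sketch above, that is:

$$\widehat{\text{Petal.Length}} = \beta_0 + \beta_1 \times \text{Sepal.Length}$$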
Let’s substitute in the numbers:
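With the estimates from that fit (approximately, for the iris data):

$$\widehat{\text{Petal.Length}} \approx -7.10 + 1.86 \times \text{Sepal.Length}$$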
The above is for a simple, two-variable linear regression. What if we have
multiple predictor ($x$) variables? We do:
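One way, adding sepal width as a second (again, illustrative) predictor:

```r
# Multiple linear regression: two predictors
fit2 <- lm(Petal.Length ~ Sepal.Length + Sepal.Width, data = iris)
summary(fit2)
```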
Converted to the formula:
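$$\widehat{\text{Petal.Length}} = \beta_0 + \beta_1 \times \text{Sepal.Length} + \beta_2 \times \text{Sepal.Width}$$

with the estimated $\beta$ values taken from the `summary()` output.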
Nearly all statistical techniques (at least the basic/common ones) can be
considered special cases of linear regression. Let’s take ANOVA as an example.
ANOVA is where the $y$ is a continuous variable and the $x$ is a discrete
variable (usually with at least three categories). In my opinion, using linear
regression is more useful from a scientific and interpretation perspective than
using ANOVA. Here’s how you do it:
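A sketch using Species (three categories) as the discrete predictor, with Petal.Length as an assumed outcome:

```r
# ANOVA as linear regression: continuous y, categorical x
fit_species <- lm(Petal.Length ~ Species, data = iris)
summary(fit_species)

# The classic ANOVA table from the same model
anova(fit_species)
```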
Converted to an equation (below), Setosa isn’t there because it is treated as
the ‘intercept’: if you set Versicolor and Virginica to zero, the remaining
‘category’ is Setosa. These 0/1 indicator variables are known as dummy variables.
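Assuming Petal.Length as the outcome, as in the sketch above:

$$\widehat{\text{Petal.Length}} = \beta_0 + \beta_1 \times \text{Versicolor} + \beta_2 \times \text{Virginica}$$

where Versicolor and Virginica each equal 1 for flowers of that species and 0 otherwise.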
The estimates from the linear regression are the same as the means of each group:
the intercept is the mean of the reference group (Setosa), and each other
estimate plus the intercept gives that group’s mean.
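We can check this directly (again assuming the Petal.Length model):

```r
# Group means recovered from the regression coefficients
coefs <- coef(lm(Petal.Length ~ Species, data = iris))
c(setosa     = coefs[["(Intercept)"]],
  versicolor = coefs[["(Intercept)"]] + coefs[["Speciesversicolor"]],
  virginica  = coefs[["(Intercept)"]] + coefs[["Speciesvirginica"]])

# Compare with the group means computed directly
tapply(iris$Petal.Length, iris$Species, mean)
```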
Linear regression and correlation are also tightly linked: if both variables
have been scaled (mean-centered and standardized), the regression slope is the
same as the correlation coefficient.
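A quick check of this, again with the assumed iris variables:

```r
# After standardizing both variables, the slope equals the Pearson correlation
fit_scaled <- lm(scale(Petal.Length) ~ scale(Sepal.Length), data = iris)
coef(fit_scaled)[2]
cor(iris$Sepal.Length, iris$Petal.Length)
```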