e-Statistics

Model Estimate and Residuals

The multiple linear regression model

$\displaystyle Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \epsilon_i,
\quad i=1,\ldots,n,
$

is built by identifying data columns with (i) the response variable $ Y_i$ and (ii) the predictors $ x_{i1}$ through $ x_{ik}$.

  1. To start over, first clear the current model formula.
  2. From the data above, select the column for the response variable $ Y_i$ (dependent variable).
  3. Build a model formula for the predictors $ x_{i1}$ up to $ x_{ik}$ (independent variables) by setting the predictor variables one by one. A nonlinear transformation (e.g., log(x) or x^2) of a predictor x can be indicated by placing it inside I(), for example I(log(x)) or I(x^2). A minimal sketch of the equivalent model fit in R follows this list.
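
The same construction can be written in R, whose lm() formula syntax matches the one described above. The data frame dat and the column names y, x1, x2 below are hypothetical, used only for this sketch.

    # Hypothetical data: response y and predictors x1, x2
    dat <- data.frame(y  = c(12.1, 14.3, 15.2, 18.8, 21.0, 23.5),
                      x1 = c(1.0, 1.5, 2.0, 2.5, 3.0, 3.5),
                      x2 = c(0.2, 0.4, 0.3, 0.8, 0.9, 1.1))

    # Build the formula predictor by predictor; I() marks a nonlinear
    # transformation of a predictor, here x1^2
    fit <- lm(y ~ x1 + x2 + I(x1^2), data = dat)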

The summary of the multiple linear regression fit is reported in a coefficient table.

In this table, the standard error $ S_j$ of the estimate $ \hat{\beta}_j$ gives rise to the null hypothesis

$\displaystyle H_0:\: \beta_j = 0
$

for each $ j = 0,\ldots,k$. The test determines whether the response depends on the $ j$-th predictor. Under the null hypothesis the test statistic $ T_j = \displaystyle\frac{\hat{\beta}_j}{S_j}$ has the t-distribution with $ (n-k-1)$ degrees of freedom. Thus, we reject $ H_0$ at significance level $ \alpha$ if $ \vert T_j\vert > t_{\alpha/2,n-k-1}$. Equivalently, by computing the p-value $ p^*$ we reject $ H_0$ if $ p^* < \alpha$.
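
In R the coefficient table is printed by summary(), and the quantities above can also be recomputed directly from it (continuing the hypothetical fit from the earlier sketch):

    summary(fit)                  # table of Estimate, Std. Error, t value, Pr(>|t|)

    est <- coef(summary(fit))     # the coefficient table as a matrix
    Tj  <- est[, "Estimate"] / est[, "Std. Error"]   # T_j = beta_hat_j / S_j
    df  <- fit$df.residual                           # n - k - 1
    pstar <- 2 * pt(abs(Tj), df, lower.tail = FALSE) # two-sided p-value p*

    # Reject H0: beta_j = 0 at level alpha when |T_j| > t_{alpha/2, n-k-1},
    # or equivalently when p* < alpha
    alpha <- 0.05
    abs(Tj) > qt(1 - alpha/2, df)
    pstar < alpha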

The prediction equation provides a fitted value

$\displaystyle \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \cdots + \hat{\beta}_k x_{ik}.
$
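
In R the fitted values $ \hat{y}_i$ are obtained with fitted(), and predict() evaluates the prediction equation at new predictor values (again using the hypothetical fit from the sketch above):

    y_hat <- fitted(fit)    # fitted values y_hat_i for the observed data

    # Evaluate the prediction equation at a new point (x1, x2) = (2.2, 0.5)
    predict(fit, newdata = data.frame(x1 = 2.2, x2 = 0.5))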

Then we can plot the standardized residuals $ \frac{Y_i - \hat{y}_i}{\hat{\sigma}}$ against the fitted values $ \hat{y}_i$. In model validation we look for a pattern in this plot; the presence of a systematic pattern suggests that the chosen regression is not a good model.
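
The residual plot can be drawn as follows, standardizing the residuals by the residual standard error $ \hat{\sigma}$ reported in the summary (a sketch based on the hypothetical fit above):

    sigma_hat <- summary(fit)$sigma       # residual standard error sigma_hat
    std_res   <- resid(fit) / sigma_hat   # (y_i - y_hat_i) / sigma_hat

    # Standardized residuals against fitted values; a systematic pattern
    # suggests the model is not adequate
    plot(fitted(fit), std_res,
         xlab = "Fitted values", ylab = "Standardized residuals")
    abline(h = 0, lty = 2)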


© TTU Mathematics