e-Statistics

Model Estimate and Residuals

The multiple linear regression model

$\displaystyle Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \epsilon_i,
\quad i=1,\ldots,n,
$

is built by identifying data columns with (i) the response variable $ Y_i$ and (ii) the predictors $ x_{i1}$ through $ x_{ik}$.

  1. To start over, first clear the current model formula.
  2. From the data above, select the column for the response variable $ Y_i$ (dependent variable).
  3. Build a model formula for the predictors $ x_{i1}$ up to $ x_{ik}$ (independent variables) by setting the predictor variables one by one. A nonlinear transformation (e.g., log(x) or x^2) of a predictor x can be indicated by placing it inside I(), for example I(log(x)) or I(x^2). A minimal sketch of the equivalent model fit in R follows this list.
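
The same construction can be written in R, whose lm() formula syntax matches the one described above. The data frame dat and the column names y, x1, x2 below are hypothetical, used only for this sketch.

    # Hypothetical data: response y and predictors x1, x2
    dat <- data.frame(y  = c(12.1, 14.3, 15.2, 18.8, 21.0, 23.5),
                      x1 = c(1.0, 1.5, 2.0, 2.5, 3.0, 3.5),
                      x2 = c(0.2, 0.4, 0.3, 0.8, 0.9, 1.1))

    # Build the formula predictor by predictor; I() marks a nonlinear
    # transformation of a predictor, here x1^2
    fit <- lm(y ~ x1 + x2 + I(x1^2), data = dat)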

The summary of the multiple linear regression fit is reported in a coefficient table.

In this table, the standard error $ S_j$ of the estimate $ \hat{\beta}_j$ gives rise to the null hypothesis

$\displaystyle H_0:\: \beta_j = 0
$

for each $ j = 0,\ldots,k$. The test determines whether the response depends on the $ j$-th predictor. Under the null hypothesis the test statistic $ T_j = \displaystyle\frac{\hat{\beta}_j}{S_j}$ has the t-distribution with $ (n-k-1)$ degrees of freedom. Thus, we reject $ H_0$ at significance level $ \alpha$ if $ \vert T_j\vert > t_{\alpha/2,n-k-1}$. Equivalently, by computing the p-value $ p^*$ we reject $ H_0$ if $ p^* < \alpha$.
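
In R the coefficient table is printed by summary(), and the quantities above can also be recomputed directly from it (continuing the hypothetical fit from the earlier sketch):

    summary(fit)                  # table of Estimate, Std. Error, t value, Pr(>|t|)

    est <- coef(summary(fit))     # the coefficient table as a matrix
    Tj  <- est[, "Estimate"] / est[, "Std. Error"]   # T_j = beta_hat_j / S_j
    df  <- fit$df.residual                           # n - k - 1
    pstar <- 2 * pt(abs(Tj), df, lower.tail = FALSE) # two-sided p-value p*

    # Reject H0: beta_j = 0 at level alpha when |T_j| > t_{alpha/2, n-k-1},
    # or equivalently when p* < alpha
    alpha <- 0.05
    abs(Tj) > qt(1 - alpha/2, df)
    pstar < alpha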

The prediction equation provides a fitted value

$\displaystyle \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \cdots + \hat{\beta}_k x_{ik}.
$
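
In R the fitted values $ \hat{y}_i$ are obtained with fitted(), and predict() evaluates the prediction equation at new predictor values (again using the hypothetical fit from the sketch above):

    y_hat <- fitted(fit)    # fitted values y_hat_i for the observed data

    # Evaluate the prediction equation at a new point (x1, x2) = (2.2, 0.5)
    predict(fit, newdata = data.frame(x1 = 2.2, x2 = 0.5))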

Then we can plot the standardized residuals $ \frac{Y_i - \hat{y}_i}{\hat{\sigma}}$ against the fitted values $ \hat{y}_i$. In model validation we look for a pattern in this plot; the presence of a systematic pattern suggests that the chosen regression is not a good model.
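
The residual plot can be drawn as follows, standardizing the residuals by the residual standard error $ \hat{\sigma}$ reported in the summary (a sketch based on the hypothetical fit above):

    sigma_hat <- summary(fit)$sigma       # residual standard error sigma_hat
    std_res   <- resid(fit) / sigma_hat   # (y_i - y_hat_i) / sigma_hat

    # Standardized residuals against fitted values; a systematic pattern
    # suggests the model is not adequate
    plot(fitted(fit), std_res,
         xlab = "Fitted values", ylab = "Standardized residuals")
    abline(h = 0, lty = 2)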


© TTU Mathematics