Probability and Statistics

Summary Statistics

To work through the example in lecture note No.6, download the data set sample.txt. For the case studies in assignment No.6 (computer project), see Quiz & Assignment.

Use the following command to scan the data set into a variable "X":

X <- scan(file.choose())

When prompted to open a file, choose the file in which you saved the data set.

Programming Note: file.choose() lets you choose the file interactively from within scan(). The message “Read 75 items” confirms the sample size of the data set. To find the sample size later, use the length() function.
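As a minimal sketch of the same step without the file dialog, scan() can also read numbers from a string through its text argument. The five values below are made up for illustration and are not the contents of sample.txt.

```r
# Hypothetical values for illustration only (not the course data)
X <- scan(text = "21 26 32 27 23")   # reads whitespace-separated numbers
n <- length(X)                       # sample size of the data just read
print(n)
```

If sample.txt is saved in R's working directory, scan("sample.txt") should read it in the same way, without the dialog.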

R provides various functions for sample statistics. The most important values concerning the data are the sample size (length), the sample mean (mean), the sample variance (var), and the sample standard deviation (sd). Other important quantities such as the median and the quartiles can be obtained with summary().

length(X)
mean(X)
var(X)
sd(X)
summary(X)
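The relationships among these functions can be checked by hand. The sketch below uses the ten Problem 2 values listed at the end of this page rather than sample.txt, and assumes the usual definition of the sample variance with divisor n - 1, which is the one var() uses.

```r
# Problem 2 data from assignment No.6 (listed at the end of this page)
X <- c(21, 26, 32, 27, 23, 39, 30, 24, 28, 36)

n    <- length(X)                      # sample size
xbar <- sum(X) / n                     # sample mean, same as mean(X)
s2   <- sum((X - xbar)^2) / (n - 1)    # sample variance, divisor n - 1
s    <- sqrt(s2)                       # sample standard deviation

# Each comparison should print TRUE
print(all.equal(xbar, mean(X)))
print(all.equal(s2, var(X)))
print(all.equal(s, sd(X)))
```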

The function summary() produces rounded values for the minimum (Min.), the lower quartile (1st Qu.), the median (Median), the mean (Mean), the upper quartile (3rd Qu.), and the maximum (Max.). To increase the precision, specify the number of significant digits as in the following example:

summary(X, digits=5)

How to manipulate a summary table. You can write the summary to the CSV file "summary.csv" with

S <- summary(X)
write(S, file="summary.csv", ncolumn=6, sep=",")

and read it into a spreadsheet program. Individual values of the summary S can also be accessed with S[[1]], S[[2]], and so on. For example, the interquartile range (IQR) can be obtained by

IQR = S[[5]] - S[[2]]
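As a sketch of the same calculation, the entries of S can also be selected by name, and the result compared with R's built-in IQR() function. The data are the Problem 2 values from this page. The variable name iqr_from_summary is only illustrative, chosen so that it does not reuse the name of the built-in function.

```r
# Problem 2 data from assignment No.6 (listed at the end of this page)
X <- c(21, 26, 32, 27, 23, 39, 30, 24, 28, 36)
S <- summary(X)

# Upper quartile minus lower quartile, selected by name
iqr_from_summary <- S[["3rd Qu."]] - S[["1st Qu."]]

print(iqr_from_summary)
print(IQR(X))   # built-in function; should agree with the value above
```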

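Programming Note: The argument “ncolumn=6” sets the number of columns in the file to 6. The full name of this argument of write() is ncolumns; R accepts the shortened form through partial matching of argument names.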
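To check what the file contains, a minimal sketch is to write the summary to a temporary file and read it back. It assumes the Problem 2 values from this page as data and uses tempfile() in place of "summary.csv" so that no existing file is overwritten.

```r
# Problem 2 data from assignment No.6 (listed at the end of this page)
X <- c(21, 26, 32, 27, 23, 39, 30, 24, 28, 36)
S <- summary(X)

tmp <- tempfile(fileext = ".csv")               # temporary file, for illustration
write(S, file = tmp, ncolumns = 6, sep = ",")   # one line of 6 comma-separated values

print(readLines(tmp))          # the raw line written to the file
back <- scan(tmp, sep = ",")   # read the 6 numbers back
names(back) <- names(S)        # write() stores the values only, not the labels
print(back)
```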

Sample R code. You can download summary.R and run it. For Problem 2 of assignment No.6, we can enter the data values directly as a vector and store them in the variable X:

X <- c(21, 26, 32, 27, 23, 39, 30, 24, 28, 36)