Exploratory Data Analysis

The weights, in grams, of 40 miniature Baby Ruth candy bars are stored in increasing order in the data set

candy.txt

The function scan() then reads the file into the variable X; file.choose() opens a dialog for selecting the file. To see the data, simply type the variable name.

X <- scan(file.choose())
X
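
As a small sketch of what scan() does, the example below writes a few numbers to a temporary file and reads them back. The values and the names tmp and X_demo are made up for illustration; they are not the contents of candy.txt. If candy.txt were in the working directory, scan("candy.txt") would read it without the file dialog.

```r
# Made-up values for illustration only; these are NOT the candy.txt data.
tmp <- tempfile(fileext = ".txt")
writeLines("20.7 21.5 22.1\n23.0 24.3 26.2", tmp)

X_demo <- scan(tmp)   # reads whitespace-separated numbers into a numeric vector
length(X_demo)        # number of values read
summary(X_demo)       # minimum, quartiles, mean, and maximum
```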

Histogram. With the class boundaries specified by the argument breaks, we can generate a histogram by

hist(X, col="blue", freq=FALSE,
     breaks=c(20.45, 21.35, 22.25, 23.15, 24.05, 24.95, 25.85, 26.75),
     main="Candy Bar Weights", xlab="Weights in grams")
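Each class interval $(a,b]$ is left-open: it covers the data points greater than $a$ and less than or equal to $b$, so a data point exactly equal to $a$ falls in the preceding interval. (The one exception is the smallest break: by default, hist() counts a data point equal to it in the first interval.)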
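
The following sketch illustrates the endpoint convention on three made-up points, one of which sits exactly on an interior break. The names x, b, counts_right, and counts_left are hypothetical, and plot=FALSE only suppresses the figure so that the counts can be inspected.

```r
x <- c(1.5, 2, 2.5)   # made-up points; 2 lies exactly on the middle break
b <- c(1, 2, 3)

# Default (right = TRUE): the point 2 is counted in the lower class.
counts_right <- hist(x, breaks = b, plot = FALSE)$counts
counts_right          # 2 1

# right = FALSE: intervals are closed on the left, so 2 moves to the upper class.
counts_left <- hist(x, breaks = b, right = FALSE, plot = FALSE)$counts
counts_left           # 1 2
```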

Programming Note: When “freq=FALSE” is omitted in hist() and the class intervals have equal width, as they do here, the y-axis of the histogram shows the frequency (count) of data in each interval instead of the density. You may choose colored bars with the optional argument “col="blue"”, and set a title for the figure with “main="Candy Bar Weights"”.

hist(X, col="blue",
     breaks=c(20.45, 21.35, 22.25, 23.15, 24.05, 24.95, 25.85, 26.75))
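
To make the difference between the two y-axis scales concrete, the sketch below compares the counts and the densities that hist() computes. The vector w holds made-up weights for illustration only (not the candy data), and the names w and h are hypothetical.

```r
# Made-up weights for illustration only; these are NOT the candy.txt data.
w <- c(20.7, 21.5, 22.1, 22.6, 23.0, 23.4, 23.9, 24.3, 25.1, 26.2)

h <- hist(w, plot = FALSE,
          breaks = c(20.45, 21.35, 22.25, 23.15, 24.05, 24.95, 25.85, 26.75))

h$counts                          # bar heights on the frequency scale
sum(h$counts)                     # equals the number of data points
h$density                         # bar heights when freq = FALSE
sum(h$density * diff(h$breaks))   # total area of the bars is 1
```

On the density scale each bar's area, not its height, is the proportion of data in that interval.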

Programming Note: An equally spaced sequence, such as the class boundaries, can be generated by seq(). In the above example, the sequence for breaks can be created by

seq(20.45, 26.75, by=0.9)
Thus, hist() can be called with seq() as follows.
hist(X, col="blue", freq=FALSE,
     breaks=seq(20.45, 26.75, by=0.9),
     main="Candy Bar Weights", xlab="Weights in grams")
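
As a quick check that seq() reproduces the hand-typed breaks, the sketch below compares the two vectors. The names b_explicit, b_seq, and b_alt are hypothetical; all.equal() is used rather than == because the values are compared up to floating-point tolerance.

```r
b_explicit <- c(20.45, 21.35, 22.25, 23.15, 24.05, 24.95, 25.85, 26.75)
b_seq <- seq(20.45, 26.75, by = 0.9)

length(b_seq)                          # 8 break points give 7 class intervals
isTRUE(all.equal(b_seq, b_explicit))   # TRUE

# An alternative: give the number of break points instead of the upper limit.
b_alt <- seq(20.45, by = 0.9, length.out = 8)
isTRUE(all.equal(b_alt, b_explicit))   # TRUE
```
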
It is generally better for the user to specify the class intervals. If breaks is omitted, however, hist() chooses the class intervals automatically (by default using Sturges' rule) and usually makes a reasonable choice. Try hist() without breaks as follows:
hist(X, col="blue", freq=FALSE,
     main="Candy Bar Weights", xlab="Weights in grams")
The resulting histogram may not be the best representation of the data.
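
For a sense of what the automatic choice does, the sketch below evaluates Sturges' rule, the default in hist(), for a sample of size 40. Only the sample size matters here, so a placeholder vector of zeros stands in for the data; the names n and k are hypothetical. R treats the result as a suggestion and then rounds the break points to "pretty" values, so the number of bars actually drawn may differ.

```r
n <- 40                          # sample size of the candy bar data

k <- nclass.Sturges(numeric(n))  # suggested number of classes
k                                # 7
ceiling(log2(n) + 1)             # the same rule written out: 7
```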

Stem-and-leaf plot. The function stem() draws a stem-and-leaf plot of a variable.

stem(X)
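
The length of the display can be adjusted with the scale argument of stem(). The sketch below uses made-up weights (not the candy data); the name w is hypothetical.

```r
# Made-up weights for illustration only; these are NOT the candy.txt data.
w <- c(20.7, 21.5, 22.1, 22.6, 23.0, 23.4, 23.9, 24.3, 25.1, 26.2)

stem(w)              # default display, printed to the console
stem(w, scale = 2)   # a longer display, with the data spread over more stems
```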

Boxplot. The function boxplot() draws a boxplot of a variable.

boxplot(X, horizontal=TRUE, col='green')

Programming Note: If we omit “horizontal=TRUE”, the boxplot is drawn vertically, which is the default.
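
The numbers behind a boxplot can be inspected without drawing it. The sketch below uses a small made-up vector (not the candy data); the names x, five, and bs are hypothetical. fivenum() returns the minimum, lower hinge, median, upper hinge, and maximum, and boxplot.stats() returns the whisker ends, hinges, and median together with any points that would be drawn as outliers.

```r
x <- 1:9                  # made-up data for illustration only

five <- fivenum(x)        # minimum, lower hinge, median, upper hinge, maximum
five                      # 1 3 5 7 9

bs <- boxplot.stats(x)
bs$stats                  # whisker ends, hinges, and median: 1 3 5 7 9
bs$out                    # points beyond the whiskers (none here)
```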

Sample R code. You can download explore.R and run it.


© TTU Mathematics