Exploratory Data Analysis
The weights, in grams, of 40 miniature Baby Ruth candy bars are prepared as the data set, with the weights listed in order.
The function scan() is then used to read the data into the variable X; file.choose() opens a dialog for selecting the data file. To see the data, simply type the variable name.
X <- scan(file.choose())
X
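For a quick check that does not depend on the file dialog, scan() can also read numbers directly from a character string through its text argument. The sketch below uses a handful of made-up placeholder values (the name X_demo and the numbers are assumptions for illustration, not the actual candy bar data).

```r
# Placeholder values for illustration only; NOT the candy bar data set.
demo_text <- "21.5 22.6 23.1 23.9 24.4"

# scan() parses whitespace-separated numbers; text= reads from a string
# instead of a file, and quiet=TRUE suppresses the "Read N items" message.
X_demo <- scan(text = demo_text, quiet = TRUE)

X_demo           # typing the name prints the values
length(X_demo)   # number of observations read
```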
Histogram. With the class intervals specified by breaks, we can generate a histogram by
hist(X, col="blue", freq=FALSE, breaks=c(20.45, 21.35, 22.25, 23.15, 24.05, 24.95, 25.85, 26.75), main="Candy Bar Weights", xlab="Weights in grams")
Each interval has the form $(a, b]$: it excludes the left endpoint and includes the right one.
Programming Note: When freq=FALSE is omitted in hist(), the y-axis of the histogram shows the frequency (count) of data in each interval instead of the density. You can color the bars with the optional argument col="blue" and set the title of the figure with main="Candy Bar Weights".
hist(X, col="blue", breaks=c(20.45, 21.35, 22.25, 23.15, 24.05, 24.95, 25.85, 26.75))
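The difference between the two y-axis scales can also be inspected numerically, since hist() returns the bins it computed. The following is a minimal sketch using made-up placeholder weights (X_demo is an assumption for illustration, not the candy bar data).

```r
# Placeholder weights for illustration only; NOT the candy bar data set.
X_demo <- c(20.7, 21.9, 22.4, 22.8, 23.0, 23.3, 23.6, 24.1, 24.5, 25.2, 26.0)
brk <- c(20.45, 21.35, 22.25, 23.15, 24.05, 24.95, 25.85, 26.75)

# plot=FALSE returns the histogram object without drawing it.
h <- hist(X_demo, breaks = brk, plot = FALSE)

h$counts                     # bar heights when freq=FALSE is omitted
h$density                    # bar heights when freq=FALSE is given
sum(h$density * diff(brk))   # total area of the density bars (equals 1)
```

On the density scale each bar's area, not its height, is the proportion of observations in that interval, which is why the areas add up to 1.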
Programming Note: An evenly spaced sequence, such as the boundaries of the class intervals, can be generated by seq(). In the above example, the sequence for breaks can be created by
seq(20.45, 26.75, by=0.9)
Thus, the histogram can be drawn with seq() as follows.
hist(X, col="blue", freq=FALSE, breaks=seq(20.45, 26.75, by=0.9), main="Candy Bar Weights", xlab="Weights in grams")
It is usually best to specify the class intervals yourself. If breaks is omitted, however, hist() makes a reasonable choice of class intervals on its own. Try hist() without breaks as follows:
hist(X, col="blue", freq=FALSE, main="Candy Bar Weights", xlab="Weights in grams")
The resulting histogram may not be the best representation of the data.
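To see what these choices amount to, the breaks can be examined without drawing anything. The sketch below first checks that seq() reproduces the hand-typed boundaries, then shows the boundaries hist() picks by default (by default it starts from Sturges' rule and rounds to "pretty" values). The vector X_demo holds made-up placeholder weights, not the candy bar data.

```r
# The eight class boundaries written out by hand, and the same via seq().
brk_manual <- c(20.45, 21.35, 22.25, 23.15, 24.05, 24.95, 25.85, 26.75)
brk_seq    <- seq(20.45, 26.75, by = 0.9)

# all.equal() compares with a small numerical tolerance, which is the
# appropriate test for floating-point sequences.
all.equal(brk_manual, brk_seq)

# Placeholder weights for illustration only; NOT the candy bar data set.
X_demo <- c(20.7, 21.9, 22.4, 22.8, 23.0, 23.3, 23.6, 24.1, 24.5, 25.2, 26.0)

# With breaks omitted, hist() chooses the boundaries itself; plot=FALSE
# lets us inspect them without drawing the figure.
default_breaks <- hist(X_demo, plot = FALSE)$breaks
default_breaks
```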
Stem and leaf plot. stem() draws a stem and leaf plot of a variable.
stem(X)
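If the default display groups the data too coarsely, the optional scale argument of stem() controls the length of the plot. A brief sketch with made-up placeholder weights (X_demo is an assumption for illustration, not the candy bar data):

```r
# Placeholder weights for illustration only; NOT the candy bar data set.
X_demo <- c(20.7, 21.9, 22.4, 22.8, 23.0, 23.3, 23.6, 24.1, 24.5, 25.2, 26.0)

stem(X_demo)              # default display
stem(X_demo, scale = 2)   # a longer plot with the stems split more finely
```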
Boxplot. boxplot() draws a boxplot of a variable.
boxplot(X, horizontal=TRUE, col='green')
Programming Note: If horizontal=TRUE is omitted, the boxplot is drawn vertically.
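The numbers behind a boxplot can be obtained directly with fivenum() and boxplot.stats(). The following is a minimal sketch on made-up placeholder weights (X_demo is an assumption for illustration, not the candy bar data).

```r
# Placeholder weights for illustration only; NOT the candy bar data set.
X_demo <- c(20.7, 21.9, 22.4, 22.8, 23.0, 23.3, 23.6, 24.1, 24.5, 25.2, 26.0)

# fivenum() returns Tukey's five-number summary:
# minimum, lower hinge, median, upper hinge, maximum.
fivenum(X_demo)

# boxplot.stats() returns the values the plot is built from: whisker ends,
# hinges and median in $stats, and any points flagged as outliers in $out.
bs <- boxplot.stats(X_demo)
bs$stats
bs$out
```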
Sample R code. You can download explore.R, and run it.
© TTU Mathematics