Summary Statistics
A data set consists of one or more columns. Each column is identified by the variable name at the top, followed by the sample data. The sample size may differ from column to column when a column contains blank entries (or NA's).
The statistics below summarize the column data (selected from the pull-down menu in the box on the left).
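To illustrate how blank entries change the sample size from column to column, here is a minimal Python sketch. The column names and values are hypothetical, and representing a blank entry as None (or NaN) is an assumption made for illustration only.

```python
import math

# Hypothetical two-column data set; None marks a blank (NA) entry.
data = {
    "height": [1.62, 1.75, None, 1.80, 1.68],
    "weight": [55.0, 72.5, 80.1, None, None],
}

def observed(column):
    """Return the non-missing values of a column (drops None and NaN)."""
    return [
        x for x in column
        if x is not None and not (isinstance(x, float) and math.isnan(x))
    ]

# The sample size n is counted per column, after dropping missing entries.
sizes = {name: len(observed(col)) for name, col in data.items()}
print(sizes)  # {'height': 4, 'weight': 3}
```

Every statistic for a column is then computed from its observed values only, so the sample size can differ between columns of the same data set.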

Mean.
The sample mean
$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$
of the observations $X_1, \ldots, X_n$ is the “average” value of the observations.
Extreme values, often considered outliers,
pull the sample mean toward them.
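As a small illustration of how a single extreme value moves the sample mean, consider the following Python sketch. The numbers are made up for the example, and sample_mean is a helper name introduced here.

```python
def sample_mean(xs):
    """Sum of the observations divided by the sample size."""
    return sum(xs) / len(xs)

data = [2, 4, 7, 12]
with_outlier = data + [100]  # the same data plus one extreme value

print(sample_mean(data))          # 6.25
print(sample_mean(with_outlier))  # 25.0
```

One added observation moves the mean from 6.25 to 25.0, a value larger than every other point in the sample.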
Standard deviation (SD).
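The sample variance
$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$
is the average squared deviation of each observed value $X_i$ from the sample mean $\bar{X}$ (averaged with the divisor $n - 1$).
The square root of the sample variance,
$S = \sqrt{S^2}$,
is referred to as the sample standard deviation;
it indicates the “scatteredness” of the data.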
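When the shape of the sample distribution is symmetric and unimodal,
the following “empirical rule” applies:
when the distribution is approximated by a normal density function,
about 68%, 95%, and 99.7% of the data fall within the intervals
$(\bar{X} - S, \bar{X} + S)$, $(\bar{X} - 2S, \bar{X} + 2S)$, and $(\bar{X} - 3S, \bar{X} + 3S)$,
respectively, that is, within one, two, and three standard deviations of the sample mean.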
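The sketch below computes the sample variance and standard deviation and then checks the empirical rule on simulated data. It assumes the usual n − 1 divisor for the sample variance. The helper names, the seed, and the simulated normal sample are all choices made for this illustration.

```python
import math
import random

def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def sample_sd(xs):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(xs))

def fraction_within(xs, k):
    """Fraction of the data strictly inside (mean - k*SD, mean + k*SD)."""
    mean = sum(xs) / len(xs)
    sd = sample_sd(xs)
    inside = sum(1 for x in xs if mean - k * sd < x < mean + k * sd)
    return inside / len(xs)

print(sample_variance([2, 4, 7, 12]))  # 56.75 / 3, about 18.92

# Simulated symmetric, unimodal data (standard normal draws).
random.seed(0)
simulated = [random.gauss(0.0, 1.0) for _ in range(10_000)]
for k in (1, 2, 3):
    print(k, round(fraction_within(simulated, k), 3))
```

For the simulated sample the three fractions should come out close to 0.68, 0.95, and 0.997. For skewed or multimodal data they can differ noticeably, which is why the rule is stated only for symmetric, unimodal distributions.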
Coefficient of variation (CV).
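The coefficient of variation $\mathrm{CV} = S / \bar{X}$, the ratio of the sample standard deviation $S$ to the sample mean $\bar{X}$, is unit-free, so it
can be used to compare the variability of data measured in different units.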
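The following sketch shows why the CV is convenient when units differ. It assumes the common definition of the CV as the sample standard deviation divided by the sample mean. The height values are hypothetical.

```python
import statistics

def coefficient_of_variation(xs):
    """Sample SD (n - 1 divisor) divided by the sample mean."""
    return statistics.stdev(xs) / statistics.mean(xs)

# The same hypothetical heights, recorded in centimeters and in inches.
heights_cm = [162.0, 175.0, 180.0, 168.0]
heights_in = [x / 2.54 for x in heights_cm]

sd_cm = statistics.stdev(heights_cm)
sd_in = statistics.stdev(heights_in)
cv_cm = coefficient_of_variation(heights_cm)
cv_in = coefficient_of_variation(heights_in)

print(sd_cm, sd_in)  # the SDs differ by the unit conversion factor
print(cv_cm, cv_in)  # the CVs agree up to rounding error
```

Rescaling the data multiplies both the standard deviation and the mean by the same factor, so their ratio is unchanged. Note that the CV is only meaningful when the mean is not close to zero.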
Median.
The sample median is the value of the “middle” data point when the observations are sorted.
When the sample size is odd, the median is simply the middle value;
for example, the median of “2, 4, 7” is 4.
When the sample size is even,
the median is the mean of the two middle values.
Thus, the median of “2, 4, 7, 12” is (4+7)/2 = 5.5.
The sample median is
less affected by extreme measurements than the mean.
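The odd and even cases can be sketched in a few lines of Python. The function name sample_median is introduced here for illustration, and the last call uses a made-up extreme value to show how little the median moves.

```python
def sample_median(xs):
    """Middle value of the sorted data, or the mean of the two middle values."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                # odd size: the single middle value
    return (s[mid - 1] + s[mid]) / 2  # even size: mean of the two middle values

print(sample_median([2, 4, 7]))        # 4
print(sample_median([2, 4, 7, 12]))    # 5.5
print(sample_median([2, 4, 7, 1000]))  # 5.5
```

Replacing 12 by 1000 leaves the median at 5.5, whereas the mean of the same four numbers would jump from 6.25 to 253.25.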
Quartiles.
The 25th sample percentile is the value below which 25% of the observations fall. Similarly, we can define the 50th percentile, the 75th percentile, and so on. Note that the 50th percentile is the median. We call the 25th percentile the first quartile (Q1) and the 75th percentile the third quartile (Q3). The interquartile range (IQR) is then defined as the difference between them:
IQR = Q3 - Q1.
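Quartiles and the IQR can be computed with Python's standard statistics module, as in the sketch below. Software packages interpolate between data points in different ways, so the choice of method="inclusive" is an assumption and other tools may report slightly different quartiles for the same data. The data values are hypothetical.

```python
import statistics

# Hypothetical sample of nine observations.
data = [1, 3, 4, 6, 7, 9, 12, 15, 20]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(q1, q2, q3)  # 4.0 7.0 12.0
print(iqr)         # 8.0
```

The middle cut point equals the sample median, and the IQR of 8.0 is the width of the range covering the middle half of the data.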
© TTU Mathematics