e-Statistics

Histogram

A histogram is a graphical presentation consisting of bars whose heights are proportional to the number of data values within the corresponding band (or bin), and it can be used to describe how the data are distributed. The shape of a distribution is often characterized by "peaks" and "tails." Usually a histogram has one peak, and it is called unimodal. When it has two major peaks, it is said to be bimodal, suggesting the possibility of two distinct populations in the data. A symmetric histogram has two symmetric tails, one on each side, whereas a skewed histogram has a longer tail on one side than on the other. We describe it as right-skewed or left-skewed according to whether we observe the longer right-hand tail or the longer left-hand tail.

The vertical axis of the histogram can be Frequency or Density, as indicated by the pull-down menu on the left. The frequency shows the number of observations within a band. When density, also known as "relative frequency," is selected, the area of each bar in the histogram represents the probability that an observation falls in the band. For this reason the density height must be given by

$\displaystyle (\mbox{\it density height}) = \frac{(\mbox{\it frequency})}{(\mbox{\it data size}) \times (\mbox{\it bandwidth})}.$
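The density formula can be checked numerically with a short sketch. The function name `histogram`, its parameters, and the sample data below are illustrative assumptions, not part of the original page; the bands are taken as half-open, with the last band closed on the right so that the maximum is counted.

```python
def histogram(data, start, bandwidth, n_bins):
    """Return (frequencies, density heights) for n_bins equal-width bands.

    Assumption: bands are half-open, [edge, edge + bandwidth), except the
    last one, which is closed on the right so that a value equal to the
    upper edge is still counted.
    """
    freq = [0] * n_bins
    upper = start + n_bins * bandwidth
    for x in data:
        if x < start or x > upper:
            continue  # value lies outside the plotted range
        k = min(int((x - start) / bandwidth), n_bins - 1)
        freq[k] += 1
    n = len(data)  # data size
    # density height = frequency / (data size * bandwidth)
    dens = [f / (n * bandwidth) for f in freq]
    return freq, dens


# Made-up sample of 10 observations, for illustration only.
data = [1.0, 1.5, 2.0, 2.5, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
bandwidth = 1.0
freq, dens = histogram(data, start=1.0, bandwidth=bandwidth, n_bins=4)

# Area of each bar = density height * bandwidth = proportion in that band.
areas = [d * bandwidth for d in dens]
print(freq, dens, sum(areas))
```

When every observation falls in some band, the bar areas add up to 1 (up to floating-point rounding), which is why the division by the bandwidth is needed.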

The width of each band (or bin) is called the bandwidth (or bin width). If the bandwidth is too large, the graph fails to convey the structure of the data. On the other hand, if it is too small, the graph becomes too spiky, with many narrow bars and empty bands.
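The effect of the bandwidth can be seen by counting the same data with several widths. This is a minimal sketch: the helper `frequencies`, the choice to start the first band at the minimum of the data, and the two-cluster sample are all assumptions made for illustration.

```python
import math


def frequencies(data, bandwidth):
    """Count observations in equal-width bands starting at min(data).

    Assumption: the last band is closed on the right so that the maximum
    of the data is included.
    """
    lo, hi = min(data), max(data)
    n_bins = max(1, math.ceil((hi - lo) / bandwidth))
    freq = [0] * n_bins
    for x in data:
        k = min(int((x - lo) / bandwidth), n_bins - 1)
        freq[k] += 1
    return freq


# Made-up data with two clusters (around 3 and around 11).
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9, 10, 10, 11, 11, 11, 12, 12, 13]

for bw in (12, 2, 0.5):
    print("bandwidth", bw, "->", frequencies(data, bw))
```

With a bandwidth of 12 all observations land in a single bar and the two clusters are hidden; with 0.5 most bands are empty and the graph is a row of isolated spikes; an intermediate width such as 2 shows both clusters.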

If an interval with endpoints a and b is also specified, we obtain the frequency (or the proportion) of the observed values within the interval [a,b).

The interval of choice will be [a,b] when $b$ is the maximum of the data. With the density height on the vertical axis, the area under the histogram over the interval [a,b) represents the proportion of data in that interval.
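The interval count can be sketched in a few lines. The function name `count_in_interval` and the sample data are hypothetical; the only rule taken from the text is that the interval is [a,b), widened to [a,b] when b is the maximum of the data.

```python
def count_in_interval(data, a, b):
    """Return (frequency, proportion) of the values in [a, b).

    Following the text, the interval is treated as [a, b] when b equals
    the maximum of the data, so the largest observation is not left out.
    """
    include_b = (b == max(data))
    freq = sum(1 for x in data if a <= x < b or (include_b and x == b))
    return freq, freq / len(data)


# Made-up sample of 10 observations, for illustration only.
data = [1.0, 1.5, 2.0, 2.5, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]

inner = count_in_interval(data, 2.0, 4.0)  # half-open: 4.0 is excluded
top = count_in_interval(data, 4.0, 5.0)    # b is the maximum: 5.0 is included
print(inner, top)
```

The proportion returned here is the quantity that the area under a density-scaled histogram represents over the same interval.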


© TTU Mathematics