To calculate the mean for this data set, we simply add up the values and divide by the number of values:
Sum = 222 + 120 + 222 + 176 + 473 + 538 + 456 = 2207 ppb lead
Avg = 2207 / 7 = 315 ppb lead (rounded to the nearest whole number)
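The same arithmetic can be checked with a few lines of Python. This is only a minimal sketch; the variable names (lead_ppb, total, mean) are illustrative and not part of the original worked example.

```python
# Lead readings in ppb for 1995-2001, copied from the data set above.
lead_ppb = [222, 120, 222, 176, 473, 538, 456]

total = sum(lead_ppb)          # add up all seven readings
mean = total / len(lead_ppb)   # divide by the number of readings

print(total)        # 2207
print(round(mean))  # 315
```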
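To calculate the median of these numbers, we can first sort them from lowest to highest:
120
176
222
222
456
473
538
and then pick out the value in the middle of the sorted list (here, the 4th of the 7 values):
120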
176
222
222 <--- the median
456
473
538
So the median for this data set is 222 ppb lead
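As a rough sketch of the sort-and-pick-the-middle procedure, assuming Python and its standard statistics module (the names ordered and middle are illustrative only):

```python
import statistics

# Lead readings in ppb for 1995-2001, copied from the data set above.
lead_ppb = [222, 120, 222, 176, 473, 538, 456]

ordered = sorted(lead_ppb)            # lowest to highest
middle = ordered[len(ordered) // 2]   # with 7 values, index 3 is the middle one

print(ordered)  # [120, 176, 222, 222, 456, 473, 538]
print(middle)   # 222

# The standard library gives the same answer; it also handles an even
# number of values by averaging the two middle ones.
median = statistics.median(lead_ppb)
```

Picking a single middle element by index only works when the number of values is odd, as it is here.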
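To calculate the mode of this data, we can first tally how many times each value occurs:
#     Occurrences
------------------
120     1
176     1
222     2
456     1
473     1
538     1
and then pick out the value with the most occurrences:
#     Occurrences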
------------------
120     1
176     1
222     2 <------ the mode
456     1
473     1
538     1
So the mode for this data set is the same as the median: 222 ppb lead
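The tally above can be reproduced with a short Python sketch. It assumes the standard library's collections.Counter; the names counts, mode_value and occurrences are illustrative only.

```python
from collections import Counter

# Lead readings in ppb for 1995-2001, copied from the data set above.
lead_ppb = [222, 120, 222, 176, 473, 538, 456]

counts = Counter(lead_ppb)  # maps each reading to its number of occurrences

# most_common(1) returns a one-item list holding the (value, count) pair
# with the highest count.
mode_value, occurrences = counts.most_common(1)[0]

print(mode_value, occurrences)  # 222 2
```

If two values were tied for the highest count, this sketch would report only one of them, so a data set with several modes would need a closer look at counts.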
Which measure seems the most appropriate for evaluating a public safety risk? Each measure has its advantages. The mode tells us which value is most frequent, which is certainly important for assessing risk, since this is the value most commonly encountered. The median is useful too, since it gives an estimate of the 'middlemost' value that is not overly influenced by extreme values. The median and the mode for this data set are the same (222 ppb lead) and are considerably lower than the mean (315 ppb lead). We might be tempted to believe that some extremely high values pulled the mean above the median. Look again at the time series of values:
Year Lead (in parts per billion)
---- ---------------------------
1995 222
1996 120
1997 222
1998 176
1999 473
2000 538
2001 456
Notice how all the higher values come in the later years? This does seem to suggest (not prove!) that there has been a shift in the levels of lead since 1999. This apparent shift could be just that, mere appearance, and the difference between the pre-plant and post-plant values could simply be a random occurrence. One way of thinking quantitatively about these differences is to use the variance and the standard deviation to determine how large a typical difference from the mean might be. We'll do this in the next section:
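To calculate the variance of this data, we can subtract the mean (315) from each value, square each of those differences, add up the squares, and divide by one less than the number of values: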
222 - 315 = -93
120 - 315 = -195
222 - 315 = -93
176 - 315 = -139
473 - 315 = 158
538 - 315 = 223
456 - 315 = 141
-93 * -93 = 8649
-195 * -195 = 38025
-93 * -93 = 8649
-139 * -139 = 19321
158 * 158 = 24964
223 * 223 = 49729
141 * 141 = 19881
Sum = 8649 + 38025 + 8649 + 19321 + 24964 + 49729 + 19881 = 169218 (ppb lead)^2
Variance = 169218 / (7-1) = 28203 (ppb lead)^2
The variance can be difficult to interpret, especially since its units are the square of the original variable's units. A more easily understood statistic is the standard deviation, which is the square root of the variance and so has the same units as the original data. We can calculate it by:
Standard Deviation = sqrt(28203) = 168 ppb lead
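The variance and standard deviation steps can be sketched in Python as follows. This is an illustration, not the original method: it uses the unrounded mean (about 315.29) rather than the rounded 315 used in the hand calculation, so the intermediate figures differ slightly, although the rounded results agree. All variable names are illustrative.

```python
import math

# Lead readings in ppb for 1995-2001, copied from the data set above.
lead_ppb = [222, 120, 222, 176, 473, 538, 456]

n = len(lead_ppb)
mean = sum(lead_ppb) / n  # unrounded mean, about 315.29

# Squared difference between each reading and the mean.
squared_diffs = [(x - mean) ** 2 for x in lead_ppb]

# Sample variance: divide by n - 1, as in the worked example.
variance = sum(squared_diffs) / (n - 1)
std_dev = math.sqrt(variance)

print(round(variance))  # 28203
print(round(std_dev))   # 168
```

The standard library functions statistics.variance and statistics.stdev compute the same sample (n - 1) versions.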
Hmmmmmmm. The mean lead level for the first four years was (222+120+222+176)/4 = 185 ppb, and the mean lead level for the last three years (after the plant was in operation) was (473+538+456)/3 = 489 ppb. So the amount of lead in the years following the building of the plant was more than two and a half times its pre-1999 level. Furthermore, the size of the jump between the pre-plant and post-plant means was 489 - 185 = 304 parts per billion of lead, or almost 2 whole standard deviations (304 / 168 is about 1.8). This is unlikely to occur totally by chance, and we suggest that further research into the Romullus and Remus lead pipe factory is in order.
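The before-and-after comparison can be sketched the same way. The split into 1995-1998 and 1999-2001 follows the text; the variable names are illustrative, and the standard deviation is taken over all seven years, as in the calculation above.

```python
import statistics

# Readings in ppb, split at 1999 as described in the text.
before_plant = [222, 120, 222, 176]   # 1995-1998
after_plant = [473, 538, 456]         # 1999-2001

mean_before = sum(before_plant) / len(before_plant)
mean_after = sum(after_plant) / len(after_plant)
jump = mean_after - mean_before

# Sample standard deviation of all seven readings (about 168 ppb).
std_dev = statistics.stdev(before_plant + after_plant)

print(mean_before, mean_after, jump)      # 185.0 489.0 304.0
print(round(mean_after / mean_before, 2)) # 2.64
print(round(jump / std_dev, 1))           # 1.8
```

Expressing the jump in units of the standard deviation is only a rough yardstick here; with seven readings it suggests, rather than proves, that the shift is more than random variation.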