Answers for Unit 39

  1. To calculate the mean for this data set we simply:
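
    One way to carry out this calculation, sketched in Python with the seven yearly readings from the time-series table in these answers (the variable names are illustrative and not part of the original exercise):

```python
# Lead readings in parts per billion, 1995 through 2001.
lead_ppb = [222, 120, 222, 176, 473, 538, 456]

# The mean is the sum of the values divided by how many there are.
total = sum(lead_ppb)          # 2207
count = len(lead_ppb)          # 7
mean_lead = total / count

print(f"mean = {total} / {count} = {mean_lead:.1f} ppb")  # about 315.3 ppb
```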


  2. To calculate the median of these numbers, we can:
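
    A minimal sketch of the usual procedure in Python, assuming the same seven readings: sort the values and take the middle one (with an odd number of values there is exactly one middle value). The variable names are illustrative.

```python
# Lead readings in parts per billion, 1995 through 2001.
lead_ppb = [222, 120, 222, 176, 473, 538, 456]

# Sort the values from smallest to largest.
ordered = sorted(lead_ppb)     # [120, 176, 222, 222, 456, 473, 538]

# With 7 values, the median is the 4th one (index 3 counting from 0).
middle_index = len(ordered) // 2
median_lead = ordered[middle_index]

print(f"median = {median_lead} ppb")
```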


  3. To calculate the mode of this data, we can:
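
    One way to find the mode, sketched in Python: count how often each value occurs and pick the most frequent. This uses `collections.Counter` from the standard library; the variable names are illustrative.

```python
from collections import Counter

# Lead readings in parts per billion, 1995 through 2001.
lead_ppb = [222, 120, 222, 176, 473, 538, 456]

# Tally how many times each reading appears.
frequencies = Counter(lead_ppb)

# most_common(1) returns a list holding the single most frequent (value, count) pair.
mode_lead, mode_count = frequencies.most_common(1)[0]

print(f"mode = {mode_lead} ppb (appears {mode_count} times)")
```

    Only 222 appears more than once in this data set, so it is the only candidate for the mode.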


    Which measure seems the most appropriate for evaluating a public safety risk? Each measure has its advantages. The mode tells which value is most frequent, which is certainly important for assessing risk, since this is the value most commonly encountered. The median is useful too, since it gives an estimate of the 'middlemost' value that is not overly influenced by extreme values. The median and the mode for this data set are the same (222 parts per billion of lead) and are considerably lower than the mean (315 ppb lead). We might be tempted to believe that some extremely high values contributed to the difference between the median and mean values. Look again at the time series of values:

            Year     Lead (in parts per billion)
            ----     ---------------------------
            1995                 222
            1996                 120
            1997                 222
            1998                 176
            1999                 473
            2000                 538
            2001                 456
    

    Notice how all the higher values come in the later years? This does seem to suggest (not prove!) that there has been a shift in the levels of lead since 1999. This apparent shift could be just that - mere appearance - and the difference between the pre-plant and post-plant values could simply be a random occurrence. One way of thinking quantitatively about these differences is to use the variance and the standard deviation to determine how large a typical difference might be. We'll do this in the next section:


  4. To calculate the variance of this data, we can:
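
    A sketch of the calculation in Python. These answers do not say whether the sample variance (dividing by n - 1) or the population variance (dividing by n) is intended, so this example assumes the sample version and shows the population version for comparison. The variable names are illustrative.

```python
# Lead readings in parts per billion, 1995 through 2001.
lead_ppb = [222, 120, 222, 176, 473, 538, 456]

n = len(lead_ppb)
mean_lead = sum(lead_ppb) / n

# Sum of squared differences between each reading and the mean.
squared_diffs = [(x - mean_lead) ** 2 for x in lead_ppb]
sum_of_squares = sum(squared_diffs)

# Assumption: sample variance, which divides by n - 1.
sample_variance = sum_of_squares / (n - 1)

# Alternative: population variance, which divides by n.
population_variance = sum_of_squares / n

print(f"sample variance     = {sample_variance:.1f} ppb^2")      # about 28202.9
print(f"population variance = {population_variance:.1f} ppb^2")  # about 24173.9
```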


    The variance can be difficult to interpret, especially since its units will be the square of the original variable's units. A more easily understood statistic is the standard deviation, which we can calculate by taking the square root of the variance:
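
    A sketch in Python using the standard library `statistics` module. As with the variance, it is an assumption here that the sample standard deviation is the one intended; the population value is printed alongside it.

```python
import statistics

# Lead readings in parts per billion, 1995 through 2001.
lead_ppb = [222, 120, 222, 176, 473, 538, 456]

# Assumption: sample standard deviation (square root of the n - 1 variance).
sample_sd = statistics.stdev(lead_ppb)

# Alternative: population standard deviation (square root of the n variance).
population_sd = statistics.pstdev(lead_ppb)

print(f"sample standard deviation     = {sample_sd:.1f} ppb")      # about 167.9
print(f"population standard deviation = {population_sd:.1f} ppb")  # about 155.5
```

    Either way, the result is back in the original units (parts per billion), which is what makes it easier to interpret than the variance.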


    Hmmmmmmm. The mean lead level for the first four years was (222+120+222+176)/4 = 185 ppb, and the mean lead level for the last three years (after the plant was in operation) was (473+538+456)/3 = 489 ppb. So the amount of lead in the years following the building of the plant was more than two and a half times its pre-1999 level. Furthermore, the size of the jump between the pre-plant and post-plant means was 489 - 185 = 304 parts per billion of lead, or almost 2 whole standard deviations. This is unlikely to occur totally by chance, and we suggest that further research into the Romullus and Remus lead pipe factory is in order.
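
    The comparison above can be checked with a short Python sketch. It assumes the split used in the text (1995-1998 before the plant, 1999-2001 after) and expresses the jump in units of both the sample and the population standard deviation, since the text does not say which one is meant. The variable names are illustrative.

```python
import statistics

# Readings in parts per billion, split as in the text.
before_plant = [222, 120, 222, 176]   # 1995-1998
after_plant = [473, 538, 456]         # 1999-2001
all_years = before_plant + after_plant

before_mean = sum(before_plant) / len(before_plant)   # 185.0
after_mean = sum(after_plant) / len(after_plant)      # 489.0

jump = after_mean - before_mean                       # 304.0
ratio = after_mean / before_mean                      # a bit over 2.6

# Size of the jump measured in standard deviations of the full data set.
jump_in_sample_sd = jump / statistics.stdev(all_years)
jump_in_population_sd = jump / statistics.pstdev(all_years)

print(f"jump = {jump:.0f} ppb, ratio = {ratio:.2f}")
print(f"jump in sample SDs     = {jump_in_sample_sd:.2f}")      # about 1.81
print(f"jump in population SDs = {jump_in_population_sd:.2f}")  # about 1.96
```

    With only seven readings, this is a rough gauge of how unusual the jump is rather than a formal significance test.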