Data classification II

In general, the classification method chosen should accomplish two things:

  1. maximize the between-class differences
  2. minimize the within-class differences

    In some cases, quantile and equal interval methods meet these criteria, but their breaks are often placed arbitrarily with respect to the data, which makes them inferior to the methods described here.
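    Both criteria can be measured with sums of squared deviations. The minimal Python sketch below assumes each break is the inclusive upper value of a class and that breaks are given in ascending order. The function names and the six toy values are hypothetical and serve only as an illustration. For a fixed data set, the within-class and between-class sums always add up to the same total, so any set of breaks that lowers one raises the other.

```python
from statistics import mean


def assign_classes(values, breaks):
    """Group values into classes; each (ascending) break is the inclusive upper bound of a class."""
    classes = [[] for _ in range(len(breaks) + 1)]
    for v in values:
        i = 0
        while i < len(breaks) and v > breaks[i]:
            i += 1
        classes[i].append(v)
    return [c for c in classes if c]  # drop empty classes


def within_between_ss(values, breaks):
    """Return (within-class SS, between-class SS) for a given set of class breaks."""
    grand = mean(values)
    within = between = 0.0
    for cls in assign_classes(values, breaks):
        m = mean(cls)
        within += sum((v - m) ** 2 for v in cls)    # spread inside the class
        between += len(cls) * (m - grand) ** 2      # spread of class means
    return within, between


toy = [1, 2, 3, 10, 11, 12]  # hypothetical values with one obvious gap
print(within_between_ss(toy, [3]))   # → (4.0, 121.5)
print(within_between_ss(toy, [10]))  # → (50.5, 75.0)
```

    Placing the break at the obvious gap (3) gives a much smaller within-class sum than an arbitrary break at 10, and a correspondingly larger between-class sum; both pairs add up to the same total of 125.5.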

    The most straightforward way of meeting these criteria is to enter the class breaks manually after careful study of the data. For ordered data, this is done by sorting the values from lowest to highest, as was done with the "percent living on active farms" data, and looking for large gaps, or natural breaks. The class breaks should be placed at these gaps in order to meet the two criteria above. Some advanced GIS packages will do this automatically using a technique called Jenks Optimization, which iteratively adjusts the class breaks until it finds the classification that best satisfies those criteria.
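    The following Python sketch contrasts the two approaches described above. `gap_breaks` is one simple reading of "look for large gaps": it puts the k-1 breaks into the k-1 widest gaps of the sorted data. `fisher_jenks_breaks` finds the breaks that minimize the total within-class sum of squared deviations, which is the quantity Jenks Optimization works to reduce. It uses Fisher's exact dynamic-programming method rather than Jenks's original iterative procedure, so treat it as an illustration of the goal, not as what any particular GIS package runs. Both function names and the toy data are hypothetical.

```python
def gap_breaks(values, k):
    """Put k-1 breaks in the k-1 largest gaps of the sorted data.

    Each break is the (inclusive) upper value of the lower class.
    """
    data = sorted(values)
    # (gap size, index of the value just below the gap)
    gaps = [(data[i + 1] - data[i], i) for i in range(len(data) - 1)]
    largest = sorted(gaps, reverse=True)[: k - 1]
    return sorted(data[i] for _, i in largest)


def fisher_jenks_breaks(values, k):
    """Optimal k-class breaks minimizing the within-class sum of squared deviations.

    Exact dynamic programming (Fisher's method), O(k * n^2).
    Returns the upper value of every class except the last.
    """
    data = sorted(values)
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= number of values")

    # Prefix sums give the squared deviations of any run data[i:j] in O(1).
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(data):
        s[i + 1] = s[i] + v
        s2[i + 1] = s2[i] + v * v

    def ssd(i, j):
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: smallest within-class SS when data[:j] is split into c classes
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    start = [[0] * (n + 1) for _ in range(k + 1)]  # where the last class begins
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    start[c][j] = i

    # Walk back through the table to recover each class boundary.
    breaks, j = [], n
    for c in range(k, 1, -1):
        j = start[c][j]
        breaks.append(data[j - 1])
    return sorted(breaks)


toy = [1, 2, 3, 10, 11, 12, 30, 31]  # hypothetical sorted-looking data
print(gap_breaks(toy, 3))           # → [3, 12]
print(fisher_jenks_breaks(toy, 3))  # → [3, 12]
```

    On cleanly clustered data like this the two agree. On data without obvious gaps they can differ, because the largest-gap rule looks only at the gaps themselves while the optimizer weighs the spread of every value within each class.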

    One other technique worth noting is nested means classification. In this method, the arithmetic mean m of the attribute values is calculated and a class break is placed at m. This separates the data into two classes: values above the mean and values below it. The data are then further classified by calculating the mean of the values within each of these two classes and inserting a class break at each of those two points. This leaves four classes whose breaks are determined not arbitrarily but by the characteristics of the data set itself. One more level of means may be calculated, leaving eight classes; each level doubles the number of classes. The nested means method is mathematically straightforward and can be a good compromise for skewed data.
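    The recursive splitting described above can be sketched in a few lines of Python. The function name `nested_means_breaks` and its `levels` parameter are hypothetical. The source does not say which class a value exactly equal to a mean belongs to, so this sketch assumes it goes into the lower class.

```python
from statistics import mean


def nested_means_breaks(values, levels=2):
    """Class breaks from repeatedly splitting the data at its mean.

    levels=1 gives 2 classes, levels=2 gives 4, levels=3 gives 8.
    """
    breaks = []

    def split(group, depth):
        if depth == 0 or len(group) < 2:
            return
        m = mean(group)
        breaks.append(m)
        # Assumption: values equal to the mean stay in the lower class.
        split([v for v in group if v <= m], depth - 1)
        split([v for v in group if v > m], depth - 1)

    split(list(values), levels)
    return sorted(breaks)


print(nested_means_breaks(range(1, 11), levels=2))  # → [3, 5.5, 8]
```

    With `levels=2` the overall mean (5.5) is the first break, and the means of the lower half (3) and upper half (8) supply the other two, giving four classes.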

    Compare natural breaks and nested means