*by Antony Unwin, **University of Augsburg, Germany*

There are many different methods for identifying outliers and a lot of them are available in **R**. But are outliers a matter of opinion? Do all methods give the same results?

Articles on outlier methods use a mixture of theory and practice. Theory is all very well, but outliers are outliers because they donâ€™t follow theory. Practice involves testing methods on data, sometimes with data simulated based on theory, better with `realâ€™ datasets. A method can be considered successful if it finds the outliers we all agree on, but do we all agree on which cases are outliers?

The Overview Of Outliers (O3) plot is designed to help compare and understand the results of outlier methods. It is implemented in the **OutliersO3** package and was presented at last yearâ€™s useR! in Brussels. Six methods from other **R** packages are included (and, as usual, thanks are due to the authors for making their functions available in packages).

An O3 plot of the stackloss dataset. There is one row for each variable combination (defined by the columns to the left) for which outliers were found, and one column for each case identified as an outlier (the columns to the right).

The starting point was a recent proposal of Wilkinsonâ€™s, his HDoutliers algorithm. The plot above shows the default O3 plot for this method applied to the stackloss dataset. (Detailed explanations of O3 plots are in the **OutliersO3** vignettes.) The stackloss dataset is a small example (21 cases and 4 variables) and there is an illuminating and entertaining article (Dodge, 1996) that tells you a lot about it.

Wilkinsonâ€™s algorithm finds 6 outliers for the whole dataset (the bottom row of the plot). Overall, for various combinations of variables, 14 of the cases are found to be potential outliers (out of 21!). There are no rows for 11 of the possible 15 combinations of variables because no outliers are found with them. If using a tolerance level of 0.05 seems a little bit lax, using 0.01 finds no outliers at all for any variable combination.

An O3plot comparing outliers identified by HDoutliers and mvBACON in the stackloss dataset.

Trying another method with tolerance level=0.05 (*mvBACON* from **robustX**) identifies 5 outliers, all ones found for more than one variable combination by *HDoutliers*. However, no outliers are found for the whole dataset and only one of the three variable combinations where outliers are found is a combination where *HDoutliers* finds outliers. Of course, the two methods are quite different and it would be strange if they agreed completely. Is it strange that they do not agree more?

There are four other methods available in **OutliersO3** and using all six methods on stackloss a tolerance level of 0.05 identifies the following numbers of outliers:

```
## HDo PCS BAC adjOut DDC MCD
## 14 4 5 0 6 5
```

An O3 plot of stackloss using the methods *HDoutliers*

, `FastPCS`

, `mvBACON`

, `adjOutlyingness`

, `DectectDeviatingCells`

, `covMCD`

. The darker the cell, the more methods agree. If they all agree, the cell is coloured red and if all but one agree then orange. No case is identified by all the methods as an outlier for any combination of variables when the tolerance level is set at 0.05 for all.

Each method uses what I have called the tolerance level in a rather different way. Sometimes it is called alpha and sometimes (1-alpha). As so often with **R**, you start wondering if more consistency would not be out of place, even at the expense of a little individuality. **OutliersO3** transforms where necessary to ensure that lower tolerance level values mean fewer outliers for all methods, but no attempt has been made to calibrate them equivalently. This is probably why `adjOutlyingness`

finds few or no outliers (results of this method are mildly random). The default value, according to `adjOutlyingness`

â€™s help page, is an alpha of 0.25.

The stackloss dataset is an odd dataset and small enough that each individual case can be studied in detail (cf. Dodgeâ€™s paper for just how much detail). However, similar results have been found with other datasets (milk, Election2005, diamonds, â€¦). The main conclusion so far is that different outlier methods identify different numbers of different cases for different combinations of variables as different from the bulk of the data (i.e. as potential outliers)â€”or are these datasets just outlying examples?

There are other outlier methods available in **R** and they will doubtless give yet more different results. The recommendation has to be to proceed with care. Outliers may be interesting in their own right, they may be errors of some kindâ€”and we may not agree whether they are outliers at all.

[Find the R code for generating the above plots here: OutliersUnwin.Rmd]