R implementation of the "Averaged Difference" algorithm for spatial outlier detection, proposed by Yufeng Kou and Chang-Tien Lu in the 2006 paper "Spatial Weighted Outlier Detection". The algorithm detects point observations whose attribute values differ markedly from those of their surrounding neighbors.
The algorithm is demonstrated using agricultural yield data and is particularly well suited to Precision Farming applications.
📍 disy Informationssysteme GmbH. https://www.disy.net/de/
🌱 iFAROS. https://www.ifaros-ictagri.com/
🔧 sp-package, for geometry types
🔧 data.table-package, as faster alternative for base::data.frame
🔧 FNN-package, for k-nearest-neighbor search algorithm
- Input: SpatialPointsDataFrame, georeferenced point data with attribute(s)
- Input: k, number of neighbors taken into account (as in k-Nearest-Neighbor)
- Output: data.table, containing each point's index and its averaged difference, sorted in decreasing order
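The input/output contract above can be illustrated with a minimal sketch. It is not the repository's code: the function name `avg_diff_sketch`, its arguments (a coordinate matrix and a numeric attribute vector instead of a SpatialPointsDataFrame), and the inverse-distance weighting are assumptions made for illustration; the weighting used in the actual implementation may differ.

```r
library(FNN)         # k-nearest-neighbor search
library(data.table)  # result table

# Hypothetical helper (not the repository's function): for every point, the
# weighted mean absolute difference between its value and its k neighbors.
avg_diff_sketch <- function(coords, values, k) {
  coords <- as.matrix(coords)
  storage.mode(coords) <- "double"
  stopifnot(nrow(coords) == length(values), k >= 1, k < nrow(coords))

  nn <- FNN::get.knn(coords, k = k)  # nn.index and nn.dist are n x k matrices

  # Assumption: inverse-distance weights, normalised to sum to 1 per point.
  # Zero distances (duplicate coordinates) are clamped to avoid division by zero.
  w <- 1 / pmax(nn$nn.dist, .Machine$double.eps)
  w <- w / rowSums(w)

  # Attribute values of each point's neighbors, arranged as an n x k matrix.
  nb_values <- matrix(values[nn$nn.index], nrow = nrow(coords))
  ad <- rowSums(w * abs(values - nb_values))

  res <- data.table(index = seq_along(values), avg_diff = ad)
  res[order(-avg_diff)]  # largest averaged difference first
}

# Toy example: a 10 x 10 grid with constant value 10 and one perturbed point.
coords <- as.matrix(expand.grid(x = 1:10, y = 1:10))
values <- rep(10, 100)
values[45] <- 100
res <- avg_diff_sketch(coords, values, k = 8)
head(res, 3)
```

In this toy example the perturbed point (index 45) should rank first, because all of its neighbors carry the background value. For a SpatialPointsDataFrame, the coordinates could be extracted with `sp::coordinates()` and the attribute column with `[[` before calling such a helper.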
The function returns a data.table with the point indices and the averaged difference of each point. Because the table is sorted in decreasing order, the top n outliers can be removed by their indices. The number of outliers to remove is chosen freely by the user.
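Removing the top n outliers by index might look like the sketch below. The result table `res` is a hand-made stand-in for the function's output, and the column names `index` and `avg_diff`, the object names, and the value of `n` are assumptions for illustration only.

```r
library(sp)
library(data.table)

# Small stand-in point data set (5 points with one attribute).
pts <- sp::SpatialPointsDataFrame(
  coords = cbind(x = c(0, 1, 2, 3, 4), y = c(0, 1, 0, 1, 0)),
  data   = data.frame(id = 1:5, yield = c(5.1, 4.9, 12.7, 5.0, 5.2))
)

# Mock result table, already sorted by decreasing averaged difference.
# Column names are assumed, not taken from the repository.
res <- data.table(index = c(3L, 1L, 5L, 2L, 4L),
                  avg_diff = c(9, 4, 2, 1, 0.5))

n <- 2                              # number of outliers to drop (user's choice)
drop_idx <- head(res$index, n)      # indices of the n strongest outliers

# Keep every point whose index is not among the top n outliers.
keep <- !(seq_len(length(pts)) %in% drop_idx)
pts_clean <- pts[keep, ]
```

A logical mask is used here rather than negative indexing so that the selection stays valid even when `drop_idx` is empty.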
For the example shown, 1,200 points (of ~8,000 points) were deleted. The number of nearest neighbors considered (k) was (arbitrarily) set to 355. Parameter values should be chosen based on the total number of data points and the "severity" of the visible measurement errors. Larger neighborhoods tend to reveal global outliers, while smaller neighborhoods are better suited to identifying local outliers on a smaller spatial scale.
Execution time for 8,000 point observations: ~3 sec.
1 http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.101.9899&rep=rep1&type=pdf