A definition for outliers

What exactly are outliers, and how are they defined in eye-tracking.

Data outliers, defined as observations that deviate significantly from the general pattern of a dataset, have been a subject of great interest in various fields, including statistics, data mining, and machine learning. Detecting and analyzing outliers plays a critical role in understanding data quality, identifying unusual phenomena, and ensuring accurate and reliable results in data-driven applications.

What is an outlier, exactly?

A data outlier refers to an observation or data point that significantly deviates from the expected or typical pattern exhibited by the rest of the dataset. Outliers can manifest as extreme values, anomalies, or irregularities that do not conform to the overall distribution or trends observed in the data. These exceptional observations possess distinct characteristics and have the potential to impact statistical analyses, modeling processes, and data-driven decision-making.

Reasons outliers occur

Outliers can occur due to various reasons, including measurement errors, data entry mistakes, natural variations, rare events, or genuine anomalies in the phenomenon under study. Their presence poses challenges in data analysis, as they may skew statistical measures, compromise the accuracy of predictive models, and introduce bias in data-driven insights. Therefore, identifying and understanding outliers is crucial for ensuring the validity, reliability, and robustness of data-driven investigations across multiple disciplines.

Detection

Detecting outliers typically involves employing statistical, computational, or machine learning techniques to identify observations that deviate significantly from the expected data distribution or exhibit unusual patterns. These techniques often rely on measures such as statistical distance, clustering algorithms, model residuals, or combinations of multiple indicators to flag potential outliers. Once outliers are identified, further analysis and interpretation are necessary to determine their nature, assess their impact on data analysis, and make informed decisions regarding their treatment, such as removal, transformation, or special handling in specific contexts.

Data outliers represent exceptional observations that deviate significantly from the regular patterns or distributions observed in a dataset. They have the potential to distort statistical analyses, impact modeling outcomes, and influence data-driven decision-making. Detecting, analyzing, and appropriately addressing outliers are crucial steps in ensuring data quality, maintaining the integrity of analytical processes, and obtaining accurate insights from datasets.

Abstract

Outlier scanpaths identification is a crucial preliminary step in designing visual software, digital media analysis, radiology training and clustering participants in eye-tracking experiments. However, the task is challenging due to the visual irregularity of the scanpath shapes and the difficulty in dimensionality reduction due to geometric complexity. Conventional approaches have used heat maps to exclude scanpaths that lack a similarity pattern. However, the typically-used packages, such as ScanMatch and MultiMatch often generate discordant results when outlier identification is done empirically. This paper introduces a novel outlier evaluation approach by integrating the fractal dimension (FD), capturing the geometrical complexity of patterns, as an additional parameter with the heat map.

This additional parameter is used to evaluate the degree of influence of a scanpath within a dataset. More specifically, the 2D Cartesian coordinates of a scanpath are fitted to a space filling 1D fractal curve to characterise its temporal FD. The FDs of the scanpaths are then compared to match their geometric complexity to one another. The findings indicate that the FD can be a beneficial additional parameter when evaluating the candidacy of poorly matching scanpaths as outliers and performs better at identifying unusual scanpaths than using other methods, including scanpath matching, Jaccard, or bounding box methods alone.

To view a published paper on this topic, please see:
Newport RA, Russo C, Al Suman A, Di Ieva A. Assessment of eye-tracking scanpath outliers using fractal geometry.
Heliyon. 2021 Jul;7(7):e07616.
‌https://pubmed.ncbi.nlm.nih.gov/34368482/