ADTK: Open-Source Time Series Anomaly Detection in Python
We recently released the open-source version of Anomaly Detection Toolkit and hope it will promote best practices in solving real-world anomaly detection problems.
Arundo was founded to solve industrial IoT challenges. A significant portion of the data collected by an industrial IoT system is time series data. Anomaly detection in time series is among the most common problems our data scientists face when building models for industrial operations. Across many customer projects, we have accumulated broad experience in creating effective and practical anomaly detection models.
WHAT IS ADTK?
Recently, we released the open-source version of ADTK (Anomaly Detection Toolkit), a Python toolkit that our data science team originally developed based on our internal experience. There are many existing open-source packages for time series anomaly detection, but most of them focus on implementing cutting-edge algorithms. In contrast, ADTK enables practitioners to implement pragmatic models conveniently, from the simplest methods like thresholding to complicated machine learning-based approaches. We hope ADTK will promote best practices in solving real-world anomaly detection problems.
THE WHY
ADTK focuses on building rule-based and unsupervised models, for two reasons. First, anomalous events are usually extremely rare in the historical data of a real-world problem, so a supervised model trained on them is rarely robust: it relies on a handful of observations. Second, most operators who want to detect anomalies in their industrial systems have a fairly good understanding of which types of events are of interest. In other words, they “know” the rules for identifying those events. Translating those (sometimes vague) rules into a programmable form is the key goal of ADTK.
For example, some parts of a power generator could experience overheating, which may indicate the degradation of key components. If the critical temperature is known from engineering knowledge of the equipment, a simple thresholding model would be an effective rule. However, equipment temperature may be correlated with ambient temperature, which introduces yearly and daily seasonality and makes absolute thresholds ineffective.
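The thresholding rule described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not ADTK's API: the function name `threshold_detect`, the sample readings, and the 90 °C limit are all hypothetical.

```python
from typing import List, Optional


def threshold_detect(
    values: List[float],
    low: Optional[float] = None,
    high: Optional[float] = None,
) -> List[bool]:
    """Flag each reading that falls below `low` or above `high`."""
    flags = []
    for v in values:
        too_low = low is not None and v < low
        too_high = high is not None and v > high
        flags.append(too_low or too_high)
    return flags


# Hypothetical generator temperatures (deg C) and a hypothetical critical limit.
temperatures = [71.2, 72.0, 70.8, 95.4, 71.5]
CRITICAL_TEMP_C = 90.0

alerts = threshold_detect(temperatures, high=CRITICAL_TEMP_C)
```

A rule like this works only while the limit is meaningful in absolute terms, which is exactly what seasonality undermines.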
In that case, the model needs a component that removes seasonal patterns, chained with a thresholding component. Sometimes the rate of change or a shift in fluctuation is more revealing than the temperature value itself. We then need to transform the original time series into those KPIs before passing them to a time-independent outlier detection component.
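As a rough sketch of chaining seasonal removal with thresholding, the plain-Python example below subtracts the average value seen at each phase of a cycle and applies a threshold to the residual. It is not ADTK's implementation: the helper names, the four-samples-per-day cycle, and both limits are assumptions chosen for illustration.

```python
from statistics import mean
from typing import List


def remove_seasonality(values: List[float], period: int) -> List[float]:
    """Subtract the average value observed at each phase of the cycle."""
    phase_means = [mean(values[phase::period]) for phase in range(period)]
    return [v - phase_means[i % period] for i, v in enumerate(values)]


def above(values: List[float], limit: float) -> List[bool]:
    """Time-independent rule: flag every value greater than `limit`."""
    return [v > limit for v in values]


# Hypothetical data: a daily cycle sampled 4 times a day for 3 days.
daily_cycle = [60.0, 70.0, 80.0, 70.0]
temperatures = daily_cycle * 3
temperatures[5] += 8.0  # abnormal reading on day 2 (78.0, still below the daily peak)

# An absolute limit set above the normal daily peak misses the event.
absolute_alerts = above(temperatures, limit=85.0)

# Thresholding the deseasonalized residual catches it.
residual = remove_seasonality(temperatures, period=4)
residual_alerts = above(residual, limit=4.0)
```

Here the abnormal reading never exceeds the normal daily peak, so only the residual-based rule flags it. Replacing `remove_seasonality` with a differencing step would give a rate-of-change rule in the same way.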
In more advanced cases, the temperature may be correlated with other measurements, e.g. pressure or vibration. An unsupervised clustering method can be applied to identify anomalous combinations of those measurements that jointly imply equipment degradation. Finally, all rule chains should be combined into a proper ensemble so that alerts are raised accurately in real time.
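One simple way to combine several rule chains is to merge their per-timestamp flags with a logical OR or AND. The sketch below is plain Python with hypothetical function names and made-up flag sequences; it illustrates the idea of an aggregator rather than ADTK's own classes.

```python
from typing import List


def or_aggregate(*flag_lists: List[bool]) -> List[bool]:
    """Alert when at least one rule chain flags the time point."""
    return [any(flags) for flags in zip(*flag_lists)]


def and_aggregate(*flag_lists: List[bool]) -> List[bool]:
    """Alert only when every rule chain flags the time point."""
    return [all(flags) for flags in zip(*flag_lists)]


# Hypothetical outputs of two rule chains evaluated on the same timestamps.
overheating = [False, False, True, True, False]
rapid_rise = [False, True, True, False, False]

either = or_aggregate(overheating, rapid_rise)
both = and_aggregate(overheating, rapid_rise)
```

OR favors sensitivity (any rule can raise an alert), while AND favors precision (rules must confirm each other). Which one is appropriate depends on the cost of missed events versus false alarms.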
BUILDING BLOCKS OF AN ANOMALY DETECTION TOOLKIT
The three types of building blocks mentioned in the previous example (outlier detector, time series transformer, and rule chain aggregator) are the key elements of ADTK. ADTK provides not only a set of detectors, transformers, and aggregators with a unified API, but also pipeline and pipenet to connect those building blocks into a sequential or non-sequential structure. This design maximizes the reusability of models. ADTK also offers a set of tools to process and visualize time series as well as time-dependent events, which we found convenient in our internal use. Some examples can be found here.
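To make the sequential structure concrete, here is a minimal sketch of a pipeline that feeds a series through transformers and then a detector. `SimplePipeline`, `rate_of_change`, and `above` are hypothetical names written for this illustration in plain Python; they stand in for the concept and are not ADTK's actual classes or signatures.

```python
from typing import Callable, List, Sequence

Transformer = Callable[[List[float]], List[float]]
Detector = Callable[[List[float]], List[bool]]


class SimplePipeline:
    """Hypothetical sequential pipeline: transformers first, then one detector."""

    def __init__(self, transformers: Sequence[Transformer], detector: Detector) -> None:
        self.transformers = list(transformers)
        self.detector = detector

    def detect(self, values: List[float]) -> List[bool]:
        for transform in self.transformers:
            values = transform(values)
        return self.detector(values)


def rate_of_change(values: List[float]) -> List[float]:
    """First difference; the first point has no predecessor, so its change is 0."""
    if not values:
        return []
    return [0.0] + [b - a for a, b in zip(values, values[1:])]


def above(limit: float) -> Detector:
    """Build a reusable threshold detector for a given limit."""
    def detect(values: List[float]) -> List[bool]:
        return [v > limit for v in values]
    return detect


# Hypothetical readings with a sudden jump between the third and fourth samples.
temperatures = [70.0, 70.5, 71.0, 78.0, 78.5]

pipeline = SimplePipeline([rate_of_change], above(5.0))
alerts = pipeline.detect(temperatures)
```

Because every step shares the same call shape, the same `above` detector can be reused on raw values, on a rate of change, or on any other transformed series without modification.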
SUMMARY
Arundo believes the release of ADTK will help data scientists and engineers in industrial sectors who work on time series anomaly detection solve their own problems. We also hope more members of the open-source community will join us and contribute to the package to make it better.