Indicators
This unit is a Processor type unit.
The ML Indicator unit estimates statistical indicators incrementally and is designed to handle large volumes of data.
The estimation uses a sliding window to store the state of the statistical indicators. The sliding window works like a circular cache: each slot holds the indicator state for one aggregation period (for example, 1d), and the slots together hold the indicator state for the whole training period (for example, 7d).
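The circular cache idea can be sketched in a few lines of Python. This is only an illustration, not the unit's implementation: the class names, the per-slot state (a count and a sum) and the use of timestamps in seconds are all assumptions, and events are assumed to arrive with non-decreasing timestamps.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SlotState:
    """Aggregated state for one aggregation period (hypothetical layout)."""
    count: int = 0
    total: float = 0.0

    def add(self, value: float) -> None:
        self.count += 1
        self.total += value


class SlidingWindow:
    """Circular cache with one slot per aggregation period (illustrative only)."""

    def __init__(self, aggregation_s: int, training_s: int) -> None:
        if aggregation_s <= 0 or training_s % aggregation_s != 0:
            raise ValueError("training period must be a multiple of the aggregation period")
        self.aggregation_s = aggregation_s
        self.n_slots = training_s // aggregation_s  # for example 7d / 1d = 7 slots
        self.slots: List[SlotState] = [SlotState() for _ in range(self.n_slots)]
        self.periods: List[Optional[int]] = [None] * self.n_slots
        self.latest: Optional[int] = None

    def add(self, timestamp_s: int, value: float) -> None:
        period = timestamp_s // self.aggregation_s
        idx = period % self.n_slots  # circular indexing
        if self.periods[idx] != period:
            # The slot still holds an older cycle: overwrite it.
            self.slots[idx] = SlotState()
            self.periods[idx] = period
        self.slots[idx].add(value)
        self.latest = period if self.latest is None else max(self.latest, period)

    def mean(self) -> Optional[float]:
        """Mean over the slots that still fall inside the window, or None."""
        if self.latest is None:
            return None
        oldest = self.latest - self.n_slots + 1
        live = [s for s, p in zip(self.slots, self.periods)
                if p is not None and p >= oldest]
        count = sum(s.count for s in live)
        if count == 0:
            return None
        return sum(s.total for s in live) / count
```

Because each slot keeps only aggregated values, the memory used depends on the number of slots rather than on the number of events received.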
Only the indicators that have an output name assigned will be calculated.
When an indicator cannot be calculated, either because there is not enough data or because the operation cannot be resolved (for example, division by zero), it takes a null value.
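As a minimal sketch of this behavior, the hypothetical function below returns `None` (standing in for null) when a z-score cannot be computed. The function name and the choice of the sample standard deviation are assumptions made for illustration; the unit's exact formula is not stated here.

```python
import math
from typing import Optional, Sequence


def z_score(value: float, history: Sequence[float]) -> Optional[float]:
    """Return the z-score of value against history, or None if undefined."""
    n = len(history)
    if n < 2:
        return None  # not enough data
    mean = sum(history) / n
    variance = sum((x - mean) ** 2 for x in history) / (n - 1)
    std = math.sqrt(variance)
    if std == 0:
        return None  # would divide by zero
    return (value - mean) / std
```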
When using a key field, the indicators will be calculated separately for every value found. Every key value will have its own training period.
When a key has no data for one whole cycle of the sliding window and then appears again, it has to go through a new training period. If no key field is defined, the same applies to the event flow as a whole: if there are no events for a whole sliding window period, a new training period starts when the next event appears.
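The per-key training rule can be sketched as follows. The class and method names are hypothetical, timestamps are assumed to be in seconds, and training is assumed to finish once a full training period has elapsed since the first event of the current cycle; the unit may decide this differently.

```python
from typing import Dict, Hashable, Optional


class KeyedTrainingTracker:
    """Tracks, per key, whether a training period is still in progress."""

    def __init__(self, training_s: int) -> None:
        self.training_s = training_s
        self._first_seen: Dict[Optional[Hashable], int] = {}
        self._last_seen: Dict[Optional[Hashable], int] = {}

    def observe(self, timestamp_s: int, key: Optional[Hashable] = None) -> bool:
        """Record an event; return True while the key is still training.

        Use key=None when no key field is configured, so that the whole
        event flow shares a single training period.
        """
        last = self._last_seen.get(key)
        if last is None or timestamp_s - last >= self.training_s:
            # New key, or silent for a whole window cycle: restart training.
            self._first_seen[key] = timestamp_s
        self._last_seen[key] = timestamp_s
        return timestamp_s - self._first_seen[key] < self.training_s
```

Each key keeps its own timestamps, so a gap in one key restarts training for that key only.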
Once an event has been detected via the 'in' or 'train' port, the aggregated indicators and the sliding window are updated.
For events coming from the 'train' port, the indicator estimation output is written to the 'train' out port.
For events coming from the 'in' port, the indicator estimation output is written to the 'train' out port during the training stage (while building the sliding window) and after that to the 'out' port.
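The routing described above can be summarized with a small, hypothetical helper. It only models the 'in' and 'train' input ports; the reset and error ports are left out, and the `window_complete` flag is an assumed stand-in for "the training stage has finished".

```python
def output_port(input_port: str, window_complete: bool) -> str:
    """Return the name of the output port for an event (illustrative only)."""
    if input_port == "train":
        return "train"
    if input_port == "in":
        # During training, results are partial and go to the train out port.
        return "out" if window_complete else "train"
    raise ValueError(f"unsupported input port: {input_port!r}")
```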
If an error occurs, the event is enriched with new fields describing the problem, and the event is sent through the error port.
Configuration
After dragging this unit into the Flow canvas, double-click it to access its configuration options. The following table describes the configuration options of this unit:
Tab | Field | Description |
---|---|---|
General | Name | Enter a name for the unit. It must start with a letter, and cannot contain spaces. Only letters, numbers, and underscores are allowed. |
General | Description | Enter a description for the unit to describe the scope and use of the unit. |
General | Timestamp field | Enter the name of an event field that contains the timestamp. |
General | Key field | Optional. Enter the name of an event field that contains the keys used to group sliding windows. |
General | Feature field | Enter the name of an event field that contains the numeric value used to compute the statistics. The value must be a double. |
General | Period of aggregation | Select the purge size of the indicators. This is how often indicators are generated and added to the sliding window. |
General | Training period | Select the size of the sliding window. This is the length of time that events are stored for training (purge size). It must be a multiple of the aggregation period. |
General | Mean | Optional. Enter the name of the output field that will contain the mean, to be added to the output event. |
General | Standard deviation | Optional. Enter the name of the output field that will contain the standard deviation, to be added to the output event. |
General | Z-score | Optional. Enter the name of the output field that will contain the z-score, to be added to the output event. |
General | Z-score modified | Optional. Enter the name of the output field that will contain the modified z-score, to be added to the output event. |
General | Skewness | Optional. Enter the name of the output field that will contain the skewness, to be added to the output event. |
General | Kurtosis | Optional. Enter the name of the output field that will contain the kurtosis, to be added to the output event. |
General | Median | Optional. Enter the name of the output field that will contain the median, to be added to the output event. |
General | IQR outlier | Optional. Enter the name of the output field that will contain the IQR-based outlier detection result, to be added to the output event. |
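For reference, the indicators listed in the table can be illustrated with common textbook definitions. The sketch below is not the unit's implementation: the function name, the output field names, and the specific variants (sample standard deviation, population-moment skewness and excess kurtosis, the 0.6745 constant for the modified z-score, and the 1.5 × IQR fences) are all assumptions, and it works on a stored list of values rather than incrementally.

```python
import statistics
from typing import Dict, Sequence


def indicators(value: float, window: Sequence[float]) -> Dict[str, object]:
    """Compute illustrative indicators for value against window (2+ values)."""
    n = len(window)
    mean = statistics.fmean(window)
    std = statistics.stdev(window)  # sample standard deviation
    median = statistics.median(window)
    # Median absolute deviation, used by the modified z-score.
    mad = statistics.median(abs(x - median) for x in window)
    # Central moments for skewness and excess kurtosis.
    m2 = sum((x - mean) ** 2 for x in window) / n
    m3 = sum((x - mean) ** 3 for x in window) / n
    m4 = sum((x - mean) ** 4 for x in window) / n
    q1, _, q3 = statistics.quantiles(window, n=4)
    iqr = q3 - q1
    return {
        "mean": mean,
        "std": std,
        "median": median,
        # Undefined indicators are reported as None (null).
        "z_score": (value - mean) / std if std else None,
        "z_score_modified": 0.6745 * (value - median) / mad if mad else None,
        "skewness": m3 / m2 ** 1.5 if m2 else None,
        "kurtosis": m4 / m2 ** 2 - 3 if m2 else None,
        "iqr_outlier": value < q1 - 1.5 * iqr or value > q3 + 1.5 * iqr,
    }
```

For example, `indicators(10.0, [1.0, 2.0, 3.0, 4.0, 5.0])` flags the value as an IQR outlier, while a constant window leaves the z-score, modified z-score, skewness and kurtosis as `None`.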
Input ports
Port | Description |
---|---|
reset | Events that enter through this port reset the internal accumulator to the initial value in the configuration. |
in | If new events enter through this port, the selected statistical indicators are evaluated and then the new value is added to the statistics to update their value. |
train | Events that enter through this port will be sent to the train out port. We recommend using the in port. |
Output ports
Port | Description |
---|---|
out | Outputs events with all the selected indicators once the sliding window is complete. |
train | Outputs individual events with partial results for the selected statistical indicators (as opposed to indicators calculated with full sliding windows). |
error | Outputs input events that were not successfully acknowledged, enriched with standard error fields. |
Example
Imagine you have data showing the number of connections made every 5 minutes, grouped by reference domain, and you want to calculate statistical indicators for each domain, such as the average and the standard deviation.
First, add a Generator unit to start the Flow.
Next, we will collect past data to train the model and use it to fill the Indicator unit with the statistics.
We will use a Map unit to get values every 5 minutes in a window of 7 days.
Then, we will use a Devo Full Query unit to fill in the slots of the 7 day window, providing enough data to train the Indicator unit.
Link the init port of the Devo Full Query unit to the reset port of the Indicator to reset the values. Next, link the out port to the train port of the Indicator unit to send the training data.
Once this training has been completed, we can get the new data using a Map unit and a Scheduler unit that collect the data from the last 5 minutes and calculate the new statistic values (average, standard deviation, etc.), running every 5 minutes.
We will send this data to a Devo Full Query unit. We will link this to the in port of the Indicator unit to fill it with the new statistic values.
Then, we will send this data to a my.app data table using a Devo Sink unit linked to the out port of the Indicator unit.
Download this example
Try this example flow by downloading the following JSON file and uploading it to your domain using the Import option.
In order for this Flow to work, you must have previously trained the corresponding model.