
Approximated estimation (estimation)

Description

Computes the approximated estimation of the number of distinct values stored in a field of dc (distinct count) data type.

This operation requires input in dc data type. You can get it by applying the HyperLogLog++ (hllpp) aggregation operation to a field. Keep in mind that you must group your data before applying an aggregation operation.
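As background, hllpp is named after HyperLogLog++, so a dc value can be thought of as a HyperLogLog-style sketch. It has m registers, and each register holds the largest "leading zeros + 1" rank seen among the hashed values routed to it. The textbook HyperLogLog estimator below shows how a single float can be read from such a sketch. Here M_j is the value of register j and V is the number of registers that are still 0. Treat this as an assumption about the general technique, not a description of Devo's internal computation. HyperLogLog++ also applies empirical bias correction and a sparse encoding, and this page does not state the precision Devo uses.

```latex
E = \alpha_m \, m^2 \left( \sum_{j=1}^{m} 2^{-M_j} \right)^{-1},
\qquad \alpha_m \approx \frac{0.7213}{1 + 1.079/m} \quad (m \ge 128)

E^{*} =
\begin{cases}
m \ln\!\left(\dfrac{m}{V}\right) & \text{if } E \le \tfrac{5}{2}\, m \text{ and } V > 0,\\[4pt]
E & \text{otherwise.}
\end{cases}
```

The second case (linear counting) is used for small cardinalities, where the raw estimate E is known to be biased.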

How does it work in the search window?

Select Create field in the search window toolbar, then select the Approximated estimation operation. You need to specify one argument:

Argument | Data type
Estimate (mandatory) | dc

The data type of the values in the new column is float.

Example

In the siem.logtrust.web.activity table, we want to get the approximated estimation of the number of distinct values in the srcHost field. To do this, we will create a new field using the Approximated estimation operation, but first we need to create the required dc field.

Step 1: Create the distinct count field

The first step is to create a field that holds the distinct count of the values in the srcHost field, in dc data type. To do this, we group the data every 5 minutes and then apply the HyperLogLog++ (hllpp) aggregation operation. Select the srcHost field as the Source argument and enter a name for the new field (srcHost_dc).
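The following Python sketch makes the grouping step concrete. It computes the exact number of distinct srcHost values in each 5-minute window. The event list and the helper names (window_start, exact_distinct_per_window) are hypothetical and only for illustration. hllpp does not keep these sets. It keeps a fixed-size sketch whose estimate approximates the counts computed here.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical sample events: (event time, srcHost). Not real table data.
EVENTS = [
    (datetime(2024, 1, 1, 10, 0, 12, tzinfo=timezone.utc), "10.0.0.1"),
    (datetime(2024, 1, 1, 10, 1, 40, tzinfo=timezone.utc), "10.0.0.2"),
    (datetime(2024, 1, 1, 10, 3, 5, tzinfo=timezone.utc), "10.0.0.1"),
    (datetime(2024, 1, 1, 10, 6, 30, tzinfo=timezone.utc), "10.0.0.3"),
    (datetime(2024, 1, 1, 10, 8, 59, tzinfo=timezone.utc), "10.0.0.3"),
]

WINDOW_SECONDS = 5 * 60  # mirrors "group every 5m"


def window_start(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its 5-minute window (UTC)."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % WINDOW_SECONDS, tz=timezone.utc)


def exact_distinct_per_window(events):
    """Exact distinct srcHost count per window: the quantity hllpp approximates."""
    hosts = defaultdict(set)
    for ts, host in events:
        hosts[window_start(ts)].add(host)
    return {start: len(values) for start, values in sorted(hosts.items())}


if __name__ == "__main__":
    for start, count in exact_distinct_per_window(EVENTS).items():
        print(start.isoformat(), count)
```

With this sample data, the 10:00 window contains two distinct hosts and the 10:05 window contains one. The repeated 10.0.0.1 and 10.0.0.3 events are counted only once.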

Step 2: Create a new field using the Approximated estimation operation

Select Create field in the search window toolbar, then select Approximated estimation as the operation. Select the srcHost_dc field as the argument. Let's call the new field srcHost_approximation.

Click Create field to add the new column. For each 5-minute interval, it shows the approximated number of distinct srcHost values as a float.
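The following Python sketch illustrates what the estimation step does, under stated assumptions. A compact sketch object (standing in for one dc value) absorbs hashed values, and estimate() turns it into a float. The HyperLogLog class, its precision p=14, and the SHA-1-based hash are hypothetical choices. This is plain HyperLogLog with small-range linear counting, not Devo's hllpp implementation.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch (hypothetical stand-in for a dc value).

    Simplified: 64-bit hash, raw estimate plus small-range linear counting.
    Real HyperLogLog++ also applies empirical bias correction and a sparse
    representation; both are omitted here.
    """

    def __init__(self, p: int = 14):
        self.p = p                      # number of index bits
        self.m = 1 << p                 # number of registers (16384 for p=14)
        self.registers = [0] * self.m

    def add(self, value: str) -> None:
        # 64-bit hash taken from the first 8 bytes of SHA-1.
        h = int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest()[:8], "big")
        idx = h >> (64 - self.p)                  # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)     # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in `rest`.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)          # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)        # linear counting for small counts
        return raw


if __name__ == "__main__":
    sketch = HyperLogLog()
    for i in range(10_000):
        sketch.add(f"10.0.{i // 256}.{i % 256}")  # 10,000 distinct hypothetical hosts
    print(sketch.estimate())  # a float close to 10000, not necessarily exact
```

The sketch keeps only the maximum rank per register. Adding a value that was already seen therefore leaves the estimate unchanged, so the result approximates a distinct count rather than a row count.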

How does it work in LINQ?

Use the select ... as ... operator and add the operation syntax to create the new field. This is the syntax for the Approximated estimation operation:

  • estimation(dc)

Example

You can copy the following LINQ script and try the above example on the siem.logtrust.web.activity table:

from siem.logtrust.web.activity group every 5m select hllpp(srcHost) as srcHost_dc, estimation(srcHost_dc) as srcHost_approximation