
HyperLogLog++ Count Estimation (hllppcount)

Description

Applies the HyperLogLog++ algorithm to a set of data to estimate the number of distinct elements in each grouping occurrence. The aggregated values are returned as float.

This operation returns the same results as the HyperLogLog++ operation; the only difference is the output data type.
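To illustrate how a distinct-count estimate like this is produced, the sketch below implements plain HyperLogLog in Python. It is a simplified stand-in, not Devo's implementation: HyperLogLog++ adds a sparse representation and empirical bias correction that are omitted here. The class name `HyperLogLog`, the precision `p=14` and the use of SHA-1 truncated to 64 bits as the hash are all assumptions made for the example. Each value is hashed, the first `p` bits select a register, and the register keeps the largest "leftmost 1-bit" position seen; the estimate is a harmonic mean over the registers, with linear counting used for small cardinalities.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal plain HyperLogLog sketch (hypothetical helper, not full HLL++)."""

    def __init__(self, p: int = 14):
        self.p = p                    # number of bits used to pick a register
        self.m = 1 << p               # number of registers (2^p)
        self.registers = [0] * self.m

    def _hash64(self, value) -> int:
        # Assumption: 64-bit hash taken from the first 8 bytes of SHA-1
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, value) -> None:
        x = self._hash64(value)
        idx = x >> (64 - self.p)                   # top p bits select the register
        rest = x & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        # 1-based position of the leftmost 1-bit within the remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)           # bias constant for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros > 0:
            # Small-range correction: linear counting on empty registers
            return m * math.log(m / zeros)
        return raw


if __name__ == "__main__":
    sketch = HyperLogLog()
    for v in [512, 512, 1024, 2048]:   # duplicates do not change the sketch
        sketch.add(v)
    print(sketch.estimate())           # a float estimate of the distinct count
```

Adding the same value twice leaves the registers unchanged, which is why the result approximates the number of distinct values rather than the number of rows.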

How does it work in the search window?

Before performing this operation, you must group your data. Be aware that the columns used as arguments for the grouping operation will not be available as arguments for the aggregation operation.

After grouping the data, select Aggregation in the search window toolbar, then select the HyperLogLog++ Count Estimation operation. You need to specify one argument:

Argument              Data type
Source (mandatory)    Any

The data type of the aggregated values is float.

Example

In the siem.logtrust.web.activity table, we want to estimate the number of distinct values in the bytesTransferred column for every 5-minute period. Before aggregating the data, the table must be grouped in 5-minute intervals. Then we will perform the aggregation using the HyperLogLog++ Count Estimation operation.
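The quantity this example estimates can be computed exactly on a small sample, which helps when sanity-checking results. The Python sketch below groups hypothetical `(timestamp, bytesTransferred)` events into 5-minute periods aligned to the Unix epoch and counts the distinct values in each period with a set. The sample events, the epoch alignment of the periods and the function names are assumptions for illustration; HyperLogLog++ Count Estimation returns an approximation of these per-period counts rather than the exact values.

```python
from collections import defaultdict
from datetime import datetime, timedelta

PERIOD = timedelta(minutes=5)
EPOCH = datetime(1970, 1, 1)


def bucket_start(ts: datetime) -> datetime:
    # Truncate a timestamp to the start of its 5-minute period
    return EPOCH + ((ts - EPOCH) // PERIOD) * PERIOD


def distinct_per_period(events):
    # Hypothetical helper: exact distinct count of values per period
    groups = defaultdict(set)
    for ts, value in events:
        groups[bucket_start(ts)].add(value)
    # Returned as float to mirror the float output of hllppcount
    return {start: float(len(vals)) for start, vals in sorted(groups.items())}


# Hypothetical sample of (timestamp, bytesTransferred) events
events = [
    (datetime(2024, 1, 1, 10, 0, 5), 512),
    (datetime(2024, 1, 1, 10, 1, 30), 512),
    (datetime(2024, 1, 1, 10, 4, 59), 1024),
    (datetime(2024, 1, 1, 10, 5, 0), 2048),
    (datetime(2024, 1, 1, 10, 9, 10), 2048),
]

for start, count in distinct_per_period(events).items():
    print(start, count)
```

On large tables, storing every distinct value per period as this exact version does becomes expensive; a HyperLogLog++ sketch keeps a fixed-size set of registers per group instead, at the cost of a small estimation error.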

The argument needed for the HyperLogLog++ Count Estimation operation is:

  • Source → bytesTransferred column

Click Aggregate function and you will see the following result:

How does it work in LINQ?

Group your data using the following structure:

  • group every server period by column1, column2... every client period

Then, use select ... as ... to add the new column that will show the aggregated values. This is the syntax for the HyperLogLog++ Count Estimation operation:

  • hllppcount(column)

See Build a query using LINQ to learn more about grouping and aggregating your data using the LINQ language.

Example

You can copy the following LINQ script and try the example above on the siem.logtrust.web.activity table:

from siem.logtrust.web.activity group every 5m every 5m select hllppcount(bytesTransferred) as hllpp_Count_estimation