Approximated estimation (estimation)

Description

Returns the approximate number of distinct values represented by a value of dc (distinct count) data type.

This operation requires input of dc data type. To obtain it, apply the HyperLogLog++ (hllpp) aggregation operation to a field. Keep in mind that you must group your data before applying an aggregation operation.

How does it work in the search window?

Select Create column in the search window toolbar, then select the Approximated estimation operation. You need to specify one argument:

Argument                Data type
Estimate (mandatory)    dc

The data type of the values in the new column is float.

Example

In the demo.ecommerce.data table, we want to get the approximate estimation of the distinct count values generated from the clientIpAddress column. To do it, we will create a new column using the Approximated estimation operation, but first we need to create the required dc column.

Step 1: Create the distinct count column

The first step is creating a column showing the distinct count of the values in the clientIpAddress column, in dc data type. To do it, we group data every 5 minutes and then use the HyperLogLog++ (hllpp) aggregation operation. Select the clientIpAddress column in the Source argument and enter a name for the new column (clientIpAddress_dc).
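
In LINQ, this step on its own corresponds to the first part of the full query given at the end of this article. The following is a sketch derived from that query. It keeps only the hllpp aggregation and omits the estimation column, so the result contains the dc column alone:

```
from demo.ecommerce.data group every 5m every 5m select hllpp(clientIpAddress) as clientIpAddress_dc
```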

Step 2: Create a new column using the Approximated estimation operation

Select Create column on the query toolbar, then select Approximated estimation as the operation. Select the clientIpAddress_dc column as the argument and name the new column clientIpAddress_estimation.

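Click Create column. The new clientIpAddress_estimation column shows, as a float, the estimated number of distinct client IP addresses in each 5-minute period.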

How does it work in LINQ?

Use the select... as... operator and add the operation syntax to create the new column. This is the syntax for the Approximated estimation operation:

  • estimation(dc)

Example

You can copy the following LINQ script and try the above example on the demo.ecommerce.data table:

from demo.ecommerce.data group every 5m every 5m select hllpp(clientIpAddress) as clientIpAddress_dc, estimation(clientIpAddress_dc) as clientIpAddress_estimation