
Group events based on similar content in text fields.

Suppose the input contains millions of logs and you want to group the events into a few groups. The group by operator in LQL groups events only when the text or values match exactly, so 'system was shutdown' and 'system shut down' would land in different groups. The formClusters operator groups events by similar content instead, so both events would be placed in the same group.
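The difference between exact-match grouping and similarity-based grouping can be illustrated with a minimal Python sketch. It is not LQL and does not reproduce the operator's internal algorithm, which this page does not describe; the use of difflib's SequenceMatcher as the similarity measure is an assumption made purely for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher

events = ["system was shutdown", "system shut down"]

# Exact-match grouping: every distinct string becomes its own group.
exact_groups = defaultdict(list)
for text in events:
    exact_groups[text].append(text)

# Similarity score between 0 and 1 (illustrative measure, not the operator's).
similarity = SequenceMatcher(None, events[0], events[1]).ratio()

print("exact-match groups:", len(exact_groups))
print("similarity score:", round(similarity, 2))
```

Exact matching keeps the two messages apart, while a similarity score close to 1 shows why a clustering approach can place them together.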

Operator usage in easy mode

  1. Click + on the parent node.

  2. Enter Form Clusters in the search field and select the operator from the results to open the operator form.

  3. In the Table drop-down, enter or select the table in which to create clusters.

  4. In the Fields for Clustering field, enter the list of column names to apply the clustering algorithm to.

  5. In the Number of Clusters field, enter the number of clusters that you wish to create.

  6. Click Run to view the result.

  7. Click Save to add the operator to the playbook.

  8. Click Cancel to discard the operator form.

Usage Details

LQL Command

formClusters(table, fieldsForClustering, numberOfClusters, fieldsForGrouping)

Input:

This is a clustering operator. It takes:
table - the table to create clusters in
fieldsForClustering (String[]) - the columns to apply clustering on
numberOfClusters - the number of clusters to create
fieldsForGrouping (String*) - optional; rows that share the same fieldsForGrouping values are placed in the same cluster.
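The fieldsForGrouping constraint can be expressed as a check on the operator's output: every combination of grouping values should map to exactly one cluster ID. The Python sketch below is one way to verify that on exported rows. The helper name grouping_is_respected, the host column, and the sample rows are all hypothetical and are not part of LQL.

```python
from collections import defaultdict

def grouping_is_respected(rows, grouping_fields, cluster_field="lhub_cluster_id"):
    """Return True if each combination of grouping values maps to one cluster."""
    clusters_per_key = defaultdict(set)
    for row in rows:
        key = tuple(row[name] for name in grouping_fields)
        clusters_per_key[key].add(row[cluster_field])
    return all(len(ids) == 1 for ids in clusters_per_key.values())

# Hypothetical output rows; "host" is an invented grouping column.
rows = [
    {"host": "web-1", "message": "system was shutdown", "lhub_cluster_id": "cluster_0"},
    {"host": "web-1", "message": "disk almost full", "lhub_cluster_id": "cluster_0"},
    {"host": "db-1", "message": "system shut down", "lhub_cluster_id": "cluster_1"},
]

ok = grouping_is_respected(rows, ["host"])
print(ok)
```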

Output:

lhub_cluster_id: Group ID. This operator splits the data into multiple groups and assigns each group an ID of the form "cluster_N", such as "cluster_0" and "cluster_1" in the example below.

Example

Input
table = github_logs

| id | address |
|----|---------|
| 1 | 154 E. Dana st, Mountain View, CA, USA |
| 2 | 154 Data, Mountain View, CA, 94041 |
| 3 | Somewhere in Houston, TX |
| 4 | 14054 San ramon st, Houston, TX, 77077 |

LQL command

formClusters(table, ["address"], 2)

Output

| id | address | lhub_cluster_id |
|----|---------|-----------------|
| 1 | 154 E. Dana st, Mountain View, CA, USA | cluster_1 |
| 2 | 154 Data, Mountain View, CA, 94041 | cluster_1 |
| 3 | Somewhere in Houston, TX | cluster_0 |
| 4 | 14054 San ramon st, Houston, TX, 77077 | cluster_0 |
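To give a feel for how the grouping in the example above could arise, here is a minimal Python sketch that clusters the same four addresses into two groups. It is only an illustration under stated assumptions: the operator's real algorithm is not documented on this page, and the token-based Jaccard similarity, the greedy average-linkage merging, and the function name form_clusters_sketch are all choices made for this sketch.

```python
import re
from itertools import combinations

def tokens(text):
    """Lowercase word and number tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Fraction of tokens that two token sets share (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def form_clusters_sketch(rows, field, k):
    """Merge the most similar clusters until k remain (illustrative only)."""
    k = max(1, k)
    toks = [tokens(row[field]) for row in rows]
    clusters = [[i] for i in range(len(rows))]  # start with one cluster per row

    def linkage(c1, c2):
        # Average pairwise similarity between the members of two clusters.
        total = sum(jaccard(toks[i], toks[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        x, y = max(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merged = clusters.pop(y)  # x < y, so index x is unaffected by the pop
        clusters[x].extend(merged)

    out = [dict(row) for row in rows]
    for n, members in enumerate(clusters):
        for i in members:
            out[i]["lhub_cluster_id"] = f"cluster_{n}"
    return out

github_logs = [
    {"id": 1, "address": "154 E. Dana st, Mountain View, CA, USA"},
    {"id": 2, "address": "154 Data, Mountain View, CA, 94041"},
    {"id": 3, "address": "Somewhere in Houston, TX"},
    {"id": 4, "address": "14054 San ramon st, Houston, TX, 77077"},
]

clustered = form_clusters_sketch(github_logs, "address", 2)
for row in clustered:
    print(row["id"], row["lhub_cluster_id"])
```

The sketch puts rows 1 and 2 in one cluster and rows 3 and 4 in the other, which matches the grouping in the output table. The cluster labels themselves are arbitrary, so the numbering may differ from what the operator returns.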
