
Cloudflare collector

Overview

Cloudflare is a Content Delivery Network and DDoS mitigation cloud service company. It primarily acts as a reverse proxy between a website's visitor and the Cloudflare customer's hosting provider.

Data sources

Data source: Cloudflare

Description: Audit Logs

Devo table: cdn.cloudflare.audit.events

API endpoint: GET https://api.cloudflare.com/client/v4/{entity_type}/{entity_id}/audit_logs?since={start_date}&before={end_date}&page={page_num}&per_page={page_limit}&direction={direction}, where:

  • {entity_type} is one of the two allowed entity types: organizations or accounts.

  • {entity_id} is the account or organization identifier.

  • {start_date} limits the returned results to logs newer than the specified date, in RFC3339 format (YYYY-MM-DDTHH:mm:ssZ).

  • {end_date} limits the returned results to logs older than the specified date, in RFC3339 format (YYYY-MM-DDTHH:mm:ssZ).

  • {page_num} is which page of results to return.

  • {page_limit} is how many results to return per page.

  • {direction} is the direction of the chronological sorting (allowed values are asc and desc; the default is desc).

Description: Get audit logs for an account or an organization; filter by who made the change, which zone the change was made on, and the timeframe of the change.
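As an illustration, the audit logs endpoint can be called directly with Python's requests library. This is a minimal sketch, not the collector itself; the token, entity values, and dates are placeholders:

import requests

# Placeholders: replace with your own values.
API_TOKEN = "<API_TOKEN>"
ENTITY_TYPE = "accounts"  # or "organizations"
ENTITY_ID = "<ENTITY_ID>"

url = f"https://api.cloudflare.com/client/v4/{ENTITY_TYPE}/{ENTITY_ID}/audit_logs"
params = {
    "since": "2024-09-01T00:00:00Z",   # RFC3339: logs newer than this date
    "before": "2024-09-02T00:00:00Z",  # RFC3339: logs older than this date
    "page": 1,
    "per_page": 50,
    "direction": "desc",
}
headers = {"Authorization": f"Bearer {API_TOKEN}"}

response = requests.get(url, params=params, headers=headers, timeout=30)
response.raise_for_status()
for entry in response.json().get("result", []):
    print(entry)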

 

Data source: Cloudflare

Description: GraphQL Analytics

Devo table: cdn.cloudflare.firewall.samples

API endpoint: POST https://api.cloudflare.com/client/v4/graphql, where the body of the request uses the following template:

{ "query": "query { viewer { zones (filter: {zoneTag: $zone_tag}) { <DATASET>( filter: { datetime_geq: $start_date, datetime_lt: $end_date }, limit: $limit, orderBy: [datetime_ASC] ) { datetime <FIELDS> } } } }", "variables": { "zoneTag": "<ZONE_TAG>", "filter": { "zone_tag": "<ZONE_TAG>", "start_date": "<START_DATE>", "end_date": "<END_DATE>", "limit": <LIMIT> } } }

where:

  • <DATASET> is the dataset (product) name you want to query against a zone. The only dataset the collector currently allows is the Firewall Activity Log: firewallEventsAdaptive. See Datasets (tables) · Cloudflare Analytics docs for the available API datasets.

  • <FIELDS> is the list of fields you want to fetch. The fields used for the firewallEventsAdaptive dataset are:

action, clientAsn, clientASNDescription, clientCountryName, clientIP, clientIPClass, clientRefererHost, clientRefererPath, clientRefererQuery, clientRefererScheme, clientRequestHTTPHost, clientRequestHTTPMethodName, rayName, clientRequestHTTPProtocol, clientRequestPath, clientRequestQuery, clientRequestScheme, edgeColoName, edgeResponseStatus, kind, matchIndex, originResponseStatus, originatorRayName, ruleId, source, userAgent, apiGatewayMatchedEndpoint, apiGatewayMatchedHost, contentScanHasFailed, contentScanNumMaliciousObj, contentScanObjResults, contentScanNumObj, contentScanObjSizes, contentScanObjTypes, date, datetime, datetimeFifteenMinutes, datetimeFiveMinutes, datetimeHour, datetimeMinute, description, httpApplicationVersion, leakedCredentialCheckResult, ref, rulesetId, sampleInterval, wafAttackScore, wafAttackScoreClass, wafMlAttackScore, wafMlSqliAttackScore, wafMlXssAttackScore, wafRceAttackScore, wafSqliAttackScore, wafXssAttackScore, zoneVersion
  • <ZONE_TAG> is the zone tag (or zone key/ID).

  • <START_DATE> is the initial date for the query (inclusive).

  • <END_DATE> is the final date for the query (exclusive).

  • <LIMIT> is the maximum number of results to return per request.

  • Note: the field list previously contained 31 fields; 24 fields have since been added, for a total of 55 available fields.

Description: Query for a dataset in a specific zone and timeframe. The only dataset the collector currently allows is the Firewall Activity Log: firewallEventsAdaptive.
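For illustration, a single request against this endpoint might be issued as follows. This is a minimal sketch, not the collector's code; the zone tag, dates, and trimmed field selection are placeholders, and the values are inlined rather than passed as GraphQL variables:

import requests

# Placeholders: replace with your own values.
query = """{
  viewer {
    zones(filter: {zoneTag: "<ZONE_TAG>"}) {
      firewallEventsAdaptive(
        filter: {datetime_geq: "2024-09-01T00:00:00Z", datetime_lt: "2024-09-01T01:00:00Z"}
        limit: 100
        orderBy: [datetime_ASC]
      ) {
        datetime
        action
        clientIP
      }
    }
  }
}"""

response = requests.post(
    "https://api.cloudflare.com/client/v4/graphql",
    json={"query": query},
    headers={"Authorization": "Bearer <API_TOKEN>"},
    timeout=30,
)
response.raise_for_status()
print(response.json())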

The collector uses the limit, orderBy, and datetime filters for pagination, as sketched below. For a timeframe request, if the limit is not reached, no more requests are needed. If the limit is reached, the collector removes all events carrying the last datetime value from the result and performs a new timeframe request using this last datetime as start_date and the same end_date. Since start_date is inclusive, all the events removed from the previous response are returned again. If all the events returned by a request share the same datetime and the maximum limit per request is also reached, the collector keeps all the events and uses the last datetime plus one second as the next start_date. Take into account that this behavior can cause the loss of events for the requested timeframe.
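The pagination strategy just described can be sketched as follows. This is an illustrative reimplementation, not the collector's actual code; fetch_page is a hypothetical helper, and events are assumed to be dicts with a "datetime" key:

from datetime import datetime, timedelta

def add_one_second(rfc3339: str) -> str:
    """Add one second to an RFC3339 timestamp such as 2024-09-01T00:00:00Z."""
    dt = datetime.strptime(rfc3339, "%Y-%m-%dT%H:%M:%SZ")
    return (dt + timedelta(seconds=1)).strftime("%Y-%m-%dT%H:%M:%SZ")

def paginate(fetch_page, start_date, end_date, limit):
    """Sketch of the datetime-based pagination described above.

    fetch_page(start_date, end_date, limit) is a hypothetical helper that
    performs one GraphQL request and returns events sorted by datetime_ASC.
    """
    collected = []
    while True:
        events = fetch_page(start_date, end_date, limit)
        if len(events) < limit:
            # Limit not reached: no more requests are needed.
            collected.extend(events)
            return collected
        last_dt = events[-1]["datetime"]
        if events[0]["datetime"] == last_dt:
            # All events share one datetime and the page is full: keep them
            # all and move start_date one second forward (events with that
            # datetime beyond the limit may be lost).
            collected.extend(events)
            start_date = add_one_second(last_dt)
        else:
            # Drop events carrying the last datetime; they are fetched again
            # on the next request because start_date is inclusive.
            collected.extend(e for e in events if e["datetime"] != last_dt)
            start_date = last_dt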

On service setup, the collector also performs a request to check the allowed limits for each dataset: Limits · Cloudflare Analytics docs

In a small number of cases, the analytics provided on the Cloudflare GraphQL Analytics API are based on a sample — a subset of the dataset. In these cases, Cloudflare Analytics returns an estimate derived from the sampled value. For example, suppose that during an attack the sampling rate is 10% and 5,000 events are sampled. Cloudflare will estimate 50,000 total events (5,000 × 10) and report this value in Analytics.

See Sampling · Cloudflare Analytics docs for more details.
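As a one-line check of the arithmetic in the example above (hypothetical numbers from that example):

# Scale a sampled count back to an estimated total.
sampled_events = 5_000
sampling_rate = 0.10                              # 10% of events were sampled
estimated_total = sampled_events / sampling_rate  # 5,000 / 0.10 = 50,000
print(int(estimated_total))                       # -> 50000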

For more information on how the events are parsed, visit our page.

You need to provide certain additional credentials based on whether you want to ingest Audit Logs or GraphQL Analytics events.

For Audit Logs, you need to provide the following for each entity:

"entity_type": "<ENTITY_TYPE>",
"entity_name": "<ENTITY_NAME>",
"entity_id": "<ENTITY_ID>"

For GraphQL Analytics, you need to provide the following for each zone:

"zone_name": "<ZONE_NAME>",
"zone_id": "<ZONE_ID>"

Vendor setup

To configure the Cloudflare collector services, you need to set up one of the allowed authentication methods:

  • API tokens

  • API keys

Authentication method: API Tokens

Details: Cloudflare recommends API Tokens as the preferred way to interact with Cloudflare APIs. You can configure the scope of tokens to limit access to account and zone resources, and you can define the Cloudflare APIs to which the token authorizes access.

Configuration properties: the following credentials properties are needed:

credentials:
  api_token: <API_TOKEN>

Link: Create API token · Cloudflare API docs

Authentication method: API Keys

Details: Unique to each Cloudflare user and used only for authentication. API keys do not authorize access to accounts or zones. Use the Global API Key for authentication. Only use the Origin CA Key when you create origin certificates through the API.

Configuration properties: the following credentials properties are needed:

credentials:
  api_key: <API_KEY>
  user_email: <USER_EMAIL>

Link: Get API keys (legacy) · Cloudflare API docs

Accepted authentication methods

Depending on how you obtained your credentials, you will have to either fill in or delete the following properties in the JSON credentials configuration block.

Authentication method    api_token    api_key     user_email

API Tokens               REQUIRED     -           -

API Keys                 -            REQUIRED    REQUIRED
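For reference, when calling the Cloudflare API directly the two methods map to different HTTP headers; a minimal sketch with placeholder values:

# API Tokens: a single Authorization header.
token_headers = {"Authorization": "Bearer <API_TOKEN>"}

# API Keys (legacy): the user email plus the Global API Key.
key_headers = {
    "X-Auth-Email": "<USER_EMAIL>",
    "X-Auth-Key": "<API_KEY>",
}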

Run the collector

Once the data source is configured, you can either send us the required information if you want us to host and manage the collector for you (Cloud collector), or deploy and host the collector on your own machine using a Docker image (On-premise collector).

 

Change log

Release: v1.1.1

Released on: Sep 17, 2024

Release type: Bug fixing

Details:

Bug fixes:

  • Some minor fixes related to the auto-update feature

  • Updated the Docker base image

Recommendations: Recommended version

Release: v1.1.0

Released on: Sep 17, 2024

Release type: Improvement

Details:

Improvements:

  • Upgraded Docker base image to 1.3.0

  • Added a flag to fetch 100 pages by default for the cloudflare_audit input.

  • Added another flag that lets users decide whether to limit the number of pages.

  • Added a flag to include all the fields for the cloudflare_graphql_analytics input.

  • Refactored the code to the new template

  • Upgraded DCSDK to version 1.24.4

    • Fixed an error related to an unhandled ValueError exception

    • Fixed an error related to the loss of some values in internal messages (collector_name, collector_id, and job_id)

    • Improved controlled stop when InputProcess is killed

    • Changed internal queue management to protect against OOMK

    • Extracted ModuleThread structure from PullerAbstract

    • Improved controlled stop when both processes fail to instantiate

    • Upgraded DevoSDK dependency to version v5.4.0

    • Fixed an error in the persistence system

    • Applied changes to make DCSDK compatible with MacOS

    • Added new sender for relay in house + TLS

    • Added persistence functionality for gzip sending buffer

    • Added Automatic activation of gzip sending

    • Improved behaviour when persistence fails

    • Upgraded DevoSDK dependency

    • Fixed console log encoding

    • Restructured python classes

    • Improved behaviour with non-utf8 characters

    • Decreased default size value for internal queues (Redis limitation, from 1GiB to 256MiB)

    • New persistence format/structure (compression in some cases)

    • Removed dmsg execution (it was invalid for Docker execution)

    • Added an extra check for invalid message timestamps

    • Added an extra check to improve the controlled stop

    • Changed default number for connection retries (now 7)

    • Fix for Devo connection retries

    • Updated DevoSDK to v5.1.10

    • Fix for SyslogSender related to UTF-8

    • Enhanced troubleshooting: standardized traces and introduced some new ones

    • Introduced a mechanism to detect "Out of Memory killer" situation.

    • Updated DevoSDK to v5.1.9

    • Fixed some bugs related to development on macOS

    • Added an extra validation and fix when the DCSDK receives a wrong timestamp format

    • Added an optional config property to use the Syslog timestamp format in a strict way

    • Fixed error in pyproject.toml related to project scripts endpoint

    • Updated DevoSDK to v5.1.7

    • Introduced pyproject.toml

    • Added requirements-dev.txt

    • Added input metrics

    • Modified output metrics

    • Updated DevoSDK to v5.1.6

    • Standardized exception messages for traceability

    • Added more detail in queue statistics

Recommendations: Update