
Purpose

The Microsoft Azure collector gets data from Azure cloud computing services. Common uses are:

...

Features

Details

Allow parallel downloading (multipod)

The vm_metrics service cannot work in multipod mode. If you want to use the event_hubs service in multipod mode, you must not include a vm_service in the same collector.

Running environments

  • collector server

  • on-premise

Populated Devo events

  • table

Flattening pre-processing

  • no

Allowed source events obfuscation

  • yes

Collector services detail

This section is intended to explain how to proceed with specific actions for services.

...

Event Hubs (event_hubs)

General principles

Understanding the following principles of Azure Event Hubs is crucial:

  1. Consumer Groups: A single event hub can have multiple consumer groups, each representing a separate view of the event stream.

  2. Checkpointing: The SDK supports checkpoint mechanisms to balance the load among consumers for the same event hub and consumer group. Supported mechanisms include:

    • Azure Blob Storage Checkpoint: Recommended to use one container per consumer group per event hub.

  3. Partition Restrictions: Azure Event Hubs limits the number of partitions based on the event hub tier. For quotas and limits, refer to the official documentation.

Configuration options

Devo supports various configurations to cater to different Azure setups.

Event Hubs authentication configuration

Event Hubs authentication can use either a connection string or client credentials (the latter requires assigning the Azure Event Hubs Data Receiver role). When both are configured, the connection string takes precedence.
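The precedence rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the collector's actual code: choose_auth_mode is a hypothetical helper, and the dictionaries only mirror the configuration keys documented on this page.

```python
from typing import Mapping, Optional

CREDENTIAL_KEYS = ("client_id", "client_secret", "tenant_id")


def choose_auth_mode(queue: Mapping[str, str],
                     credentials: Optional[Mapping[str, str]] = None) -> str:
    """Return the authentication mode that applies to one queue definition.

    Hypothetical helper: it only models the documented preference for the
    connection string when both configurations are present.
    """
    has_hub_name = bool(queue.get("event_hub_name"))
    has_connection_string = has_hub_name and bool(
        queue.get("event_hub_connection_string"))
    has_client_credentials = (
        has_hub_name
        and bool(queue.get("namespace"))
        and credentials is not None
        and all(credentials.get(key) for key in CREDENTIAL_KEYS)
    )

    if has_connection_string:
        # Connection string wins even if client credentials are also set.
        return "connection_string"
    if has_client_credentials:
        return "client_credentials"
    raise ValueError("queue has no complete authentication configuration")
```

For example, a queue that defines event_hub_connection_string and namespace together resolves to "connection_string" in this sketch, even when a credentials block is also present.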

...

Required parameters

...

Connection string configuration

...

  • event_hub_connection_string

  • event_hub_name

Yaml

inputs:
  azure_event_hub:
    id: 100001
    enabled: true
    services:
      event_hubs:
        queues:
          queue_a:
            event_hub_name: event_hub_value
            event_hub_connection_string: event_hub_connection_string_value

Json

"inputs": {
  "azure_event_hub": {
    "id": 100001,
    "enabled": true,
    "services": {
      "event_hubs": {
        "queues": {
          "queue_a": {
            "event_hub_name": "event_hub_value",
            "event_hub_connection_string": "event_hub_connection_string_value"
          }
        }
      }
    }
  }
}

...

Client credentials configuration

...

  • event_hub_name

  • namespace

  • credentials.client_id

  • credentials.client_secret

  • credentials.tenant_id

Yaml

inputs:
  azure_event_hub:
    id: 100001
    enabled: true
    credentials:
      client_id: client_id_value
      client_secret: client_secret_value
      tenant_id: tenant_id_value
    services:
      event_hubs:
        queues:
          queue_a:
            namespace: namespace_value
            event_hub_name: event_hub_name_value

Json

"inputs": {
  "azure_event_hub": {
    "id": 100001,
    "enabled": true,
    "credentials": {
      "client_id": "client_id_value",
      "client_secret": "client_secret_value",
      "tenant_id": "tenant_id_value"
    },
    "services": {
      "event_hubs": {
        "queues": {
          "queue_a": {
            "namespace": "namespace_value",
            "event_hub_name": "event_hub_name_value"
          }
        }
      }
    }
  }
}

Configuration considerations

...

Multi-pod mode

...

While multi-pod mode is supported and represents the highest throughput possible for the collector, it requires the user to configure the collector in a specific manner to ensure that the collector operates efficiently and does not send duplicate events to Devo (see below). In most cases, multi-pod mode is unnecessary.

  • High Throughput: Multi-pod mode potentially offers the highest throughput.

    • Multi-pod mode is recommended for scenarios in which the user has more partitions than can be supported on a single collector instance.

  • Consumer Client Thread Limit: The user should specify a client_thread_limit to ensure that the collector utilizes load balancing instead of explicitly assigning partition IDs to the consumer clients.

    • In load-balancing mode, having fewer consumer clients than partitions is allowed but less efficient, because some consumer clients will fetch events from multiple partitions.

    • In load-balancing mode, having more consumer clients than partitions is allowed but less efficient, because some consumer clients will not be assigned any partitions.

    • The most efficient design is to have as many consumer clients as there are partitions, distributed among the pods. The easiest way to achieve this is to set client_thread_limit to 1 and create as many pods as there are partitions.

  • Azure Blob Storage Checkpointing: Required for multi-pod mode.

    • Warning: Running in multi-pod with local checkpointing will result in duplicate events being sent to Devo because the load balancing operation will have no visibility of the other pods' checkpoints.
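The sizing guidance for client_thread_limit can be checked with a little arithmetic. The sketch below assumes the load balancer spreads partitions as evenly as possible across all consumer clients (pods multiplied by client_thread_limit); distribute_partitions is a hypothetical helper, and the real assignment is decided by the Azure SDK at run time.

```python
from typing import List


def distribute_partitions(partitions: int, pods: int,
                          client_thread_limit: int) -> List[int]:
    """Return how many partitions each consumer client would own.

    Assumes an idealized, perfectly even load balance; one list entry
    per consumer client across all pods.
    """
    if partitions < 0 or pods < 1 or client_thread_limit < 1:
        raise ValueError("pods and client_thread_limit must be at least 1")
    clients = pods * client_thread_limit
    base, extra = divmod(partitions, clients)
    # The first `extra` clients take one additional partition each.
    return [base + 1 if index < extra else base for index in range(clients)]


# One client per partition: 8 pods with client_thread_limit 1, 8 partitions.
balanced = distribute_partitions(partitions=8, pods=8, client_thread_limit=1)

# More clients than partitions: two of the six clients stay idle.
oversized = distribute_partitions(partitions=4, pods=3, client_thread_limit=2)
```

Under these assumptions, balanced contains a 1 for every client, while oversized contains two zeros, which corresponds to the consumer clients that are never assigned a partition.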

...

Standard mode

...

  • Both checkpointing options are supported. In standard mode, the collector will automatically create one consumer client thread per partition per event hub.

  • If the event hubs you wish to fetch data from have more partitions than a single instance can support (e.g. you have 100 event hubs, each with 32 partitions, so the collector attempts to create 3200 consumer clients), you should create multiple collector instances and configure each one to fetch from a subset of the desired event hubs.
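The split across collector instances can be planned with a simple calculation. The following sketch is illustrative only: split_event_hubs is a hypothetical helper, and the per-instance limit of 320 consumer clients is an assumed figure chosen for the example, not a documented collector limit.

```python
from typing import Dict, List


def total_consumer_clients(partitions_per_hub: Dict[str, int]) -> int:
    """Standard mode creates one consumer client per partition per event hub."""
    return sum(partitions_per_hub.values())


def split_event_hubs(partitions_per_hub: Dict[str, int],
                     max_clients_per_instance: int) -> List[List[str]]:
    """Group event hubs so that no collector instance exceeds the limit.

    Greedy grouping in input order; the limit is an assumption supplied
    by the caller.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    used = 0
    for hub, partitions in partitions_per_hub.items():
        if partitions > max_clients_per_instance:
            raise ValueError(f"{hub} alone exceeds the per-instance limit")
        if used + partitions > max_clients_per_instance:
            # Current instance is full: start planning the next one.
            groups.append(current)
            current, used = [], 0
        current.append(hub)
        used += partitions
    if current:
        groups.append(current)
    return groups


# The example from the text: 100 event hubs with 32 partitions each.
hubs = {f"hub_{index}": 32 for index in range(100)}
plan = split_event_hubs(hubs, max_clients_per_instance=320)  # assumed limit
```

With the assumed limit, the 3200 consumer clients are spread over ten collector instances, each configured with ten event hubs.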

Internal process and deduplication method

The collector uses the event_hubs service to pull events from the Azure Event Hubs. Each queue in the event_hubs service represents an event hub that is polled for events.

...

Collector deduplication mechanisms

...

Events are deduplicated using the duplicated_messages_mechanism parameter. There are two methods available:

  • Local Deduplication: Ensures that subsequent duplicate events from the same event hub are not sent to Devo. This method operates individually within each consumer client.

  • Global Deduplication: Utilizes a shared cache across all event hub consumers for a given collector. As events are ingested into Devo, the collector checks if the event has already been consumed by another event hub consumer. The event will not be sent to Devo if it has already been consumed. The global cache tracks the last 1000 events for each consumer client.

If the global deduplication method is selected, the collector will automatically employ the local deduplication method as well.
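The relationship between the two methods can be illustrated with a small in-memory model. This is a sketch under assumptions, not the collector's implementation: the class names, the "local" and "global" mode strings, and the use of an event identifier as the deduplication key are all hypothetical.

```python
from collections import OrderedDict
from typing import Dict


class RecentEvents:
    """Bounded record of the most recent event ids seen by one consumer client."""

    def __init__(self, capacity: int = 1000) -> None:
        self._capacity = capacity
        self._ids: "OrderedDict[str, None]" = OrderedDict()

    def __contains__(self, event_id: str) -> bool:
        return event_id in self._ids

    def add(self, event_id: str) -> None:
        self._ids[event_id] = None
        self._ids.move_to_end(event_id)
        if len(self._ids) > self._capacity:
            self._ids.popitem(last=False)  # forget the oldest event


class Deduplicator:
    """Models local deduplication, optionally extended by a global check."""

    def __init__(self, mechanism: str = "local", capacity: int = 1000) -> None:
        if mechanism not in ("local", "global"):  # assumed mode names
            raise ValueError("mechanism must be 'local' or 'global'")
        self._mechanism = mechanism
        self._capacity = capacity
        self._caches: Dict[str, RecentEvents] = {}

    def should_send(self, client_id: str, event_id: str) -> bool:
        own = self._caches.setdefault(client_id, RecentEvents(self._capacity))
        if event_id in own:
            return False  # local check: this client already saw the event
        if self._mechanism == "global":
            others = (cache for other_id, cache in self._caches.items()
                      if other_id != client_id)
            if any(event_id in cache for cache in others):
                return False  # global check: another client already saw it
        own.add(event_id)
        return True
```

In this model the global mode always runs the local check first, which mirrors the statement that selecting global deduplication also enables local deduplication.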

...

Checkpointing mechanisms

...

The collector offers two distinct methods for checkpointing, each designed to prevent the re-fetching of events from Azure Event Hubs. These mechanisms ensure efficient event processing by maintaining a record of the last processed event in each partition.

...

Local Persistence Checkpointing

  • Overview: By default, the collector employs local persistence checkpointing. This method is designed to keep track of the last event offset within each partition of an Event Hub, ensuring events are processed once without duplication.

  • How It Works: As the collector consumes messages from an Event Hub, it records the offset of the last processed event locally. On subsequent pulls from the Event Hub, the collector resumes processing from the next event after the last recorded offset, effectively skipping previously processed events.

  • Use Case: Ideal for single-instance deployments where all partitions of an Event Hub are managed by a single collector instance.
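The resume-from-offset behavior described above can be sketched as follows. The sketch assumes integer offsets and a JSON file as the local state; LocalCheckpointStore and consume are hypothetical names, and the collector's real persistence format is not documented here.

```python
import json
import os
from typing import Dict, List, Optional, Tuple


class LocalCheckpointStore:
    """Keeps the last processed offset per event hub partition in a JSON file."""

    def __init__(self, path: str) -> None:
        self._path = path
        self._offsets: Dict[str, int] = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as handle:
                self._offsets = json.load(handle)

    def last_offset(self, event_hub: str, partition_id: str) -> Optional[int]:
        return self._offsets.get(f"{event_hub}/{partition_id}")

    def record(self, event_hub: str, partition_id: str, offset: int) -> None:
        self._offsets[f"{event_hub}/{partition_id}"] = offset
        with open(self._path, "w", encoding="utf-8") as handle:
            json.dump(self._offsets, handle)


def consume(store: LocalCheckpointStore, event_hub: str, partition_id: str,
            events: List[Tuple[int, str]]) -> List[str]:
    """Return only event bodies newer than the checkpoint, then advance it.

    `events` is a list of (offset, body) pairs ordered by offset.
    """
    last = store.last_offset(event_hub, partition_id)
    fresh = [(offset, body) for offset, body in events
             if last is None or offset > last]
    if fresh:
        store.record(event_hub, partition_id, fresh[-1][0])
    return [body for _, body in fresh]
```

Because the offsets live on the local file system of one instance, a second instance started elsewhere would begin without them, which is why this method fits single-instance deployments.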

Azure Blob Storage Checkpointing

...

Overview: As an alternative to local persistence, the collector can be configured to use Azure Blob Storage for checkpointing. This approach leverages Azure's cloud storage to maintain event processing state.

...

Configuration:

  • Option 1: Specify both an Azure Blob Storage account and container name. This method requires the collector to have appropriate access permissions to the specified Blob Storage account.

  • Option 2: Provide an Azure Blob Storage connection string and container name. This method is straightforward and recommended if you have the connection string readily available.

...

Benefits:

  • Multi-pod Support: Enables the collector to operate in a distributed environment, such as Kubernetes, where multiple instances (pods) of the collector can run concurrently. Checkpointing data stored in Azure Blob Storage ensures that each instance has access to the current state of event processing, facilitating efficient load balancing and event partition management.

  • Durability: Utilizes Azure Blob Storage's durability and availability features to safeguard checkpointing data against data loss or corruption.
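Why a shared checkpoint store matters in multi-pod mode can be shown with a toy simulation. Everything here is an assumption made for illustration: plain dictionaries stand in for the checkpoint stores, no Azure API is involved, and run_pod and simulate_rebalance are hypothetical names.

```python
from typing import Dict, List


def run_pod(checkpoints: Dict[str, int], partition: str,
            stream: List[str]) -> List[str]:
    """Send every event after the stored checkpoint, then advance it."""
    start = checkpoints.get(partition, 0)
    sent = stream[start:]
    checkpoints[partition] = len(stream)
    return sent


def simulate_rebalance(shared_store: bool) -> List[str]:
    """Pod A reads a partition first, then ownership moves to pod B."""
    stream = ["e1", "e2", "e3"]
    store_a: Dict[str, int] = {}
    # A shared store models Blob Storage; separate dicts model local files.
    store_b = store_a if shared_store else {}

    sent = run_pod(store_a, "0", stream[:2])  # pod A sees e1 and e2
    sent += run_pod(store_b, "0", stream)     # pod B takes over the partition
    return sent


with_blob_storage = simulate_rebalance(shared_store=True)
with_local_files = simulate_rebalance(shared_store=False)
```

In this model the shared store yields each event once, while separate local stores cause pod B to resend e1 and e2 after taking over the partition.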

...