Document toolboxDocument toolbox

Kafka Sender

Description

This unit is a Processor unit type.

The Kafka Sender unit writes input events into a Kafka queue.

An event comes in through the in port. A Kafka record is created and sent to a Kafka queue. Kafka API is asynchronous, meaning feedback about the success or error will not be provided immediately. Fill in the 'ackInfo' settings to be notified when feedback is in.

Successful events will be emitted through the send port, enriched with the fields defined in 'ackInfo'.
If an error occurs, the input event is enriched with new fields describing the problem, and the event is sent through the error port.

When an input event enters through the flush port a signal is sent to flush the Kafka producer. The event sent to the flushed output port without modification.
When an input event enters through the metrics input port, Kafka metrics are obtained. For every metric obtained, the input event is enriched with fields containing the metric name and value, and sent to the metrics output port.

Configuration

After dragging this unit into the Flow canvas, double-click it to access its configuration options. The following table describes the configuration options of this unit:

Tab

Field

Description

Tab

Field

Description

General

Name

Enter a name for the unit. It must start with a letter, and cannot contain spaces. Only letters, numbers, and underscores are allowed.

Description

Enter a description detailing the scope of the unit.

Kafka

Client ID

An id string to pass to the server when making requests. The purpose of this is to be able to track the source of requests beyond just ip/port by allowing a logical application name to be included in server-side request logging.

Bootstrap servers

A list of host/port pairs to use for establishing the initial connection to the Kafka cluster.

The client will make use of all servers, irrespective of which servers are specified here for bootstrapping—this list only impacts the initial hosts used to discover the full set of servers. This list should use the following format:
host1:port1,host2:port2,...

Since these servers are just used for the initial connection to discover the full cluster membership (which may change dynamically), this list need not contain the full set of servers (you may want more than one, though, in case a server is down).

Acks

The number of acknowledgments the producer requires the leader to have received before considering a request as complete.

This controls the durability of records that are sent. The following settings are allowed:

0 - If set to zero, the producer will not wait for any acknowledgment from the server. The record will be immediately added to the socket buffer and considered sent. No guarantee can be made that the server has received the record in this case, and the retries configuration will not take effect (as the client won't generally know of any failures). The offset given back for each record will always be set to -1.


1 - The leader will write the record to its local log but will respond without awaiting full acknowledgment from all followers. In this case, should the leader fail immediately after acknowledging the record but before the followers have replicated it, the record will be lost.


all - The leader will wait for the full set of in-sync replicas to acknowledge the record. This guarantees that the record will not be lost as long as at least one in-sync replica remains alive. This is the strongest available guarantee. This is equivalent to the acks=-1 setting.

Linger

Amount of time to wait (in seconds) before sending requests in the absence of a full batch size. Leave at 0 (default) for no delay.

The producer groups together any records that arrive in between request transmissions into a single batched request. Normally this occurs only under load when records arrive faster than they can be sent out. However, in some circumstances the client may want to reduce the number of requests even under moderate load. This setting acomplishes this by adding a small amount of artificial delay—that is, rather than immediately sending out a record, the producer will wait for up to the given delay to allow other records to be sent so that the sends can be batched together.

This can be thought of as analogous to Nagle's algorithm in TCP. 

For example, entering a linger of 5ms would have the effect of reducing the number of requests sent but would add up to 5ms of latency to records sent in the absence of a load.

Once the batch size of records for a partition is fulfilled, it will be sent immediately regardless of this setting. However, if there are fewer than this many bytes accumulated for this partition, the unit will 'linger' for the specified time waiting for more records to come in. 

Batch size

Enter the default batch size in bytes to avoid batching records larger than this size. Enter 0 to disable batching entirely.

The producer will attempt to batch records together into fewer requests whenever multiple records are being sent to the same partition. This helps performance on both the client and the server.
Requests sent to brokers will contain multiple batches, one for each partition with data available to be sent.

A small batch size will make batching less common and may reduce throughput.

A very large batch size will use more memory to allocate a buffer of the specified batch size in anticipation of additional records.

Buffer memory

Enter the total bytes of memory to use to buffer records waiting to be sent to the server.

If records are sent faster than they can be delivered to the server, the producer will block for maxBlocks and will throw an exception.

This setting should correspond roughly to the total memory, but is not a hardbound since not all memory is used for buffering. Some additional memory will be used for compression (if compression is enabled) as well as for maintaining in-flight requests.

Compression type

Enter the compression type for all data generated by the producer.

The default is none (i.e. no compression). Valid values are none, gzip, snappy, or lz4.

Compression is of full batches of data, so the efficacy of batching will also impact the compression ratio (more batching means better compression).

Max block

The max block controls how many seconds during which the KafkaProducer.send() and KafkaProducer.partitionsFor() will block. These methods can be blocked either because the buffer is full or metadata is unavailable. Blocking in the user-supplied serializers or partitioner will not be counted against this timeout.

Retries

The number of retries on error.

Setting a value greater than zero will cause the client to resend any failed record with a potentially transient error. Note that this retry is no different than if the client resent the record upon receiving the error.

Request timeout

The maximum amount of time the client will wait for the response of a request (in seconds).

If the response is not received before the timeout elapses, the client will resend the request if necessary or fail the request if retries are exhausted.

Fields

There are static (unit level) and dynamic (event level) settings.
Everything Kafka needs to create a producer must be a static setting.

Everything that can be defined by a Kafka record can be a dynamic setting.
topic & partition can be both static and dynamic.

timestamp & key can be dynamic, or not sent.

value must always be dynamic.

Topic

The topic for sent records.

Topic field

Enter the name of an input event field containing the topic for the sent record.

Partition

Enter the number of partitions for clusters of sent records.

Partition field

Enter the name of an input event field containing the partition value for the sent record.

Timestamp field

Enter the name of an input event field containing the timestamp value for the sent record.

Key field

Enter the name of an input event field containing a key for the sent record.

Value field

Enter the name of an input event field containing the value for the sent record.

Metric name field

Enter a name for the output event field containing the name of the metric.

Metric value field

Enter a name for the output event field containing the value of the metric.

Ack info

Checksum field

Enter a name for the output event field containing a checksum of sent value.

Partition field

Enter a name for the output event field containing the partition to which the record is assigned.

Offset field

Enter a name for the output event field containing the offset where the record is stored.

Timestamp field

Enter a name for the output event field containing the timestamp assigned.

Input ports

Port

Description

Port

Description

in

Events to be sent as Kafka records.

flush

Events to request the Kafka producer to be flushed.

metrics

Events to request Kafka metrics. 

Output ports

Port

Description

Port

Description

send

Outputs events successfully sent to Kafka, enriched with the fields defined in the Ack info tab.

error

Outputs events that could not be sent to Kafka, enriched with standard error fields.

flushed

Outputs events that requested a flush, without any modification

metrics

Outputs events that requested metrics, repeated once for every obtained metric, and enriched with fields for metric name and value.

Â