
Releases: IBMStreams/streamsx.kafka

Kafka Toolkit v1.7.0

04 Dec 10:47

What's new in this toolkit release

This release contains the following changes and new features:

  • The default value of the commitCount parameter of the KafkaConsumer has changed from 500 to 2000.
  • The toolkit contains SPL types for standard messages (#43); a usage sketch follows this list.
    • MessageType.StringMessage
    • MessageType.BlobMessage
    • MessageType.ConsumerMessageMetadata
    • MessageType.TopicPartition
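
For illustration, here is a minimal, unverified SPL sketch of how one of these types could be used as the output schema of a KafkaConsumer. It assumes the types are reachable via the com.ibm.streamsx.kafka namespace and that MessageType.StringMessage carries rstring key and rstring message attributes matching the operator's default output attribute names:

    // Sketch only: consume string messages using the new MessageType.StringMessage
    // type as the output stream schema (assumed to be tuple<rstring key, rstring message>).
    use com.ibm.streamsx.kafka::*;

    composite StringMessageConsumer {
        graph
            stream<MessageType.StringMessage> Messages = KafkaConsumer() {
                param
                    propertiesFile: "etc/consumer.properties"; // must contain bootstrap.servers
                    topic: "test";
            }
    }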

Kafka Toolkit v1.6.0

30 Nov 08:32

This release of the Kafka toolkit contains the following enhancements:

KafkaProducer

1. Metric reporting

The Kafka producer client collects various performance metrics. A subset of them is exposed as custom metrics of the operator (issue #112):

  • connection-count: The current number of active connections.
  • compression-rate-avg: The average compression rate of record batches (as a percentage; 100 means no compression).
  • topic:compression-rate: The average compression rate of record batches for a topic (as a percentage; 100 means no compression).
  • record-queue-time-avg: The average time in ms that record batches spent in the send buffer.
  • record-queue-time-max: The maximum time in ms that record batches spent in the send buffer.
  • record-send-rate: The average number of records sent per second.
  • record-retry-total: The total number of retried record sends.
  • topic:record-send-total: The total number of records sent for a topic.
  • topic:record-retry-total: The total number of retried record sends for a topic.
  • topic:record-error-total: The total number of record sends that resulted in errors for a topic.
  • records-per-request-avg: The average number of records per request.
  • requests-in-flight: The current number of in-flight requests awaiting a response.
  • request-rate: The number of requests sent per second.
  • request-size-avg: The average size of requests sent.
  • request-latency-avg: The average request latency in ms.
  • request-latency-max: The maximum request latency in ms.
  • batch-size-avg: The average number of bytes sent per partition per request.
  • outgoing-byte-rate: The number of outgoing bytes sent to all servers per second.
  • bufferpool-wait-time-total: The total time an appender waits for space allocation.
  • buffer-available-bytes: The total amount of buffer memory that is not being used (either unallocated or in the free list).

2. Default producer configs

Previous releases of the toolkit used Kafka's default producer configs unless the user configured them otherwise; for optimum throughput, these settings had to be tuned. The most important producer configs now have operator defaults that result in higher throughput to the broker and better reliability (issue #113). The new defaults are listed below; a short usage sketch follows the list:

  • retries: Kafka default 0; new operator default 10. When 0 is provided for retries and the consistentRegionPolicy parameter is Transactional, retries is adjusted to 1.
  • compression.type: Kafka default none; new operator default lz4.
  • linger.ms: Kafka default 0; new operator default 100.
  • batch.size: Kafka default 16384; new operator default 32768.
  • max.in.flight.requests.per.connection: Kafka default 5; new operator default 1 when the guaranteeOrdering parameter is true, limited to 5 when a value is provided and the consistentRegionPolicy parameter is Transactional, or 10 in all other cases.
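
As a rough sketch under these defaults (assuming an input stream In with the default rstring key and rstring message attributes, and a producer.properties file that contains at least bootstrap.servers), a producer invocation no longer needs throughput tuning; any of the properties above that are set in the file should take precedence over the operator defaults:

    // Sketch only: none of retries, compression.type, linger.ms, batch.size, or
    // max.in.flight.requests.per.connection is set in etc/producer.properties,
    // so the operator defaults listed above apply.
    () as KafkaSink = KafkaProducer(In) {
        param
            propertiesFile: "etc/producer.properties";
            topic: "test";
    }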

3. New optional operator parameter guaranteeOrdering

If set to true, the operator guarantees that the order of records within a topic partition is the same as the order of the processed tuples, even when records are retried. To achieve this, the operator automatically sets the max.in.flight.requests.per.connection producer config to 1 whenever retries are enabled, i.e. when the retries config is not 0, which is the operator default.

If unset, the default value of this parameter is false, which means that the order can change due to retries.
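
A minimal sketch of the new parameter (stream and operator names are illustrative):

    // Sketch only: with guaranteeOrdering: true the operator limits
    // max.in.flight.requests.per.connection to 1 as long as retries is not 0,
    // so retried records cannot overtake later records within a partition.
    () as OrderedSink = KafkaProducer(In) {
        param
            propertiesFile: "etc/producer.properties";
            topic: "test";
            guaranteeOrdering: true;
    }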

4. Queue time control

In previous releases, up to and including 1.5.1, the producer operator was vulnerable when it could not keep up transferring data to the broker nodes. Records then stayed too long in the accumulator (essentially a send buffer), which caused timeouts and subsequent operator restarts (issue #128).

The producer now has an adaptive control that monitors several producer metrics and flushes the producer when necessary, so that the maximum queue time of records typically stays below 5 seconds. This enhancement improves the robustness of the producer operator.

KafkaConsumer

1. Metric reporting

The Kafka consumer client collects various performance metrics. A subset of them is exposed as custom metrics of the operator:

  • connection-count: The current number of active connections.
  • incoming-byte-rate: The number of bytes read off all sockets per second.
  • topic-partition:records-lag: The latest lag of the partition.
  • records-lag-max: The maximum lag in terms of number of records for any partition in this window.
  • fetch-size-avg: The average number of bytes fetched per request.
  • topic:fetch-size-avg: The average number of bytes fetched per request for a topic.
  • commit-rate: The number of commit calls per second.
  • commit-latency-avg: The average time taken for a commit request.

One of the most interesting metrics is the record lag for every consumed topic partition: the difference between the offset of the last record appended to the partition and the current read position.

Kafka Toolkit v1.5.1

17 Oct 12:53

This toolkit release is a bugfix release that resolves the following issues:

  • #124 - KafkaConsumer does not assign to all Partitions after restart when in CR
  • #123 - KafkaConsumer should keep the generated group.id across PE restarts
  • #122 - resolve security vulnerabilities in third-party libs
  • #121 - KafkaConsumer throws NullPointerException when topic is missing

Kafka Toolkit v1.5.0

08 Oct 11:19

This release of the Kafka toolkit contains the following new features and enhancements:

  • Kafka consumer groups can now be used within a consistent region (#72). Please have a look at the KafkaConsumerGroupWithConsistentRegion sample and at the sketch following this list.
  • Compatibility with Kafka brokers at version 0.10.2, 0.11, 1.0, 1.1, and 2.0
  • new custom metrics nAssignedPartitions, isGroupManagementActive, nPartitionRebalances, drainTimeMillis, and drainTimeMillisMax
  • When no client ID is specified, the operators generate one that identifies the job and the Streams operator (#109). The client ID follows the pattern {C|P}-J<job-ID>-<operator name>, where C denotes a consumer operator and P a producer operator.
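
The following is a hedged sketch of a consumer group inside a periodic consistent region; the KafkaConsumerGroupWithConsistentRegion sample is the authoritative reference. A consistent region also needs a JobControlPlane operator, assumed here to be spl.control::JobControlPlane:

    // Sketch only, not a verified sample.
    use com.ibm.streamsx.kafka::*;

    composite GroupConsumerInCR {
        graph
            // A consistent region requires a JobControlPlane operator in the graph.
            () as JCP = spl.control::JobControlPlane() {}

            // No clientId is specified, so the operator generates one following
            // the pattern C-J<job-ID>-<operator name> described above.
            @consistent(trigger = periodic, period = 60.0)
            stream<rstring message, rstring key> Messages = KafkaConsumer() {
                param
                    propertiesFile: "etc/consumer.properties";
                    topic: "test";
                    groupId: "myConsumerGroup";
            }
    }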

Solved issues in this release:

  • #102 - KafkaConsumer crashes when fused with other Kafka consumers
  • #104 - toolkit build fails with IBM Java from QSE
  • #105 - Message resources for non en_US locale should return en_US message when message is not available in specific language
  • #115 - KafkaProducer: adapt transaction timeout to consistent region period
  • #116 - KafkaProducer: exactly-once delivery semantics is worse than at-least-once

The online version of the SPL documentation for this toolkit is available here.

Kafka Toolkit v1.4.2

30 Jun 16:10

This bugfix release fixes the following issue:

#100 - KafkaProducer registers for governance as input/source instead of output/sink

Kafka Toolkit v1.4.1

28 Jun 09:21

This bugfix release fixes the following issue:

  • #98 - KafkaConsumer can silently stop consuming messages

Kafka Toolkit v1.4.0

22 Jun 09:57

Release v1.4.0 of the Kafka toolkit contains the following improvements and fixes for the KafkaConsumer operator:

  • Monitoring of memory consumption in addition to monitoring the internal queue fill (issue #91)
  • New custom metrics:
    • nLowMemoryPause - Number of times message polling was paused due to low memory.
    • nQueueFullPause - Number of times message polling was paused due to a full queue.
    • nPendingMessages - Number of pending messages to be submitted as tuples.
  • Offsets are now committed asynchronously instead of synchronously, which can improve throughput.
  • Offsets are now committed after tuple submission. In previous versions, offsets were committed immediately after messages had been buffered internally, before tuple submission. (issue #76)
  • New operator parameter commitCount to specify the commit period in terms of the number of submitted tuples when the operator does not participate in a consistent region; see the sketch following this list.
  • Offset commit on drain when the operator is part of a consistent region. (issue #95)
  • corrected issue #96 - KafkaConsumer: final de-assignment via control port does not work
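
A minimal sketch of the new commitCount parameter for a consumer outside a consistent region (the value 1000 is illustrative only):

    // Sketch only: commit offsets after every 1000 submitted tuples instead of
    // relying on the operator's default commit period.
    stream<rstring message, rstring key> Messages = KafkaConsumer() {
        param
            propertiesFile: "etc/consumer.properties";
            topic: "test";
            commitCount: 1000;
    }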

Kafka Toolkit v1.3.3

07 May 10:48

This release contains the following fixes and enhancements:

  • internationalization (translated messages) for de_DE, es_ES, fr_FR, it_IT, ja_JP, ko_KR, pt_BR, ru_RU, zh_CN, and zh_TW
  • fixed issue #88 - Consumer op is tracing per tuple at debug level

Known issues

  • When Kafka's group management is enabled (KafkaConsumer not in a consistent region and the startPosition parameter unset or Default), the KafkaConsumer can silently stop consuming messages when committing Kafka offsets fails (#98). As a workaround, the consumer property enable.auto.commit=true can be set in a properties file or app option.

Kafka Toolkit v1.3.2

13 Apr 11:22

This release is a bugfix release that contains the following fixes:

  • #84 - Add a KafkaTransactionSample to the samples folder
  • #83 - Producer behavior when control topic cannot be auto-created? (avoid NullPointerException)
  • #82 - Transactional consistent region policy may commit duplicate transactions

Known issues

  • When Kafka's group management is enabled (KafkaConsumer not in a consistent region and the startPosition parameter unset or Default), the KafkaConsumer can silently stop consuming messages when committing Kafka offsets fails (#98). As a workaround, the consumer property enable.auto.commit=true can be set in a properties file or app option.

Kafka Toolkit v1.3.1

26 Mar 12:28

This release fixes the following issues:

  • #75 - KafkaConsumer: InterruptedException not handled properly
  • #79 - Consumer commits same offset multiple times
  • SPL Documentation: #77, #78, #80

Restrictions

  • The Transactional value for the consistentRegionPolicy parameter requires that a control topic named __streams_control_topic exists on the Kafka broker to store committed transaction sequence numbers. To ensure exactly-once delivery semantics, this topic must be created with only one partition. Even if the broker is configured for automatic topic creation, the topic should be created manually to avoid it being created automatically with multiple partitions, as determined by the broker property num.partitions.
    See also issue #82.

Known issues

  • When Kafka's group management is enabled (KafkaConsumer not in a consistent region and the startPosition parameter unset or Default), the KafkaConsumer can silently stop consuming messages when committing Kafka offsets fails (#98). As a workaround, the consumer property enable.auto.commit=true can be set in a properties file or app option.