
KIP-932 - Queues for Kafka - Analysis of EDA Compatibility

kafka eda kip queues analysis
Luke Taylor

Overview

After reading Gunnar Morling’s blog post investigating queues for Kafka, I started wondering whether the way queues are implemented is at odds with EDA principles and best practices. This is a fairly formal analysis, mostly because of the number of formal concerns.

This analysis examines whether KIP-932 “Queues for Kafka” contradicts or undermines event-driven architecture (EDA) principles. The key question is whether letting multiple consumers share a partition, with records acknowledged individually, represents a fundamental departure from EDA principles or simply introduces an additional consumption model within Kafka’s event streaming paradigm.

Key Finding: KIP-932 does not fundamentally change Kafka’s nature as an event streaming platform but rather supplements it with queue-like processing capabilities. It maintains the core EDA principle of events as the source of truth while adding flexibility in consumption patterns.

High Value Use Cases

These are scenarios where KIP-932 provides substantial benefits:

  1. Partition Count Limitations: Services that need more consumer parallelism than the current partition count allows, enabling scaling beyond the traditional “one consumer per partition” constraint
  2. Cost Optimization: Reduces the operational cost of maintaining large numbers of partitions in cloud-hosted Kafka deployments (especially relevant for Confluent Cloud pricing models)
  3. Processing Parallelism: Enables parallel processing of events within a partition, improving throughput for workloads that do not depend on strict per-partition or per-key ordering (share groups do not guarantee delivery order across consumers)
  4. Dynamic Scaling: Allows horizontal scaling of consumers in response to load without reconfiguring partition counts, especially valuable in container/Kubernetes environments with tools like KEDA - this is basically an unlock to make KEDA amazing!
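
To make the first and fourth points concrete, here is a toy model of how many consumers can do useful work under each consumption model. It is plain Python rather than Kafka client code, and the function names and the fixed per-consumer processing rate are assumptions made purely for illustration.

```python
def active_in_consumer_group(partitions, consumers):
    """Classic consumer group: a partition is owned by at most one member,
    so members beyond the partition count sit idle."""
    return min(partitions, consumers)


def active_in_share_group(partitions, consumers):
    """Share group: members may fetch from the same partition, so every
    member can be active as long as the topic has at least one partition."""
    return consumers if partitions > 0 else 0


def records_per_second(active_consumers, records_per_second_per_consumer):
    """Idealised throughput that ignores broker limits, batching and network cost."""
    return active_consumers * records_per_second_per_consumer


partitions, consumers, rate = 6, 20, 20
classic = records_per_second(active_in_consumer_group(partitions, consumers), rate)
shared = records_per_second(active_in_share_group(partitions, consumers), rate)
print(f"consumer group: {classic} records/s, share group: {shared} records/s")
```

The model deliberately ignores ordering: the extra throughput in the share group case is only usable when records can be processed independently of one another.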

KIP-932 Overview

KIP-932 introduces cooperative consumption through “share groups,” allowing multiple consumers to collectively process messages from the same partition. This feature addresses a significant limitation in Kafka’s original consumer group model, which ties scaling to partition count.

Core Features of KIP-932

  1. Share Groups: Multiple consumers can process records from the same partition
  2. Record-level Acknowledgment: Consumers ack individual records rather than offsets
  3. Key-agnostic Distribution: Records are handed to consumers without regard to key; KIP-932 does not route records with the same key to the same consumer
  4. Work Sharing: Allows horizontal scaling of consumers independent of partition count
  5. Cooperative Processing: Consumers work together on partitions rather than exclusively owning them
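
The difference between offset-based and record-level acknowledgment is easiest to see when one record in a batch fails. The sketch below is a simplified model in plain Python, not Kafka client code. It assumes an offset-based consumer that only commits up to the last contiguous success, which is one common pattern rather than the only one, and both function names are hypothetical.

```python
def redelivered_with_offset_commit(results):
    """Offset-based: the committed position stops at the first failed offset,
    so that record and everything after it is fetched again after a restart."""
    offsets = sorted(results)
    for offset in offsets:
        if not results[offset]:
            return [o for o in offsets if o >= offset]
    return []


def redelivered_with_record_ack(results):
    """Record-level: each record is acknowledged on its own, so only the
    failed records become available for another delivery attempt."""
    return [offset for offset in sorted(results) if not results[offset]]


# Offsets 0-4 were processed; only offset 2 failed.
outcomes = {0: True, 1: True, 2: False, 3: True, 4: True}
print(redelivered_with_offset_commit(outcomes))  # [2, 3, 4]
print(redelivered_with_record_ack(outcomes))     # [2]
```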

Compatibility with Event-Driven Architecture Principles

| EDA Principle | KIP-932 Compatibility | Analysis |
| --- | --- | --- |
| Events as First-class Citizens | ✅ Compatible | Events remain the fundamental unit and source of truth |
| Event Log as Source of Truth | ✅ Compatible | The log structure remains unchanged; new consumption model added on top |
| Decoupling of Producers/Consumers | ✅ Compatible | Maintains or enhances decoupling by allowing flexible consumption |
| Immutability of Events | ✅ Compatible | Events remain immutable in the log |
| Temporal Ordering | ✅ Compatible | The log preserves temporal ordering within partitions; delivery order across the consumers of a share group is not guaranteed |
| Event Replay Capability | ✅ Compatible | Log replay capabilities remain intact |

Architectural Implications

1. What KIP-932 Changes

  • Consumption Model: Introduces queue-like consumption patterns without changing event production
  • Scaling Model: Decouples consumer scaling from partition count
  • Message Processing Guarantees: Enables record-level acknowledgment rather than just offset-based
  • Consumer Coordination: Allows cooperative work on partitions instead of exclusive ownership

2. What KIP-932 Preserves

  • Log-based Storage: Events are still stored in an immutable, append-only log
  • Event-first Paradigm: Events remain the primary integration mechanism
  • Producer Independence: Producers are unaffected and continue working exactly as before
  • Event Replay: The ability to replay events from any point remains intact
  • Temporal Order: Order of events within partitions remains preserved

Queue vs. EDA: Feature Comparison

| Feature | Traditional Queue | Traditional Kafka | Kafka with KIP-932 |
| --- | --- | --- | --- |
| Message Retention | Removed after processing | Configurable retention | Configurable retention |
| Consumption Model | Competing consumers | Consumer groups tied to partitions | Flexible: Traditional or cooperative |
| Processing Acknowledgment | Message-level ack | Offset-based | Both offset and record-level available |
| Message Replay | Limited/None | Full replay capability | Full replay capability |
| Scalability | Limited by competing consumer model | Limited by partition count | Independent of partition count |

Assessment of Architectural Impact

KIP-932 represents an evolutionary rather than revolutionary change to Kafka’s architecture. It adds queue-like features while preserving the core event streaming foundation. The primary architectural implication is increased flexibility in how events are consumed, not a fundamental change to how events are produced, stored, or conceptualized.

Key Analysis Points:

  1. Adding vs. Replacing: KIP-932 adds capabilities rather than replacing existing ones
  2. Opt-in Feature: Traditional consumer groups remain fully supported
  3. Log Foundation: The underlying log-based architecture remains unchanged
  4. Event Immutability: Events remain immutable in the log, preserving a core EDA principle
  5. Temporal Ordering: Event ordering within partitions remains preserved

Recommendations for Maintaining EDA Principles

For teams concerned about maintaining pure EDA principles while adopting KIP-932:

  1. Maintain Event-First Thinking: Continue to model domain changes as events
  2. Use Share Groups Judiciously: Apply queue-like processing only where scaling or specific message distribution is needed
  3. Preserve Event Sourcing Patterns: Continue using events as the source of truth
  4. Document Consumption Models: Clearly separate traditional consumer groups from share groups in documentation
  5. Establish Architecture Guidelines: Create clear guidelines for when each consumption model is appropriate

Technical Implementation Considerations

KIP-932 aligns with Kafka’s core principles while extending consumption capabilities. These are specific considerations when implementing:

Technical Implementation Focus Areas

| Area | Specific Consideration | Implementation Approach |
| --- | --- | --- |
| Key-Based Processing | Producers still partition by key, but share groups distribute records without regard to key and do not preserve per-key order across consumers | Keep existing key-partitioning strategies for producers and traditional consumer groups; use share groups only where handlers tolerate out-of-order or redelivered records |
| Consumer Code Changes | Moving from offset commits to record acknowledgment requires specific API usage | Use the share consumer API explicitly (KafkaShareConsumer in the Java client) rather than attempting to modify current Consumer Group implementations |
| Unkeyed Topics | Share groups fit best where records are independent of one another, which is typical of unkeyed topics; they offer little where message ordering across the topic is important | Reserve share groups for work that can be processed in any order |
| Rebalance Handling | Share group rebalancing behavior differs from consumer groups and requires specific handling | Implement explicit tests for rebalance scenarios; behavior is well-defined but different |
| Client Library Support | Adoption will depend on client library implementation across languages | Verify share group support in your programming language’s client libraries before planning implementation |

Organizational Adoption Focus

| Area | Specific Focus | Implementation Approach |
| --- | --- | --- |
| Usage Guidelines | Define clear criteria for when share groups are appropriate (e.g., when partition count limits are reached, when scaling is needed for throughput) | Document specific use cases with concrete examples, focusing on situations where consumer scaling shouldn’t be constrained by partition count |
| Team Knowledge | Ensure engineers understand that share groups do not provide the ordering guarantees of traditional consumer groups | Focused training on how share groups differ from consumer groups in ordering, acknowledgment, and redelivery |
| Implementation Consistency | Standardize how teams implement record acknowledgment and error handling | Create organization-specific client wrappers with standardized acknowledgment patterns; focus especially on error scenarios |
| Organizational Alignment | Address potential disagreements on adoption by focusing on metrics and use cases | Establish objective criteria for adoption such as CPU utilization improvements, throughput gains, or reduced partition count |
| Quick Wins | Identify existing bottlenecks that are good candidates for share groups | Target services with known key hotspots or those that require partition counts that exceed reasonable management overhead |

Specific Application Scenarios for KIP-932

KIP-932 addresses very specific technical challenges that occur in real-world Kafka deployments:

Implementation Checklist

For teams preparing to adopt Share Groups, focus on these specific technical aspects:

  1. Consumer Parallelism Analysis: Use monitoring tools to identify consumer groups that would benefit from additional parallelism beyond partition count; look for services with high lag or slower message processing times where adding more consumers would improve throughput
  2. Consumer Logic Review: Examine current consumer implementation to ensure idempotent processing, if required
  3. Client Library Verification: Confirm your client library supports share groups and record-level acknowledgment (in the Java client this is the KafkaShareConsumer class)
  4. Partition Count Optimization: Calculate optimal partition count based on producer throughput rather than consumer parallelism requirements; this enables right-sizing partition counts to data volume rather than scaling needs
  5. Order-Sensitivity Assessment: Identify whether your processing depends on ordering. Share groups do not guarantee per-key or per-partition ordering across consumers, so they suit records that can be processed independently; where key-level or global ordering is required, a traditional consumer group remains the better fit
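
For the consumer logic review, the main thing to check is that handling the same record twice is harmless, because a record whose acknowledgment is lost or delayed can be delivered again. The following is a minimal sketch of one way to do that, assuming records can be identified by their (topic, partition, offset) coordinates. The class, the account/amount payload and the in-memory set are all hypothetical; a real service would keep the processed IDs in durable storage, ideally updated in the same transaction as the side effect.

```python
class IdempotentHandler:
    """Applies each record's effect at most once, keyed by its log coordinates."""

    def __init__(self):
        self.processed = set()   # (topic, partition, offset) already applied
        self.totals = {}         # example side effect: running total per account

    def handle(self, topic, partition, offset, account, amount):
        record_id = (topic, partition, offset)
        if record_id in self.processed:
            return False  # duplicate delivery, nothing to do
        self.totals[account] = self.totals.get(account, 0) + amount
        self.processed.add(record_id)
        return True


handler = IdempotentHandler()
handler.handle("payments", 0, 17, "acct-1", 50)
handler.handle("payments", 0, 17, "acct-1", 50)  # redelivery of the same record
print(handler.totals)  # {'acct-1': 50}
```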

Technical Implementation Guide

Specific guidance for engineering teams implementing share groups:

  1. Record Acknowledgment Pattern: Implement consistent record acknowledgment patterns with proper error handling to ensure processing reliability
  2. Key Distribution Analysis: Analyze your message key distribution and how much your handlers depend on per-key order; workloads whose records can be processed independently will benefit most from share groups
  3. Monitoring Instrumentation: Add specific metrics for share group operations - tracking record processing times, acknowledgment rates, and consumer resource utilization
  4. Scaling Automation: Integrate with container orchestration platforms like Kubernetes to enable dynamic scaling based on message processing metrics
  5. State Management: Review your application’s state management approach, as a consumer no longer owns a partition and may receive any record from it, so per-partition local state can no longer be assumed
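
The record acknowledgment pattern from the first item can be sketched as a small decision function. The three outcomes mirror the acknowledgment types described in KIP-932 (accept, release, reject), but everything else here is an assumption for illustration: the enum, the `TransientError` class and the example handler are hypothetical and are not part of any Kafka client.

```python
from enum import Enum


class Ack(Enum):
    ACCEPT = "accept"    # processed successfully
    RELEASE = "release"  # give the record back so it can be delivered again
    REJECT = "reject"    # record cannot be processed; do not retry it


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, throttling)."""


def decide_ack(record, handler):
    """Run the handler and map its outcome to an acknowledgment decision."""
    try:
        handler(record)
    except TransientError:
        return Ack.RELEASE
    except Exception:
        return Ack.REJECT
    return Ack.ACCEPT


def example_handler(record):
    if record == "timeout":
        raise TransientError("downstream service timed out")
    if not record:
        raise ValueError("empty payload")


for payload in ["order-1", "timeout", ""]:
    print(repr(payload), decide_ack(payload, example_handler).value)
```

Centralising this mapping in one place is what makes a shared client wrapper useful: every team then treats transient and permanent failures the same way.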

Technical Implementation Details

Specific technical aspects of KIP-932 implementation that engineers should understand:

Share Group Protocol Specifics

Key technical details of the share group protocol implementation:

  • Record Delivery Mechanism: Share groups use record-level acknowledgment, enabling multiple consumers to process messages from the same partition concurrently
  • Broker-side Management: The broker maintains the state of acknowledged records, extending Kafka’s traditional broker responsibilities (a departure from Kafka’s traditional “dumb brokers, smart clients” philosophy)
  • Message Distribution: The broker hands available records to whichever consumer in the share group fetches next, without regard to key, which is what enables processing parallelism within a partition
  • Rebalance Protocol: Membership and partition assignment are managed broker-side by the group coordinator, in the style of the new consumer group protocol from KIP-848, rather than through the client-side cooperative rebalancing introduced in KIP-429; KIP-848 was not yet available in Confluent Cloud at the time of writing
  • Consumer Coordination: Share group consumers coordinate processing through the broker rather than through direct partition ownership
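
The broker-side bookkeeping described above can be pictured as a small per-record state machine. The sketch below is a simplified model in plain Python, assuming the four states KIP-932 describes (available, acquired, acknowledged, archived) and a cap on delivery attempts. The class name and the limit value are illustrative, and acquisition-lock timeouts are treated the same as an explicit release.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    AVAILABLE = "available"
    ACQUIRED = "acquired"
    ACKNOWLEDGED = "acknowledged"
    ARCHIVED = "archived"


@dataclass
class TrackedRecord:
    offset: int
    max_deliveries: int = 5  # illustrative limit, not a verified broker default
    state: State = State.AVAILABLE
    delivery_count: int = 0

    def _require_acquired(self):
        if self.state is not State.ACQUIRED:
            raise ValueError(f"offset {self.offset} is {self.state.value}, not acquired")

    def acquire(self):
        """Hand the record to a consumer and count the delivery attempt."""
        if self.state is not State.AVAILABLE:
            raise ValueError(f"offset {self.offset} is {self.state.value}, not available")
        self.state = State.ACQUIRED
        self.delivery_count += 1

    def accept(self):
        self._require_acquired()
        self.state = State.ACKNOWLEDGED

    def reject(self):
        self._require_acquired()
        self.state = State.ARCHIVED

    def release(self):
        """Return the record for redelivery, or archive it once attempts run out."""
        self._require_acquired()
        exhausted = self.delivery_count >= self.max_deliveries
        self.state = State.ARCHIVED if exhausted else State.AVAILABLE


record = TrackedRecord(offset=42, max_deliveries=2)
record.acquire()
record.release()  # first attempt failed, record is available again
record.acquire()
record.release()  # second attempt failed, delivery limit reached
print(record.state.value, record.delivery_count)  # archived 2
```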

Monitoring Metrics That Matter

Specific metrics to implement for share group monitoring:

  • Record Processing Latency: Track processing time for records to identify throughput bottlenecks by implementing custom metrics in your consumer application
  • Message Consumption Rate: Monitor the rate at which messages are being consumed by the share group
  • Record Acknowledgment Rate: Track record acknowledgment rates to identify processing issues
  • Consumer Resource Utilization: Monitor CPU, memory, and network usage per consumer to optimize scaling
  • Rebalance Frequency & Duration: Measure rebalance operations which may affect processing latency
  • Unacknowledged Record Count: Track records that remain unacknowledged beyond expected processing timeframes
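
Two of these metrics, processing latency and unacknowledged record count, can be captured with a small tracker inside the consumer application. This is a minimal sketch under the assumption that the application can hook the moment a record is received and the moment it is acknowledged. The class and method names are hypothetical, and timestamps are passed in explicitly so the example is deterministic; in a real service they would come from something like `time.monotonic()`.

```python
class AckTracker:
    """Tracks in-flight records and how long each took to acknowledge."""

    def __init__(self, stale_after_seconds):
        self.stale_after_seconds = stale_after_seconds
        self.delivered_at = {}  # (partition, offset) -> time the record was received
        self.latencies = []     # seconds between receipt and acknowledgment

    def delivered(self, partition, offset, now):
        self.delivered_at[(partition, offset)] = now

    def acknowledged(self, partition, offset, now):
        started = self.delivered_at.pop((partition, offset))
        self.latencies.append(now - started)

    def unacknowledged_count(self):
        return len(self.delivered_at)

    def stale(self, now):
        """Records that have been in flight longer than the expected timeframe."""
        return sorted(
            key for key, started in self.delivered_at.items()
            if now - started > self.stale_after_seconds
        )


tracker = AckTracker(stale_after_seconds=30)
tracker.delivered(0, 10, now=0.0)
tracker.delivered(0, 11, now=1.0)
tracker.acknowledged(0, 10, now=0.5)
print(tracker.unacknowledged_count(), tracker.stale(now=40.0))  # 1 [(0, 11)]
```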

Architectural Solutions Enabled by KIP-932

Specific architectural patterns that share groups enable:

  1. Parallel Processing Model: Process messages from the same partition in parallel where the workload tolerates out-of-order processing
  2. Consumer Scaling Beyond Partition Limits: Scale consumers beyond the traditional partition count limitation
  3. Partition Count Optimization: Size partition count based on producer throughput and storage requirements rather than consumer scaling needs
  4. Dynamic Consumer Scaling: Scale consumers up/down independently of partition structure
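
The last two patterns reduce to simple arithmetic once the two sizing decisions are separated. The sketch below is illustrative only: the per-partition throughput figure, the backlog target and the function names are assumptions, and the replica formula is a generic backlog-based rule rather than a reproduction of how KEDA or any other autoscaler computes its target.

```python
import math


def partitions_for_throughput(target_mb_per_s, mb_per_s_per_partition):
    """Size partitions from producer throughput alone."""
    return max(1, math.ceil(target_mb_per_s / mb_per_s_per_partition))


def desired_replicas(backlog, backlog_per_replica, max_replicas, partition_cap=None):
    """Size consumers from backlog; partition_cap models a classic consumer group,
    where replicas beyond the partition count would sit idle."""
    wanted = max(1, math.ceil(backlog / backlog_per_replica))
    wanted = min(wanted, max_replicas)
    if partition_cap is not None:
        wanted = min(wanted, partition_cap)
    return wanted


partitions = partitions_for_throughput(target_mb_per_s=50, mb_per_s_per_partition=10)
classic = desired_replicas(12000, 1000, max_replicas=30, partition_cap=partitions)
shared = desired_replicas(12000, 1000, max_replicas=30)
print(partitions, classic, shared)  # 5 5 12
```

With a classic consumer group the only way to get from 5 to 12 useful replicas is to add partitions; with a share group the partition count can stay at what the producers need.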

Comparison with RabbitMQ

| System | Feature | Specific Technical Comparison to KIP-932 |
| --- | --- | --- |
| RabbitMQ | Competing Consumers | KIP-932 offers comparable competing-consumer semantics, without per-key ordering guarantees, while combining queue-like processing with Kafka’s immutable log model for replay capabilities |

Conclusion: Technical Reality

KIP-932 “Queues for Kafka” represents a natural evolution of Kafka’s consumption model that addresses real operational challenges without compromising its fundamental principles. The key points to understand:

  1. Alignment with Core Principles: Share groups leave Kafka’s log, key-based partitioning, and existing consumer groups unchanged; they add a consumption mode that trades ordering guarantees for flexibility

  2. Performance Optimization: Share groups enable more efficient resource utilization by allowing consumer scaling independent of partition count constraints

  3. Technical Continuity: The feature preserves event immutability, temporal ordering, log persistence, and replay capabilities - core EDA principles remain fully intact

  4. Implementation Considerations: Share groups introduce record-level acknowledgment and new broker responsibilities, representing a shift from Kafka’s traditional “dumb brokers, smart clients” approach

  5. Operational Benefits: Direct benefits include right-sizing partition counts based on data needs rather than scaling constraints, enabling more flexible consumer scaling models, and optimizing resource utilization

The value proposition is clear: KIP-932 adds capabilities that address real operational constraints while preserving Kafka’s core architectural strengths. It enhances Kafka’s consumption model without requiring teams to compromise on EDA principles, making it a pragmatic enhancement that respects Kafka’s fundamental design philosophy. For additional insights, Gunnar Morling provides an excellent overview in his analysis.
