Overview #
After reading Gunnar Morling’s blog post investigating queues for Kafka, I started wondering whether the way queues are implemented is at odds with EDA principles and best practices. This is a fairly formal analysis, mostly due to the number of formal concerns.
This analysis examines whether KIP-932 “Queues for Kafka” contradicts or undermines event-driven architecture (EDA) principles. The key question is whether letting multiple consumers share the records of a single partition represents a fundamental departure from EDA principles or simply introduces an additional consumption model within Kafka’s event streaming paradigm.
Key Finding: KIP-932 does not fundamentally change Kafka’s nature as an event streaming platform but rather supplements it with queue-like processing capabilities. It maintains the core EDA principle of events as the source of truth while adding flexibility in consumption patterns.
High Value Use Cases #
These represent scenarios where KIP-932 provides substantial benefits:
- Partition Count Limitations: Services that need more consumer parallelism than the current partition count allows, enabling scaling beyond the traditional “one consumer per partition” constraint
- Cost Optimization: Reduces the operational cost of maintaining large numbers of partitions in cloud-hosted Kafka deployments (especially relevant for Confluent Cloud pricing models)
- Processing Parallelism: Enables parallel processing of events within a partition, improving throughput for workloads that do not depend on strict processing order (KIP-932 as accepted does not guarantee per-key ordering across the consumers of a share group)
- Dynamic Scaling: Allows horizontal scaling of consumers in response to load without reconfiguring partition counts, especially valuable in container/Kubernetes environments with tools like KEDA - this is basically an unlock to make KEDA amazing!
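To make the scaling point concrete, here is a minimal sketch of a lag-based replica calculation. It is a simplified model written for this post, not KEDA's actual algorithm; the function name, parameters, and numbers are all hypothetical. The only thing it illustrates is that a classic consumer group gains nothing from replicas beyond the partition count, while a share group is not capped that way.

```python
import math
from typing import Optional


def desired_replicas(total_lag: int, lag_threshold: int, max_replicas: int,
                     partition_count: Optional[int] = None) -> int:
    """Return a lag-based replica target (simplified, hypothetical model).

    partition_count=None models a share group: no partition-based cap.
    An integer models a classic consumer group, where consumers beyond
    the partition count would sit idle.
    """
    if lag_threshold <= 0:
        raise ValueError("lag_threshold must be positive")
    wanted = math.ceil(total_lag / lag_threshold)
    cap = max_replicas
    if partition_count is not None:
        cap = min(cap, partition_count)
    # Always keep at least one consumer running.
    return max(1, min(wanted, cap))


if __name__ == "__main__":
    # 10,000 records of lag, one replica per 500 records, at most 30 replicas.
    print(desired_replicas(10_000, 500, 30, partition_count=6))  # 6
    print(desired_replicas(10_000, 500, 30))                     # 20
```

With six partitions the consumer group tops out at six useful replicas, whereas the share group can use all twenty that the lag calls for.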
KIP-932 Overview #
KIP-932 introduces cooperative consumption through “share groups,” allowing multiple consumers to collectively process messages from the same partition. This feature addresses a significant limitation in Kafka’s original consumer group model, which ties scaling to partition count.
Core Features of KIP-932 #
- Share Groups: Multiple consumers can process records from the same partition
- Record-level Acknowledgment: Consumers ack individual records rather than offsets
- Delivery Without Key Affinity: Records are handed to whichever consumer in the group fetches them; KIP-932 as accepted does not route messages with the same key to the same consumer
- Work Sharing: Allows horizontal scaling of consumers independent of partition count
- Cooperative Processing: Consumers work together on partitions rather than exclusively owning them
Compatibility with Event-Driven Architecture Principles #
EDA Principle | KIP-932 Compatibility | Analysis |
---|---|---|
Events as First-class Citizens | ✅ Compatible | Events remain the fundamental unit and source of truth |
Event Log as Source of Truth | ✅ Compatible | The log structure remains unchanged; new consumption model added on top |
Decoupling of Producers/Consumers | ✅ Compatible | Maintains or enhances decoupling by allowing flexible consumption |
Immutability of Events | ✅ Compatible | Events remain immutable in the log |
Temporal Ordering | ✅ Compatible | Preserves temporal ordering within partitions in the log; processing order across share-group consumers is not guaranteed |
Event Replay Capability | ✅ Compatible | Log replay capabilities remain intact |
Architectural Implications #
1. What KIP-932 Changes #
- Consumption Model: Introduces queue-like consumption patterns without changing event production
- Scaling Model: Decouples consumer scaling from partition count
- Message Processing Guarantees: Enables record-level acknowledgment rather than just offset-based
- Consumer Coordination: Allows cooperative work on partitions instead of exclusive ownership
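The difference between offset-based and record-level acknowledgment is easier to see with a small example. The sketch below is a toy model, not Kafka client code, and both function names are made up for illustration. It assumes a consumer that only commits a contiguous prefix of processed offsets, which is the usual way to avoid skipping a failed record.

```python
def next_offset_to_commit(start: int, succeeded: set) -> int:
    """Offset-based model: progress stops at the first record that did not succeed."""
    nxt = start
    while nxt in succeeded:
        nxt += 1
    return nxt


def unacknowledged(start: int, end: int, acked: set) -> list:
    """Record-level model: only the records that were never acknowledged remain."""
    return [offset for offset in range(start, end) if offset not in acked]


if __name__ == "__main__":
    succeeded = {0, 1, 2, 4, 5}  # offset 3 failed
    # Offset-based: the commit position stays at 3, so 4 and 5 are re-read after a restart.
    print(next_offset_to_commit(0, succeeded))  # 3
    # Record-level: only offset 3 is still outstanding.
    print(unacknowledged(0, 6, succeeded))      # [3]
```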
2. What KIP-932 Preserves #
- Log-based Storage: Events are still stored in an immutable, append-only log
- Event-first Paradigm: Events remain the primary integration mechanism
- Producer Independence: Producers are unaffected and continue working exactly as before
- Event Replay: The ability to replay events from any point remains intact
- Temporal Order: Order of events within partitions remains preserved
Queue vs. EDA: Feature Comparison #
Feature | Traditional Queue | Traditional Kafka | Kafka with KIP-932 |
---|---|---|---|
Message Retention | Removed after processing | Configurable retention | Configurable retention |
Consumption Model | Competing consumers | Consumer groups tied to partitions | Flexible: Traditional or cooperative |
Processing Acknowledgment | Message-level ack | Offset-based | Both offset and record-level available |
Message Replay | Limited/None | Full replay capability | Full replay capability |
Scalability | Limited by competing consumer model | Limited by partition count | Independent of partition count |
Assessment of Architectural Impact #
KIP-932 represents an evolutionary rather than revolutionary change to Kafka’s architecture. It adds queue-like features while preserving the core event streaming foundation. The primary architectural implication is increased flexibility in how events are consumed, not a fundamental change to how events are produced, stored, or conceptualized.
Key Analysis Points: #
- Adding vs. Replacing: KIP-932 adds capabilities rather than replacing existing ones
- Opt-in Feature: Traditional consumer groups remain fully supported
- Log Foundation: The underlying log-based architecture remains unchanged
- Event Immutability: Events remain immutable in the log, preserving a core EDA principle
- Temporal Ordering: Event ordering within partitions remains preserved
Recommendations for Maintaining EDA Principles #
For teams concerned about maintaining pure EDA principles while adopting KIP-932:
- Maintain Event-First Thinking: Continue to model domain changes as events
- Use Share Groups Judiciously: Apply queue-like processing only where scaling or specific message distribution is needed
- Preserve Event Sourcing Patterns: Continue using events as the source of truth
- Document Consumption Models: Clearly separate traditional consumer groups from share groups in documentation
- Establish Architecture Guidelines: Create clear guidelines for when each consumption model is appropriate
Technical Implementation Considerations #
KIP-932 aligns with Kafka’s core principles while extending consumption capabilities. These are specific considerations when implementing:
Technical Implementation Focus Areas #
Area | Specific Consideration | Implementation Approach |
---|---|---|
Key-Based Processing | KIP-932 does not extend Kafka’s key-based partitioning to consumption: records with the same key still land in the same partition, but different consumers in a share group may process them concurrently | Keep existing key-partitioning strategies on the producer side; where per-key processing order matters, stay on consumer groups or enforce ordering in the application |
Consumer Code Changes | Moving from offset commits to record acknowledgment requires specific API usage | Use the share consumer API explicitly (`KafkaShareConsumer` in the Java client) rather than attempting to modify current Consumer Group implementations |
Unkeyed Topics | Share groups fit best where records are independent of one another, which is often the case for unkeyed topics; they are a poor fit where message ordering across the topic is important | Reserve share groups for topics where the primary concern is scaling the processing of independent records |
Rebalance Handling | Share group rebalancing behavior differs from consumer groups and requires specific handling | Implement explicit tests for rebalance scenarios; behavior is well-defined but different |
Client Library Support | Adoption will depend on client library implementation across languages | Verify Share Group API support in your programming language’s client libraries before planning implementation |
Organizational Adoption Focus #
Area | Specific Focus | Implementation Approach |
---|---|---|
Usage Guidelines | Define clear criteria for when share groups are appropriate (e.g., when partition count limits are reached, when scaling is needed for throughput) | Document specific use cases with concrete examples, focusing on situations where consumer scaling shouldn’t be constrained by partition count |
Team Knowledge | Ensure engineers understand which ordering guarantees share groups keep (order in the log) and which they give up (processing order across consumers) | Focused training on how share groups add to rather than change Kafka’s fundamental log properties, and on when ordered consumer groups are still the right choice |
Implementation Consistency | Standardize how teams implement record acknowledgment and error handling | Create organization-specific client wrappers with standardized acknowledgment patterns; focus especially on error scenarios |
Organizational Alignment | Address potential disagreements on adoption by focusing on metrics and use cases | Establish objective criteria for adoption such as CPU utilization improvements, throughput gains, or reduced partition count |
Quick Wins | Identify existing bottlenecks that are perfect candidates for share groups | Target services with known key hotspots or those that require partition counts that exceed reasonable management overhead |
Specific Application Scenarios for KIP-932 #
KIP-932 addresses very specific technical challenges that occur in real-world Kafka deployments. The checklist and guidance below cover what to look at before adopting it.
Implementation Checklist #
For teams preparing to adopt Share Groups, focus on these specific technical aspects:
- Consumer Parallelism Analysis: Use monitoring tools to identify consumer groups that would benefit from additional parallelism beyond partition count; look for services with high lag or slower message processing times where adding more consumers would improve throughput
- Consumer Logic Review: Examine current consumer implementation to ensure idempotent processing, if required
- Client Library Verification: Confirm your client library implementation has proper support for the ShareGroup API and record-level acknowledgment
- Partition Count Optimization: Calculate optimal partition count based on producer throughput rather than consumer parallelism requirements; this enables right-sizing partition counts to data volume rather than scaling needs
- Order-Sensitivity Assessment: Identify whether your processing depends on per-key or per-partition ordering. Share groups do not guarantee processing order across consumers, so they suit workloads where records can be handled independently; where ordering is required, consumer groups remain the better fit
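For the consumer logic review, a common way to get idempotent processing is to deduplicate on the record's coordinates. The sketch below assumes that (topic, partition, offset) uniquely identifies a record and keeps the seen set in memory, which would not survive a restart; a real service would more likely use a database constraint or a business key. The class and method names are hypothetical.

```python
from typing import Callable, Set, Tuple

RecordId = Tuple[str, int, int]  # (topic, partition, offset)


class IdempotentProcessor:
    """Skips records that were already handled (in-memory sketch only)."""

    def __init__(self, handler: Callable[[bytes], None]) -> None:
        self._handler = handler
        self._seen: Set[RecordId] = set()

    def process(self, topic: str, partition: int, offset: int, value: bytes) -> bool:
        """Run the handler once per record; return False for a duplicate delivery."""
        record_id = (topic, partition, offset)
        if record_id in self._seen:
            return False
        self._handler(value)
        # Mark as seen only after the handler succeeded, so a failed
        # record can still be processed when it is redelivered.
        self._seen.add(record_id)
        return True


if __name__ == "__main__":
    handled = []
    processor = IdempotentProcessor(handled.append)
    processor.process("orders", 0, 42, b"created")
    processor.process("orders", 0, 42, b"created")  # redelivery, ignored
    print(len(handled))  # 1
```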
Technical Implementation Guide #
Specific guidance for engineering teams implementing share groups:
- Record Acknowledgment Pattern: Implement consistent record acknowledgment patterns with proper error handling to ensure processing reliability
- Key Distribution Analysis: Analyze your message key distribution to understand potential benefits; topics with diverse keys will benefit most from share groups
- Monitoring Instrumentation: Add specific metrics for share group operations - tracking record processing times, acknowledgment rates, and consumer resource utilization
- Scaling Automation: Integrate with container orchestration platforms like Kubernetes to enable dynamic scaling based on message processing metrics
- State Management: Review your application’s state management approach, as consumers no longer own entire partitions and therefore cannot assume that all records for a partition (or key) pass through the same instance
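One possible shape for a consistent acknowledgment pattern is to map handler outcomes onto the three acknowledgment results that KIP-932 describes: accept, release (make the record available for another delivery attempt), and reject (give up on the record). The sketch below is plain Python with a local stand-in enum; `Ack`, `TransientError`, and `process_batch` are hypothetical names, and which exceptions count as transient is an assumption each team has to make for itself.

```python
from enum import Enum
from typing import Callable, Dict, Iterable, Tuple


class Ack(Enum):
    ACCEPT = "accept"    # processed successfully
    RELEASE = "release"  # temporary failure, let the broker redeliver
    REJECT = "reject"    # unprocessable, do not redeliver


class TransientError(Exception):
    """Raised by a handler for failures that are worth retrying."""


def decide_ack(handler: Callable[[str], None], value: str) -> Ack:
    """Run the handler and translate its outcome into an acknowledgment."""
    try:
        handler(value)
    except TransientError:
        return Ack.RELEASE
    except Exception:
        return Ack.REJECT
    return Ack.ACCEPT


def process_batch(records: Iterable[Tuple[int, str]],
                  handler: Callable[[str], None]) -> Dict[int, Ack]:
    """Return one acknowledgment decision per (offset, value) record."""
    return {offset: decide_ack(handler, value) for offset, value in records}


if __name__ == "__main__":
    def handler(value: str) -> None:
        if value == "timeout":
            raise TransientError("downstream unavailable")
        if value == "garbage":
            raise ValueError("cannot parse")

    print(process_batch([(0, "ok"), (1, "timeout"), (2, "garbage")], handler))
```

Keeping the decision in one function like this is what makes it practical to standardize behavior in a shared client wrapper.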
Technical Implementation Details #
Specific technical aspects of KIP-932 implementation that engineers should understand:
Share Group Protocol Specifics #
Key technical details of the share group protocol implementation:
- Record Delivery Mechanism: Share groups use record-level acknowledgment, enabling multiple consumers to process messages from the same partition concurrently
- Broker-side Management: The broker maintains the state of acknowledged records, extending Kafka’s traditional broker responsibilities (a departure from Kafka’s traditional “dumb brokers, smart clients” philosophy)
- Message Distribution: The broker hands out available records to whichever consumers in the group fetch them, without key affinity, which is what enables processing parallelism within a partition
- Rebalance Protocol: Share groups build on the broker-side group coordinator introduced by KIP-848 rather than on the client-side cooperative rebalancing of KIP-429. Because partitions are not exclusively owned, consumers joining or leaving cause far less disruption. KIP-848 was not yet available in Confluent Cloud at the time of writing
- Consumer Coordination: Share group consumers coordinate processing through the broker rather than through direct partition ownership
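The broker-side bookkeeping can be pictured as a small state machine per record. The sketch below is a toy model using the record states named in KIP-932 (Available, Acquired, Acknowledged, Archived). It leaves out acquisition lock timeouts, batching, and persistence, and the default delivery limit of 5 is my assumption about the broker default rather than something to rely on. The class and method names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, List


class State(Enum):
    AVAILABLE = "available"
    ACQUIRED = "acquired"
    ACKNOWLEDGED = "acknowledged"
    ARCHIVED = "archived"


@dataclass
class InFlightRecord:
    offset: int
    state: State = State.AVAILABLE
    delivery_count: int = 0


class SharePartition:
    """Toy model of per-record delivery state for one partition."""

    def __init__(self, offsets: Iterable[int], max_deliveries: int = 5) -> None:
        self.records: Dict[int, InFlightRecord] = {o: InFlightRecord(o) for o in offsets}
        self.max_deliveries = max_deliveries

    def acquire(self, max_records: int) -> List[int]:
        """Hand up to max_records available records to a fetching consumer."""
        batch: List[int] = []
        for record in self.records.values():  # dicts keep insertion order
            if len(batch) == max_records:
                break
            if record.state is State.AVAILABLE:
                record.state = State.ACQUIRED
                record.delivery_count += 1
                batch.append(record.offset)
        return batch

    def _acquired(self, offset: int) -> InFlightRecord:
        record = self.records[offset]
        if record.state is not State.ACQUIRED:
            raise ValueError(f"offset {offset} is not acquired")
        return record

    def accept(self, offset: int) -> None:
        self._acquired(offset).state = State.ACKNOWLEDGED

    def reject(self, offset: int) -> None:
        self._acquired(offset).state = State.ARCHIVED

    def release(self, offset: int) -> None:
        """Make the record available again unless its delivery limit is used up."""
        record = self._acquired(offset)
        exhausted = record.delivery_count >= self.max_deliveries
        record.state = State.ARCHIVED if exhausted else State.AVAILABLE


if __name__ == "__main__":
    partition = SharePartition(range(4), max_deliveries=2)
    print(partition.acquire(2))  # consumer A gets [0, 1]
    print(partition.acquire(2))  # consumer B gets [2, 3] from the same partition
    partition.accept(0)
    partition.release(1)
    print(partition.acquire(5))  # [1] is redelivered
```

Two consumers drawing from the same partition at the same time is exactly what a classic consumer group does not allow.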
Monitoring Metrics That Matter #
Specific metrics to implement for share group monitoring:
- Record Processing Latency: Track processing time for records to identify throughput bottlenecks by implementing custom metrics in your consumer application
- Message Consumption Rate: Monitor the rate at which messages are being consumed by the share group
- Record Acknowledgment Rate: Track record acknowledgment rates to identify processing issues
- Consumer Resource Utilization: Monitor CPU, memory, and network usage per consumer to optimize scaling
- Rebalance Frequency & Duration: Measure rebalance operations which may affect processing latency
- Unacknowledged Record Count: Track records that remain unacknowledged beyond expected processing timeframes
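Several of these metrics can be collected inside the consumer application itself. Here is a minimal sketch of such a collector. It is not tied to any metrics library, it tracks records by offset only (so it assumes one partition per instance), and every name in it is hypothetical. The clock is injectable so the behavior can be tested deterministically.

```python
import time
from typing import Callable, Dict, List


class ShareConsumerMetrics:
    """Tracks processing latency, acknowledgment ratio, and stale records."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._in_flight: Dict[int, float] = {}  # offset -> delivery time
        self.latencies: List[float] = []
        self.delivered = 0
        self.acknowledged = 0

    def on_delivered(self, offset: int) -> None:
        self._in_flight[offset] = self._clock()
        self.delivered += 1

    def on_acknowledged(self, offset: int) -> None:
        started = self._in_flight.pop(offset, None)
        if started is None:
            return  # unknown or already acknowledged record
        self.latencies.append(self._clock() - started)
        self.acknowledged += 1

    def stale_offsets(self, max_age_seconds: float) -> List[int]:
        """Offsets delivered more than max_age_seconds ago and still unacknowledged."""
        now = self._clock()
        return sorted(o for o, t in self._in_flight.items() if now - t > max_age_seconds)

    def ack_ratio(self) -> float:
        return self.acknowledged / self.delivered if self.delivered else 0.0


if __name__ == "__main__":
    metrics = ShareConsumerMetrics()
    metrics.on_delivered(1)
    metrics.on_acknowledged(1)
    print(metrics.ack_ratio(), metrics.stale_offsets(30.0))
```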
Architectural Solutions Enabled by KIP-932 #
Specific architectural patterns that share groups enable:
- Parallel Processing Model: Process messages from the same partition in parallel, for workloads that tolerate records being processed out of order
- Consumer Scaling Beyond Partition Limits: Scale consumers beyond the traditional partition count limitation
- Partition Count Optimization: Size partition count based on producer throughput and storage requirements rather than consumer scaling needs
- Dynamic Consumer Scaling: Scale consumers up/down independently of partition structure
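The partition-sizing point can be illustrated with the familiar rule of thumb of taking the larger of target throughput divided by per-partition producer throughput and target throughput divided by per-partition consumer throughput. The sketch below assumes that rule and uses made-up numbers; the function name is hypothetical, and real sizing should come from measurements. Dropping the consumer term models the share-group case, where consumer parallelism no longer has to come from partitions.

```python
import math
from typing import Optional


def partitions_needed(target_mb_s: float,
                      producer_mb_s_per_partition: float,
                      consumer_mb_s_per_partition: Optional[float] = None) -> int:
    """Rule-of-thumb partition count.

    Pass consumer_mb_s_per_partition for a classic consumer group, where one
    consumer per partition limits read throughput. Leave it as None to size
    purely on the producer side, as a share group allows.
    """
    by_producer = math.ceil(target_mb_s / producer_mb_s_per_partition)
    if consumer_mb_s_per_partition is None:
        return by_producer
    by_consumer = math.ceil(target_mb_s / consumer_mb_s_per_partition)
    return max(by_producer, by_consumer)


if __name__ == "__main__":
    # Hypothetical figures: 200 MB/s target, 25 MB/s per partition on the
    # producer side, 10 MB/s per single-threaded consumer.
    print(partitions_needed(200, 25, 10))  # 20, driven by slow consumers
    print(partitions_needed(200, 25))      # 8, driven by data volume only
```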
Comparison with RabbitMQ #
System | Feature | Specific Technical Comparison to KIP-932 |
---|---|---|
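RabbitMQ | Competing Consumers | KIP-932 offers the same competing-consumers style of parallel consumption with per-record acknowledgment and, like competing consumers on a single queue, does not guarantee processing order across consumers; it combines this queue-like processing with Kafka’s immutable log model for replay capabilities |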
Conclusion: Technical Reality #
KIP-932 “Queues for Kafka” represents a natural evolution of Kafka’s consumption model that addresses real operational challenges without compromising its fundamental principles. The key points to understand:
- Alignment with Core Principles: Share groups leave Kafka’s key-based partitioning and ordered log untouched; what they relax is processing order across consumers, in exchange for flexibility
- Performance Optimization: Share groups enable more efficient resource utilization by allowing consumer scaling independent of partition count constraints
- Technical Continuity: The feature preserves event immutability, temporal ordering in the log, log persistence, and replay capabilities - core EDA principles remain fully intact
- Implementation Considerations: Share groups introduce record-level acknowledgment and new broker responsibilities, representing a shift from Kafka’s traditional “dumb brokers, smart clients” approach
- Operational Benefits: Direct benefits include right-sizing partition counts based on data needs rather than scaling constraints, enabling more flexible consumer scaling models, and optimizing resource utilization
The value proposition is clear: KIP-932 adds capabilities that address real operational constraints while preserving Kafka’s core architectural strengths. It enhances Kafka’s consumption model without requiring teams to compromise on EDA principles, making it a pragmatic enhancement that respects Kafka’s fundamental design philosophy. For additional insights, Gunnar Morling provides an excellent overview in his analysis.