When to Use Real-Time Analytics
Everyone wants real-time data. But it costs 3-5x more than batch and is much harder to build. We've helped companies decide when real-time makes sense. Here's what we learned.
Most Analytics Don't Need Real-Time
Companies chase real-time because it sounds impressive. But most decisions work fine with data that's a few hours old.
Ask these questions first:
Business Questions
- What decisions are made with this data?
  - If decisions are made daily or weekly, real-time isn't needed
  - If decisions are made hourly, near-real-time (a 5-15 minute delay) might suffice
  - If decisions are made continuously, real-time might be necessary
- What's the cost of delay?
  - Low: Batch processing is fine
  - Medium: Near-real-time (minutes of delay)
  - High: True real-time (seconds of delay)
- What's the cost of building real-time?
  - 3-5x more expensive than batch
  - Requires specialized expertise
  - More complex to maintain and debug
Common Real-time Use Cases:
- Fraud detection
- Real-time pricing
- Live dashboards for operations
- Alerting and monitoring
- Personalization engines
Common Non-Real-time Use Cases:
- Monthly financial reports
- Marketing campaign analysis
- Product usage analytics
- Customer segmentation
When Real-time Makes Sense
Operational Dashboards
Scenario: Operations team needs to see current system state
Requirements:
- Latency: < 1 minute
- Data freshness: < 5 minutes
- Query patterns: Simple aggregations, filtering
Architecture:
- Stream processing (Kafka + Kafka Streams/KSQL)
- Real-time database (Redis, TimescaleDB)
- Dashboard (Grafana, custom)
Fraud Detection
Scenario: Detect fraudulent transactions before completion
Requirements:
- Latency: < 1 second
- Data freshness: Real-time
- Query patterns: Complex ML models, rule engines
Architecture:
- Event streaming (Kafka)
- Stream processing (Flink, Spark Streaming)
- ML model serving (TensorFlow Serving, SageMaker)
- Feature store (Feast)
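A rough sketch of what the speed layer's check might look like: cheap rules first, then a model score for the ambiguous cases. The rules, the 0.9 threshold, and the model client are illustrative, not any particular product's API.

from dataclasses import dataclass

@dataclass
class Transaction:
    user_id: str
    amount: float
    country: str

# Illustrative rules; real deployments manage these in a rule engine
RULES = [
    lambda t: t.amount > 10_000,          # unusually large amount
    lambda t: t.country in {"XX", "YY"},  # placeholder high-risk regions
]

def is_fraudulent(txn: Transaction, model) -> bool:
    # Cheap rules first: they resolve most cases in microseconds
    if any(rule(txn) for rule in RULES):
        return True
    # Fall back to the model only for ambiguous cases; `model` stands
    # in for a deployed endpoint client, 0.9 is an assumed threshold
    return model.predict(txn) > 0.9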
Real-time Personalization
Scenario: Personalize user experience based on current behavior
Requirements:
- Latency: < 100ms
- Data freshness: < 30 seconds
- Query patterns: Feature lookups, recommendations
Architecture:
- Event collection (Segment, Snowplow)
- Stream processing (Kinesis, Kafka)
- Feature store (Redis, DynamoDB)
- Serving layer (API with low latency)
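At serving time, the feature lookup can be a single key read. A minimal sketch with Redis; the key layout and the fallback defaults are assumptions:

import json
import redis

r = redis.Redis()

def get_user_features(user_id: str) -> dict:
    # Assumes the stream pipeline writes JSON features to
    # 'features:<user_id>'; fall back to defaults on a miss so the
    # serving path never blocks on the pipeline
    raw = r.get(f"features:{user_id}")
    if raw is None:
        return {"recent_views": [], "segment": "default"}
    return json.loads(raw)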
Architecture Patterns
Lambda Architecture
Separate batch and stream processing:
Components:
- Batch layer: Processes all historical data, creates authoritative datasets
- Speed layer: Processes recent data for real-time views
- Serving layer: Combines batch and speed layer results
When to use:
- Need both historical accuracy and real-time views
- Can tolerate eventual consistency
Trade-offs:
- Complex to maintain (two codebases)
- Eventual consistency between layers
- Higher operational overhead
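The serving layer's merge can be simple when the metric is additive. A sketch, assuming both views expose counts keyed the same way:

def merged_view(key, batch_view: dict, speed_view: dict) -> int:
    # batch_view: authoritative counts from the last batch run;
    # speed_view: counts for events since that run. The additive
    # merge works for counters; other metrics need their own merge.
    return batch_view.get(key, 0) + speed_view.get(key, 0)

total = merged_view('/products',
                    batch_view={'/products': 10_500},
                    speed_view={'/products': 42})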
Kappa Architecture
Single stream processing pipeline:
Components:
- Stream layer: Processes all data as streams
- Serving layer: Queries stream results
When to use:
- Can reprocess historical data through stream pipeline
- Prefer simpler architecture
- Okay with stream processing limitations
Trade-offs:
- Reprocessing can be slow
- Less mature tooling
- Harder to handle late-arriving data
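Reprocessing in Kappa usually means rewinding the log and running the same handler again. A sketch with kafka-python (topic, group id, and broker address are examples; how far back you can replay depends on topic retention):

from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(group_id='pipeline-v2',
                         bootstrap_servers='localhost:9092')
partition = TopicPartition('page_views', 0)
consumer.assign([partition])
consumer.seek_to_beginning(partition)  # replay everything the topic retains

for message in consumer:
    process(message)  # the same handler the live pipeline uses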
Hybrid Approach
Use real-time only where needed:
Components:
- Batch layer: Most data processing
- Real-time layer: Only for specific use cases
- Serving layer: Routes queries to appropriate layer
When to use:
- Most analytics are batch-friendly
- Only specific features need real-time
- Want to minimize complexity and cost
Trade-offs:
- Some complexity from managing two systems
- Need to route queries correctly
- Despite this, generally the most practical approach
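The routing itself can be as thin as a freshness check. A sketch, with the threshold and both backends as placeholders:

def route_query(query, realtime_store, warehouse):
    # Placeholder threshold: anything that needs data fresher than
    # 5 minutes goes to the low-latency store, the rest to the warehouse
    if query.max_staleness_seconds < 300:
        return realtime_store.run(query)
    return warehouse.run(query)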
Technology Choices
Stream Processing
Apache Kafka:
- Industry standard
- Excellent ecosystem
- Requires operational expertise
Amazon Kinesis:
- Fully managed
- Good AWS integration
- Less flexible than Kafka
Google Pub/Sub:
- Simple to use
- Good GCP integration
- Less feature-rich than Kafka
Processing Frameworks
Apache Flink:
- Best for complex event processing
- Strong exactly-once guarantees
- Steep learning curve
Apache Spark Streaming:
- Familiar API (if you know Spark)
- Good for batch + stream unification
- Higher latency than Flink
Kafka Streams:
- Simple if you're already using Kafka
- Embedded library (no cluster needed)
- Limited scalability
Storage
Redis:
- Very fast
- Limited data structures
- In-memory (costly at scale)
TimescaleDB:
- SQL interface
- Good for time-series
- Less flexible than NoSQL
DynamoDB:
- Fully managed
- Serverless scaling
- Limited query patterns
Building Real-time Analytics: Step by Step
Start with Events
Collect events from your application:
// Example: Track page views
analytics.track('page_view', {
  userId: user.id,
  page: '/products',
  timestamp: Date.now()
});
Tools:
- Segment (hosted)
- Snowplow (self-hosted)
- Custom Kafka producers
Stream Processing
Process events in real-time:
# Example: Count page views per minute
import json
from collections import defaultdict

from kafka import KafkaConsumer

consumer = KafkaConsumer('page_views')
counts = defaultdict(int)

for message in consumer:
    event = json.loads(message.value)
    # Bucket by event time (ms since epoch) into one-minute windows
    minute = event['timestamp'] // 60000
    counts[(event['page'], minute)] += 1
    update_dashboard(counts)  # placeholder: push updated counts out
Storage
Store aggregated results:
Options:
- In-memory: Redis (for frequently accessed data)
- Time-series DB: TimescaleDB (for historical queries)
- Warehouse: Snowflake/BigQuery (for complex analytics)
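Continuing the page-view example, a sketch of writing the aggregates to Redis; the key layout and 24-hour TTL are assumptions that keep the hot store small while the warehouse holds full history:

import redis

r = redis.Redis()

def store_minute_counts(counts: dict) -> None:
    # counts maps (page, minute) -> int, as built by the consumer above
    for (page, minute), count in counts.items():
        # 24h TTL: the hot store stays small; history lives elsewhere
        r.set(f"views:{page}:{minute}", count, ex=86_400)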
Serving Layer
Expose data to applications:
Patterns:
- REST API for dashboards
- GraphQL for flexible queries
- WebSockets for live updates
- gRPC for high-performance services
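A minimal REST sketch with Flask, reading the same assumed 'views:<page>:<minute>' keys as above:

import redis
from flask import Flask, jsonify

app = Flask(__name__)
r = redis.Redis()

@app.route("/views/<path:page>/<int:minute>")
def views(page, minute):
    # Reads the counters the stream job wrote; returns 0 on a miss
    count = r.get(f"views:{page}:{minute}")
    return jsonify({"page": page, "minute": minute, "views": int(count or 0)})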
Common Challenges
Late-Arriving Data
Events can arrive out of order or late.
Solutions:
- Use event time, not processing time
- Implement watermarks
- Have windows that can be updated retroactively
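A toy example of event-time windowing with a watermark and a lateness bound (both values assumed). Frameworks like Flink implement this for you, but the mechanics look like this:

from collections import defaultdict

WINDOW_MS = 60_000
ALLOWED_LATENESS_MS = 120_000  # accept events up to 2 minutes late

counts = defaultdict(int)
watermark = 0  # highest event time seen so far

def on_event(event):
    global watermark
    # Track progress in event time, not arrival time
    watermark = max(watermark, event['timestamp'])
    window = event['timestamp'] - event['timestamp'] % WINDOW_MS
    if window >= watermark - ALLOWED_LATENESS_MS:
        counts[window] += 1  # on time, or late but within the bound
    # else: too late; drop, or route to a side output for reconciliation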
State Management
Stream processing often requires maintaining state.
Solutions:
- Use stateful stream processing (Flink, Kafka Streams)
- Store state in external store (Redis, DynamoDB)
- Keep state minimal and partition correctly
Exactly-Once Semantics
Prevent duplicate processing.
Solutions:
- Idempotent operations
- Transactional processing
- Deduplication at consumption
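A sketch of consumer-side deduplication, assuming every event carries a unique event_id; Redis SET with NX acts as the "have I seen this?" check:

import redis

r = redis.Redis()

def process_once(event: dict, handler) -> None:
    # SET with nx=True succeeds only the first time this event_id is
    # seen; the 24h TTL bounds memory and covers how late duplicates
    # realistically arrive (both values assumed)
    if r.set(f"seen:{event['event_id']}", 1, nx=True, ex=86_400):
        handler(event)
    # else: duplicate delivery; skip it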
Monitoring and Debugging
Real-time systems are harder to debug.
Solutions:
- Comprehensive logging
- Metrics for throughput and latency
- Ability to replay events
- Test with sample data streams
Cost Considerations
Real-time analytics cost more:
Cost Drivers:
- Stream processing infrastructure
- Low-latency storage
- Higher compute requirements
- Operational complexity
Cost Optimization:
- Only process what you need in real-time
- Use sampling for high-volume streams
- Archive old data to cheaper storage
- Right-size resources based on actual load
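For sampling, hashing the user ID instead of picking random events keeps each sampled user's history intact, so per-user metrics stay coherent. A sketch with an example 10% rate:

import hashlib

SAMPLE_RATE = 0.10  # keep 10% of users (example rate)

def keep(user_id: str) -> bool:
    # Deterministic: a user is either always in the sample or never;
    # scale aggregates by 1 / SAMPLE_RATE downstream
    bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 1000
    return bucket < SAMPLE_RATE * 1000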
When to Avoid Real-time
Don't build real-time if:
- Batch is sufficient: Your use case doesn't require low latency
- Cost concerns: You can't justify a 3-5x cost increase
- Limited resources: You don't have team expertise
- Unclear requirements: You're not sure what real-time means for your use case
Alternative: Start with near-real-time (5-15 minute batches). It's simpler, cheaper, and sufficient for most cases.
Conclusion
Real-time analytics are powerful but expensive. Before building real-time systems, verify you actually need them. When you do, start simple: collect events, process streams, store results, and serve to applications. Scale complexity as requirements grow.
Most analytics don't need to be real-time. Start with batch, add real-time only where it provides clear business value.