
Data Engineering · 7 min read · November 2, 2025

Data Warehouse Design: Performance Guide

Bad warehouse design kills performance. Having redesigned warehouses at 50+ companies, we've seen how schema choices affect query speed, costs, and how easily you can add new features. This guide covers dimensional modeling, star vs. snowflake schemas, and the optimization strategies that actually work.

Start with Dimensional Modeling

Most warehouses use dimensional modeling. It separates data into two types:

  • Fact tables: Business events (sales, orders, clicks)
  • Dimension tables: Descriptive details (customers, products, dates)

This structure gives you:

  • Fast queries through simple, predictable joins
  • Clear business meaning
  • Easy changes when requirements shift

Schema Patterns

Star Schema

Structure:

  • One central fact table
  • Multiple dimension tables connected directly

Example:

Sales Fact
├── Customer Dimension
├── Product Dimension
├── Date Dimension
└── Store Dimension

Pros:

  • Simple to understand
  • Fast queries (fewer joins)
  • Easy to implement

Cons:

  • Some denormalization (data redundancy)
  • Larger dimension tables

Best for:

  • Most analytics use cases
  • Medium to large datasets
  • Teams new to dimensional modeling
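The tree above can be sketched as working tables. Here's a minimal star schema in SQLite; the table and column names (sales_fact, customer_dim, and so on) are illustrative, not prescriptive:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_dim (customer_key INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE product_dim  (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE date_dim     (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);

-- The fact table holds only foreign keys and additive measures.
CREATE TABLE sales_fact (
    customer_key INTEGER REFERENCES customer_dim(customer_key),
    product_key  INTEGER REFERENCES product_dim(product_key),
    date_key     INTEGER REFERENCES date_dim(date_key),
    quantity     INTEGER,
    revenue      REAL
);
""")
conn.execute("INSERT INTO customer_dim VALUES (1, 'Acme', 'EU')")
conn.execute("INSERT INTO product_dim VALUES (1, 'Widget', 'Hardware')")
conn.execute("INSERT INTO date_dim VALUES (20250101, '2025-01-01', '2025-01')")
conn.execute("INSERT INTO sales_fact VALUES (1, 1, 20250101, 3, 29.97)")

# One join per dimension, and every query follows the same shape.
row = conn.execute("""
    SELECT c.region, p.category, SUM(f.revenue)
    FROM sales_fact f
    JOIN customer_dim c ON f.customer_key = c.customer_key
    JOIN product_dim  p ON f.product_key  = p.product_key
    GROUP BY c.region, p.category
""").fetchone()
print(row)  # ('EU', 'Hardware', 29.97)
```

Note that every dimension joins directly to the fact; that one-hop shape is what keeps star-schema queries fast and easy to reason about.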

Snowflake Schema

Structure:

  • One central fact table
  • Normalized dimension tables (dimensions have sub-dimensions)

Example:

Sales Fact
├── Customer Dimension
│   ├── Geography Dimension
│   └── Demographics Dimension
├── Product Dimension
│   ├── Category Dimension
│   └── Brand Dimension
└── Date Dimension

Pros:

  • No data redundancy
  • Easier to maintain dimension changes
  • Lower storage requirements

Cons:

  • More joins (slower queries)
  • More complex to understand
  • Harder to implement

Best for:

  • Very large datasets where storage matters
  • Highly normalized source systems
  • Advanced teams comfortable with complexity

Our recommendation: Start with star schema. Normalize to snowflake only if you have specific storage or maintenance requirements.

Fact Table Design

Granularity

Choose the right granularity (level of detail):

Transaction-level:

  • One row per transaction
  • Most detailed, most flexible
  • Largest storage requirements

Aggregated:

  • One row per time period + dimensions
  • Smaller storage, faster queries
  • Less flexible

Recommendation: Store at the lowest granularity you might need. Aggregate for specific use cases, but keep detailed data available.
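The recommendation works because aggregates are always derivable from detail, never the reverse. A sketch (table names sales_txn and sales_daily are hypothetical):

```python
import sqlite3

# Keep the transaction-level grain, then derive a daily aggregate from it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_txn (txn_id INTEGER, sale_date TEXT, store TEXT, revenue REAL)")
conn.executemany("INSERT INTO sales_txn VALUES (?, ?, ?, ?)", [
    (1, '2025-01-01', 'NYC', 10.0),
    (2, '2025-01-01', 'NYC', 15.0),
    (3, '2025-01-02', 'NYC', 20.0),
])

# The aggregate can be rebuilt at any time; had we stored only daily rows,
# the per-transaction detail would be gone for good.
conn.execute("""
    CREATE TABLE sales_daily AS
    SELECT sale_date, store, COUNT(*) AS txn_count, SUM(revenue) AS revenue
    FROM sales_txn
    GROUP BY sale_date, store
""")
rows = conn.execute("SELECT * FROM sales_daily ORDER BY sale_date").fetchall()
print(rows)  # [('2025-01-01', 'NYC', 2, 25.0), ('2025-01-02', 'NYC', 1, 20.0)]
```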

Measures and Metrics

Additive Measures:

  • Can be summed across any dimension
  • Examples: revenue, quantity, cost

Semi-additive Measures:

  • Can be summed across some dimensions, not others
  • Example: balances (sum across accounts, not across time)

Non-additive Measures:

  • Cannot be summed
  • Example: ratios, percentages, averages

Design rule: Store additive and semi-additive measures in fact tables. Calculate non-additive measures in queries or views.
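The design rule exists because summing or averaging a stored ratio gives wrong answers. A small illustration with made-up margin numbers:

```python
# Averaging per-row margins weights every row equally, regardless of size.
orders = [
    {"revenue": 100.0, "cost": 90.0},   # 10% margin on a large order
    {"revenue": 10.0,  "cost": 5.0},    # 50% margin on a small order
]

# Wrong: average the pre-computed ratios.
naive = sum((o["revenue"] - o["cost"]) / o["revenue"] for o in orders) / len(orders)

# Right: sum the additive measures first, then compute the ratio.
total_rev = sum(o["revenue"] for o in orders)
total_cost = sum(o["cost"] for o in orders)
correct = (total_rev - total_cost) / total_rev

print(round(naive, 3), round(correct, 3))  # 0.3 0.136
```

The naive average says 30% margin; the real blended margin is about 13.6%. Storing revenue and cost (additive) and deriving margin at query time avoids the trap.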

Surrogate Keys

Use synthetic keys (auto-incrementing IDs) instead of natural keys:

Why:

  • Handle changes to natural keys
  • Improve join performance
  • Separate business logic from technical keys

Implementation:

  • Fact tables: Composite key (surrogate keys from all dimensions)
  • Dimension tables: Surrogate key + natural key + change tracking

Dimension Table Design

Slowly Changing Dimensions (SCD)

Dimensions change over time. Choose the right strategy:

Type 1: Overwrite

  • New value replaces old value
  • No history maintained

Use when: Historical accuracy doesn't matter

Type 2: Add New Row

  • New row with new surrogate key
  • Old row kept with end date
  • Complete history maintained

Use when: Historical accuracy is critical

Type 3: Add New Column

  • Keep current and previous value
  • Limited history (only one previous)

Use when: You need to track one previous value

Most common: Type 2 for customer/product dimensions, Type 1 for reference data.
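A Type 2 change can be sketched as two statements: expire the current row, then insert a new version with a fresh surrogate key. The column names here (valid_from, valid_to, is_current) are one common convention, not the only one:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer_dim (
        customer_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        customer_id  TEXT,                               -- natural key
        city         TEXT,
        valid_from   TEXT,
        valid_to     TEXT,
        is_current   INTEGER
    )
""")
conn.execute("INSERT INTO customer_dim (customer_id, city, valid_from, valid_to, is_current) "
             "VALUES ('C1', 'Boston', '2024-01-01', '9999-12-31', 1)")

def apply_scd2(conn, customer_id, new_city, change_date):
    """Close out the current row, then add a new version."""
    conn.execute("UPDATE customer_dim SET valid_to = ?, is_current = 0 "
                 "WHERE customer_id = ? AND is_current = 1",
                 (change_date, customer_id))
    conn.execute("INSERT INTO customer_dim (customer_id, city, valid_from, valid_to, is_current) "
                 "VALUES (?, ?, ?, '9999-12-31', 1)",
                 (customer_id, new_city, change_date))

apply_scd2(conn, 'C1', 'Chicago', '2025-06-01')

# Both versions survive. Fact rows keep pointing at whichever surrogate key
# was current when they were loaded, which is what preserves history.
history = conn.execute(
    "SELECT customer_key, city, is_current FROM customer_dim ORDER BY customer_key"
).fetchall()
print(history)  # [(1, 'Boston', 0), (2, 'Chicago', 1)]
```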

Degenerate Dimensions

Some transaction attributes don't belong in dimension tables:

Examples:

  • Order number
  • Invoice number
  • Transaction ID

Store in: Fact table itself (as degenerate dimensions)

Why:

  • High cardinality (too many distinct values)
  • Only used for filtering, not grouping
  • No descriptive attributes

Performance Optimization

Partitioning

Partition large fact tables:

Common strategies:

  • Date partitioning: By day/month/year
  • Range partitioning: By value ranges
  • Hash partitioning: For even distribution

Best practice: Partition by the dimension you filter on most often (usually date).

Clustering

Group related data together:

Columnar warehouses (Snowflake, BigQuery) organize storage automatically, but you can improve on the defaults:

Snowflake:

  • Cluster by dimensions used in WHERE clauses
  • Keep the clustering key small (a few columns, not a dozen)

BigQuery:

  • Partition by date
  • Cluster by up to 4 dimensions

Indexing

Row-store databases (Postgres, SQL Server):

  • Index foreign keys in fact tables
  • Index dimension natural keys
  • Consider bitmap indexes for low-cardinality columns, where the engine supports them

Redshift:

  • No traditional indexes; choose sort keys and distribution keys instead

Modern warehouses (Snowflake, BigQuery):

  • No user-managed indexes
  • Focus on partitioning and clustering

Materialized Views

Pre-aggregate common queries:

Use for:

  • Frequently accessed aggregations
  • Complex calculations
  • Cross-table joins

Trade-offs:

  • Faster queries
  • Storage cost
  • Maintenance overhead (refresh schedules)

Best practice: Materialize daily, weekly, and monthly aggregations for your common reporting periods.
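SQLite has no materialized views, so this sketch simulates one with a summary table plus an explicit refresh; it shows the same trade-off described above: faster reads, extra storage, and a refresh schedule to maintain. Names (monthly_sales, refresh_monthly_sales) are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (sale_date TEXT, revenue REAL)")
conn.executemany("INSERT INTO sales_fact VALUES (?, ?)",
                 [('2025-01-01', 10.0), ('2025-01-01', 5.0), ('2025-02-01', 7.0)])
conn.execute("CREATE TABLE monthly_sales (month TEXT, revenue REAL)")

def refresh_monthly_sales(conn):
    """Full rebuild; real warehouses often support incremental refresh."""
    conn.execute("DELETE FROM monthly_sales")
    conn.execute("""
        INSERT INTO monthly_sales
        SELECT substr(sale_date, 1, 7) AS month, SUM(revenue)
        FROM sales_fact
        GROUP BY month
    """)

refresh_monthly_sales(conn)

# Readers now hit the small pre-aggregated table instead of the fact table.
rows = conn.execute("SELECT * FROM monthly_sales ORDER BY month").fetchall()
print(rows)  # [('2025-01', 15.0), ('2025-02', 7.0)]
```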

Design Patterns

Conformed Dimensions

Use the same dimension tables across fact tables:

Example:

  • Same Date dimension for Sales and Inventory facts
  • Same Customer dimension for Orders and Support facts

Benefits:

  • Consistent reporting
  • Easier cross-functional analysis
  • Reduced maintenance

Factless Fact Tables

Track events without measures:

Use cases:

  • Student attendance (student, date, class)
  • Event tracking (user, event_type, timestamp)
  • Status changes (entity, status, timestamp)

Implementation: Fact table with only dimension keys, no measures.
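A factless fact in practice: the table carries only dimension keys, and COUNT(*) becomes the measure. The attendance names below are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No measure columns at all -- just the keys that record "this event happened."
conn.execute("CREATE TABLE attendance_fact (student_key INTEGER, class_key INTEGER, date_key INTEGER)")
conn.executemany("INSERT INTO attendance_fact VALUES (?, ?, ?)", [
    (1, 10, 20250101),
    (2, 10, 20250101),
    (1, 10, 20250102),
])

# "How many students attended class 10 each day?" -- counting rows answers it.
rows = conn.execute("""
    SELECT date_key, COUNT(*)
    FROM attendance_fact
    WHERE class_key = 10
    GROUP BY date_key
    ORDER BY date_key
""").fetchall()
print(rows)  # [(20250101, 2), (20250102, 1)]
```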

Aggregate Fact Tables

Pre-summarize detailed facts:

Example:

  • Daily sales (from transaction-level facts)
  • Monthly customer metrics (from daily facts)

Benefits:

  • Faster queries for common aggregations
  • Lower compute costs

Trade-off:

  • Storage overhead
  • Need to refresh aggregations

Common Mistakes

Over-Normalization

Normalizing too much hurts performance.

Solution: Denormalize dimensions for query performance. Storage is cheap, query time is expensive.

Too Much Granularity

Storing more detail than you'll ever use.

Solution: Start with business requirements. Add detail only if needed.

Ignoring Query Patterns

Designing without understanding how data will be queried.

Solution: Review common queries. Design schema to speed up those queries.

Poor Partitioning

Partitioning by wrong columns or too many partitions.

Solution: Partition by most common filter (usually date). Limit partition count (aim for <1000 partitions).

No Documentation

Schema becomes incomprehensible over time.

Solution: Document business logic, transformations, and data lineage.

Evolution and Maintenance

Adding New Dimensions

Process:

  1. Create new dimension table
  2. Add foreign key to fact table
  3. Populate dimension with historical data
  4. Update ETL to populate going forward
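The four steps above can be sketched end to end. Seeding the dimension with a default "Unknown" member (key 0, an assumption of this sketch) keeps old fact rows queryable while the backfill runs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (sale_id INTEGER, revenue REAL)")
conn.execute("INSERT INTO sales_fact VALUES (1, 10.0)")

# Step 1: create the new dimension table, seeded with a default member.
conn.execute("CREATE TABLE channel_dim (channel_key INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO channel_dim VALUES (?, ?)",
                 [(0, 'Unknown'), (1, 'Web')])

# Step 2: add the foreign key column, defaulting to the 'Unknown' member.
conn.execute("ALTER TABLE sales_fact ADD COLUMN channel_key INTEGER DEFAULT 0")

# Step 3: backfill historical rows where the source system knows the channel.
conn.execute("UPDATE sales_fact SET channel_key = 1 WHERE sale_id = 1")

# Step 4 (ETL going forward) would simply insert channel_key with each new row.
row = conn.execute("""
    SELECT f.sale_id, d.name
    FROM sales_fact f
    JOIN channel_dim d ON f.channel_key = d.channel_key
""").fetchone()
print(row)  # (1, 'Web')
```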

Handling Schema Changes

New measures:

  • Add columns to fact table
  • Backfill if needed
  • Update queries gradually

New dimensions:

  • Create dimension table
  • Add to fact table
  • Populate and maintain

Data Quality

Enforce at load time:

  • Foreign key constraints (where supported)
  • Not null constraints for required fields
  • Data type validation

Monitor continuously:

  • Referential integrity checks
  • Completeness checks
  • Distribution monitoring

Tools and Best Practices

dbt for Modeling

Use dbt to define models as code:

Benefits:

  • Version control
  • Testing
  • Documentation
  • Reusability

Pattern:

  • Staging models: Clean source data
  • Intermediate models: Transformations
  • Mart models: Business-ready datasets

Documentation

Document everything:

  • Business definitions
  • Data sources
  • Transformations
  • Known issues
  • Usage examples

Conclusion

Good warehouse design balances performance, flexibility, and maintainability. Start with star schema, partition wisely, and improve based on actual query patterns. Remember: perfect is the enemy of good. Design for your current needs, but plan for evolution.

The best warehouse design is one that:

  • Answers business questions quickly
  • Adapts to changing requirements
  • Is understandable by your team
  • Can scale with your data

Focus on these principles, and you'll build warehouses that serve your organization well for years to come.
