Data Engineering · 13 min read · November 12, 2025

MLOps Guide: Production ML in 2025

Deploy and manage ML models in production. Real deployment patterns, monitoring strategies, and lessons from building 50+ production ML systems.

You built a machine learning model. It works great in your notebook. Accuracy is 94%. You're ready to ship it.

Then reality hits. How do you deploy this? How do you monitor it? What happens when performance drops? How do you update it without breaking everything?

This is where MLOps comes in. MLOps is the practice of deploying, monitoring, and maintaining machine learning models in production. We've set up MLOps pipelines for dozens of companies. Here's what actually works.

What Is MLOps?

MLOps stands for Machine Learning Operations. It's the set of practices that help you:

  • Deploy models to production reliably
  • Monitor model performance over time
  • Retrain and update models safely
  • Track model versions and experiments
  • Manage the full ML lifecycle

Think of it as DevOps for machine learning. DevOps helps you ship code. MLOps helps you ship models.

The problem: Building a model is maybe 20% of the work. Getting it to production and keeping it working? That's the other 80%.

The solution: MLOps gives you the tools and processes to handle that 80% systematically.

Why MLOps Matters

Most ML projects fail in production. Not because the model is bad. Because the infrastructure around it breaks.

Common failures:

  • Models work in development but fail in production
  • Performance degrades over time (model drift)
  • Updates break existing systems
  • No visibility into what's happening
  • Can't reproduce results

What MLOps fixes:

  • Reliable deployment pipelines
  • Continuous monitoring and alerts
  • Version control for models and data
  • Automated retraining workflows
  • Rollback capabilities

We've seen companies lose months of work because they didn't have MLOps. We've also seen teams ship models in days because they did.

The MLOps Lifecycle

MLOps covers the entire lifecycle of a model. From training to retirement.

1. Development

This is where you build and experiment with models. You're in Jupyter notebooks, trying different algorithms, tuning hyperparameters.

Tools:

  • Jupyter notebooks
  • Experiment tracking (MLflow, Weights & Biases)
  • Version control (Git)
  • Local development environments

What to track:

  • Model code and configurations
  • Training data versions
  • Hyperparameters
  • Metrics and results
  • Environment dependencies

2. Training

Once you have a model that works, you need to train it reliably and reproducibly.

Key practices:

  • Automated training pipelines
  • Data versioning
  • Reproducible environments
  • Experiment tracking
  • Model versioning

Example workflow:

  1. New data arrives
  2. Trigger training pipeline
  3. Train model with tracked parameters
  4. Evaluate on test set
  5. Compare to previous models
  6. Register if better

3. Deployment

Getting your model into production where it can make real predictions.

Deployment patterns:

Batch inference:

  • Run predictions on schedule
  • Process large datasets
  • Relaxed latency requirements
  • Example: Daily customer churn predictions

Real-time inference:

  • Predictions on demand
  • Low latency required
  • API endpoints
  • Example: Fraud detection on transactions

Edge deployment:

  • Model runs on device
  • No network required
  • Example: Mobile app recommendations

What you need:

  • Model serving infrastructure
  • API endpoints
  • Load balancing
  • Health checks
  • Rollback capabilities
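
Health checks are easy to overlook, so here is a minimal sketch of what one can look like, using FastAPI (the serving framework used in the examples later in this guide). The endpoint path and the MODEL_VERSION variable are illustrative choices, not a fixed convention.

from fastapi import FastAPI

app = FastAPI()
MODEL_VERSION = "1"  # illustrative: whichever registry version you deployed

@app.get("/health")
def health():
    # Load balancers and orchestrators poll this endpoint to decide
    # whether the instance should keep receiving traffic
    return {"status": "ok", "model_version": MODEL_VERSION}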

4. Monitoring

Once deployed, you need to watch what's happening. Models degrade over time.

What to monitor:

Model performance:

  • Prediction accuracy
  • Latency and throughput
  • Error rates
  • Resource usage

Data quality:

  • Input data distribution
  • Missing values
  • Outliers
  • Schema changes

Model drift:

  • Concept drift (relationships change)
  • Data drift (input distribution changes)
  • Performance degradation

Infrastructure:

  • CPU, memory, disk usage
  • API response times
  • Error rates
  • Request volumes

Example alert: Your fraud detection model's accuracy drops from 94% to 87% over two weeks. You get an alert. You investigate. Turns out the transaction patterns changed. Time to retrain.
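
One way to catch that kind of drift before accuracy visibly drops is a simple statistical check on incoming features. Below is a minimal sketch using a two-sample Kolmogorov-Smirnov test from scipy to compare a production feature against its training distribution; the alpha threshold and the feature names are illustrative, and in practice you would run a check like this per feature on a schedule.

from scipy import stats

def check_feature_drift(training_values, production_values, alpha=0.05):
    # Two-sample KS test: a small p-value means the two distributions differ
    statistic, p_value = stats.ks_2samp(training_values, production_values)
    return p_value < alpha  # True means "drift detected, investigate"

# Example: compare transaction amounts seen this week against the training data
# drifted = check_feature_drift(train_df["amount"], recent_df["amount"])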

5. Retraining

Models need updates. New data arrives. Patterns change. Performance degrades.

When to retrain:

  • Scheduled (daily, weekly, monthly)
  • Performance drops below threshold
  • New data available
  • Significant data drift detected

Retraining workflow:

  1. Trigger retraining (manual or automatic)
  2. Train new model version
  3. Evaluate on holdout set
  4. Compare to current production model
  5. Deploy if better
  6. Rollback if worse

Automation: Set up pipelines that retrain automatically when conditions are met. Saves time and keeps models fresh.

6. Retirement

Eventually, models become obsolete. They need to be retired.

When to retire:

  • Replaced by better model
  • Business requirements changed
  • No longer needed
  • Too expensive to maintain

Retirement process:

  1. Stop serving predictions
  2. Archive model artifacts
  3. Document retirement reason
  4. Update monitoring (remove alerts)
  5. Clean up infrastructure

MLOps Tools and Platforms

You have options. From open-source tools to managed platforms.

Experiment Tracking

MLflow:

  • Open-source
  • Tracks experiments, models, artifacts
  • Model registry
  • Deployment tools
  • Works with any framework

Weights & Biases (W&B):

  • Cloud-based
  • Experiment tracking
  • Model versioning
  • Team collaboration
  • Free tier available

Neptune:

  • Experiment tracking
  • Model registry
  • Team collaboration
  • Integrates with popular frameworks

Model Serving

TensorFlow Serving:

  • Serves TensorFlow models
  • High performance
  • REST and gRPC APIs
  • Version management

TorchServe:

  • Serves PyTorch models
  • REST APIs
  • Model versioning
  • Multi-model serving

KServe (formerly KFServing):

  • Kubernetes-native
  • Supports multiple frameworks
  • Auto-scaling
  • Canary deployments

Seldon Core:

  • Kubernetes-based
  • A/B testing
  • Multi-armed bandits
  • Advanced routing

Managed Platforms

AWS SageMaker:

  • End-to-end ML platform
  • Training, deployment, monitoring
  • Fully managed
  • Integrates with AWS services

Google Vertex AI:

  • Unified ML platform
  • AutoML capabilities
  • Model serving
  • Monitoring and explainability

Azure Machine Learning:

  • Complete ML lifecycle
  • MLOps pipelines
  • Model registry
  • Deployment options

Databricks:

  • Unified analytics platform
  • MLflow integration
  • Model serving
  • Feature store

Open-Source MLOps Stacks

Kubeflow:

  • Kubernetes-native
  • End-to-end pipelines
  • Model serving
  • Experiment tracking
  • Steeper learning curve

MLflow:

  • Experiment tracking
  • Model registry
  • Model serving
  • Works with any infrastructure

Prefect / Airflow:

  • Workflow orchestration
  • Pipeline management
  • Scheduling
  • Not ML-specific but widely used

Building Your MLOps Pipeline

Here's how we typically set up MLOps for companies.

Step 1: Version Control

Start with version control for everything.

What to version:

  • Model code
  • Training scripts
  • Configuration files
  • Data schemas
  • Environment files

Tools:

  • Git for code
  • DVC (Data Version Control) for data
  • MLflow for model artifacts

Example structure:

project/
  models/
    train.py
    predict.py
  data/
    raw/
    processed/
  config/
    training.yaml
    serving.yaml
  notebooks/
  tests/

Step 2: Experiment Tracking

Track all your experiments. You'll thank yourself later.

What to track:

  • Hyperparameters
  • Metrics (accuracy, F1, etc.)
  • Training data version
  • Model artifacts
  • Environment info

Setup:

import mlflow

mlflow.set_experiment("fraud_detection")

params = {"learning_rate": 0.01, "epochs": 100}

with mlflow.start_run():
    # Log parameters
    mlflow.log_params(params)

    # Train model
    model = train_model(params)

    # Log metrics
    mlflow.log_metric("accuracy", 0.94)
    mlflow.log_metric("f1_score", 0.91)

    # Log model
    mlflow.sklearn.log_model(model, "model")

Step 3: Automated Training

Set up pipelines that train models automatically.

Pipeline steps:

  1. Load and validate data
  2. Preprocess data
  3. Train model
  4. Evaluate model
  5. Register if better

Example with Prefect:

from prefect import flow, task

ACCURACY_THRESHOLD = 0.90

@task
def load_data():
    # Load data and split into training and test sets
    return train_data, test_data

@task
def train_model(train_data):
    # Train model
    return model

@task
def evaluate_model(model, test_data):
    # Evaluate and return a metrics dict
    return metrics

@task
def register_model(model):
    # Register the model (e.g. in a model registry)
    ...

@flow
def training_pipeline():
    train_data, test_data = load_data()
    model = train_model(train_data)
    metrics = evaluate_model(model, test_data)

    if metrics["accuracy"] > ACCURACY_THRESHOLD:
        register_model(model)
Step 4: Model Deployment

Deploy models to serve predictions.

For batch inference:

  • Scheduled jobs
  • Process data in batches
  • Write results to database

For real-time inference:

  • API endpoints
  • Load balancing
  • Auto-scaling
  • Health checks

Example API with FastAPI:

from fastapi import FastAPI
import mlflow
import pandas as pd

app = FastAPI()
model = mlflow.sklearn.load_model("models:/fraud_detection/1")

@app.post("/predict")
def predict(transaction: dict):
    # Turn the incoming JSON into a single-row feature frame
    features = pd.DataFrame([transaction])
    probability = model.predict_proba(features)[0][1]
    return {"fraud_probability": float(probability)}

Step 5: Monitoring

Set up monitoring for everything.

What to monitor:

  • Prediction latency
  • Error rates
  • Model performance (if you have labels)
  • Data drift
  • Resource usage

Example monitoring:

import time
from prometheus_client import Counter, Histogram

prediction_latency = Histogram("prediction_latency_seconds", "Prediction latency in seconds")
prediction_errors = Counter("prediction_errors_total", "Total prediction errors")

@app.post("/predict")
def predict(transaction: dict):
    start = time.time()
    try:
        features = pd.DataFrame([transaction])
        probability = model.predict_proba(features)[0][1]
        prediction_latency.observe(time.time() - start)
        return {"fraud_probability": float(probability)}
    except Exception:
        prediction_errors.inc()
        raise

Step 6: Automated Retraining

Set up pipelines that retrain models automatically.

Triggers:

  • Scheduled (daily, weekly)
  • Performance threshold
  • Data drift detected
  • New data available

Workflow:

  1. Check if retraining needed
  2. Train new model
  3. Evaluate
  4. Compare to production
  5. Deploy if better
  6. Rollback if worse
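
The "deploy if better, rollback if worse" decision can start as a few lines of code before you reach for a full platform. A minimal sketch, assuming you already have evaluate() and deploy() helpers for your own stack (both are placeholders here):

def promote_if_better(challenger, champion, test_data, margin=0.0):
    # Score both models on the same holdout set
    challenger_acc = evaluate(challenger, test_data)  # placeholder helper
    champion_acc = evaluate(champion, test_data)

    if challenger_acc > champion_acc + margin:
        deploy(challenger)  # placeholder: register new version, point serving at it
        return "promoted"
    return "kept current model"  # no change, no risky deployment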

Common MLOps Patterns

Different use cases need different patterns.

Pattern 1: Simple Batch Pipeline

Use case: Daily predictions on historical data

Setup:

  • Scheduled training job
  • Batch inference job
  • Results to database
  • Basic monitoring

Tools:

  • Cron or scheduler
  • Simple scripts
  • Database for results

Example: Daily customer churn predictions. Train weekly. Predict daily. Store results in database.
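
A pipeline like that can genuinely be one short scheduled script. Here is a rough sketch, assuming a churn model already registered in MLflow and a customer feature table in a SQL database; the connection string, table names, and model URI are all placeholders.

import mlflow
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@host/db")  # placeholder connection
model = mlflow.sklearn.load_model("models:/churn_model/1")    # placeholder URI

# Load today's customer features, score them, write results back
customers = pd.read_sql("SELECT * FROM customer_features", engine)
features = customers.drop(columns=["customer_id"])
customers["churn_probability"] = model.predict_proba(features)[:, 1]
customers[["customer_id", "churn_probability"]].to_sql(
    "churn_scores", engine, if_exists="replace", index=False
)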

Pattern 2: Real-Time API

Use case: Low-latency predictions on demand

Setup:

  • Model serving API
  • Load balancer
  • Auto-scaling
  • Real-time monitoring

Tools:

  • FastAPI or Flask
  • Kubernetes or cloud functions
  • Monitoring tools

Example: Fraud detection API. Predictions in <100ms. Handles 1000 requests/second.

Pattern 3: A/B Testing

Use case: Testing new models against current

Setup:

  • Two model versions
  • Traffic splitting
  • Performance comparison
  • Gradual rollout

Tools:

  • Seldon Core
  • KServe
  • Custom routing

Example: New recommendation model. 10% traffic to new model. Compare metrics. Roll out if better.
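
If you are not ready for Seldon or KServe, a rough version of the traffic split can live in the serving layer itself. A minimal sketch, assuming two model versions loaded from a registry; the split ratio and model URIs are illustrative.

import random
import mlflow

current_model = mlflow.sklearn.load_model("models:/recommender/1")    # placeholder URIs
candidate_model = mlflow.sklearn.load_model("models:/recommender/2")
CANDIDATE_TRAFFIC = 0.10  # send 10% of requests to the new model

def predict_with_split(features):
    # Route a small slice of traffic to the candidate and tag each result
    # so downstream metrics can be compared per model version
    if random.random() < CANDIDATE_TRAFFIC:
        return {"model": "candidate", "prediction": candidate_model.predict(features)}
    return {"model": "current", "prediction": current_model.predict(features)}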

Pattern 4: Continuous Training

Use case: Models that retrain automatically

Setup:

  • Automated retraining pipeline
  • Performance monitoring
  • Auto-deployment
  • Rollback on failure

Tools:

  • MLflow
  • Kubeflow
  • Managed platforms

Example: Fraud detection model. Retrains weekly. Auto-deploys if better. Alerts on degradation.

Best Practices

What we've learned from 50+ implementations.

1. Start Simple

Don't over-engineer. Start with the basics:

  • Version control
  • Basic monitoring
  • Simple deployment
  • Manual retraining

Add complexity as you need it.

2. Monitor Everything

You can't fix what you can't see. Monitor:

  • Model performance
  • Data quality
  • Infrastructure
  • Business metrics

3. Automate Gradually

Start manual. Automate what you do repeatedly. Don't automate everything at once.

4. Version Everything

Code, data, models, configs. Version it all. You'll need to reproduce results.

5. Test Before Deploying

Test models like you test code:

  • Unit tests for preprocessing
  • Integration tests for pipelines
  • Performance tests for serving
  • A/B tests in production
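
For the unit-test piece, treat preprocessing like any other code. A minimal pytest sketch, assuming a preprocess() function of your own that fills missing values; the import path and column name are illustrative.

import pandas as pd
from myproject.preprocessing import preprocess  # illustrative import path

def test_preprocess_handles_missing_values():
    raw = pd.DataFrame({"amount": [10.0, None, 30.0]})
    processed = preprocess(raw)
    # Preprocessing should never hand the model NaNs
    assert processed["amount"].isna().sum() == 0

def test_preprocess_keeps_row_count():
    raw = pd.DataFrame({"amount": [10.0, 20.0, 30.0]})
    assert len(preprocess(raw)) == len(raw)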

6. Plan for Rollback

Things break. Have a way to roll back quickly. Keep previous model versions ready.
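
In practice, keeping previous versions ready can be as simple as reading the served version from configuration, so a rollback is a config change and a restart rather than a retrain. A minimal sketch using the MLflow-style model URIs from earlier in this post; the environment variable name is illustrative.

import os
import mlflow

# Read the version from configuration so rolling back means
# changing one value and redeploying, not rebuilding anything
MODEL_VERSION = os.getenv("FRAUD_MODEL_VERSION", "3")
model = mlflow.sklearn.load_model(f"models:/fraud_detection/{MODEL_VERSION}")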

7. Document Decisions

Why did you choose this model? What were the trade-offs? Document it. Future you will thank you.

8. Start with Batch

Real-time is harder. Start with batch inference. Move to real-time when you need it.

Common Pitfalls

Things that go wrong and how to avoid them.

Pitfall 1: Training-Serving Skew

Problem: Model works in training but fails in production.

Cause: Different data preprocessing, missing features, environment differences.

Solution:

  • Use same preprocessing code
  • Log inputs and outputs
  • Test with production-like data
  • Monitor data distributions
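
The most reliable version of "use same preprocessing code" is to bundle the transformations and the model into a single artifact so they cannot drift apart. A minimal sketch with scikit-learn's Pipeline; the steps and the X_train / y_train variables are illustrative.

from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Preprocessing and model travel together as one artifact, so serving
# cannot accidentally apply different transformations than training did
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("model", LogisticRegression()),
])

pipeline.fit(X_train, y_train)  # X_train, y_train: your training features and labels
# mlflow.sklearn.log_model(pipeline, "model")  # log the whole pipeline, not just the model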

Pitfall 2: No Monitoring

Problem: Model performance degrades and you don't know.

Solution:

  • Set up monitoring from day one
  • Alert on performance drops
  • Track data distributions
  • Monitor business metrics

Pitfall 3: Manual Everything

Problem: Retraining takes days. Deployments are risky.

Solution:

  • Automate training pipelines
  • Automate deployments
  • Use CI/CD for models
  • Test before deploying

Pitfall 4: No Version Control

Problem: Can't reproduce results. Don't know which model is running.

Solution:

  • Version code, data, models
  • Use model registries
  • Tag everything
  • Document versions

Pitfall 5: Ignoring Data Quality

Problem: Model fails because input data is bad.

Solution:

  • Validate inputs
  • Monitor data quality
  • Handle missing values
  • Check for drift
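
Since the serving examples in this post use FastAPI, a straightforward way to validate inputs is a Pydantic model on the endpoint: malformed requests are rejected before they ever reach the model. The field names and ranges below are illustrative.

from fastapi import FastAPI
from pydantic import BaseModel, Field

app = FastAPI()

class Transaction(BaseModel):
    amount: float = Field(gt=0)            # reject non-positive amounts
    merchant_category: str
    hour_of_day: int = Field(ge=0, le=23)  # reject impossible hours

@app.post("/predict")
def predict(transaction: Transaction):
    features = transaction.dict()
    # ... pass the validated features to the model as before
    return {"received": features}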

Real-World Examples

Here's how we've set up MLOps for different companies.

Example 1: E-commerce Recommendations

Requirements:

  • Real-time product recommendations
  • Update daily with new products
  • Handle 10M+ requests/day

Setup:

  • Batch training pipeline (daily)
  • Real-time serving API
  • A/B testing framework
  • Performance monitoring

Tools:

  • MLflow for tracking
  • FastAPI for serving
  • Kubernetes for orchestration
  • Prometheus for monitoring

Result: Recommendations update daily. API serves 10M+ requests with 50ms average latency. Click-through rate improved 23%.

Example 2: Fraud Detection

Requirements:

  • Real-time fraud detection
  • Retrain weekly
  • High accuracy needed
  • Low false positives

Setup:

  • Weekly retraining pipeline
  • Real-time inference API
  • Performance monitoring
  • Alert on accuracy drops

Tools:

  • MLflow for model registry
  • Seldon for serving
  • Custom monitoring dashboard
  • Automated retraining

Result: Model retrains weekly with zero downtime. Accuracy maintained at 94-96%. False positive rate under 2%.

Example 3: Customer Churn Prediction

Requirements:

  • Daily batch predictions
  • Monthly retraining
  • Integration with CRM

Setup:

  • Scheduled training (monthly)
  • Batch inference (daily)
  • Results to database
  • CRM integration

Tools:

  • Airflow for scheduling
  • Simple Python scripts
  • Database for results
  • Basic monitoring

Result: Daily predictions for 50K+ customers. Monthly retraining improved accuracy by 8%. Sales team response time cut in half.

Getting Started

Ready to set up MLOps? Here's where to start.

Week 1: Basics

  1. Set up version control (Git)
  2. Start tracking experiments (MLflow)
  3. Document your current process
  4. Identify what to monitor

Week 2: Deployment

  1. Deploy model to staging
  2. Set up basic monitoring
  3. Test with production-like data
  4. Plan rollback strategy

Week 3: Automation

  1. Automate training pipeline
  2. Set up scheduled retraining
  3. Automate deployments
  4. Add more monitoring

Week 4: Optimization

  1. Review what you've built
  2. Identify bottlenecks
  3. Add missing pieces
  4. Document everything

Tools to Consider

If you're just starting:

  • MLflow (experiment tracking)
  • FastAPI (serving)
  • Basic monitoring (logs, metrics)

If you're scaling:

  • Kubeflow or managed platform
  • Advanced monitoring (drift detection)
  • Feature stores
  • Automated pipelines

If you're enterprise:

  • Managed platform (SageMaker, Vertex AI)
  • Full MLOps platform
  • Enterprise features (RBAC, SSO)
  • Dedicated team

The Bottom Line

MLOps isn't optional. If you're putting models in production, you need MLOps.

Start simple:

  • Version control
  • Basic monitoring
  • Simple deployment
  • Manual retraining

Add complexity as needed:

  • Automated pipelines
  • Advanced monitoring
  • A/B testing
  • Feature stores

What matters:

  • Reliability
  • Visibility
  • Reproducibility
  • Speed of iteration

Teams with MLOps ship models 10x faster. They catch issues before users do. They iterate weekly instead of quarterly.

Start today. Set up version control and basic monitoring. Add automation next week. Your future self will thank you.
