Introduction to MLOps

Machine Learning Operations (MLOps) is the discipline of operationalizing machine learning models — taking them from research experiments to reliable, scalable, and maintainable production systems. It combines DevOps principles with machine learning-specific requirements to address the unique challenges of deploying and managing ML systems.

The gap between developing a model in a notebook and deploying it in production is vast. A model that achieves 99% accuracy in a controlled environment may fail catastrophically in production due to data drift, infrastructure issues, or unexpected inputs. MLOps provides the practices, tools, and culture to bridge this gap.

💡 The MLOps Reality: According to industry surveys, only 54% of ML models ever make it to production. Of those, 1 in 3 fails within the first year. MLOps addresses the root causes: reproducibility gaps, infrastructure complexity, monitoring challenges, and organizational silos.

1. The ML Lifecycle: From Experiment to Production

Continuous loop: Data Collection & Versioning → Experiment Tracking → Model Development → CI/CD Pipeline → Model Deployment → Monitoring & Alerting → Retrain. Maturity levels: Level 0 (manual) → Level 1 (CI/CD automation) → Level 2 (full automation).
Figure 1: The MLOps lifecycle — from data collection to continuous retraining.

2. Data Versioning and Management

Data is the most critical and often most problematic component of ML systems. MLOps requires rigorous data management practices.

Dataset versions (v1.0 → v1.1 → v2.0) are paired with code commits (e.g., data v1.1 + code abc123 → model v3), so any trained model can be traced back to its exact inputs. Tools: DVC, Delta Lake, Hugging Face Datasets, LakeFS.
Figure 2: Data versioning — tracking dataset versions for reproducibility.

Data Management Best Practices

# DVC (Data Version Control) example
# Track data files
dvc add data/training.csv
git add data/training.csv.dvc data/.gitignore
git commit -m "Add dataset v1"

# Push to remote storage
dvc push

# Reproduce pipeline with specific data version
git checkout v1.0
dvc checkout

3. Experiment Tracking

ML development involves countless experiments with different hyperparameters, architectures, and data versions. Experiment tracking provides organization and reproducibility.

Run ID    Date        Model      Accuracy  Data Version
exp_001   2024-01-15  ResNet50   0.923     v1.2
exp_002   2024-01-16  ResNet101  0.941     v1.2
exp_003   2024-01-17  ViT-B/16   0.952     v2.0
exp_004   2024-01-18  ResNet152  0.918     v2.0
Tools: MLflow, Weights & Biases, Neptune.ai, TensorBoard, Comet
Figure 3: Experiment tracking — organizing and comparing model experiments.
# MLflow experiment tracking
import mlflow

mlflow.set_experiment("churn_prediction")

with mlflow.start_run():
    # Log parameters
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("n_estimators", 100)
    
    # Log metrics
    mlflow.log_metric("accuracy", 0.92)
    mlflow.log_metric("f1_score", 0.89)
    
    # Log the trained model (a fitted estimator) and artifacts
    mlflow.sklearn.log_model(model, "model")
    mlflow.log_artifact("confusion_matrix.png")

4. Model Versioning and Registry

The model registry is the single source of truth for production models, tracking version history, metadata, and deployment status.

Lifecycle stages: Registered → Staging → Production → Archived. Registry metadata: performance metrics, training data version, hyperparameters, approval status, deployment history.
Figure 4: Model registry — lifecycle stages from registration to archiving.
# Model registry with MLflow
from mlflow.tracking import MlflowClient

client = MlflowClient()
model_name = "churn_classifier"

# Register a model version from a completed training run
run_id = "<run-id>"  # ID of the MLflow run that logged the model
model_version = client.create_model_version(
    name=model_name,
    source=f"runs:/{run_id}/model",
    run_id=run_id
)

# Transition stage
client.transition_model_version_stage(
    name=model_name,
    version=model_version.version,
    stage="Staging"
)

# Promote to production
client.transition_model_version_stage(
    name=model_name,
    version=model_version.version,
    stage="Production"
)

5. CI/CD for Machine Learning

CI/CD pipelines automate testing, validation, and deployment of ML models, ensuring reliability and reducing manual errors.

Pipeline stages: Code Commit → Unit Tests → Data Validation → Model Training → Model Validation → Deploy. Model validation checks: performance threshold (e.g., accuracy > 0.9), fairness metrics, latency constraints, resource requirements, model size limits, backward compatibility, security scanning, drift detection baseline.
Figure 5: ML CI/CD pipeline — automated testing and validation before deployment.
# GitHub Actions workflow for ML
# .github/workflows/ml-pipeline.yml
name: ML Pipeline

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run unit tests
        run: pytest tests/
      - name: Data validation
        run: python scripts/validate_data.py
      - name: Model training & validation
        run: python scripts/train.py --validate-only
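A validation gate like the `--validate-only` step above typically loads the candidate model's metrics and compares them against hard thresholds and the current production baseline. A minimal sketch — `validate_model`, the metric names, and the thresholds are illustrative assumptions, not a standard API:

```python
# Minimal model validation gate; thresholds mirror the checks in Figure 5
def validate_model(metrics: dict, baseline: dict) -> list:
    """Return the list of failed checks; an empty list means the gate passes."""
    failures = []
    if metrics["accuracy"] < 0.9:
        failures.append("accuracy %.3f below 0.9 threshold" % metrics["accuracy"])
    if metrics["p95_latency_ms"] > 100:
        failures.append("p95 latency %dms exceeds 100ms budget" % metrics["p95_latency_ms"])
    # Guard against regressions relative to the model currently in production
    if metrics["accuracy"] < baseline["accuracy"] - 0.02:
        failures.append("accuracy regressed more than 2% vs. production baseline")
    return failures

# Candidate passes: meets thresholds and does not regress vs. production
candidate = {"accuracy": 0.92, "p95_latency_ms": 87}
production = {"accuracy": 0.91}
failures = validate_model(candidate, production)
if failures:
    raise SystemExit("Validation failed: " + "; ".join(failures))
```

In CI, a non-zero exit code from this script fails the pipeline before the deploy step runs.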

6. Model Deployment Strategies

• Batch inference — scheduled, offline predictions (e.g., nightly recommendations)
• Online / REST API — real-time predictions via a microservice (e.g., fraud detection)
• Streaming — event-driven inference on Kafka or Kinesis (e.g., real-time personalization)
• Edge / on-device — local inference for privacy and low latency (e.g., mobile, IoT)
Choose based on latency requirements, scale, and infrastructure constraints.
Figure 6: Model deployment strategies — batch, online, streaming, and edge.

Deployment Patterns

# Model serving with FastAPI
from fastapi import FastAPI
from pydantic import BaseModel
import joblib

app = FastAPI()
model = joblib.load("model.pkl")

class PredictionRequest(BaseModel):
    features: list[float]

@app.post("/predict")
async def predict(request: PredictionRequest):
    prediction = model.predict([request.features])
    return {"prediction": prediction.tolist()}

# Run with: uvicorn app:app --host 0.0.0.0 --port 8000
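For comparison with the online pattern above, a batch job scores an entire file on a schedule rather than one request at a time. A minimal sketch using only the standard library — `ThresholdModel` is a stand-in for a real trained model, and the column names are illustrative:

```python
# Minimal batch inference job: read a CSV, score every row, write predictions
import csv

class ThresholdModel:
    """Stand-in for a trained model: predicts 1 when score > 0.5."""
    def predict(self, rows):
        return [1 if float(r["score"]) > 0.5 else 0 for r in rows]

def run_batch(model, in_path, out_path):
    with open(in_path, newline="") as f:
        rows = list(csv.DictReader(f))
    predictions = model.predict(rows)
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "prediction"])
        for row, pred in zip(rows, predictions):
            writer.writerow([row["id"], pred])
    return len(predictions)  # number of rows scored
```

In production, a scheduler (cron, Airflow, etc.) would invoke this job nightly and write the predictions to a table the application reads from.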

7. Model Monitoring

Monitoring is critical for maintaining model performance in production. Key monitoring dimensions:

Dashboard panels: prediction distribution (within expected range), data drift (⚠️ drift detected, PSI 0.15), model performance (F1 0.89 → 0.87, -2%), latency (p95 87ms, OK), data quality (98%, missing values 2%), system health (CPU 45%, memory 3.2GB, 1250 requests/sec).
Figure 7: Model monitoring — tracking performance, drift, and system health.

Monitoring Types

# Drift detection with evidently
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference, current_data=current)
report.save_html("drift_report.html")

# Population Stability Index (PSI)
import numpy as np

def calculate_psi(expected, actual, bins=10):
    expected_percents = np.histogram(expected, bins=bins)[0] / len(expected)
    actual_percents = np.histogram(actual, bins=bins)[0] / len(actual)
    psi = np.sum((actual_percents - expected_percents) * 
                 np.log(actual_percents / expected_percents))
    return psi
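The PSI function above bins each sample independently and divides by zero when a bin is empty; a hardened variant shares bin edges and adds a small epsilon. A common rule of thumb (a convention, not a hard limit): PSI below 0.1 is stable, 0.1–0.25 indicates moderate shift, and above 0.25 warrants investigation.

```python
import numpy as np

def calculate_psi(expected, actual, bins=10, eps=1e-6):
    """PSI between reference and current samples; eps guards empty bins."""
    # Shared bin edges so the two percentage vectors are comparable
    edges = np.histogram_bin_edges(expected, bins=bins)
    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected) + eps
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual) + eps
    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))

rng = np.random.default_rng(0)
reference = rng.normal(0, 1, 10_000)
print(calculate_psi(reference, rng.normal(0, 1, 10_000)))  # small: stable
print(calculate_psi(reference, rng.normal(1, 1, 10_000)))  # large: significant shift
```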

8. Feature Store

A feature store centralizes feature engineering, ensuring consistency between training and serving, and enabling feature reuse across models.

Raw batch data and real-time stream data feed the feature store, which maintains an offline store (for training) and an online store (for serving). Tools: Feast, Tecton, Databricks Feature Store, Vertex AI Feature Store. Key benefit: training/serving consistency eliminates offline-online skew.
Figure 8: Feature store — centralized feature management for training and serving.
# Feast feature store example
from feast import FeatureStore

# Initialize feature store
store = FeatureStore(repo_path="feature_repo")

# Get training data
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "user_features:age",
        "user_features:location",
        "transaction_features:avg_amount"
    ]
).to_df()

# Get online features for serving
features = store.get_online_features(
    features=[
        "user_features:age",
        "user_features:location"
    ],
    entity_rows=[{"user_id": 12345}]
).to_dict()

9. LLMOps: MLOps for Large Language Models

LLMs introduce unique challenges that extend traditional MLOps practices.

🔄 LLMOps Considerations:
  • Prompt Management: Version control for prompts, templates, and chains
  • Cost Monitoring: Tracking token usage and API costs
  • Hallucination Detection: Monitoring for factual inaccuracies
  • Safety Filtering: Preventing harmful or inappropriate outputs
  • RAG Evaluation: Assessing retrieval-augmented generation quality
# LLM monitoring with LangSmith
import langsmith

client = langsmith.Client()

# Log LLM interaction
client.create_run(
    name="chat_completion",
    run_type="llm",
    inputs={"prompt": "What is MLOps?"},
    outputs={"response": "MLOps is..."},
    metadata={"model": "gpt-4", "tokens": 125}
)

# Evaluate with custom criteria (accuracy_evaluator and safety_evaluator are user-defined)
from langsmith.evaluation import evaluate

evaluate(
    lambda inputs: llm.predict(inputs["prompt"]),
    data=test_dataset,
    evaluators=[accuracy_evaluator, safety_evaluator]
)
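The cost-monitoring concern from the LLMOps list above can start as a simple accumulator that multiplies token counts by a price table — the model name and per-1K-token prices below are placeholder assumptions, not real vendor rates:

```python
import math

# Per-1K-token prices; placeholder assumptions, not real vendor rates
PRICE_PER_1K = {"model-a": {"input": 0.01, "output": 0.03}}

class CostTracker:
    """Accumulates token usage and estimated spend across LLM calls."""
    def __init__(self):
        self.total_cost = 0.0
        self.total_tokens = 0

    def record(self, model, input_tokens, output_tokens):
        prices = PRICE_PER_1K[model]
        cost = (input_tokens * prices["input"] + output_tokens * prices["output"]) / 1000
        self.total_cost += cost
        self.total_tokens += input_tokens + output_tokens
        return cost

tracker = CostTracker()
tracker.record("model-a", input_tokens=1000, output_tokens=500)  # ~$0.025
```

In practice the same accumulator would be keyed per user or per feature, with alerts when spend crosses a budget.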

10. Infrastructure and Scalability

Infrastructure Options

# Kubernetes deployment for model serving
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-serving
spec:
  replicas: 3
  selector:
    matchLabels:
      app: model
  template:
    metadata:
      labels:
        app: model
    spec:
      containers:
      - name: model-container
        image: model:v1.2.3
        ports:
        - containerPort: 8000
        resources:
          requests:
            memory: "2Gi"
            cpu: "1"
          limits:
            memory: "4Gi"
            cpu: "2"

11. Governance and Compliance

Key Governance Requirements

• Model documentation and lineage: which data, code, and hyperparameters produced each model
• Audit trails for approvals, deployments, and predictions
• Explainability for decisions in regulated domains
• Access control over who can promote models to production

12. MLOps Tools and Platforms

• Experiment tracking: MLflow, Weights & Biases
• Model registry: MLflow, Vertex AI
• Feature store: Feast, Tecton
• CI/CD: GitHub Actions, Jenkins
• Model serving: Seldon, KServe
• Monitoring: Evidently, WhyLabs
Integrated platforms: Databricks, Vertex AI, SageMaker, Azure ML
Figure 9: MLOps tooling landscape — specialized tools for each stage.

13. Best Practices and Anti-Patterns

Best Practices

• Version everything: data, code, models, and configuration
• Automate testing, validation, and deployment through CI/CD
• Monitor models continuously for drift and performance degradation
• Keep training and serving feature logic in one place (e.g., a feature store)

Anti-Patterns to Avoid

• Deploying models without monitoring or a rollback plan
• Duplicating feature logic between training and serving (offline-online skew)
• Treating notebooks as production code
• Skipping data validation before training

Conclusion

MLOps transforms machine learning from experimental craft to reliable engineering discipline. By applying DevOps principles to ML-specific challenges, organizations can deploy models faster, maintain them more reliably, and scale ML impact across the enterprise.

The journey to mature MLOps is incremental — starting with manual processes, adding automation, and ultimately achieving fully automated ML pipelines. Regardless of where you start, the principles of versioning, testing, monitoring, and reproducibility apply at every stage.

🎯 Next Steps: Explore Generative AI to understand the cutting edge of AI capabilities, or dive deeper into AI Ethics to ensure responsible deployment of ML systems.