
ML System Patterns That Actually Scale

Battle-tested architectures for deploying machine learning in production. From feature stores to model serving.

Marcus Chen

2023-12-15

18 min read

After deploying ML at scale across multiple companies, these are the patterns that consistently work - and the ones that don't.

The Production ML Stack

┌────────────────────────────────────────┐
│           Model Serving                │
│    (Low latency, high throughput)      │
├────────────────────────────────────────┤
│          Feature Store                 │
│    (Consistent train/serve features)   │
├────────────────────────────────────────┤
│        Training Pipeline               │
│    (Reproducible, versioned)           │
├────────────────────────────────────────┤
│         Data Platform                  │
│    (Single source of truth)            │
└────────────────────────────────────────┘

Pattern 1: Feature Store

The #1 source of training-serving skew is feature computation:

WRONG: Duplicate feature logic

training/features.py

def compute_features_training(user):
    return user.purchases[-30:].mean()

serving/features.py

def compute_features_serving(user):
    return user.recent_purchases.average()  # Subtly different!

Solution: Single feature definition, multiple materializations:

from pandas import DataFrame

@feature(
    entities=["user_id"],
    online=True,   # Materialize to Redis
    offline=True,  # Materialize to warehouse
)
def avg_purchase_30d(user_purchases: DataFrame) -> float:
    """Single definition, used everywhere."""
    # Assumes a time-indexed frame of the user's purchases
    return user_purchases.last("30d")["amount"].mean()
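
The same definition then backs both paths. A minimal usage sketch, assuming a Feast-style store object (the feature reference and entity rows here are illustrative):

# Training: point-in-time-correct features from the offline store
training_df = store.get_historical_features(
    entity_df=labels_df,  # user_id plus event-timestamp columns
    features=["avg_purchase_30d"],
).to_df()

# Serving: the same feature, read from the online store (e.g. Redis)
online_features = store.get_online_features(
    features=["avg_purchase_30d"],
    entity_rows=[{"user_id": 42}],
).to_dict()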

Pattern 2: Model Versioning

Every model in production needs:

from dataclasses import dataclass
from datetime import datetime

@dataclass
class ModelArtifact:
    # Identity
    model_id: str
    version: str
    
    # Reproducibility
    training_data_hash: str
    code_commit: str
    hyperparameters: dict
    
    # Provenance
    trained_at: datetime
    trained_by: str
    training_metrics: dict
    
    # Deployment
    serving_signature: dict
    resource_requirements: dict
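
A sketch of how these fields might be captured at training time; the helper, the CSV-bytes hashing, and the git call are illustrative assumptions rather than a prescribed API:

import hashlib
import subprocess
from datetime import datetime, timezone

def build_artifact(model_id, version, train_df, params, metrics):
    # Hash the exact bytes the model trained on
    data_hash = hashlib.sha256(
        train_df.to_csv(index=False).encode()
    ).hexdigest()
    # Record the code state (assumes training runs in a git checkout)
    commit = subprocess.check_output(
        ["git", "rev-parse", "HEAD"]
    ).decode().strip()
    return ModelArtifact(
        model_id=model_id,
        version=version,
        training_data_hash=data_hash,
        code_commit=commit,
        hyperparameters=params,
        trained_at=datetime.now(timezone.utc),
        trained_by="training-pipeline",  # or the triggering user
        training_metrics=metrics,
        serving_signature={},           # filled in at model export
        resource_requirements={},       # filled in at deploy time
    )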

Pattern 3: Shadow Deployment

Never deploy directly to production:

Request → Load Balancer
              │
              ├──→ Production Model (serves response)
              │
              └──→ Shadow Model (logs only)
                        │
                        └──→ Compare metrics offline

Only promote shadow → production when metrics prove out.
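
A minimal sketch of the mirroring logic, assuming synchronous model clients (production_model, shadow_model, and the logged fields are illustrative):

import logging
from concurrent.futures import ThreadPoolExecutor

shadow_pool = ThreadPoolExecutor(max_workers=4)

def handle_request(features, production_model, shadow_model):
    # Production path: this result is what the caller receives
    response = production_model.predict(features)

    # Shadow path: fire-and-forget, logged for offline comparison
    def run_shadow():
        try:
            shadow_pred = shadow_model.predict(features)
            logging.info(
                "shadow comparison: prod=%s shadow=%s",
                response, shadow_pred,
            )
        except Exception:
            # A shadow failure must never affect production traffic
            logging.exception("shadow model failed")

    shadow_pool.submit(run_shadow)
    return response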

Pattern 4: Graceful Degradation

ML systems fail. Plan for it:

class ResilientPredictor:
    def predict(self, features):
        try:
            # Try the ML model first, within a hard latency budget
            return self.ml_model.predict(features, timeout=0.050)  # 50 ms
        except TimeoutError:
            # Fall back to simpler model
            return self.fallback_model.predict(features)
        except Exception:
            # Fall back to business rules
            return self.rule_based_fallback(features)
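
The last tier can be deliberately boring. As a sketch, a rule-based fallback for a recommendation-style system (the attribute and key names here are illustrative):

def rule_based_fallback(self, features):
    # Heuristic of last resort: popularity, optionally coarsened
    # by user segment; no model inference required
    segment = features.get("user_segment", "default")
    return self.popular_items_by_segment.get(
        segment, self.global_popular_items
    )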

Pattern 5: Monitoring That Matters

Input Drift

from scipy.stats import ks_2samp

def monitor_input_drift(current_batch, reference_distribution):
    # Two-sample KS test: the statistic grows with distribution distance
    drift_score, p_value = ks_2samp(
        current_batch,
        reference_distribution,
    )
    if drift_score > THRESHOLD:
        alert("Input distribution shift detected")

Output Drift

def monitor_predictions(predictions, window="1h"):
    # Prediction distribution
    pred_mean = predictions.mean()
    pred_std = predictions.std()
    
    # Compare to baseline
    if abs(pred_mean - BASELINE_MEAN) > 2 * BASELINE_STD:
        alert("Prediction distribution shift")

Business Metrics

def monitor_business_impact(model_cohort, control_cohort):
    # The metric that actually matters
    conversion_lift = (
        model_cohort.conversion_rate - 
        control_cohort.conversion_rate
    )
    
    if conversion_lift < MINIMUM_LIFT:
        alert("Model not providing expected lift")

Anti-Patterns to Avoid

1. Notebook → Production

# This is not deployment
model.save("model.pkl")
# Put it on the server somehow???

2. No Rollback Plan

Always have:

  • Previous model version ready
  • One-click rollback procedure
  • Automated rollback on metric degradation (see the sketch below)
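
A sketch of the automated check (the registry interface, metric names, and thresholds are illustrative assumptions):

def maybe_rollback(live, baseline, registry):
    # Trip on clear degradation, not noise; hence generous multipliers
    degraded = (
        live["error_rate"] > 1.5 * baseline["error_rate"]
        or live["p99_latency_ms"] > 2 * baseline["p99_latency_ms"]
    )
    if degraded:
        previous = registry.get_previous_version("my-model")
        registry.promote(previous)  # the pre-tested one-click path
        alert(f"Auto-rolled back to {previous.version}")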

3. Optimizing for Offline Metrics Only

Offline accuracy: 94%
Online conversion: Dropped 15%

Why? Latency increased from 20ms to 200ms.

Always measure end-to-end business impact.

The Production Checklist

Before any model goes live:

  • [ ] Training-serving feature parity verified
  • [ ] Model versioned with full provenance
  • [ ] Shadow deployment completed
  • [ ] Fallback mechanisms tested
  • [ ] Monitoring dashboards ready
  • [ ] Rollback procedure documented
  • [ ] On-call runbook updated

Production ML is 10% models, 90% engineering. Plan accordingly.


*Next: Feature Engineering at Scale - Lessons from 1B+ Daily Predictions*
