Feature Stores: The Missing Infrastructure for ML

#machine-learning #feature-store #data-engineering #mlops

Feature stores solve one of the most underestimated problems in ML: getting consistent, fresh, and correct features to models in both training and serving. Without one, teams rebuild the same transformations, introduce training-serving skew, and waste months on plumbing.

The Core Problem

WITHOUT Feature Store:              WITH Feature Store:

Training Pipeline                   Training Pipeline
  └── SQL query A                     └── Feature Store (offline)
Serving Pipeline                    Serving Pipeline
  └── Python transform B              └── Feature Store (online)
       (different logic!)                  (same logic, same data)

Result: Training-Serving Skew       Result: Consistency guaranteed
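
One way to picture the right-hand side of the diagram is a single feature function that both pipelines call. The sketch below is a minimal illustration, not a feature store implementation; the `Purchase` type, the `spend_last_7d` feature, and both `build_*_row` helpers are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Purchase:
    user_id: str
    amount: float
    ts: datetime


def spend_last_7d(purchases, user_id, as_of):
    """Total spend by user_id in the 7 days up to and including as_of."""
    window_start = as_of - timedelta(days=7)
    return sum(
        p.amount
        for p in purchases
        if p.user_id == user_id and window_start < p.ts <= as_of
    )


def build_training_row(purchases, user_id, label_ts, label):
    # Training path: evaluate the feature as of the label timestamp.
    return {"spend_last_7d": spend_last_7d(purchases, user_id, label_ts), "label": label}


def build_serving_row(purchases, user_id, now):
    # Serving path: the same function, evaluated at request time.
    return {"spend_last_7d": spend_last_7d(purchases, user_id, now)}


purchases = [
    Purchase("u1", 20.0, datetime(2024, 1, 1)),
    Purchase("u1", 5.0, datetime(2024, 1, 6)),
    Purchase("u1", 100.0, datetime(2024, 1, 20)),  # after the cutoff, must be ignored
]
train = build_training_row(purchases, "u1", datetime(2024, 1, 7), label=1)
serve = build_serving_row(purchases, "u1", datetime(2024, 1, 7))
```

Because both rows come from the same function with the same cutoff, they agree on the feature value. Skew appears when the two paths each carry their own copy of the logic.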

Online vs Offline: Two Serving Patterns

| Dimension      | Offline Store                          | Online Store                      |
|----------------|----------------------------------------|-----------------------------------|
| Purpose        | Training data, batch scoring           | Real-time inference               |
| Latency        | Seconds to minutes                     | < 10 ms                           |
| Storage        | Data lake / warehouse (Parquet, Delta) | Key-value store (Redis, DynamoDB) |
| Data Volume    | Months/years of history                | Latest values only                |
| Access Pattern | Full scan, time-travel queries         | Point lookup by entity key        |
| Freshness      | Batch (hourly/daily)                   | Near real-time (streaming)        |
| Cost Profile   | Storage-heavy, compute on read         | Compute-heavy, storage is small   |
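
The two access patterns in the table can be sketched with plain Python structures. This is an in-memory illustration only, assuming ISO date strings as event times; `offline`, `online`, `get_historical`, and `get_online` are hypothetical names, not the API of any particular feature store.

```python
from bisect import bisect_right

# Offline store: full history per entity, sorted by event time.
# ISO-8601 date strings compare correctly as plain strings.
offline = {
    "u1": [("2024-01-01", 3), ("2024-01-05", 7), ("2024-01-09", 12)],
}

# Online store: only the latest value per entity.
online = {entity: rows[-1][1] for entity, rows in offline.items()}


def get_historical(entity, as_of):
    """Time-travel read: latest value with event time <= as_of, else None."""
    rows = offline.get(entity, [])
    times = [t for t, _ in rows]
    idx = bisect_right(times, as_of)
    return rows[idx - 1][1] if idx else None


def get_online(entity):
    """Point lookup by entity key."""
    return online.get(entity)
```

A training job would call something like `get_historical("u1", "2024-01-06")` so that a label from January 6 never sees the January 9 value, while a serving request only needs `get_online("u1")`.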

Tool Comparison

| Feature                | Feast                     | Tecton                   | Hopsworks            | Vertex Feature Store |
|------------------------|---------------------------|--------------------------|----------------------|----------------------|
| Type                   | Open source               | Commercial               | Open core            | Managed (GCP)        |
| Online Store           | Redis, DynamoDB, etc.     | Managed                  | RonDB                | Bigtable             |
| Offline Store          | BigQuery, Redshift, etc.  | Managed                  | Hudi on S3           | BigQuery             |
| Streaming              | Via Spark/Flink           | Native                   | Native               | Dataflow             |
| Feature Transformation | Limited (push-based)      | Native (Python SDK)      | Native (PySpark/SQL) | Dataflow pipelines   |
| Monitoring             | Basic                     | Built-in drift detection | Built-in             | Basic                |
| Registry               | File or DB-backed         | Managed                  | Managed              | Managed              |
| Cost                   | Free + infra              | $$$$ (enterprise)        | Free tier + managed  | GCP pricing          |
| Best For               | Simple needs, multi-cloud | High-scale real-time ML  | Full ML platform     | GCP-native teams     |

Feature Store Architecture

Data Sources                    Feature Store                  Consumers
+----------------+         +---------------------+        +----------------+
| Event Streams  |----->   | Transformation      |        | Training       |
| (Kafka, Kinesis)|        | Engine              |        | Pipelines      |
+----------------+         |   |         |        |        +----------------+
                           |   v         v        |
+----------------+         | +-------+ +-------+  |        +----------------+
| Databases      |----->   | |Offline| |Online |  |------> | Real-time      |
| (Postgres, etc)|         | |Store  | |Store  |  |        | Serving        |
+----------------+         | +-------+ +-------+  |        +----------------+
                           |         |            |
+----------------+         | +------------------+ |        +----------------+
| Data Warehouse |----->   | |Feature Registry  | |------> | Batch          |
| (BigQuery, etc)|         | |& Metadata        | |        | Scoring        |
+----------------+         +---------------------+        +----------------+
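
To make the boxes in the diagram concrete, here is a toy in-memory sketch with a registry, a transformation step, and the two stores. It assumes events arrive in time order and that every feature is a pure function of one raw event; the `MiniFeatureStore` class and its methods are hypothetical, and a real system would add persistence, streaming ingestion, and point-in-time joins.

```python
from collections import defaultdict


class MiniFeatureStore:
    def __init__(self):
        self.registry = {}                # feature name -> {"fn", "description"}
        self.offline = defaultdict(list)  # entity -> [(event_time, {feature: value})]
        self.online = {}                  # entity -> {feature: value}, latest only

    def register(self, name, fn, description=""):
        # Feature registry: one definition and its metadata per feature.
        self.registry[name] = {"fn": fn, "description": description}

    def ingest(self, entity, event_time, raw):
        # Transformation engine: one code path feeds both stores.
        values = {name: meta["fn"](raw) for name, meta in self.registry.items()}
        self.offline[entity].append((event_time, values))  # append to history
        self.online[entity] = values                       # overwrite latest
        return values

    def get_online_features(self, entity, names):
        row = self.online.get(entity, {})
        return {n: row.get(n) for n in names}

    def get_history(self, entity):
        return list(self.offline.get(entity, []))


store = MiniFeatureStore()
store.register("cart_value", lambda raw: sum(raw["prices"]), "Sum of item prices")
store.register("item_count", lambda raw: len(raw["prices"]))
store.ingest("u1", 1, {"prices": [10.0, 5.0]})
store.ingest("u1", 2, {"prices": [2.5]})
```

After the two `ingest` calls, the offline side holds both snapshots for `u1` while the online side holds only the second, which mirrors the split between training consumers and real-time serving in the diagram.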

Adoption Decision Matrix

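| Signal                                     | Score (1-5) | Weight |
|--------------------------------------------|-------------|--------|
| Number of ML models in production          | ___         | 3x     |
| Teams sharing features across models       | ___         | 3x     |
| Training-serving skew incidents            | ___         | 2x     |
| Time spent on feature engineering plumbing | ___         | 2x     |
| Real-time serving requirements             | ___         | 2x     |
| Data freshness requirements (streaming)    | ___         | 1x     |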

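Score interpretation: multiply each score by its weight and add up the results (the weights sum to 13, so totals range from 13 to 65). Weighted total > 40: strong need. 25-40: evaluate. < 25: premature.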
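
The matrix arithmetic can be sketched as a small function. The weights and thresholds come from the table and the interpretation line above; the dictionary keys and the `adoption_score` name are hypothetical labels chosen for this example.

```python
# Weights from the decision matrix; key names are illustrative only.
WEIGHTS = {
    "models_in_production": 3,
    "teams_sharing_features": 3,
    "skew_incidents": 2,
    "plumbing_time": 2,
    "realtime_serving": 2,
    "streaming_freshness": 1,
}


def adoption_score(scores):
    """Return (weighted_total, verdict) for a dict of 1-5 scores per signal."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must contain exactly the signals in WEIGHTS")
    for signal, value in scores.items():
        if not 1 <= value <= 5:
            raise ValueError(f"{signal} must be between 1 and 5, got {value}")

    total = sum(WEIGHTS[signal] * value for signal, value in scores.items())
    if total > 40:
        verdict = "strong need"
    elif total >= 25:
        verdict = "evaluate"
    else:
        verdict = "premature"
    return total, verdict


example = adoption_score({
    "models_in_production": 4,
    "teams_sharing_features": 3,
    "skew_incidents": 2,
    "plumbing_time": 4,
    "realtime_serving": 5,
    "streaming_freshness": 2,
})
```

For the example scores the weighted total is 12 + 9 + 4 + 8 + 10 + 2 = 45, which lands in the "strong need" band.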

When NOT to Build a Feature Store

  • You have fewer than 3 models in production
  • All your models are batch-only (no real-time serving)
  • A single team owns all ML and does not share features
  • Your features are simple lookups with no transformation

Resources