3.2 Data Observability and Monitoring
3.2.1 Observability Concepts
Data observability borrows from DevOps observability but focuses on the health of data itself, not just the systems that process it.
DevOps Observability monitors metrics like CPU/RAM usage and response time to quickly detect anomalies, identify problems, prevent downtime, and ensure reliable software products.
Data Observability monitors the health of data and data systems, ensuring high-quality data that is accurate, complete, discoverable, and available in a timely manner. Upstream changes, such as source systems changing their data structure, should be anticipated and mitigated proactively.
Key questions to ask (from Barr Moses, CEO of Monte Carlo):
- Is the data up-to-date?
- Is the data complete?
- Are fields within expected ranges?
- Is the null rate higher or lower than it should be?
- Has the schema changed?
5 Pillars of Data Observability:
- Distribution / Internal Quality: Checks metrics such as NULL percentage, unique element percentage, summary statistics, and whether data falls within expected ranges. Ensures data is trusted based on your expectations.
- Freshness: How up-to-date the data is within the final asset (table, BI report), i.e., when it was last updated and how frequently it is refreshed. Stale data results in wasted time and money.
- Volume: Monitors the amount of data ingested, looking for unexpected spikes or drops. Sudden drops can indicate lost data or system outages; sudden increases may signal unexpected usage surges.
- Lineage: According to Barr Moses, "When data breaks, the first question is always 'where?'" Data lineage traces the data's journey from source to destination, visualizing transformations and storage locations to identify the source of errors.
- Schema: Monitors changes in data structure or data types to prevent pipeline failures (a minimal drift-check sketch follows this list).
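To make the Schema pillar concrete, here is a minimal drift-check sketch, assuming you keep a JSON snapshot of the last known-good schema; the function name, baseline file, and example columns are illustrative, not part of any standard tool:

```python
import json
import pandas as pd

def check_schema_drift(df: pd.DataFrame, baseline_path: str) -> list[str]:
    """Compare a table's current schema against a stored baseline snapshot."""
    current = {col: str(dtype) for col, dtype in df.dtypes.items()}
    with open(baseline_path) as f:
        baseline = json.load(f)  # e.g. {"order_id": "int64", "amount": "float64"}

    issues = []
    for col, dtype in baseline.items():
        if col not in current:
            issues.append(f"column dropped: {col}")
        elif current[col] != dtype:
            issues.append(f"type changed: {col} {dtype} -> {current[col]}")
    for col in current.keys() - baseline.keys():
        issues.append(f"column added: {col}")
    return issues
```

Running a check like this before each load turns a silent upstream change into an explicit, actionable alert.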
3.2.2 Monitoring Data Quality
Focus your monitoring efforts on the metrics that matter most. The core dimensions to track are volume, distribution, null values, and freshness. Identify the most important metrics by checking what stakeholders care about and talking with source system owners.
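A minimal sketch of capturing these four dimensions for a single table, assuming a pandas DataFrame with an event-timestamp column (quality_snapshot and ts_col are illustrative names, not from any specific tool):

```python
import pandas as pd

def quality_snapshot(df: pd.DataFrame, ts_col: str) -> dict:
    """Summarize the core monitoring dimensions for one table."""
    event_times = pd.to_datetime(df[ts_col], utc=True)
    return {
        "volume": len(df),                                  # rows ingested
        "null_rate": df.isna().mean().round(4).to_dict(),   # per-column null fraction
        "distribution": df.describe().to_dict(),            # summary stats for numeric columns
        "freshness_lag": pd.Timestamp.now(tz="UTC") - event_times.max(),
    }
```

Logging a snapshot like this on every pipeline run builds a baseline, so alerts can fire when any value drifts outside its normal range.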
3.2.3 Observability Tools
Great Expectations
Great Expectations (GX) is an open-source Python library for validating, documenting, and profiling data. It lets you define expectations - declarative assertions about what your data should look like.
When expectations fail, GX generates detailed reports showing exactly which rows or columns violated the rules, making it easy to catch data quality issues before they reach downstream consumers. GX stores all metadata - expectations, validation results, checkpoints, and data docs - in configurable backend stores, keeping your validation logic versioned and reproducible.
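A minimal sketch of defining and running expectations, assuming the pandas shortcut API from pre-1.0 releases of the library (newer GX Core versions organize the same workflow around a Data Context, Batches, and Checkpoints); the file and column names are illustrative:

```python
import great_expectations as gx
import pandas as pd

df = pd.read_csv("orders.csv")   # illustrative input file
batch = gx.from_pandas(df)       # wrap the DataFrame in a validatable dataset

# Declarative assertions about what the data should look like
batch.expect_column_values_to_not_be_null("order_id")
batch.expect_column_values_to_be_unique("order_id")
batch.expect_column_values_to_be_between("amount", min_value=0, max_value=10_000)

# Returns per-expectation success/failure details
results = batch.validate()
if not results.success:
    raise ValueError("Data quality validation failed")
```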
Core Components
- Expectations / Expectation Suites: individual declarative assertions and named collections of them
- Data Context: the entry point that manages configuration and the backend stores
- Checkpoints: pair a batch of data with an expectation suite and run validation
- Data Docs: human-readable HTML documentation rendered from expectations and validation results
CloudWatch
CloudWatch is AWS's built-in monitoring service for tracking infrastructure and application metrics.
It collects system-level metrics (CPU, memory, disk, network) automatically from AWS resources, and supports custom metrics for application-specific measurements like transaction counts or API response times. CloudWatch Alarms let you define thresholds on any metric and trigger notifications or automated actions when those thresholds are breached. It retains metrics data for up to 15 months, enabling long-term trend analysis and capacity planning.
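As a sketch, a pipeline could publish a custom volume metric and alarm on sudden drops using boto3; the namespace, metric name, threshold, and SNS topic ARN below are hypothetical:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Publish a custom metric: rows loaded by today's pipeline run
cloudwatch.put_metric_data(
    Namespace="DataPipeline",          # hypothetical namespace
    MetricData=[{
        "MetricName": "RowsLoaded",
        "Value": 152_340,
        "Unit": "Count",
    }],
)

# Alarm if volume drops to zero over a one-hour period
cloudwatch.put_metric_alarm(
    AlarmName="rows-loaded-dropped",
    Namespace="DataPipeline",
    MetricName="RowsLoaded",
    Statistic="Sum",
    Period=3600,
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator="LessThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:data-alerts"],  # hypothetical SNS topic
)
```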
3.2.4 Data Contracts
A data contract is a formal agreement between a data producer and its consumers that defines the structure, semantics, quality guarantees, and service-level expectations for a dataset. Data contracts shift quality enforcement upstream - instead of consumers discovering problems after the fact, producers commit to delivering data that meets a defined standard.
What a Data Contract Specifies
| Element | Description |
|---|---|
| Schema | Column names, data types, nullability constraints, and valid value ranges |
| Freshness SLA | Maximum acceptable delay between data generation and availability (e.g., "within 2 hours of midnight UTC") |
| Volume expectations | Expected row count ranges - an empty table or a 10x spike may indicate a problem |
| Ownership | The team responsible for maintaining the contract and responding to violations |
| Breaking change policy | How schema changes are communicated - e.g., deprecation windows, versioning rules |
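To show how these elements translate into automated enforcement, here is a minimal validation sketch; the contract dictionary, column names, and thresholds are illustrative assumptions rather than a standard contract format:

```python
import pandas as pd

# Illustrative contract for a hypothetical daily orders table
CONTRACT = {
    "schema": {"order_id": "int64", "amount": "float64", "created_at": "datetime64[ns, UTC]"},
    "freshness_hours": 2,        # max delay after the data should be available
    "min_rows": 1_000,
    "max_rows": 1_000_000,
}

def validate_contract(df: pd.DataFrame) -> list[str]:
    violations = []
    # Schema: every contracted column must exist with the agreed type
    for col, dtype in CONTRACT["schema"].items():
        if col not in df.columns:
            violations.append(f"missing column: {col}")
        elif str(df[col].dtype) != dtype:
            violations.append(f"wrong type for {col}: {df[col].dtype}")
    # Volume: row count must fall inside the expected range
    if not (CONTRACT["min_rows"] <= len(df) <= CONTRACT["max_rows"]):
        violations.append(f"row count out of range: {len(df)}")
    # Freshness: the newest record must be within the SLA window
    if "created_at" in df.columns:
        lag = pd.Timestamp.now(tz="UTC") - pd.to_datetime(df["created_at"], utc=True).max()
        if lag > pd.Timedelta(hours=CONTRACT["freshness_hours"]):
            violations.append(f"stale data: newest record is {lag} old")
    return violations
```

Run at the producer's boundary, a check like this stops bad data before it propagates, which is the core promise of the contract.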
Why Data Contracts Matter
Without contracts, data pipelines are fragile. A source team renames a column, changes a data type, or stops populating a field - and downstream dashboards break silently. Data contracts make these dependencies explicit and enforceable. When a producer violates the contract, automated validation catches it before the bad data propagates downstream.
Data contracts are closely related to data observability - the contract defines what "healthy" looks like, and observability tools monitor for violations.