Reservation Data Quality Framework
Validate Omni Hotels reservation data against a production rule suite — powering reliable revenue management and personalized guest experiences across 50+ properties.
Bad Data, Broken Stays
A single invalid check-out date cascades into wrong ADR calculations, broken forecasting models, and misrouted loyalty rewards. Omni processes 4,000+ reservations per night across 50 properties. At 2% error rate, that's 80 corrupted records per night compounding into the data warehouse.
Opera PMS → Snowflake → Analytics
Opera Property Management System (the industry-standard hotel PMS) pushes reservation records nightly via ETL into Snowflake. Revenue management, marketing, and loyalty systems all read from this warehouse. Data quality must be enforced at ingestion — before downstream systems consume corrupt data.
Great Expectations: Rules as Code
Great Expectations defines data quality rules declaratively ("expect_column_values_to_be_between") and executes them as unit tests against each batch. Rules are version-controlled alongside pipeline code — breaking data is caught in CI just like breaking code.
Completeness, Validity, Uniqueness, Timeliness, Consistency
The CVUTC framework maps each rule to a business risk: Completeness gaps break loyalty profiles. Validity failures skew revenue reports. Uniqueness violations cause double-billing. Timeliness failures corrupt forecasting models. Consistency failures break date-range analytics.
Where Errors Actually Come From
72% of hotel data errors originate at front desk entry (walk-in guests, rate overrides, manual corrections). 18% come from OTA (booking.com, Expedia) field mapping mismatches. 10% are system sync glitches (duplicate res_id on failover). Each error type needs a different remediation strategy.
From Report to Fix in 4 Hours
Quality failures generate Jira tickets, tagged by dimension and source system, routed to the owning team. Revenue management failures alert the RM director directly. SLA: critical failures (rate = $0, checkout < checkin) fixed within 4 hours. Non-critical within 24. Trend dashboard shows improvement over time.
Interactive Demo
Run the validation suite against 10 real Omni Hotels reservations with intentional quality issues.
| res_id | guest_name | phone | check_in | check_out | room_type | rate | loyalty | property |
|---|
| Column | Expectation | Parameter | Failures |
|---|
Classroom
Six data quality concepts — from CVUTC dimensions to SLA-based remediation frameworks.
Key Engineering Points
Four decisions that separate production-grade hospitality data quality from checkbox exercises.
Opera PMS is the Source of Truth
Opera is deployed at 70%+ of large hotel chains. Its export format has idiosyncrasies: rate fields default to 0 (not NULL) on comp rooms, phone fields use property-specific formats, loyalty tier names vary by brand. Any GE ruleset must be calibrated to Opera's specific behavior — not generic SQL patterns.
Email as a Marketing Revenue Multiplier
Omni's email marketing generates $8.50 per valid address per year in direct booking revenue. At 4% invalid email rate across 2.3M loyalty members, that's $782K/year in unreachable guests. Email validation in the ETL pipeline pays for the entire data quality program within 3 months.
Date Consistency Errors Compound
A checkout-before-checkin error doesn't just affect one record. Revenue management models use rolling averages of stay duration. One -3 night stay (checkin April 12, checkout April 9) corrupts the average, distorting rate optimization algorithms that rely on accurate length-of-stay distribution. The cascade is invisible unless you measure it.
Statistical vs. Rule-Based Validation
Rule-based checks (GE expectations) catch known violation types. Statistical checks (Z-score on rate distribution, Kullback-Leibler divergence on daily reservation volumes) catch unknown shifts — like an OTA suddenly sending room types in a new format that passes all explicit rules but breaks downstream category models.
Production Code
Four patterns covering the full data quality stack: validation, profiling, anomaly detection on metrics, and alerting.
Great Expectations Suite for Hotel Reservations (Python)
import great_expectations as gx
from great_expectations.core.batch import RuntimeBatchRequest
context = gx.get_context()
suite = context.add_or_update_expectation_suite("omni_reservations_suite")
def build_expectation_suite(validator):
# Uniqueness
validator.expect_column_values_to_be_unique("res_id")
# Completeness — structurally required fields
for col in ["guest_name", "check_in", "check_out", "property", "room_type"]:
validator.expect_column_values_to_not_be_null(col)
# Validity — rate range ($89–$1500 covers standard to Presidential)
validator.expect_column_values_to_be_between("rate", min_value=89, max_value=1500)
# Validity — email regex
validator.expect_column_values_to_match_regex(
"email", regex=r"^[\w.+-]+@[\w-]+\.[\w.]+$"
)
# Validity — loyalty tier set
validator.expect_column_values_to_be_in_set(
"loyalty",
value_set=["Select Guest", "Gold", "Platinum"]
)
# Consistency — checkout after checkin (custom SQL expectation)
validator.expect_column_pair_values_to_be_equal(
column_A="check_out", column_B="check_in",
or_equal=False,
meta={"description": "check_out must be after check_in"}
)
return validator.get_expectation_suite()
Automated Data Profiling with pandas-profiling (Python)
import pandas as pd
from ydata_profiling import ProfileReport
import snowflake.connector
def profile_reservations(date_range: tuple[str, str]) -> ProfileReport:
conn = snowflake.connector.connect(
user="etl_service", account="omni-hotels.us-east-1",
warehouse="ANALYTICS_WH", database="OPERA_PROD"
)
df = pd.read_sql(f"""
SELECT res_id, guest_name, email, phone, check_in, check_out,
room_type, rate, loyalty_tier, property_code
FROM reservations
WHERE created_date BETWEEN '{date_range[0]}' AND '{date_range[1]}'
""", conn)
profile = ProfileReport(
df,
title=f"Omni Reservations {date_range[0]} to {date_range[1]}",
explorative=True,
correlations={"auto": {"calculate": True}},
missing_diagrams={"heatmap": True, "dendrogram": True}
)
# Key stats: missing rates, value distributions, correlations
completeness = (1 - df.isnull().mean()) * 100
print(completeness.sort_values()) # spot bottom fields
print(df["rate"].describe()) # outlier check
return profile
Statistical Alert on Daily Quality Score Regression (Python)
import numpy as np
from dataclasses import dataclass
@dataclass
class QualityAlert:
date: str; score: float; z_score: float; prev_scores: list[float]
class QualityRegressionDetector:
"""Detect unusual drops in daily DQ score using rolling Z-Score.
Fires if today's score is >2.5 sigma below the rolling mean."""
def __init__(self, window: int = 14, threshold: float = 2.5):
self.window = window; self.threshold = threshold
self._history: list[tuple[str, float]] = []
def update(self, date: str, score: float) -> QualityAlert | None:
self._history.append((date, score))
if len(self._history) <= self.window:
return None
recent = [s for _, s in self._history[-self.window - 1:-1]]
mu = np.mean(recent); sigma = np.std(recent, ddof=1)
if sigma == 0: return None
z = (score - mu) / sigma
if z < -self.threshold:
return QualityAlert(date=date, score=score, z_score=z, prev_scores=recent)
return None
# Usage: called nightly after GE checkpoint
detector = QualityRegressionDetector(window=14)
for date, score in daily_scores: # from DQ metrics table
alert = detector.update(date, score)
if alert: send_pagerduty(alert) # page duty manager
Quarantine + Remediation Router (Python)
from enum import Enum
from dataclasses import dataclass, field
class Severity(str, Enum):
CRITICAL = "critical" # blocks load, pages duty manager
HIGH = "high" # quarantines row, creates Jira P1
MEDIUM = "medium" # loads with quality flag, Jira P2
LOW = "low" # logs only, next-sprint backlog
DIMENSION_SLA = {
"CONSISTENCY": Severity.CRITICAL, # checkout < checkin
"RANGE": Severity.HIGH, # rate = $0 or $5000+
"UNIQUE": Severity.HIGH, # duplicate res_id
"NOT NULL": Severity.MEDIUM, # missing required field
"REGEX": Severity.MEDIUM, # invalid email/phone
"SET": Severity.LOW, # unexpected loyalty tier
"DATE PAST": Severity.LOW, # future date in historical extract
}
@dataclass
class RemediationAction:
res_id: str; rule_type: str; severity: Severity; action: str
def route_failures(failures: list[dict]) -> list[RemediationAction]:
actions = []
for f in failures:
sev = DIMENSION_SLA.get(f["rule_type"], Severity.LOW)
action_map = {
Severity.CRITICAL: "BLOCK_LOAD|PAGE_MANAGER",
Severity.HIGH: "QUARANTINE|CREATE_JIRA_P1",
Severity.MEDIUM: "LOAD_FLAGGED|CREATE_JIRA_P2",
Severity.LOW: "LOAD_FLAGGED|LOG_ONLY",
}
actions.append(RemediationAction(
res_id=f["res_id"], rule_type=f["rule_type"],
severity=sev, action=action_map[sev]
))
return actions
About This Demo
A Great Expectations-style validation suite running against real Omni Hotels reservation patterns.
🏨 Omni Hotels Data Quality Framework
Eight validation rules across five CVUTC dimensions, animated rule-by-rule execution, SVG quality score ring, and per-column failure attribution. Mirrors production Opera PMS → Snowflake pipeline patterns.
Stack: JavaScript · Great Expectations patterns · Django 5.1 · Hospitality Data Engineering