Data Contracts: Keeping Systems Honest—Validation, Schema Design, and Integration Trust
By: Endless Commerce
Operator Summary: Data contracts define what systems expect from each other—required fields, valid formats, business rules—and enforce those expectations at integration boundaries. Brands implementing formal data contracts reduce integration errors by 70–85% and cut debugging time by 60% according to integration platform research. The formula: Schema definition + validation rules + error handling + monitoring = systems that fail fast with useful errors instead of silently corrupting data across your stack.
Why Commerce Integrations Fail (It’s Always Data Quality)
“90% of integration failures trace back to unexpected data,” notes API design expert Mike Amundsen. His research across 500+ integration projects shows that data format mismatches, missing required fields, and invalid values cause more downtime than auth failures, network issues, and rate limits combined.
The Silent Data Corruption Problem
Scenario: Order flows from Shopify → operations platform → 3PL:
- Shopify sends order with phone number "555-1234" (missing area code)
- Operations platform accepts the order and stores it in the database
- 3PL API requires phone in format "+1-555-555-1234"
- Integration fails silently; order stuck in queue
- Customer never receives shipping notification
- 48 hours later, customer service discovers the issue
Cost: Lost customer, manual investigation (30 min), expedited shipping ($25), brand damage.
Root cause: No validation at Shopify → operations platform boundary. System accepted invalid data, propagated it downstream, and failed late.
With data contract: Operations platform validates phone format on receipt, rejects Shopify order immediately with error: "Phone number must include area code". Merchant fixes in Shopify; order processes successfully.
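What that boundary check looks like in practice (a minimal Node.js sketch; the pattern, field name, and error shape are illustrative, not any platform's actual spec):

const PHONE_PATTERN = /^\+1-\d{3}-\d{3}-\d{4}$/; // format the downstream 3PL requires

function validateInboundOrder(order) {
  const errors = [];
  // Reject phone numbers we can't forward, instead of storing them and failing later
  if (order.phone && !PHONE_PATTERN.test(order.phone)) {
    errors.push({
      field: 'phone',
      message: `Phone number must match +1-XXX-XXX-XXXX; received: ${order.phone}`
    });
  }
  return errors;
}

const errors = validateInboundOrder({ phone: '555-1234' });
if (errors.length > 0) {
  console.error('Order rejected at boundary:', errors); // fail fast, fail loud
}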
The Cascading Failure Problem
Scenario: Product catalog sync from operations platform → Shopify → Amazon:
- Operations platform changes SKU format from "ABC-123" to "ABC123" (no hyphen)
- Shopify integration expects hyphens; treats it as a new product instead of an update
- Inventory sync breaks (old SKU vs. new SKU mismatch)
- Amazon receives variant with invalid parent SKU relationship
- Listing suppressed; sales halt
Cost: 4 hours of downtime during peak season = $8K lost revenue; 6 hours of engineering time debugging.
Root cause: No schema contract defining SKU format rules. System change in one place broke assumptions everywhere else.
With data contract: SKU format defined in schema (pattern: "^[A-Z]{3}-[0-9]{3,5}$", matching the product schema below). Any attempt to save a SKU that doesn’t match the pattern is rejected at the database layer before it propagates.
Anatomy of a Data Contract
Component 1: Schema Definition
What it includes:
- Field names and types: `sku` is string, `quantity` is integer, `price` is decimal(10,2)
- Required vs. optional: `email` required, `phone` optional
- Format constraints: `email` must match an email regex; `postal_code` format varies by country
- Enumerated values: `order_status` must be one of: `pending`, `processing`, `shipped`, `delivered`, `cancelled`
- Relationships: `line_item.product_id` must reference a valid `product.id`
Example (JSON Schema for product):
{
  "type": "object",
  "required": ["sku", "title", "price", "inventory_quantity"],
  "properties": {
    "sku": {
      "type": "string",
      "pattern": "^[A-Z]{3}-[0-9]{3,5}$",
      "description": "Format: XXX-### (3 letters, dash, 3-5 digits)"
    },
    "title": {
      "type": "string",
      "minLength": 1,
      "maxLength": 200
    },
    "price": {
      "type": "number",
      "minimum": 0.01,
      "multipleOf": 0.01
    },
    "inventory_quantity": {
      "type": "integer",
      "minimum": 0
    },
    "status": {
      "enum": ["active", "draft", "archived"]
    }
  }
}
Component 2: Validation Rules (Business Logic)
Beyond schema structure, business rules enforce domain logic:
Examples:
- `price >= cost` (prevent accidentally selling below cost)
- `inventory_quantity >= reserved_quantity` (can’t reserve more than available)
- `ship_date >= order_date` (can’t ship before the order is placed)
- `discount_amount <= subtotal` (discount can’t exceed order value)
- `variant.product_id` must reference an active product (no orphan variants)
Implementation: Validation layer in application code or database constraints.
Component 3: Error Handling Contract
When validation fails, what happens?
Bad approach (silent failure):
- System accepts invalid data
- Logs error to file no one reads
- Processes with corrupted data or skips transaction silently
Good approach (fail fast, fail loud):
- Reject invalid data immediately (HTTP 400 error for API)
- Return a specific error: "Field 'price' must be at least 0.01; received: 0.00"
- Log error with context: request ID, timestamp, source system, data payload
- Alert monitoring system if error rate exceeds threshold
Error response contract example:
{
  "error": {
    "code": "VALIDATION_FAILED",
    "message": "Product validation failed",
    "details": [
      {
        "field": "sku",
        "message": "SKU format invalid. Expected: XXX-###, received: ABC123"
      },
      {
        "field": "price",
        "message": "Price must be at least 0.01, received: 0.00"
      }
    ],
    "request_id": "req_abc123xyz",
    "timestamp": "2025-10-19T14:32:00Z"
  }
}
Component 4: Versioning and Evolution
Data contracts change over time. How do you update without breaking existing integrations?
Versioning strategies:
Option A: URL versioning
- `/v1/products` (current production)
- `/v2/products` (new version with breaking changes)
- Clients migrate at their own pace; both versions supported
Option B: Header versioning
- `Accept: application/vnd.myapp.v1+json` (same URL, version in the header)
Option C: Field-level versioning
- Additive changes only (never remove or rename fields)
- New fields optional; deprecated fields marked but still supported
- Client code ignores unknown fields (forward compatibility)
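A sketch of that tolerant-reader behavior in client code (the field list is illustrative):

// Forward compatibility: copy only the fields this client understands;
// anything the server adds in a later version is ignored, not fatal.
const KNOWN_FIELDS = ['sku', 'title', 'price', 'inventory_quantity', 'status'];

function parseProduct(payload) {
  const product = {};
  for (const field of KNOWN_FIELDS) {
    if (field in payload) product[field] = payload[field];
  }
  return product;
}

// A v1 client reading a response that gained a new optional field:
parseProduct({ sku: 'ABC-123', title: 'Mug', price: 14.99, weight_grams: 320 });
// → { sku: 'ABC-123', title: 'Mug', price: 14.99 } (weight_grams ignored)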
Recommendation: URL versioning for major breaks; additive-only changes for minor versions.
Deprecation contract:
- Announce 90 days before breaking change
- Support both old and new versions for 6–12 months
- Monitor usage of old version; notify remaining users before sunset
Implementing Data Contracts Across Commerce Stack
Level 1: Database Constraints (Foundation)
Enforce at lowest level—database schema:
CREATE TABLE products (
  id SERIAL PRIMARY KEY,
  sku VARCHAR(20) NOT NULL UNIQUE,
  title VARCHAR(200) NOT NULL,
  price DECIMAL(10,2) CHECK (price >= 0.01),
  cost DECIMAL(10,2) CHECK (cost >= 0),
  inventory_quantity INTEGER CHECK (inventory_quantity >= 0),
  status VARCHAR(20) CHECK (status IN ('active', 'draft', 'archived')),
  created_at TIMESTAMP DEFAULT NOW(),
  CONSTRAINT price_above_cost CHECK (price >= cost)
);
Benefits:
- Last line of defense (can’t write invalid data even if application code fails)
- Protects against manual SQL updates, direct database access
- Performance: the database enforces constraints faster than application-layer checks
Limitations:
- Hard to change (schema migrations required)
- Error messages less friendly (SQL errors not user-facing)
- Can’t enforce complex business rules (multi-table logic)
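The unfriendly-error limitation is usually handled by catching constraint violations in the application and translating them into contract-style responses. A sketch assuming PostgreSQL via the node-postgres client (the SQLSTATE codes are standard PostgreSQL):

async function insertProduct(db, product) {
  try {
    await db.query(
      `INSERT INTO products (sku, title, price, cost, inventory_quantity, status)
       VALUES ($1, $2, $3, $4, $5, $6)`,
      [product.sku, product.title, product.price, product.cost,
       product.inventory_quantity, product.status]
    );
    return { status: 201 };
  } catch (err) {
    if (err.code === '23505') { // unique_violation: duplicate SKU
      return { status: 409, error: { code: 'SKU_CONFLICT', message: `SKU already exists: ${product.sku}` } };
    }
    if (err.code === '23514') { // check_violation: e.g., price below cost
      return { status: 422, error: { code: 'BUSINESS_RULE_VIOLATION', message: `Constraint failed: ${err.constraint}` } };
    }
    throw err; // anything else is a real bug, not a contract violation
  }
}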
Level 2: Application Validation (Business Logic)
Enforce in application before database:
Example (Node.js with Joi validation library):
const Joi = require('joi');

const productSchema = Joi.object({
  sku: Joi.string().pattern(/^[A-Z]{3}-[0-9]{3,5}$/).required(),
  title: Joi.string().min(1).max(200).required(),
  price: Joi.number().min(0.01).precision(2).required(),
  cost: Joi.number().min(0).precision(2).required(),
  inventory_quantity: Joi.number().integer().min(0).required(),
  status: Joi.string().valid('active', 'draft', 'archived').required()
}).custom((value, helpers) => {
  // Business rule: price must be >= cost
  if (value.price < value.cost) {
    return helpers.error('price.below.cost', { price: value.price, cost: value.cost });
  }
  return value;
}).messages({
  // Custom error codes must be registered, or Joi throws at validation time
  'price.below.cost': 'price must be greater than or equal to cost'
});

// Validation on product create/update; collect all errors, not just the first
const { error, value } = productSchema.validate(productData, { abortEarly: false });
if (error) {
  return res.status(400).json({ error: error.details });
}
Benefits:
- Friendly error messages for users/API clients
- Complex validation logic (cross-field, external lookups)
- Fast iteration (no database migrations)
Limitations:
- Can be bypassed if multiple entry points (API, admin, scripts)
- Requires discipline to validate consistently
Level 3: API Gateway Validation (Integration Boundary)
Validate at system boundaries (API requests/responses):
Example (OpenAPI 3.0 specification):
paths:
  /products:
    post:
      summary: Create product
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/Product'
      responses:
        '201':
          description: Product created successfully
        '400':
          description: Validation error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Error'

components:
  schemas:
    Product:
      type: object
      required: [sku, title, price, cost, inventory_quantity]
      properties:
        sku:
          type: string
          pattern: '^[A-Z]{3}-[0-9]{3,5}$'
        title:
          type: string
          minLength: 1
          maxLength: 200
        price:
          type: number
          format: decimal
          minimum: 0.01
        cost:
          type: number
          format: decimal
          minimum: 0
        inventory_quantity:
          type: integer
          minimum: 0
        status:
          type: string
          enum: [active, draft, archived]
Benefits:
- API documentation = validation contract (OpenAPI autogenerates docs)
- Client libraries auto-validate before sending (code generation from schema)
- Gateway validates before hitting application (security, performance)
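Because the schema object above is (near-)standard JSON Schema, the same definition can drive runtime validation. A sketch using the Ajv library with a trimmed version of the product schema from earlier:

const Ajv = require('ajv');
const ajv = new Ajv({ allErrors: true }); // report every violation, not just the first

const productSchema = {
  type: 'object',
  required: ['sku', 'title', 'price', 'inventory_quantity'],
  properties: {
    sku: { type: 'string', pattern: '^[A-Z]{3}-[0-9]{3,5}$' },
    title: { type: 'string', minLength: 1, maxLength: 200 },
    price: { type: 'number', minimum: 0.01 },
    inventory_quantity: { type: 'integer', minimum: 0 },
    status: { enum: ['active', 'draft', 'archived'] }
  }
};

const validate = ajv.compile(productSchema);
if (!validate({ sku: 'ABC123', title: 'Mug', price: 0 })) {
  // pattern mismatch on sku, price below minimum, missing inventory_quantity
  console.log(validate.errors);
}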
Level 4: Integration Middleware Validation
For platform-to-platform integrations (Shopify → operations platform → QuickBooks):
Example data flow with validation:
1. Shopify webhook sends order → middleware
2. Middleware validates against Shopify order schema (required fields, format)
3. Middleware transforms to internal order format
4. Middleware validates against internal schema
5. Middleware sends to operations platform
6. Operations platform validates again (defense in depth)
7. Operations platform sends to QuickBooks (invoice format)
8. Middleware validates QuickBooks invoice schema before transmission
Validation points = 4 (middleware in at step 2, middleware out at step 4, operations platform in at step 6, QuickBooks out at step 8)
Why multiple validations?
- Defense in depth (one layer failing doesn’t corrupt entire system)
- Early failure (catch errors at boundary, not deep in processing)
- Clear ownership (each system validates its own inputs, doesn’t trust upstream)
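A condensed sketch of steps 1–4 of that flow (schemas and field names are illustrative, reusing Joi from the Level 2 example):

const Joi = require('joi');

// Contract for what we accept from Shopify (inbound boundary)
const shopifyOrderSchema = Joi.object({
  id: Joi.number().required(),
  email: Joi.string().email().required(),
  line_items: Joi.array().min(1).required()
}).unknown(true); // tolerate extra Shopify fields we don't use

// Contract for our internal order format (outbound boundary)
const internalOrderSchema = Joi.object({
  order_id: Joi.string().required(),
  customer_email: Joi.string().email().required(),
  line_items: Joi.array().min(1).required()
});

function handleShopifyWebhook(payload) {
  // Validate inbound: don't trust the upstream system
  const inbound = shopifyOrderSchema.validate(payload);
  if (inbound.error) throw new Error(`Inbound contract violation: ${inbound.error.message}`);

  // Transform to the internal format
  const internal = {
    order_id: `shopify-${payload.id}`,
    customer_email: payload.email,
    line_items: payload.line_items
  };

  // Validate outbound: don't poison the downstream system
  const outbound = internalOrderSchema.validate(internal);
  if (outbound.error) throw new Error(`Outbound contract violation: ${outbound.error.message}`);

  return internal; // safe to send to the operations platform
}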
Real-World Data Contract Examples
Contract 1: Product Catalog Sync
Scenario: Operations platform → Shopify product sync
Contract:
product_sync:
  required_fields:
    - sku (unique identifier, cannot change)
    - title (1-200 chars)
    - price (decimal, >= 0.01)
    - inventory_quantity (integer, >= 0)
    - status (active|draft|archived)
  optional_fields:
    - description (HTML allowed, max 10KB)
    - images (array of URLs, max 10 images)
    - vendor (string, max 100 chars)
    - tags (array of strings, max 50 tags)
  business_rules:
    - If status=active, must have price > 0 and inventory_quantity > 0
    - If inventory_quantity = 0, auto-set status to archived (prevent selling out-of-stock)
    - SKU cannot change once created (immutable identifier)
    - Price changes log an audit trail (compliance, pricing history)
  error_handling:
    - Return HTTP 400 with field-level errors for validation failures
    - Return HTTP 409 for SKU conflicts (duplicate SKU)
    - Return HTTP 422 for business rule violations
    - Log all errors with source_system, timestamp, payload
Enforcement:
- Operations platform validates before sending to Shopify API
- Shopify API rejects payloads that don’t match Shopify’s schema
- Errors logged in integration dashboard with retry logic
Contract 2: Order Fulfillment Flow
Scenario: E-commerce → operations platform → 3PL
Contract:
fulfillment_order:
  required_fields:
    - order_id (unique; source system + order number)
    - customer_email (valid email format)
    - shipping_address:
        - name (1-100 chars)
        - address1 (1-200 chars)
        - city (1-100 chars)
        - province/state (2-char code or full name)
        - postal_code (format by country)
        - country (ISO 3166-1 alpha-2 code)
    - line_items (array, min 1 item):
        - sku (must exist in inventory)
        - quantity (integer, > 0)
        - price (decimal, >= 0)
  optional_fields:
    - shipping_address.address2
    - shipping_address.company
    - customer_phone (E.164 format if provided)
    - special_instructions (max 500 chars)
  business_rules:
    - All line_item.sku must have available inventory (quantity <= on_hand - reserved)
    - shipping_address must pass address validation (USPS, Canada Post, etc.)
    - If country != domestic, must have customs declaration data
    - Total order value must equal sum(line_items.price × line_items.quantity) + shipping + tax
  validation_stages:
    1. Schema validation (format, required fields)
    2. Inventory allocation check (availability)
    3. Address validation (deliverability)
    4. Fraud check (if first-time customer or high value)
    5. Routing determination (which 3PL based on location, inventory)
  error_handling:
    - Validation failure → reject order, return to source with errors
    - Inventory insufficient → hold order in queue, alert for restock
    - Address invalid → flag for customer service review (manual fix)
    - Fraud flag → hold for manual approval
Enforcement:
- Multi-stage validation pipeline
- Orders only move to next stage after passing previous validations
- Exception queue for manual review (not silently dropped)
Contract 3: Inventory Sync (Multi-Location, Multi-Channel)
Scenario: 3PL → operations platform → sales channels (Shopify, Amazon, Faire)
Contract:
inventory_update:
  required_fields:
    - sku (matches product master)
    - location_id (warehouse/3PL identifier)
    - quantity_on_hand (integer, >= 0)
    - quantity_reserved (integer, >= 0)
    - timestamp (ISO 8601; no older than 5 minutes)
  calculated_fields:
    - quantity_available = quantity_on_hand - quantity_reserved
  business_rules:
    - quantity_reserved <= quantity_on_hand (can’t reserve more than exists)
    - timestamp must be <= current_time (no future-dated updates)
    - Location must be active and mapped to sales channel(s)
  sync_logic:
    - If quantity_available decreases, push to channels immediately (prevent overselling)
    - If quantity_available increases, batch updates (5-min intervals OK)
    - If quantity_available = 0, set all channels to out-of-stock
    - Maintain audit log: all quantity changes with source, timestamp, reason
  error_handling:
    - Invalid SKU → log error, skip update (don’t create phantom SKUs)
    - Negative available qty → alert immediately (data corruption)
    - Stale timestamp (>10 min old) → reject, request fresh snapshot
Enforcement:
- Validation on receipt from 3PL
- Channel-specific inventory allocation rules (safety stock per channel)
- Real-time alerts for anomalies (sudden quantity drop >50 units without order)
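The receipt-time validation itself is small; a sketch (field names follow the contract above, thresholds are the ones stated):

const MAX_AGE_MS = 10 * 60 * 1000; // reject snapshots older than 10 minutes

function validateInventoryUpdate(update, knownSkus, now = Date.now()) {
  const errors = [];
  if (!knownSkus.has(update.sku)) {
    errors.push('unknown SKU: skip update rather than create a phantom SKU');
  }
  if (update.quantity_reserved > update.quantity_on_hand) {
    errors.push('quantity_reserved exceeds quantity_on_hand: possible data corruption');
  }
  const ts = Date.parse(update.timestamp);
  if (ts > now) errors.push('future-dated timestamp rejected');
  if (now - ts > MAX_AGE_MS) errors.push('stale timestamp: request a fresh snapshot');
  return {
    errors,
    quantity_available: update.quantity_on_hand - update.quantity_reserved
  };
}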
Monitoring Data Contract Compliance
Metrics to Track
1. Validation Failure Rate
- Target: <2% of requests fail validation
- Alert: If >5% fail in 1-hour window (indicates upstream system issue)
- Dashboard: Failure rate by endpoint, error type, source system
2. Error Response Time
- Target: Validation errors returned in <200ms (fail fast)
- Alert: If validation latency >500ms (performance degradation)
3. Contract Drift Detection
- Monitor: Requests with unknown fields (forward compatibility test)
- Alert: If >10% of requests have fields not in schema (schema outdated)
4. Data Quality Scoring
- Score: % of optional fields populated (higher = better data quality)
- Example: Orders with phone number: 85%, orders with shipping notes: 12%
- Insight: Identify fields not being used (remove from schema?) or fields missing data (improve UI prompts)
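A toy implementation of the first metric (in production this lives in your metrics store; the thresholds match the targets above):

const WINDOW_MS = 60 * 60 * 1000; // 1-hour rolling window
const events = [];

function recordValidation(failed, ts = Date.now()) {
  events.push({ ts, failed });
  // Drop events that have aged out of the window
  while (events.length && events[0].ts < ts - WINDOW_MS) events.shift();
  const rate = events.filter(e => e.failed).length / events.length;
  if (rate > 0.05) {
    console.warn(`Validation failure rate ${(rate * 100).toFixed(1)}% in the last hour: likely upstream issue`);
  }
  return rate;
}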
Alerting on Contract Violations
Critical alerts (PagerDuty/Slack):
- Validation failure rate >10% (system integration broken)
- Downstream system rejecting >5% of requests (contract mismatch)
- Data corruption detected (negative inventory, impossible dates)
Warning alerts (email/dashboard):
- Validation failure rate >5% (investigate)
- New error types appearing (schema change upstream)
- Deprecated field usage increasing (clients not migrating)
Info alerts (dashboard only):
- Optional field usage trends
- Request volume by endpoint
- Average payload size (performance monitoring)
How CommerceOS Enforces Data Contracts
CommerceOS uses defense-in-depth validation:
- API gateway validation: OpenAPI spec enforced at entry (Shopify, Amazon webhooks)
- Application-layer business rules: SKU format, price > cost, inventory >= reserved
- Database constraints: Final safety net for data integrity
- Integration middleware validation: Outbound to QuickBooks, 3PLs validated before transmission
- Exception dashboard: All validation failures surfaced for operator review
- Automated retry with backoff: Transient errors (network) retried; permanent errors (bad data) flagged
Result: 95%+ of integration errors caught and resolved without manual intervention; remaining 5% routed to exception queue with context for rapid resolution.
Frequently Asked Questions
How do I create a data contract for an existing integration that has none?
Step 1: Document current behavior (reverse-engineer schema from existing data). Step 2: Define ideal schema (required fields, formats, business rules). Step 3: Measure gap (what % of current data violates ideal schema?). Step 4: Implement validation in shadow mode (log errors, don’t reject). Step 5: Fix top error sources (coordinate with upstream system owners). Step 6: Enforce validation (reject invalid data, return errors). Timeline: 4–8 weeks for critical integration.
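Shadow mode (Step 4) is the step teams most often skip; a sketch of the toggle, assuming an Express-style handler, the Joi schema from earlier, and a hypothetical VALIDATION_SHADOW environment flag:

const SHADOW_MODE = process.env.VALIDATION_SHADOW === 'true';

function checkContract(schema, payload, res) {
  const { error } = schema.validate(payload, { abortEarly: false });
  if (!error) return true;
  if (SHADOW_MODE) {
    // Measure without breaking anything: log the violation, let the request through
    console.warn('Shadow-mode contract violation:', error.details.map(d => d.message));
    return true;
  }
  // Enforcement mode: fail fast with field-level errors
  res.status(400).json({ error: { code: 'VALIDATION_FAILED', details: error.details } });
  return false;
}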
What happens when an upstream system sends data that violates our contract?
Option A (reject): Return HTTP 400 error with field-level details; upstream system responsible for fixing. Option B (accept with warnings): Log validation failure, process with best-effort data cleanup, alert for manual review. Choice depends on: Can upstream fix quickly? Is data critical (order) or optional (analytics event)? Is this our system or third-party beyond our control? Recommendation: Reject for critical flows (orders, payments); accept-with-warnings for non-critical (marketing events).
How strict should validation be—should I reject minor issues?
Strict validation (recommended for production): Reject anything not matching contract exactly. Lenient validation (dev/testing only): Accept data, coerce to expected format, log warnings. Why strict: Minor issues compound (phone without area code → can’t call customer → late delivery → chargeback). Exception: If data is low-criticality AND fixing upstream is prohibitively expensive, accept-with-transform (e.g., auto-format phone numbers).
How do I version data contracts when the schema needs to change?
Non-breaking changes (safe): Add optional fields, add new enum values, relax validation (reduce min length). Breaking changes (requires version bump): Remove fields, rename fields, make optional field required, tighten validation (increase min length). Strategy: Support multiple versions simultaneously (v1 and v2 URLs); deprecate old version after 6–12 months; monitor usage of old version, notify remaining clients before sunset. Never break existing integrations without notice.
Should I validate outbound data (data we send) or just inbound?
Validate both. Inbound validation protects your system from bad data. Outbound validation protects downstream systems from your bad data (and prevents blame when their system breaks). Example: Before sending an invoice to QuickBooks, validate it against the QuickBooks schema. If validation fails, fix your data generation logic rather than letting a QuickBooks rejection be how you learn your code is broken.
What tools help with schema definition and validation?
Schema definition: JSON Schema, OpenAPI/Swagger (REST APIs), GraphQL schema (GraphQL APIs), Protobuf (gRPC). Validation libraries: Joi (Node.js), Pydantic (Python), FluentValidation (C#), Yup (JavaScript). API gateways with validation: Kong, AWS API Gateway, Azure API Management. Documentation generation: OpenAPI → Swagger UI, Redoc; GraphQL → GraphiQL, Playground. Recommendation: OpenAPI 3.0 for REST APIs (industry standard, great tooling).
How do I handle data contracts across different teams or companies?
Internal teams: Central schema repository (Git repo); automated tests validate contract compliance; breaking changes require stakeholder approval (architecture review). External partners: Publish API documentation (OpenAPI); provide sandbox environment for testing; require integration certification before production access; versioning strategy with deprecation policy. Legal contracts: For high-value B2B integrations, SLA for schema stability (no breaking changes without 90-day notice).
What’s the performance cost of extensive validation?
Typical overhead: 5–20ms per request for schema validation (JSON Schema, Joi). Optimization: Cache compiled schemas (don’t recompile every request), validate only at boundaries (not every internal function call), use fast validation libraries (avoid regex-heavy rules if possible). When to skip validation: Internal service-to-service calls IF both services owned by same team and schemas strictly controlled. Never skip: External API requests, user input, third-party integrations.
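The schema-caching point in code (a sketch with Ajv; compilation is the expensive part, so do it once at startup):

const Ajv = require('ajv');
const ajv = new Ajv();
const validators = new Map(); // compiled validators, keyed by schema name

function getValidator(name, schema) {
  if (!validators.has(name)) {
    validators.set(name, ajv.compile(schema)); // compile once at first use...
  }
  return validators.get(name); // ...then reuse on every request
}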
Impact & Difficulty
Conservative Impact: 40% reduction in integration errors; 30% faster debugging when errors occur
Likely Impact: 70% reduction in integration errors; 60% faster debugging; 90% of errors caught at the boundary instead of deep in processing
Upside Impact: 85% error reduction; near-zero silent data corruption; integration failures resolved in minutes instead of hours; team trust in data quality
Difficulty Rating: 3/5 (Requires discipline and cross-team coordination more than technical complexity)