Data Contracts: Keeping Systems Honest—Validation, Schema Design, and Integration Trust

Operator Summary: Data contracts define what systems expect from each other—required fields, valid formats, business rules—and enforce those expectations at integration boundaries. Brands implementing formal data contracts reduce integration errors by 70–85% and cut debugging time by 60% according to integration platform research. The formula: Schema definition + validation rules + error handling + monitoring = systems that fail fast with useful errors instead of silently corrupting data across your stack.

Why Commerce Integrations Fail (It’s Always Data Quality)

“90% of integration failures trace back to unexpected data,” notes API design expert Mike Amundsen. His research across 500+ integration projects shows that data format mismatches, missing required fields, and invalid values cause more downtime than auth failures, network issues, and rate limits combined.

The Silent Data Corruption Problem

Scenario: Order flows from Shopify → operations platform → 3PL:

  1. Shopify sends order with phone number: "555-1234" (missing area code)
  2. Operations platform accepts, stores in database
  3. 3PL API requires phone in format: "+1-555-555-1234"
  4. Integration fails silently; order stuck in queue
  5. Customer never receives shipping notification
  6. 48 hours later, customer service discovers issue

Cost: Lost customer, manual investigation (30 min), expedited shipping ($25), brand damage.

Root cause: No validation at Shopify → operations platform boundary. System accepted invalid data, propagated it downstream, and failed late.

With data contract: Operations platform validates phone format on receipt, rejects Shopify order immediately with error: "Phone number must include area code". Merchant fixes in Shopify; order processes successfully.
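Under a contract like this, the boundary check itself is small. A minimal sketch, assuming the 3PL's "+1-XXX-XXX-XXXX" format from the scenario (the exact pattern is illustrative):

```javascript
// Reject malformed phone numbers at the integration boundary instead of
// letting them fail silently downstream at the 3PL.
const PHONE_PATTERN = /^\+1-\d{3}-\d{3}-\d{4}$/;

function validatePhone(phone) {
  if (!PHONE_PATTERN.test(phone)) {
    return {
      valid: false,
      error: 'Phone number must include area code (+1-XXX-XXX-XXXX)'
    };
  }
  return { valid: true };
}
```

The order is rejected at receipt with a specific, fixable error instead of sitting in a queue for 48 hours.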


The Cascading Failure Problem

Scenario: Product catalog sync from operations platform → Shopify → Amazon:

  1. Operations platform changes SKU format from "ABC-123" to "ABC123" (no hyphen)
  2. Shopify integration expects hyphens; treats as new product instead of update
  3. Inventory sync breaks (old SKU vs. new SKU mismatch)
  4. Amazon receives variant with invalid parent SKU relationship
  5. Listing suppressed; sales halt

Cost: 4 hours of downtime during peak season = $8K lost revenue; 6 hours of engineering time debugging.

Root cause: No schema contract defining SKU format rules. System change in one place broke assumptions everywhere else.

With data contract: SKU format defined in schema (pattern: "^[A-Z]{3}-[0-9]{3,5}$"). Any attempt to save a SKU that doesn’t match the pattern is rejected at the database layer before propagation.

Anatomy of a Data Contract

Component 1: Schema Definition

What it includes:

  • Field names and types: sku is string, quantity is integer, price is decimal(10,2)
  • Required vs. optional: email required, phone optional
  • Format constraints: email must match email regex; postal_code format by country
  • Enumerated values: order_status must be one of: pending, processing, shipped, delivered, cancelled
  • Relationships: line_item.product_id must reference valid product.id

Example (JSON Schema for product):

{
  "type": "object",
  "required": ["sku", "title", "price", "inventory_quantity"],
  "properties": {
    "sku": {
      "type": "string",
      "pattern": "^[A-Z]{3}-[0-9]{3,5}$",
      "description": "Format: XXX-### (3 letters, dash, 3-5 digits)"
    },
    "title": {
      "type": "string",
      "minLength": 1,
      "maxLength": 200
    },
    "price": {
      "type": "number",
      "minimum": 0.01,
      "multipleOf": 0.01
    },
    "inventory_quantity": {
      "type": "integer",
      "minimum": 0
    },
    "status": {
      "enum": ["active", "draft", "archived"]
    }
  }
}
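A hand-rolled sketch of a few of the checks this JSON Schema declares; a production system would run the schema through a validator library (e.g. Ajv) rather than coding checks like these by hand:

```javascript
// Illustrative subset of the product schema above: required fields,
// SKU pattern, price minimum, and integer inventory quantity.
const SKU_PATTERN = /^[A-Z]{3}-[0-9]{3,5}$/;
const REQUIRED = ['sku', 'title', 'price', 'inventory_quantity'];

function validateProduct(p) {
  const errors = [];
  for (const field of REQUIRED) {
    if (p[field] === undefined) errors.push({ field, message: 'required field missing' });
  }
  if (p.sku !== undefined && !SKU_PATTERN.test(p.sku)) {
    errors.push({ field: 'sku', message: 'Format: XXX-### (3 letters, dash, 3-5 digits)' });
  }
  if (p.price !== undefined && !(typeof p.price === 'number' && p.price >= 0.01)) {
    errors.push({ field: 'price', message: 'must be a number >= 0.01' });
  }
  if (p.inventory_quantity !== undefined &&
      !(Number.isInteger(p.inventory_quantity) && p.inventory_quantity >= 0)) {
    errors.push({ field: 'inventory_quantity', message: 'must be an integer >= 0' });
  }
  return errors; // empty array = valid
}
```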

Component 2: Validation Rules (Business Logic)

Beyond schema structure, business rules enforce domain logic:

Examples:

  • price >= cost (prevent selling below cost accidentally)
  • inventory_quantity >= reserved_quantity (can’t reserve more than available)
  • ship_date >= order_date (can’t ship before order placed)
  • discount_amount <= subtotal (discount can’t exceed order value)
  • variant.product_id must reference active product (no orphan variants)

Implementation: Validation layer in application code or database constraints.
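A few of the rules above as a cross-field check; the field names mirror the bullets and are assumptions about your data model:

```javascript
// Cross-field business rules: these can't be expressed as per-field
// schema constraints because each compares two fields.
function checkBusinessRules(rec) {
  const violations = [];
  if (rec.price < rec.cost) {
    violations.push('price must be >= cost');
  }
  if (rec.reserved_quantity > rec.inventory_quantity) {
    violations.push("can't reserve more than available");
  }
  if (rec.discount_amount > rec.subtotal) {
    violations.push("discount can't exceed order value");
  }
  return violations; // empty array = all rules pass
}
```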


Component 3: Error Handling Contract

When validation fails, what happens?

Bad approach (silent failure):

  • System accepts invalid data
  • Logs error to file no one reads
  • Processes with corrupted data or skips transaction silently

Good approach (fail fast, fail loud):

  • Reject invalid data immediately (HTTP 400 error for API)
  • Return specific error: "Field 'price' must be at least 0.01; received: 0.00"
  • Log error with context: request ID, timestamp, source system, data payload
  • Alert monitoring system if error rate exceeds threshold

Error response contract example:

{
  "error": {
    "code": "VALIDATION_FAILED",
    "message": "Product validation failed",
    "details": [
      {
        "field": "sku",
        "message": "SKU format invalid. Expected: XXX-###, received: ABC123"
      },
      {
        "field": "price",
        "message": "Price must be at least 0.01, received: 0.00"
      }
    ],
    "request_id": "req_abc123xyz",
    "timestamp": "2025-10-19T14:32:00Z"
  }
}
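Building that response from a consistent helper keeps every failure identical for clients. A sketch (request-ID generation is out of scope here, so the caller supplies one):

```javascript
// Assemble an error payload matching the contract above so all
// validation failures share one shape.
function buildValidationError(details, requestId) {
  return {
    error: {
      code: 'VALIDATION_FAILED',
      message: 'Product validation failed',
      details, // array of { field, message }
      request_id: requestId,
      timestamp: new Date().toISOString()
    }
  };
}
```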

Component 4: Versioning and Evolution

Data contracts change over time. How do you update without breaking existing integrations?

Versioning strategies:

Option A: URL versioning

  • /v1/products (current production)
  • /v2/products (new version with breaking changes)
  • Clients migrate at their own pace; both versions supported

Option B: Header versioning

  • Accept: application/vnd.myapp.v1+json
  • Same URL, version in header

Option C: Field-level versioning

  • Additive changes only (never remove or rename fields)
  • New fields optional; deprecated fields marked but still supported
  • Client code ignores unknown fields (forward compatibility)

Recommendation: URL versioning for major breaks; additive-only changes for minor versions.

Deprecation contract:

  • Announce 90 days before breaking change
  • Support both old and new versions for 6–12 months
  • Monitor usage of old version; notify remaining users before sunset
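Option B (header versioning) above can be sketched as a small parser; the `myapp` vendor name mirrors the example header and is illustrative:

```javascript
// Extract the API version from a vendor media type header like
// "application/vnd.myapp.v2+json"; default to v1 when unspecified.
function parseVersion(acceptHeader) {
  const m = /^application\/vnd\.myapp\.v(\d+)\+json$/.exec(acceptHeader || '');
  return m ? Number(m[1]) : 1;
}
```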

Implementing Data Contracts Across Commerce Stack

Level 1: Database Constraints (Foundation)

Enforce at lowest level—database schema:

CREATE TABLE products (
  id SERIAL PRIMARY KEY,
  sku VARCHAR(20) NOT NULL UNIQUE,
  title VARCHAR(200) NOT NULL,
  price DECIMAL(10,2) CHECK (price >= 0.01),
  cost DECIMAL(10,2) CHECK (cost >= 0),
  inventory_quantity INTEGER CHECK (inventory_quantity >= 0),
  status VARCHAR(20) CHECK (status IN ('active', 'draft', 'archived')),
  created_at TIMESTAMP DEFAULT NOW(),
  CONSTRAINT price_above_cost CHECK (price >= cost)
);

Benefits:

  • Last line of defense (can’t write invalid data even if application code fails)
  • Protects against manual SQL updates, direct database access
  • Performance: DB enforces faster than application layer

Limitations:

  • Hard to change (schema migrations required)
  • Error messages less friendly (SQL errors not user-facing)
  • Can’t enforce complex business rules (multi-table logic)

Level 2: Application Validation (Business Logic)

Enforce in application before database:

Example (Node.js with Joi validation library):

const Joi = require('joi');

const productSchema = Joi.object({
  sku: Joi.string().pattern(/^[A-Z]{3}-[0-9]{3,5}$/).required(),
  title: Joi.string().min(1).max(200).required(),
  price: Joi.number().min(0.01).precision(2).required(),
  cost: Joi.number().min(0).precision(2).required(),
  inventory_quantity: Joi.number().integer().min(0).required(),
  status: Joi.string().valid('active', 'draft', 'archived').required()
}).custom((value, helpers) => {
  // Business rule: price must be >= cost
  if (value.price < value.cost) {
    return helpers.message(`"price" (${value.price}) must be greater than or equal to "cost" (${value.cost})`);
  }
  return value;
});

// Validation on product create/update
const { error, value } = productSchema.validate(productData);
if (error) {
  return res.status(400).json({ error: error.details });
}

Benefits:

  • Friendly error messages for users/API clients
  • Complex validation logic (cross-field, external lookups)
  • Fast iteration (no database migrations)

Limitations:

  • Can be bypassed if multiple entry points (API, admin, scripts)
  • Requires discipline to validate consistently

Level 3: API Gateway Validation (Integration Boundary)

Validate at system boundaries (API requests/responses):

Example (OpenAPI 3.0 specification):

paths:
  /products:
    post:
      summary: Create product
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/Product'
      responses:
        '201':
          description: Product created successfully
        '400':
          description: Validation error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Error'

components:
  schemas:
    Product:
      type: object
      required: [sku, title, price, cost, inventory_quantity]
      properties:
        sku:
          type: string
          pattern: '^[A-Z]{3}-[0-9]{3,5}$'
        title:
          type: string
          minLength: 1
          maxLength: 200
        price:
          type: number
          format: decimal
          minimum: 0.01
        cost:
          type: number
          format: decimal
          minimum: 0
        inventory_quantity:
          type: integer
          minimum: 0
        status:
          type: string
          enum: [active, draft, archived]

Benefits:

  • API documentation = validation contract (OpenAPI autogenerates docs)
  • Client libraries auto-validate before sending (code generation from schema)
  • Gateway validates before hitting application (security, performance)

Level 4: Integration Middleware Validation

For platform-to-platform integrations (Shopify → operations platform → QuickBooks):

Example data flow with validation:

  1. Shopify webhook sends order → middleware
  2. Middleware validates against Shopify order schema (required fields, format)
  3. Middleware transforms to internal order format
  4. Middleware validates against internal schema
  5. Middleware sends to operations platform
  6. Operations platform validates again (defense in depth)
  7. Operations platform sends to QuickBooks (invoice format)
  8. Middleware validates QuickBooks invoice schema before transmission

Validation points = 4 (middleware inbound from Shopify, middleware internal format, operations platform inbound, QuickBooks-bound outbound)

Why multiple validations?

  • Defense in depth (one layer failing doesn’t corrupt entire system)
  • Early failure (catch errors at boundary, not deep in processing)
  • Clear ownership (each system validates its own inputs, doesn’t trust upstream)
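The stage-by-stage flow above can be sketched as a simple pipeline; the stage names and checks here are illustrative, not the middleware's actual API:

```javascript
// Run each boundary's validator in order; the payload stops at the
// first failing stage so errors surface at the boundary, not deep
// in processing.
function runPipeline(payload, stages) {
  for (const { name, validate } of stages) {
    const errors = validate(payload);
    if (errors.length > 0) {
      return { ok: false, failedAt: name, errors }; // fail fast
    }
  }
  return { ok: true };
}

// Example: two boundary checks, run in order
const stages = [
  { name: 'schema', validate: p => (p.sku ? [] : ['sku is required']) },
  { name: 'inventory', validate: p => (p.quantity > 0 ? [] : ['quantity must be > 0']) }
];
```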

Real-World Data Contract Examples

Contract 1: Product Catalog Sync

Scenario: Operations platform → Shopify product sync

Contract:

product_sync:
  required_fields:
    - sku (unique identifier, cannot change)
    - title (1-200 chars)
    - price (decimal, >= 0.01)
    - inventory_quantity (integer, >= 0)
    - status (active|draft|archived)
  
  optional_fields:
    - description (HTML allowed, max 10KB)
    - images (array of URLs, max 10 images)
    - vendor (string, max 100 chars)
    - tags (array of strings, max 50 tags)
  
  business_rules:
    - If status=active, must have price > 0 and inventory_quantity > 0
    - If inventory_quantity = 0, auto-set status to archived (prevent selling out-of-stock)
    - SKU cannot change once created (immutable identifier)
    - Price changes log audit trail (compliance, pricing history)
  
  error_handling:
    - Return HTTP 400 with field-level errors for validation failures
    - Return HTTP 409 for SKU conflicts (duplicate SKU)
    - Return HTTP 422 for business rule violations
    - Log all errors with source_system, timestamp, payload

Enforcement:

  • Operations platform validates before sending to Shopify API
  • Shopify API rejects anything that doesn’t match Shopify’s own schema
  • Errors logged in integration dashboard with retry logic
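Mapping the contract's error classes onto its HTTP statuses can be a single lookup; the violation `type` values here are assumptions about how your validator labels failures:

```javascript
// Translate contract violations into the statuses the error_handling
// section specifies: 400 schema, 409 SKU conflict, 422 business rule.
function httpStatusFor(violation) {
  switch (violation.type) {
    case 'schema':        return 400; // field-level validation failure
    case 'sku_conflict':  return 409; // duplicate SKU
    case 'business_rule': return 422; // e.g. active product with zero inventory
    default:              return 500; // unclassified: treat as server error
  }
}
```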

Contract 2: Order Fulfillment Flow

Scenario: E-commerce → operations platform → 3PL

Contract:

fulfillment_order:
  required_fields:
    - order_id (unique, source system + order number)
    - customer_email (valid email format)
    - shipping_address:
        - name (1-100 chars)
        - address1 (1-200 chars)
        - city (1-100 chars)
        - province/state (2-char code or full name)
        - postal_code (format by country)
        - country (ISO 3166-1 alpha-2 code)
    - line_items (array, min 1 item):
        - sku (must exist in inventory)
        - quantity (integer, > 0)
        - price (decimal, >= 0)
    
  optional_fields:
    - shipping_address.address2
    - shipping_address.company
    - customer_phone (E.164 format if provided)
    - special_instructions (max 500 chars)
  
  business_rules:
    - All line_item.sku must have available inventory (quantity <= on_hand - reserved)
    - shipping_address must pass address validation (USPS, Canada Post, etc.)
    - If country != domestic, must have customs declaration data
    - Total order value must equal sum(line_items.price × line_items.quantity) + shipping + tax
  
  validation_stages:
    1. Schema validation (format, required fields)
    2. Inventory allocation check (availability)
    3. Address validation (deliverability)
    4. Fraud check (if first-time customer or high value)
    5. Routing determination (which 3PL based on location, inventory)
  
  error_handling:
    - Validation failure → reject order, return to source with errors
    - Inventory insufficient → hold order in queue, alert for restock
    - Address invalid → flag for customer service review (manual fix)
    - Fraud flag → hold for manual approval

Enforcement:

  • Multi-stage validation pipeline
  • Orders only move to next stage after passing previous validations
  • Exception queue for manual review (not silently dropped)

Contract 3: Inventory Sync (Multi-Location, Multi-Channel)

Scenario: 3PL → operations platform → sales channels (Shopify, Amazon, Faire)

Contract:

inventory_update:
  required_fields:
    - sku (matches product master)
    - location_id (warehouse/3PL identifier)
    - quantity_on_hand (integer, >= 0)
    - quantity_reserved (integer, >= 0)
    - timestamp (ISO 8601, no more than 5 minutes old)
  
  calculated_fields:
    - quantity_available = quantity_on_hand - quantity_reserved
  
  business_rules:
    - quantity_reserved <= quantity_on_hand (can’t reserve more than exists)
    - timestamp must be <= current_time (no future-dated updates)
    - Location must be active and mapped to sales channel(s)
  
  sync_logic:
    - If quantity_available decreases, push to channels immediately (prevent overselling)
    - If quantity_available increases, batch updates (5-min intervals OK)
    - If quantity_available = 0, set all channels to out-of-stock
    - Maintain audit log: all quantity changes with source, timestamp, reason
  
  error_handling:
    - Invalid SKU → log error, skip update (don’t create phantom SKUs)
    - Negative available qty → alert immediately (data corruption)
    - Stale timestamp (>10 min old) → reject, request fresh snapshot

Enforcement:

  • Validation on receipt from 3PL
  • Channel-specific inventory allocation rules (safety stock per channel)
  • Real-time alerts for anomalies (sudden quantity drop >50 units without order)
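The inventory_update rules above in sketch form: derive quantity_available and reject impossible or stale updates. The 10-minute staleness cutoff mirrors the error_handling section; everything else is illustrative:

```javascript
// Validate a single inventory update against the contract above.
function validateInventoryUpdate(update, nowMs) {
  const ts = Date.parse(update.timestamp);
  if (update.quantity_reserved > update.quantity_on_hand) {
    return { ok: false, error: "can't reserve more than exists" };
  }
  if (ts > nowMs) {
    return { ok: false, error: 'no future-dated updates' };
  }
  if (nowMs - ts > 10 * 60 * 1000) {
    return { ok: false, error: 'stale timestamp; request fresh snapshot' };
  }
  return {
    ok: true,
    quantity_available: update.quantity_on_hand - update.quantity_reserved
  };
}
```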

Monitoring Data Contract Compliance

Metrics to Track

1. Validation Failure Rate

  • Target: <2% of requests fail validation
  • Alert: If >5% fail in 1-hour window (indicates upstream system issue)
  • Dashboard: Failure rate by endpoint, error type, source system

2. Error Response Time

  • Target: Validation errors returned in <200ms (fail fast)
  • Alert: If validation latency >500ms (performance degradation)

3. Contract Drift Detection

  • Monitor: Requests with unknown fields (forward compatibility test)
  • Alert: If >10% of requests have fields not in schema (schema outdated)

4. Data Quality Scoring

  • Score: % of optional fields populated (higher = better data quality)
  • Example: Orders with phone number: 85%, orders with shipping notes: 12%
  • Insight: Identify fields not being used (remove from schema?) or fields missing data (improve UI prompts)
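The scoring above reduces to a coverage percentage per optional field; a minimal sketch:

```javascript
// Percentage of records with a given optional field populated
// (null, undefined, and empty string count as missing).
function fieldCoverage(records, field) {
  if (records.length === 0) return 0;
  const populated = records.filter(r => r[field] != null && r[field] !== '').length;
  return Math.round((populated / records.length) * 100);
}
```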

Alerting on Contract Violations

Critical alerts (PagerDuty/Slack):

  • Validation failure rate >10% (system integration broken)
  • Downstream system rejecting >5% of requests (contract mismatch)
  • Data corruption detected (negative inventory, impossible dates)

Warning alerts (email/dashboard):

  • Validation failure rate >5% (investigate)
  • New error types appearing (schema change upstream)
  • Deprecated field usage increasing (clients not migrating)

Info alerts (dashboard only):

  • Optional field usage trends
  • Request volume by endpoint
  • Average payload size (performance monitoring)
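The tiers above amount to threshold checks on the failure rate; a sketch, using the critical (>10%) and warning (>5%) cutoffs from the lists:

```javascript
// Classify a validation failure rate (fraction between 0 and 1)
// into the alert buckets defined above.
function alertLevel(failureRate) {
  if (failureRate > 0.10) return 'critical'; // page someone: integration broken
  if (failureRate > 0.05) return 'warning';  // investigate
  return 'ok';
}
```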

How CommerceOS Enforces Data Contracts

CommerceOS uses defense-in-depth validation:

  1. API gateway validation: OpenAPI spec enforced at entry (Shopify, Amazon webhooks)
  2. Application-layer business rules: SKU format, price > cost, inventory >= reserved
  3. Database constraints: Final safety net for data integrity
  4. Integration middleware validation: Outbound to QuickBooks, 3PLs validated before transmission
  5. Exception dashboard: All validation failures surfaced for operator review
  6. Automated retry with backoff: Transient errors (network) retried; permanent errors (bad data) flagged

Result: 95%+ of integration errors caught and resolved without manual intervention; remaining 5% routed to exception queue with context for rapid resolution.

Frequently Asked Questions

How do I create a data contract for an existing integration that has none?

Step 1: Document current behavior (reverse-engineer schema from existing data). Step 2: Define ideal schema (required fields, formats, business rules). Step 3: Measure gap (what % of current data violates ideal schema?). Step 4: Implement validation in shadow mode (log errors, don’t reject). Step 5: Fix top error sources (coordinate with upstream system owners). Step 6: Enforce validation (reject invalid data, return errors). Timeline: 4–8 weeks for critical integration.
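Step 4 (shadow mode) can be a thin wrapper around the validator you intend to enforce later; the `validate`/`log` signatures here are assumptions:

```javascript
// Shadow mode: run the new validator and record failures without
// rejecting, so you can measure the gap before enforcement.
function shadowValidate(payload, validate, log) {
  const errors = validate(payload);
  if (errors.length > 0) {
    log({ errors, payload }); // observe, don't block
  }
  return payload; // always passes through in shadow mode
}
```

Flipping to enforcement later means swapping the pass-through for a rejection, with the logged data telling you what will break first.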

What happens when an upstream system sends data that violates our contract?

Option A (reject): Return HTTP 400 error with field-level details; upstream system responsible for fixing. Option B (accept with warnings): Log validation failure, process with best-effort data cleanup, alert for manual review. Choice depends on: Can upstream fix quickly? Is data critical (order) or optional (analytics event)? Is this our system or third-party beyond our control? Recommendation: Reject for critical flows (orders, payments); accept-with-warnings for non-critical (marketing events).

How strict should validation be—should I reject minor issues?

Strict validation (recommended for production): Reject anything not matching contract exactly. Lenient validation (dev/testing only): Accept data, coerce to expected format, log warnings. Why strict: Minor issues compound (phone without area code → can’t call customer → late delivery → chargeback). Exception: If data is low-criticality AND fixing upstream is prohibitively expensive, accept-with-transform (e.g., auto-format phone numbers).

How do I version data contracts when the schema needs to change?

Non-breaking changes (safe): Add optional fields, add new enum values, relax validation (reduce min length). Breaking changes (requires version bump): Remove fields, rename fields, make optional field required, tighten validation (increase min length). Strategy: Support multiple versions simultaneously (v1 and v2 URLs); deprecate old version after 6–12 months; monitor usage of old version, notify remaining clients before sunset. Never break existing integrations without notice.

Should I validate outbound data (data we send) or just inbound?

Validate both. Inbound validation protects your system from bad data. Outbound validation protects downstream systems from your bad data (and prevents blame when their system breaks). Example: Before sending invoice to QuickBooks, validate it matches QuickBooks schema. If validation fails, fix your data generation logic—don’t let QuickBooks error teach you your code is broken.

What tools help with schema definition and validation?

Schema definition: JSON Schema, OpenAPI/Swagger (REST APIs), GraphQL schema (GraphQL APIs), Protobuf (gRPC). Validation libraries: Joi (Node.js), Pydantic (Python), FluentValidation (C#), Yup (JavaScript). API gateways with validation: Kong, AWS API Gateway, Azure API Management. Documentation generation: OpenAPI → Swagger UI, Redoc; GraphQL → GraphiQL, Playground. Recommendation: OpenAPI 3.0 for REST APIs (industry standard, great tooling).

How do I handle data contracts across different teams or companies?

Internal teams: Central schema repository (Git repo); automated tests validate contract compliance; breaking changes require stakeholder approval (architecture review). External partners: Publish API documentation (OpenAPI); provide sandbox environment for testing; require integration certification before production access; versioning strategy with deprecation policy. Legal contracts: For high-value B2B integrations, SLA for schema stability (no breaking changes without 90-day notice).

What’s the performance cost of extensive validation?

Typical overhead: 5–20ms per request for schema validation (JSON Schema, Joi). Optimization: Cache compiled schemas (don’t recompile every request), validate only at boundaries (not every internal function call), use fast validation libraries (avoid regex-heavy rules if possible). When to skip validation: Internal service-to-service calls IF both services owned by same team and schemas strictly controlled. Never skip: External API requests, user input, third-party integrations.
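The "cache compiled schemas" optimization in miniature (a RegExp stands in for a compiled schema here; real schema compilers such as Ajv's `compile()` are cached the same way):

```javascript
// Compile a validator once per schema key instead of on every request.
const compiledValidators = new Map();

function getValidator(key, pattern) {
  if (!compiledValidators.has(key)) {
    compiledValidators.set(key, new RegExp(pattern)); // compile once
  }
  return compiledValidators.get(key); // cache hit on subsequent requests
}
```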

Impact & Difficulty

Conservative Impact: 40% reduction in integration errors; 30% faster debugging when errors occur

Likely Impact: 70% reduction in integration errors; 60% faster debugging; 90% of errors caught at boundary instead of deep in processing

Upside Impact: 85% error reduction; near-zero silent data corruption; integration failures resolved in minutes instead of hours; team trust in data quality

Difficulty Rating: 3/5 (Requires discipline and cross-team coordination more than technical complexity)


Commerce is chaos.

Tame your tech stack with one system that brings it all together—and actually works.

Book a Demo
