Methodology v1.0

How the SousLab corpus is sourced, normalized, and audited.

This page is the public-facing summary. The full methodology PDF — with the source taxonomy, ticker mapping, audit results, and known limitations — is sent on request alongside any sample query.

Source taxonomy

The corpus is sourced from three feeds, internally tagged tier1_a, tier2_a, and tier2_b. Each feed has its own ingestion pipeline, schema normalizer, and freshness audit. The public API exposes the abstract data_tier filter — buyers can filter to first-party (tier1) only, or include aggregator-curated (tier2) for breadth. Underlying feed identities are not surfaced publicly.

Tier | Type | Why it matters
tier1_a | First-party POS (operator-published menu data) | Highest signal. Reflects what the operator actually charges. Backbone of data_tier=primary.
tier2_a | Aggregator-curated | Coverage breadth across long-tail restaurants. Some staleness vs. tier1.
tier2_b | Aggregator-curated | Independent aggregator feed; cross-validates tier2_a coverage and price points.
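
To make the tier filter concrete, here is a minimal sketch of how internal feed tags could collapse into the abstract tier that buyers filter on. It runs on in-memory records only. The MenuItem shape, the filter_by_tier helper, the sample rows, and the "tier1"/"tier2" values are assumptions for illustration, not the actual API.

```python
from dataclasses import dataclass

# Internal feed tag -> abstract tier. The tags come from the taxonomy;
# the exact mapping and tier names here are assumed for illustration.
FEED_TO_TIER = {"tier1_a": "tier1", "tier2_a": "tier2", "tier2_b": "tier2"}


@dataclass(frozen=True)
class MenuItem:
    restaurant: str
    name: str
    price_usd: float
    feed: str  # internal tag; not surfaced publicly


def filter_by_tier(items, include_tier2=False):
    """Keep first-party rows only, or add aggregator rows for breadth."""
    allowed = {"tier1", "tier2"} if include_tier2 else {"tier1"}
    return [item for item in items if FEED_TO_TIER[item.feed] in allowed]


# Made-up sample rows, not real corpus data.
items = [
    MenuItem("Example Diner", "Cheeseburger", 9.50, "tier1_a"),
    MenuItem("Example Diner", "Cheeseburger", 9.25, "tier2_a"),
    MenuItem("Corner Cafe", "Latte", 4.75, "tier2_b"),
]

first_party = filter_by_tier(items)
broad = filter_by_tier(items, include_tier2=True)
```

In this sketch the first-party view drops the long-tail Corner Cafe row entirely, which is the precision-versus-breadth trade-off the two tiers represent.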

Refresh cadence

  • tier1_a: daily refresh, monitored. Failure alerts fire within the same day.
  • tier2_a, tier2_b: weekly refresh on a staggered schedule.
  • Chain entity layer: revalidated weekly. Ticker mapping reconciliation runs on a quarterly schedule.
  • Premium tier (Scale+ and Enterprise): same-day refresh on tier1_a + daily Parquet drops.
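
A freshness check against the cadences above might look like the following sketch. The one-day and seven-day thresholds are read off the stated daily and weekly schedules; the stale_feeds helper and the timestamps are hypothetical, and the production monitor may work differently.

```python
from datetime import datetime, timedelta, timezone

# Maximum tolerated age per feed, derived from the stated cadences
# (daily for tier1_a, weekly for tier2_a and tier2_b). Assumed thresholds.
MAX_AGE = {
    "tier1_a": timedelta(days=1),
    "tier2_a": timedelta(days=7),
    "tier2_b": timedelta(days=7),
}


def stale_feeds(last_refresh, now):
    """Return the feeds whose last refresh is older than their cadence allows."""
    return sorted(
        feed for feed, ts in last_refresh.items() if now - ts > MAX_AGE[feed]
    )


# Made-up timestamps for illustration.
now = datetime(2024, 1, 8, 12, 0, tzinfo=timezone.utc)
last_refresh = {
    "tier1_a": now - timedelta(hours=30),  # missed its daily window
    "tier2_a": now - timedelta(days=3),    # within the weekly window
    "tier2_b": now - timedelta(days=8),    # missed its weekly window
}

overdue = stale_feeds(last_refresh, now)
```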

Chain entity layer

Restaurants are normalized to consumer-facing brands (e.g. multiple McDonald's storefronts → one chain entity). Brands are then mapped to their corporate parents and, where applicable, public tickers. The chain coverage page lists the top 25 publicly traded restaurant tickers we cover at the entity level. Mapping methodology, location-count audit, and ticker selection criteria are detailed in the methodology PDF.
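
The storefront → brand → parent → ticker chain can be sketched roughly as below. The brand_key normalization rule, the resolve helper, and the lookup table are assumptions for illustration; the actual mapping methodology is the one described in the PDF. "Example Bistro" is a hypothetical private operator with no ticker.

```python
import re
from collections import Counter

# brand key -> (corporate parent, public ticker or None). Illustrative entries only.
BRAND_TABLE = {
    "mcdonalds": ("McDonald's Corporation", "MCD"),
    "example bistro": ("Example Bistro LLC", None),  # hypothetical, privately held
}


def brand_key(storefront_name):
    """Collapse storefront naming variants into one brand key."""
    name = storefront_name.lower().replace("'", "").replace("’", "")
    name = re.sub(r"#\s*\d+", "", name)       # drop store numbers like "#1042"
    name = re.sub(r"[^a-z0-9 ]+", " ", name)  # drop remaining punctuation
    return " ".join(name.split())


def resolve(storefront_name):
    """Return (brand key, parent, ticker), or None if the brand is unmapped."""
    key = brand_key(storefront_name)
    if key not in BRAND_TABLE:
        return None
    parent, ticker = BRAND_TABLE[key]
    return key, parent, ticker


storefronts = ["McDonald's #1042", "MCDONALD'S", "McDonald’s", "Example Bistro"]
locations_per_brand = Counter(brand_key(s) for s in storefronts)
```

Counting storefronts per brand key, as in locations_per_brand, is also the kind of tally a location-count audit would compare against an operator's reported store count.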

Audit results

Freshness, completeness, and coverage-by-metro audits run alongside each refresh. Headline figures: 449,716 restaurants covered, 44.3M+ menu items indexed, and the top 25 publicly traded restaurant tickers mapped at the entity level. Per-source freshness, the accuracy spot-check methodology, and known coverage gaps are documented in the methodology PDF. That level of detail is kept off this page intentionally and is sent under a soft NDA on request.

Versioning

This is methodology v1.0. Major revisions trigger a re-issued PDF and notification to all licensed customers; minor revisions are tracked inline in /changelog. If you have an active license or a pending request, you'll always have the methodology version that matches the data version you received.

Want the full PDF?

Send a one-line email — name, company, what you're evaluating. Methodology PDF + sample query + license terms come back within 48 hours.