Frozen snapshots of the SousLab corpus, each with documented methodology and a clean commercial license designed for model training and synthetic-data generation. Three tiers, each sized to a different buyer profile.
Sample within 48 hours · methodology PDF + license PDF included
$1k–$10k. Full corpus snapshot, commercial use, single project. The fastest path to a clean training corpus for an indie AI startup or hackathon team going pro.
$5k–$50k. Full corpus + quarterly refresh + custom labeling on request (allergens, dietary categories, cuisine taxonomy). For voice-ordering, recommendation, and recipe-AI teams.
$25k–$250k+. Full corpus, ongoing access, custom labeling, dedicated support, and methodology-PDF revisions on request. For labs training food-aware foundation models.
✓ Permitted
✗ Not permitted
| Specification | Details |
|---|---|
| Corpus size | 44.3M+ menu items, 449k restaurants |
| Geographic scope | United States, all 50 states |
| Language | English (original menu language preserved) |
| Schema | Restaurants × Items × Modifiers × Chain entities × Source provenance |
| Source tiers | tier1_a (first-party POS), tier2_a, tier2_b (aggregator) |
| Refresh cadence | Daily upstream |
| Format | Parquet (default), CSV/JSON (on request) |
| License | Commercial; no field-of-use restrictions for training |
| Methodology version | v1.0 (versioned; revisions tracked) |
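
To make the schema row more concrete, here is a minimal Python sketch of how the Restaurants × Items × Modifiers relationship and the source-tier labels might look once loaded. Only the three tier names come from the table above. Every class name, field name, and sample record is a hypothetical placeholder, not the actual column layout of the snapshot, so check the methodology PDF for the real schema.

```python
from dataclasses import dataclass
from typing import Optional

# Source-tier labels as listed in the spec table.
SOURCE_TIERS = {"tier1_a", "tier2_a", "tier2_b"}


# NOTE: all field names below are illustrative assumptions,
# not the real column names in the SousLab snapshot.
@dataclass(frozen=True)
class Restaurant:
    restaurant_id: str
    name: str
    state: str                      # assumed two-letter US state code
    chain_id: Optional[str] = None  # assumed link to a chain entity


@dataclass(frozen=True)
class Item:
    item_id: str
    restaurant_id: str
    name: str
    source_tier: str                # one of SOURCE_TIERS


@dataclass(frozen=True)
class Modifier:
    modifier_id: str
    item_id: str
    label: str


def items_for_tier(items, tier):
    """Return the items whose provenance matches the given source tier."""
    if tier not in SOURCE_TIERS:
        raise ValueError(f"unknown source tier: {tier!r}")
    return [item for item in items if item.source_tier == tier]


def modifiers_by_item(modifiers):
    """Group modifiers under the item_id they belong to."""
    grouped = {}
    for modifier in modifiers:
        grouped.setdefault(modifier.item_id, []).append(modifier)
    return grouped


# Made-up sample records, for illustration only.
RESTAURANTS = [Restaurant("r1", "Example Pizzeria", "NY")]
ITEMS = [
    Item("i1", "r1", "Margherita Pizza", "tier1_a"),
    Item("i2", "r1", "Garlic Knots", "tier2_b"),
]
MODIFIERS = [
    Modifier("m1", "i1", "Extra cheese"),
    Modifier("m2", "i1", "Gluten-free crust"),
]

first_party_items = items_for_tier(ITEMS, "tier1_a")
grouped_modifiers = modifiers_by_item(MODIFIERS)
```

Filtering on the tier label first is one way to restrict a training set to a single provenance class before joining in modifiers. Whether that split suits a given project depends on how the methodology PDF defines each tier.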