AI training license

Commercial licenses for AI training and synthetic-data generation.

Frozen, methodology-documented snapshots of the SousLab corpus under a clean commercial license designed for model training and synthetic-data generation. Three tiers, sized to your team and use case.

Sample within 48 hours · methodology PDF + license PDF included

Indie AI license

$1k–$10k. Full corpus snapshot, commercial use, single project. The fastest path to a clean training corpus for an indie AI startup or hackathon team going pro.

Enterprise vertical AI

$5k–$50k. Full corpus + quarterly refresh + custom labeling on request (allergens, dietary categories, cuisine taxonomy). For voice-ordering, recommendation, and recipe-AI teams.

Foundation model lab

$25k–$250k+. Full corpus, ongoing access, custom labeling, dedicated support, methodology-PDF revisions per request. For labs training food-aware foundation models.

Pricing

Three tiers, methodology-documented, license PDF included.

Indie AI from $1k · Enterprise vertical AI $5k–$50k · Foundation model $25k+. Sample delivered within 48 hours of request.

License terms

What is and isn't permitted.

✓ Permitted

  • Model training (supervised, unsupervised, RLHF)
  • Synthetic-data generation pipelines
  • Embedding the data into derivative products and inference outputs
  • Internal evaluation, benchmarking, and audit datasets
  • Shipping models trained on the corpus to commercial customers

✗ Not permitted

  • Resale of the raw corpus, in whole or in part
  • Public republication of the corpus or substantial portions thereof
  • Reverse engineering of underlying source identities
  • Use in any application that re-aggregates the source feeds

Data card

Following Hugging Face dataset-card conventions.

Corpus size: 44.3M+ menu items, 449k restaurants
Geographic scope: United States, all 50 states
Language: English (with original menu-language preserved)
Schema: Restaurants × Items × Modifiers × Chain entities × Source provenance
Source tiers: tier1_a (first-party POS), tier2_a, tier2_b (aggregator)
Refresh cadence: Daily upstream
Format: Parquet (default), CSV/JSON (on request)
License: Commercial; no field-of-use restrictions for training
Methodology version: v1.0 (versioned; revisions tracked)
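
For teams planning an ingestion pipeline, the schema line in the data card can be pictured roughly as nested records. The sketch below is illustrative only: every class and field name (Restaurant, Item, Modifier, source_tier, price_cents, and so on) is an assumption made for this example, not the actual column layout, which is defined in the methodology PDF. The sample records are invented toy data. Only the three tier labels come from the data card.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical field names throughout; consult the methodology PDF for the
# real schema. Only the tier labels below are taken from the data card.
SOURCE_TIERS = ("tier1_a", "tier2_a", "tier2_b")


@dataclass
class Modifier:
    name: str
    price_delta_cents: int = 0


@dataclass
class Item:
    name: str
    price_cents: int
    source_tier: str  # provenance label, expected to be one of SOURCE_TIERS
    modifiers: List[Modifier] = field(default_factory=list)


@dataclass
class Restaurant:
    name: str
    state: str
    chain: Optional[str] = None  # chain entity, if the restaurant belongs to one
    items: List[Item] = field(default_factory=list)


def items_from_tier(restaurants: List[Restaurant], tier: str) -> List[Tuple[str, str]]:
    """Return (restaurant name, item name) pairs whose provenance matches `tier`."""
    if tier not in SOURCE_TIERS:
        raise ValueError(f"unknown source tier: {tier}")
    return [
        (r.name, i.name)
        for r in restaurants
        for i in r.items
        if i.source_tier == tier
    ]


# Invented toy records, for illustration only.
SAMPLE = [
    Restaurant(
        name="Example Diner",
        state="OH",
        items=[
            Item("Cheeseburger", 899, "tier1_a", [Modifier("Extra cheese", 100)]),
            Item("Fries", 349, "tier2_b"),
        ],
    )
]

first_party = items_from_tier(SAMPLE, "tier1_a")
```

Filtering on the provenance label, as items_from_tier does here, is one plausible way to restrict training to first-party POS records if a project needs that.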

FAQ

Common questions

Is this a perpetual license?
Yes, for the snapshot you receive. Refresh access (Enterprise tier and above) is contracted separately and renews quarterly.
Can I train and ship a commercial product on this?
Yes. The license has no field-of-use restrictions for training or downstream commercial use. The exclusions are limited to the items listed under "Not permitted" above, chiefly resale and public republication of the raw corpus.
Do you indemnify against third-party IP claims?
Standard indemnification language is part of the contract. Reach out and we'll send the specifics under NDA.
Are there usage reporting requirements?
No. Once licensed, there is no telemetry and no per-inference reporting. Your snapshot is licensed perpetually; refresh access runs for the contracted term.
Can I request custom labeling?
Yes, on the Enterprise tier and above. Common requests: allergens, dietary categories, cuisine taxonomy normalization, and price-band bucketing.

Get the sample

Methodology PDF + 30-day Parquet preview in your inbox within 48 hours.