diff --git a/CLAUDE.md b/CLAUDE.md
index 99e5d4e..16808d1 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -103,3 +103,42 @@ Marathon__to_.xlsx
 ## Recurring Issues & Patterns
 
 (Populated as we work — leave empty until something repeats.)
+
+## Feature: Initial Implementation > Phase 0: Scraping Spike — Learnings
+
+(Permanent learnings about marathonbet.by data shape, anti-bot, page structure.
+For full detail see `spike/SCRAPE_FINDINGS.md` and `spike/SCHEMA_DRAFT.md`.)
+
+- **Site is fully SSR (`Server: nginx`).** Anonymous GET with browser User-Agent
+  returns full HTML for `/su/`, `/su/live`, `/su/popular/`,
+  `/su/betting/`. No Cloudflare, no JS challenge.
+- **Use HttpClient + AngleSharp + Polly v8** — no Playwright needed for read-only.
+  Keep `Scraping:UsePlaywright = false` flag for future-proofing.
+- **Sport ID = `data-sport-treeId` = breadcrumb canonical ID.** Confirmed:
+  Basketball=6, Football=11, Tennis=22723, Hockey=43658. URL by ID:
+  `/su/betting/+-+` (preferred over `/su/popular/` because the
+  ID is stable).
+- **`EventCode` = `data-event-eventId`** (numeric, ~26-million range, stable).
+  `TreeId` = `data-event-treeId` (URL-routing ID, less stable). Use `EventCode`
+  as the entity primary key in SQLite.
+- **Selection key format:** `{eventId}@{MarketName}{LineIndex?}.{Outcome}`.
+  Outcomes: `1`/`draw`/`3` for 3-way, `HB_H`/`HB_A` for handicap, `Under_`/
+  `Over_` for totals. Total threshold is encoded in the outcome string;
+  handicap value lives in `` text.
+- **Tennis has no Draw outcome.** Domain `Bet_Match_Draw` must be nullable; Excel
+  exporter writes empty cell when null.
+- **Date parsing:** listing shows `HH:MM` (today) or `DD HH:MM` (future).
+  Anchor with `initData.serverTime` (Moscow TZ, format `YYYY,MM,DD,HH,MM,SS`)
+  parsed from the embedded `