# Phase 0 Spike — Domain Schema Draft

**Purpose:** Map every customer-spec Excel column to a concrete DOM/JSON path in marathonbet.by. Phase 1 (Domain) and Phase 3 (Scraping/parsing) consume this.

**Convention:** "selector" entries use AngleSharp/CSS notation. `evt` = the event detail page DOM; `list` = the listing page DOM (top-level grid view).

---

## 1. Event Metadata

| Spec field | Source | Selector / extraction |
|---|---|---|
| `EventCode` | event detail page | `[data-event-eventId]` attribute on the outer `div.coupon-row`. Numeric, e.g., `26456117`. **Stable; use as primary key for the event in our SQLite.** |
| `TreeId` (internal) | event detail page | `[data-event-treeId]` on the same `div.coupon-row`. Used for URL building, less stable than `EventCode`. |
| `SportCode` | breadcrumb of event detail | `breadcrumbs-list .breadcrumbs-item:nth-child(2) a@href` matches `/su/betting/{Sport}+-+{N}`. Parse `N` as integer. Confirmed: Basketball = 6, Football = 11. |
| `Sport` | breadcrumb (RU label) | `breadcrumbs-list .breadcrumbs-item:nth-child(2) .breadcrumb-text` → strip leading `Ставки на ` prefix. e.g., `Ставки на Баскетбол` → `Баскетбол`. |
| `Country` | breadcrumb | `.breadcrumbs-item:nth-child(3) .breadcrumb-text`. May represent a group ("Клубы. Международные") rather than a literal country for international leagues; accept as-is. |
| `League` | breadcrumb | `.breadcrumbs-item:nth-child(4) .breadcrumb-text`. e.g., `Лига чемпионов УЕФА`, `NBA`. |
| `Category` | breadcrumb (deeper) | If breadcrumb has 5+ items beyond the event itself, join items 5..N-1 with ` / `. e.g., `Play-Offs / Semi Final / 2nd Leg`. The text of the event detail's `category-label-link` element also exposes this concatenated value. |
| `EventName` | event detail | `[data-event-name]` attribute on `div.coupon-row`. e.g., `Арсенал - Атлетико Мадрид`. |
| `Team1` | event detail | `[data-event-name]`, split on ` - `, take index 0. Or: `.player-row.player1 .member-name [data-member-link]` text. |
| `Team2` | event detail | Split index 1, or `.player-row.player2 .member-name [data-member-link]`. |
| `ScheduledAt` (date+time) | event detail + listing | **Time:** `.date-wrapper` text. Two formats: `HH:MM` (today) or `DD <month> HH:MM` (future, e.g., `06 мая 22:00`). **Anchor:** `initData.serverTime` (Moscow TZ, format `YYYY,MM,DD,HH,MM,SS`) parsed and combined with the time. **Title fallback:** `<title>` and `<meta name="description">` contain a Russian-formatted full date (`05 мая 2026`); use as authoritative when ambiguous. |
| `IsLive` | event detail / listing | `[data-live="true"]` attribute. Live events also carry `.score-state` and `.time` elements with `2:1` and `83:30` style content. |
| `LiveScore` | event detail (live only) | `.score-state` text (`2:1 (1:1)` style). Inning breakdown: parse the `eventJsonInfo` `[data-json]` attribute on the hidden `<td>`. The JSON includes `mainScore`, `inningScore[]`, `matchTime.seconds`, `matchIsComplete`. |
| `MatchIsComplete` | event detail | Decoded JSON of `[data-mutable-id="eventJsonInfo"][data-json]` → `.matchIsComplete` boolean. Critical for Phase 8 (Results loader). |
| `FinalScore` | event detail (post-match) | Same `eventJsonInfo` JSON → `.resultDescription` (e.g., `"2:1 (1:1)"`) when `matchIsComplete=true`. |

---

## 2. Match-Scope Bets (1×2, Handicap, Total)

The event-detail "main row" presents three primary markets in a `coefficients-table`: **Result** (1×2), **Handicap** (Win-Fora), **Total** (Goals/Points/Games depending on sport). These map to spec fields `Bet_Match_*`.

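
The `data-selection-key` values referenced throughout this document appear to follow the shape `<eventId>@<market>.<outcome>`. The following is a minimal sketch of splitting such a key, not the Phase 3 parser itself. The type name `SelectionKey` and its members are hypothetical, and the rule "the first dot after `@` separates market from outcome" is an assumption that should be checked against real captures.

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not part of the spike): splits a data-selection-key
// such as "26456117@Match_Result.1" into event id, market token and outcome.
public readonly record struct SelectionKey(long EventId, string Market, string Outcome)
{
    public static bool TryParse(string raw, out SelectionKey key)
    {
        key = default;
        if (string.IsNullOrEmpty(raw)) return false;

        // The event id is everything before '@'.
        int at = raw.IndexOf('@');
        if (at <= 0) return false;

        // Assumption: market tokens never contain '.', so the first dot after
        // '@' ends the market token; the outcome itself may contain dots.
        int dot = raw.IndexOf('.', at + 1);
        if (dot <= at + 1 || dot == raw.Length - 1) return false;

        if (!long.TryParse(raw.AsSpan(0, at), NumberStyles.None,
                           CultureInfo.InvariantCulture, out long eventId))
            return false;

        key = new SelectionKey(
            eventId,
            raw.Substring(at + 1, dot - at - 1),
            raw.Substring(dot + 1));
        return true;
    }
}
```

Returning `false` instead of throwing keeps the helper in line with the "never throw" robustness rule for absent or unexpected markets.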
### 2.1 Match Win 1 / Draw / Win 2

| Spec field | data-selection-key suffix | DOM path |
|---|---|---|
| `Bet_Match_Win_1` | `@Match_Result.1` (football, tennis, hockey) **OR** `@Result.1` (basketball pre-match) **OR** `@Normal_Time_Result.1` (basketball detail) | `evt span[data-selection-key$='@Match_Result.1']@data-selection-price` (decimal odds, e.g., `1.65`) |
| `Bet_Match_Draw` | `.draw` outcome of same market | `evt span[data-selection-key$='@Match_Result.draw']@data-selection-price`. **NULL for tennis** (2-way market, no draw). |
| `Bet_Match_Win_2` | `.3` outcome | `evt span[data-selection-key$='@Match_Result.3']@data-selection-price` |

**Sport variance:**

- Football, Tennis, Table-tennis: `Match_Result`.
- Basketball: in pre-match landing, label is `Match_Winner_Including_All_OT.HB_H/HB_A` (2-way, OT included). On the detail page, both `Normal_Time_Result.{1,draw,3}` (3-way, reg time) and `Match_Winner_Including_All_OT.{HB_H,HB_A}` (2-way, OT included) appear. **Recommendation:** when no 3-way result market is present, treat `Match_Winner_Including_All_OT` as the canonical Win-1 / Win-2 (no Draw); when `Normal_Time_Result` is present, use it so that Draw is populated.
- Hockey: TBD; verify in Phase 3 with an actual hockey event capture.

**Recommendation for Phase 1 domain:** define `BetType.WinDraw` allowing nullable `Draw`. The Excel exporter writes an empty cell when `Draw` is null.

### 2.2 Match Win Fora (handicap)

| Spec field | data-selection-key suffix | DOM path | Value source |
|---|---|---|---|
| `Bet_Match_Win_Fora_1_Value` | — | (no selection key for value alone) | `<td>` of HB_H selection: `.middle-simple` text inside the `<div class="nowrap simple-price">` (e.g., `(-1.0)`). Strip parens, parse as `decimal`. |
| `Bet_Match_Win_Fora_1_Rate` | `@To_Win_Match_With_Handicap{N}.HB_H` (or `@Match_Handicap.HB_H` variant) | `[data-selection-key$='@To_Win_Match_With_Handicap.HB_H']@data-selection-price` | — |
| `Bet_Match_Win_Fora_2_Value` | — | (no selection key for value alone) | `.middle-simple` next to HB_A selection (e.g., `(+1.0)`). |
| `Bet_Match_Win_Fora_2_Rate` | `@To_Win_Match_With_Handicap{N}.HB_A` | `[data-selection-key$='@To_Win_Match_With_Handicap.HB_A']@data-selection-price` | — |

**Tennis variant:** uses `@To_Win_Match_With_Handicap_By_Games{N}.HB_H/HB_A`. The handicap is in **games**, not points; emit `Value` as-is, since the unit is implicit in the sport.

**Multi-line handicap:** the site offers many lines (`To_Win_Match_With_Handicap0`, `...1`, `...2`, ...), each a different handicap value. The customer spec wants only the **main line** (the one displayed in the listing's main row). Phase 3 should:

1. On listing pages, take the handicap displayed in the `coefficients-table` `data-market-type="HANDICAP"` cell.
2. On event detail, identify the "main" line as the one without a numeric suffix (`@To_Win_Match_With_Handicap.HB_H`) or with suffix `0` if both exist (the sample shows both `To_Win_Match_With_Handicap.HB_H` and `...0.HB_H`). Heuristic: pick the line whose handicap value is closest to ±1.0 from the favorite, OR explicitly prefer the no-suffix variant; fall back to suffix `0`.
3. Optional: capture the full handicap ladder into a separate normalized table so anomaly detection can use the spread, even if Excel only exports the main line.

### 2.3 Match Total Less / More

| Spec field | data-selection-key suffix | DOM path |
|---|---|---|
| `Bet_Match_Total_Less_Value` | — | `.middle-simple` next to the `Меньше` selection (e.g., `3.5`, `213.5`). |
| `Bet_Match_Total_Less_Rate` | `@Total_{Goals\|Points\|Games}{N}.Under_<X>` | `[data-selection-key^='<eventId>@Total_'][data-selection-key$='.Under_<X>']@data-selection-price`. Use the row whose Value equals the chosen total threshold. |
| `Bet_Match_Total_More_Value` | — | Same value as Less (paired). |
| `Bet_Match_Total_More_Rate` | `@Total_{Goals\|Points\|Games}{N}.Over_<X>` | `[data-selection-key$='.Over_<X>']@data-selection-price` |

**Sport vocabulary:**

- Football: `Total_Goals`
- Basketball: `Total_Points`
- Tennis: `Total_Games`
- Hockey: `Total_Goals` (TBD)
- Volleyball / handball: TBD

**Choosing the "main" total line:** the customer spec wants ONE Total Value + Less/More rates per event. The site offers ~20 different total thresholds per event. The listing page main row exposes the "headline" total (the one the bookmaker chose to show). **Heuristic:**

1. On listing: read the `data-market-type="TOTAL"` cell directly.
2. On event detail: find the row labeled in `coefficients-row` (visible main view), not in `coefficients-hidden-row`. The `data-mutable-id="S_3_1_european"` / `S_3_3_european` pair is the main line.
3. Fall back to picking the line whose Under/Over rates are closest to **2.00** each (the "balanced" line, most representative of the bookmaker's expectation).
4. As with handicap, capture the full ladder for analysis even if the export contains only one row.

---

## 3. Period-N Scope Bets

Period markets follow the same pattern as match markets but with a period prefix in the market token. Examples for `Period-1` (1st half of football, 1st quarter of basketball, 1st set of tennis):

### 3.1 Period-N Win 1 / Draw / Win 2

> **CORRECTED FROM CAPTURE EVIDENCE (2026-05-05):** Period result markets use
> `RN_H` / `RN_D` / `RN_A` outcome codes (Reduced Numerals: Home / Draw / Away),
> NOT the `1` / `draw` / `3` codes used by `@Match_Result`. Market names also
> vary: football uses `Result_-_1st_Half` (with separator dashes); basketball and
> tennis use `1st_Half_Result0` / `1st_Quarter_Result0` / `1st_Set_Result0`
> (note the literal `0` suffix on the market name, the line index for the period
> result market). Phase 3 parser must use these exact tokens.

| Customer field | Football (1st Half) | Basketball (1st Half *or* Quarter) | Tennis (1st Set) | Hockey (1st Period) |
|---|---|---|---|---|
| `Bet_Period-1_Win_1` | `@Result_-_1st_Half.RN_H` | `@1st_Half_Result0.RN_H` (halves) **or** `@1st_Quarter_Result0.RN_H` (quarters) | `@1st_Set_Result0.RN_H` | `@1st_Period_Result0.RN_H` (TBD verify on hockey event) |
| `Bet_Period-1_Draw` | `@Result_-_1st_Half.RN_D` | `@1st_Half_Result0.RN_D` / `@1st_Quarter_Result0.RN_D` | (NULL, no draw) | `@1st_Period_Result0.RN_D` (TBD) |
| `Bet_Period-1_Win_2` | `@Result_-_1st_Half.RN_A` | `@1st_Half_Result0.RN_A` / `@1st_Quarter_Result0.RN_A` | `@1st_Set_Result0.RN_A` | `@1st_Period_Result0.RN_A` (TBD) |

The market token vocabulary differs by sport:

- **Football:** `Result_-_<ordinal>_<unit>` (e.g., `Result_-_1st_Half`, `Result_-_2nd_Half`).
- **Basketball / Tennis / Hockey:** `<ordinal>_<unit>_Result0` (e.g., `1st_Half_Result0`, `1st_Quarter_Result0`, `1st_Set_Result0`, `1st_Period_Result0`). The `0` suffix is required.
- **Note:** non-period markets like `@Match_Result.1` and `@Match_Result.draw` still use the `1`/`draw`/`3` outcome codes; the `RN_*` codes are specific to period/half/quarter/set markets.

**Period count by sport** (default mapping for `Period-N`):

- Football: N ∈ {1, 2}
- Basketball: configurable: halves (N ∈ {1,2}) or quarters (N ∈ {1,2,3,4}). **Default to halves.**
- Tennis: N ∈ {1, 2, ...} until `<i>th_Set_Result` selection is absent. Cap at 5 for Grand Slams.
- Hockey: N ∈ {1, 2, 3}.

### 3.2 Period-N Win Fora

Same as match handicap, with period prefix:

| Sport | Selection key |
|---|---|
| Football | `@To_Win_1st_Half_With_Handicap{N}.HB_H` / `.HB_A` |
| Basketball | `@To_Win_1st_Half_With_Handicap{N}.HB_*` (or `_1st_Quarter_`) |
| Tennis | `@To_Win_1st_Set_With_Handicap{N}.HB_*` |
| Hockey | `@To_Win_1st_Period_With_Handicap{N}.HB_*` (TBD verify) |

Value extraction: same `.middle-simple` text as match handicap.

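
The two period-result naming schemes from §3.1 can be captured in one small function. This is a sketch under stated assumptions rather than the Phase 3 parser: `SportKind`, `PeriodMarkets` and their members are hypothetical names, only the `1st` and `2nd` ordinals are confirmed by the capture (`3rd` and `Nth` are guesses), and the hockey tokens remain unverified.

```csharp
using System;

// Hypothetical enum for illustration; Phase 1 may model sports differently.
public enum SportKind { Football, Basketball, Tennis, Hockey }

public static class PeriodMarkets
{
    // Ordinal prefix as it appears in market tokens ("1st", "2nd", ...).
    // Assumption: "3rd" and "<n>th" follow ordinary English spelling.
    public static string Ordinal(int n) => n switch
    {
        1 => "1st",
        2 => "2nd",
        3 => "3rd",
        _ when n > 3 => $"{n}th",
        _ => throw new ArgumentOutOfRangeException(nameof(n))
    };

    // unit is "Half", "Quarter", "Set" or "Period".
    // Football puts "Result_-_" first; the other sports append "_Result0".
    public static string ResultMarket(SportKind sport, int n, string unit) =>
        sport == SportKind.Football
            ? $"Result_-_{Ordinal(n)}_{unit}"
            : $"{Ordinal(n)}_{unit}_Result0";

    // side is 'H', 'D' or 'A', matching the RN_H / RN_D / RN_A outcome codes.
    public static string ResultSelector(SportKind sport, int n, string unit, char side) =>
        $"[data-selection-key$='@{ResultMarket(sport, n, unit)}.RN_{side}']";
}
```

For example, `PeriodMarkets.ResultSelector(SportKind.Tennis, 1, "Set", 'H')` yields `[data-selection-key$='@1st_Set_Result0.RN_H']`. Keeping the token rules in one place means a hockey correction in Phase 3 touches a single function.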
### 3.3 Period-N Total Less / More

This is the **least uniform** market. Observed:

| Sport | Period-1 Total selection key |
|---|---|
| Football | (search HTML directly; Phase 3 should parse the "Тотал тайма" tab) Likely `@1st_Half_Total_Goals{N}.Under_<X>` / `.Over_<X>`. |
| Basketball | Per-quarter total exposed as separate market in the "Тоталы" tab; sample event did not show clean `1st_Half_Total_Points` keys (see SCRAPE_FINDINGS.md §6 risk #4). **May need to fall back to NULL** for basketball Period-N Total in some leagues. |
| Tennis | `@1st_Set_Total_Games{N}.Under_<X>` / `.Over_<X>` (confirmed in sample). |
| Hockey | `@1st_Period_Total_Goals...` (TBD verify). |

**Phase 3 robustness rule:** if a period-N market is absent in the parsed HTML, emit `null` for the corresponding rate/value. Never throw. The Excel exporter writes an empty cell.

---

## 4. Live Counterparts

When the same scope is captured from the **live** site (`/su/live` or live-flagged events on `/su/`), the spec wants column prefix `Live_*` instead of `Bet_*`.

**Important:** live events use the SAME `data-selection-key` naming conventions. The distinguishing signal is `data-live="true"` on the outer `div.coupon-row` and the URL the snapshot was scraped from (`/su/live`).

Examples:

- `Live_Match_Win_1` ← `[data-selection-key$='@Match_Result.1']` from live page
- `Live_Match_Win_Fora_1_Value`, `Live_Match_Win_Fora_1_Rate` ← same DOM, same logic
- `Live_Period-1_Win_1` ← same as `Bet_Period-1_Win_1` but captured from live event

**Implementation:** the parser does not change. The application service simply records `Source = Live | PreMatch` on each `OddsSnapshot`, and the Excel exporter denormalizes pre-match snapshots to `Bet_*` columns and live snapshots to `Live_*` columns at write time.

---

## 5. Field Coverage Matrix (spec → confidence)

| Field family | Football | Basketball | Tennis | Hockey | Notes |
|---|---|---|---|---|---|
| `Match_Win_1/2`, `Match_Draw` | ✅ confirmed | ⚠️ Win-1/2 confirmed; Draw conditional on `Normal_Time_Result` presence | ✅ Win-1/2 confirmed; **Draw is null** | ❓ verify Phase 3 | — |
| `Match_Win_Fora_*` | ✅ | ✅ | ✅ (in games) | ❓ | "Main line" heuristic needed (§2.2) |
| `Match_Total_*` | ✅ Goals | ✅ Points | ✅ Games | ❓ | "Main line" heuristic needed (§2.3) |
| `Period-1_Win_*` | ✅ Half | ✅ Half / Quarter | ✅ Set | ❓ Period | basketball mode is configurable |
| `Period-1_Win_Fora_*` | ✅ | ✅ | ✅ | ❓ | — |
| `Period-1_Total_*` | ⚠️ structure verified, exact key TBD | ⚠️ may be absent for some games | ✅ Set | ❓ | risk: emit null where absent |
| `Period-2/3/4_*` | (Period-2 only) | ✅ all | up to actual played sets | ❓ | — |
| `Live_*` (any of above) | same parser | same | same | same | distinguished only by `data-live` flag + scrape URL |

Legend: ✅ confirmed in spike sample, ⚠️ partial / heuristic needed, ❓ Phase 3 must verify.

---

## 6. Suggested Domain Types (Phase 1 input)

```csharp
// Marathon.Domain
using System;
using System.Collections.Generic;

public enum BetScope { Match, Period }
public enum BetType  { Win, Draw, WinFora, Total }
public enum BetSide  { Side1, Side2, Less, More }   // Side1=home/W1, Side2=away/W2

public sealed record Sport(int Code, string NameRu, string NameEn);
public sealed record League(int TreeId, string NameRu, int SportCode);

public sealed record Event(
    long EventCode,             // marathonbet's data-event-eventId
    int TreeId,                 // for URL building
    int SportCode,
    int LeagueTreeId,
    string Country,             // breadcrumb position 3
    string? Category,           // joined breadcrumb 5..N-1
    string Team1,
    string Team2,
    DateTimeOffset ScheduledAt, // anchored on initData.serverTime
    string DetailUrl);

public sealed record Bet(
    BetScope Scope,
    int? PeriodNumber,          // null when Scope=Match
    BetType Type,
    BetSide? Side,              // null for Type=Draw
    decimal? Value,             // handicap/total threshold; null for Win/Draw
    decimal Rate);              // decimal odds (e.g., 1.65)

public sealed record OddsSnapshot(
    long EventCode,
    DateTimeOffset CapturedAt,
    SnapshotSource Source,      // PreMatch | Live
    IReadOnlyList<Bet> Bets);

public enum SnapshotSource { PreMatch, Live }
```

Phase 1 will refine names, but this captures the data shape Phase 3 produces.

---

## 7. Excel Column Generation (Phase 4 / 9 reference)

The Excel exporter generates wide rows by joining all `Bet`s of an `OddsSnapshot` into named columns. Pseudocode:

```
foreach snapshot:
    row.EventCode   = snapshot.EventCode
    row.SportCode   = event.SportCode
    row.Sport       = event.Sport.NameRu
    row.Country     = event.Country
    row.League      = event.League.NameRu
    row.Category    = event.Category
    row.ScheduledAt = event.ScheduledAt

    prefix = snapshot.Source == PreMatch ? "Bet_" : "Live_"

    // Match scope
    row[prefix+"Match_Win_1"]            = bet.Where(scope=Match, type=Win, side=Side1).Rate
    row[prefix+"Match_Draw"]             = bet.Where(scope=Match, type=Draw).Rate
    row[prefix+"Match_Win_2"]            = bet.Where(scope=Match, type=Win, side=Side2).Rate
    row[prefix+"Match_Win_Fora_1_Value"] = bet.Where(scope=Match, type=WinFora, side=Side1).Value
    row[prefix+"Match_Win_Fora_1_Rate"]  = bet.Where(scope=Match, type=WinFora, side=Side1).Rate
    row[prefix+"Match_Win_Fora_2_Value"] = bet.Where(scope=Match, type=WinFora, side=Side2).Value
    row[prefix+"Match_Win_Fora_2_Rate"]  = bet.Where(scope=Match, type=WinFora, side=Side2).Rate
    row[prefix+"Match_Total_Less_Value"] = bet.Where(scope=Match, type=Total, side=Less).Value
    row[prefix+"Match_Total_Less_Rate"]  = bet.Where(scope=Match, type=Total, side=Less).Rate
    row[prefix+"Match_Total_More_Value"] = bet.Where(scope=Match, type=Total, side=More).Value
    row[prefix+"Match_Total_More_Rate"]  = bet.Where(scope=Match, type=Total, side=More).Rate

    // Period scope (foreach period N exposed for that sport)
    for N in 1..MaxPeriodForSport(sportCode):
        same fields with key {prefix}Period-{N}_*
        null when bet absent
```

Spec column order is left to Phase 4 (`ExcelExporter`). Recommend: `Date, Time, Sport, Country, League, Category, Event, EventCode, Bet_Match_*..., Bet_Period-1_*..., Bet_Period-2_*..., Live_Match_*..., Live_Period-N_*...`

---

## 8. Decisions Pending Customer Confirmation

1. **Basketball Period mapping:** halves (default) or quarters? Spec says "Period-N" but is silent on which N applies. Recommend halves (`N ∈ {1,2}`) with a quarter mode opt-in via `appsettings.Sports.Basketball.PeriodMode`.
2. **Tennis Draw column:** emit empty / 0 / "—"? Recommend empty cell.
3. **Handicap "main line" rule:** pick the listing's main row, OR the no-suffix selection, OR the spread closest to bookmaker-implied probability 50/50?
4. **Total "main line" rule:** same as above.
5. **Field name capitalization:** spec uses `Bet_Match_Win_Fora_1_Value` exactly. Recommend matching exactly (case-sensitive) for compatibility with downstream pivot tables / scripts.

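
To make the "closest to 50/50" option in items 3 and 4 concrete, the sketch below picks the ladder entry whose two rates are jointly closest to 2.00. It is one possible reading of that rule, not a settled decision. `TotalLine` and `MainLine.PickBalanced` are hypothetical names, summing the two absolute distances is an assumed scoring function, and the tie-break on the lower threshold is arbitrary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical ladder entry: one threshold with its paired rates.
// For handicap, the two rates would be the HB_H / HB_A prices instead.
public sealed record TotalLine(decimal Value, decimal UnderRate, decimal OverRate);

public static class MainLine
{
    // Returns the line whose Under and Over rates are jointly closest to 2.00,
    // or null when the ladder is empty (the caller then emits empty cells).
    public static TotalLine? PickBalanced(IReadOnlyList<TotalLine> ladder) =>
        ladder
            .OrderBy(l => Math.Abs(l.UnderRate - 2.00m) + Math.Abs(l.OverRate - 2.00m))
            .ThenBy(l => l.Value)   // deterministic tie-break (arbitrary choice)
            .FirstOrDefault();
}
```

Because this rule depends only on the captured rates, it can run as a fallback after the listing-row and no-suffix checks without any extra DOM access.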