Anonymous scraping confirmed feasible for marathonbet.by: the site is fully SSR (nginx), with no Cloudflare or JS challenge. HttpClient + AngleSharp + Polly v8 is sufficient; Playwright is not required (kept as a future flag).

Spike outputs:
- spike/SCRAPE_FINDINGS.md: page rendering, URL templates, anti-bot, rate limits, recommended scraping strategy for Phase 3.
- spike/SCHEMA_DRAFT.md: customer-spec field → DOM selector mapping for Match + Period-N scope across football/basketball/tennis (hockey TBD).

Phase 1+ handoff captured in subplan + CLAUDE.md.

Critical Phase 8 finding: there is no public results endpoint at /su/results, so Phase 8 must switch to polling event-detail until eventJsonInfo.matchIsComplete=true (deviation flagged).

Reviewer notes addressed:
- Period market outcome codes corrected to RN_H/RN_D/RN_A (not 1/draw/3) and market name vocabulary clarified per-sport in SCHEMA_DRAFT §3.1.
- results-page.html capture added to file list with caveat about live-landing score-state and unsampled hockey selectors.
Phase 0 Spike — Domain Schema Draft
Purpose: Map every customer-spec Excel column to a concrete DOM/JSON path in marathonbet.by. Phase 1 (Domain) and Phase 3 (Scraping/parsing) consume this.
Convention: "selector" entries use AngleSharp/CSS notation; a trailing `@attr` means "read this attribute from the matched element". `evt` = the
event detail page DOM; `list` = the listing page DOM (top-level grid view).
1. Event Metadata
| Spec field | Source | Selector / extraction |
|---|---|---|
| `EventCode` | event detail page | `[data-event-eventId]` attribute on the outer `div.coupon-row`. Numeric, e.g., `26456117`. Stable; use as the primary key for the event in our SQLite. |
| `TreeId` (internal) | event detail page | `[data-event-treeId]` on the same `div.coupon-row`. Used for URL building; less stable than `EventCode`. |
| `SportCode` | breadcrumb of event detail | `breadcrumbs-list .breadcrumbs-item:nth-child(2) a@href` matches `/su/betting/{Sport}+-+{N}`. Parse `N` as integer. Confirmed: Basketball = 6, Football = 11. |
| `Sport` | breadcrumb (RU label) | `breadcrumbs-list .breadcrumbs-item:nth-child(2) .breadcrumb-text` → strip the leading `Ставки на` prefix, e.g., `Ставки на Баскетбол` → `Баскетбол`. |
| `Country` | breadcrumb | `.breadcrumbs-item:nth-child(3) .breadcrumb-text`. May represent a group ("Клубы. Международные") rather than a literal country for international leagues; accept as-is. |
| `League` | breadcrumb | `.breadcrumbs-item:nth-child(4) .breadcrumb-text`, e.g., `Лига чемпионов УЕФА`, `NBA`. |
| `Category` | breadcrumb (deeper) | If the breadcrumb has 5+ items beyond the event itself, join items 5..N-1 with `/`, e.g., `Play-Offs / Semi Final / 2nd Leg`. The event detail's `category-label-link` `<h2>` text also exposes this concatenated. |
| `EventName` | event detail | `[data-event-name]` attribute on `div.coupon-row`, e.g., `Арсенал - Атлетико Мадрид`. |
| `Team1` | event detail | `[data-event-name]`, split on `" - "` (hyphen with surrounding spaces), take index 0. Or: `.player-row.player1 .member-name [data-member-link]` text. |
| `Team2` | event detail | Split index 1, or `.player-row.player2 .member-name [data-member-link]`. |
| `ScheduledAt` (date+time) | event detail + listing | Time: `.date-wrapper` text. Two formats: `HH:MM` (today) or `DD <ru-month> HH:MM` (future, e.g., `06 мая 22:00`). Anchor: `initData.serverTime` (Moscow TZ, format `YYYY,MM,DD,HH,MM,SS`), parsed and combined with the time. Title fallback: `<title>` and `<meta name="description">` contain a Russian-formatted full date (`05 мая 2026`); use as authoritative when ambiguous. |
| `IsLive` | event detail / listing | `[data-live="true"]` attribute. Live events also carry `.score-state` and `.time` elements with `2:1` and `83:30` style content. |
| `LiveScore` | event detail (live only) | `.score-state` text (`2:1 (1:1)` style). Inning breakdown: parse the `eventJsonInfo` `[data-json]` attribute on the hidden `<td>`; the JSON includes `mainScore`, `inningScore[]`, `matchTime.seconds`, `matchIsComplete`. |
| `MatchIsComplete` | event detail | Decoded JSON of `[data-mutable-id="eventJsonInfo"][data-json]` → `.matchIsComplete` boolean. Critical for Phase 8 (Results loader). |
| `FinalScore` | event detail (post-match) | Same `eventJsonInfo` JSON → `.resultDescription` (e.g., `"2:1 (1:1)"`) when `matchIsComplete=true`. |
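
The string-level extractions in this table (sport code from the breadcrumb href, team split, and `ScheduledAt` anchoring) can be sketched without any DOM library. This is a minimal sketch, not the Phase 3 implementation: the class and method names are hypothetical, it assumes `initData.serverTime` uses 1-based months, that month names match by their first three letters, and that a date earlier than the server date rolls over to the next year.

```csharp
#nullable enable
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class EventMetadataSketch
{
    // Moscow has been fixed at UTC+3 with no DST since 2014.
    private static readonly TimeSpan MoscowOffset = TimeSpan.FromHours(3);

    // Three-letter prefixes of the genitive Russian month names (assumed site format).
    private static readonly string[] RuMonthPrefixes =
        { "янв", "фев", "мар", "апр", "мая", "июн", "июл", "авг", "сен", "окт", "ноя", "дек" };

    // "/su/betting/Basketball+-+6" -> 6; null when the href does not match.
    public static int? ParseSportCode(string href)
    {
        var m = Regex.Match(href, @"/su/betting/[^/]+?\+-\+(\d+)");
        if (!m.Success) return null;
        return int.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture);
    }

    // Splits on " - " (with spaces) so hyphenated team names stay intact.
    public static (string Team1, string Team2)? SplitTeams(string eventName)
    {
        var parts = eventName.Split(" - ", 2, StringSplitOptions.None);
        if (parts.Length != 2) return null;
        return (parts[0].Trim(), parts[1].Trim());
    }

    // Combines "HH:MM" or "DD <ru-month> HH:MM" with the server clock.
    public static DateTimeOffset ResolveScheduledAt(string serverTime, string dateText)
    {
        int[] p = Array.ConvertAll(serverTime.Split(','),
            x => int.Parse(x.Trim(), CultureInfo.InvariantCulture));
        var serverNow = new DateTime(p[0], p[1], p[2], p[3], p[4], p[5]);

        string[] tokens = dateText.Split(' ', StringSplitOptions.RemoveEmptyEntries);
        string[] hm = tokens[tokens.Length - 1].Split(':');
        int hour = int.Parse(hm[0], CultureInfo.InvariantCulture);
        int minute = int.Parse(hm[1], CultureInfo.InvariantCulture);

        DateTime date = serverNow.Date; // bare "HH:MM" means today on the server clock
        if (tokens.Length == 3)
        {
            int day = int.Parse(tokens[0], CultureInfo.InvariantCulture);
            int month = Array.FindIndex(RuMonthPrefixes,
                prefix => tokens[1].StartsWith(prefix, StringComparison.OrdinalIgnoreCase)) + 1;
            if (month == 0) throw new FormatException($"Unknown month: {tokens[1]}");
            date = new DateTime(serverNow.Year, month, day);
            if (date < serverNow.Date) date = date.AddYears(1); // assumed year rollover
        }
        return new DateTimeOffset(date.AddHours(hour).AddMinutes(minute), MoscowOffset);
    }
}
```

The `<title>` fallback described for `ScheduledAt` is not covered here; it would replace the year-rollover guess whenever the full date is available.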
2. Match-Scope Bets (1×2, Handicap, Total)
The event-detail "main row" presents three primary markets in a coefficients-table:
Result (1×2), Handicap (Win-Fora), Total (Goals/Points/Games depending
on sport). These map to spec fields Bet_Match_*.
2.1 Match Win 1 / Draw / Win 2
| Spec field | `data-selection-key` suffix | DOM path |
|---|---|---|
| `Bet_Match_Win_1` | `@Match_Result.1` (football, tennis, hockey) OR `@Result.1` (basketball pre-match) OR `@Normal_Time_Result.1` (basketball detail) | `evt span[data-selection-key$='@Match_Result.1']@data-selection-price` (decimal odds, e.g., `1.65`) |
| `Bet_Match_Draw` | `.draw` outcome of same market | `evt span[data-selection-key$='@Match_Result.draw']@data-selection-price`. NULL for tennis (2-way market, no draw). |
| `Bet_Match_Win_2` | `.3` outcome | `evt span[data-selection-key$='@Match_Result.3']@data-selection-price` |
Sport variance:
- Football, Tennis, Table-tennis: `Match_Result`.
- Basketball: in the pre-match landing, the market is `Match_Winner_Including_All_OT.HB_H/HB_A` (2-way, OT included). On the detail page, both `Normal_Time_Result.{1,draw,3}` (3-way, regulation time) and `Match_Winner_Including_All_OT.{HB_H,HB_A}` (2-way, OT included) appear. Recommendation: use the draw-inclusive `Normal_Time_Result` when it is present; when no 3-way `Result` market exists, treat `Match_Winner_Including_All_OT` as the canonical Win-1 / Win-2 (no Draw).
- Hockey: TBD. Verify in Phase 3 with an actual hockey event capture.
Recommendation for Phase 1 domain: define BetType.WinDraw allowing nullable
Draw. The Excel exporter writes empty cell when Draw is null.
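
A minimal sketch of that market fallback, assuming the parser has already collected `data-selection-key` → `data-selection-price` pairs into a dictionary. The class name, method names, and the probing order of the 3-way markets are illustrative assumptions, not confirmed behavior.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class MatchResultSketch
{
    // 3-way market tokens named in section 2.1, probed in this (assumed) order.
    private static readonly string[] ThreeWayMarkets =
        { "Match_Result", "Normal_Time_Result", "Result" };

    // Mirrors the CSS `$=` (ends-with) attribute selector.
    private static decimal? BySuffix(IReadOnlyDictionary<string, decimal> prices, string suffix)
    {
        foreach (var kv in prices)
            if (kv.Key.EndsWith(suffix, StringComparison.Ordinal)) return kv.Value;
        return null;
    }

    public static (decimal? Win1, decimal? Draw, decimal? Win2) Pick(
        IReadOnlyDictionary<string, decimal> prices)
    {
        foreach (var market in ThreeWayMarkets)
        {
            var win1 = BySuffix(prices, $"@{market}.1");
            var win2 = BySuffix(prices, $"@{market}.3");
            if (win1 is not null && win2 is not null)
                return (win1, BySuffix(prices, $"@{market}.draw"), win2);
        }
        // No 3-way market found: use the 2-way OT-inclusive market, Draw stays null.
        return (BySuffix(prices, "@Match_Winner_Including_All_OT.HB_H"),
                null,
                BySuffix(prices, "@Match_Winner_Including_All_OT.HB_A"));
    }
}
```

Because the suffix includes the leading `@`, `@Result.1` cannot accidentally match `@Match_Result.1`. A tennis event with only `.1` and `.3` outcomes yields a null Draw, which fits the nullable-Draw recommendation above.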
2.2 Match Win Fora (handicap)
| Spec field | `data-selection-key` suffix | DOM path | Value source |
|---|---|---|---|
| `Bet_Match_Win_Fora_1_Value` | n/a | (no selection key for value alone) | `<td>` of the `HB_H` selection: `.middle-simple` text inside the `<div class="nowrap simple-price">` (e.g., `(-1.0)`). Strip parens, parse as decimal. |
| `Bet_Match_Win_Fora_1_Rate` | `@To_Win_Match_With_Handicap{N}.HB_H` (or `@Match_Handicap.HB_H` variant) | `[data-selection-key$='@To_Win_Match_With_Handicap.HB_H']@data-selection-price` | n/a |
| `Bet_Match_Win_Fora_2_Value` | n/a | (no selection key for value alone) | `.middle-simple` next to the `HB_A` selection (e.g., `(+1.0)`). |
| `Bet_Match_Win_Fora_2_Rate` | `@To_Win_Match_With_Handicap{N}.HB_A` | `[data-selection-key$='@To_Win_Match_With_Handicap.HB_A']@data-selection-price` | n/a |
Tennis variant: uses @To_Win_Match_With_Handicap_By_Games{N}.HB_H/HB_A.
The handicap is in games not points — emit Value as-is, the unit is implicit
in the sport.
Multi-line handicap: the site offers many lines (`To_Win_Match_With_Handicap0`,
`...1`, `...2`, ...), each a different handicap value. The customer spec wants only
the main line (the one displayed in the listing's main row). Phase 3 should:
- On listing pages, take the handicap displayed in the `coefficients-table` `data-market-type="HANDICAP"` cell.
- On event detail, identify the "main" line as the one without a numeric suffix (`@To_Win_Match_With_Handicap.HB_H`) or with suffix `0` if both exist; the sample shows both `To_Win_Match_With_Handicap.HB_H` and `...0.HB_H`. Heuristic: pick the line whose handicap value is closest to ±1.0 from the favorite, OR explicitly prefer the no-suffix variant; fall back to suffix `0`.
- Optional: capture the full handicap ladder into a separate normalized table so anomaly detection can use the spread, even if Excel only exports the main line.
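
The "prefer no-suffix, fall back to suffix `0`" option and the `.middle-simple` value parsing can be sketched as below. This is one of the candidate rules listed above, not a decided one; the class and method names are hypothetical, and the Unicode-minus normalization is a defensive assumption that was not observed in the sample.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class HandicapSketch
{
    // side is "HB_H" or "HB_A". Returns the selection key of the assumed main line,
    // or null when neither the no-suffix nor the suffix-0 line is present.
    public static string? PickMainKey(IEnumerable<string> selectionKeys, string side)
    {
        var rx = new Regex(@"@To_Win_Match_With_Handicap(\d*)\." + Regex.Escape(side) + "$");
        string? suffixZero = null;
        foreach (var key in selectionKeys)
        {
            var m = rx.Match(key);
            if (!m.Success) continue;
            string lineIndex = m.Groups[1].Value;
            if (lineIndex.Length == 0) return key;   // no-suffix variant wins outright
            if (lineIndex == "0") suffixZero ??= key; // remember the fallback
        }
        return suffixZero;
    }

    // "(-1.0)" -> -1.0, "(+1.0)" -> 1.0; null when the text is not a number.
    public static decimal? ParseHandicapValue(string? text)
    {
        if (string.IsNullOrWhiteSpace(text)) return null;
        string cleaned = text.Trim().Trim('(', ')').Replace('\u2212', '-');
        return decimal.TryParse(cleaned, NumberStyles.AllowLeadingSign | NumberStyles.AllowDecimalPoint,
                                CultureInfo.InvariantCulture, out var value)
            ? value
            : (decimal?)null;
    }
}
```

The tennis variant (`To_Win_Match_With_Handicap_By_Games{N}`) would need its own pattern; the regex above deliberately does not match it.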
2.3 Match Total Less / More
| Spec field | `data-selection-key` suffix | DOM path |
|---|---|---|
| `Bet_Match_Total_Less_Value` | n/a | `.middle-simple` next to the `Меньше` selection (e.g., `3.5`, `213.5`). |
| `Bet_Match_Total_Less_Rate` | `@Total_{Goals\|Points\|Games}{N}.Under_<X>` | `[data-selection-key^='<eventId>@Total_'][data-selection-key$='.Under_<X>']@data-selection-price`. Use the row whose Value equals the chosen total threshold. |
| `Bet_Match_Total_More_Value` | n/a | Same value as Less (paired). |
| `Bet_Match_Total_More_Rate` | `@Total_{Goals\|Points\|Games}{N}.Over_<X>` | `[data-selection-key$='.Over_<X>']@data-selection-price` |
Sport vocabulary:
- Football: `Total_Goals`
- Basketball: `Total_Points`
- Tennis: `Total_Games`
- Hockey: `Total_Goals` (TBD)
- Volleyball / handball: TBD
Choosing the "main" total line: the customer spec wants ONE Total Value plus Less/More rates per event, while the site offers ~20 different total thresholds per event. The listing page main row exposes the "headline" total (the one the bookmaker chose to show). Heuristic:
- On listing: read the `data-market-type="TOTAL"` cell directly.
- On event detail: find the row labeled in `coefficients-row` (visible main view), not in `coefficients-hidden-row`. The `data-mutable-id="S_3_1_european"` / `S_3_3_european` pair is the main line.
- Fall back to picking the line whose Under/Over rates are closest to 2.00 each (the "balanced" line, most representative of the bookmaker's expectation).
- As with handicap, capture the full ladder for analysis even if the export uses only one row.
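
The "balanced line" fallback can be sketched as a minimum search over the captured ladder. The distance metric used here, |Under − 2.00| + |Over − 2.00|, is an assumption about what "closest to 2.00 each" means, and the type and method names are hypothetical.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class TotalLineSketch
{
    // Returns the ladder entry whose Under/Over rates are jointly closest to 2.00,
    // or null for an empty ladder. Ties keep the first entry encountered.
    public static (decimal Threshold, decimal Under, decimal Over)? PickBalanced(
        IEnumerable<(decimal Threshold, decimal Under, decimal Over)> ladder)
    {
        (decimal Threshold, decimal Under, decimal Over)? best = null;
        decimal bestDistance = decimal.MaxValue;
        foreach (var line in ladder)
        {
            decimal distance = Math.Abs(line.Under - 2.00m) + Math.Abs(line.Over - 2.00m);
            if (distance < bestDistance)
            {
                bestDistance = distance;
                best = line;
            }
        }
        return best;
    }
}
```

For a ladder of 2.5 (1.60 / 2.30), 3.5 (1.95 / 1.90) and 4.5 (2.80 / 1.40), the distances are 0.70, 0.15 and 1.40, so the 3.5 line is selected.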
3. Period-N Scope Bets
Period markets follow the same pattern as match markets but with a period prefix
in the market token. Examples for Period-1 (1st half of football, 1st quarter
of basketball, 1st set of tennis):
3.1 Period-N Win 1 / Draw / Win 2
CORRECTED FROM CAPTURE EVIDENCE (2026-05-05): Period result markets use
`RN_H`/`RN_D`/`RN_A` outcome codes (Reduced Numerals: Home / Draw / Away), NOT the `1`/`draw`/`3` codes used by `@Match_Result`. Market names also vary: football uses `Result_-_1st_Half` (with separator dashes); basketball and tennis use `1st_Half_Result0` / `1st_Quarter_Result0` / `1st_Set_Result0` (note the literal `0` suffix on the market name, the line index for the period result market). Phase 3 parser must use these exact tokens.
| Customer field | Football (1st Half) | Basketball (1st Half or Quarter) | Tennis (1st Set) | Hockey (1st Period) |
|---|---|---|---|---|
| `Bet_Period-1_Win_1` | `@Result_-_1st_Half.RN_H` | `@1st_Half_Result0.RN_H` (halves) or `@1st_Quarter_Result0.RN_H` (quarters) | `@1st_Set_Result0.RN_H` | `@1st_Period_Result0.RN_H` (TBD verify on hockey event) |
| `Bet_Period-1_Draw` | `@Result_-_1st_Half.RN_D` | `@1st_Half_Result0.RN_D` / `@1st_Quarter_Result0.RN_D` | (NULL, no draw) | `@1st_Period_Result0.RN_D` (TBD) |
| `Bet_Period-1_Win_2` | `@Result_-_1st_Half.RN_A` | `@1st_Half_Result0.RN_A` / `@1st_Quarter_Result0.RN_A` | `@1st_Set_Result0.RN_A` | `@1st_Period_Result0.RN_A` (TBD) |
The market token vocabulary differs by sport:
- Football: `Result_-_<ordinal>_<unit>` (e.g., `Result_-_1st_Half`, `Result_-_2nd_Half`).
- Basketball / Tennis / Hockey: `<ordinal>_<unit>_Result0` (e.g., `1st_Half_Result0`, `1st_Quarter_Result0`, `1st_Set_Result0`, `1st_Period_Result0`). The `0` suffix is required.
- Note: non-period markets like `@Match_Result.1` and `@Match_Result.draw` still use the `1`/`draw`/`3` outcome codes; the `RN_*` codes are specific to period/half/quarter/set markets.
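
A small builder for these suffixes could look like the sketch below. The `SportKind` enum and all names are hypothetical. Only the `1st` tokens and football's `2nd_Half` appear in this document; other ordinals (`3rd`, `4th`, ...) and the hockey tokens are extrapolated from the pattern and still need capture evidence.

```csharp
#nullable enable
using System;

public enum SportKind { Football, Basketball, Tennis, Hockey }

public static class PeriodMarketSketch
{
    // 1 -> "1st", 2 -> "2nd", 3 -> "3rd", otherwise "<n>th" (enough for N <= 5).
    private static string Ordinal(int n) => n switch
    {
        1 => "1st",
        2 => "2nd",
        3 => "3rd",
        _ => n + "th"
    };

    // outcome is one of "RN_H", "RN_D", "RN_A".
    public static string ResultSuffix(SportKind sport, int n, string outcome,
                                      bool basketballQuarters = false)
    {
        string unit = sport switch
        {
            SportKind.Football => "Half",
            SportKind.Basketball => basketballQuarters ? "Quarter" : "Half",
            SportKind.Tennis => "Set",
            SportKind.Hockey => "Period", // TBD: not yet verified on a hockey capture
            _ => throw new ArgumentOutOfRangeException(nameof(sport))
        };
        string ord = Ordinal(n);
        // Football puts the period after "Result_-_"; the others put it first
        // and append the literal "0" line index.
        return sport == SportKind.Football
            ? $"@Result_-_{ord}_{unit}.{outcome}"
            : $"@{ord}_{unit}_Result0.{outcome}";
    }
}
```

The returned string is meant to be used with the `[data-selection-key$='...']` ends-with selector, in the same way as the match-scope suffixes.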
Period count by sport (default mapping for Period-N):
- Football: N ∈ {1, 2}
- Basketball: configurable, either halves (N ∈ {1,2}) or quarters (N ∈ {1,2,3,4}). Default to halves.
- Tennis: N ∈ {1, 2, ...} until the `<i>th_Set_Result` selection is absent. Cap at 5 for Grand Slams.
- Hockey: N ∈ {1, 2, 3}.
3.2 Period-N Win Fora
Same as match handicap, with period prefix:
| Sport | Selection key |
|---|---|
| Football | @To_Win_1st_Half_With_Handicap{N}.HB_H / .HB_A |
| Basketball | @To_Win_1st_Half_With_Handicap{N}.HB_* (or _1st_Quarter_) |
| Tennis | @To_Win_1st_Set_With_Handicap{N}.HB_* |
| Hockey | @To_Win_1st_Period_With_Handicap{N}.HB_* (TBD verify) |
Value extraction: same .middle-simple text as match handicap.
3.3 Period-N Total Less / More
This is the least uniform market. Observed:
| Sport | Period-1 Total selection key |
|---|---|
| Football | (search HTML directly — Phase 3 should parse the "Тотал тайма" tab) Likely @1st_Half_Total_Goals{N}.Under_<X> / .Over_<X>. |
| Basketball | Per-quarter total exposed as separate market in the "Тоталы" tab; sample event did not show clean 1st_Half_Total_Points keys — see SCRAPE_FINDINGS.md §6 risk #4. May need to fall back to NULL for basketball Period-N Total in some leagues. |
| Tennis | @1st_Set_Total_Games{N}.Under_<X> / .Over_<X> — confirmed in sample. |
| Hockey | @1st_Period_Total_Goals... (TBD verify). |
Phase 3 robustness rule: if a period-N market is absent in the parsed HTML,
emit null for the corresponding rate/value. Never throw. The Excel exporter
writes empty cell.
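
One way to honor the "never throw" rule at the lowest level is a parse helper that maps every failure to null. This is a sketch with a hypothetical name; treating non-positive odds as absent is an extra assumption, not something the spec states.

```csharp
#nullable enable
using System.Globalization;

public static class NullSafeSketch
{
    // Returns the decimal odds, or null for a missing, empty, or malformed attribute value.
    public static decimal? ParseRateOrNull(string? raw)
    {
        if (string.IsNullOrWhiteSpace(raw)) return null;
        // AllowDecimalPoint only: "1,65" is rejected rather than silently read as 165.
        bool ok = decimal.TryParse(raw.Trim(), NumberStyles.AllowDecimalPoint,
                                   CultureInfo.InvariantCulture, out var value);
        return ok && value > 0m ? value : (decimal?)null;
    }
}
```

A caller can pass the result of an attribute lookup directly, since a missing element or attribute arrives here as null and leaves as null.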
4. Live Counterparts
When the same scope is captured from the live site (/su/live or live-flagged
events on /su/), the spec wants column prefix Live_* instead of Bet_*.
Important: live events use the SAME data-selection-key naming conventions.
The distinguishing signal is data-live="true" on the outer div.coupon-row and
the URL the snapshot was scraped from (/su/live).
Examples:
- `Live_Match_Win_1` ← `[data-selection-key$='@Match_Result.1']` from live page
- `Live_Match_Win_Fora_1_Value`, `Live_Match_Win_Fora_1_Rate` ← same DOM, same logic
- `Live_Period-1_Win_1` ← same as `Bet_Period-1_Win_1` but captured from live event
Implementation: the parser does not change. The application service simply
records Source = Live | PreMatch on each OddsSnapshot and the Excel exporter
denormalizes pre-match snapshots to Bet_* columns and live snapshots to Live_*
columns at write time.
5. Field Coverage Matrix (spec → confidence)
| Field family | Football | Basketball | Tennis | Hockey | Notes |
|---|---|---|---|---|---|
| `Match_Win_1/2`, `Match_Draw` | ✅ confirmed | ⚠️ Win-1/2 confirmed; Draw conditional on `Normal_Time_Result` presence | ✅ Win-1/2 confirmed; Draw is null | ❓ verify Phase 3 | |
| `Match_Win_Fora_*` | ✅ | ✅ | ✅ (in games) | ❓ | "Main line" heuristic needed (§2.2) |
| `Match_Total_*` | ✅ Goals | ✅ Points | ✅ Games | ❓ | "Main line" heuristic needed (§2.3) |
| `Period-1_Win_*` | ✅ Half | ✅ Half / Quarter | ✅ Set | ❓ Period | basketball mode is configurable |
| `Period-1_Win_Fora_*` | ✅ | ✅ | ✅ | ❓ | |
| `Period-1_Total_*` | ⚠️ structure verified, exact key TBD | ⚠️ may be absent for some games | ✅ Set | ❓ | risk: emit null where absent |
| `Period-2/3/4_*` | (Period-2 only) | ✅ all | up to actual played sets | ❓ | |
| `Live_*` (any of above) | same parser | same | same | same | distinguished only by `data-live` flag + scrape URL |
Legend: ✅ confirmed in spike sample, ⚠️ partial / heuristic needed, ❓ Phase 3 must verify.
6. Suggested Domain Types (Phase 1 input)
```csharp
// Marathon.Domain
public enum BetScope { Match, Period }
public enum BetType  { Win, Draw, WinFora, Total }
public enum BetSide  { Side1, Side2, Less, More } // Side1=home/W1, Side2=away/W2

public sealed record Sport(int Code, string NameRu, string NameEn);
public sealed record League(int TreeId, string NameRu, int SportCode);

public sealed record Event(
    long EventCode,              // marathonbet's data-event-eventId
    int TreeId,                  // for URL building
    int SportCode,
    int LeagueTreeId,
    string Country,              // breadcrumb position 3
    string? Category,            // joined breadcrumb 5..N-1
    string Team1,
    string Team2,
    DateTimeOffset ScheduledAt,  // anchored on initData.serverTime
    string DetailUrl);

public sealed record Bet(
    BetScope Scope,
    int? PeriodNumber,           // null when Scope=Match
    BetType Type,
    BetSide? Side,               // null for Type=Draw
    decimal? Value,              // handicap/total threshold; null for Win/Draw
    decimal Rate);               // decimal odds (e.g., 1.65)

public sealed record OddsSnapshot(
    long EventCode,
    DateTimeOffset CapturedAt,
    SnapshotSource Source,       // PreMatch | Live
    IReadOnlyList<Bet> Bets);

public enum SnapshotSource { PreMatch, Live }
```
Phase 1 will refine names, but this captures the data shape Phase 3 produces.
7. Excel Column Generation (Phase 4 / 9 reference)
The Excel exporter generates wide rows by joining all Bets of an OddsSnapshot
into named columns. Pseudocode:
```
foreach snapshot:
    row.EventCode   = snapshot.EventCode
    row.SportCode   = event.SportCode
    row.Sport       = event.Sport.NameRu
    row.Country     = event.Country
    row.League      = event.League.NameRu
    row.Category    = event.Category
    row.ScheduledAt = event.ScheduledAt

    prefix = snapshot.Source == PreMatch ? "Bet_" : "Live_"

    // Match scope
    row[prefix+"Match_Win_1"]            = bet.Where(scope=Match, type=Win, side=Side1).Rate
    row[prefix+"Match_Draw"]             = bet.Where(scope=Match, type=Draw).Rate
    row[prefix+"Match_Win_2"]            = bet.Where(scope=Match, type=Win, side=Side2).Rate
    row[prefix+"Match_Win_Fora_1_Value"] = bet.Where(scope=Match, type=WinFora, side=Side1).Value
    row[prefix+"Match_Win_Fora_1_Rate"]  = bet.Where(scope=Match, type=WinFora, side=Side1).Rate
    row[prefix+"Match_Win_Fora_2_Value"] = bet.Where(scope=Match, type=WinFora, side=Side2).Value
    row[prefix+"Match_Win_Fora_2_Rate"]  = bet.Where(scope=Match, type=WinFora, side=Side2).Rate
    row[prefix+"Match_Total_Less_Value"] = bet.Where(scope=Match, type=Total, side=Less).Value
    row[prefix+"Match_Total_Less_Rate"]  = bet.Where(scope=Match, type=Total, side=Less).Rate
    row[prefix+"Match_Total_More_Value"] = bet.Where(scope=Match, type=Total, side=More).Value
    row[prefix+"Match_Total_More_Rate"]  = bet.Where(scope=Match, type=Total, side=More).Rate

    // Period scope (foreach period N exposed for that sport)
    for N in 1..MaxPeriodForSport(sportCode):
        same fields with key {prefix}Period-{N}_*
        null when bet absent
```
Spec column order is left to Phase 4 (ExcelExporter). Recommend:
Date, Time, Sport, Country, League, Category, Event, EventCode, Bet_Match_*..., Bet_Period-1_*..., Bet_Period-2_*..., Live_Match_*..., Live_Period-N_*...
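
The bet-column part of that layout can be generated rather than hand-listed. The following is a minimal sketch with hypothetical names; it covers only the `Bet_*` / `Live_*` columns (not Date, Time, Sport, etc.) and assumes the eleven per-scope fields from the pseudocode above apply unchanged to every period.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;

public static class ColumnNamesSketch
{
    // The eleven fields emitted for each scope, in the pseudocode's order.
    private static readonly string[] FieldsPerScope =
    {
        "Win_1", "Draw", "Win_2",
        "Win_Fora_1_Value", "Win_Fora_1_Rate", "Win_Fora_2_Value", "Win_Fora_2_Rate",
        "Total_Less_Value", "Total_Less_Rate", "Total_More_Value", "Total_More_Rate",
    };

    // prefix is "Bet_" or "Live_"; maxPeriod is the sport's highest Period-N.
    public static IReadOnlyList<string> Build(string prefix, int maxPeriod)
    {
        var scopes = new List<string> { "Match" };
        for (int n = 1; n <= maxPeriod; n++) scopes.Add($"Period-{n}");
        return scopes
            .SelectMany(scope => FieldsPerScope.Select(field => $"{prefix}{scope}_{field}"))
            .ToList();
    }
}
```

With `Build("Bet_", 2)` this yields 33 names, starting with `Bet_Match_Win_1` and ending with `Bet_Period-2_Total_More_Rate`. Generating the names in one place also keeps the exact spelling and capitalization consistent across exporter and tests.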
8. Decisions Pending Customer Confirmation
- Basketball Period mapping: halves (default) or quarters? Spec says "Period-N" but is silent on which N applies. Recommend halves (`N ∈ {1,2}`) with a quarter mode opt-in via `appsettings.Sports.Basketball.PeriodMode`.
- Tennis Draw column: emit empty / 0 / "—"? Recommend empty cell.
- Handicap "main line" rule: pick the listing's main row, OR the no-suffix selection, OR the spread closest to bookmaker-implied probability 50/50?
- Total "main line" rule: same as above.
- Field name capitalization: spec uses `Bet_Match_Win_Fora_1_Value` exactly. Recommend matching exactly (case-sensitive) for compatibility with downstream pivot tables / scripts.