CLAUDE.md — maraphon-app
Project memory for Claude Code sessions on this repository. Keep entries concise. Per-feature learnings are appended below by the feature-planner workflow.
Project Overview
maraphon-app is a sports betting odds analyzer for marathonbet.by. It scrapes
pre-match (/su) and live (/su/live) events, persists odds snapshots over time, and
detects anomalies — especially the odds-flip pattern (bookmaker freezes bets then
inverts underdog/favorite coefficients).
Architecture (Clean Architecture, 5 projects + tests)
Marathon.Domain ← entities, value objects, no external deps
↑
Marathon.Application ← use cases + abstractions (IOddsScraper, IRepository, ...)
↑
Marathon.Infrastructure ← EF Core (SQLite), scraping (AngleSharp/Playwright), Excel, Polly
Marathon.UI ← Razor Class Library (all Blazor components — host-agnostic)
↑
Marathon.Hosts.WpfBlazor ← WPF + BlazorWebView host (replaceable for ASP.NET Core later)
Key portability invariant: All UI lives in Marathon.UI (Razor Class Library). The
host project (Marathon.Hosts.WpfBlazor) is the only thing that changes when migrating
to a web app — drop in an ASP.NET Core Blazor Server host that references the same RCL.
Tech stack
- .NET 8 LTS, C# 12
- EF Core 8 + SQLite (WAL mode)
- AngleSharp (HTML), Playwright for .NET (SPA fallback)
- Polly v8 (Microsoft.Extensions.Http.Resilience)
- MudBlazor components, Plotly.Blazor charts
- Serilog logging (rolling file + console)
- xUnit + FluentAssertions + NSubstitute, in-memory SQLite for repo tests
Build & test
| Command | Purpose |
|---|---|
| dotnet build Marathon.sln | Build all projects |
| dotnet test Marathon.sln | Run all tests |
| dotnet format Marathon.sln --verify-no-changes | Lint |
| dotnet run --project src/Marathon.Hosts.WpfBlazor | Run desktop app |
Coding conventions
- Nullable reference types: enabled (<Nullable>enable</Nullable>)
- Implicit usings: enabled
- Treat warnings as errors in Release builds
- File-scoped namespaces
- One public type per file (except small DTOs/records grouped in a feature folder)
- Domain entities: prefer record for immutable data; class with private setters when identity matters
- No mutation of domain objects after construction; return new instances
- Repositories return IReadOnlyList<T>, not List<T> or IEnumerable<T> (clarity on enumeration cost)
- Tests follow Given_When_Then or Should_<expected>_When_<condition> naming
Configuration
Every variable parameter is configurable via appsettings.json and overridable via
appsettings.Local.json (gitignored) or environment variables:
- Scraping:PollingIntervalSeconds (default 30)
- Scraping:MaxConcurrentRequests (default 4)
- Scraping:UserAgents[] (rotated per request)
- Scraping:RetryPolicy:* (Polly settings)
- Scraping:RateLimit:RequestsPerSecond (default 1)
- Storage:DatabasePath (default ./data/marathon.db)
- Storage:ExportDirectory (default ./exports)
- Storage:SnapshotRetentionDays (default 90)
- Anomaly:SuspensionGapSeconds (default 60)
- Anomaly:OddsFlipThreshold (default 0.30, implied probability delta)
- Localization:DefaultCulture (default ru-RU)
A future Settings page in the UI binds to these.
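As a rough illustration of how Anomaly:OddsFlipThreshold might be applied, the sketch below converts decimal rates to implied probabilities (1 / rate) and flags a flip when the favorite changes sides and side 1's implied probability moves by at least the threshold. The class name, method names, and this exact rule are assumptions for illustration, not the detector the project specifies.

```csharp
using System;

// Hypothetical helper: names and the exact flip rule are illustrative assumptions.
public static class OddsFlipSketch
{
    // Implied probability of a decimal coefficient, ignoring bookmaker margin.
    public static double ImpliedProbability(double rate)
    {
        if (rate <= 1.0)
            throw new ArgumentOutOfRangeException(nameof(rate), "Decimal odds must be greater than 1.");
        return 1.0 / rate;
    }

    // True when the favorite switches sides AND side 1's implied probability
    // changes by at least `threshold` (default mirrors Anomaly:OddsFlipThreshold).
    public static bool IsFlip(
        double before1, double before2,
        double after1, double after2,
        double threshold = 0.30)
    {
        bool favoriteWas1 = before1 < before2; // lower rate = favorite
        bool favoriteIs1 = after1 < after2;
        double delta = Math.Abs(ImpliedProbability(after1) - ImpliedProbability(before1));
        return favoriteWas1 != favoriteIs1 && delta >= threshold;
    }
}
```

For instance, a move from 1.40 / 3.00 to 3.00 / 1.40 takes side 1 from roughly 0.71 to 0.33 implied probability, which exceeds the 0.30 default; a move from 1.80 / 2.00 to 2.00 / 1.80 swaps the favorite but stays well under it.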
Domain model summary
- Sport(Code, Name), e.g. (6, "Баскетбол")
- Event(Id, SportCode, CountryCode, LeagueId, CategoryId, ScheduledAt, EventCodeFromBookmaker)
- OddsSnapshot(EventId, CapturedAt, Source: Pre|Live, Bets: List<Bet>)
- Bet(Scope: Match|Period[N], Type: Win|Draw|WinFora|Total, Side: 1|2|Less|More, Value?, Rate)
- EventResult(EventId, FinalScore, WinnerSide)
- Anomaly(EventId, DetectedAt, Kind: SuspensionFlip, Score, EvidenceTimeline)
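A minimal sketch of how the Bet and OddsSnapshot shapes above could look as immutable C# records, in line with the "prefer record" and "no mutation" conventions. The enum names, the nullable Side, and the BetScope encoding are assumptions; the real entities in Marathon.Domain may differ.

```csharp
using System;
using System.Collections.Generic;

// Illustrative shapes only: member names and nullability are assumptions.
public enum SnapshotSource { Pre, Live }
public enum BetType { Win, Draw, WinFora, Total }
public enum BetSide { One, Two, Less, More } // "1" and "2" are not valid C# identifiers

// PeriodNumber == null means Match scope; N >= 1 means Period[N] (assumed encoding).
public sealed record BetScope(int? PeriodNumber)
{
    public static readonly BetScope Match = new((int?)null);

    public static BetScope Period(int n) =>
        n >= 1 ? new BetScope(n) : throw new ArgumentOutOfRangeException(nameof(n));
}

// Value carries the handicap or total line and is null for plain Win/Draw bets.
// Side is assumed to be null for Draw.
public sealed record Bet(BetScope Scope, BetType Type, BetSide? Side, decimal? Value, decimal Rate);

public sealed record OddsSnapshot(
    long EventId,
    DateTimeOffset CapturedAt,
    SnapshotSource Source,
    IReadOnlyList<Bet> Bets);
```

With records, a changed rate is expressed as a new instance (`bet with { Rate = 1.90m }`) and the original stays untouched.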
Excel export schema (compliance with customer spec)
The customer's technical specification (TZ) requires a wide-table layout with columns such as Bet_Match_Win_1,
Bet_Period-1_Win_Fora_2_Value, etc.
Internal storage is normalized (one row per Bet in OddsSnapshots). The Excel
exporter denormalizes to the wide format on demand. Filename pattern:
Marathon_<YYYY-MM-DD>_to_<YYYY-MM-DD>.xlsx
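The naming rules above can be sketched as two small helpers. The filename pattern and the two sample column names come from this section; the generalized Bet_<Scope>_<Type>_<Side>[_Value] rule and all identifiers below are assumptions inferred from those two samples.

```csharp
using System;
using System.Globalization;

// Hypothetical naming helpers: the column rule is an assumed generalization
// of the two sample column names in the customer spec.
public static class ExportNaming
{
    // Builds Marathon_<YYYY-MM-DD>_to_<YYYY-MM-DD>.xlsx
    public static string FileName(DateOnly from, DateOnly to)
    {
        if (to < from)
            throw new ArgumentException("Range end precedes range start.", nameof(to));

        string f = from.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
        string t = to.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
        return $"Marathon_{f}_to_{t}.xlsx";
    }

    // periodNumber == null gives "Match"; otherwise "Period-N".
    // type is the column token as the spec spells it, e.g. "Win" or "Win_Fora".
    // isValueColumn appends "_Value" for the handicap/total line column.
    public static string ColumnName(int? periodNumber, string type, string side, bool isValueColumn = false)
    {
        string scope = periodNumber is int n ? $"Period-{n}" : "Match";
        string name = $"Bet_{scope}_{type}_{side}";
        return isValueColumn ? name + "_Value" : name;
    }
}
```

Formatting with the invariant culture keeps the filename stable regardless of Localization:DefaultCulture.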
Recurring Issues & Patterns
(Populated as we work — leave empty until something repeats.)
Feature: Initial Implementation > Phase 0: Scraping Spike — Learnings
(Permanent learnings about marathonbet.by data shape, anti-bot, page structure.
For full detail see spike/SCRAPE_FINDINGS.md and spike/SCHEMA_DRAFT.md.)
- Site is fully SSR (Server: nginx). Anonymous GET with browser User-Agent returns full HTML for /su/, /su/live, /su/popular/<Sport>, /su/betting/<event-path>. No Cloudflare, no JS challenge.
- Use HttpClient + AngleSharp + Polly v8; no Playwright needed for read-only. Keep Scraping:UsePlaywright = false flag for future-proofing.
- Sport ID = data-sport-treeId = breadcrumb canonical ID. Confirmed: Basketball=6, Football=11, Tennis=22723, Hockey=43658. URL by ID: /su/betting/<Sport>+-+<id> (preferred over /su/popular/<Sport> because the ID is stable).
- EventCode = data-event-eventId (numeric, ~26-million range, stable). TreeId = data-event-treeId (URL-routing ID, less stable). Use EventCode as the entity primary key in SQLite.
- Selection key format: {eventId}@{MarketName}{LineIndex?}.{Outcome}. Outcomes: 1/draw/3 for 3-way, HB_H/HB_A for handicap, Under_<X>/Over_<X> for totals. Total threshold is encoded in the outcome string; handicap value lives in <span class="middle-simple"> text.
- Tennis has no Draw outcome. Domain Bet_Match_Draw must be nullable; Excel exporter writes empty cell when null.
- Date parsing: listing shows HH:MM (today) or DD <ru-month> HH:MM (future). Anchor with initData.serverTime (Moscow TZ, format YYYY,MM,DD,HH,MM,SS) parsed from the embedded <script> blob on every scraped page.
- Live updates: site polls /su/liveupdate/popular/?treeIds=... every 3 s, but the response is just {"modified":[{"type":"refreshPage"}],...}; re-scrape the full event detail HTML for actual odds. Our analyzer cadence: pre-match 30 s, live 5–10 s.
- No public results / archive page (/su/results → 404). Final scores must be harvested by polling the event detail page until eventJsonInfo.matchIsComplete=true, then storing resultDescription. Phase 8 cannot back-fill from a public archive.
- Period scope vocabulary varies by sport: football=1st_Half, basketball=1st_Half/1st_Quarter, tennis=1st_Set, hockey=1st_Period. Domain stores PeriodNumber:int and a sport-aware PeriodScopeMapper resolves the correct market token at parse time.
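The date-parsing note above can be sketched as follows. This is a minimal illustration, not the project's parser: all names are hypothetical, and it assumes that serverTime uses a 1-based month, that Moscow time is a fixed UTC+3, that month tokens are Russian names or abbreviations recognizable by their first three letters, and that a listing date earlier than the server date belongs to the next year. None of these were verified against the live site.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Illustrative only: month spellings and the year-rollover rule are assumptions.
public static class ListingDateSketch
{
    // Moscow is treated as a fixed UTC+3 offset (no DST since 2014).
    private static readonly TimeSpan Moscow = TimeSpan.FromHours(3);

    // Keyed by the first three letters so both "янв" and "января" resolve.
    private static readonly Dictionary<string, int> Months = new()
    {
        ["янв"] = 1, ["фев"] = 2, ["мар"] = 3, ["апр"] = 4, ["мая"] = 5, ["май"] = 5,
        ["июн"] = 6, ["июл"] = 7, ["авг"] = 8, ["сен"] = 9, ["окт"] = 10, ["ноя"] = 11, ["дек"] = 12,
    };

    // Parses initData.serverTime in the form YYYY,MM,DD,HH,MM,SS.
    public static DateTimeOffset ParseServerTime(string raw)
    {
        string[] parts = raw.Split(',');
        if (parts.Length != 6)
            throw new FormatException($"Expected 6 comma-separated fields: '{raw}'");

        int[] n = Array.ConvertAll(parts, s => int.Parse(s.Trim(), CultureInfo.InvariantCulture));
        return new DateTimeOffset(n[0], n[1], n[2], n[3], n[4], n[5], Moscow);
    }

    // "HH:MM" means today on the server calendar; "DD <ru-month> HH:MM" means a later date.
    public static DateTimeOffset ResolveListingTime(string text, DateTimeOffset serverNow)
    {
        DateTimeOffset now = serverNow.ToOffset(Moscow);

        // A null separator splits on any whitespace, including non-breaking spaces.
        string[] parts = text.Split((char[]?)null, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length != 1 && parts.Length != 3)
            throw new FormatException($"Unrecognized listing time: '{text}'");

        string[] hm = parts[^1].Split(':');
        if (hm.Length != 2)
            throw new FormatException($"Unrecognized time of day: '{parts[^1]}'");
        int hour = int.Parse(hm[0], CultureInfo.InvariantCulture);
        int minute = int.Parse(hm[1], CultureInfo.InvariantCulture);

        if (parts.Length == 1)
            return new DateTimeOffset(now.Year, now.Month, now.Day, hour, minute, 0, Moscow);

        int day = int.Parse(parts[0], CultureInfo.InvariantCulture);
        string token = parts[1].ToLowerInvariant();
        if (token.Length < 3 || !Months.TryGetValue(token[..3], out int month))
            throw new FormatException($"Unknown month token: '{parts[1]}'");

        // Listings carry no year: assume the next occurrence on or after the server date.
        int year = now.Year;
        if (month < now.Month || (month == now.Month && day < now.Day))
            year++;

        return new DateTimeOffset(year, month, day, hour, minute, 0, Moscow);
    }
}
```

Returning DateTimeOffset rather than DateTime keeps the Moscow offset attached to every ScheduledAt value, so later conversion to UTC for storage is unambiguous.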