maraphon-app/CLAUDE.md
alexei.dolgolyov 070e34b911 feat(initial-implementation): phase 0 - scraping spike findings
Anonymous scraping confirmed feasible for marathonbet.by — site is fully SSR
(nginx), no Cloudflare or JS challenge. HttpClient + AngleSharp + Polly v8 is
sufficient; Playwright not required (kept as a future-flag).

Spike outputs:
- spike/SCRAPE_FINDINGS.md  — page rendering, URL templates, anti-bot, rate
  limits, recommended scraping strategy for Phase 3.
- spike/SCHEMA_DRAFT.md     — customer-spec field → DOM selector mapping for
  Match + Period-N scope across football/basketball/tennis (hockey TBD).

Phase 1+ handoff captured in subplan + CLAUDE.md. Critical Phase 8 finding:
no public results endpoint at /su/results — phase 8 must switch to polling
event-detail until eventJsonInfo.matchIsComplete=true (deviation flagged).

Reviewer notes addressed:
- Period market outcome codes corrected to RN_H/RN_D/RN_A (not 1/draw/3) and
  market name vocabulary clarified per-sport in SCHEMA_DRAFT §3.1.
- results-page.html capture added to file list with caveat about live-landing
  score-state and unsampled hockey selectors.
2026-05-05 01:04:03 +03:00


CLAUDE.md — maraphon-app

Project memory for Claude Code sessions on this repository. Keep entries concise. Per-feature learnings are appended below by the feature-planner workflow.

Project Overview

maraphon-app is a sports betting odds analyzer for marathonbet.by. It scrapes pre-match (/su) and live (/su/live) events, persists odds snapshots over time, and detects anomalies — especially the odds-flip pattern (bookmaker freezes bets then inverts underdog/favorite coefficients).

Architecture (Clean Architecture, 5 projects + tests)

Marathon.Domain          ← entities, value objects, no external deps
   ↑
Marathon.Application     ← use cases + abstractions (IOddsScraper, IRepository, ...)
   ↑
Marathon.Infrastructure  ← EF Core (SQLite), scraping (AngleSharp/Playwright), Excel, Polly
Marathon.UI              ← Razor Class Library (all Blazor components — host-agnostic)
   ↑
Marathon.Hosts.WpfBlazor ← WPF + BlazorWebView host (replaceable for ASP.NET Core later)

Key portability invariant: All UI lives in Marathon.UI (Razor Class Library). The host project (Marathon.Hosts.WpfBlazor) is the only thing that changes when migrating to a web app — drop in an ASP.NET Core Blazor Server host that references the same RCL.

Tech stack

  • .NET 8 LTS, C# 12
  • EF Core 8 + SQLite (WAL mode)
  • AngleSharp (HTML), Playwright for .NET (SPA fallback)
  • Polly v8 (Microsoft.Extensions.Http.Resilience)
  • MudBlazor components, Plotly.Blazor charts
  • Serilog logging (rolling file + console)
  • xUnit + FluentAssertions + NSubstitute, in-memory SQLite for repo tests

Build & test

Command                                            Purpose
dotnet build Marathon.sln                          Build all projects
dotnet test Marathon.sln                           Run all tests
dotnet format Marathon.sln --verify-no-changes     Lint
dotnet run --project src/Marathon.Hosts.WpfBlazor  Run desktop app

Coding conventions

  • Nullable reference types: enabled (<Nullable>enable</Nullable>)
  • Implicit usings: enabled
  • Treat warnings as errors in Release builds
  • File-scoped namespaces
  • One public type per file (except small DTOs/records grouped in a feature folder)
  • Domain entities: prefer record for immutable data; class with private setters when identity matters
  • No mutation of domain objects after construction — return new instances
  • Repositories return IReadOnlyList<T>, not List<T> or IEnumerable<T> (clarity on enumeration cost)
  • Tests follow Given_When_Then or Should_<expected>_When_<condition> naming

Configuration

Every variable parameter is configurable via appsettings.json and overridable via appsettings.Local.json (gitignored) or environment variables:

  • Scraping:PollingIntervalSeconds (default 30)
  • Scraping:MaxConcurrentRequests (default 4)
  • Scraping:UserAgents[] (rotated per request)
  • Scraping:RetryPolicy:* (Polly settings)
  • Scraping:RateLimit:RequestsPerSecond (default 1)
  • Storage:DatabasePath (default ./data/marathon.db)
  • Storage:ExportDirectory (default ./exports)
  • Storage:SnapshotRetentionDays (default 90)
  • Anomaly:SuspensionGapSeconds (default 60)
  • Anomaly:OddsFlipThreshold (default 0.30 — implied probability delta)
  • Localization:DefaultCulture (default ru-RU)

A future Settings page in the UI binds to these.
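
The override order described above can be sketched with the standard Microsoft.Extensions.Configuration providers, where later providers win over earlier ones. This is a minimal sketch, not the project's actual wiring: the `ScrapingOptions` and `RateLimitOptions` class names are hypothetical, only a few of the listed keys are shown, and it assumes the Microsoft.Extensions.Configuration.Json, .EnvironmentVariables and .Binder packages are referenced.

```csharp
using System;
using Microsoft.Extensions.Configuration;

// Later providers override earlier ones, so this order mirrors the documented
// precedence: appsettings.json < appsettings.Local.json < environment variables.
IConfiguration config = new ConfigurationBuilder()
    .AddJsonFile("appsettings.json", optional: true)
    .AddJsonFile("appsettings.Local.json", optional: true)
    .AddEnvironmentVariables() // e.g. Scraping__PollingIntervalSeconds=10
    .Build();

// Fall back to the in-code defaults when the section is absent.
var scraping = config.GetSection("Scraping").Get<ScrapingOptions>() ?? new ScrapingOptions();
Console.WriteLine(scraping.PollingIntervalSeconds);

// Hypothetical options classes; property names follow the keys listed above.
public sealed class ScrapingOptions
{
    public int PollingIntervalSeconds { get; set; } = 30;
    public int MaxConcurrentRequests { get; set; } = 4;
    public string[] UserAgents { get; set; } = Array.Empty<string>();
    public RateLimitOptions RateLimit { get; set; } = new();
}

public sealed class RateLimitOptions
{
    public double RequestsPerSecond { get; set; } = 1;
}
```

Environment variables use a double underscore in place of the `:` separator, so `Scraping:RateLimit:RequestsPerSecond` becomes `Scraping__RateLimit__RequestsPerSecond`.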

Domain model summary

  • Sport(Code, Name) — e.g., (6, "Баскетбол")
  • Event(Id, SportCode, CountryCode, LeagueId, CategoryId, ScheduledAt, EventCodeFromBookmaker)
  • OddsSnapshot(EventId, CapturedAt, Source: Pre|Live, Bets: List<Bet>)
  • Bet(Scope: Match|Period[N], Type: Win|Draw|WinFora|Total, Side: 1|2|Less|More, Value?, Rate)
  • EventResult(EventId, FinalScore, WinnerSide)
  • Anomaly(EventId, DetectedAt, Kind: SuspensionFlip, Score, EvidenceTimeline)
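
As an illustration of how the summary above might map onto immutable C# records, here is a minimal sketch covering only `Bet` and `OddsSnapshot`. The enum names, the `BetScope` helper, the nullable `Side` (for Draw bets, which the summary does not specify) and the property types are assumptions, not the actual domain code.

```csharp
using System;
using System.Collections.Generic;

// Derive a changed copy with `with` instead of mutating the original.
var bet = new Bet(BetScope.Match, BetType.Win, BetSide.One, Value: null, Rate: 1.85m);
var repriced = bet with { Rate = 1.90m };

var snapshot = new OddsSnapshot(
    EventId: 26000001, // hypothetical event code
    CapturedAt: DateTimeOffset.UtcNow,
    Source: SnapshotSource.Pre,
    Bets: new[] { bet });

Console.WriteLine(bet == repriced);      // records compare by value
Console.WriteLine(snapshot.Bets.Count);

public enum BetType { Win, Draw, WinFora, Total }
public enum BetSide { One, Two, Less, More }
public enum SnapshotSource { Pre, Live }

// Scope is the whole match (PeriodNumber == null) or period N.
public readonly record struct BetScope(int? PeriodNumber)
{
    public static readonly BetScope Match = new(null);
    public static BetScope Period(int n) => new(n);
}

// Side is nullable here as an assumption for Draw bets; Value carries the
// handicap or total line when the bet type has one.
public sealed record Bet(BetScope Scope, BetType Type, BetSide? Side, decimal? Value, decimal Rate);

public sealed record OddsSnapshot(
    long EventId,
    DateTimeOffset CapturedAt,
    SnapshotSource Source,
    IReadOnlyList<Bet> Bets);
```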

Excel export schema (compliance with customer spec)

The customer's technical specification (TZ) requires a wide-table layout with columns like Bet_Match_Win_1, Bet_Period-1_Win_Fora_2_Value, etc.

Internal storage is normalized (one row per Bet in OddsSnapshots). The Excel exporter denormalizes to the wide format on demand. Filename pattern:

Marathon_<YYYY-MM-DD>_to_<YYYY-MM-DD>.xlsx
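
The column and file naming above could be produced by small helpers like the following. This is a sketch under assumptions: `ExcelNaming` and its parameters are hypothetical, and the composition rule (`Bet_<Scope>_<Type>[_<Side>][_Value]`) is inferred from the two example column names and Bet_Match_Draw, so it may not cover every column in the customer spec.

```csharp
using System;
using System.Globalization;

Console.WriteLine(ExcelNaming.Column(null, "Win", "1"));
// Bet_Match_Win_1
Console.WriteLine(ExcelNaming.Column(1, "Win_Fora", "2", isValue: true));
// Bet_Period-1_Win_Fora_2_Value
Console.WriteLine(ExcelNaming.FileName(new DateOnly(2026, 5, 1), new DateOnly(2026, 5, 5)));
// Marathon_2026-05-01_to_2026-05-05.xlsx

// Hypothetical helper; not part of the documented codebase.
public static class ExcelNaming
{
    // period == null means match scope; side == null omits the side segment.
    public static string Column(int? period, string type, string? side, bool isValue = false)
    {
        var scope = period is null
            ? "Match"
            : "Period-" + period.Value.ToString(CultureInfo.InvariantCulture);
        var name = side is null ? $"Bet_{scope}_{type}" : $"Bet_{scope}_{type}_{side}";
        return isValue ? name + "_Value" : name;
    }

    // Invariant culture keeps the date digits stable regardless of the UI culture (ru-RU by default).
    public static string FileName(DateOnly from, DateOnly to) =>
        "Marathon_" + from.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture) +
        "_to_" + to.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture) + ".xlsx";
}
```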

Recurring Issues & Patterns

(Populated as we work — leave empty until something repeats.)

Feature: Initial Implementation > Phase 0: Scraping Spike — Learnings

(Permanent learnings about marathonbet.by data shape, anti-bot, page structure. For full detail see spike/SCRAPE_FINDINGS.md and spike/SCHEMA_DRAFT.md.)

  • Site is fully SSR (Server: nginx). Anonymous GET with browser User-Agent returns full HTML for /su/, /su/live, /su/popular/<Sport>, /su/betting/<event-path>. No Cloudflare, no JS challenge.
  • Use HttpClient + AngleSharp + Polly v8 — no Playwright needed for read-only. Keep Scraping:UsePlaywright = false flag for future-proofing.
  • Sport ID = data-sport-treeId = breadcrumb canonical ID. Confirmed: Basketball=6, Football=11, Tennis=22723, Hockey=43658. URL by ID: /su/betting/<Sport>+-+<id> (preferred over /su/popular/<Sport> because the ID is stable).
  • EventCode = data-event-eventId (numeric, ~26-million range, stable). TreeId = data-event-treeId (URL-routing ID, less stable). Use EventCode as the entity primary key in SQLite.
  • Selection key format: {eventId}@{MarketName}{LineIndex?}.{Outcome}. Outcomes: 1/draw/3 for 3-way (period markets use RN_H/RN_D/RN_A instead; see SCHEMA_DRAFT §3.1), HB_H/HB_A for handicap, Under_<X>/Over_<X> for totals. Total threshold is encoded in the outcome string; handicap value lives in <span class="middle-simple"> text.
  • Tennis has no Draw outcome. Domain Bet_Match_Draw must be nullable; Excel exporter writes empty cell when null.
  • Date parsing: listing shows HH:MM (today) or DD <ru-month> HH:MM (future). Anchor with initData.serverTime (Moscow TZ, format YYYY,MM,DD,HH,MM,SS) parsed from the embedded <script> blob on every scraped page.
  • Live updates: the site polls /su/liveupdate/popular/?treeIds=... every 3 s, but the response is just {"modified":[{"type":"refreshPage"}],...}, so re-scrape the full event detail HTML for actual odds. Our analyzer cadence: pre-match 30 s, live 5-10 s.
  • No public results / archive page (/su/results → 404). Final scores must be harvested by polling the event detail page until eventJsonInfo.matchIsComplete=true, then storing resultDescription. Phase 8 cannot back-fill from a public archive.
  • Period scope vocabulary varies by sport: football=1st_Half, basketball=1st_Half/1st_Quarter, tennis=1st_Set, hockey=1st_Period. Domain stores PeriodNumber:int and a sport-aware PeriodScopeMapper resolves the correct market token at parse time.
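
The selection key format noted in the learnings above can be split with plain string operations. A minimal sketch, assuming the market token itself never contains a dot, so the first dot after `@` separates market from outcome (outcomes such as Under_2.5 do contain one). The `SelectionKey` type is hypothetical, the sample key is made up for illustration, and `MarketToken` is left as name plus optional line index without further splitting.

```csharp
using System;
using System.Globalization;

var key = SelectionKey.Parse("26000001@Total_Goals.Under_2.5"); // made-up sample key
Console.WriteLine(key.EventId);
Console.WriteLine(key.MarketToken);
Console.WriteLine(key.Outcome);
Console.WriteLine(key.TotalThreshold()?.ToString(CultureInfo.InvariantCulture));

// Hypothetical parser for {eventId}@{MarketName}{LineIndex?}.{Outcome}.
public sealed record SelectionKey(long EventId, string MarketToken, string Outcome)
{
    public static SelectionKey Parse(string raw)
    {
        int at = raw.IndexOf('@');
        // Assumption: the first '.' after '@' ends the market token.
        int dot = at < 0 ? -1 : raw.IndexOf('.', at + 1);
        if (at <= 0 || dot < 0 || dot == at + 1 || dot == raw.Length - 1)
            throw new FormatException($"Unrecognized selection key: '{raw}'");

        long eventId = long.Parse(raw.AsSpan(0, at), NumberStyles.None, CultureInfo.InvariantCulture);
        return new SelectionKey(eventId, raw[(at + 1)..dot], raw[(dot + 1)..]);
    }

    // Totals encode their threshold in the outcome (Under_<X> / Over_<X>);
    // other outcomes return null.
    public decimal? TotalThreshold()
    {
        foreach (var prefix in new[] { "Under_", "Over_" })
        {
            if (Outcome.StartsWith(prefix, StringComparison.Ordinal) &&
                decimal.TryParse(Outcome.AsSpan(prefix.Length), NumberStyles.Number,
                    CultureInfo.InvariantCulture, out var value))
            {
                return value;
            }
        }
        return null;
    }
}
```

Handicap values are not recoverable this way, since they live in the `<span class="middle-simple">` text rather than in the key.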