Files
maraphon-app/CLAUDE.md
T
alexei.dolgolyov 070e34b911 feat(initial-implementation): phase 0 - scraping spike findings
Anonymous scraping confirmed feasible for marathonbet.by — site is fully SSR
(nginx), no Cloudflare or JS challenge. HttpClient + AngleSharp + Polly v8 is
sufficient; Playwright not required (kept as a future-flag).

Spike outputs:
- spike/SCRAPE_FINDINGS.md  — page rendering, URL templates, anti-bot, rate
  limits, recommended scraping strategy for Phase 3.
- spike/SCHEMA_DRAFT.md     — customer-spec field → DOM selector mapping for
  Match + Period-N scope across football/basketball/tennis (hockey TBD).

Phase 1+ handoff captured in subplan + CLAUDE.md. Critical Phase 8 finding:
no public results endpoint at /su/results — phase 8 must switch to polling
event-detail until eventJsonInfo.matchIsComplete=true (deviation flagged).

Reviewer notes addressed:
- Period market outcome codes corrected to RN_H/RN_D/RN_A (not 1/draw/3) and
  market name vocabulary clarified per-sport in SCHEMA_DRAFT §3.1.
- results-page.html capture added to file list with caveat about live-landing
  score-state and unsampled hockey selectors.
2026-05-05 01:04:03 +03:00

145 lines
6.7 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# CLAUDE.md — maraphon-app
> Project memory for Claude Code sessions on this repository. Keep entries concise.
> Per-feature learnings are appended below by the feature-planner workflow.
## Project Overview
**maraphon-app** is a sports betting odds analyzer for marathonbet.by. It scrapes
pre-match (`/su`) and live (`/su/live`) events, persists odds snapshots over time, and
detects anomalies — especially the **odds-flip** pattern (bookmaker freezes bets then
inverts underdog/favorite coefficients).
## Architecture (Clean Architecture, 5 projects + tests)
```
Marathon.Domain ← entities, value objects, no external deps
Marathon.Application ← use cases + abstractions (IOddsScraper, IRepository, ...)
Marathon.Infrastructure ← EF Core (SQLite), scraping (AngleSharp/Playwright), Excel, Polly
Marathon.UI ← Razor Class Library (all Blazor components — host-agnostic)
Marathon.Hosts.WpfBlazor ← WPF + BlazorWebView host (replaceable for ASP.NET Core later)
```
**Key portability invariant:** All UI lives in `Marathon.UI` (Razor Class Library). The
host project (`Marathon.Hosts.WpfBlazor`) is the *only* thing that changes when migrating
to a web app — drop in an ASP.NET Core Blazor Server host that references the same RCL.
## Tech stack
- **.NET 8 LTS**, C# 12
- **EF Core 8** + SQLite (WAL mode)
- **AngleSharp** (HTML), **Playwright for .NET** (SPA fallback)
- **Polly v8** (`Microsoft.Extensions.Http.Resilience`)
- **MudBlazor** components, **Plotly.Blazor** charts
- **Serilog** logging (rolling file + console)
- **xUnit + FluentAssertions + NSubstitute**, in-memory SQLite for repo tests
## Build & test
| Command | Purpose |
|---|---|
| `dotnet build Marathon.sln` | Build all projects |
| `dotnet test Marathon.sln` | Run all tests |
| `dotnet format Marathon.sln --verify-no-changes` | Lint |
| `dotnet run --project src/Marathon.Hosts.WpfBlazor` | Run desktop app |
## Coding conventions
- Nullable reference types: **enabled** (`<Nullable>enable</Nullable>`)
- Implicit usings: enabled
- Treat warnings as errors in `Release` builds
- File-scoped namespaces
- One public type per file (except small DTOs/records grouped in a feature folder)
- Domain entities: prefer `record` for immutable data; class with private setters when
identity matters
- No mutation of domain objects after construction — return new instances
- Repositories return `IReadOnlyList<T>`, not `List<T>` or `IEnumerable<T>` (clarity on
enumeration cost)
- Tests follow `Given_When_Then` or `Should_<expected>_When_<condition>` naming
## Configuration
Every variable parameter is configurable via `appsettings.json` and overridable via
`appsettings.Local.json` (gitignored) or environment variables:
- `Scraping:PollingIntervalSeconds` (default 30)
- `Scraping:MaxConcurrentRequests` (default 4)
- `Scraping:UserAgents[]` (rotated per request)
- `Scraping:RetryPolicy:*` (Polly settings)
- `Scraping:RateLimit:RequestsPerSecond` (default 1)
- `Storage:DatabasePath` (default `./data/marathon.db`)
- `Storage:ExportDirectory` (default `./exports`)
- `Storage:SnapshotRetentionDays` (default 90)
- `Anomaly:SuspensionGapSeconds` (default 60)
- `Anomaly:OddsFlipThreshold` (default 0.30 — implied probability delta)
- `Localization:DefaultCulture` (default `ru-RU`)
A future Settings page in the UI binds to these.
## Domain model summary
- `Sport(Code, Name)` — e.g., `(6, "Баскетбол")`
- `Event(Id, SportCode, CountryCode, LeagueId, CategoryId, ScheduledAt, EventCodeFromBookmaker)`
- `OddsSnapshot(EventId, CapturedAt, Source: Pre|Live, Bets: List<Bet>)`
- `Bet(Scope: Match|Period[N], Type: Win|Draw|WinFora|Total, Side: 1|2|Less|More, Value?, Rate)`
- `EventResult(EventId, FinalScore, WinnerSide)`
- `Anomaly(EventId, DetectedAt, Kind: SuspensionFlip, Score, EvidenceTimeline)`
## Excel export schema (compliance with customer spec)
Customer TZ requires wide-table layout with columns like `Bet_Match_Win_1`,
`Bet_Period-1_Win_Fora_2_Value`, etc.
**Internal storage is normalized** (one row per Bet in `OddsSnapshots`). The Excel
exporter denormalizes to the wide format on demand. Filename pattern:
```
Marathon_<YYYY-MM-DD>_to_<YYYY-MM-DD>.xlsx
```
## Recurring Issues & Patterns
(Populated as we work — leave empty until something repeats.)
## Feature: Initial Implementation > Phase 0: Scraping Spike — Learnings
(Permanent learnings about marathonbet.by data shape, anti-bot, page structure.
For full detail see `spike/SCRAPE_FINDINGS.md` and `spike/SCHEMA_DRAFT.md`.)
- **Site is fully SSR (`Server: nginx`).** Anonymous GET with browser User-Agent
returns full HTML for `/su/`, `/su/live`, `/su/popular/<Sport>`,
`/su/betting/<event-path>`. No Cloudflare, no JS challenge.
- **Use HttpClient + AngleSharp + Polly v8** — no Playwright needed for read-only.
Keep `Scraping:UsePlaywright = false` flag for future-proofing.
- **Sport ID = `data-sport-treeId` = breadcrumb canonical ID.** Confirmed:
Basketball=6, Football=11, Tennis=22723, Hockey=43658. URL by ID:
`/su/betting/<Sport>+-+<id>` (preferred over `/su/popular/<Sport>` because the
ID is stable).
- **`EventCode` = `data-event-eventId`** (numeric, ~26-million range, stable).
`TreeId` = `data-event-treeId` (URL-routing ID, less stable). Use `EventCode`
as the entity primary key in SQLite.
- **Selection key format:** `{eventId}@{MarketName}{LineIndex?}.{Outcome}`.
Outcomes: `1`/`draw`/`3` for 3-way, `HB_H`/`HB_A` for handicap, `Under_<X>`/
`Over_<X>` for totals. Total threshold is encoded in the outcome string;
handicap value lives in `<span class="middle-simple">` text.
- **Tennis has no Draw outcome.** Domain `Bet_Match_Draw` must be nullable; Excel
exporter writes empty cell when null.
- **Date parsing:** listing shows `HH:MM` (today) or `DD <ru-month> HH:MM` (future).
Anchor with `initData.serverTime` (Moscow TZ, format `YYYY,MM,DD,HH,MM,SS`)
parsed from the embedded `<script>` blob on every scraped page.
- **Live updates:** site polls `/su/liveupdate/popular/?treeIds=...` every 3 s but
response is just `{"modified":[{"type":"refreshPage"}],...}` — re-scrape the
full event detail HTML for actual odds. Our analyzer cadence: pre-match 30 s,
live 510 s.
- **No public results / archive page** (`/su/results` → 404). Final scores must
be harvested by polling the event detail page until
`eventJsonInfo.matchIsComplete=true`, then storing `resultDescription`. Phase 8
cannot back-fill from a public archive.
- **Period scope vocabulary varies by sport:** football=`1st_Half`, basketball=
`1st_Half`/`1st_Quarter`, tennis=`1st_Set`, hockey=`1st_Period`. Domain stores
`PeriodNumber:int` and a sport-aware `PeriodScopeMapper` resolves the correct
market token at parse time.