070e34b911
Anonymous scraping confirmed feasible for marathonbet.by: the site is fully SSR (nginx), with no Cloudflare or JS challenge. HttpClient + AngleSharp + Polly v8 is sufficient; Playwright is not required (kept as a future flag).

Spike outputs:
- spike/SCRAPE_FINDINGS.md: page rendering, URL templates, anti-bot, rate limits, recommended scraping strategy for Phase 3.
- spike/SCHEMA_DRAFT.md: customer-spec field → DOM selector mapping for Match + Period-N scope across football/basketball/tennis (hockey TBD).

Phase 1+ handoff captured in subplan + CLAUDE.md.

Critical Phase 8 finding: no public results endpoint at /su/results. Phase 8 must switch to polling event-detail until eventJsonInfo.matchIsComplete=true (deviation flagged).

Reviewer notes addressed:
- Period market outcome codes corrected to RN_H/RN_D/RN_A (not 1/draw/3), and market name vocabulary clarified per sport in SCHEMA_DRAFT §3.1.
- results-page.html capture added to file list, with a caveat about live-landing score-state and unsampled hockey selectors.

145 lines
6.7 KiB
Markdown
# CLAUDE.md — maraphon-app

> Project memory for Claude Code sessions on this repository. Keep entries concise.
> Per-feature learnings are appended below by the feature-planner workflow.

## Project Overview

**maraphon-app** is a sports betting odds analyzer for marathonbet.by. It scrapes
pre-match (`/su`) and live (`/su/live`) events, persists odds snapshots over time, and
detects anomalies, especially the **odds-flip** pattern (the bookmaker freezes bets, then
inverts the underdog/favorite coefficients).

## Architecture (Clean Architecture, 5 projects + tests)

```
Marathon.Domain           ← entities, value objects, no external deps
        ↑
Marathon.Application      ← use cases + abstractions (IOddsScraper, IRepository, ...)
        ↑
Marathon.Infrastructure   ← EF Core (SQLite), scraping (AngleSharp/Playwright), Excel, Polly
Marathon.UI               ← Razor Class Library (all Blazor components, host-agnostic)
        ↑
Marathon.Hosts.WpfBlazor  ← WPF + BlazorWebView host (replaceable for ASP.NET Core later)
```

**Key portability invariant:** All UI lives in `Marathon.UI` (Razor Class Library). The
host project (`Marathon.Hosts.WpfBlazor`) is the *only* thing that changes when migrating
to a web app: drop in an ASP.NET Core Blazor Server host that references the same RCL.

## Tech stack

- **.NET 8 LTS**, C# 12
- **EF Core 8** + SQLite (WAL mode)
- **AngleSharp** (HTML), **Playwright for .NET** (SPA fallback)
- **Polly v8** (`Microsoft.Extensions.Http.Resilience`)
- **MudBlazor** components, **Plotly.Blazor** charts
- **Serilog** logging (rolling file + console)
- **xUnit + FluentAssertions + NSubstitute**, in-memory SQLite for repo tests

## Build & test

| Command | Purpose |
|---|---|
| `dotnet build Marathon.sln` | Build all projects |
| `dotnet test Marathon.sln` | Run all tests |
| `dotnet format Marathon.sln --verify-no-changes` | Lint |
| `dotnet run --project src/Marathon.Hosts.WpfBlazor` | Run desktop app |

## Coding conventions

- Nullable reference types: **enabled** (`<Nullable>enable</Nullable>`)
- Implicit usings: enabled
- Treat warnings as errors in `Release` builds
- File-scoped namespaces
- One public type per file (except small DTOs/records grouped in a feature folder)
- Domain entities: prefer `record` for immutable data; class with private setters when
  identity matters
- No mutation of domain objects after construction; return new instances
- Repositories return `IReadOnlyList<T>`, not `List<T>` or `IEnumerable<T>` (clarity on
  enumeration cost)
- Tests follow `Given_When_Then` or `Should_<expected>_When_<condition>` naming

## Configuration

Every variable parameter is configurable via `appsettings.json` and overridable via
`appsettings.Local.json` (gitignored) or environment variables:

- `Scraping:PollingIntervalSeconds` (default 30)
- `Scraping:MaxConcurrentRequests` (default 4)
- `Scraping:UserAgents[]` (rotated per request)
- `Scraping:RetryPolicy:*` (Polly settings)
- `Scraping:RateLimit:RequestsPerSecond` (default 1)
- `Storage:DatabasePath` (default `./data/marathon.db`)
- `Storage:ExportDirectory` (default `./exports`)
- `Storage:SnapshotRetentionDays` (default 90)
- `Anomaly:SuspensionGapSeconds` (default 60)
- `Anomaly:OddsFlipThreshold` (default 0.30, implied probability delta)
- `Localization:DefaultCulture` (default `ru-RU`)

A future Settings page in the UI binds to these.

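
To make the two `Anomaly:*` settings concrete, here is a minimal sketch of how an odds-flip check might consume them. It assumes implied probability is `1 / rate` (bookmaker margin ignored) and that a flip means: the snapshots are at least the suspension gap apart, the favorite changed sides, and side 1's implied probability moved by at least the threshold. `Quote`, `OddsFlipCheck`, and the rule itself are hypothetical, not the project's detector.

```csharp
using System;

// Hypothetical two-way quote: decimal odds for side 1 and side 2 at a capture time.
public sealed record Quote(DateTime CapturedAt, decimal Rate1, decimal Rate2);

public static class OddsFlipCheck
{
    // Implied probability of a decimal rate, ignoring the bookmaker margin (assumption).
    public static decimal Implied(decimal rate) => 1m / rate;

    // Assumed rule: gap between snapshots >= suspension gap, the favorite changed
    // sides, and side 1's implied probability moved by at least the threshold.
    // Defaults mirror Anomaly:SuspensionGapSeconds and Anomaly:OddsFlipThreshold.
    public static bool IsFlip(Quote before, Quote after,
        int suspensionGapSeconds = 60, decimal oddsFlipThreshold = 0.30m)
    {
        bool suspended =
            (after.CapturedAt - before.CapturedAt).TotalSeconds >= suspensionGapSeconds;
        bool favoriteChanged =
            (before.Rate1 < before.Rate2) != (after.Rate1 < after.Rate2);
        decimal delta = Math.Abs(Implied(after.Rate1) - Implied(before.Rate1));
        return suspended && favoriteChanged && delta >= oddsFlipThreshold;
    }
}
```

For example, a move from 1.50 / 2.60 to 2.80 / 1.45 across a 90 s gap shifts side 1's implied probability from about 0.667 to about 0.357 (delta about 0.31), so this sketch would flag it at the default threshold.
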
## Domain model summary

- `Sport(Code, Name)`, e.g. `(6, "Баскетбол")`
- `Event(Id, SportCode, CountryCode, LeagueId, CategoryId, ScheduledAt, EventCodeFromBookmaker)`
- `OddsSnapshot(EventId, CapturedAt, Source: Pre|Live, Bets: List<Bet>)`
- `Bet(Scope: Match|Period[N], Type: Win|Draw|WinFora|Total, Side: 1|2|Less|More, Value?, Rate)`
- `EventResult(EventId, FinalScore, WinnerSide)`
- `Anomaly(EventId, DetectedAt, Kind: SuspensionFlip, Score, EvidenceTimeline)`

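
As an illustration of how part of this summary could map onto immutable C# records (per the coding conventions above), here is a rough sketch covering `Sport`, `Bet`, and `OddsSnapshot`. The CLR types, the enum names, the nullable `Side`, and the `BetScope` helper are assumptions made for the example; only the member names come from the summary.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public enum SnapshotSource { Pre, Live }
public enum BetType { Win, Draw, WinFora, Total }
public enum BetSide { One, Two, Less, More }

// Scope: PeriodNumber == null means the whole match; otherwise Period[N].
public sealed record BetScope(int? PeriodNumber)
{
    public static readonly BetScope Match = new BetScope((int?)null);
    public static BetScope Period(int n) => new BetScope(n);
}

public sealed record Sport(int Code, string Name);

// Side is nullable here on the assumption that a Draw bet carries no side.
public sealed record Bet(BetScope Scope, BetType Type, BetSide? Side, decimal? Value, decimal Rate);

// IReadOnlyList instead of List<Bet> to keep the snapshot immutable after construction.
public sealed record OddsSnapshot(
    long EventId, DateTimeOffset CapturedAt, SnapshotSource Source, IReadOnlyList<Bet> Bets);
```

With records, "return new instances" becomes a `with` expression, for example `var repriced = bet with { Rate = 1.95m };`, which leaves `bet` unchanged.
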
## Excel export schema (compliance with customer spec)

The customer TZ (technical specification) requires a wide-table layout with columns like
`Bet_Match_Win_1`, `Bet_Period-1_Win_Fora_2_Value`, etc.

**Internal storage is normalized** (one row per Bet in `OddsSnapshots`). The Excel
exporter denormalizes to the wide format on demand. Filename pattern:

```
Marathon_<YYYY-MM-DD>_to_<YYYY-MM-DD>.xlsx
```

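
The denormalization step needs a column name per bet. Below is a small sketch of one way to build such names, inferred only from the column names quoted in this file (`Bet_Match_Win_1`, `Bet_Period-1_Win_Fora_2_Value`, `Bet_Match_Draw`). `WideColumns.Name` and its parameters are hypothetical, and the scheme for any column not quoted here (totals, for instance) is not confirmed by the customer spec.

```csharp
#nullable enable
using System.Collections.Generic;

public static class WideColumns
{
    // periodNumber: null = whole match, N = Period-N.
    // type: the type token as it appears in the customer column names ("Win", "Win_Fora", "Draw").
    // side: "1" or "2", or null for bets without a side (Draw).
    // valueColumn: true for the companion column that holds the line value (e.g. handicap).
    public static string Name(int? periodNumber, string type, string? side, bool valueColumn = false)
    {
        var parts = new List<string>
        {
            "Bet",
            periodNumber is null ? "Match" : $"Period-{periodNumber}",
            type,
        };
        if (side is not null) parts.Add(side);
        if (valueColumn) parts.Add("Value");
        return string.Join("_", parts);
    }
}
```

Under these assumptions, `WideColumns.Name(1, "Win_Fora", "2", valueColumn: true)` yields `Bet_Period-1_Win_Fora_2_Value`.
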
## Recurring Issues & Patterns

(Populated as we work; leave empty until something repeats.)

## Feature: Initial Implementation > Phase 0: Scraping Spike — Learnings

(Permanent learnings about marathonbet.by data shape, anti-bot, page structure.
For full detail see `spike/SCRAPE_FINDINGS.md` and `spike/SCHEMA_DRAFT.md`.)

- **Site is fully SSR (`Server: nginx`).** An anonymous GET with a browser User-Agent
  returns full HTML for `/su/`, `/su/live`, `/su/popular/<Sport>`, and
  `/su/betting/<event-path>`. No Cloudflare, no JS challenge.
- **Use HttpClient + AngleSharp + Polly v8:** no Playwright needed for read-only scraping.
  Keep the `Scraping:UsePlaywright = false` flag for future-proofing.
- **Sport ID = `data-sport-treeId` = breadcrumb canonical ID.** Confirmed:
  Basketball=6, Football=11, Tennis=22723, Hockey=43658. URL by ID:
  `/su/betting/<Sport>+-+<id>` (preferred over `/su/popular/<Sport>` because the
  ID is stable).
- **`EventCode` = `data-event-eventId`** (numeric, ~26-million range, stable).
  `TreeId` = `data-event-treeId` (URL-routing ID, less stable). Use `EventCode`
  as the entity primary key in SQLite.
- **Selection key format:** `{eventId}@{MarketName}{LineIndex?}.{Outcome}`.
  Outcomes: `1`/`draw`/`3` for 3-way (period markets use `RN_H`/`RN_D`/`RN_A`
  instead, see `spike/SCHEMA_DRAFT.md`), `HB_H`/`HB_A` for handicap, `Under_<X>`/
  `Over_<X>` for totals. The total threshold is encoded in the outcome string;
  the handicap value lives in the `<span class="middle-simple">` text.
- **Tennis has no Draw outcome.** Domain `Bet_Match_Draw` must be nullable; the Excel
  exporter writes an empty cell when it is null.
- **Date parsing:** the listing shows `HH:MM` (today) or `DD <ru-month> HH:MM` (future).
  Anchor with `initData.serverTime` (Moscow TZ, format `YYYY,MM,DD,HH,MM,SS`),
  parsed from the embedded `<script>` blob on every scraped page.
- **Live updates:** the site polls `/su/liveupdate/popular/?treeIds=...` every 3 s, but
  the response is just `{"modified":[{"type":"refreshPage"}],...}`, so re-scrape the
  full event detail HTML for actual odds. Our analyzer cadence: pre-match 30 s,
  live 5–10 s.
- **No public results / archive page** (`/su/results` → 404). Final scores must
  be harvested by polling the event detail page until
  `eventJsonInfo.matchIsComplete=true`, then storing `resultDescription`. Phase 8
  cannot back-fill from a public archive.
- **Period scope vocabulary varies by sport:** football=`1st_Half`, basketball=
  `1st_Half`/`1st_Quarter`, tennis=`1st_Set`, hockey=`1st_Period`. The domain stores
  `PeriodNumber:int`, and a sport-aware `PeriodScopeMapper` resolves the correct
  market token at parse time.
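
The selection key format noted above can be split with plain string operations. This is a minimal sketch, assuming the market token contains no dots and that `LineIndex` is a run of trailing digits on that token; `SelectionKey`, `SelectionKeyParser`, and the market names in the usage note are made up for illustration.

```csharp
#nullable enable
using System;
using System.Globalization;

public sealed record SelectionKey(long EventId, string MarketName, int? LineIndex, string Outcome);

public static class SelectionKeyParser
{
    public static SelectionKey Parse(string key)
    {
        int at = key.IndexOf('@');
        // Split on the FIRST dot after '@': outcomes such as Under_2.5 contain dots themselves.
        int dot = at < 0 ? -1 : key.IndexOf('.', at + 1);
        if (at <= 0 || dot <= at + 1 || dot == key.Length - 1)
            throw new FormatException($"Unrecognized selection key: {key}");

        long eventId = long.Parse(key.Substring(0, at), CultureInfo.InvariantCulture);
        string market = key.Substring(at + 1, dot - at - 1);
        string outcome = key.Substring(dot + 1);

        // Assumption: LineIndex is the run of trailing digits on the market token.
        int end = market.Length;
        while (end > 0 && char.IsDigit(market[end - 1])) end--;
        int? lineIndex = end < market.Length
            ? int.Parse(market.Substring(end), CultureInfo.InvariantCulture)
            : (int?)null;

        return new SelectionKey(eventId, market.Substring(0, end), lineIndex, outcome);
    }
}
```

With a hypothetical key such as `26000001@Total_Goals2.Under_2.5`, the sketch separates the event ID, the market token `Total_Goals`, line index `2`, and the outcome `Under_2.5`. If real market names can end in digits, the trailing-digit rule would need a per-market lookup instead.
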
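
The date-parsing learning above could be implemented along these lines. This is a sketch under several assumptions: `serverTime` months are 1-based, month names are matched by their first three letters (so both full genitive forms and abbreviations work), and a listing date that falls before the server date belongs to the next year. `ListingDates` is a hypothetical helper, and the result stays in the site's local wall-clock time.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class ListingDates
{
    // First three letters of Russian month names; "мая" and "май" both map to May.
    private static readonly Dictionary<string, int> Months = new()
    {
        ["янв"] = 1, ["фев"] = 2, ["мар"] = 3, ["апр"] = 4, ["мая"] = 5, ["май"] = 5,
        ["июн"] = 6, ["июл"] = 7, ["авг"] = 8, ["сен"] = 9, ["окт"] = 10, ["ноя"] = 11,
        ["дек"] = 12,
    };

    // Parses initData.serverTime in the documented "YYYY,MM,DD,HH,MM,SS" shape.
    public static DateTime ParseServerTime(string raw)
    {
        string[] p = raw.Split(',');
        if (p.Length != 6) throw new FormatException($"Unexpected serverTime: {raw}");
        int[] n = Array.ConvertAll(p, s => int.Parse(s, CultureInfo.InvariantCulture));
        return new DateTime(n[0], n[1], n[2], n[3], n[4], n[5]);
    }

    // Resolves a listing label ("HH:MM" or "DD <ru-month> HH:MM") against server time.
    public static DateTime Resolve(string label, DateTime serverNow)
    {
        string[] parts = label.Split(' ', StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length != 1 && parts.Length != 3)
            throw new FormatException($"Unexpected listing label: {label}");

        string[] hm = parts[^1].Split(':');
        if (hm.Length != 2) throw new FormatException($"Unexpected listing label: {label}");
        int hour = int.Parse(hm[0], CultureInfo.InvariantCulture);
        int minute = int.Parse(hm[1], CultureInfo.InvariantCulture);

        // "HH:MM" alone means today, relative to the server date.
        if (parts.Length == 1)
            return serverNow.Date.AddHours(hour).AddMinutes(minute);

        int day = int.Parse(parts[0], CultureInfo.InvariantCulture);
        string monthKey = parts[1].ToLowerInvariant();
        if (monthKey.Length > 3) monthKey = monthKey[..3];
        if (!Months.TryGetValue(monthKey, out int month))
            throw new FormatException($"Unknown month in listing label: {label}");

        var candidate = new DateTime(serverNow.Year, month, day, hour, minute, 0);
        // Listings are future events: a date behind the server date rolls into next year.
        return candidate.Date < serverNow.Date ? candidate.AddYears(1) : candidate;
    }
}
```

The year rollover matters around New Year: with server time `2025,12,30,21,15,07`, the label `02 января 18:00` resolves to 2 January 2026 rather than 2 January 2025.
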