Files
maraphon-app/plans/initial-implementation/CONTEXT.md
T
alexei.dolgolyov a6ff368015 feat(phase-7-backend): implement anomaly detection — SuspensionFlip detector, use case, poller, and tests
- AnomalyDetector (pure domain): detects odds-flip pattern from live snapshot
  timelines using implied-probability vectors (p=1/rate, normalised), flip score
  = max(|p_post−p_pre|), gated by both threshold AND favourite-changed test
- SuspensionInterval record: typed pair of (pre, post) OddsSnapshot bracketing a gap
- AnomalyOptions POCO (Application layer): bound to Anomaly:* config section with
  four fields (SuspensionGapSeconds=60, OddsFlipThreshold=0.30, MinSnapshotCount=3,
  DetectionIntervalSeconds=60)
- DetectAnomaliesUseCase: iterates all events, loads last-24h live snapshots, runs
  detector, persists new anomalies with 1-minute dedup window
- AnomalyDetectionPoller: BackgroundService polling every DetectionIntervalSeconds,
  gated by WorkerOptions.AnomalyDetectionEnabled (default true)
- DI wiring: DetectAnomaliesUseCase registered Scoped in ApplicationModule;
  AnomalyOptions bound + AnomalyDetectionPoller hosted in InfrastructureModule
- WorkerOptions.AnomalyDetectionEnabled added; appsettings.json updated
- 13 domain tests + 4 application tests; total 245/245 passing (no regression)
2026-05-05 13:15:50 +03:00

245 lines
16 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Feature Context: Initial Implementation
## Configuration
- **Development mode:** Automated
- **Execution mode:** Orchestrator
- **Strategy:** Big Bang
- **Build:** `dotnet build Marathon.sln`
- **Test:** `dotnet test Marathon.sln`
- **Lint:** `dotnet format Marathon.sln --verify-no-changes`
- **Run:** `dotnet run --project src/Marathon.Hosts.WpfBlazor`
- **Implementer models:** Sonnet 4.6 (backend), Opus (frontend)
- **Reviewer model:** Sonnet 4.6
## Customer Constraints
- Source: marathonbet.by — anonymous scraping (no login). ToS risk acknowledged by customer.
- Output: Excel files matching customer's wide-column spec (`Bet_Match_Win_1`,
`Bet_Period-1_Win_Fora_2_Value`, etc.) with date-range filenames.
- Storage: customer accepted SQLite-with-Excel-export instead of Excel-as-database
(decided 2026-05-05).
- UI tech: Blazor Hybrid (changed from initial WPF assumption — better for web migration).
- Locale: RU + EN.
- Scope: analyze-only initially; design `IBetPlacer` extension point for future betting.
- Configurability: every variable parameter (polling, concurrency, retry, UA, retention,
thresholds, locale) goes in `appsettings.json` + Settings UI page.
## Current State
Repo just initialized. Single `main` commit with `.gitignore` + `README.md` + `CLAUDE.md`.
Working on `feature/initial-implementation` branch. No source code yet — Phase 0 starts
with scraping research, no implementation.
## Temporary Workarounds
(none yet)
## Cross-Phase Dependencies
- **Phase 1 (Domain)** is the foundation; all later phases reference domain types.
- **Phase 2 (Storage)** & **Phase 3 (Scraping)** depend only on Phase 1 — can run in parallel.
- **Phase 4 (Application + Workers)** depends on Phase 2 + Phase 3.
- **Phase 5 (UI Shell)** depends on Phase 1 only — can run in parallel with 2/3.
- **Phase 6 (Event Browsing UI)** depends on Phase 4 + Phase 5.
- **Phase 7 (Anomaly)** depends on Phase 4 (snapshot storage) + Phase 6 (UI patterns).
- **Phase 8 (Results)** depends on Phase 6.
- **Phase 9 (Packaging)** is final — runs full build + test suite.
## Deferred Work
- Bet placing (explicit out-of-scope; design extension point only).
- Authenticated scraping (anonymous now; `IOddsScraper` impl is swappable).
- Multi-bookmaker support (only marathonbet.by; abstraction allows future expansion).
- PostgreSQL backend (SQLite for now; `IRepository<T>` abstraction allows swap).
## Failed Approaches
- **Public results / archive endpoint** — does NOT exist. Tested
`https://www.marathonbet.by/su/results`, `/su/results/`, `/su/results.htm`
all return HTTP 404. No `/archive`, `/history` links anywhere in the public
HTML either. **Phase 8 deviation:** the Results loader cannot back-fill from
an archive — it must poll each event detail page until
`eventJsonInfo.matchIsComplete=true` and snapshot `resultDescription` at that
moment. Phase 8 implementer must revise the subplan accordingly.
- **JSONP `/su/liveupdate/popular/` endpoint** — exposes only refresh signals
(`{"modified":[{"type":"refreshPage"}],"updated":<ts>}`), not actual odds. Cannot
be used as a JSON odds source. Use it only as a "something changed" hint to
trigger a full event-detail re-scrape.
- **Anonymous WebSocket (STOMP)** at `/su/websocket/endpoint` is documented in
`initData.stomp` but appears to require an authenticated session
(`PUNTER-SESSION-HASH` cookie); we did not test it but the customer's anonymous
scraping constraint makes it unsuitable anyway.
## Review Findings Log
(populated by reviewers)
## Phase Execution Log
| Phase | Agent | Model | Test Writer | Parallel | Notes |
|---|---|---|---|---|---|
| Phase 0 | phase-implementer | Opus | ⏭️ Skipped (research only) | — | ✅ Done 2026-05-05. Outputs: spike/SCRAPE_FINDINGS.md + spike/SCHEMA_DRAFT.md + 7 local fixtures. Anonymous scraping confirmed feasible; HttpClient+AngleSharp recommended; no Playwright needed; no public results page found (Phase 8 deviation noted). |
| Phase 1 | phase-implementer | Sonnet 4.6 | ⏭️ Skipped (Big Bang) | — | ✅ Done 2026-05-05. 9 projects (5 src + 4 test). 96 domain tests passed. Key decisions: BetScope sealed hierarchy, ScheduledAt=UTC+3 (Moscow), OddsValue rejects zero. Deviations: slnx auto-created alongside sln, WPF App.xaml.cs needs FQ Application type. |
| Phase 2 | phase-implementer | Sonnet 4.6 | ⏭️ Skipped (Big Bang) | ✅ With 3 + 5 | — |
| Phase 3 | phase-implementer | Sonnet 4.6 | ⏭️ Skipped (Big Bang) | ✅ With 2 + 5 | — |
| Phase 4 | phase-implementer | Sonnet 4.6 | ⏭️ Skipped (Big Bang) | — | ✅ Done 2026-05-05. 4 use cases, 3 BackgroundService pollers, InfrastructureModule, ApplicationModule, reflection wiring removed. 202/202 tests green (+17 new). |
| Phase 5 | phase-implementer-frontend | Opus | ⏭️ Skipped (Big Bang) | ✅ With 2 + 3 | Uses frontend-design skill |
| Phase 6 | phase-implementer-frontend | Opus | ⏭️ Skipped (Big Bang) | — | ✅ Done 2026-05-05. PreMatch + Live + Events/Detail pages, EventListShell, SportIcon, OddsCell, OddsTimeline (Plotly.Blazor wrap), ExportDialog. EventBrowsingState + IEventBrowsingService facade. RU+EN strings under PreMatch.* / Live.* / Detail.* / Export.* / Sport.*. 228/228 tests green (+26 new bUnit). |
| Phase 7 | phase-implementer (split if needed) | Sonnet/Opus | ⏭️ Skipped (Big Bang) | — | UI portion uses Opus |
| Phase 8 | phase-implementer (split if needed) | Sonnet/Opus | ⏭️ Skipped (Big Bang) | — | UI portion uses Opus |
| Phase 9 | phase-implementer | Sonnet 4.6 | ✅ Final phase tests | — | Full build + test enforced |
## Environment & Runtime Notes
- Windows 10, PowerShell 5.1 default shell, Bash also available.
- `git` configured globally; remote `origin` = `https://git.dolgolyov-family.by/alexei.dolgolyov/maraphon-app.git`.
- Note: home directory (`C:\Users\Alexei`) is itself a git repo (likely accidental).
The maraphon-app local `.git` overrides it for this directory tree.
- .NET SDK assumed installed; if Phase 1 fails on `dotnet --version`, install or
document in CONTEXT.md.
## Implementation Notes
### Phase 7 Backend (Anomaly detection, 2026-05-05)
- **`AnomalyDetector` is pure domain — no I/O, no DI.** Constructor takes three ints/decimals
from `AnomalyOptions`; the caller (use case) materialises it per cycle.
The UI evidence panel can reconstruct the same probabilities from `EvidenceJson` without
needing to re-invoke the detector.
- **Implied probability formula:** `p_i = 1 / rate_i`, then normalise so all `p_i` sum to 1
(divorcé of the bookmaker's margin). This is the standard European odds conversion.
- **Flip score** = `max(|p_post[i] p_pre[i]|)` over Match-Win sides (p1, pDraw?, p2).
Score is clamped to `[0, 1]` before constructing `Anomaly` (domain invariant enforces ≤1).
- **Two-part gate** — an anomaly requires BOTH: (a) `flipScore ≥ OddsFlipThreshold` AND
(b) `argmax(p_pre) != argmax(p_post)`. This prevents spurious detections when one side's
probability shifts a lot but it was never the favourite.
- **Tennis / 2-way markets** — `pDraw` is `null` when no `BetType.Draw` bet is present.
The detector and `EvidenceJson` gracefully handle this (JSON field is omitted when null via
`DefaultIgnoreCondition.WhenWritingNull`).
- **`EvidenceJson` uses `System.Text.Json` with custom `JsonPropertyName` attributes** on
sealed nested records (`EvidencePayload`, `SnapshotEvidence`). Source generation was not
used at this scale — the payload is small and created infrequently.
- **`DetectAnomaliesUseCase` loads all events + last-24-h live snapshots per cycle.**
This is a deliberate simplification; a future optimisation is to track `last_run_at` per
event. Documented as 🟡 in the handoff.
- **Dedup strategy:** two anomalies are considered duplicates if they share `EventId`, `Kind`,
and their `DetectedAt` values fall within a 1-minute window. This prevents the same
suspension triggering re-insertion on consecutive detection cycles while the gap snapshot
pair remains in the 24-hour window.
- **`AnomalyOptions` placed in `Marathon.Application/Configuration/`** (not Infrastructure).
The `AnomalyDetector` itself is in `Marathon.Domain/AnomalyDetection/` but requires no
options binding — it takes plain constructor parameters.
- **`AnomalyDetectionPoller` reads `IOptionsMonitor<AnomalyOptions>` per cycle** so that
hot-reload of `DetectionIntervalSeconds` takes effect without a restart. Same pattern as
`LiveOddsPoller` reading `WorkerOptions`.
- **`Workers:AnomalyDetectionEnabled`** added to `WorkerOptions` (default `true`) and
`appsettings.json`. UI agent must add a Settings toggle for this flag.
- **New test count: +17** (13 domain + 4 application). Total: 245/245 passing.
- **Test note:** rates 1.5/2.5 produce a flip score of ~0.25 — BELOW the 0.30 threshold.
Always use 1.3/4.0 (flip score ~0.51) or steeper to guarantee detection in tests.
### Phase 6 (Event browsing UI, 2026-05-05)
- **Plotly.Blazor pinned to 5.4.1.** v7.x exists but introduces breaking changes;
5.4.1 is the latest on the .NET 8 line and works with our existing MudBlazor
7.15.0 / .NET 8.0.12 stack. The `Plotly.Blazor.LayoutLib.Margin` type clashes
with `MudBlazor.Margin` — fully qualify the layout-side type.
- **Razor source generator does NOT accept C# 11 raw string literals (`"""…"""`)**
inside `@code` blocks. The parser sees the leading `"""` as the start of a
normal string and never finds the close, producing an "Unterminated string
literal" RZ1000. Use concatenated single-quoted attribute strings instead
(see `SportIcon.razor` SVG constants).
- **Razor reserves the identifier `code`.** A `@foreach (var code in ...)`
loop is parsed as the `@code` directive, not as iteration. Use any other
identifier (`var sportCode in ...`).
- **`MudBlazor.DateRange` shadows `Marathon.Application.Storage.DateRange`**
in any file whose `_Imports.razor` brings both namespaces in. Add
`using AppDateRange = Marathon.Application.Storage.DateRange;` per-file
where the application's `DateRange` is constructed (already done in
`ExportDialog.razor` and `ExportDialogTests.cs`).
- **EventBrowsingService is Scoped, EventBrowsingState is Singleton.** The
service captures the per-circuit DI scope so EF Core's `DbContext` lifetime
works correctly; the state object holds the per-page filter records and
fires `OnChange` only when the new value !equals the old one. This split
matches Phase 5's split between `ThemeState` (singleton) and per-circuit
data services.
- **View-models, not domain entities, cross the UI boundary.** Pages bind to
`EventListItem` / `EventDetail` / `BetRow` / `OddsTimelinePoint`
records (defined in `Marathon.UI.Services`). Repositories are not exposed
to Razor components. This keeps the UI free of EF tracked graphs and
preserves Phase 5's "RCL is host-agnostic" invariant.
- **Live page reads polling cadence from `IOptionsMonitor<ScrapingSettingsForm>`.**
Phase 4's `WorkerOptions.LivePollIntervalSeconds` (drives the poller) is a
separate setting from the UI's display refresh; the latter intentionally
follows `Scraping:PollingIntervalSeconds` per the Phase 6 subplan.
- **Plotly chart memoization.** Computed signature = `(count, first ticks,
last ticks, first/last rate triples)`. Sufficient to invalidate the trace
list on any meaningful change while staying cheap during live polling.
- **bUnit shared `MarathonTestContext` now registers the fake browsing service
and the browsing state.** Phase 7 tests can extend it directly or follow the same pattern.
`Support/TestData.MoscowToday(int hour)` produces correctly-offset
`DateTimeOffset` values — domain `Event.ScheduledAt` will reject any other
offset.
### Phase 1 (Solution skeleton + Domain model, 2026-05-05)
- **.NET 10 SDK creates `.slnx` by default.** `dotnet new sln` produces `Marathon.slnx`
(new XML format), not `Marathon.sln`. A hand-crafted `Marathon.sln` was added alongside
it so that `dotnet build Marathon.sln` works as specified in the plan. Both files are
kept; prefer `Marathon.sln` for CLI commands.
- **`BetScope` is a sealed record hierarchy:** `abstract record BetScope` with
`sealed record MatchScope : BetScope` (singleton `Instance`) and
`sealed record PeriodScope(int Number) : BetScope`. Use pattern matching, not
an enum+nullable approach.
- **`Event.ScheduledAt` must be UTC+3 (Moscow), not UTC.** The domain enforces
`Offset == TimeSpan.FromHours(3)`. Phase 3 must construct `DateTimeOffset` with
`+03:00` before passing to `Event`; do NOT convert to UTC first.
- **`Directory.Build.props` must NOT set `TargetFramework`** — WpfBlazor needs
`net8.0-windows` while all other projects use `net8.0`. Each csproj owns its TFM.
- **`Marathon.Application` namespace conflicts with `System.Windows.Application`**
in WPF `App.xaml.cs`. Fix: use `System.Windows.Application` fully qualified.
Phase 5 must keep this qualification.
- **Central package management:** all `PackageReference` elements in test csproj files
must NOT include `Version=`. Versions live exclusively in `Directory.Packages.props`.
- **96 domain tests, 0 failures.** All invariants covered: SportCode, EventId,
OddsRate, OddsValue, BetScope, Bet (all 4 type combinations), OddsSnapshot,
Event (ScheduledAt offset), Anomaly.
### Phase 0 (Scraping spike, 2026-05-05)
- **Anonymous scraping is feasible** from a non-Belarus IP. No Cloudflare, no JS
challenge, no UA filtering observed. `Server: nginx`. Standard cookies only.
- **Site is fully SSR.** All needed data (event grid, full odds, breadcrumbs,
period markets) is in the raw HTML. No SPA hydration required.
- **Recommended scraper stack: HttpClient + AngleSharp + Polly v8.** Playwright is
not required for read-only scraping — keep it as an optional fallback flag
(`Scraping:UsePlaywright`) for future-proofing only.
- **Polling cadence:** site itself polls live updates every 3 s; for our analyzer,
pre-match 30 s and live 510 s is sufficient.
- **Rate-limit:** 5 sequential requests at 1 req/s pacing all returned 200 in <1 s,
no throttling. Recommend default `RequestsPerSecond=1`, `MaxConcurrent=4`.
- **Sport ID semantics:** customer's "Sport_Code = 6" (Basketball) maps to
`data-sport-treeId="6"` in the breadcrumb-canonical sport listing
(`/su/betting/Basketball+-+6`). Some sports also have a separate "category tree
ID" used inside the live grouping (e.g., 45356 for Basketball-live) — ignore
those, use only the canonical breadcrumb ID.
- **Selection key format:** `<eventId>@<MarketName>{LineIndex?}.<Outcome>`. The
market name is sport-specific (`Match_Result`, `1st_Half_Result`, `Total_Goals`,
`Total_Points`, `Total_Games`, `To_Win_Match_With_Handicap`, etc.). Total
thresholds are encoded in the outcome (`Under_3.5`, `Over_213.5`). Handicap
values are NOT in the key — they're in `<span class="middle-simple">` text.
- **Tennis has no Draw outcome** — domain `Bet_Match_Draw` must be nullable.
- **Date display ambiguity:** listing shows `HH:MM` (today) or `DD <ru-month> HH:MM`
(future). Anchor the parser on `initData.serverTime` (Moscow TZ, format
`YYYY,MM,DD,HH,MM,SS`).
- **No public results page** (`/su/results` → 404). Final scores are exposed only
on the event detail page itself via `eventJsonInfo` JSON
(`matchIsComplete`, `resultDescription`). Phase 8 must poll until completion;
cannot back-fill from an archive endpoint.
- **Probe environment:** Windows 10 + curl, geo-routed as Poland (`countryCode: PL`).
Customer in Belarus may see slightly different KYC overlays — parser must be
defensive (treat missing markets as null, never throw).
- **Captures saved locally** at `spike/captures/*.html` (gitignored): 7 fixtures
for offline parser development in Phase 3.