Commit Graph

8 Commits

Author SHA1 Message Date
alexei.dolgolyov 004dbeae8b fix(scraping): live page lacks data-event-path and uses category sport IDs
Previously LiveEventsParser returned 0 events from /su/live because two
real differences between the live page and the pre-match listing weren't
handled:

1. Live rows omit data-event-path entirely. They expose only
   data-event-treeId, and the bookmaker routes live events under
   /su/live/<treeId> rather than /su/betting/<...>.

2. The closest data-sport-treeId ancestor on the live page is a
   category-tree wrapper (26418=Football-live, 45356=Basketball-live, …)
   instead of the canonical breadcrumb sport ID (11/6/22723/43658) the
   rest of the app uses. The pre-match listing carries the canonical
   ID directly.

Changes:

* EventListingParserBase.ParseRow: data-event-path becomes optional. For
  live rows we synthesize EventPath = "live/<treeId>" from
  data-event-treeId (validated as digits-only). Pre-match validation is
  unchanged.

* New ExtractSportCodeFromLive walks ancestors looking for a sport-tree
  ID and maps it through a small live-id → canonical-id table covering
  the four scoped sports. Out-of-scope sports (cybersport, volleyball,
  table tennis) are intentionally left unmapped — they keep their raw
  category ID and the UI renders them via SportLabels as "Sport <N>".

* MarathonbetScraper.ResolveEventDetailPath: dispatches between
  /su/live/<treeId> and /su/betting/<...> based on the EventPath prefix.
  Removes the duplicated path-building between ScrapeEventOddsAsync and
  ScrapeEventResultAsync.

* New regression tests covering all three behaviors against a real
  /su/live capture (16 events, 5 sport categories).

Also: rewrites the stale "Disabled until Phase 8" hint copy on the
Settings.Workers.ResultsPollerEnabled flag — Phase 8 shipped, so the
results poller is safe to enable.
2026-05-09 16:07:03 +03:00
alexei.dolgolyov 286b55986b perf(scraping): parallel HTTP fan-out, sequential DB persist (HIGH)
The Pull*UseCase implementations issued one HTTP request at a time despite
Scraping:MaxConcurrentRequests=4. With 30–80 live events and ~1s per
fetch, a 5–10s live cadence target was unreachable; cycles overflowed
the configured interval.

* New Marathon.Application.Configuration.ScrapingThrottle bound from the
  shared Scraping:* section. Exposes only MaxConcurrentRequests so the
  Application layer doesn't pull in the Infrastructure-side ScrapingOptions.
* PullLiveOddsUseCase + PullUpcomingEventsUseCase split into two phases:
  - Phase 1 — Parallel.ForEachAsync over the event list with
    MaxDegreeOfParallelism = throttle.MaxConcurrentRequests. The scraper's
    Polly rate limiter still throttles to RequestsPerSecond underneath
    this fan-out, so spikes are smoothed before they hit the bookmaker.
  - Phase 2 — sequential foreach over the (Event, Snapshot) tuples
    captured in Phase 1, doing event upsert + snapshot insert. EF Core
    DbContext is not thread-safe so all DB writes stay on a single thread.
* InfrastructureModule binds ScrapingThrottle alongside AnomalyOptions.
* Failed snapshot scrapes in Phase 1 mean the event row is also NOT
  persisted in Phase 2 — previously we'd persist the row even when the
  snapshot scrape failed, leaving an orphan event with no odds. Updated
  the regression test accordingly.
* Test fixture exposes TestFixtures.Throttle(maxConcurrentRequests=1) for
  deterministic sequential test runs.
* One existing NSubstitute setup that chained Arg.Is<>() across two
  configurations was rewritten to use a single Arg.Any<>() with inline
  branching — chained matchers were leaking and returning wrong results.
2026-05-09 15:27:06 +03:00
alexei.dolgolyov 9c5d3df1f2 feat(phase-8-backend): per-event results harvesting + EventPath plumbing
Implements Phase 8 Amendment 1: marathonbet.by has no public results archive
endpoint, so results must be harvested per-event by re-fetching the event
detail page until eventJsonInfo.matchIsComplete=true.

Backend changes:

* IOddsScraper:
  - ScrapeResultsAsync(DateRange) replaced with ScrapeEventResultAsync(Event)
    returning a nullable EventResult — null when match still in progress.
  - ScrapeEventOddsAsync now takes the full Event (so EventPath drives URL
    construction) instead of bare EventId.
  - New ScrapeLiveAsync() for the /su/live listing.

* Domain:
  - Event gains EventPath (nullable string) — the data-event-path attribute
    captured during scraping; required for reliable URL construction.

* Infrastructure:
  - New migration 20260506000000_AddEventPath adds the column.
  - EventEntity / EventConfiguration / Mapping / model-snapshot updated.
  - MarathonbetScraper: new ScrapeLiveAsync + ScrapeEventResultAsync; URL
    builder prefers EventPath, falls back to numeric ID for legacy rows.
  - EventListingParserBase extracts data-event-path on every listing row.

* Application:
  - PullResultsUseCase: branches on selection vs date-range, emits IProgress<
    PullResultsProgress>, returns ResultLoadOutcome (Loaded / AlreadyLoaded /
    NotYetComplete / Failed); idempotent (skips events whose result already
    exists).
  - PullLiveOddsUseCase now drives off the live listing (auto-discovers
    events that go live without ever appearing in the upcoming list) and
    backfills EventPath on legacy rows.
  - PullUpcomingEventsUseCase wires EventPath on persisted events.

* Workers: UpcomingEventsPoller updates persistence path accordingly.

* Tests: 17 net-new tests across Application + Infrastructure + Domain;
  all 293 still pass.
2026-05-09 15:10:27 +03:00
alexei.dolgolyov 2acbaa5b77 feat(phase-4): application layer + background workers — 202/202 tests green
Use cases (Marathon.Application/UseCases/):
- PullUpcomingEventsUseCase: scrape + persist new events + capture pre-match snapshots
- PullLiveOddsUseCase: refresh live snapshots for all stored events
- PullResultsUseCase: Phase 4 scaffold; delegates to ScrapeResultsAsync (Phase 3 no-op);
  Phase 8 will replace with watch-list polling
- ExportToExcelUseCase: resolves export dir from StorageOptions, delegates to IExcelExporter

ApplicationModule.AddMarathonApplication(IServiceCollection) — no IConfiguration needed.

Background workers (Marathon.Infrastructure/Workers/):
- UpcomingEventsPoller: Cronos 6-field cron schedule (default every 6 h)
- LiveOddsPoller: fixed interval (WorkerOptions.LivePollIntervalSeconds, default 30 s)
- ResultsWatchListPoller: scaffold, disabled by default (WorkerOptions.ResultsPollerEnabled=false)
All three: exception-swallowing, cancellation-aware, scoped DI via CreateAsyncScope().

InfrastructureModule.AddMarathonInfrastructure(IServiceCollection, IConfiguration):
- Composes AddMarathonPersistence + AddMarathonScraping + WorkerOptions + 3 hosted services

App.xaml.cs: replace reflection-based TryAddApplicationAndInfrastructure with direct
AddMarathonApplication() + AddMarathonInfrastructure(config) calls.

Resolved Phase 3 TODO: bind Sports:Basketball:QuarterMode from config in ScrapingModule.

appsettings.json: add Workers.LivePollIntervalSeconds, ResultsPollIntervalSeconds,
ResultsPollerEnabled; add Sports.Basketball.QuarterMode.

Settings.razor + WorkerOptions (UI) + SharedResource.*.resx: surface new Workers fields.

Tests: +14 Application use-case tests, +3 Infrastructure worker tests (185 → 202 total).
2026-05-05 12:28:15 +03:00
alexei.dolgolyov c4d87b59d6 fix(initial-implementation): close P2/P3/P5 review blockers — 185/185 tests green
Combined-batch reviewer flagged three real blockers + two test-infra
issues across the parallel P2/P3/P5 batch. All resolved:

PHASE 3 — DateTimeOffset UTC-kind constructor (3 sites)
  EventListingParserBase.cs:39, EventOddsParser.cs:72, ResultsParser.cs:104
  Replaced `new DateTimeOffset(DateTimeOffset.UtcNow.UtcDateTime, MoscowOffset)`
  (throws ArgumentException because UtcDateTime has Kind=Utc) with
  `DateTimeOffset.UtcNow.ToOffset(MoscowOffset)`.

PHASE 2 — EF string.Compare not translatable (3 sites)
  EventRepository.cs:34, SnapshotRepository.cs:46, ExcelExporter.cs:35
  Replaced `string.Compare(col, str, StringComparison.Ordinal)` with
  `col.CompareTo(str)` so EF Core's SQLite provider can translate the
  expression. Semantics unchanged (SQLite default collation = BINARY = ordinal).

PHASE 3 — ServerTimeProvider regex misses JSON-quoted key
  Regex `serverTime\s*:\s*"..."` only matched bare-key form. Updated to
  `"?serverTime"?\s*:\s*"..."` so the JSON-quoted form (the actual
  marathonbet.by production format) is matched.

PHASE 3 — fixture: orphan <td> elements stripped by HTML5 parser
  tests/.../Fixtures/marathonbet/event-football-sample.html — wrapped
  the <td> blocks in a proper <table><tbody><tr> hierarchy so AngleSharp
  preserves them and `td.Closest("td")` succeeds in the parser.

PHASE 2 — InMemoryDbFixture shared state across parallel tests
  All fixture instances used `Data Source=marathon_tests` causing xUnit's
  parallel-within-class runs to contaminate each other's data. Each fixture
  now uses a Guid-suffixed unique data source name.

PLAN.md — P2/P3/P5 rows updated to  Done with batch commit reference.

Test status:
  Domain.Tests:         96/96 
  Application.Tests:     1/1  
  Infrastructure.Tests: 77/77 
  UI.Tests:             11/11 
  TOTAL:               185/185 
Build: 0 warnings, 0 errors.

Deferred to later phases (per reviewer 🟡 / 🔵 notes):
- SnapshotRepository.GetAsync(Guid) uses lossy GetHashCode workaround;
  Phase 4 to fix or remove from interface.
- Excel Sport name column writes string.Empty (need lookup join in Phase 6).
- PeriodScopeMapper football n>2 falls through to "Quarter" token;
  guarded by MaxPeriods today, but defensive cleanup at Phase 9.
- Settings.razor duplicate m-rise-5 class on Localization section.
2026-05-05 12:09:44 +03:00
alexei.dolgolyov 686550d697 fix(initial-implementation): resolve P2/P3 cross-phase build issues
Three minimal fixes to make Marathon.sln build with 0/0:

1. Marathon.Infrastructure.csproj — add InternalsVisibleTo for
   Marathon.Infrastructure.Tests so test code can reference internal
   repository and exporter classes (Phase 2 issue blocking Phase 3 tests).
2. EventOddsParserTests.cs — add 'using Marathon.Domain.ValueObjects' so
   MatchScope/PeriodScope resolve.
3. RoundTripTests.cs — add 'using Microsoft.EntityFrameworkCore' so the
   ExecuteSqlRawAsync extension method on DatabaseFacade resolves.

Phase 5's anticipated LocalizationOptions / Serilog issues were already
resolved by its agent before being killed — no changes needed there.

Build status: 0 warnings, 0 errors.
Test status: Domain 96/96, UI 11/11, Infrastructure 42/77 (35 failing —
parser fixture issues + a real DateTimeOffset bug; reviewer will assess).
2026-05-05 11:35:42 +03:00
alexei.dolgolyov e4d8476782 WIP(initial-implementation): parallel batch P2/P3/P5 — code complete, unreviewed
Snapshot of the parallel batch (Phases 2 + 3 + 5) at session pause. Solution does
NOT build cleanly yet — known cross-phase compile issues remain to be resolved
before review. See plans/initial-implementation/PLAN.md "Resume Notes" section
for the exact tomorrow-morning action list.

Phase 2 (Storage):
- Repository interfaces in Marathon.Application/Abstractions
- DateRange, ExportKind, StorageOptions in Marathon.Application/Storage
- EF Core 8 + SQLite (WAL) persistence: 7 entities + configurations + 4 repos
- Hand-written InitialCreate migration (dotnet ef blocked by parallel work)
- ClosedXML ExcelExporter with exact customer-spec wide columns
- PersistenceModule.AddMarathonPersistence DI extension
- Round-trip + export tests (cannot run yet — see cross-phase issues)

Phase 3 (Scraping):
- IOddsScraper, IBetPlacer in Marathon.Application/Abstractions
- ScrapingOptions in Marathon.Infrastructure/Configuration
- MarathonbetScraper with 4 parsers (Upcoming, Live, EventOdds, Results)
- Helpers: ServerTimeProvider, PeriodScopeMapper, OutcomeCodeMapper, MoscowDateParser
- UserAgentRotatorHandler + Polly v8 resilience pipeline
- ScrapingModule.AddMarathonScraping DI extension
- GlobalUsings.cs aliases for EventId / Configuration disambiguation
- Parser tests with trimmed HTML fixtures
- ScrapeResultsAsync interim no-op (Phase 8 will replace via watch-list polling)

Phase 5 (UI shell — killed mid-final-verify, assumed ~95%):
- Marathon.UI populated: MainLayout, App.razor, Pages (Home, Settings),
  Components, Theme (MarathonTheme.cs + Tokens.cs + app.css), Resources
  (SharedResource.{cs,ru.resx,en.resx}), Services (ISettingsWriter), wwwroot
- WPF host: App.xaml(.cs), MainWindow.xaml(.cs), Marathon.Hosts.WpfBlazor.csproj
  with Microsoft.AspNetCore.Components.WebView.Wpf + MudBlazor + Serilog
- appsettings.json + appsettings.Development.json with all sections wired
- bUnit tests: MainLayoutTests, LocaleSwitcherTests, ThemeToggleTests,
  JsonSettingsWriterTests + Support helpers

Cross-phase issues to resolve at next session:
1. Phase 2 repository classes are 'internal' — Phase 3's tests can't reference
   them. Fix: add InternalsVisibleTo to Marathon.Infrastructure.csproj.
2. Phase 5: LocalizationOptions namespace ambiguity (AspNetCore vs Extensions).
3. Phase 5: WpfBlazor Serilog API mismatch.

Reviewer has NOT run on this batch. Move to Phase 4 only after build is green
and a combined parallel-batch reviewer passes.
2026-05-05 01:56:53 +03:00
alexei.dolgolyov 61114ea31b feat: implement Phase 1 — solution skeleton and domain model
Creates the 9-project .NET 8 solution (5 src + 4 test) with Marathon.Domain
fully implemented: value objects (SportCode, EventId, OddsRate, OddsValue,
BetScope hierarchy), enums (Side, BetType, OddsSource, AnomalyKind), and
entities (Sport, Country, League, Event, Bet, OddsSnapshot, EventResult,
Anomaly) with all invariants enforced in constructors. 96 domain tests pass
(FluentAssertions + xUnit). Directory.Build.props and Directory.Packages.props
centralise build settings and NuGet versions. Both Marathon.sln and Marathon.slnx
are committed; dotnet build Marathon.sln succeeds with 0 warnings/errors.
2026-05-05 01:20:28 +03:00