maraphon-app

alexei.dolgolyov/maraphon-app

Commit Graph

Author / SHA1 / Message / Date

alexei.dolgolyov

286b55986b

perf(scraping): parallel HTTP fan-out, sequential DB persist (HIGH)

The Pull*UseCase implementations issued one HTTP request at a time despite
Scraping:MaxConcurrentRequests=4. With 30–80 live events and ~1s per
fetch, a 5–10s live cadence target was unreachable; cycles overflowed
the configured interval.

* New Marathon.Application.Configuration.ScrapingThrottle bound from the
  shared Scraping:* section. Exposes only MaxConcurrentRequests so the
  Application layer doesn't pull in the Infrastructure-side ScrapingOptions.
* PullLiveOddsUseCase + PullUpcomingEventsUseCase split into two phases:
  - Phase 1 — Parallel.ForEachAsync over the event list with
    MaxDegreeOfParallelism = throttle.MaxConcurrentRequests. The scraper's
    Polly rate limiter still throttles to RequestsPerSecond underneath
    this fan-out, so spikes are smoothed before they hit the bookmaker.
  - Phase 2 — sequential foreach over the (Event, Snapshot) tuples
    captured in Phase 1, doing event upsert + snapshot insert. EF Core
    DbContext does not support concurrent operations, so all DB writes
    run one at a time.
* InfrastructureModule binds ScrapingThrottle alongside AnomalyOptions.
* Failed snapshot scrapes in Phase 1 mean the event row is also NOT
  persisted in Phase 2 — previously we'd persist the row even when the
  snapshot scrape failed, leaving an orphan event with no odds. Updated
  the regression test accordingly.
* Test fixture exposes TestFixtures.Throttle(maxConcurrentRequests=1) for
  deterministic sequential test runs.
* One existing NSubstitute setup that chained Arg.Is<>() across two
  configurations was rewritten to use a single Arg.Any<>() with inline
  branching — chained matchers were leaking and returning wrong results.
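
A minimal sketch of the two-phase shape described above, for readers
outside the codebase. Event, Snapshot, TwoPhasePull and the
scrapeAsync/persistAsync delegates are hypothetical stand-ins, not the
project's real types; the Polly rate limiter and EF Core are left out,
with persistAsync standing in for the upsert + insert.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins for the project's event and snapshot types.
public sealed record Event(long Id, string? EventPath);
public sealed record Snapshot(long EventId, decimal Odds);

public static class TwoPhasePull
{
    public static async Task<int> RunAsync(
        IReadOnlyList<Event> events,
        int maxConcurrentRequests,
        Func<Event, CancellationToken, Task<Snapshot?>> scrapeAsync, // hypothetical scraper call
        Func<Event, Snapshot, Task> persistAsync,                    // hypothetical upsert + insert
        CancellationToken ct = default)
    {
        // Phase 1: concurrent HTTP fetches, capped at maxConcurrentRequests.
        var scraped = new ConcurrentBag<(Event Ev, Snapshot Snap)>();
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = maxConcurrentRequests,
            CancellationToken = ct,
        };
        await Parallel.ForEachAsync(events, options, async (ev, token) =>
        {
            try
            {
                var snapshot = await scrapeAsync(ev, token);
                if (snapshot is not null) scraped.Add((ev, snapshot));
            }
            catch (Exception) when (!token.IsCancellationRequested)
            {
                // Failed scrape: the event is dropped and never reaches Phase 2.
            }
        });

        // Phase 2: sequential writes, so the context never sees concurrent operations.
        var persisted = 0;
        foreach (var (evt, snap) in scraped.OrderBy(s => s.Ev.Id))
        {
            await persistAsync(evt, snap);
            persisted++;
        }
        return persisted;
    }
}
```

Ordering Phase 2 by event ID is an extra choice made in this sketch so
that persistence order is deterministic; the commit does not say whether
the real implementation does this.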

2026-05-09 15:27:06 +03:00

alexei.dolgolyov

9c5d3df1f2

feat(phase-8-backend): per-event results harvesting + EventPath plumbing

Implements Phase 8 Amendment 1: marathonbet.by has no public results archive
endpoint, so results must be harvested per-event by re-fetching the event
detail page until eventJsonInfo.matchIsComplete=true.
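
A sketch of the completion check, assuming eventJsonInfo is a JSON object
carrying a boolean matchIsComplete field; the real payload shape and the
fields used to build the result are not described in this commit.
EventResultSketch and TryReadResult are hypothetical names. Returning null
mirrors the nullable EventResult contract of ScrapeEventResultAsync.

```csharp
using System.Text.Json;

// Hypothetical stand-in for the domain EventResult.
public sealed record EventResultSketch(long EventId, string RawJson);

public static class ResultHarvestSketch
{
    // ASSUMPTION: eventJsonInfo is a JSON object with a boolean matchIsComplete.
    // Returns null while the match is in progress, so the caller re-fetches later.
    public static EventResultSketch? TryReadResult(long eventId, string eventJsonInfo)
    {
        using var doc = JsonDocument.Parse(eventJsonInfo);
        bool complete = doc.RootElement.ValueKind == JsonValueKind.Object
            && doc.RootElement.TryGetProperty("matchIsComplete", out var flag)
            && flag.ValueKind == JsonValueKind.True;
        // A real parser would extract scores here; this sketch keeps the raw JSON.
        return complete ? new EventResultSketch(eventId, eventJsonInfo) : null;
    }
}
```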

Backend changes:

* IOddsScraper:
  - ScrapeResultsAsync(DateRange) replaced with ScrapeEventResultAsync(Event)
    returning a nullable EventResult — null when match still in progress.
  - ScrapeEventOddsAsync now takes the full Event (so EventPath drives URL
    construction) instead of bare EventId.
  - New ScrapeLiveAsync() for the /su/live listing.

* Domain:
  - Event gains EventPath (nullable string) — the data-event-path attribute
    captured during scraping; required for reliable URL construction.

* Infrastructure:
  - New migration 20260506000000_AddEventPath adds the column.
  - EventEntity / EventConfiguration / Mapping / model-snapshot updated.
  - MarathonbetScraper: new ScrapeLiveAsync + ScrapeEventResultAsync; URL
    builder prefers EventPath, falls back to numeric ID for legacy rows.
  - EventListingParserBase extracts data-event-path on every listing row.

* Application:
  - PullResultsUseCase: branches on selection vs. date range, reports
    progress via IProgress<PullResultsProgress>, returns ResultLoadOutcome
    (Loaded / AlreadyLoaded / NotYetComplete / Failed); idempotent (skips
    events whose result already exists).
  - PullLiveOddsUseCase now drives off the live listing (auto-discovers
    events that go live without ever appearing in the upcoming list) and
    backfills EventPath on legacy rows.
  - PullUpcomingEventsUseCase wires EventPath on persisted events.

* Workers: UpcomingEventsPoller updates persistence path accordingly.

* Tests: 17 net-new tests across Application + Infrastructure + Domain;
  all 293 still pass.
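
A sketch of two behaviours listed above: URL construction that prefers
EventPath and falls back to the numeric ID, and an idempotent per-event
load that returns one of the four ResultLoadOutcome values. The base URL,
EventRef, ResultsSketch and all delegate parameters are hypothetical; the
real URL shapes and repository calls are not given in this commit.

```csharp
using System;
using System.Globalization;

// Hypothetical stand-in for the domain Event.
public sealed record EventRef(long Id, string? EventPath);

// Same four values the commit lists; the real type may carry more data.
public enum ResultLoadOutcome { Loaded, AlreadyLoaded, NotYetComplete, Failed }

public static class ResultsSketch
{
    // HYPOTHETICAL base URL; the real URL shapes are not given in the commit.
    private const string BaseUrl = "https://bookmaker.example/";

    // Prefer the scraped data-event-path; fall back to the numeric ID for legacy rows.
    public static Uri BuildEventUrl(EventRef ev)
    {
        string tail = string.IsNullOrWhiteSpace(ev.EventPath)
            ? ev.Id.ToString(CultureInfo.InvariantCulture)
            : ev.EventPath.TrimStart('/');
        return new Uri(BaseUrl + tail);
    }

    // Idempotent per-event load: an already stored result is never re-scraped.
    public static ResultLoadOutcome LoadOne(
        EventRef ev,
        Func<long, bool> resultExists,    // hypothetical repository lookup
        Func<Uri, string?> scrapeResult,  // hypothetical scrape; null = still in progress
        Action<long, string> saveResult)  // hypothetical repository write
    {
        if (resultExists(ev.Id)) return ResultLoadOutcome.AlreadyLoaded;
        try
        {
            string? result = scrapeResult(BuildEventUrl(ev));
            if (result is null) return ResultLoadOutcome.NotYetComplete;
            saveResult(ev.Id, result);
            return ResultLoadOutcome.Loaded;
        }
        catch (Exception)
        {
            return ResultLoadOutcome.Failed;
        }
    }
}
```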

2026-05-09 15:10:27 +03:00

alexei.dolgolyov

2acbaa5b77

feat(phase-4): application layer + background workers — 202/202 tests green

Use cases (Marathon.Application/UseCases/):
- PullUpcomingEventsUseCase: scrape + persist new events + capture pre-match snapshots
- PullLiveOddsUseCase: refresh live snapshots for all stored events
- PullResultsUseCase: Phase 4 scaffold; delegates to ScrapeResultsAsync (Phase 3 no-op);
  Phase 8 will replace it with watch-list polling
- ExportToExcelUseCase: resolves export dir from StorageOptions, delegates to IExcelExporter

ApplicationModule.AddMarathonApplication(IServiceCollection) — no IConfiguration needed.

Background workers (Marathon.Infrastructure/Workers/):
- UpcomingEventsPoller: Cronos 6-field cron schedule (default every 6 h)
- LiveOddsPoller: fixed interval (WorkerOptions.LivePollIntervalSeconds, default 30 s)
- ResultsWatchListPoller: scaffold, disabled by default (WorkerOptions.ResultsPollerEnabled=false)
All three: exception-swallowing, cancellation-aware, scoped DI via CreateAsyncScope().
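
A dependency-free sketch of the loop shape these workers share, using
.NET 6's PeriodicTimer for a fixed interval like LiveOddsPoller's.
PollingLoopSketch and its parameters are hypothetical; the per-cycle
CreateAsyncScope() call and the Cronos schedule used by
UpcomingEventsPoller are deliberately omitted.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class PollingLoopSketch
{
    // Runs runCycleAsync immediately, then once per interval, until cancelled.
    // Exceptions from a cycle are reported and swallowed, so one bad cycle
    // cannot stop the worker. (Real workers would open a DI scope per cycle.)
    public static async Task RunAsync(
        TimeSpan interval,
        Func<CancellationToken, Task> runCycleAsync, // hypothetical cycle body
        Action<Exception> onError,                   // hypothetical logger hook
        CancellationToken stoppingToken)
    {
        using var timer = new PeriodicTimer(interval);
        try
        {
            do
            {
                try
                {
                    await runCycleAsync(stoppingToken);
                }
                catch (Exception ex) when (!stoppingToken.IsCancellationRequested)
                {
                    onError(ex); // swallow and wait for the next tick
                }
            }
            while (await timer.WaitForNextTickAsync(stoppingToken));
        }
        catch (OperationCanceledException) when (stoppingToken.IsCancellationRequested)
        {
            // Normal shutdown: cancellation ends the loop quietly.
        }
    }
}
```

PeriodicTimer does not queue missed ticks, so a cycle that overruns the
interval is followed by at most one immediate tick rather than a backlog.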

InfrastructureModule.AddMarathonInfrastructure(IServiceCollection, IConfiguration):
- Composes AddMarathonPersistence + AddMarathonScraping + WorkerOptions + 3 hosted services

App.xaml.cs: replace reflection-based TryAddApplicationAndInfrastructure with direct
AddMarathonApplication() + AddMarathonInfrastructure(config) calls.

Resolved Phase 3 TODO: bind Sports:Basketball:QuarterMode from config in ScrapingModule.

appsettings.json: add Workers.LivePollIntervalSeconds, ResultsPollIntervalSeconds,
ResultsPollerEnabled; add Sports.Basketball.QuarterMode.

Settings.razor + WorkerOptions (UI) + SharedResource.*.resx: surface new Workers fields.

Tests: +14 Application use-case tests, +3 Infrastructure worker tests (185 → 202 total).

2026-05-05 12:28:15 +03:00

3 Commits