Snapshot of the parallel batch (Phases 2 + 3 + 5) at session pause. Solution does
NOT build cleanly yet — known cross-phase compile issues remain to be resolved
before review. See plans/initial-implementation/PLAN.md "Resume Notes" section
for the exact tomorrow-morning action list.
Phase 2 (Storage):
- Repository interfaces in Marathon.Application/Abstractions
- DateRange, ExportKind, StorageOptions in Marathon.Application/Storage
- EF Core 8 + SQLite (WAL) persistence: 7 entities + configurations + 4 repos
- Hand-written InitialCreate migration (dotnet ef blocked by parallel work)
- ClosedXML ExcelExporter with exact customer-spec wide columns
- PersistenceModule.AddMarathonPersistence DI extension
- Round-trip + export tests (cannot run yet — see cross-phase issues)
Phase 3 (Scraping):
- IOddsScraper, IBetPlacer in Marathon.Application/Abstractions
- ScrapingOptions in Marathon.Infrastructure/Configuration
- MarathonbetScraper with 4 parsers (Upcoming, Live, EventOdds, Results)
- Helpers: ServerTimeProvider, PeriodScopeMapper, OutcomeCodeMapper, MoscowDateParser
- UserAgentRotatorHandler + Polly v8 resilience pipeline
- ScrapingModule.AddMarathonScraping DI extension
- GlobalUsings.cs aliases for EventId / Configuration disambiguation
- Parser tests with trimmed HTML fixtures
- ScrapeResultsAsync interim no-op (Phase 8 will replace via watch-list polling)
Phase 5 (UI shell — killed mid-final-verify, assumed ~95%):
- Marathon.UI populated: MainLayout, App.razor, Pages (Home, Settings),
Components, Theme (MarathonTheme.cs + Tokens.cs + app.css), Resources
(SharedResource.{cs,ru.resx,en.resx}), Services (ISettingsWriter), wwwroot
- WPF host: App.xaml(.cs), MainWindow.xaml(.cs), Marathon.Hosts.WpfBlazor.csproj
with Microsoft.AspNetCore.Components.WebView.Wpf + MudBlazor + Serilog
- appsettings.json + appsettings.Development.json with all sections wired
- bUnit tests: MainLayoutTests, LocaleSwitcherTests, ThemeToggleTests,
JsonSettingsWriterTests + Support helpers
Cross-phase issues to resolve at next session:
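1. Phase 2 repository classes are 'internal', so the Phase 2 test files in
Marathon.Infrastructure.Tests can't reference them; the test project fails to
build, which also blocks Phase 3's tests. Fix: add InternalsVisibleTo to
Marathon.Infrastructure.csproj.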
2. Phase 5: LocalizationOptions namespace ambiguity (AspNetCore vs Extensions).
3. Phase 5: WpfBlazor Serilog API mismatch.
Reviewer has NOT run on this batch. Move to Phase 4 only after build is green
and a combined parallel-batch reviewer passes.
Phase 3: Infrastructure — Scraping
Status: ✅ Done
Parent plan: PLAN.md
Domain: backend
Objective
Implement the scraping pipeline: HttpClient + AngleSharp for HTML pages with a Playwright
fallback for JS-rendered content, all wrapped in resilient policies (retry, circuit
breaker, rate limiter). All parsing logic is informed by Phase 0's SCRAPE_FINDINGS.md
and SCHEMA_DRAFT.md.
Tasks
- Read spike/SCRAPE_FINDINGS.md and spike/SCHEMA_DRAFT.md from Phase 0 to determine which strategy applies (HTML / Playwright / hybrid).
- Add packages:
  - AngleSharp
  - Microsoft.Extensions.Http
  - Microsoft.Extensions.Http.Resilience (Polly v8 wrapper)
  - Microsoft.Playwright (only if Phase 0 decided Playwright is needed)
- Define abstractions in Marathon.Application/Abstractions/:
  - IOddsScraper:
    - Task<IReadOnlyList<Event>> ScrapeUpcomingAsync(SportCode? filter, CancellationToken ct)
    - Task<OddsSnapshot> ScrapeEventOddsAsync(EventId id, OddsSource source, CancellationToken ct)
    - Task<IReadOnlyList<EventResult>> ScrapeResultsAsync(DateRange range, CancellationToken ct)
  - IBetPlacer: empty marker interface for future betting feature (extension point)
- Implement Marathon.Infrastructure/Scraping/MarathonbetScraper.cs:
  - Composes parsers + HttpClient + (optional) Playwright per Phase 0 strategy
  - Constructor takes IHttpClientFactory, IOptions<ScrapingOptions>, ILogger
  - Methods correspond to the IOddsScraper interface
- Implement parsers in Marathon.Infrastructure/Scraping/Parsers/:
  - UpcomingEventsParser: parses listing page → IReadOnlyList<Event>
  - LiveEventsParser: parses live listing → IReadOnlyList<Event>
  - EventOddsParser: parses event detail page → OddsSnapshot (handles all bet types in spec: Win/Draw/WinFora/Total at Match + Period-N scope)
  - ResultsParser: parses completed events → IReadOnlyList<EventResult>
  - Each parser is unit-testable: takes string html (or IDocument), returns domain types
- ScrapingOptions POCO bound to the appsettings.json Scraping:* section:

      public sealed class ScrapingOptions
      {
          public int PollingIntervalSeconds { get; init; } = 30;
          public int MaxConcurrentRequests { get; init; } = 4;
          public string[] UserAgents { get; init; } = Array.Empty<string>();
          public RetryPolicyOptions RetryPolicy { get; init; } = new();
          public RateLimitOptions RateLimit { get; init; } = new();
          public bool EnablePlaywrightFallback { get; init; } = false;
          public string BaseUrl { get; init; } = "https://www.marathonbet.by";
      }

- Configure named HttpClient "marathonbet" in DI with:
  - BaseAddress = Scraping:BaseUrl
  - User-Agent rotation via DelegatingHandler (UserAgentRotatorHandler)
  - Polly resilience (AddResilienceHandler from Microsoft.Extensions.Http.Resilience):
    - Retry: exponential backoff, max attempts from config
    - Circuit breaker: 5 failures → 30s open
    - Rate limiter: token bucket (configurable RPS)
    - Timeout: per-request from config
- (Optional, if Phase 0 needs it) Implement PlaywrightScraper for SPA-rendered pages, used as a fallback if HTML scraping detects empty/dynamic content.
- Add DI registration in Marathon.Infrastructure/DependencyInjection.cs:
  - services.AddOptions<ScrapingOptions>().Bind(config.GetSection("Scraping"))
  - services.AddHttpClient("marathonbet").AddResilienceHandler(...)
  - services.AddSingleton<IOddsScraper, MarathonbetScraper>()
  - services.AddTransient<UserAgentRotatorHandler>() (transient, because IHttpClientFactory must not reuse a DelegatingHandler instance across handler chains)
- Add appsettings.json template under src/Marathon.Hosts.WpfBlazor/appsettings.json (will move when host phase runs):

      {
        "Scraping": {
          "PollingIntervalSeconds": 30,
          "MaxConcurrentRequests": 4,
          "UserAgents": [
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ..."
          ],
          "RetryPolicy": { "MaxAttempts": 3, "BaseDelayMs": 500 },
          "RateLimit": { "RequestsPerSecond": 1 },
          "EnablePlaywrightFallback": false,
          "BaseUrl": "https://www.marathonbet.by"
        }
      }

- Tests in Marathon.Infrastructure.Tests/Scraping/:
  - Use recorded HTML fixtures (committed under tests/Marathon.Infrastructure.Tests/Fixtures/marathonbet/*.html, small samples only); copy from spike/captures/ if appropriate
  - Test each parser produces expected domain output for the fixtures
  - Test MarathonbetScraper handles network errors gracefully (Polly mock)
  - DO NOT make real network calls in tests
Files to Modify/Create
- src/Marathon.Application/Abstractions/IOddsScraper.cs
- src/Marathon.Application/Abstractions/IBetPlacer.cs (marker interface)
- src/Marathon.Infrastructure/Scraping/MarathonbetScraper.cs
- src/Marathon.Infrastructure/Scraping/Parsers/*.cs (4 parsers)
- src/Marathon.Infrastructure/Scraping/UserAgentRotatorHandler.cs
- src/Marathon.Infrastructure/Scraping/Playwright/PlaywrightScraper.cs (conditional)
- src/Marathon.Infrastructure/Configuration/ScrapingOptions.cs
- tests/Marathon.Infrastructure.Tests/Scraping/Parsers/*Tests.cs
- tests/Marathon.Infrastructure.Tests/Fixtures/marathonbet/*.html
Acceptance Criteria
- Compiles (Big Bang).
- All parser logic is unit-testable without network.
- IOddsScraper is the only public surface used by Application layer.
- appsettings.json template covers every variable parameter.
- IBetPlacer exists as a future-proof extension point.
Notes
- This phase is parallelizable with Phase 2 — disjoint files.
- DO NOT hammer marathonbet.by — tests use local fixtures.
- If Phase 0 found that scraping requires headless browser only, skip the AngleSharp parsers and implement Playwright-only.
- Big Bang: compile-only smoke check after this phase; tests deferred to Phase 9.
Review Checklist
- Compiles
- Parser interface is clean (string html → domain types)
- All Scraping:* config keys are wired through ScrapingOptions
- No real network calls in tests
Review Checklist (filled)
- Compiles (dotnet build src/Marathon.Infrastructure: 0 errors)
- Parser interface is clean (string html → domain types)
- All Scraping:* config keys are wired through ScrapingOptions
- No real network calls in tests (all tests use local HTML fixtures)
Handoff to Next Phase
For Phase 4 (Application + Workers)
Calling ScrapingModule.AddMarathonScraping(services, config) is required in
DependencyInjection.cs to wire all scraping services. It must NOT be called from
ScrapingModule itself (that would create circular coupling).
IOddsScraper.ScrapeResultsAsync is a no-op (returns empty list + logs a warning).
Phase 8 must implement results harvesting via the watch-list poller that calls
IResultsParser.ParseAsync on individual event-detail pages.
IOddsScraper.ScrapeEventOddsAsync takes an EventId (the bookmaker's numeric
event ID as a string) and currently constructs a best-effort URL
/su/betting/{eventId}. Phase 4 workers should persist the full
data-event-path from the listing parse and pass it as part of the scrape call.
A TODO comment marks this location in MarathonbetScraper.cs.
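The URL handling described above can be sketched as follows. This is a minimal illustration, not the code in MarathonbetScraper.cs: the class name EventUrlSketch, the Build method, and the optional eventPath parameter are hypothetical, and the sketch assumes the stored data-event-path value is a complete site-relative path (its real format is not confirmed here).

```csharp
#nullable enable
using System;

// Hypothetical helper: prefers the path captured during the listing parse
// and falls back to the best-effort /su/betting/{eventId} URL.
public static class EventUrlSketch
{
    public static string Build(string eventId, string? eventPath = null)
    {
        if (!string.IsNullOrWhiteSpace(eventPath))
        {
            // ASSUMPTION: eventPath is already a full site-relative path.
            // Normalise to exactly one leading slash.
            return "/" + eventPath.Trim().TrimStart('/');
        }

        if (string.IsNullOrWhiteSpace(eventId))
        {
            throw new ArgumentException("An event ID is required when no path is stored.", nameof(eventId));
        }

        // Best-effort fallback, as the scraper currently does.
        return $"/su/betting/{Uri.EscapeDataString(eventId.Trim())}";
    }
}
```

With a shape like this, Phase 4 workers could pass the persisted path when they have one and still get the current behaviour when they do not.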
Basketball period mode defaults to halves (Period-1, Period-2). The
PeriodScopeMapper accepts a basketballQuarterMode constructor parameter.
Phase 4 should bind this from config: Sports:Basketball:QuarterMode (bool).
A TODO comment is present in ScrapingModule.cs.
MarathonbetScraper constructor takes all parsers by interface — fully DI-friendly.
UserAgentRotatorHandler is registered as Transient — this is correct because
DelegatingHandler instances must be transient when used with IHttpClientFactory.
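For orientation, a rotating User-Agent handler can look roughly like the sketch below. It is not the project's UserAgentRotatorHandler: the class name, the constructor taking a plain list (the real handler presumably reads ScrapingOptions.UserAgents), and the round-robin strategy are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch of a round-robin User-Agent rotator.
public sealed class RotatingUserAgentHandlerSketch : DelegatingHandler
{
    private readonly IReadOnlyList<string> _userAgents;
    private int _counter = -1;

    public RotatingUserAgentHandlerSketch(IReadOnlyList<string> userAgents)
    {
        _userAgents = userAgents ?? throw new ArgumentNullException(nameof(userAgents));
    }

    // Returns the next User-Agent in round-robin order (thread-safe).
    public string NextUserAgent()
    {
        if (_userAgents.Count == 0)
        {
            throw new InvalidOperationException("No User-Agent strings are configured.");
        }

        // The uint cast keeps the index non-negative if the counter overflows.
        var index = (uint)Interlocked.Increment(ref _counter) % (uint)_userAgents.Count;
        return _userAgents[(int)index];
    }

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        // With an empty list the request passes through unchanged.
        if (_userAgents.Count > 0)
        {
            request.Headers.Remove("User-Agent");
            request.Headers.TryAddWithoutValidation("User-Agent", NextUserAgent());
        }

        return base.SendAsync(request, cancellationToken);
    }
}
```

A handler like this is attached to the named client with AddHttpMessageHandler<T>() and registered with AddTransient, so each handler chain built by IHttpClientFactory gets its own instance. Note that the counter in this sketch is per instance, so rotation restarts whenever the factory builds a new chain.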
Named HttpClient "marathonbet" is registered. Resilience pipeline:
- Timeout (per-attempt)
- Retry (exp backoff + jitter, configurable MaxAttempts + BaseDelayMs)
- Circuit Breaker (5 failures / 30s window → 30s break)
- Rate Limiter (token bucket, configurable RequestsPerSecond)
appsettings.scraping.sample.json in src/Marathon.Infrastructure/Scraping/ is
a documentation-only sample. Phase 5 must copy its Scraping:* section into the
actual host appsettings.json.
EventId disambiguation (IMPORTANT)
Marathon.Domain.ValueObjects.EventId conflicts with Microsoft.Extensions.Logging.EventId.
The Infrastructure project resolves this via:
- GlobalUsings.cs: global using LogEventId = Microsoft.Extensions.Logging.EventId;
- Local file aliases: using DomainEventId = Marathon.Domain.ValueObjects.EventId; in parser files that use both namespaces.
- MarathonbetScraper.ScrapeEventOddsAsync uses the fully qualified name Marathon.Domain.ValueObjects.EventId for the parameter type.
Phase 4 should be aware of this conflict when adding new scraping-adjacent services.
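The alias technique can be reproduced in a single file. The sketch below uses stand-in types in hypothetical Sketch.* namespaces instead of the real Marathon.Domain.ValueObjects.EventId and Microsoft.Extensions.Logging.EventId, and the shapes of both stand-ins are assumptions. It only illustrates how aliases remove the ambiguity when both namespaces are imported.

```csharp
using System;
// File-level aliases: each alias names exactly one of the two EventId types.
using DomainEventId = Sketch.Domain.EventId;
using LogEventId = Sketch.Logging.EventId;

namespace Sketch.Domain
{
    // Stand-in for the domain value object (shape assumed).
    public readonly record struct EventId(string Value);
}

namespace Sketch.Logging
{
    // Stand-in for the logging EventId (shape assumed).
    public readonly record struct EventId(int Id, string Name);
}

namespace Sketch.Scraping
{
    // Both namespaces are imported, so a bare "EventId" here would be
    // rejected by the compiler as an ambiguous reference.
    using Sketch.Domain;
    using Sketch.Logging;

    public static class AliasDemo
    {
        // The aliases make it explicit which EventId each parameter is.
        public static string Describe(DomainEventId eventId, LogEventId logId)
            => $"[{logId.Id}:{logId.Name}] scraping event {eventId.Value}";
    }
}
```

New scraping-adjacent services in Phase 4 that need both types can follow the same pattern, or rely on the existing global LogEventId alias if they live in the Infrastructure project.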
Test status
Phase 3 scraping tests (tests/Marathon.Infrastructure.Tests/Scraping/) compile
on their own and are self-contained (HTML fixtures under Fixtures/marathonbet/).
They cannot currently RUN because the shared test project does not build:
Phase 2's repository test files
(Persistence/RoundTripTests.cs, Export/ExcelExporterTests.cs) reference
internal sealed class types in the Infrastructure project. Phase 2
should either:
(a) make repositories public, or
(b) grant the test assembly access via InternalsVisibleTo.
Option (b) is preferred. Either add this item to Marathon.Infrastructure.csproj:
<ItemGroup>
  <InternalsVisibleTo Include="Marathon.Infrastructure.Tests" />
</ItemGroup>
or put the equivalent attribute in a .cs file of the Infrastructure project (for example GlobalUsings.cs):
[assembly: System.Runtime.CompilerServices.InternalsVisibleTo("Marathon.Infrastructure.Tests")]
Files created (Phase 3 scope)
src/Marathon.Application/Abstractions/IOddsScraper.cs
src/Marathon.Application/Abstractions/IBetPlacer.cs
src/Marathon.Infrastructure/Configuration/ScrapingOptions.cs
src/Marathon.Infrastructure/GlobalUsings.cs (EventId disambiguation)
src/Marathon.Infrastructure/Scraping/MarathonbetScraper.cs
src/Marathon.Infrastructure/Scraping/ScrapingModule.cs
src/Marathon.Infrastructure/Scraping/UserAgentRotatorHandler.cs
src/Marathon.Infrastructure/Scraping/appsettings.scraping.sample.json
src/Marathon.Infrastructure/Scraping/Parsers/IServerTimeProvider.cs
src/Marathon.Infrastructure/Scraping/Parsers/ServerTimeProvider.cs
src/Marathon.Infrastructure/Scraping/Parsers/MoscowDateParser.cs
src/Marathon.Infrastructure/Scraping/Parsers/OutcomeCodeMapper.cs
src/Marathon.Infrastructure/Scraping/Parsers/PeriodScopeMapper.cs
src/Marathon.Infrastructure/Scraping/Parsers/EventListingParserBase.cs
src/Marathon.Infrastructure/Scraping/Parsers/IUpcomingEventsParser.cs
src/Marathon.Infrastructure/Scraping/Parsers/UpcomingEventsParser.cs
src/Marathon.Infrastructure/Scraping/Parsers/ILiveEventsParser.cs
src/Marathon.Infrastructure/Scraping/Parsers/LiveEventsParser.cs
src/Marathon.Infrastructure/Scraping/Parsers/IEventOddsParser.cs
src/Marathon.Infrastructure/Scraping/Parsers/EventOddsParser.cs
src/Marathon.Infrastructure/Scraping/Parsers/IResultsParser.cs
src/Marathon.Infrastructure/Scraping/Parsers/ResultsParser.cs
tests/Marathon.Infrastructure.Tests/Scraping/OutcomeCodeMapperTests.cs
tests/Marathon.Infrastructure.Tests/Scraping/MoscowDateParserTests.cs
tests/Marathon.Infrastructure.Tests/Scraping/ServerTimeProviderTests.cs
tests/Marathon.Infrastructure.Tests/Scraping/UpcomingEventsParserTests.cs
tests/Marathon.Infrastructure.Tests/Scraping/EventOddsParserTests.cs
tests/Marathon.Infrastructure.Tests/Scraping/ResultsParserTests.cs
tests/Marathon.Infrastructure.Tests/Fixtures/marathonbet/listing-sample.html
tests/Marathon.Infrastructure.Tests/Fixtures/marathonbet/event-football-sample.html
tests/Marathon.Infrastructure.Tests/Fixtures/marathonbet/event-basketball-sample.html
tests/Marathon.Infrastructure.Tests/Fixtures/marathonbet/event-completed-sample.html