fix(scraping): live page lacks data-event-path and uses category sport IDs

Previously LiveEventsParser returned 0 events from /su/live because two
real differences between the live page and the pre-match listing weren't
handled:

1. Live rows omit data-event-path entirely. They expose only
   data-event-treeId, and the bookmaker routes live events under
   /su/live/<treeId> rather than /su/betting/<...>.

2. The closest data-sport-treeId ancestor on the live page is a
   category-tree wrapper (26418=Football-live, 45356=Basketball-live, …)
   instead of the canonical breadcrumb sport ID (Football=11,
   Basketball=6, Tennis=22723, Hockey=43658) the rest of the app uses.
   The pre-match listing carries the canonical ID directly.

Changes:

* EventListingParserBase.ParseRow: data-event-path becomes optional. For
  live rows we synthesize EventPath = "live/<treeId>" from
  data-event-treeId (validated as digits-only). Pre-match validation is
  unchanged.

* New ExtractSportCodeFromLive walks ancestors looking for a sport-tree
  ID and maps it through a small live-id → canonical-id table covering
  the four scoped sports. Out-of-scope sports (cybersport, volleyball,
  table tennis) are intentionally left unmapped — they keep their raw
  category ID and the UI renders them via SportLabels as "Sport <N>".

* MarathonbetScraper.ResolveEventDetailPath: dispatches between
  /su/live/<treeId> and /su/betting/<...> based on the EventPath prefix.
  Removes the duplicated path-building between ScrapeEventOddsAsync and
  ScrapeEventResultAsync.

* New regression tests covering all three behaviors against a real
  /su/live capture (16 events, 5 sport categories).
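
For reviewers, the three behaviors above can be sketched roughly as
follows. This is a minimal illustration, not the code in this commit:
the class name LiveRoutingSketch and all three method signatures are
hypothetical, the table lists only the two live IDs quoted in this
message, and the pre-match branch assumes EventPath is the suffix that
follows /su/betting/.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; names and signatures are illustrative only.
public static class LiveRoutingSketch
{
    // Live category-tree ID -> canonical breadcrumb sport ID.
    // Only the two pairs quoted in the commit message are listed here.
    private static readonly Dictionary<int, int> LiveToCanonical = new()
    {
        [26418] = 11, // Football-live -> Football
        [45356] = 6,  // Basketball-live -> Basketball
    };

    // Builds "live/<treeId>" when treeId is non-empty and ASCII digits only;
    // returns null otherwise.
    public static string? SynthesizeLiveEventPath(string? treeId) =>
        !string.IsNullOrEmpty(treeId) && treeId.All(c => c is >= '0' and <= '9')
            ? $"live/{treeId}"
            : null;

    // Unmapped (out-of-scope) sports keep their raw category ID.
    public static int MapSportCode(int liveTreeId) =>
        LiveToCanonical.TryGetValue(liveTreeId, out var canonical)
            ? canonical
            : liveTreeId;

    // Dispatches on the EventPath prefix. Assumption: a pre-match
    // EventPath is the part of the URL after /su/betting/.
    public static string ResolveEventDetailPath(string eventPath) =>
        eventPath.StartsWith("live/", StringComparison.Ordinal)
            ? $"/su/{eventPath}"
            : $"/su/betting/{eventPath}";
}
```

Keeping the "live/" prefix inside EventPath lets a single resolver
serve both the odds and the result scrapes without a separate flag.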

Also: rewrites the stale "Disabled until Phase 8" hint copy on the
Settings.Workers.ResultsPollerEnabled flag — Phase 8 shipped, so the
results poller is safe to enable.
2026-05-09 16:07:03 +03:00
parent 537b78ab83
commit 004dbeae8b
8 changed files with 20459 additions and 45 deletions
@@ -0,0 +1,78 @@
using FluentAssertions;
using Marathon.Infrastructure.Scraping.Parsers;
using Microsoft.Extensions.Logging.Abstractions;

namespace Marathon.Infrastructure.Tests.Scraping;

/// <summary>
/// Regression test for the live-listing parser. The fixture
/// <c>diag-live-sample.html</c> is a real /su/live capture from
/// 2026-05-09 with 16 in-progress matches. Before the fix, the parser
/// returned 0 events because:
/// <list type="bullet">
///   <item>Live rows omit <c>data-event-path</c>, a pre-match-only
///         attribute that the parser treated as mandatory.</item>
///   <item>The closest <c>data-sport-treeId</c> ancestor on the live
///         page is a category-tree wrapper (e.g. 26418=Football), not
///         the canonical breadcrumb sport ID (11=Football) the rest of
///         the app uses.</item>
/// </list>
/// </summary>
public sealed class LiveEventsParserTests
{
    private static readonly string FixturePath = Path.Combine(
        AppContext.BaseDirectory,
        "Fixtures", "marathonbet", "diag-live-sample.html");

    private readonly LiveEventsParser _sut;

    public LiveEventsParserTests()
    {
        var serverTimeProvider = new ServerTimeProvider(
            NullLogger<ServerTimeProvider>.Instance);
        _sut = new LiveEventsParser(
            serverTimeProvider,
            NullLogger<LiveEventsParser>.Instance);
    }

    [Fact]
    public async Task ParseAsync_LiveSample_ReturnsAllSixteenLiveEvents()
    {
        var html = await File.ReadAllTextAsync(FixturePath);

        var events = await _sut.ParseAsync(html);

        events.Should().HaveCount(16);
    }

    [Fact]
    public async Task ParseAsync_LiveSample_SynthesizesLiveTreeIdEventPaths()
    {
        var html = await File.ReadAllTextAsync(FixturePath);

        var events = await _sut.ParseAsync(html);

        events.Should().OnlyContain(e =>
            e.EventPath != null &&
            e.EventPath.StartsWith("live/"));
    }

    [Fact]
    public async Task ParseAsync_LiveSample_MapsKnownSportNamesToCanonicalIds()
    {
        // The live page wraps rows in containers whose data-sport-treeId is a
        // category ID (e.g. 26418 for Football-live). The parser resolves
        // these to canonical breadcrumb IDs via the sport-category-label text
        // for the known sports (Football=11, Basketball=6, Tennis=22723,
        // Hockey=43658). Other sports (cybersport, table tennis, …) keep
        // their category-tree ID and the UI renders them as "Sport <N>".
        var html = await File.ReadAllTextAsync(FixturePath);

        var events = await _sut.ParseAsync(html);

        // The fixture has Эльче-Алавес (Elche-Alavés) under Футбол (Football) → must be sport=11
        var football = events.SingleOrDefault(e => e.Id.Value == "26340575");
        football.Should().NotBeNull();
        football!.Sport.Value.Should().Be(11);
    }
}