maraphon-app/plans/initial-implementation/phase-2-storage.md

Phase 2: Infrastructure — Storage

Status: Not Started
Parent plan: PLAN.md
Domain: backend

Objective

Implement persistent storage: EF Core + SQLite (WAL) with migrations, repository implementations of the Application layer's interfaces, and a ClosedXML-based Excel exporter that produces files matching the customer's wide-column spec with date-range filenames.

Tasks

  • Add packages to Marathon.Infrastructure (via Directory.Packages.props):
    • Microsoft.EntityFrameworkCore
    • Microsoft.EntityFrameworkCore.Sqlite
    • Microsoft.EntityFrameworkCore.Design
    • ClosedXML
  • Add Application-layer abstractions in Marathon.Application/Abstractions/:
    • IRepository<TKey, TEntity> — generic CRUD: GetAsync, ListAsync, AddAsync, UpdateAsync, DeleteAsync, SaveChangesAsync
    • IEventRepository : IRepository<EventId, Event> — adds ListByDateRangeAsync, ListBySportAsync
    • ISnapshotRepository : IRepository<Guid, OddsSnapshot> — adds ListByEventAsync(EventId, DateTimeOffset from, DateTimeOffset to)
    • IResultRepository : IRepository<EventId, EventResult>
    • IAnomalyRepository : IRepository<Guid, Anomaly>
    • IExcelExporter: ExportAsync(DateRange range, ExportKind kind, string outputPath), where ExportKind is an enum with members PreMatch, Live and Combined
  • Implement MarathonDbContext in Marathon.Infrastructure/Persistence/:
    • DbSet<EventEntity>, DbSet<SnapshotEntity>, DbSet<BetEntity>, DbSet<EventResultEntity>, DbSet<AnomalyEntity>, DbSet<SportEntity>, DbSet<LeagueEntity>
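    • Enable SQLite WAL mode by executing PRAGMA journal_mode=WAL; once the database is opened (for example, right after applying migrations). Microsoft.Data.Sqlite has no connection-string keyword for the journal mode, but WAL is persistent: the setting is stored in the database file and survives reopening.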
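    • Use IEntityTypeConfiguration<TEntity> classes (one per entity in Configurations/), registered in OnModelCreating via modelBuilder.ApplyConfigurationsFromAssembly(typeof(MarathonDbContext).Assembly)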
    • Map domain types ↔ EF entities via mapping helpers (don't pollute domain)
    • Indexes: (EventId) on Snapshots and Bets; (Sport, ScheduledAt) on Events
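  • Generate the InitialCreate migration into Migrations/ with the EF Core CLI (dotnet-ef tool). Marathon.Infrastructure is a class library configured through DI, so it also needs an IDesignTimeDbContextFactory<MarathonDbContext> (or a --startup-project that registers the DbContext). Without one of these, dotnet ef cannot create the context at design time: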
    dotnet ef migrations add InitialCreate --project src/Marathon.Infrastructure
    
  • Implement repositories in Marathon.Infrastructure/Persistence/Repositories/:
    • EventRepository, SnapshotRepository, ResultRepository, AnomalyRepository
    • Each maps EF entity ↔ domain type at the boundary
  • Implement ExcelExporter in Marathon.Infrastructure/Export/:
    • Uses ClosedXML
    • Output filename: Marathon_<from yyyy-MM-dd>_to_<to yyyy-MM-dd>.xlsx
    • Sheets: both PreMatch and Live for ExportKind.Combined; only the matching sheet for ExportKind.PreMatch or ExportKind.Live
    • Wide columns matching customer spec exactly:
      • Event metadata: RowNum, SportCode, Sport, Country, League, Category, DateFull, Day, Month, Year, Time, EventId
      • Match-level bets: Bet_Match_Win_1, Bet_Match_Draw, Bet_Match_Win_2, Bet_Match_Win_Fora_1_Value, Bet_Match_Win_Fora_1_Rate, etc.
      • Period-N bets: columns generated dynamically for periods 1..N, where N is the highest period number seen in the exported events (Bet_Period-1_Win_1, ...)
      • For Live export, prefix with Live_ instead of Bet_
      • Final column: WinnerSide. Its value is 1 or 2, whichever side has the lower pre-match match-win rate (that is, the bookmaker's favourite), per spec §1.2.4 / §2.2.4
    • Implement a BetRowDenormalizer helper that takes a List<Bet> and produces a flat Dictionary<string, object?> keyed by spec column names.
  • Add a DI extension AddMarathonInfrastructure(IServiceCollection, IConfiguration) in Marathon.Infrastructure/DependencyInjection.cs that wires up DbContext + repositories + exporter using IConfiguration for Storage:DatabasePath and Storage:ExportDirectory.
  • Tests in Marathon.Infrastructure.Tests:
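    • In-memory SQLite (Microsoft.Data.Sqlite with Data Source=<name>;Mode=Memory;Cache=Shared). Keep one connection open for the whole test, because a shared in-memory database is discarded when its last connection closes.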
    • Test: insert + retrieve Event, OddsSnapshot, Anomaly round-trip preserves all domain fields
    • Test: ExcelExporter generates a workbook with the expected sheet names, headers matching spec, and row count matching event count
    • Test: filename pattern matches Marathon_yyyy-MM-dd_to_yyyy-MM-dd.xlsx
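    • Test: WAL mode is enabled after open. Use a temporary file-backed database and assert that PRAGMA journal_mode returns wal. An in-memory database cannot be used here, because it only supports the memory or off journal modes.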
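The IExcelExporter abstraction and the sheet rule above depend on two small Application-layer types. Below is a minimal C# sketch of how ExportKind and DateRange might look, plus a helper that maps an ExportKind to sheet names. Several details are assumptions, not decisions from this plan: DateRange holding DateOnly values, its inclusive Contains check, and the ExportSheets helper.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shape of Marathon.Application/ExportKind.cs.
public enum ExportKind { PreMatch, Live, Combined }

// Hypothetical shape of Marathon.Application/DateRange.cs:
// calendar days, inclusive on both ends (DateOnly is an assumption).
public sealed record DateRange
{
    public DateOnly From { get; }
    public DateOnly To { get; }

    public DateRange(DateOnly from, DateOnly to)
    {
        // Reject inverted ranges up front so every caller sees the same rule.
        if (to < from)
            throw new ArgumentException("'to' must not be earlier than 'from'.", nameof(to));
        From = from;
        To = to;
    }

    // Inclusive: days equal to From or To are both inside the range.
    public bool Contains(DateOnly day) => day >= From && day <= To;
}

// Hypothetical helper: which worksheets an export of a given kind contains.
public static class ExportSheets
{
    public static IReadOnlyList<string> For(ExportKind kind) => kind switch
    {
        ExportKind.PreMatch => new[] { "PreMatch" },
        ExportKind.Live => new[] { "Live" },
        ExportKind.Combined => new[] { "PreMatch", "Live" },
        _ => throw new ArgumentOutOfRangeException(nameof(kind), kind, "Unknown export kind."),
    };
}
```

With this helper, the exporter can iterate ExportSheets.For(kind) and create one worksheet per returned name, so the Combined case needs no special handling.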

Files to Modify/Create

  • src/Marathon.Application/Abstractions/I*.cs — repository interfaces
  • src/Marathon.Application/ExportKind.cs, DateRange.cs
  • src/Marathon.Infrastructure/Persistence/MarathonDbContext.cs
  • src/Marathon.Infrastructure/Persistence/Entities/*.cs
  • src/Marathon.Infrastructure/Persistence/Configurations/*Configuration.cs
  • src/Marathon.Infrastructure/Persistence/Repositories/*Repository.cs
  • src/Marathon.Infrastructure/Persistence/Mapping.cs — entity ↔ domain
  • src/Marathon.Infrastructure/Export/ExcelExporter.cs
  • src/Marathon.Infrastructure/Export/BetRowDenormalizer.cs
  • src/Marathon.Infrastructure/Migrations/* — EF migrations
  • src/Marathon.Infrastructure/DependencyInjection.cs
  • tests/Marathon.Infrastructure.Tests/**
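Mapping.cs is meant to keep EF concerns out of the domain. The C# sketch below shows one possible shape for Event. The member list, EventId wrapping a long, and the UTC-ticks column are all hypothetical. The ticks column is one option for a real provider limitation: EF Core's SQLite provider cannot translate comparisons or ORDER BY on DateTimeOffset, which ListByDateRangeAsync would need. EF Core's built-in DateTimeOffsetToBinaryConverter is an alternative approach.

```csharp
using System;

// Hypothetical domain types: immutable, with no EF Core attributes.
public readonly record struct EventId(long Value);
public sealed record Event(EventId Id, string SportCode, string League, DateTimeOffset ScheduledAt);

// Hypothetical persistence entity (Persistence/Entities/EventEntity.cs): mutable, EF-shaped.
public sealed class EventEntity
{
    public long Id { get; set; }
    public string SportCode { get; set; } = "";
    public string League { get; set; } = "";
    // UTC ticks are a plain integer, so SQLite can compare and sort them in queries.
    public long ScheduledAtUtcTicks { get; set; }
}

// Sketch of Persistence/Mapping.cs: the only place that knows both shapes.
public static class Mapping
{
    public static EventEntity ToEntity(this Event e) => new()
    {
        Id = e.Id.Value,
        SportCode = e.SportCode,
        League = e.League,
        ScheduledAtUtcTicks = e.ScheduledAt.UtcTicks,
    };

    // The instant survives the round trip; the original offset does not (it comes back as +00:00).
    public static Event ToDomain(this EventEntity e) =>
        new(new EventId(e.Id), e.SportCode, e.League,
            new DateTimeOffset(e.ScheduledAtUtcTicks, TimeSpan.Zero));
}
```

If the round-trip test must also preserve the original UTC offset, the entity would need an extra offset column.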

Acceptance Criteria

  • All Infrastructure code compiles (Big Bang: compile-only smoke check OK).
  • DbContext + repositories cover all domain types.
  • Excel exporter output matches customer spec column names exactly (no typos in Bet_Match_Win_Fora_1_Value, hyphens in Period-1, etc.).
  • Filename includes inclusive date range from event scheduling.
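The filename criterion can be met with a small pure function. The C# sketch below formats dates with the invariant culture, so a machine culture with a non-Gregorian default calendar cannot change the year. The class name ExportFileName is hypothetical. Taking the earliest and latest UTC scheduled day as the inclusive range is an assumption; the spec may intend local dates.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class ExportFileName
{
    // "Marathon_<from yyyy-MM-dd>_to_<to yyyy-MM-dd>.xlsx", culture-independent.
    public static string For(DateOnly from, DateOnly to) =>
        "Marathon_" + from.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture) +
        "_to_" + to.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture) + ".xlsx";

    // Inclusive range from the exported events themselves: earliest and latest scheduled day.
    // Using the UTC calendar day is an assumption.
    public static string ForEvents(IEnumerable<DateTimeOffset> scheduledAt)
    {
        var days = scheduledAt.Select(t => DateOnly.FromDateTime(t.UtcDateTime)).ToList();
        if (days.Count == 0)
            throw new InvalidOperationException("Cannot name an export with no events.");
        return For(days.Min(), days.Max());
    }
}
```

For example, events on 2024-05-03 and 2024-05-01 yield Marathon_2024-05-01_to_2024-05-03.xlsx.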

Notes

  • This phase is parallelizable with Phase 3 (Scraping) — they touch disjoint files.
  • ExcelExporter uses normalized DB data and produces wide columns — DO NOT store data in wide format in SQLite.
  • Big Bang: do NOT run the full test suite. A dotnet build smoke check is acceptable.
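The wide Excel format is produced from normalized rows only at export time. The C# sketch below shows one way BetRowDenormalizer could do this, together with the WinnerSide rule. The Bet record is hypothetical: a null Period means match-level, and Market holds the spec suffix such as "Win_1". Two other behaviors are placeholders, not spec decisions: the last value wins when a market repeats, and WinnerSide returns null on a tie or a missing rate.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical normalized bet: Period is null for match-level markets, otherwise 1, 2, ...
// Market is the spec suffix, e.g. "Win_1", "Draw", "Win_Fora_1_Value".
public sealed record Bet(int? Period, string Market, decimal Value);

public static class BetRowDenormalizer
{
    // Flattens normalized bets into one wide row keyed by spec column names.
    // live = false gives "Bet_..." columns; live = true gives "Live_..." columns.
    public static Dictionary<string, object?> Denormalize(IEnumerable<Bet> bets, bool live)
    {
        string prefix = live ? "Live" : "Bet";
        var row = new Dictionary<string, object?>(StringComparer.Ordinal);
        foreach (var bet in bets)
        {
            string scope = bet.Period is int p ? $"Period-{p}" : "Match";
            // If a market appears twice, the later value overwrites the earlier one.
            row[$"{prefix}_{scope}_{bet.Market}"] = bet.Value;
        }
        return row;
    }

    // WinnerSide: 1 or 2, whichever side has the lower pre-match match-win rate.
    // Returning null for a missing rate or a tie is a placeholder; the spec should decide.
    public static int? WinnerSide(IReadOnlyDictionary<string, object?> preMatchRow)
    {
        if (preMatchRow.TryGetValue("Bet_Match_Win_1", out var w1) && w1 is decimal r1 &&
            preMatchRow.TryGetValue("Bet_Match_Win_2", out var w2) && w2 is decimal r2)
        {
            if (r1 < r2) return 1;
            if (r2 < r1) return 2;
        }
        return null;
    }
}
```

Because the row is a dictionary, the exporter can write cells by looking up each header name. Columns a given event lacks simply stay empty.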

Review Checklist

  • Solution builds (compile-only)
  • Excel column names match customer spec exactly (cross-check against TZ §1.2 / §2.2)
  • Filename pattern matches Marathon_yyyy-MM-dd_to_yyyy-MM-dd.xlsx
  • No EF attributes on domain types: persistence mapping lives in Configurations/ and Mapping.cs
  • WAL mode enabled (PRAGMA journal_mode=WAL applied after opening the database, not via the connection string)
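The column-name checklist item is easier to verify if the header row comes from one function. The exporter test can then compare the worksheet's first row against that function's output. The C# sketch below is hypothetical. The metadata order follows this plan. The market lists are passed in as parameters, because the full list ("etc.") lives in the customer spec, not here.

```csharp
using System.Collections.Generic;

public static class ExportHeaders
{
    // Event metadata columns in the order this plan lists them.
    private static readonly string[] Metadata =
    {
        "RowNum", "SportCode", "Sport", "Country", "League", "Category",
        "DateFull", "Day", "Month", "Year", "Time", "EventId",
    };

    // Builds the header row for one sheet; matchMarkets and periodMarkets are hypothetical inputs.
    public static List<string> Build(
        IReadOnlyList<string> matchMarkets,
        IReadOnlyList<string> periodMarkets,
        int maxPeriods,
        bool live)
    {
        string prefix = live ? "Live" : "Bet";
        var headers = new List<string>(Metadata);
        foreach (var market in matchMarkets)
            headers.Add($"{prefix}_Match_{market}");
        // Period columns for 1..maxPeriods, e.g. "Bet_Period-1_Win_1" (hyphen inside "Period-1").
        for (int p = 1; p <= maxPeriods; p++)
            foreach (var market in periodMarkets)
                headers.Add($"{prefix}_Period-{p}_{market}");
        headers.Add("WinnerSide");
        return headers;
    }
}
```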

Handoff to Next Phase