feat: YAML content importer + phys/ct-2024 collection (proof)

content/phys/ct-2024.yaml: 15 questions from ЦЭ,ЦТ 2024 across 6 topics
(kinem, mol, emf, electro, magnet, optics) as proof of format.

backend/scripts/import-content.js: unified importer:
- Validates schema (subject, year, options, exactly-1-correct)
- Aliases (kinem, mol, ...) resolve to Russian topic names via get-or-create
- Deduplicates by first 80 chars of text (matches legacy seed_*.js behavior)
- Runs in a single transaction, idempotent re-runs

On fresh DB: 13 added (2 dedup collisions: same 80-char prefix, expected).
On prod DB: 0 added (all already exist from legacy seeds).
Second run on either: 0 added (dedup works).

Legacy seed_phys_ct2024.js kept as backup; see content/README.md for the
migration guide.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -0,0 +1,80 @@
# Content as data

Question collections live here as YAML, imported via a single CLI.
This replaces the ad-hoc `backend/scripts/seed_phys_ct2024.js` pattern.

## Import command

```sh
cd backend
npm run import:content -- ../content/phys/ct-2024.yaml
```

## File format

```yaml
meta:
  subject: phys          # phys | math | bio | chem
  year: 2024             # exam year (integer)
  source: "ЦЭ,ЦТ 2024"   # optional label shown in import log

topics:
  kinem:                 # topic alias (see aliases below)
    - text: |
        Question text (multi-line supported, LaTeX with \( \) works)
      difficulty: 1      # 1=easy, 2=medium, 3=hard (default: 1)
      explanation: "Solution explanation"  # optional
      options:
        - { text: "Answer A", correct: true }  # exactly ONE correct
        - { text: "Answer B" }
        - { text: "Answer C" }

  "Full topic name":     # or use the full Russian name; it will be found or created
    - text: "..."
      options: [...]
```
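
The schema rules in the comments above (known subject, integer year, exactly one correct option) can be sketched as a small validator. This is a minimal illustration and not the code in `backend/scripts/import-content.js`: the function name `validateCollection`, the error strings, and the assumption that the YAML has already been parsed into a plain object are all hypothetical.

```javascript
// Hypothetical sketch of the schema checks; the real importer may differ.
const SUBJECTS = ["phys", "math", "bio", "chem"];

// Takes an already-parsed collection object, returns a list of error strings.
function validateCollection(doc) {
  const errors = [];
  const meta = (doc && doc.meta) || {};
  if (!SUBJECTS.includes(meta.subject)) {
    errors.push(`meta.subject must be one of: ${SUBJECTS.join(", ")}`);
  }
  if (!Number.isInteger(meta.year)) {
    errors.push("meta.year must be an integer");
  }

  const topics = (doc && doc.topics) || {};
  for (const [topic, questions] of Object.entries(topics)) {
    if (!Array.isArray(questions)) {
      errors.push(`${topic}: expected a list of questions`);
      continue;
    }
    questions.forEach((q, i) => {
      const where = `${topic}[${i}]`;
      const question = q || {};
      if (typeof question.text !== "string" || question.text.trim() === "") {
        errors.push(`${where}: text is required`);
      }
      // Count options explicitly flagged as correct; exactly one is allowed.
      const options = Array.isArray(question.options) ? question.options : [];
      const correct = options.filter((o) => o && o.correct === true).length;
      if (correct !== 1) {
        errors.push(`${where}: expected exactly 1 correct option, found ${correct}`);
      }
    });
  }
  return errors;
}
```

Collecting every error, rather than throwing on the first one, lets a single run report all problems in a file.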

## Topic aliases (subject=phys)

| Alias   | Topic name                 |
|---------|----------------------------|
| kinem   | Кинематика                 |
| dynam   | Динамика                   |
| cons    | Законы сохранения          |
| mol     | Молекулярная физика        |
| thermo  | Термодинамика              |
| electro | Электростатика             |
| dc      | Постоянный ток             |
| magnet  | Магнетизм                  |
| emf     | Электромагнитная индукция  |
| optics  | Оптика                     |
| quantum | Квантовая и ядерная физика |
| waves   | Колебания и волны          |

For other topic names, use the full Russian name as the key; the importer
looks it up in the database (case-insensitive) or creates a new topic.
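
The alias lookup followed by a case-insensitive get-or-create can be sketched as follows. The alias table is copied from above; the function name `getOrCreateTopic` and the in-memory `topics` array standing in for the database table are assumptions made for illustration only.

```javascript
// Alias table from this README; the rest is a hypothetical sketch.
const PHYS_ALIASES = new Map([
  ["kinem", "Кинематика"],
  ["dynam", "Динамика"],
  ["cons", "Законы сохранения"],
  ["mol", "Молекулярная физика"],
  ["thermo", "Термодинамика"],
  ["electro", "Электростатика"],
  ["dc", "Постоянный ток"],
  ["magnet", "Магнетизм"],
  ["emf", "Электромагнитная индукция"],
  ["optics", "Оптика"],
  ["quantum", "Квантовая и ядерная физика"],
  ["waves", "Колебания и волны"],
]);

// `topics` is a plain array of topic names standing in for the topics table.
function getOrCreateTopic(key, topics) {
  // An alias resolves to its Russian name; any other key is used as the name.
  const name = PHYS_ALIASES.get(key) ?? key;
  const wanted = name.toLowerCase();
  const existing = topics.find((t) => t.toLowerCase() === wanted);
  if (existing !== undefined) {
    return { name: existing, created: false };
  }
  topics.push(name);
  return { name, created: true };
}
```

In this sketch an existing topic keeps its stored spelling, so a key that differs only in case does not create a near-duplicate topic.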

## Dedup logic

A question is skipped if the first 80 characters of its text match a
question that already exists in the database for the same subject. This
matches the legacy `seed_phys_*.js` behavior and keeps re-runs idempotent.
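
A minimal sketch of the prefix check, assuming the existing question texts for the subject have already been loaded into an array. The name `filterNewQuestions` is hypothetical, and so are the choices to compare raw text without trimming and to count characters as UTF-16 code units. The sketch also treats two questions in the same file that share a prefix as a collision, which is consistent with the fresh-database result in the commit message (15 questions, 13 added).

```javascript
// Hypothetical sketch of 80-character prefix dedup; not the importer's code.
const PREFIX_LENGTH = 80;

function dedupKey(text) {
  // Assumption: raw text, no trimming or normalization, UTF-16 code units.
  return text.slice(0, PREFIX_LENGTH);
}

// `incoming` is a list of { text, ... } questions for one subject;
// `existingTexts` is the list of question texts already stored for it.
function filterNewQuestions(incoming, existingTexts) {
  const seen = new Set(existingTexts.map(dedupKey));
  const added = [];
  let skipped = 0;
  for (const question of incoming) {
    const key = dedupKey(question.text);
    if (seen.has(key)) {
      skipped += 1;
      continue;
    }
    seen.add(key); // later questions in the same batch collide with this one
    added.push(question);
  }
  return { added, skipped };
}
```

Running the function a second time, with the previously added texts passed as `existingTexts`, adds nothing, which is the idempotency property described above.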

## Migrating a legacy seed_*.js

1. Copy the file structure from `content/phys/ct-2024.yaml`.
2. Convert each `q(T.kinem, text, opts, diff, year, expl)` call to YAML:
   - `T.kinem` → `topics: kinem:`
   - `text` → `text: |` (use a literal block for multi-line text)
   - `opts: [{t: "...", c: true}, ...]` → `options: [{text: "...", correct: true}, ...]`
   - `diff` → `difficulty:`
   - `expl` → `explanation:`
3. Run `npm run import:content -- ../content/<subject>/<file>.yaml`.
4. Verify that the output shows the expected `added` count.
5. Keep the legacy `seed_*.js` file as a backup until the import is verified.
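
The field mapping in step 2 can be sketched as a small converter that turns the arguments of one legacy `q(...)` call into the object shape used by the YAML format. The function name `convertLegacyCall` is hypothetical, as is passing the topic as an alias string rather than a `T.kinem` lookup. Returning `year` separately, on the assumption that it belongs in `meta.year`, is also a guess. Serializing the result to YAML is left out because it needs a YAML library.

```javascript
// Hypothetical helper for migrating one legacy q(...) call; a sketch only.
function convertLegacyCall(topicAlias, text, opts, diff, year, expl) {
  const question = {
    text,
    difficulty: diff ?? 1, // the file format defaults difficulty to 1
    options: opts.map((o) =>
      // Only the correct option carries the `correct` flag, as in the example.
      o.c ? { text: o.t, correct: true } : { text: o.t }
    ),
  };
  if (expl) {
    question.explanation = expl; // explanation is optional
  }
  // Assumption: `year` is file-level data and goes into meta.year.
  return { topic: topicAlias, year, question };
}

const converted = convertLegacyCall(
  "kinem",
  "Question text",
  [{ t: "Answer A", c: true }, { t: "Answer B" }],
  2,
  2024,
  "Solution explanation"
);
console.log(JSON.stringify(converted, null, 2));
```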

## Collections

| File              | Subject | Year | Source     | Questions         |
|-------------------|---------|------|------------|-------------------|
| phys/ct-2024.yaml | Физика  | 2024 | ЦЭ,ЦТ 2024 | 13 (proof subset) |