Backend:
- Document + MemoryEntry models with Alembic migration (GIN FTS index)
- File upload endpoint with path traversal protection (sanitized filenames)
- Background document text extraction (PyMuPDF)
- Full-text search on extracted_text via PostgreSQL tsvector/tsquery
- Memory CRUD with enum-validated categories/importance, field allow-list
- AI tools: save_memory, search_documents, get_memory (Claude function calling)
- Tool execution loop in stream_ai_response (multi-turn tool use)
- Context assembly: injects critical memory + relevant doc excerpts
- File storage abstraction (local filesystem, S3-swappable)
- Secure file deletion (DB flush before disk delete)
Frontend:
- Document upload dialog (drag-and-drop + file picker)
- Document list with status badges, search, download (via authenticated blob)
- Document viewer with extracted text preview
- Memory list grouped by category with importance color coding
- Memory editor with category/importance dropdowns
- Documents + Memory pages with full CRUD
- Enabled sidebar navigation for both sections
Review fixes applied:
- Sanitized upload filenames (path traversal prevention)
- Download via axios blob (not bare <a href>, preserves auth)
- Route ordering: /search before /{id}/reindex
- Memory update allows is_active=False + field allow-list
- MemoryEditor form resets on mode switch
- Literal enum validation on category/importance schemas
- DB flush before file deletion for data integrity
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
4.9 KiB
4.9 KiB
Phase 4: Documents & Memory — Subplan
Goal
Deliver document upload/processing with full-text search, a per-user memory system, Claude AI tools (save_memory, search_documents, get_memory), and context assembly that injects memory + document excerpts into conversations.
Prerequisites
- Phase 3 completed: skills, personal context, context assembly layering
- Latest migration is
003
Database Schema (Phase 4)
documents table
| Column | Type | Constraints |
|---|---|---|
| id | UUID | PK (inherited) |
| user_id | UUID | FK -> users.id CASCADE, NOT NULL, indexed |
| filename | VARCHAR(255) | NOT NULL (stored name) |
| original_filename | VARCHAR(255) | NOT NULL |
| storage_path | TEXT | NOT NULL |
| mime_type | VARCHAR(100) | NOT NULL |
| file_size | BIGINT | NOT NULL |
| doc_type | VARCHAR(50) | NOT NULL, default 'other' |
| extracted_text | TEXT | NULL |
| processing_status | VARCHAR(20) | NOT NULL, default 'pending' |
| metadata_ | JSONB | NULL |
| created_at | TIMESTAMPTZ | inherited |
GIN index on to_tsvector('english', coalesce(extracted_text, '')).
memory_entries table
| Column | Type | Constraints |
|---|---|---|
| id | UUID | PK (inherited) |
| user_id | UUID | FK -> users.id CASCADE, NOT NULL, indexed |
| category | VARCHAR(50) | NOT NULL |
| title | VARCHAR(255) | NOT NULL |
| content | TEXT | NOT NULL |
| source_document_id | UUID | FK -> documents.id SET NULL, NULL |
| importance | VARCHAR(20) | NOT NULL, default 'medium' |
| is_active | BOOLEAN | NOT NULL, default true |
| created_at | TIMESTAMPTZ | inherited |
Tasks
A. Backend Models & Migration (Tasks 1–4)
- A1. Create
backend/app/models/document.pywith GIN index. - A2. Create
backend/app/models/memory_entry.py. - A3. Update
models/__init__.py+ User model (adddocuments,memory_entriesrelationships). - A4. Create migration
004_create_documents_and_memory_entries.py.
B. Backend Config & Utilities (Tasks 5–7)
- B5. Add
UPLOAD_DIR,MAX_UPLOAD_SIZE_MBto config. Addpymupdf,aiofilestopyproject.toml. - B6. Create
backend/app/utils/file_storage.py: save_upload, get_file_path, delete_file. - B7. Create
backend/app/utils/text_extraction.py: extract_text_from_pdf (PyMuPDF), extract_text dispatcher.
C. Backend Schemas (Tasks 8–9)
- C8. Create
backend/app/schemas/document.py. - C9. Create
backend/app/schemas/memory.py.
D. Backend Services (Tasks 10–13)
- D10. Create
backend/app/services/document_service.py: upload, list, get, delete, search (FTS), update text. - D11. Create
backend/app/services/memory_service.py: CRUD + get_critical_memories. - D12. Create
backend/app/workers/document_processor.py: background text extraction. - D13. Extend
ai_service.py: tool definitions (save_memory, search_documents, get_memory), tool execution loop, context assembly steps 4-5 (memory + document injection).
E. Backend API Endpoints (Tasks 14–16)
- E14. Create
backend/app/api/v1/documents.py: upload, list, get, download, delete, reindex, search. - E15. Create
backend/app/api/v1/memory.py: CRUD. - E16. Update router.
F. Frontend API (Tasks 17–18)
- F17. Create
frontend/src/api/documents.ts. - F18. Create
frontend/src/api/memory.ts.
G. Frontend Document Pages (Tasks 19–22)
- G19. Create
frontend/src/components/documents/upload-dialog.tsx. - G20. Create
frontend/src/components/documents/document-list.tsx. - G21. Create
frontend/src/components/documents/document-viewer.tsx. - G22. Create
frontend/src/pages/documents.tsx.
H. Frontend Memory Pages (Tasks 23–25)
- H23. Create
frontend/src/components/memory/memory-editor.tsx. - H24. Create
frontend/src/components/memory/memory-list.tsx. - H25. Create
frontend/src/pages/memory.tsx.
I. Routing, Sidebar, i18n (Tasks 26–28)
- I26. Update routes:
/documents,/memory. - I27. Update sidebar: enable Documents and Memory nav items.
- I28. Update en/ru translations.
J. Backend Tests (Tasks 29–30)
- J29. Create
backend/tests/test_documents.py. - J30. Create
backend/tests/test_memory.py.
Acceptance Criteria
- Upload PDF/images, files stored correctly, background text extraction works
- Full-text search on extracted_text returns ranked results
- Memory CRUD with category/importance filters
- AI tools: Claude calls save_memory, search_documents, get_memory during chat
- Context assembly injects critical memory + relevant doc excerpts
- Tool execution loop handles multi-turn tool use before final response
- Frontend: upload, list, search, view, download documents
- Frontend: memory list with categories, editor with all fields
- All UI text in English and Russian
- Backend tests pass
Status
COMPLETED