Phase 4: Documents & Memory — upload, FTS, AI tools, context injection

Backend:
- Document + MemoryEntry models with Alembic migration (GIN FTS index)
- File upload endpoint with path traversal protection (sanitized filenames)
- Background document text extraction (PyMuPDF)
- Full-text search on extracted_text via PostgreSQL tsvector/tsquery
- Memory CRUD with enum-validated categories/importance, field allow-list
- AI tools: save_memory, search_documents, get_memory (Claude function calling)
- Tool execution loop in stream_ai_response (multi-turn tool use)
- Context assembly: injects critical memory + relevant doc excerpts
- File storage abstraction (local filesystem, S3-swappable)
- Secure file deletion (DB flush before disk delete)

Frontend:
- Document upload dialog (drag-and-drop + file picker)
- Document list with status badges, search, download (via authenticated blob)
- Document viewer with extracted text preview
- Memory list grouped by category with importance color coding
- Memory editor with category/importance dropdowns
- Documents + Memory pages with full CRUD
- Enabled sidebar navigation for both sections

Review fixes applied:
- Sanitized upload filenames (path traversal prevention)
- Download via axios blob (not bare <a href>, preserves auth)
- Route ordering: /search before /{id}/reindex
- Memory update allows is_active=False + field allow-list
- MemoryEditor form resets on mode switch
- Literal enum validation on category/importance schemas
- DB flush before file deletion for data integrity

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 13:46:59 +03:00
parent 03afb7a075
commit 8b8fe916f0
37 changed files with 1921 additions and 26 deletions
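
The commit message lists "sanitized upload filenames (path traversal prevention)" among the review fixes. The commit's actual implementation is not shown on this page, so the following is only a minimal sketch of one common approach: keep the final path component, replace anything outside a conservative character set, and prefix a random token to form the stored name. The names `sanitize_filename` and `stored_name` and the fallback value `"upload"` are hypothetical and not taken from the commit.

```python
import re
import uuid
from pathlib import PureWindowsPath

# Conservative allow-list: letters, digits, dot, underscore, hyphen.
_UNSAFE_CHARS = re.compile(r"[^A-Za-z0-9._-]")


def sanitize_filename(original: str, max_length: int = 255) -> str:
    """Reduce a client-supplied filename to a safe single path component.

    Hypothetical helper; the real project may sanitize differently.
    """
    # PureWindowsPath treats both "/" and "\\" as separators, so .name
    # drops any directory part regardless of the client's platform.
    name = PureWindowsPath(original).name
    # Replace every character outside the allow-list (spaces, NUL, etc.).
    name = _UNSAFE_CHARS.sub("_", name)
    # Strip leading dots so "..", "." and dotfiles cannot survive.
    name = name.lstrip(".")
    if not name:
        name = "upload"  # assumed fallback for empty results
    return name[:max_length]


def stored_name(original: str) -> str:
    """Build a collision-resistant on-disk name (assumed naming scheme)."""
    return f"{uuid.uuid4().hex}_{sanitize_filename(original, 200)}"
```

With this sketch, `sanitize_filename("../../etc/passwd")` yields `passwd`, and a bare `..` falls back to `upload`. Truncating to `max_length` can cut off a long file's extension, so a real implementation may prefer to shorten the stem instead.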
--- a/plans/phase-4-documents-memory.md
+++ b/plans/phase-4-documents-memory.md
@@ -0,0 +1,132 @@
+# Phase 4: Documents & Memory — Subplan
+
+## Goal
+
+Deliver document upload/processing with full-text search, a per-user memory system, Claude AI tools (save_memory, search_documents, get_memory), and context assembly that injects memory + document excerpts into conversations.
+
+## Prerequisites
+
+- Phase 3 completed: skills, personal context, context assembly layering
+- Latest migration is `003`
+
+---
+
+## Database Schema (Phase 4)
+
+### `documents` table
+
+| Column | Type | Constraints |
+|---|---|---|
+| id | UUID | PK (inherited) |
+| user_id | UUID | FK -> users.id CASCADE, NOT NULL, indexed |
+| filename | VARCHAR(255) | NOT NULL (stored name) |
+| original_filename | VARCHAR(255) | NOT NULL |
+| storage_path | TEXT | NOT NULL |
+| mime_type | VARCHAR(100) | NOT NULL |
+| file_size | BIGINT | NOT NULL |
+| doc_type | VARCHAR(50) | NOT NULL, default 'other' |
+| extracted_text | TEXT | NULL |
+| processing_status | VARCHAR(20) | NOT NULL, default 'pending' |
+| metadata_ | JSONB | NULL |
+| created_at | TIMESTAMPTZ | inherited |
+
+GIN index on `to_tsvector('english', coalesce(extracted_text, ''))`.
+
+### `memory_entries` table
+
+| Column | Type | Constraints |
+|---|---|---|
+| id | UUID | PK (inherited) |
+| user_id | UUID | FK -> users.id CASCADE, NOT NULL, indexed |
+| category | VARCHAR(50) | NOT NULL |
+| title | VARCHAR(255) | NOT NULL |
+| content | TEXT | NOT NULL |
+| source_document_id | UUID | FK -> documents.id SET NULL, NULL |
+| importance | VARCHAR(20) | NOT NULL, default 'medium' |
+| is_active | BOOLEAN | NOT NULL, default true |
+| created_at | TIMESTAMPTZ | inherited |
+
+---
+
+## Tasks
+
+### A. Backend Models & Migration (Tasks 1–4)
+
+- [x] **A1.** Create `backend/app/models/document.py` with GIN index.
+- [x] **A2.** Create `backend/app/models/memory_entry.py`.
+- [x] **A3.** Update `models/__init__.py` + User model (add `documents`, `memory_entries` relationships).
+- [x] **A4.** Create migration `004_create_documents_and_memory_entries.py`.
+
+### B. Backend Config & Utilities (Tasks 5–7)
+
+- [x] **B5.** Add `UPLOAD_DIR`, `MAX_UPLOAD_SIZE_MB` to config. Add `pymupdf`, `aiofiles` to `pyproject.toml`.
+- [x] **B6.** Create `backend/app/utils/file_storage.py`: save_upload, get_file_path, delete_file.
+- [x] **B7.** Create `backend/app/utils/text_extraction.py`: extract_text_from_pdf (PyMuPDF), extract_text dispatcher.
+
+### C. Backend Schemas (Tasks 8–9)
+
+- [x] **C8.** Create `backend/app/schemas/document.py`.
+- [x] **C9.** Create `backend/app/schemas/memory.py`.
+
+### D. Backend Services (Tasks 10–13)
+
+- [x] **D10.** Create `backend/app/services/document_service.py`: upload, list, get, delete, search (FTS), update text.
+- [x] **D11.** Create `backend/app/services/memory_service.py`: CRUD + get_critical_memories.
+- [x] **D12.** Create `backend/app/workers/document_processor.py`: background text extraction.
+- [x] **D13.** Extend `ai_service.py`: tool definitions (save_memory, search_documents, get_memory), tool execution loop, context assembly steps 4-5 (memory + document injection).
+
+### E. Backend API Endpoints (Tasks 14–16)
+
+- [x] **E14.** Create `backend/app/api/v1/documents.py`: upload, list, get, download, delete, reindex, search.
+- [x] **E15.** Create `backend/app/api/v1/memory.py`: CRUD.
+- [x] **E16.** Update router.
+
+### F. Frontend API (Tasks 17–18)
+
+- [x] **F17.** Create `frontend/src/api/documents.ts`.
+- [x] **F18.** Create `frontend/src/api/memory.ts`.
+
+### G. Frontend Document Pages (Tasks 19–22)
+
+- [x] **G19.** Create `frontend/src/components/documents/upload-dialog.tsx`.
+- [x] **G20.** Create `frontend/src/components/documents/document-list.tsx`.
+- [x] **G21.** Create `frontend/src/components/documents/document-viewer.tsx`.
+- [x] **G22.** Create `frontend/src/pages/documents.tsx`.
+
+### H. Frontend Memory Pages (Tasks 23–25)
+
+- [x] **H23.** Create `frontend/src/components/memory/memory-editor.tsx`.
+- [x] **H24.** Create `frontend/src/components/memory/memory-list.tsx`.
+- [x] **H25.** Create `frontend/src/pages/memory.tsx`.
+
+### I. Routing, Sidebar, i18n (Tasks 26–28)
+
+- [x] **I26.** Update routes: `/documents`, `/memory`.
+- [x] **I27.** Update sidebar: enable Documents and Memory nav items.
+- [x] **I28.** Update en/ru translations.
+
+### J. Backend Tests (Tasks 29–30)
+
+- [x] **J29.** Create `backend/tests/test_documents.py`.
+- [x] **J30.** Create `backend/tests/test_memory.py`.
+
+---
+
+## Acceptance Criteria
+
+1. Upload PDF/images, files stored correctly, background text extraction works
+2. Full-text search on extracted_text returns ranked results
+3. Memory CRUD with category/importance filters
+4. AI tools: Claude calls save_memory, search_documents, get_memory during chat
+5. Context assembly injects critical memory + relevant doc excerpts
+6. Tool execution loop handles multi-turn tool use before final response
+7. Frontend: upload, list, search, view, download documents
+8. Frontend: memory list with categories, editor with all fields
+9. All UI text in English and Russian
+10. Backend tests pass
+
+---
+
+## Status
+
+**COMPLETED**
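
The plan above indexes `to_tsvector('english', coalesce(extracted_text, ''))` with GIN and requires ranked full-text results (acceptance criterion 2). PostgreSQL can only use an expression index when the query contains the same expression, so a search query would typically repeat it. The sketch below illustrates that idea under stated assumptions: the function name `build_document_search`, the named-placeholder style (`:query`, as accepted by SQLAlchemy's `text()`), the selected columns, and the 100-row cap are all hypothetical, and the project's `document_service.py` may build its query differently. `plainto_tsquery` and `ts_rank` are standard PostgreSQL functions.

```python
from typing import Any

# Same expression as the GIN index in the schema section, so the
# planner can match the WHERE clause against that index.
TSVECTOR_EXPR = "to_tsvector('english', coalesce(extracted_text, ''))"


def build_document_search(
    user_id: str, query: str, limit: int = 20
) -> tuple[str, dict[str, Any]]:
    """Return (sql, params) for a per-user ranked full-text search.

    Hypothetical helper: user input travels only in params, never in
    the SQL string itself.
    """
    sql = f"""
        SELECT id, original_filename,
               ts_rank({TSVECTOR_EXPR},
                       plainto_tsquery('english', :query)) AS rank
        FROM documents
        WHERE user_id = :user_id
          AND {TSVECTOR_EXPR} @@ plainto_tsquery('english', :query)
        ORDER BY rank DESC
        LIMIT :limit
    """
    params = {
        "user_id": user_id,
        "query": query.strip(),
        # Clamp the page size to an assumed range of 1..100.
        "limit": max(1, min(limit, 100)),
    }
    return sql, params
```

`plainto_tsquery` is used here because it accepts free-form user text without raising syntax errors on operators; `websearch_to_tsquery` would be an alternative if quoted phrases and exclusions should be supported.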