Files
personal-ai-assistant/plans/phase-4-documents-memory.md
dolgolyov.alexei 8b8fe916f0 Phase 4: Documents & Memory — upload, FTS, AI tools, context injection
Backend:
- Document + MemoryEntry models with Alembic migration (GIN FTS index)
- File upload endpoint with path traversal protection (sanitized filenames)
- Background document text extraction (PyMuPDF)
- Full-text search on extracted_text via PostgreSQL tsvector/tsquery
- Memory CRUD with enum-validated categories/importance, field allow-list
- AI tools: save_memory, search_documents, get_memory (Claude function calling)
- Tool execution loop in stream_ai_response (multi-turn tool use)
- Context assembly: injects critical memory + relevant doc excerpts
- File storage abstraction (local filesystem, S3-swappable)
- Secure file deletion (DB flush before disk delete)

Frontend:
- Document upload dialog (drag-and-drop + file picker)
- Document list with status badges, search, download (via authenticated blob)
- Document viewer with extracted text preview
- Memory list grouped by category with importance color coding
- Memory editor with category/importance dropdowns
- Documents + Memory pages with full CRUD
- Enabled sidebar navigation for both sections

Review fixes applied:
- Sanitized upload filenames (path traversal prevention)
- Download via axios blob (not bare <a href>, preserves auth)
- Route ordering: /search before /{id}/reindex
- Memory update allows is_active=False + field allow-list
- MemoryEditor form resets on mode switch
- Literal enum validation on category/importance schemas
- DB flush before file deletion for data integrity

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 13:46:59 +03:00

133 lines
4.9 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Phase 4: Documents & Memory — Subplan
## Goal
Deliver document upload/processing with full-text search, a per-user memory system, Claude AI tools (save_memory, search_documents, get_memory), and context assembly that injects memory + document excerpts into conversations.
## Prerequisites
- Phase 3 completed: skills, personal context, context assembly layering
- Latest migration is `003`
---
## Database Schema (Phase 4)
### `documents` table
| Column | Type | Constraints |
|---|---|---|
| id | UUID | PK (inherited) |
| user_id | UUID | FK -> users.id CASCADE, NOT NULL, indexed |
| filename | VARCHAR(255) | NOT NULL (stored name) |
| original_filename | VARCHAR(255) | NOT NULL |
| storage_path | TEXT | NOT NULL |
| mime_type | VARCHAR(100) | NOT NULL |
| file_size | BIGINT | NOT NULL |
| doc_type | VARCHAR(50) | NOT NULL, default 'other' |
| extracted_text | TEXT | NULL |
| processing_status | VARCHAR(20) | NOT NULL, default 'pending' |
| metadata_ | JSONB | NULL |
| created_at | TIMESTAMPTZ | inherited |
GIN index on `to_tsvector('english', coalesce(extracted_text, ''))`.
### `memory_entries` table
| Column | Type | Constraints |
|---|---|---|
| id | UUID | PK (inherited) |
| user_id | UUID | FK -> users.id CASCADE, NOT NULL, indexed |
| category | VARCHAR(50) | NOT NULL |
| title | VARCHAR(255) | NOT NULL |
| content | TEXT | NOT NULL |
| source_document_id | UUID | FK -> documents.id SET NULL, NULL |
| importance | VARCHAR(20) | NOT NULL, default 'medium' |
| is_active | BOOLEAN | NOT NULL, default true |
| created_at | TIMESTAMPTZ | inherited |
---
## Tasks
### A. Backend Models & Migration (Tasks 14)
- [x] **A1.** Create `backend/app/models/document.py` with GIN index.
- [x] **A2.** Create `backend/app/models/memory_entry.py`.
- [x] **A3.** Update `models/__init__.py` + User model (add `documents`, `memory_entries` relationships).
- [x] **A4.** Create migration `004_create_documents_and_memory_entries.py`.
### B. Backend Config & Utilities (Tasks 57)
- [x] **B5.** Add `UPLOAD_DIR`, `MAX_UPLOAD_SIZE_MB` to config. Add `pymupdf`, `aiofiles` to `pyproject.toml`.
- [x] **B6.** Create `backend/app/utils/file_storage.py`: save_upload, get_file_path, delete_file.
- [x] **B7.** Create `backend/app/utils/text_extraction.py`: extract_text_from_pdf (PyMuPDF), extract_text dispatcher.
### C. Backend Schemas (Tasks 89)
- [x] **C8.** Create `backend/app/schemas/document.py`.
- [x] **C9.** Create `backend/app/schemas/memory.py`.
### D. Backend Services (Tasks 1013)
- [x] **D10.** Create `backend/app/services/document_service.py`: upload, list, get, delete, search (FTS), update text.
- [x] **D11.** Create `backend/app/services/memory_service.py`: CRUD + get_critical_memories.
- [x] **D12.** Create `backend/app/workers/document_processor.py`: background text extraction.
- [x] **D13.** Extend `ai_service.py`: tool definitions (save_memory, search_documents, get_memory), tool execution loop, context assembly steps 4-5 (memory + document injection).
### E. Backend API Endpoints (Tasks 1416)
- [x] **E14.** Create `backend/app/api/v1/documents.py`: upload, list, get, download, delete, reindex, search.
- [x] **E15.** Create `backend/app/api/v1/memory.py`: CRUD.
- [x] **E16.** Update router.
### F. Frontend API (Tasks 1718)
- [x] **F17.** Create `frontend/src/api/documents.ts`.
- [x] **F18.** Create `frontend/src/api/memory.ts`.
### G. Frontend Document Pages (Tasks 1922)
- [x] **G19.** Create `frontend/src/components/documents/upload-dialog.tsx`.
- [x] **G20.** Create `frontend/src/components/documents/document-list.tsx`.
- [x] **G21.** Create `frontend/src/components/documents/document-viewer.tsx`.
- [x] **G22.** Create `frontend/src/pages/documents.tsx`.
### H. Frontend Memory Pages (Tasks 2325)
- [x] **H23.** Create `frontend/src/components/memory/memory-editor.tsx`.
- [x] **H24.** Create `frontend/src/components/memory/memory-list.tsx`.
- [x] **H25.** Create `frontend/src/pages/memory.tsx`.
### I. Routing, Sidebar, i18n (Tasks 2628)
- [x] **I26.** Update routes: `/documents`, `/memory`.
- [x] **I27.** Update sidebar: enable Documents and Memory nav items.
- [x] **I28.** Update en/ru translations.
### J. Backend Tests (Tasks 2930)
- [x] **J29.** Create `backend/tests/test_documents.py`.
- [x] **J30.** Create `backend/tests/test_memory.py`.
---
## Acceptance Criteria
1. Upload PDF/images, files stored correctly, background text extraction works
2. Full-text search on extracted_text returns ranked results
3. Memory CRUD with category/importance filters
4. AI tools: Claude calls save_memory, search_documents, get_memory during chat
5. Context assembly injects critical memory + relevant doc excerpts
6. Tool execution loop handles multi-turn tool use before final response
7. Frontend: upload, list, search, view, download documents
8. Frontend: memory list with categories, editor with all fields
9. All UI text in English and Russian
10. Backend tests pass
---
## Status
**COMPLETED**