Phase 4: Documents & Memory — upload, FTS, AI tools, context injection

Backend:
- Document + MemoryEntry models with Alembic migration (GIN FTS index)
- File upload endpoint with path traversal protection (sanitized filenames)
- Background document text extraction (PyMuPDF)
- Full-text search on extracted_text via PostgreSQL tsvector/tsquery
- Memory CRUD with enum-validated categories/importance, field allow-list
- AI tools: save_memory, search_documents, get_memory (Claude function calling)
- Tool execution loop in stream_ai_response (multi-turn tool use)
- Context assembly: injects critical memory + relevant doc excerpts
- File storage abstraction (local filesystem, S3-swappable)
- Secure file deletion (DB flush before disk delete)

Frontend:
- Document upload dialog (drag-and-drop + file picker)
- Document list with status badges, search, download (via authenticated blob)
- Document viewer with extracted text preview
- Memory list grouped by category with importance color coding
- Memory editor with category/importance dropdowns
- Documents + Memory pages with full CRUD
- Enabled sidebar navigation for both sections

Review fixes applied:
- Sanitized upload filenames (path traversal prevention)
- Download via axios blob (not bare <a href>, preserves auth)
- Route ordering: /search before /{id}/reindex
- Memory update allows is_active=False + field allow-list
- MemoryEditor form resets on mode switch
- Literal enum validation on category/importance schemas
- DB flush before file deletion for data integrity

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 13:46:59 +03:00
parent 03afb7a075
commit 8b8fe916f0
37 changed files with 1921 additions and 26 deletions
@@ -0,0 +1,34 @@
+import uuid
+from pathlib import Path
+
+import aiofiles
+
+from app.config import settings
+
+
+def _get_upload_dir(user_id: uuid.UUID, doc_id: uuid.UUID) -> Path:
+    path = Path(settings.UPLOAD_DIR) / str(user_id) / str(doc_id)
+    path.mkdir(parents=True, exist_ok=True)
+    return path
+
+
+async def save_upload(user_id: uuid.UUID, doc_id: uuid.UUID, filename: str, content: bytes) -> str:
+    directory = _get_upload_dir(user_id, doc_id)
+    file_path = directory / filename
+    async with aiofiles.open(file_path, "wb") as f:
+        await f.write(content)
+    return str(file_path)
+
+
+def get_file_path(storage_path: str) -> Path:
+    return Path(storage_path)
+
+
+def delete_file(storage_path: str) -> None:
+    path = Path(storage_path)
+    if path.exists():
+        path.unlink()
+    # Remove the immediate parent directory if it is now empty
+    parent = path.parent
+    if parent.exists() and not any(parent.iterdir()):
+        parent.rmdir()
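Note that save_upload joins the filename onto the upload directory as-is, so it relies on the caller having sanitized the name first (the commit message says the upload endpoint does this). The endpoint's actual sanitizer is not part of this hunk; the following is only a minimal sketch of what such a step might look like. The helper name sanitize_filename, its fallback parameter, and the allow-list of characters are assumptions for illustration.

```python
import re
from pathlib import PurePosixPath, PureWindowsPath


def sanitize_filename(raw: str, fallback: str = "upload") -> str:
    """Hypothetical helper: reduce a client-supplied name to a safe basename."""
    # Keep only the last path component, under both POSIX and Windows separators.
    name = PureWindowsPath(PurePosixPath(raw).name).name
    # Replace anything outside a conservative allow-list with an underscore.
    name = re.sub(r"[^A-Za-z0-9._-]", "_", name)
    # Strip leading dots so the result is never "." or ".." or a hidden file.
    name = name.lstrip(".")
    # Fall back to a fixed name if nothing usable is left.
    return name or fallback


# Traversal attempts collapse to a plain basename before reaching save_upload.
safe_posix = sanitize_filename("../../etc/passwd")
safe_windows = sanitize_filename("..\\..\\secret.txt")
safe_empty = sanitize_filename("..")
```

With a step like this in front of it, the path built in save_upload should stay inside the per-document directory, since the name can no longer contain separators or parent references.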
@@ -0,0 +1,19 @@
+from pathlib import Path
+
+
+def extract_text_from_pdf(file_path: str) -> str:
+    import fitz  # PyMuPDF
+
+    text_parts = []
+    with fitz.open(file_path) as doc:
+        for page in doc:
+            text_parts.append(page.get_text())
+    return "\n".join(text_parts).strip()
+
+
+def extract_text(file_path: str, mime_type: str) -> str:
+    if mime_type == "application/pdf":
+        return extract_text_from_pdf(file_path)
+    # Image OCR (e.g. pytesseract) is skipped for now because it requires system deps
+    # All other MIME types yield an empty string
+    return ""
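The commit message describes extraction as a background step, while extract_text itself is synchronous and could block an async request handler on a large PDF. How the commit actually schedules it is not shown in this hunk. As one possible approach, the sketch below moves a blocking extractor onto a worker thread; the wrapper name extract_off_loop and the stand-in fake_extract are hypothetical, and fake_extract exists only so the example runs without PyMuPDF installed.

```python
import asyncio
from typing import Callable


async def extract_off_loop(
    extract: Callable[[str, str], str], file_path: str, mime_type: str
) -> str:
    """Hypothetical wrapper: run a blocking extractor in a worker thread."""
    # asyncio.to_thread (Python 3.9+) keeps the event loop free while the
    # synchronous extractor parses the file.
    return await asyncio.to_thread(extract, file_path, mime_type)


def fake_extract(file_path: str, mime_type: str) -> str:
    # Stand-in with the same signature as extract_text (assumption for the demo).
    return "stub text" if mime_type == "application/pdf" else ""


result = asyncio.run(
    extract_off_loop(fake_extract, "/tmp/example.pdf", "application/pdf")
)
```

In the real service, extract_text would be passed in place of fake_extract, and the returned string would be stored on the document record.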