Phase 4: Documents & Memory — upload, FTS, AI tools, context injection
Backend:
- Document + MemoryEntry models with Alembic migration (GIN FTS index)
- File upload endpoint with path traversal protection (sanitized filenames)
- Background document text extraction (PyMuPDF)
- Full-text search on extracted_text via PostgreSQL tsvector/tsquery
- Memory CRUD with enum-validated categories/importance, field allow-list
- AI tools: save_memory, search_documents, get_memory (Claude function calling)
- Tool execution loop in stream_ai_response (multi-turn tool use)
- Context assembly: injects critical memory + relevant doc excerpts
- File storage abstraction (local filesystem, S3-swappable)
- Secure file deletion (DB flush before disk delete)
Frontend:
- Document upload dialog (drag-and-drop + file picker)
- Document list with status badges, search, download (via authenticated blob)
- Document viewer with extracted text preview
- Memory list grouped by category with importance color coding
- Memory editor with category/importance dropdowns
- Documents + Memory pages with full CRUD
- Enabled sidebar navigation for both sections
Review fixes applied:
- Sanitized upload filenames (path traversal prevention)
- Download via axios blob (not bare <a href>, preserves auth)
- Route ordering: /search before /{id}/reindex
- Memory update allows is_active=False + field allow-list
- MemoryEditor form resets on mode switch
- Literal enum validation on category/importance schemas
- DB flush before file deletion for data integrity
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
0
backend/app/utils/__init__.py
Normal file
34
backend/app/utils/file_storage.py
Normal file
@@ -0,0 +1,34 @@
import uuid
from pathlib import Path

import aiofiles

from app.config import settings


def _get_upload_dir(user_id: uuid.UUID, doc_id: uuid.UUID) -> Path:
    path = Path(settings.UPLOAD_DIR) / str(user_id) / str(doc_id)
    path.mkdir(parents=True, exist_ok=True)
    return path


async def save_upload(user_id: uuid.UUID, doc_id: uuid.UUID, filename: str, content: bytes) -> str:
    directory = _get_upload_dir(user_id, doc_id)
    file_path = directory / filename
    async with aiofiles.open(file_path, "wb") as f:
        await f.write(content)
    return str(file_path)


def get_file_path(storage_path: str) -> Path:
    return Path(storage_path)


def delete_file(storage_path: str) -> None:
    path = Path(storage_path)
    if path.exists():
        path.unlink()
        # Clean up empty parent dirs
        parent = path.parent
        if parent.exists() and not any(parent.iterdir()):
            parent.rmdir()
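Note on save_upload: it joins `directory / filename` as given, so the path traversal protection mentioned in the commit message has to happen before this function is called. The sanitizer itself is not part of this diff. The sketch below shows one way such a step could look. The name `sanitize_filename`, the fallback name "upload" and the allowed character set are assumptions for illustration, not the code from this commit.

```python
import re
from pathlib import PureWindowsPath

# Hypothetical helper: the real sanitizer is not shown in this diff.
# Conservative allow-list; every other character becomes "_".
_UNSAFE_CHARS = re.compile(r"[^A-Za-z0-9._-]")


def sanitize_filename(raw: str, default: str = "upload") -> str:
    """Reduce a client-supplied filename to a single safe path component."""
    # PureWindowsPath treats both "/" and "\\" as separators, so .name
    # drops any directory part regardless of the client's platform.
    name = PureWindowsPath(raw).name
    # Replace characters outside the allow-list.
    name = _UNSAFE_CHARS.sub("_", name)
    # Strip leading dots so the result is never ".", ".." or a dotfile.
    name = name.lstrip(".")
    # Fall back to a fixed name if nothing usable is left.
    return name or default
```

With a step like this in the upload endpoint, `save_upload(user_id, doc_id, sanitize_filename(upload_name), content)` can only write inside the per-document directory created by `_get_upload_dir`.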
19
backend/app/utils/text_extraction.py
Normal file
@@ -0,0 +1,19 @@
from pathlib import Path


def extract_text_from_pdf(file_path: str) -> str:
    import fitz  # PyMuPDF

    text_parts = []
    with fitz.open(file_path) as doc:
        for page in doc:
            text_parts.append(page.get_text())
    return "\n".join(text_parts).strip()


def extract_text(file_path: str, mime_type: str) -> str:
    if mime_type == "application/pdf":
        return extract_text_from_pdf(file_path)
    # For images, we'd use pytesseract but skip for now as it requires system deps
    # For other types, return empty
    return ""
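Note on extract_text: it is synchronous and PDF parsing can block, while the commit message describes extraction as a background step. How the commit schedules that work is not shown here. A minimal sketch of one option, assuming Python 3.9+ and `asyncio.to_thread`, follows. The function name `extract_in_background` and the status strings "ready" and "failed" are hypothetical.

```python
import asyncio
from typing import Callable, Tuple


async def extract_in_background(
    extractor: Callable[[str, str], str],
    file_path: str,
    mime_type: str,
) -> Tuple[str, str]:
    """Run a blocking extractor in a worker thread; return (status, text)."""
    try:
        # to_thread keeps the event loop free while the extractor runs.
        text = await asyncio.to_thread(extractor, file_path, mime_type)
    except Exception:
        # Hypothetical status value; a real task would also log the error.
        return "failed", ""
    return "ready", text
```

A caller could pass the module's own function, for example `await extract_in_background(extract_text, storage_path, mime_type)`, and then store the returned status and text on the document row.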