Phase 4: Documents & Memory — upload, FTS, AI tools, context injection

Backend:
- Document + MemoryEntry models with Alembic migration (GIN FTS index)
- File upload endpoint with path traversal protection (sanitized filenames)
- Background document text extraction (PyMuPDF)
- Full-text search on extracted_text via PostgreSQL tsvector/tsquery
- Memory CRUD with enum-validated categories/importance, field allow-list
- AI tools: save_memory, search_documents, get_memory (Claude function calling)
- Tool execution loop in stream_ai_response (multi-turn tool use)
- Context assembly: injects critical memory + relevant doc excerpts
- File storage abstraction (local filesystem, S3-swappable)
- Secure file deletion (DB flush before disk delete)
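
The storage abstraction mentioned above could look roughly like the sketch below. This is an illustration only: the names `FileStorage` and `LocalFileStorage` and the `save`/`read`/`delete` methods are assumptions, not the interface this commit actually ships. The idea is that callers depend on the protocol, so an S3-backed class with the same three methods could be swapped in later.

```python
from pathlib import Path
from typing import Protocol


class FileStorage(Protocol):
    """Hypothetical storage interface; method names are illustrative."""

    def save(self, key: str, data: bytes) -> str: ...
    def read(self, key: str) -> bytes: ...
    def delete(self, key: str) -> None: ...


class LocalFileStorage:
    """Local-filesystem backend that keeps every key under one root directory."""

    def __init__(self, root: Path) -> None:
        self.root = root.resolve()
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        # Resolve the final path and refuse anything that escapes the root.
        path = (self.root / key).resolve()
        if self.root not in path.parents:
            raise ValueError(f"key escapes storage root: {key!r}")
        return path

    def save(self, key: str, data: bytes) -> str:
        path = self._path(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
        return str(path)

    def read(self, key: str) -> bytes:
        return self._path(key).read_bytes()

    def delete(self, key: str) -> None:
        # missing_ok makes deletion idempotent, which helps with retries.
        self._path(key).unlink(missing_ok=True)
```

An S3 variant would implement the same three methods against a bucket; nothing else in the application would need to know which backend is in use.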

Frontend:
- Document upload dialog (drag-and-drop + file picker)
- Document list with status badges, search, download (via authenticated blob)
- Document viewer with extracted text preview
- Memory list grouped by category with importance color coding
- Memory editor with category/importance dropdowns
- Documents + Memory pages with full CRUD
- Enabled sidebar navigation for both sections

Review fixes applied:
- Sanitized upload filenames (path traversal prevention)
- Download via axios blob (not bare <a href>, preserves auth)
- Route ordering: /search before /{id}/reindex
- Memory update allows is_active=False + field allow-list
- MemoryEditor form resets on mode switch
- Literal enum validation on category/importance schemas
- DB flush before file deletion for data integrity

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
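
For readers who want a concrete picture of the "sanitized upload filenames" fix, here is a minimal sketch of one common approach. It is not the code from this commit: `sanitize_filename`, `storage_path`, the `"upload"` fallback name, and the UUID prefix are all assumptions made for illustration. The sketch drops directory components, restricts the character set, and then verifies that the resolved path stays inside the upload root.

```python
import re
import uuid
from pathlib import Path


def sanitize_filename(raw: str, max_len: int = 200) -> str:
    """Reduce a client-supplied filename to a safe basename."""
    # Keep only the last component, treating both / and \ as separators.
    base = re.split(r"[\\/]", raw)[-1]
    # Replace anything outside a conservative allow-list (this also removes NUL bytes).
    base = re.sub(r"[^A-Za-z0-9._-]", "_", base)
    # Strip leading dots so ".." and hidden-file names cannot survive.
    base = base.lstrip(".")
    return (base or "upload")[:max_len]


def storage_path(root: Path, raw_name: str) -> Path:
    """Build a unique path under root and confirm it did not escape."""
    root = root.resolve()
    candidate = (root / f"{uuid.uuid4().hex}_{sanitize_filename(raw_name)}").resolve()
    if root not in candidate.parents:
        raise ValueError("resolved path escapes the upload root")
    return candidate
```

The final containment check is redundant when the sanitizer works, but it is cheap and guards against a future change to the sanitizer reopening the hole.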
32
backend/app/workers/document_processor.py
Normal file
@@ -0,0 +1,32 @@
import uuid

from app.database import async_session_factory
from app.services.document_service import update_document_text
from app.utils.text_extraction import extract_text


async def process_document(doc_id: uuid.UUID, storage_path: str, mime_type: str) -> None:
    """Background task: extract text from uploaded document."""
    async with async_session_factory() as db:
        try:
            # Update status to processing
            from sqlalchemy import select
            from app.models.document import Document

            result = await db.execute(select(Document).where(Document.id == doc_id))
            doc = result.scalar_one_or_none()
            if not doc:
                return
            doc.processing_status = "processing"
            await db.commit()

            # Extract text
            text = extract_text(storage_path, mime_type)

            # Update with extracted text
            await update_document_text(db, doc_id, text, "completed" if text else "failed")
            await db.commit()

        except Exception:
            await update_document_text(db, doc_id, "", "failed")
            await db.commit()
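
One possible follow-up, offered as a sketch rather than a required change: `extract_text` is called without `await`, so it appears to be synchronous, and a large PDF could block the event loop while it is parsed. The helper below shows the general pattern of running a blocking extractor in a worker thread with `asyncio.to_thread` and recording the outcome. The names `extract_and_record`, `extract`, and `mark` are hypothetical stand-ins (for `extract_text` and `update_document_text` respectively); the real worker would also need to handle its database session, for example rolling back before writing the failure status.

```python
import asyncio
from typing import Awaitable, Callable


async def extract_and_record(
    extract: Callable[[], str],
    mark: Callable[[str, str], Awaitable[None]],
) -> str:
    """Run a blocking extractor off the event loop and record the result.

    `extract` and `mark` are illustrative placeholders, not app functions.
    """
    try:
        # to_thread (Python 3.9+) keeps the event loop free during extraction.
        text = await asyncio.to_thread(extract)
    except Exception:
        await mark("", "failed")
        return "failed"

    # Same rule as the worker above: empty text counts as a failure.
    status = "completed" if text else "failed"
    await mark(text, status)
    return status
```

In the worker this would be called with something like `lambda: extract_text(storage_path, mime_type)` as the extractor, leaving the status logic unchanged.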