personal-ai-assistant/backend/app/models/document.py
dolgolyov.alexei 8b8fe916f0 Phase 4: Documents & Memory — upload, FTS, AI tools, context injection
Backend:
- Document + MemoryEntry models with Alembic migration (GIN FTS index)
- File upload endpoint with path traversal protection (sanitized filenames)
- Background document text extraction (PyMuPDF)
- Full-text search on extracted_text via PostgreSQL tsvector/tsquery
- Memory CRUD with enum-validated categories/importance, field allow-list
- AI tools: save_memory, search_documents, get_memory (Claude function calling)
- Tool execution loop in stream_ai_response (multi-turn tool use)
- Context assembly: injects critical memory + relevant doc excerpts
- File storage abstraction (local filesystem, S3-swappable)
- Secure file deletion (DB flush before disk delete)
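The sanitized-filename step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name, the UUID-prefix scheme, and the exact allowed character set are assumptions.

```python
import os
import re
import uuid


def sanitize_filename(original: str) -> str:
    """Strip directory components and unsafe characters from an uploaded
    filename, then prefix a UUID so stored names never collide."""
    # Drop any path components (defeats "../../etc/passwd"-style names),
    # normalizing Windows separators first.
    base = os.path.basename(original.replace("\\", "/"))
    # Keep a conservative character set; replace everything else.
    base = re.sub(r"[^A-Za-z0-9._-]", "_", base) or "upload"
    # Guard against dot-only names, and respect the 255-char filename column.
    base = base.lstrip(".") or "upload"
    return f"{uuid.uuid4().hex}_{base}"[:255]
```

Prefixing a random UUID also means the original (attacker-controlled) name never determines the on-disk path by itself.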

Frontend:
- Document upload dialog (drag-and-drop + file picker)
- Document list with status badges, search, download (via authenticated blob)
- Document viewer with extracted text preview
- Memory list grouped by category with importance color coding
- Memory editor with category/importance dropdowns
- Documents + Memory pages with full CRUD
- Enabled sidebar navigation for both sections

Review fixes applied:
- Sanitized upload filenames (path traversal prevention)
- Download via axios blob (not bare <a href>, preserves auth)
- Route ordering: /search before /{id}/reindex
- Memory update allows is_active=False + field allow-list
- MemoryEditor form resets on mode switch
- Literal enum validation on category/importance schemas
- DB flush before file deletion for data integrity
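The flush-before-delete ordering can be illustrated with a minimal sketch. `delete_document` and the session protocol here are hypothetical stand-ins for the project's actual service code, shown only to make the failure-mode reasoning concrete:

```python
from pathlib import Path


def delete_document(session, document, storage_root: Path) -> None:
    """Delete the DB row and flush *before* touching the filesystem."""
    path = storage_root / document.storage_path
    session.delete(document)
    # Flush sends the DELETE to the database now; if it fails (FK
    # violation, connection error), the exception propagates and the
    # on-disk file is never removed.
    session.flush()
    # The row is gone as far as this transaction is concerned, so the
    # disk copy can go too; missing_ok tolerates a prior partial cleanup.
    path.unlink(missing_ok=True)
    session.commit()
```

The inverse order (unlink first, then delete the row) risks an orphaned DB row pointing at a file that no longer exists.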

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 13:46:59 +03:00


import uuid

from sqlalchemy import BigInteger, ForeignKey, Index, String, Text, func, text
from sqlalchemy.dialects.postgresql import JSONB, UUID
from sqlalchemy.orm import Mapped, mapped_column, relationship

from app.database import Base


class Document(Base):
    """An uploaded file plus its extracted text, searchable via FTS."""

    __tablename__ = "documents"
    __table_args__ = (
        # Expression GIN index backing full-text search on extracted_text.
        Index(
            "ix_documents_fts",
            text("to_tsvector('english', coalesce(extracted_text, ''))"),
            postgresql_using="gin",
        ),
    )

    user_id: Mapped[uuid.UUID] = mapped_column(
        UUID(as_uuid=True), ForeignKey("users.id", ondelete="CASCADE"), nullable=False, index=True
    )
    filename: Mapped[str] = mapped_column(String(255), nullable=False)
    original_filename: Mapped[str] = mapped_column(String(255), nullable=False)
    storage_path: Mapped[str] = mapped_column(Text, nullable=False)
    mime_type: Mapped[str] = mapped_column(String(100), nullable=False)
    file_size: Mapped[int] = mapped_column(BigInteger, nullable=False)
    doc_type: Mapped[str] = mapped_column(String(50), nullable=False, default="other")
    extracted_text: Mapped[str | None] = mapped_column(Text, nullable=True)
    processing_status: Mapped[str] = mapped_column(String(20), nullable=False, default="pending")
    metadata_: Mapped[dict | None] = mapped_column("metadata", JSONB, nullable=True)

    user: Mapped["User"] = relationship(back_populates="documents")  # noqa: F821
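An expression GIN index like `ix_documents_fts` is only usable when the query side repeats the indexed expression verbatim. A rough sketch of what the search SQL might look like, as raw parameterized SQL in psycopg style (the function, the `plainto_tsquery` choice, and the `id` column, presumably supplied by `Base`, are assumptions, not the project's actual query code):

```python
def build_document_search_sql() -> str:
    """Parameterized full-text query whose to_tsvector expression matches
    ix_documents_fts exactly, so PostgreSQL can use the GIN index."""
    tsvector = "to_tsvector('english', coalesce(extracted_text, ''))"
    return (
        "SELECT id, original_filename, "
        f"ts_rank({tsvector}, q) AS rank "
        "FROM documents, plainto_tsquery('english', %(query)s) AS q "
        f"WHERE user_id = %(user_id)s AND {tsvector} @@ q "
        "ORDER BY rank DESC LIMIT %(limit)s"
    )
```

If the query were written as, say, `to_tsvector(extracted_text)` (different config, no `coalesce`), the planner would fall back to a sequential scan.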