Phase 4: Documents & Memory — upload, FTS, AI tools, context injection

Backend:
- Document + MemoryEntry models with Alembic migration (GIN FTS index)
- File upload endpoint with path traversal protection (sanitized filenames)
- Background document text extraction (PyMuPDF)
- Full-text search on extracted_text via PostgreSQL tsvector/tsquery
- Memory CRUD with enum-validated categories/importance, field allow-list
- AI tools: save_memory, search_documents, get_memory (Claude function calling)
- Tool execution loop in stream_ai_response (multi-turn tool use)
- Context assembly: injects critical memory + relevant doc excerpts
- File storage abstraction (local filesystem, S3-swappable)
- Secure file deletion (DB flush before disk delete)
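
One way the filename sanitization mentioned above could look, as a minimal stdlib-only sketch. The helper names `sanitize_filename` and `build_storage_name`, the allowed character set, and the 200-character cap are assumptions for illustration; the actual upload endpoint is not part of this excerpt.

```python
import os
import re
import uuid

# Conservative allow-list of characters (an assumption, not taken from the commit).
_UNSAFE_CHARS = re.compile(r"[^A-Za-z0-9._-]")


def sanitize_filename(original: str) -> str:
    """Reduce a client-supplied name to a safe basename (hypothetical helper)."""
    # Normalise Windows separators, then drop every directory component.
    name = os.path.basename(original.replace("\\", "/"))
    # Replace anything outside the allow-list with an underscore.
    name = _UNSAFE_CHARS.sub("_", name)
    # Strip leading dots so the result is never ".", ".." or a hidden file.
    name = name.lstrip(".")
    # Cap the length to leave room for a prefix; fall back if nothing is left.
    return name[:200] or "upload"


def build_storage_name(original: str) -> str:
    """Prefix a random UUID so two uploads with the same name cannot collide."""
    return f"{uuid.uuid4().hex}_{sanitize_filename(original)}"
```

Because only the basename survives, a name such as `../../etc/passwd` can no longer steer the write outside the storage directory.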

Frontend:
- Document upload dialog (drag-and-drop + file picker)
- Document list with status badges, search, download (via authenticated blob)
- Document viewer with extracted text preview
- Memory list grouped by category with importance color coding
- Memory editor with category/importance dropdowns
- Documents + Memory pages with full CRUD
- Enabled sidebar navigation for both sections

Review fixes applied:
- Sanitized upload filenames (path traversal prevention)
- Download via axios blob (not bare <a href>, preserves auth)
- Route ordering: /search before /{id}/reindex
- Memory update allows is_active=False + field allow-list
- MemoryEditor form resets on mode switch
- Literal enum validation on category/importance schemas
- DB flush before file deletion for data integrity
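
The memory update fix (allow-list plus `is_active=False`) can be illustrated with a small stand-alone sketch. `MemoryEntryStub`, `UPDATABLE_FIELDS`, and `apply_update` are hypothetical names, and the exact allow-list used by the commit is not shown here. The point is that a truthiness check such as `if value:` would silently drop `False`, so the sketch skips only `None`.

```python
from dataclasses import dataclass

# Assumed allow-list; the real one is not visible in this excerpt.
UPDATABLE_FIELDS = frozenset({"category", "title", "content", "importance", "is_active"})


@dataclass
class MemoryEntryStub:
    """Plain stand-in for the ORM model, used only for this illustration."""
    user_id: str
    category: str
    title: str
    content: str
    importance: str = "medium"
    is_active: bool = True


def apply_update(entry: MemoryEntryStub, changes: dict) -> MemoryEntryStub:
    """Copy allow-listed, explicitly provided fields onto the entry."""
    for field, value in changes.items():
        if field not in UPDATABLE_FIELDS:
            continue  # ignore fields the client must not change, e.g. user_id
        if value is None:
            continue  # None is treated as "not provided"; False is a real value
        setattr(entry, field, value)
    return entry
```

With this shape, `apply_update(entry, {"is_active": False, "user_id": "x"})` deactivates the entry and leaves its owner unchanged.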

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 13:46:59 +03:00
parent 03afb7a075
commit 8b8fe916f0
37 changed files with 1921 additions and 26 deletions

@@ -4,5 +4,7 @@ from app.models.chat import Chat
 from app.models.message import Message
 from app.models.context_file import ContextFile
 from app.models.skill import Skill
+from app.models.document import Document
+from app.models.memory_entry import MemoryEntry
 
-__all__ = ["User", "Session", "Chat", "Message", "ContextFile", "Skill"]
+__all__ = ["User", "Session", "Chat", "Message", "ContextFile", "Skill", "Document", "MemoryEntry"]

@@ -0,0 +1,33 @@
import uuid

from sqlalchemy import BigInteger, ForeignKey, Index, String, Text, func, text
from sqlalchemy.dialects.postgresql import JSONB, UUID
from sqlalchemy.orm import Mapped, mapped_column, relationship

from app.database import Base


class Document(Base):
    __tablename__ = "documents"
    __table_args__ = (
        Index(
            "ix_documents_fts",
            text("to_tsvector('english', coalesce(extracted_text, ''))"),
            postgresql_using="gin",
        ),
    )

    user_id: Mapped[uuid.UUID] = mapped_column(
        UUID(as_uuid=True), ForeignKey("users.id", ondelete="CASCADE"), nullable=False, index=True
    )
    filename: Mapped[str] = mapped_column(String(255), nullable=False)
    original_filename: Mapped[str] = mapped_column(String(255), nullable=False)
    storage_path: Mapped[str] = mapped_column(Text, nullable=False)
    mime_type: Mapped[str] = mapped_column(String(100), nullable=False)
    file_size: Mapped[int] = mapped_column(BigInteger, nullable=False)
    doc_type: Mapped[str] = mapped_column(String(50), nullable=False, default="other")
    extracted_text: Mapped[str | None] = mapped_column(Text, nullable=True)
    processing_status: Mapped[str] = mapped_column(String(20), nullable=False, default="pending")
    metadata_: Mapped[dict | None] = mapped_column("metadata", JSONB, nullable=True)

    user: Mapped["User"] = relationship(back_populates="documents")  # noqa: F821
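
PostgreSQL can only use an expression index such as `ix_documents_fts` when the query filters on the same expression, including the explicit `'english'` configuration. The sketch below shows a search statement written to match it. `build_search_query`, the `:name` parameter style, and the `id` column (not visible in this hunk, presumably inherited from `Base`) are assumptions; the search code in the commit itself is not shown here.

```python
# Same expression as the one indexed by ix_documents_fts in the model above.
FTS_EXPRESSION = "to_tsvector('english', coalesce(extracted_text, ''))"


def build_search_query(user_id: str, query: str, limit: int = 10) -> tuple[str, dict]:
    """Return SQL text plus bound parameters for a per-user document search."""
    sql = (
        "SELECT id, original_filename, "
        f"ts_rank({FTS_EXPRESSION}, plainto_tsquery('english', :query)) AS rank "
        "FROM documents "
        "WHERE user_id = :user_id "
        f"AND {FTS_EXPRESSION} @@ plainto_tsquery('english', :query) "
        "ORDER BY rank DESC "
        "LIMIT :limit"
    )
    # User input travels only as a bound parameter, never inside the SQL string.
    return sql, {"user_id": user_id, "query": query, "limit": limit}
```

`plainto_tsquery` is used here because it accepts free-form user input without requiring tsquery operator syntax.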

@@ -0,0 +1,26 @@
import uuid

from sqlalchemy import Boolean, ForeignKey, String, Text
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.orm import Mapped, mapped_column, relationship

from app.database import Base


class MemoryEntry(Base):
    __tablename__ = "memory_entries"

    user_id: Mapped[uuid.UUID] = mapped_column(
        UUID(as_uuid=True), ForeignKey("users.id", ondelete="CASCADE"), nullable=False, index=True
    )
    category: Mapped[str] = mapped_column(String(50), nullable=False)
    title: Mapped[str] = mapped_column(String(255), nullable=False)
    content: Mapped[str] = mapped_column(Text, nullable=False)
    source_document_id: Mapped[uuid.UUID | None] = mapped_column(
        UUID(as_uuid=True), ForeignKey("documents.id", ondelete="SET NULL"), nullable=True
    )
    importance: Mapped[str] = mapped_column(String(20), nullable=False, default="medium")
    is_active: Mapped[bool] = mapped_column(Boolean, nullable=False, default=True)

    user: Mapped["User"] = relationship(back_populates="memory_entries")  # noqa: F821
    source_document: Mapped["Document | None"] = relationship()  # noqa: F821
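
The commit message describes context assembly that injects critical memory into the prompt. A rough sketch of how rows shaped like this model might be filtered and rendered follows. `MemoryRow`, `build_memory_context`, the `"critical"` importance value, the output layout, and the character budget are all assumptions; the commit's own context assembly code is not part of this excerpt.

```python
from dataclasses import dataclass


@dataclass
class MemoryRow:
    """Plain stand-in carrying the columns used below (not the ORM model)."""
    category: str
    title: str
    content: str
    importance: str = "medium"
    is_active: bool = True


def build_memory_context(entries: list[MemoryRow], max_chars: int = 2000) -> str:
    """Render active, critical entries as a text block for the system prompt."""
    lines: list[str] = []
    used = 0
    for entry in entries:
        # Skip soft-deleted rows and anything below the assumed "critical" level.
        if not entry.is_active or entry.importance != "critical":
            continue
        line = f"- [{entry.category}] {entry.title}: {entry.content}"
        # Stop before the block exceeds the (assumed) character budget.
        if used + len(line) > max_chars:
            break
        lines.append(line)
        used += len(line)
    if not lines:
        return ""
    return "Critical memory:\n" + "\n".join(lines)
```

Returning an empty string when nothing qualifies lets the caller omit the section from the prompt entirely.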

@@ -27,3 +27,5 @@ class User(Base):
     sessions: Mapped[list["Session"]] = relationship(back_populates="user", cascade="all, delete-orphan")  # noqa: F821
     chats: Mapped[list["Chat"]] = relationship(back_populates="user", cascade="all, delete-orphan")  # noqa: F821
     skills: Mapped[list["Skill"]] = relationship(back_populates="user", cascade="all, delete-orphan")  # noqa: F821
+    documents: Mapped[list["Document"]] = relationship(back_populates="user", cascade="all, delete-orphan")  # noqa: F821
+    memory_entries: Mapped[list["MemoryEntry"]] = relationship(back_populates="user", cascade="all, delete-orphan")  # noqa: F821