personal-ai-assistant/backend/tests/test_documents.py
dolgolyov.alexei 8b8fe916f0 Phase 4: Documents & Memory — upload, FTS, AI tools, context injection
Backend:
- Document + MemoryEntry models with Alembic migration (GIN FTS index)
- File upload endpoint with path traversal protection (sanitized filenames)
- Background document text extraction (PyMuPDF)
- Full-text search on extracted_text via PostgreSQL tsvector/tsquery
- Memory CRUD with enum-validated categories/importance, field allow-list
- AI tools: save_memory, search_documents, get_memory (Claude function calling)
- Tool execution loop in stream_ai_response (multi-turn tool use)
- Context assembly: injects critical memory + relevant doc excerpts
- File storage abstraction (local filesystem, S3-swappable)
- Secure file deletion (DB flush before disk delete)
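
The "sanitized filenames" item above could look roughly like the following. This is a minimal sketch, not the code from this commit: `sanitize_filename` and its character set are hypothetical names and choices, assuming the goal is to reduce a client-supplied name to a single safe path component before it is stored.

```python
import re
import uuid
from pathlib import PurePosixPath

# Hypothetical helper; the project's actual sanitizer is not shown in this view.
_UNSAFE_CHARS = re.compile(r"[^A-Za-z0-9._-]")


def sanitize_filename(raw: str) -> str:
    """Reduce a client-supplied filename to one safe path component."""
    # Treat both separator styles as separators, then keep only the last component.
    name = PurePosixPath(raw.replace("\\", "/")).name
    # Replace anything outside a conservative character set.
    name = _UNSAFE_CHARS.sub("_", name)
    # Drop leading dots so the result is never "." or ".." or a hidden file.
    name = name.lstrip(".")
    # Fall back to a random name if nothing usable is left.
    return name or f"upload_{uuid.uuid4().hex}"
```

Keeping only the final component is what defeats inputs such as `../../etc/passwd`; the character filter and the fallback name are extra hardening that a real implementation may or may not want.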

Frontend:
- Document upload dialog (drag-and-drop + file picker)
- Document list with status badges, search, download (via authenticated blob)
- Document viewer with extracted text preview
- Memory list grouped by category with importance color coding
- Memory editor with category/importance dropdowns
- Documents + Memory pages with full CRUD
- Enabled sidebar navigation for both sections

Review fixes applied:
- Sanitized upload filenames (path traversal prevention)
- Download via axios blob (not bare <a href>, preserves auth)
- Route ordering: /search before /{id}/reindex
- Memory update allows is_active=False + field allow-list
- MemoryEditor form resets on mode switch
- Literal enum validation on category/importance schemas
- DB flush before file deletion for data integrity
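
The "is_active=False + field allow-list" fix above can be illustrated with plain dictionaries. This is a sketch under assumptions: the field names in `UPDATABLE_FIELDS` and the function `apply_memory_update` are hypothetical, and the real code presumably works on the MemoryEntry model rather than a dict. The point it shows is that filtering on `is None` instead of truthiness lets `False` through while still ignoring fields outside the allow-list.

```python
from typing import Any

# Hypothetical field names; the real MemoryEntry columns are not shown here.
UPDATABLE_FIELDS = frozenset(
    {"title", "content", "category", "importance", "is_active"}
)


def apply_memory_update(entry: dict[str, Any], changes: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of entry with only allow-listed, provided fields changed."""
    updated = dict(entry)
    for field, value in changes.items():
        if field not in UPDATABLE_FIELDS:
            continue  # ignore fields such as "id" or "user_id"
        if value is None:
            continue  # None means "not provided" in this sketch
        updated[field] = value  # falsy values such as False are kept
    return updated
```

A truthiness check (`if value:`) in place of `value is None` would silently drop `is_active=False`, which is the kind of bug the review fix describes.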

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 13:46:59 +03:00

98 lines
3.2 KiB
Python

import io

import pytest
from httpx import AsyncClient


@pytest.fixture
async def auth_headers(client: AsyncClient):
    # Register a fresh user and return a bearer-token header for it.
    resp = await client.post("/api/v1/auth/register", json={
        "email": "docuser@example.com",
        "username": "docuser",
        "password": "testpass123",
    })
    assert resp.status_code == 201
    return {"Authorization": f"Bearer {resp.json()['access_token']}"}


async def test_upload_document(client: AsyncClient, auth_headers: dict):
    resp = await client.post(
        "/api/v1/documents/?doc_type=lab_result",
        headers=auth_headers,
        files={"file": ("test.pdf", b"%PDF-1.4 test content", "application/pdf")},
    )
    assert resp.status_code == 201
    data = resp.json()
    assert data["original_filename"] == "test.pdf"
    assert data["doc_type"] == "lab_result"
    assert data["processing_status"] == "pending"


async def test_upload_invalid_type(client: AsyncClient, auth_headers: dict):
    resp = await client.post(
        "/api/v1/documents/",
        headers=auth_headers,
        files={"file": ("test.exe", b"MZ...", "application/x-msdownload")},
    )
    assert resp.status_code == 400


async def test_list_documents(client: AsyncClient, auth_headers: dict):
    # Upload first
    await client.post(
        "/api/v1/documents/",
        headers=auth_headers,
        files={"file": ("list_test.pdf", b"%PDF-1.4 content", "application/pdf")},
    )

    resp = await client.get("/api/v1/documents/", headers=auth_headers)
    assert resp.status_code == 200
    assert len(resp.json()["documents"]) >= 1


async def test_get_document(client: AsyncClient, auth_headers: dict):
    resp = await client.post(
        "/api/v1/documents/",
        headers=auth_headers,
        files={"file": ("get_test.pdf", b"%PDF-1.4 content", "application/pdf")},
    )
    doc_id = resp.json()["id"]

    resp = await client.get(f"/api/v1/documents/{doc_id}", headers=auth_headers)
    assert resp.status_code == 200
    assert resp.json()["id"] == doc_id


async def test_delete_document(client: AsyncClient, auth_headers: dict):
    resp = await client.post(
        "/api/v1/documents/",
        headers=auth_headers,
        files={"file": ("del_test.pdf", b"%PDF-1.4 content", "application/pdf")},
    )
    doc_id = resp.json()["id"]

    resp = await client.delete(f"/api/v1/documents/{doc_id}", headers=auth_headers)
    assert resp.status_code == 204

    # The document must no longer be retrievable after deletion.
    resp = await client.get(f"/api/v1/documents/{doc_id}", headers=auth_headers)
    assert resp.status_code == 404


async def test_document_ownership_isolation(client: AsyncClient, auth_headers: dict):
    resp = await client.post(
        "/api/v1/documents/",
        headers=auth_headers,
        files={"file": ("private.pdf", b"%PDF-1.4 content", "application/pdf")},
    )
    doc_id = resp.json()["id"]

    # Register another user
    resp = await client.post("/api/v1/auth/register", json={
        "email": "docother@example.com",
        "username": "docother",
        "password": "testpass123",
    })
    other_headers = {"Authorization": f"Bearer {resp.json()['access_token']}"}

    # Another user's document is reported as missing rather than forbidden.
    resp = await client.get(f"/api/v1/documents/{doc_id}", headers=other_headers)
    assert resp.status_code == 404
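
Most tests above build the same multipart payload by hand. One possible way to cut the repetition is a small helper; `pdf_upload` is a hypothetical name that does not appear in the repository, and the tuple layout simply mirrors the `files=` argument already used in these tests.

```python
def pdf_upload(filename: str, body: bytes = b"%PDF-1.4 content") -> dict:
    """Build the `files` argument for a multipart upload of a tiny fake PDF."""
    # (filename, content, content type), as passed to client.post(files=...) above.
    return {"file": (filename, body, "application/pdf")}
```

A test could then call `client.post("/api/v1/documents/", headers=auth_headers, files=pdf_upload("get_test.pdf"))`, which keeps the interesting part of each test (the assertions) easier to spot.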