Files
notify-bridge/packages/server/src/notify_bridge_server/services/watcher.py
T
alexei.dolgolyov 920920bc67
Build and Test / test-frontend (push) Successful in 9m37s
Build and Test / test-backend (push) Successful in 10m53s
Build and Test / build-image (push) Failing after 14m52s
feat: production-readiness hardening across security, async, DB, ops
Security
- SSRF: async DNS resolver; allow_redirects=False on all outbound clients;
  matrix homeserver_url validated on create/update/test; update_provider
  and email_bot merge incoming config and reject ***-masked secrets.
- Auth: bcrypt offloaded to asyncio.to_thread; JWT now carries iss/aud +
  leeway and rejects missing claims; setup TOCTOU closed inside a
  transaction; rate limits extended (default 600/min, 10/min on password
  change, 30/min on needs-setup); constant-time login to prevent username
  enumeration.
- Config: rejects known dev secret keys; validates CORS origin schemes,
  port range, token lifetimes.
- Webhook handlers stream-read body with a 1 MiB cap; Discord 429 retries
  bounded (3 attempts, Retry-After capped at 60 s).
- CSP + HSTS added to SecurityHeadersMiddleware.

Async / runtime
- SQLite engine: WAL, synchronous=NORMAL, foreign_keys=ON, busy_timeout,
  pool_pre_ping, dispose on shutdown.
- Lifespan shutdown now stops scheduler before closing HTTP session and
  disposing the engine.
- Shared aiohttp session locked against concurrent first-caller races;
  core NotificationDispatcher accepts and reuses it.
- Storage and scheduled backup writes wrapped in asyncio.to_thread.
- NUT client writes bounded by asyncio.wait_for.
- Telegram poller switched from 3 s short-poll to 30 s interval + 25 s
  long-poll (~10x fewer API calls).

Database
- New performance-indexes migration covers every FK/owner column and
  hot-path composite (notification_tracker(provider_id, enabled);
  event_log(user_id, created_at DESC); webhook_payload_log(provider_id,
  created_at DESC); action_execution(action_id, started_at DESC)).
- New schema_version table for future upgrade gating.
- __system__ placeholder user (id=0) seeded so user_id=0 system defaults
  satisfy the newly enforced FK; filtered out of /auth/needs-setup,
  /api/users, and setup.
- list_notification_trackers rewritten to batched loads (was 1+N+N*M).
- Retention job extended to event_log, webhook_payload_log, and
  action_execution; retention days exposed as a setting.

Scheduler
- AsyncIOScheduler job_defaults: coalesce, misfire_grace_time=300,
  max_instances=1.

Ops
- uvicorn runs with proxy_headers, forwarded_allow_ips,
  timeout_graceful_shutdown; access log suppressed in non-debug.
- FastAPI version string now reads from importlib.metadata.
- New /api/ready endpoint separate from /api/health.
- docker-compose drops the ALLOW_PRIVATE_URLS=1 default, adds mem/cpu/pid
  limits, read_only + tmpfs, cap_drop:ALL, no-new-privileges; healthcheck
  targets /api/ready.
- CI now runs on push/PR with backend pytest, frontend svelte-check +
  build, and a non-push image build; release workflow gated on tests,
  publishes immutable sha-<commit> image tag, adds Trivy scan.

Tests
- New packages/server/tests/ with 29 passing tests: config validation,
  JWT round-trip + aud/alg=none rejection, SSRF scheme and private-range
  enforcement (sync + async), Discord bounded retry, and a lifespan-level
  /api/health + /api/ready smoke check.
- Renamed the misnamed services/test_dispatch.py to manual_dispatch.py so
  pytest never auto-collects production code.

Frontend
- /login now redirects already-authenticated users to /, shows a distinct
  'backend unreachable' banner (en/ru) when /auth/needs-setup fails.
2026-04-23 19:44:56 +03:00

419 lines
17 KiB
Python

"""Watcher service — orchestrates poll -> detect -> notify flow."""
from __future__ import annotations
import asyncio
import logging
from typing import Any
from sqlmodel import select
from sqlmodel.ext.asyncio.session import AsyncSession
from notify_bridge_core.models.events import ServiceEvent
from notify_bridge_core.notifications.dispatcher import NotificationDispatcher, TargetConfig
from notify_bridge_core.notifications.telegram.cache import TelegramFileCache
from notify_bridge_core.storage import JsonFileBackend
from ..database.engine import get_engine
from ..database.models import (
EventLog,
NotificationTracker,
NotificationTrackerState,
ServiceProvider,
)
from .dispatch_helpers import (
event_allowed_by_config,
get_app_timezone,
load_link_data,
)
_LOGGER = logging.getLogger(__name__)
# Module-level Telegram file caches — shared across dispatches for reuse
_url_cache: TelegramFileCache | None = None
_asset_cache: TelegramFileCache | None = None
_cache_lock = asyncio.Lock()
async def _load_cache_settings() -> tuple[int, int]:
"""Return (url_ttl_seconds, asset_max_entries) from app settings.
Defaults apply when the settings rows are missing. Reads in a short-lived
session to avoid coupling to the caller's transaction.
"""
from ..api.app_settings import get_setting
async with AsyncSession(get_engine()) as session:
ttl_hours_str = await get_setting(session, "telegram_cache_ttl_hours")
max_entries_str = await get_setting(session, "telegram_asset_cache_max_entries")
try:
ttl_hours = int(ttl_hours_str) if ttl_hours_str else 720
except ValueError:
ttl_hours = 720
try:
max_entries = int(max_entries_str) if max_entries_str else 5000
except ValueError:
max_entries = 5000
return ttl_hours * 3600, max_entries
async def _get_telegram_caches() -> tuple[TelegramFileCache | None, TelegramFileCache | None]:
"""Lazily initialize shared Telegram file caches using NOTIFY_BRIDGE_DATA_DIR.
The URL cache runs in TTL mode (URLs aren't content-addressable); the asset
cache runs in thumbhash mode so entries invalidate on visual change rather
than age. Both honor an LRU size cap from settings.
"""
global _url_cache, _asset_cache
if _url_cache is not None:
return _url_cache, _asset_cache
async with _cache_lock:
# Double-check after acquiring lock
if _url_cache is not None:
return _url_cache, _asset_cache
import os
from pathlib import Path
data_dir = os.environ.get("NOTIFY_BRIDGE_DATA_DIR")
if not data_dir:
return None, None
cache_dir = Path(data_dir) / "cache"
ttl_seconds, max_entries = await _load_cache_settings()
url_cache = TelegramFileCache(
JsonFileBackend(cache_dir / "telegram_url_cache.json"),
ttl_seconds=ttl_seconds,
max_entries=max_entries,
)
asset_cache = TelegramFileCache(
JsonFileBackend(cache_dir / "telegram_asset_cache.json"),
use_thumbhash=True,
max_entries=max_entries,
)
await url_cache.async_load()
await asset_cache.async_load()
_url_cache = url_cache
_asset_cache = asset_cache
_LOGGER.info(
"Initialized Telegram caches in %s (url ttl=%ds, max_entries=%d, asset thumbhash mode)",
cache_dir, ttl_seconds, max_entries,
)
return _url_cache, _asset_cache
async def reset_telegram_caches_in_memory() -> None:
"""Drop in-memory cache refs without touching files on disk.
Used after settings changes so the next dispatch re-initializes caches
with fresh parameters. Contrast with ``clear_telegram_caches`` which also
deletes cached file_ids.
"""
global _url_cache, _asset_cache
async with _cache_lock:
_url_cache = None
_asset_cache = None
_LOGGER.info("Reset Telegram cache refs in memory (files preserved)")
async def get_telegram_cache_stats() -> dict[str, Any]:
"""Return stats for the URL and asset Telegram caches.
Loads caches lazily if they haven't been touched by a dispatch yet.
Returns zero-counts when ``NOTIFY_BRIDGE_DATA_DIR`` is not configured.
"""
url_cache, asset_cache = await _get_telegram_caches()
empty = {"count": 0, "total_size_bytes": 0, "oldest": None, "newest": None}
return {
"url": url_cache.stats() if url_cache else empty,
"asset": asset_cache.stats() if asset_cache else empty,
}
async def clear_telegram_caches() -> dict[str, Any]:
"""Delete both Telegram file caches from disk and reset in-memory state.
Next dispatch re-initializes the caches via `_get_telegram_caches()`.
Returns a summary with the paths that were removed.
"""
global _url_cache, _asset_cache
async with _cache_lock:
removed: list[str] = []
for cache, label in ((_url_cache, "url"), (_asset_cache, "asset")):
if cache is not None:
await cache.async_remove()
removed.append(label)
# Also remove files from disk in case caches were never initialized
# in this process (data_dir set but dispatch never ran).
import os
from pathlib import Path
data_dir = os.environ.get("NOTIFY_BRIDGE_DATA_DIR")
if data_dir:
cache_dir = Path(data_dir) / "cache"
for name in ("telegram_url_cache.json", "telegram_asset_cache.json"):
path = cache_dir / name
if path.exists():
try:
path.unlink()
except OSError as e:
_LOGGER.warning("Failed to remove %s: %s", path, e)
_url_cache = None
_asset_cache = None
_LOGGER.info("Cleared Telegram file caches: %s", removed or "none in memory")
return {"cleared": True, "removed": removed}
async def check_tracker(tracker_id: int) -> dict[str, Any]:
"""Poll a tracker's provider for changes and dispatch notifications."""
engine = get_engine()
# Load all DB data eagerly before entering aiohttp context
async with AsyncSession(engine) as session:
tracker = await session.get(NotificationTracker, tracker_id)
if not tracker or not tracker.enabled:
return {"status": "skipped", "reason": "disabled or not found"}
provider = await session.get(ServiceProvider, tracker.provider_id)
if not provider:
return {"status": "error", "reason": "provider not found"}
# Load tracker state
result = await session.exec(
select(NotificationTrackerState).where(NotificationTrackerState.tracker_id == tracker_id)
)
states = result.all()
state_dict: dict[str, Any] = {}
for s in states:
state_dict[s.collection_id] = {
"name": s.collection_name or "",
"asset_ids": s.asset_ids,
"pending_asset_ids": s.pending_asset_ids,
"shared": bool(s.shared),
"meta_fingerprint": s.meta_fingerprint or {},
}
# Snapshot the original fingerprint per collection so we can skip the
# (expensive) asset_ids rewrite when nothing changed. For a 200k-asset
# album this avoids a ~7 MB JSON write to the state row every tick.
original_fingerprints: dict[str, dict[str, Any]] = {
cid: dict(cstate.get("meta_fingerprint") or {})
for cid, cstate in state_dict.items()
}
# Load tracker-target links
link_data = await load_link_data(session, tracker_id)
# Load app-level timezone for quiet-hours evaluation.
app_tz = await get_app_timezone(session)
# Snapshot the data we need
provider_type = provider.type
provider_config = dict(provider.config)
provider_name = provider.name
tracker_name = tracker.name
tracker_filters = dict(tracker.filters) if tracker.filters else {}
collection_ids = list(tracker.collection_ids or [])
# Now create aiohttp session and poll
events: list[ServiceEvent] = []
new_state: dict[str, Any] = {}
if provider_type == "immich":
from notify_bridge_core.providers.immich import ImmichServiceProvider
from .http_session import get_http_session
http_session = await get_http_session()
immich = ImmichServiceProvider(
http_session,
provider_config.get("url", ""),
provider_config.get("api_key", ""),
provider_config.get("external_domain"),
provider_name,
)
connected = await immich.connect()
if not connected:
return {"status": "error", "reason": "failed to connect to provider"}
events, new_state = await immich.poll(collection_ids, state_dict)
elif provider_type == "gitea":
# Gitea is webhook-based — events arrive via /api/webhooks/gitea endpoint.
# The scheduler still calls check_tracker but there's nothing to poll.
return {"status": "ok", "events_detected": 0, "collections_checked": 0}
elif provider_type == "planka":
# Planka is webhook-based — events arrive via /api/webhooks/planka endpoint.
return {"status": "ok", "events_detected": 0, "collections_checked": 0}
elif provider_type == "scheduler":
from notify_bridge_core.providers.scheduler import SchedulerServiceProvider
custom_vars = tracker_filters.get("custom_variables", {})
sched = SchedulerServiceProvider(
name=provider_name,
tracker_name=tracker_name,
custom_variables=custom_vars,
timezone_name=app_tz,
)
events, new_state = await sched.poll(collection_ids, state_dict)
elif provider_type == "nut":
from notify_bridge_core.providers.nut import NutServiceProvider
nut = NutServiceProvider(
host=provider_config.get("host", "localhost"),
port=provider_config.get("port", 3493),
username=provider_config.get("username"),
password=provider_config.get("password"),
name=provider_name,
)
events, new_state = await nut.poll(collection_ids, state_dict)
elif provider_type == "google_photos":
from notify_bridge_core.providers.google_photos import GooglePhotosServiceProvider
from .http_session import get_http_session
http_session = await get_http_session()
gp = GooglePhotosServiceProvider(
http_session,
provider_config.get("client_id", ""),
provider_config.get("client_secret", ""),
provider_config.get("refresh_token", ""),
provider_name,
)
connected = await gp.connect()
if not connected:
return {"status": "error", "reason": "failed to connect to Google Photos"}
events, new_state = await gp.poll(collection_ids, state_dict)
elif provider_type == "webhook":
# Webhook providers receive events via inbound HTTP; no polling needed.
return {"status": "ok", "events_detected": 0, "collections_checked": 0}
else:
return {"status": "error", "reason": f"unsupported provider type: {provider_type}"}
# Save updated state and log events
async with AsyncSession(engine) as session:
for cid, cstate in new_state.items():
existing = None
for s in states:
if s.collection_id == cid:
existing = s
break
current_fingerprint = dict(cstate.get("meta_fingerprint") or {})
prior_fingerprint = original_fingerprints.get(cid, {})
# Skip the DB update when the provider reported no meaningful
# change. ``existing`` is None on first-ever fetch for a
# collection — that path always writes so the row gets created.
if existing is not None and current_fingerprint == prior_fingerprint:
continue
if existing:
existing.asset_ids = cstate.get("asset_ids", [])
existing.pending_asset_ids = cstate.get("pending_asset_ids", [])
existing.collection_name = cstate.get("name", "")
existing.shared = cstate.get("shared", False)
existing.meta_fingerprint = current_fingerprint
session.add(existing)
else:
new_ts = NotificationTrackerState(
tracker_id=tracker_id,
collection_id=cid,
collection_name=cstate.get("name", ""),
shared=cstate.get("shared", False),
asset_ids=cstate.get("asset_ids", []),
pending_asset_ids=cstate.get("pending_asset_ids", []),
meta_fingerprint=current_fingerprint,
)
session.add(new_ts)
for event in events:
assets_count = event.added_count or event.removed_count or 0
details: dict[str, Any] = {
"added_count": event.added_count,
"removed_count": event.removed_count,
"provider_type": event.provider_type.value,
}
# Scheduler/periodic events carry the schedule context in ``extra``
# (cron expression, interval, timezone, fire count). Surface that
# in the event log so the dashboard and audit queries can show
# *why* the event fired, not just that it did.
if event.event_type.value == "scheduled_message":
sched_type = tracker_filters.get("schedule_type", "interval")
details["schedule_type"] = sched_type
if sched_type == "cron":
details["cron_expression"] = tracker_filters.get("cron_expression", "")
else:
details["interval_seconds"] = tracker.scan_interval
details["timezone"] = app_tz
fire_count = event.extra.get("fire_count") if event.extra else None
if fire_count is not None:
details["fire_count"] = fire_count
log = EventLog(
user_id=tracker.user_id,
tracker_id=tracker_id,
tracker_name=tracker.name,
provider_id=provider.id,
provider_name=provider_name,
event_type=event.event_type.value,
collection_id=event.collection_id,
collection_name=event.collection_name,
assets_count=assets_count,
details=details,
)
session.add(log)
await session.commit()
# Dispatch notifications — per-link config resolution
# Filter out empty events (e.g. assets_added with 0 added)
events = [
e for e in events
if not (e.event_type.value == "assets_added" and e.added_count == 0)
and not (e.event_type.value == "assets_removed" and e.removed_count == 0)
]
_LOGGER.info(
"Tracker %d: %d events after filter, %d links",
tracker_id, len(events), len(link_data),
)
if events and link_data:
url_cache, asset_cache = await _get_telegram_caches()
from .http_session import get_http_session
shared_session = await get_http_session()
dispatcher = NotificationDispatcher(
url_cache=url_cache,
asset_cache=asset_cache,
session=shared_session,
)
for event in events:
_LOGGER.info(
"Dispatching event %s for %s (added=%d removed=%d)",
event.event_type.value, event.collection_name,
event.added_count, event.removed_count,
)
target_configs = []
for ld in link_data:
# Apply per-link event filtering from tracking config
tc = ld["tracking_config"]
if tc and not event_allowed_by_config(event, tc, app_tz):
_LOGGER.info(" Skipped by tracking config filter")
continue
tmpl = ld["template_config"]
target_configs.append(TargetConfig(
type=ld["target_type"],
config=ld["target_config"],
template_slots=ld["template_slots"],
date_format=tmpl.date_format if tmpl else "%d.%m.%Y, %H:%M UTC",
date_only_format=tmpl.date_only_format if tmpl and tmpl.date_only_format else "%d.%m.%Y",
provider_api_key=provider_config.get("api_key"),
provider_internal_url=provider_config.get("url", ""),
provider_external_url=provider_config.get("external_domain", ""),
receivers=ld["receivers"],
))
if target_configs:
results = await dispatcher.dispatch(event, target_configs)
for r in results:
if r.get("success"):
_LOGGER.info(" Notification sent successfully")
else:
_LOGGER.error(" Notification failed: %s", r.get("error", "unknown"))
return {
"status": "ok",
"events_detected": len(events),
"collections_checked": len(collection_ids),
}