feat(logging): production-grade logging with context vars, secret masking, and runtime level control

Boot-time logging was a three-line basicConfig stub with no timestamps, no correlation, and silent drops at every layer of the Telegram send path: a /random command that delivered text but no media left zero evidence in the log. This replaces the setup and closes every silent drop encountered end-to-end.

New infrastructure:
- notify_bridge_core.log_context: request_id/command/chat_id/bot_id/dispatch_id ContextVars with a bind_log_context() context manager so deep call sites (TelegramClient, NotificationDispatcher) inherit the correlation tag without threading args through.
- notify_bridge_server.logging_setup: dictConfig-based setup with a LogRecordFactory that tags every record, a SecretMaskingFilter that redacts /botN:TOKEN plus Authorization/x-api-key/password/secret in messages AND tracebacks, a JSON formatter for aggregators, text formatter with grep-friendly [req=... cmd=... bot=... chat=... disp=...] prefix, and default dampening for sqlalchemy/aiohttp/apscheduler/urllib3/PIL.

Runtime control:
- NOTIFY_BRIDGE_LOG_LEVEL / _FORMAT / _LEVELS env vars (boot).
- DB-backed log_level / log_format / log_levels AppSettings, applied on boot after migrations and live via apply_log_levels() when edited in the settings UI (format still requires restart, logs a WARN).
- Frontend settings page gains a Logging card (level dropdown, format dropdown, per-module overrides); en/ru i18n keys added.

Call-site fixes (/random media-group blind spot and adjacent):
- TelegramClient._fetch_asset: every silent drop now WARN-logs with reason (missing url, HTTP non-200, size/dimension limits, ClientError).
- TelegramClient._send_media_group: WARN on "chunk had N items but 0 usable", ERROR on sendMediaGroup non-ok/transport with full context; returns success=False + "no_items_delivered" instead of success=True with an empty message_ids list so callers can distinguish.
- TelegramClient.send_message / _upload_media / _send_from_cache: ERROR on non-ok + transport failures with status/code/desc; DEBUG for cache-hit fallbacks.
- NotificationDispatcher.dispatch: generates a dispatch_id, binds it, logs start/finish with failure count, uses exc_info for target failures.
- commands/handler: missing/failed templates -> ERROR + exc_info; send_reply and send_media_group errors upgraded WARNING -> ERROR with chat/error_code context; rate-limit and truncation cases logged with full context.
- commands/webhook and services/telegram_poller: bind_log_context(request_id=tg:<update_id>, command, chat_id, bot_id), INFO on receive/dispatch/completion with duration, exc_info on raise, INFO when commands disabled.
- commands/immich: INFO when album scope is empty; WARN per asset dropped from media payload and a summary WARN when "N assets in, 0 out".
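
For readers unfamiliar with the correlation mechanism described under "New infrastructure", here is a minimal sketch of how ContextVars, a bind_log_context() context manager, and a LogRecord factory could fit together. The real notify_bridge_core.log_context is not shown in this diff, so `_FIELDS`, `_VARS`, `install_record_factory`, and the "-" placeholder are assumptions made for illustration only.

```python
import contextlib
import contextvars
import logging

# Assumed field list, taken from the commit message; the storage layout is hypothetical.
_FIELDS = ("request_id", "command", "chat_id", "bot_id", "dispatch_id")
_VARS = {name: contextvars.ContextVar(name, default=None) for name in _FIELDS}


@contextlib.contextmanager
def bind_log_context(**values):
    """Set the given context fields for the duration of the with-block."""
    unknown = set(values) - set(_FIELDS)
    if unknown:
        raise TypeError(f"unknown log context fields: {sorted(unknown)}")
    tokens = [(_VARS[name], _VARS[name].set(value)) for name, value in values.items()]
    try:
        yield
    finally:
        # Reset in reverse order so nested binds restore the outer values.
        for var, token in reversed(tokens):
            var.reset(token)


def install_record_factory():
    """Wrap the current LogRecord factory so every record carries the fields."""
    previous = logging.getLogRecordFactory()

    def factory(*args, **kwargs):
        record = previous(*args, **kwargs)
        for name, var in _VARS.items():
            value = var.get()
            # "-" is an assumed placeholder for unset fields.
            setattr(record, name, "-" if value is None else value)
        return record

    logging.setLogRecordFactory(factory)


if __name__ == "__main__":
    install_record_factory()
    logging.basicConfig(
        format="[req=%(request_id)s cmd=%(command)s bot=%(bot_id)s "
               "chat=%(chat_id)s disp=%(dispatch_id)s] %(message)s"
    )
    with bind_log_context(request_id="tg:42", command="/random", chat_id=1001):
        # Deep call sites log normally; the factory attaches the bound values.
        logging.getLogger("demo").warning("asset dropped")
```

Because ContextVars are copied into asyncio tasks, values bound in a webhook handler would also be visible to coroutines awaited inside the with-block, which is what lets deep call sites pick up the tag without extra arguments.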
@@ -9,13 +9,18 @@ from slowapi import _rate_limit_exceeded_handler
 from slowapi.errors import RateLimitExceeded
 from slowapi.middleware import SlowAPIMiddleware
 
-# Ensure app-level loggers are visible
-logging.basicConfig(level=logging.INFO)
-
 from .config import settings as _log_cfg
-_log_level = logging.DEBUG if _log_cfg.debug else logging.INFO
-logging.getLogger("notify_bridge_server").setLevel(_log_level)
-logging.getLogger("notify_bridge_core").setLevel(_log_level)
+from .logging_setup import setup_logging
+
+# Boot logging from env-based config. DB-backed AppSetting rows (``log_level`` /
+# ``log_levels`` / ``log_format``) override this after migrations — see the
+# lifespan block below.
+setup_logging(
+    level="DEBUG" if _log_cfg.debug else _log_cfg.log_level,
+    fmt=_log_cfg.log_format,
+    per_module_levels=_log_cfg.log_levels,
+)
+_LOGGER = logging.getLogger(__name__)
 
 from .database.engine import init_db
 from .database.models import *  # noqa: F401,F403 — ensure all models registered
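
The body of logging_setup.setup_logging is not part of the hunk above. As a rough illustration of the dictConfig-plus-masking approach the commit message describes, the sketch below wires a secret-masking filter into a console handler. The regexes, the dict-shaped `per_module_levels`, and the text-only formatter are assumptions; the project's actual filter and formatters may differ.

```python
import logging
import logging.config
import re

# Assumed patterns; the real SecretMaskingFilter may cover more cases.
_BOT_TOKEN = re.compile(r"(/bot\d+:)[A-Za-z0-9_-]+")
_KEY_VALUE = re.compile(
    r"(?i)\b(authorization|x-api-key|password|secret)(\s*[=:]\s*)"
    r"(?:(?:bearer|basic)\s+)?[^\s,;&]+"
)


def mask_secrets(text: str) -> str:
    """Replace bot tokens and key/value secrets with ***."""
    text = _BOT_TOKEN.sub(r"\1***", text)
    return _KEY_VALUE.sub(r"\1\2***", text)


class SecretMaskingFilter(logging.Filter):
    """Redact secrets in the rendered message and in the cached traceback."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = mask_secrets(record.getMessage())
        record.args = None  # the message is already rendered
        if record.exc_info and not record.exc_text:
            # Formatter.format() reuses exc_text when it is already set.
            record.exc_text = logging.Formatter().formatException(record.exc_info)
        if record.exc_text:
            record.exc_text = mask_secrets(record.exc_text)
        return True


def setup_logging(level="INFO", fmt="text", per_module_levels=None):
    """Hypothetical stand-in for logging_setup.setup_logging.

    fmt is accepted for signature parity; only a text formatter is sketched.
    per_module_levels is assumed to be a {logger_name: level_name} dict.
    """
    overrides = per_module_levels or {}
    logging.config.dictConfig({
        "version": 1,
        "disable_existing_loggers": False,
        "filters": {"mask": {"()": SecretMaskingFilter}},
        "formatters": {
            "text": {"format": "%(asctime)s %(levelname)s %(name)s: %(message)s"},
        },
        "handlers": {
            "console": {
                "class": "logging.StreamHandler",
                "formatter": "text",
                "filters": ["mask"],
            },
        },
        "root": {"level": level.upper(), "handlers": ["console"]},
        "loggers": {name: {"level": lvl.upper()} for name, lvl in overrides.items()},
    })
```

Attaching the filter to the handler rather than to individual loggers means records from third-party libraries pass through it as well, which matters for tokens that leak through HTTP client tracebacks.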
@@ -66,6 +71,24 @@ async def lifespan(app: FastAPI):
     await migrate_user_token_version(engine)
     from .database.seeds import seed_all
     await seed_all()
+    # Apply DB-backed logging settings (override env-based boot config).
+    # log_format still needs a restart — changing it means swapping the
+    # handler formatter entirely.
+    try:
+        from sqlmodel.ext.asyncio.session import AsyncSession as _AS_log
+        from .api.app_settings import get_setting as _get_log_setting
+        from .logging_setup import apply_log_levels
+        async with _AS_log(engine) as _log_session:
+            db_level = await _get_log_setting(_log_session, "log_level")
+            db_levels = await _get_log_setting(_log_session, "log_levels")
+        apply_log_levels(level=db_level or None, per_module_levels=db_levels)
+        _LOGGER.info(
+            "Logging initialized: level=%s overrides=%r format=%s",
+            db_level or _log_cfg.log_level, db_levels or _log_cfg.log_levels,
+            _log_cfg.log_format,
+        )
+    except Exception:  # pragma: no cover — never let logging setup abort boot
+        _LOGGER.exception("Failed to apply DB-backed log settings; keeping env-based levels")
     # Apply any pending restore staged via /api/backup/prepare-restore
     from .services.pending_restore import apply_pending_restore_if_any
     await apply_pending_restore_if_any()
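
The hunk above calls apply_log_levels() but does not show it. The sketch below illustrates why levels can change live while the format cannot: setLevel() mutates existing logger objects in place, whereas a format change would mean replacing the formatter on every handler. The "name=LEVEL,name=LEVEL" string format and the `parse_level_overrides` helper are assumptions; the stored shape of `log_levels` is not visible in this diff.

```python
import logging


def parse_level_overrides(spec):
    """Parse "name=LEVEL,name=LEVEL" into a dict (assumed storage format)."""
    overrides = {}
    for item in (spec or "").split(","):
        name, sep, level = item.partition("=")
        if sep and name.strip() and level.strip():
            overrides[name.strip()] = level.strip().upper()
    return overrides


def apply_log_levels(level=None, per_module_levels=None):
    """Adjust logger levels in place; handlers and formatters are untouched."""
    if level:
        # None or "" leaves the root level as configured at boot.
        logging.getLogger().setLevel(level.upper())
    for name, module_level in parse_level_overrides(per_module_levels).items():
        logging.getLogger(name).setLevel(module_level)
```

A settings-UI save handler could call the same function with the new values, so no restart is needed for level changes. An unknown level name makes setLevel() raise ValueError, which the try/except in the lifespan block would catch at boot.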