Files
notify-bridge/.gitea/workflows/release.yml
T
alexei.dolgolyov 10d30fc956 feat: production readiness — security, perf, bug fixes, bridge self-monitoring
Comprehensive multi-area pass driven by a parallel 8-agent production
review. Frontend, backend, database, security, performance, operational,
plus a new self-monitoring feature.

## Critical fixes
- Planka webhook: reads bounded raw body (was NameError on every call)
- HA quiet hours: ha_state_changed/automation_triggered/service_called/
  event_fired added to deferrable set (were silently dropped)
- DNS-rebinding SSRF: PinnedResolver wired into shared aiohttp session
- Telegram inbound webhook: secret now mandatory (401 without)
- Generic webhook: auth_mode="none" requires explicit
  acknowledge_unauthenticated=true; per-IP rate limit 60/min
- svelte-check: 5 null-narrowing errors in EventDetailModal fixed
- Provider hardcoding: Immich-only block extracted to descriptor
  featureDiscoveryHint
- command_sync: snapshot+expunge bot before exiting AsyncSession

## Bug fixes
- notifier asyncio.gather(return_exceptions=True) — one bad chat no longer
  cancels peer sends
- NotificationDispatcher hoisted out of per-tracker loop
- Provider credential resolution unified across all 5 dispatch sites
- HA asyncio.shield now drains inner task on cancellation
- Provider construction switched from if/elif ladder to factory registry
- NUT first poll seeds silently (no spurious ups_on_battery)
- Quiet-hours gate: event-type-disabled now wins over deferral
- APScheduler drain job ID resolution upgraded to seconds
- HA on_status_change wired through to EventLog
- Webhook payload rollback failures now logged (not swallowed)
- Batched receivers/chats/bots in load_link_data (was per-target N+1)
- flag_modified on JSON column reassignments in deferred_dispatch

## Database
- UNIQUE indexes on service_provider.webhook_token,
  telegram_bot.webhook_path_id, partial UNIQUE on telegram_bot.bot_id,
  telegram_chat(bot_id, chat_id), notification_tracker_target unique link,
  partial UNIQUE on bridge_self provider per user
- Composite ix_event_log_user_event_type_created index
- save_chat_from_webhook switched to ON CONFLICT DO UPDATE
- ondelete=CASCADE on user-id FKs (model annotation; app-side cascade
  delete added for existing data)
- delete_notification_tracker converted from N+1 to bulk DELETE/UPDATE
- Module-level asyncio.Lock replaced with lazy _get_lock() pattern
- VACUUM INTO snapshot now PRAGMA integrity_check verified

## Performance
- Jinja2 template compilation LRU cached (lru_cache maxsize=512)
- Per-locale render cache in NotificationDispatcher (skips re-rendering
  identical content for receivers sharing a locale)
- Tracker list cached per provider_id with 5s TTL + explicit invalidation
  on tracker CRUD (relieves HA chat-bus rate query pressure)
- Nav-counts collapsed from 16 round-trips to single UNION ALL
- HA event_log: skip persisting empty assets_added/removed events

## Security hardening
- Mass-assignment guard on Action create/update; cron sub-minute reject
- Backup JSON depth/node-count cap (depth ≤ 10, nodes ≤ 100k)
- _sanitize_config extended to all JSON-typed fields on backup import
- Telegram _safe_get walks redirects manually with SSRF revalidation
- Bcrypt 72-byte password length cap with clear 422
- Webhook payload body redaction; sensitive substring set extended with
  oauth/client_secret/webhook_secret/csrf in both header filter and
  template extras filter

## Frontend
- 76 catch (err: any) sites converted to errMsg(err) helper
- globalProviderFilter: pure getter; reconciliation moved to one-time
  $effect in +layout
- Provider-filter binding: removed paired $effects + _syncingFilter flag,
  now one-way derived
- entity-cache: separate _refreshing flag for background re-fetches
- api.ts 401 handling: AuthRedirectError class + dedup _redirecting flag,
  goto() instead of window.location.href
- a11y: aria-expanded on mobile More, role=switch + aria-checked on
  Telegram bot toggles

## Tests & operations
- CI pytest gate added to .gitea/workflows/build.yml + release.yml
  (wheel-built install to dodge editable-install slowness)
- /api/ready upgraded to deep healthcheck (db SELECT 1, scheduler.running,
  HA supervisor presence) returning {ready, checks, errors, version}
- /api/metrics endpoint with prometheus_client (deferred_pending,
  event_log_total, dispatch_duration, poll_failures, send_failures)
- New OPERATIONS.md covering deploy, healthchecks, metrics, backup/restore
  procedures, log handling, common scenarios, upgrade flow
- New tests: test_bridge_self (11), test_gitea_parser (9),
  test_planka_parser (6), test_immich_change_detector (6),
  test_backup_roundtrip (1)

## New feature: bridge self-monitoring
- New bridge_self provider type — internal sink for bridge health events
- Three event types: bridge_self_poll_failures (consecutive tracker poll
  failures), bridge_self_deferred_backlog (pending count crosses
  threshold), bridge_self_target_failures (consecutive 5xx/network
  failures per target)
- Per-user thresholds (defaults: 3 / 100 / 5) configurable via the
  provider config form
- Auto-seeded on user create + /setup + boot backfill for existing users
- Anti-spam: counters reset after emission; backlog uses transition latch
- Self-loop guard: bridge_self failures don't count toward target-failure
  thresholds (logged only) — wire to your own Telegram/Email/Matrix to
  get notified when polls/dispatches/sends fail
- 6 default templates (3 events × 2 locales), tracking config columns
  with backfill migration, frontend descriptor (excluded from "create
  provider" wizard since auto-managed)

Operator-visible behavior changes (call out in release notes):
- NOTIFY_BRIDGE_TELEGRAM_WEBHOOK_SECRET now REQUIRED for webhook mode
- Existing webhook providers with auth_mode="none" need explicit opt-in
- Generic webhook endpoint rate-limited 60/min per source IP
- HA disconnect/reconnect writes ha_status_* EventLog rows
- Every user gets a bridge_self provider — wire it to a target to
  receive failure alerts

Pre-existing test failures (test_ssrf, test_release_provider) on
Python 3.13 are unrelated; CI runs on 3.12.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 02:16:49 +03:00

186 lines
6.6 KiB
YAML

name: Release
on:
push:
tags:
- 'v*'
env:
REGISTRY: git.dolgolyov-family.by
IMAGE_NAME: alexei.dolgolyov/notify-bridge
jobs:
test-backend:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.12"
# Same wheel-first strategy as build.yml — editable install is too slow
# on the hosted runner.
- name: Build wheels
run: |
python -m pip install --upgrade pip build
mkdir -p /tmp/wheels
pip wheel --no-deps -w /tmp/wheels packages/core packages/server
- name: Install backend + test deps
run: |
pip install /tmp/wheels/*.whl pytest pytest-asyncio httpx aioresponses prometheus_client
- name: Run pytest
env:
NOTIFY_BRIDGE_DATA_DIR: /tmp/nb-test-data
NOTIFY_BRIDGE_SECRET_KEY: ci-secret-key-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
NOTIFY_BRIDGE_DEBUG: "false"
NOTIFY_BRIDGE_CORS_ALLOWED_ORIGINS: "http://localhost:8420"
run: |
cd packages/server
pytest tests --tb=short
release:
needs: [test-backend]
runs-on: ubuntu-latest
steps:
- name: Checkout repo
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Extract version from tag
id: version
run: |
TAG="${{ gitea.ref_name }}"
VERSION="${TAG#v}"
echo "tag=$TAG" >> "$GITHUB_OUTPUT"
echo "version=$VERSION" >> "$GITHUB_OUTPUT"
IS_PRE="false"
if echo "$TAG" | grep -qE '(alpha|beta|rc)'; then
IS_PRE="true"
fi
echo "is_pre=$IS_PRE" >> "$GITHUB_OUTPUT"
- name: Login to Gitea Container Registry
uses: docker/login-action@v3
with:
registry: ${{ env.REGISTRY }}
username: ${{ gitea.actor }}
password: ${{ secrets.DEPLOY_TOKEN }}
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Build and push Docker image
uses: docker/build-push-action@v5
with:
context: .
push: true
tags: |
${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ steps.version.outputs.tag }}
${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ steps.version.outputs.version }}
${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:sha-${{ gitea.sha }}
${{ steps.version.outputs.is_pre == 'false' && format('{0}/{1}:latest', env.REGISTRY, env.IMAGE_NAME) || '' }}
cache-from: type=registry,ref=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:buildcache
cache-to: type=registry,ref=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:buildcache,mode=max
- name: Trigger redeploy webhook
if: steps.version.outputs.is_pre == 'false'
continue-on-error: true
run: |
if [ -n "${{ secrets.DOCKER_REDEPLOY_WEBHOOK_URL }}" ]; then
echo "Triggering redeploy webhook..."
curl -sf -X POST "${{ secrets.DOCKER_REDEPLOY_WEBHOOK_URL }}" \
--max-time 30 || echo "::warning::Redeploy webhook failed"
else
echo "DOCKER_REDEPLOY_WEBHOOK_URL not set — skipping auto-deploy"
fi
- name: Generate changelog
id: changelog
run: |
PREV_TAG=$(git tag --sort=-v:refname | head -2 | tail -1)
if [ -z "$PREV_TAG" ] || [ "$PREV_TAG" = "${{ gitea.ref_name }}" ]; then
CHANGELOG=$(git log --oneline --no-decorate -n 20)
else
CHANGELOG=$(git log --oneline --no-decorate ${PREV_TAG}..HEAD)
fi
echo "$CHANGELOG" > /tmp/changelog.txt
- name: Create Gitea Release
env:
DEPLOY_TOKEN: ${{ secrets.DEPLOY_TOKEN }}
run: |
TAG="${{ steps.version.outputs.tag }}"
VERSION="${{ steps.version.outputs.version }}"
IS_PRE="${{ steps.version.outputs.is_pre }}"
BASE_URL="https://${{ env.REGISTRY }}/api/v1/repos/${{ env.IMAGE_NAME }}"
if [ -f RELEASE_NOTES.md ]; then
export RELEASE_NOTES=$(cat RELEASE_NOTES.md)
echo "Found RELEASE_NOTES.md"
else
export RELEASE_NOTES=""
echo "No RELEASE_NOTES.md found"
fi
# Build release payload (avoids shell escaping & CLI length limits)
export TAG VERSION IS_PRE
python3 <<'PY'
import json, os
release_notes = os.environ.get('RELEASE_NOTES', '')
changelog = open('/tmp/changelog.txt').read().strip()
sections = []
if release_notes.strip():
sections.append(release_notes.strip())
if changelog:
sections.append('## Changelog\n\n' + changelog)
payload = {
'tag_name': os.environ['TAG'],
'name': f"Notify Bridge {os.environ['VERSION']}",
'body': '\n\n'.join(sections),
'draft': False,
'prerelease': os.environ['IS_PRE'] == 'true',
}
with open('/tmp/release-payload.json', 'w') as f:
json.dump(payload, f)
PY
HTTP=$(curl -s -o /tmp/release-resp.json -w "%{http_code}" \
-X POST "$BASE_URL/releases" \
-H "Authorization: token $DEPLOY_TOKEN" \
-H "Content-Type: application/json" \
--data-binary @/tmp/release-payload.json)
echo "POST /releases → HTTP $HTTP"
echo "--- response ---"
head -c 2000 /tmp/release-resp.json; echo
echo "----------------"
if [ "$HTTP" = "201" ]; then
RELEASE_ID=$(python3 -c "import json; print(json.load(open('/tmp/release-resp.json'))['id'])")
echo "Created release $RELEASE_ID for $TAG"
elif [ "$HTTP" = "409" ] || grep -q "already exists" /tmp/release-resp.json; then
echo "::warning::Release already exists for tag $TAG — reusing"
HTTP2=$(curl -s -o /tmp/release-resp.json -w "%{http_code}" \
"$BASE_URL/releases/tags/$TAG" \
-H "Authorization: token $DEPLOY_TOKEN")
if [ "$HTTP2" != "200" ]; then
echo "::error::Failed to fetch existing release (HTTP $HTTP2)"
cat /tmp/release-resp.json
exit 1
fi
RELEASE_ID=$(python3 -c "import json; print(json.load(open('/tmp/release-resp.json'))['id'])")
echo "Reused release $RELEASE_ID for $TAG"
else
echo "::error::Failed to create release for $TAG (HTTP $HTTP)"
exit 1
fi