f1cfb61d13
Security hardening (CRITICAL/HIGH from production-readiness audit):
- Require strong JWT_SECRET + separate INTEGRATION_ENCRYPTION_KEY at boot;
refuse placeholder defaults. Integration key now derived via HKDF.
- SSRF guard (src/lib/server/utils/safeFetch.ts): DNS-resolves and rejects
RFC1918/loopback/link-local/IPv4-mapped IPv6/decimal-IP/cloud-metadata.
Manual redirect handling re-validates each 3xx Location hop. Applied to
healthcheck, RSS, calendar, metric, system-stats, camera, notifications,
discovery, apps/preview, and all integration clients.
- API tokens, session refresh tokens, invite tokens, password-reset tokens
switched from bcrypt to sha256 with @unique indexed lookup (O(1) instead
of O(N) bcrypt-compares; eliminates a trivial DoS).
- Refresh-token reuse detection via Session.previousTokenHash.
- Permission checks on App PATCH/DELETE and Widget/Section endpoints.
- /api/integrations/alerts now requires auth.
- SVG uploads sanitized through DOMPurify (svg profile, scheme allow-list).
- Custom CSS sanitizer + selector scoping (decodes CSS unicode escapes
before pattern match, drops forbidden at-rules incl. @import without
whitespace, strips dangerous url() args). Scoped to .custom-css-scope.
- Backup restore validates SQLite magic header, takes a safety snapshot,
uses atomic rename, re-applies pragmas.
- SQLite WAL + busy_timeout + foreign_keys + synchronous=NORMAL at startup.
- Healthcheck scheduler was dead code; now wired in via hooks.server.ts with
an HMR-safe singleton, concurrency cap, overlap prevention, and retention jobs
for AppClick/Notification/AuditLog. Composite indexes added on hot paths.
- Security headers (CSP, HSTS-on-https, X-Frame-Options, Permissions-Policy)
emitted on every response.
- Account-enumeration mitigation on login (dummy bcrypt on no-user/oauth
branches) + rate limiting on login/register/onboarding/refresh/invite/
password-reset.
- OAuth callback sanitizes IdP error_description before echoing.
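Illustration of the SSRF guard's address check: a minimal sketch, assuming the hostname has already been DNS-resolved and the resolver returns addresses in compressed canonical form. The helper names (parseIPv4, isBlockedIPv4, isBlockedAddress) are hypothetical and this is not the contents of safeFetch.ts. Anything that does not parse as a dotted quad is rejected, which also covers decimal-IP input.

```typescript
// Hypothetical helpers for illustration only; not taken from safeFetch.ts.

// Parse a strict dotted-quad IPv4 string into four octets, or null if malformed.
function parseIPv4(host: string): number[] | null {
	const parts = host.split('.');
	if (parts.length !== 4) return null;
	const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
	return octets.every((o) => o >= 0 && o <= 255) ? octets : null;
}

// True for loopback, RFC1918, link-local (incl. 169.254.169.254 metadata) and 0.0.0.0/8.
function isBlockedIPv4(ip: string): boolean {
	const o = parseIPv4(ip);
	if (!o) return true; // fail closed: decimal-IP and other odd forms are rejected
	const [a, b] = o;
	return (
		a === 0 ||
		a === 10 ||
		a === 127 ||
		(a === 169 && b === 254) ||
		(a === 172 && b >= 16 && b <= 31) ||
		(a === 192 && b === 168)
	);
}

// Classify a resolved address (IPv4, IPv6, or IPv4-mapped IPv6).
function isBlockedAddress(addr: string): boolean {
	const lower = addr.toLowerCase();
	if (lower.startsWith('::ffff:')) {
		const rest = lower.slice('::ffff:'.length);
		// Dotted form is checked as IPv4; the hex form is rejected outright.
		return rest.includes('.') ? isBlockedIPv4(rest) : true;
	}
	if (lower.includes(':')) {
		return (
			lower === '::1' || // loopback
			lower === '::' || // unspecified
			/^fe[89ab]/.test(lower) || // fe80::/10 link-local
			/^f[cd]/.test(lower) // fc00::/7 unique local
		);
	}
	return isBlockedIPv4(lower);
}
```

The same check would need to run again on every redirect hop, since a public host can answer with a 3xx Location that points at an internal address.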
New features:
- Custom +error.svelte pages (root + boards + admin) via shared
ErrorState component. Inverted hierarchy (status as label, title as hero).
- /forgot-password + /reset-password + admin-mediated /admin/password-resets
page. SHA256 tokens, 24h TTL, all sessions revoked on apply.
- /invite page for manual invite-token redemption.
- /api/metrics Prometheus exposition with optional METRICS_TOKEN bearer
auth. Counters for login/healthcheck/notification/integration; gauges
for users/boards/apps + per-status app counts.
- Webhook HMAC-SHA256 signing for HTTP notification channels (optional
shared secret + configurable signature header, default X-Signature-256).
- PATCH /api/users/me/password for self-service password change.
- Persistent uploads at /app/data/uploads with served-from-volume handler
at /uploads/[...path]. SVGs served with CSP: sandbox.
- /api/health does a DB ping; returns 503 on disconnect.
- Public /status filtered to guest-accessible-board apps when unauthenticated.
- Audit log coverage: LOGIN_SUCCESS/FAILED, LOGOUT, OAUTH_LOGIN,
OAUTH_USER_PROVISIONED, SESSION_REVOKED, API_TOKEN_*, INVITE_*,
APP_UPDATED, PASSWORD_CHANGED, PASSWORD_RESET_*.
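Illustration of the webhook signing item: a minimal sketch of HMAC-SHA256 signing and receiver-side verification using Node's crypto module. The function names and the "sha256=" prefix on the header value are assumptions for illustration; the commit only states the algorithm and the default header name X-Signature-256.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Hypothetical helper: hex HMAC-SHA256 of the raw request body.
function hmacHex(secret: string, body: string): string {
	return createHmac('sha256', secret).update(body, 'utf8').digest('hex');
}

// Value a sender might place in the signature header (prefix is an assumption).
function signBody(secret: string, body: string): string {
	return 'sha256=' + hmacHex(secret, body);
}

// Receiver side: recompute over the raw body and compare in constant time.
function verifySignature(secret: string, body: string, header: string): boolean {
	const expected = Buffer.from(signBody(secret, body), 'utf8');
	const received = Buffer.from(header, 'utf8');
	// timingSafeEqual throws on length mismatch, so check lengths first.
	return expected.length === received.length && timingSafeEqual(expected, received);
}
```

A receiver should verify against the exact bytes it received, before any JSON parsing, because re-serializing the payload can change whitespace or key order and invalidate the signature.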
Performance:
- Board page: removed double findAll() over-fetch; include links + appTags
in board query; widgets lazy-loaded via dynamic imports (marked,
DOMPurify, hls.js, integration renderers).
- uptimeService.getAllAppsUptime: single batched query instead of N+1.
- 30s in-memory user-locals cache; invalidated on user mutation.
- pruneOldStatuses: single window-function DELETE instead of N+1.
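Illustration of the user-locals cache item: a minimal sketch of an in-memory cache with a fixed TTL and explicit invalidation. The class name TtlCache and the injectable clock are assumptions made for the example; the commit only states a 30s lifetime and invalidation on user mutation.

```typescript
// Hypothetical TTL cache for illustration; not the project's implementation.
class TtlCache<K, V> {
	private readonly entries = new Map<K, { value: V; expiresAt: number }>();
	private readonly ttlMs: number;
	private readonly now: () => number;

	// The clock is injectable so expiry can be tested without waiting.
	constructor(ttlMs: number, now: () => number = Date.now) {
		this.ttlMs = ttlMs;
		this.now = now;
	}

	// Return the cached value, or undefined if missing or expired.
	get(key: K): V | undefined {
		const hit = this.entries.get(key);
		if (!hit) return undefined;
		if (hit.expiresAt <= this.now()) {
			this.entries.delete(key);
			return undefined;
		}
		return hit.value;
	}

	set(key: K, value: V): void {
		this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
	}

	// Call from every code path that mutates the underlying record.
	invalidate(key: K): void {
		this.entries.delete(key);
	}
}
```

With a 30 000 ms TTL, a role or avatar change that skips invalidate() could stay stale for up to 30 seconds, which is why the mutation paths need to call it.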
Code quality:
- Typed error classes (NotFoundError, PermissionError, RateLimitError,
IntegrationError) with toHttpError mapper.
- Locals.user shape exposes avatarUrl and narrows role via guard.
- App input types derived from Zod schemas via z.infer.
- 274 tests passing (up from 212); 62 new tests covering SSRF guard,
CSS sanitizer, SVG sanitizer, rate limiter.
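Illustration of the typed error classes item: a minimal sketch of an error hierarchy with a mapper to an HTTP status. The class names match those listed above, but the shared AppError base class, the status codes, and the return shape of toHttpError are assumptions, not the project's actual definitions.

```typescript
// Hypothetical base class carrying an HTTP status (assumption for this sketch).
class AppError extends Error {
	readonly status: number;

	constructor(message: string, status: number) {
		super(message);
		this.name = new.target.name;
		this.status = status;
	}
}

// Status codes below are assumed, conventional choices.
class NotFoundError extends AppError {
	constructor(message = 'Not found') {
		super(message, 404);
	}
}

class PermissionError extends AppError {
	constructor(message = 'Forbidden') {
		super(message, 403);
	}
}

class RateLimitError extends AppError {
	constructor(message = 'Too many requests') {
		super(message, 429);
	}
}

class IntegrationError extends AppError {
	constructor(message = 'Upstream integration failed') {
		super(message, 502);
	}
}

// Map any thrown value to a status and a message that is safe to show a client.
function toHttpError(err: unknown): { status: number; message: string } {
	if (err instanceof AppError) return { status: err.status, message: err.message };
	// Unknown errors are not echoed to the client.
	return { status: 500, message: 'Internal error' };
}
```

Keeping the mapping in one place lets route handlers throw domain errors without each one deciding on a status code.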
CI / Docker / config:
- Test workflow adds build, docker-build, audit jobs. Release workflow
uses buildx multi-arch (amd64+arm64) with provenance + SBOM.
- Dockerfile uses tini, multi-stage prune, persistent uploads dir, single
prisma migrate deploy (no destructive db push fallback).
- docker-compose: JWT_SECRET + INTEGRATION_ENCRYPTION_KEY required at
startup, log rotation, resource limits.
- README documents breaking-change upgrade path.
Bug fixes from UI/UX review:
- ~55 missing i18n keys added to en/ru (auth flows, error pages, admin
nav, register invite banner, settings.card_style).
- Hardcoded English on login replaced with $t('auth.remember_me').
- Admin nav uses i18n keys; mobile horizontal-scroll layout.
- Page <title> tags standardized.
- Password-resets: separated error/info/success surfaces, ConfirmDialog
replaces window.confirm.
- Auth pages have matching lucide icon badges.
- Webhook secret has eye toggle and monospace input.
- text-green-500 → text-emerald-500 to match codebase convention.
Pre-existing CI lint failures cleaned up (31 errors → 0): each-key
attributes added, unused-svelte-ignore comments removed, two any casts
typed, dead skeleton components removed, /boards/[id]/edit redirected to
inline edit mode.
Tests: 274 / 274 passing
Type check: 0 errors / 0 warnings
Build: green
170 lines
5.7 KiB
TypeScript
import { prisma } from '../prisma.js';
import { AppStatusValue } from '$lib/utils/constants.js';

/**
 * Tiny Prometheus-text metrics gatherer. Avoids the prom-client dependency
 * (~150KB + extra runtime memory) by emitting the exposition format directly.
 * If we later want histograms or counters with labels at high cardinality,
 * swap this out for prom-client.
 */

interface CounterSnapshot {
	readonly name: string;
	readonly help: string;
	readonly value: number;
	readonly labels?: Record<string, string>;
}

function escapeLabel(value: string): string {
	return value.replace(/\\/g, '\\\\').replace(/"/g, '\\"').replace(/\n/g, '\\n');
}

function renderLabels(labels?: Record<string, string>): string {
	if (!labels) return '';
	const parts = Object.entries(labels).map(([k, v]) => `${k}="${escapeLabel(v)}"`);
	return parts.length ? `{${parts.join(',')}}` : '';
}

/**
 * In-memory counter / gauge state. Process-local: Prometheus is expected to
 * scrape a single launcher instance (the app is SQLite-bound to one process
 * anyway). Reset on restart, like most lightweight setups.
 */
class MetricRegistry {
	private counters = new Map<string, number>();
	private gauges = new Map<string, number>();

	incCounter(name: string, by = 1): void {
		this.counters.set(name, (this.counters.get(name) ?? 0) + by);
	}

	setGauge(name: string, value: number): void {
		this.gauges.set(name, value);
	}

	getCounter(name: string): number {
		return this.counters.get(name) ?? 0;
	}

	snapshot(): { counters: Map<string, number>; gauges: Map<string, number> } {
		return { counters: new Map(this.counters), gauges: new Map(this.gauges) };
	}
}

export const metricRegistry = new MetricRegistry();

// Counter names: keep them ASCII identifiers (Prometheus naming rules).
export const Counters = {
	HEALTHCHECK_TOTAL: 'wal_healthcheck_total',
	HEALTHCHECK_FAILED: 'wal_healthcheck_failed_total',
	LOGIN_SUCCESS: 'wal_login_success_total',
	LOGIN_FAILED: 'wal_login_failed_total',
	NOTIFICATION_SENT: 'wal_notification_sent_total',
	NOTIFICATION_FAILED: 'wal_notification_failed_total',
	INTEGRATION_FETCH_TOTAL: 'wal_integration_fetch_total',
	INTEGRATION_FETCH_FAILED: 'wal_integration_fetch_failed_total'
} as const;

/**
 * Build the full exposition. Combines:
 * - process-local counters (login attempts, healthcheck ticks, etc.)
 * - DB-backed gauges (current online/offline app count, user count, etc.)
 */
export async function renderMetrics(): Promise<string> {
	const lines: string[] = [];

	// --- Static help/type lines + counter snapshots ---
	const COUNTER_HELP: Record<string, string> = {
		[Counters.HEALTHCHECK_TOTAL]: 'Total healthcheck ticks executed since process start',
		[Counters.HEALTHCHECK_FAILED]: 'Healthcheck ticks where any app returned offline',
		[Counters.LOGIN_SUCCESS]: 'Successful local logins since process start',
		[Counters.LOGIN_FAILED]: 'Failed local logins since process start',
		[Counters.NOTIFICATION_SENT]: 'Notification dispatch attempts',
		[Counters.NOTIFICATION_FAILED]: 'Notification dispatch failures',
		[Counters.INTEGRATION_FETCH_TOTAL]: 'Integration fetch attempts',
		[Counters.INTEGRATION_FETCH_FAILED]: 'Integration fetch failures'
	};

	const { counters } = metricRegistry.snapshot();
	for (const name of Object.values(Counters)) {
		const value = counters.get(name) ?? 0;
		lines.push(`# HELP ${name} ${COUNTER_HELP[name]}`);
		lines.push(`# TYPE ${name} counter`);
		lines.push(`${name} ${value}`);
	}

	// --- DB-backed gauges ---
	const gauges: CounterSnapshot[] = [];

	try {
		const [totalApps, healthchecked, totalUsers, totalBoards] = await Promise.all([
			prisma.app.count(),
			prisma.app.count({ where: { healthcheckEnabled: true } }),
			prisma.user.count(),
			prisma.board.count()
		]);

		gauges.push(
			{ name: 'wal_apps_total', help: 'Total apps registered', value: totalApps },
			{
				name: 'wal_apps_healthchecked_total',
				help: 'Apps with healthcheck enabled',
				value: healthchecked
			},
			{ name: 'wal_users_total', help: 'Total user accounts', value: totalUsers },
			{ name: 'wal_boards_total', help: 'Total boards', value: totalBoards }
		);

		// Latest status per app, broken down by status value.
		// Subquery: for each app, take the most recent AppStatus row.
		const latest = await prisma.$queryRaw<{ status: string; count: number }[]>`
			SELECT status, COUNT(*) AS count
			FROM (
				SELECT appId, status, ROW_NUMBER() OVER (PARTITION BY appId ORDER BY checkedAt DESC) AS rn
				FROM AppStatus
			)
			WHERE rn = 1
			GROUP BY status
		`;
		for (const status of Object.values(AppStatusValue)) {
			const row = latest.find((r) => r.status === status);
			gauges.push({
				name: 'wal_app_status',
				help: 'Current count of apps by latest status',
				value: Number(row?.count ?? 0),
				labels: { status }
			});
		}
	} catch (err) {
		// DB issue: emit an "up" gauge of 0 so scrapers can alert on it.
		// eslint-disable-next-line no-console
		console.warn('[metrics] failed to gather DB gauges:', err);
		lines.push(`# HELP wal_db_up 1 if the metrics endpoint could read from the DB`);
		lines.push(`# TYPE wal_db_up gauge`);
		lines.push(`wal_db_up 0`);
		lines.push('');
		return lines.join('\n');
	}

	// Group same-name gauges so we emit HELP/TYPE once.
	const grouped = new Map<string, CounterSnapshot[]>();
	for (const g of gauges) {
		const arr = grouped.get(g.name);
		if (arr) arr.push(g);
		else grouped.set(g.name, [g]);
	}
	for (const [name, samples] of grouped) {
		lines.push(`# HELP ${name} ${samples[0].help}`);
		lines.push(`# TYPE ${name} gauge`);
		for (const s of samples) {
			lines.push(`${name}${renderLabels(s.labels)} ${s.value}`);
		}
	}

	lines.push(`# HELP wal_db_up 1 if the metrics endpoint could read from the DB`);
	lines.push(`# TYPE wal_db_up gauge`);
	lines.push(`wal_db_up 1`);
	lines.push('');
	return lines.join('\n');
}
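
The commit message describes /api/metrics as guarded by an optional METRICS_TOKEN bearer. A minimal sketch of how a route might make that decision before calling renderMetrics() follows. The function name isMetricsRequestAllowed is hypothetical, and treating an unset token as "endpoint open" is an assumption drawn from the word "optional", not from the route's source.

```typescript
// Hypothetical check for illustration; the real route handler is not shown in this file.
function isMetricsRequestAllowed(
	authorization: string | null,
	metricsToken: string | undefined
): boolean {
	// Assumption: with no token configured, the endpoint is open.
	if (!metricsToken) return true;
	if (!authorization) return false;

	// Expect "Bearer <token>"; the scheme is matched case-insensitively.
	const match = /^Bearer\s+(.+)$/i.exec(authorization);
	return match !== null && match[1] === metricsToken;
}
```

A handler would pass in the request's Authorization header and the METRICS_TOKEN value from the environment, respond with 401 when the check fails, and otherwise return the exposition text. Plain string equality is used here for brevity; a constant-time comparison would be the more cautious choice for a secret.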