docs(vex): document history --depth vs index --history-depth

Clarify the two git-history depth knobs and how to keep indexing fast on repos with thousands of commits:

- vex history --depth N: query-time walk, per file (caps latency)
- vex index --history-depth N: one-time build, global newest-N (like git log -nN)

Added to the setup example, the .vex.toml history comment, and the command reference in vex.md.
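To make the per-file vs. global distinction concrete, here is a minimal sketch that uses plain git, not vex. It builds a throwaway repo and takes the message above at its word that `--history-depth N` behaves like `git log -nN`. Modelling `vex history --depth N` as `git log -n N -- <file>` is an assumption about how the per-file walk is bounded, not a description of vex internals.

```shell
#!/usr/bin/env bash
# Sketch with plain git, not vex: contrast a global newest-N cap with a per-file cap.
# Assumption: `git log -n N -- <file>` stands in for vex's per-file `--depth` walk.
set -euo pipefail

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo
git config commit.gpgsign false

# a.txt is committed once, first; b.txt is then changed in four later commits.
echo 1 > a.txt && git add a.txt && git commit -qm a1
for i in 1 2 3 4; do
  echo "$i" > b.txt && git add b.txt && git commit -qm "b$i"
done

N=2
# Global newest-N (like `git log -nN`): only the N most recent commits
# in the whole repo, whatever files they touch.
echo "global newest-$N:"
git log -n "$N" --format=%s

# Per-file cap (the assumed model for `vex history --depth N`):
# up to N commits for each file, so a.txt keeps its single old commit.
echo "per-file depth $N:"
for f in a.txt b.txt; do
  echo "$f: $(git log -n "$N" --format=%s -- "$f" | tr '\n' ' ')"
done
```

Here the global cap lists only `b4` and `b3`, so a.txt's one commit (`a1`) falls outside the newest-2 window, while the per-file cap still reports `a1` for a.txt. If the two knobs behave as described, a small global `--history-depth` can leave rarely touched files with no indexed history, whereas `--depth` only trims each file's own walk.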
2026-06-11 01:18:29 +03:00
parent 4c3b0188d8
commit 5a17fe960d
1 changed files with 8 additions and 2 deletions
@@ -95,6 +95,8 @@ vex index --path .               # plain index, sub-second, no downloads
 vex index --path . --semantic    # downloads ~86 MB ONNX model on first run
 # optional: bake a git-history sidecar so `vex history` is ~ms instead of shelling to git log:
 vex index --path . --semantic --history
+# thousands-of-commits repo? cap the one-time walk so the build stays fast (--history-depth implies --history):
+vex index --path . --semantic --history-depth 500
 ```

 Recommended `.vex.toml` for serious use:
@@ -114,7 +116,9 @@ device = "cuda"            # essential for jina-code; "auto" (default) falls bac
 # there is intentionally no .vex.toml key to enable it. To make it FST-fast (~ms)
 # on long-lived repos, build the opt-in indexed sidecar with `vex index --history`
 # (adds ~5-30s to a cold build and ~10% to index size); `vex status` then shows a
-# "History:" line. Re-run after a `vex self-update` so newer history extractors populate.
+# "History:" line. On a repo with thousands of commits, cap the build with
+# `vex index --history-depth 500` (global newest-N, like `git log -n500`) so a
+# full-history walk doesn't blow up index time. Re-run after a `vex self-update`.
 ```

 ## Register the MCP server
@@ -147,7 +151,9 @@ device = "cuda"            # essential for jina-code; "auto" (default) falls bac

 - **Call-graph & impact (call-graph index, ~ms queries):** `callers` / `callees` (direct), `paths --from A --to B` (all caller chains between two symbols), `reachable --target T` (everything that transitively reaches `T`), `bundle --mode {symbol,pr-impact,project}` (one call replaces the `show → callers → callees → similar` round trip; `pr-impact` bundles changed symbols + transitive callers + reachable tests for review), `diff --base <rev>` (symbol-level git-rev diff).

-- **Git history (v1.16):** `history <Symbol>` walks git log and returns every historical version of a symbol — query-time, no index needed, works even on un-indexed repos. Flags: `--diff` (unified diffs between consecutive versions), `--since` / `--until` (date), `--author`, `--kind`, `--branch`, `--depth`, `--limit`. Opt into an indexed sidecar with `vex index --history` for ~ms lookups.
+- **Git history (v1.16):** `history <Symbol>` walks git log and returns every historical version of a symbol — query-time, no index needed, works even on un-indexed repos. Flags: `--diff` (unified diffs between consecutive versions), `--since` / `--until` (date), `--author`, `--kind`, `--branch`, `--depth <N>` (max commits walked **per file**; lower it to cap query latency), `--limit <N>` (cap total results). Opt into an indexed sidecar with `vex index --history` for ~ms lookups.
+
+  **Taming history cost on large repos:** the two depth knobs cap different things. `vex history --depth <N>` limits the *query-time* walk **per file**. `vex index --history-depth <N>` caps the *one-time index build* at the **global** newest-N commits (mirrors `git log -nN`, implies `--history`); both are unbounded by default. On a repo with thousands of commits, build with e.g. `vex index --history-depth 500` so a full-history walk doesn't blow up index time and size.

 - **Search scope & metadata filters:** `vex search` and `vex usages` accept `--kind fn,method`, `--include` / `--exclude '<glob>'` (repeatable), `--since <rev>` / `--since-branched` / `--changed-only` (restrict to git-changed files), plus on `search`: `--visibility pub|priv`, `--async-only` / `--no-async`, `--static-only`, `--sealed-only`, `--context-path <file>` (proximity boost), and `--why` (JSON trace of per-channel FST / BM25 / semantic hit counts — use when results look wrong).