claude-code-facts/code-search-vex-vs-ast-index.md
alexei.dolgolyov 4c3b0188d8 docs(vex): split vex into its own reference, refresh for v1.16.0
- Add vex.md: install (prebuilt binaries + self-update), GPU/CUDA setup,
  jina-code+CUDA recommendation (CUDA essential, too slow on CPU),
  vex mcp install, full command set (bundle/paths/reachable/diff/history,
  search scope+metadata filters), CLAUDE.md integration, caveats
- Shrink claude-code-tools.md section vex to a blurb + links
- Note v1.16.0 capabilities in the vex-vs-ast-index benchmark (not re-benchmarked)
- README: bump date, index vex.md, refresh vex descriptions
2026-06-11 01:11:26 +03:00

# Code Search: vex vs ast-index — Benchmark Notes
> **Snapshot:** 2026-05-26 · **Tested versions:** `vex 1.9.1`, `ast-index 3.41.0`
>
> These tools evolve quickly. Results below are **point-in-time** and only
> describe the versions and the single repo tested. Re-run the benchmarks before
> citing them on a different repo, on later versions, or after either tool
> changes its index format.
>
> **Heads-up:** Several conclusions from the 2026-05-18 snapshot have flipped on
> this revision. See ["Changes since the 2026-05-18 snapshot"](#changes-since-the-2026-05-18-snapshot)
> at the bottom for a summary.
>
> **Capability update — vex 1.16.0 (capabilities only, NOT re-benchmarked):** the
> latency and quality tables below remain pinned to **vex 1.9.1**, but vex's feature
> surface has moved on. New since 1.9.1 (no fresh measurements taken here):
>
> - **`vex history <Symbol>`** (v1.16) — query-time git-log walker returning every
> historical version of a symbol (`--diff`, `--since`/`--until`, `--author`, `--kind`);
> opt-in indexed sidecar via `vex index --history`. No ast-index equivalent.
> - **`vex mcp install` / `uninstall` / `list`** (v1.15.0) — idempotent MCP-server
> registration for Claude Code / Cursor (replaces hand-editing the agent config).
> - **Scope & metadata search filters** on `vex search` / `vex usages` — `--kind`,
> `--include`/`--exclude`, `--since`/`--since-branched`/`--changed-only`, `--visibility`,
> `--async-only`, `--static-only`, `--sealed-only`, `--why`.
> - **`vex gpu`** execution-provider probe; **`vex watch`** continuous reindex;
> **`vex init --agents-md`**; prebuilt Windows `vex` + `vex-mcp` binaries.
>
> See **[vex.md](vex.md)** for the full current reference (install, GPU/CUDA, config,
> command set). Re-run the tables below on 1.16.0 before citing the numbers.
## Test environment
| Aspect | Value |
|---|---|
| Repo | `led-grab` (private, mixed-language LED capture/streaming app) |
| Total files indexed | 555 (ast-index); vex indexes a similar set |
| Total symbols indexed | ~15,596 (vex) / ~18,226 (ast-index) |
| Reference edges | n/a in vex `status` output / **62,625 refs** (ast-index) |
| Languages present | **Python**, **Kotlin** (Android), **TypeScript**, **JavaScript**, plus PowerShell/Bash scripts |
| Host | Single Windows 10 workstation, Git Bash, SSD |
| Index storage | `%LOCALAPPDATA%\vex\<hash>\index.vex` on Windows (`~/.cache/vex/` on Unix) / `%LOCALAPPDATA%\ast-index\<hash>\index.db` |

The repo size is "small/medium" by both tools' definitions. **Numbers on a 10× larger repo will not scale linearly**: semantic embeddings in particular grow with symbol count, and call-graph construction grows with edge count.
## Indexing & footprint
| Aspect | vex (structural) | vex (`--semantic`) | ast-index |
|---|---|---|---|
| Cold build time | **~12 s** | 5 m 20 s (one-time embeddings) | ~12 s |
| Symbols | 15,596 | 15,596 | 18,226 |
| Index size on disk | (structural-only smaller) | **26.4 MB** (with embeddings) | 10.3 MB |
| Incremental update | `vex update`, or `auto_update = true` in `.vex.toml` (auto-runs before queries when stale) | same | `ast-index update` (incremental) or `rebuild` |
| Call graph | Built into index, ~4 ms queries | same | **Now populated for Python** (was broken in 3.27) |
| Multi-language | 18+ via tree-sitter | same | 13+ |
| Branch-diff (symbol-level vs git rev) | **`vex diff --base <rev>`** (NEW in 1.7+) | same | **`ast-index changed --base <rev>`** |
| Self-update | **`vex self-update`** (NEW; works on Windows/macOS/Linux) | same | — (manual install) |
## Query latency (warm, sub-100 ms is "fast enough")
| Operation | vex | ast-index | Notes |
|---|---|---|---|
| Symbol definition | ~100 ms | ~30–90 ms | Both fast |
| Usages | ~80 ms | ~35 ms | See "Notable findings" #2 for precision flip |
| Callers (direct) | ~45 ms | ~50 ms | Both now resolve real call sites for Python |
| Implementations / subclasses | ~80 ms | ~50 ms | Both work; vex generics gap fixed in v1.7.0 |
| Existence check (`vex check`) | ~30 ms | — (use `symbol`) | vex-only fast multi-symbol existence |
| Semantic (NL → symbol) | ~300 ms | — | only vex (requires `--semantic` index) |
| `similar SymName` | ~110 ms | — | only vex |
| Near-duplicate scan | ~18 s whole-repo | — | only vex |
| Multi-hop callers (`paths`, `reachable`) | hundreds of ms | — | only vex (transitive call graph) |
| Bundle (1-shot `show + callers + callees + similar`) | ~150 ms | — | only vex (`bundle --mode symbol`) |
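
The "warm" figures above can be reproduced with a small timing harness. The following is a minimal sketch, not the procedure used for this snapshot: `median_ms` is a hypothetical helper name, and it assumes GNU `date` (for `%N` nanoseconds), which Git Bash and most Linux distributions provide but macOS's BSD `date` does not.

```bash
# median_ms RUNS CMD [ARGS...]
# Runs CMD once untimed to warm caches, then RUNS timed iterations, and
# prints the (lower) median wall-clock time in whole milliseconds.
# The function body is a subshell, so its variables do not leak.
median_ms() (
  runs=$1; shift
  "$@" >/dev/null 2>&1            # warm-up run, not timed
  i=0
  while [ "$i" -lt "$runs" ]; do
    start=$(date +%s%N)           # nanoseconds since epoch (GNU date)
    "$@" >/dev/null 2>&1
    end=$(date +%s%N)
    echo $(( ($end - $start) / 1000000 ))
    i=$((i + 1))
  done | sort -n | awk '{ v[NR] = $1 } END { print v[int((NR + 1) / 2)] }'
)

# Example, using commands from the benchmark script in this document:
#   median_ms 10 vex usages "SomeClassInYourRepo" --format compact
#   median_ms 10 ast-index usages "SomeClassInYourRepo"
```

Each sample includes process start-up for the tool and for one `date` call, which can add tens of milliseconds on Windows. Treat the output as an upper bound and compare tools using the same harness rather than against numbers measured another way.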
## Query quality findings
Six real queries from the test repo (re-run on 2026-05-26):
| Query | vex 1.9.1 | ast-index 3.41.0 | Better fit |
|---|---|---|---|
| `usages BaseJsonStore` | **0 hits** (no structural refs — subclasses caught by `implementations`) | 4 hits, all in **comments/docstrings** | depends on intent — vex if you want structural-only, ast-index if you want any textual mention |
| `callers get_latest_frame` | 6 real call sites | **9 hits** (incl. 3 false positives in docstrings) | vex (cleaner) |
| `implementations BaseJsonStore` | **2 hits** (`_TestStore`, `_LegacyStore`) | n/a (`class` is closest) | vex |
| `diff --base HEAD~5` | 211 changes (incl. heading/markdown moves) | "No supported files changed" (Python/Kotlin/TS only) | depends — vex broader, ast-index narrower-by-design |
| Semantic `"WLED device discovery over mDNS"` | finds `wled_provider.discover`, `wled_client` | n/a | vex only |
| Semantic `"JSON storage migration logic"` | finds `BaseJsonStore`, `TestLegacyKeyMigration`, `_LegacyStore` | n/a | vex only |
## Notable findings
1. **ast-index's Python call graph now works.** In the 2026-05-18 run on v3.27.0 it returned 0 for several functions; on v3.41.0 it returns real call sites for the same queries. Whatever was broken upstream is fixed. (Watch out: it still includes textual mentions in docstrings as "callers" — see #2.)
2. **`usages` precision is now the inverse of last snapshot.**
   - **vex 1.9.1**: T1-language `usages` (Python/TS/Rust/C#/C++) is an AST identifier walk, optionally backed by persisted reference edges with `--strict` (Phase 11.1 / v1.8.0). It returned **0 hits** for `usages BaseJsonStore`, which is correct: there are 0 structural usages outside subclass declarations (those are picked up by `implementations`).
   - **ast-index 3.41.0**: returned 4 hits, all of which are in comments or docstrings.
   - The old "vex catches comments and docstrings, ast-index doesn't" advice has *swapped*. Today, vex is the stricter tool by default and ast-index is the textual one. Use `vex grep` (regex) or `ast-index usages` if you actually want comment/docstring mentions.
3. **vex's `implementations` for generic-parameterized subclasses now works.** `class Foo(Base[T])` is detected as of v1.7.0. The old workaround (`vex pattern 'class $NAME($BASE[$$_]):' --lang python`) is no longer needed in this case. Remaining gap (per CLAUDE.md): decorator-based dispatch is not linked.
4. **vex now has `diff --base <rev>` (symbol-level git rev diff).** This replaces the previous "ast-index has branch-diff, vex does not" finding. The two tools differ in scope, not in capability:
   - `vex diff` covers all parsed symbol kinds across all indexed languages, including headings in Markdown, so it is broader and noisier.
   - `ast-index changed` covers Python/Kotlin/TS/JS code symbols only, so it is narrower and cleaner if you only care about code changes.

   Use vex when you want everything; use ast-index when you only want code-symbol churn.
5. **vex's semantic index still has a one-time setup cost.** Expect ~5 minutes to embed ~15k symbols, plus a ~86 MB ONNX model download on first run. Worth it for natural-language queries and `similar`/`duplicates`, but you must commit to it upfront. After that it lives in the same `index.vex` (~26 MB total with embeddings included).
6. **vex now ships prebuilt Windows binaries.** `vex self-update` works on Windows/macOS/Linux in v1.9.1 — no more building from source on first install. Update `claude-code-tools.md` § vex accordingly.
7. **New vex commands worth knowing (added since v1.5):**
   - `vex diff --base <rev>`: symbol-level branch diff (#4 above).
   - `vex paths --from A --to B`: enumerate caller chains between two symbols (multi-hop `callers`).
   - `vex reachable --target T`: find all symbols that transitively reach `T` via the call graph.
   - `vex check sym1,sym2,...`: fast multi-symbol existence check.
   - `vex bundle --mode {symbol,pr-impact,project}`: one call replaces the `show → callers → callees → similar` round trip; pr-impact mode bundles changed symbols + transitive callers + reachable tests for code review.
   - `vex eval`: built-in ranking eval harness for CI regression.
   - `vex capabilities`: machine-readable feature matrix.
   - **Since v1.10–v1.16 (see the capability-update note at the top + [vex.md](vex.md)):** `vex history` (symbol-level git archaeology), `vex mcp install`/`uninstall`/`list` (idempotent MCP registration), `vex gpu` (EP probe), `vex watch` (continuous reindex), and scope/metadata search filters (`--kind`, `--include`/`--exclude`, `--since`/`--changed-only`, `--visibility`, `--async-only`, `--why`).
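
The scope difference in finding #4 can be approximated at file level when only a changed-file list is available. The sketch below is an illustration, not how either tool works internally: `code_only` is a hypothetical helper, and the extension list (`.py`, `.kt`, `.kts`, `.ts`, `.tsx`, `.js`, `.jsx`) is an assumption about what counts as Python/Kotlin/TS/JS source.

```bash
# code_only: read file paths on stdin and print only those whose extension
# looks like Python, Kotlin, TypeScript, or JavaScript source.
# "|| true" keeps the exit status zero when nothing matches.
code_only() {
  grep -E '\.(py|kts?|tsx?|jsx?)$' || true
}

# Example: narrow a branch's changed files to code files only.
#   git diff --name-only master | code_only
```

This filters files, not symbols, so it is coarser than either `vex diff` or `ast-index changed`. It is mainly useful for a quick check of how much of a branch's churn is documentation rather than code.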
## Practical recommendation
The chain in the recommended global `CLAUDE.md` snippet still applies:
```
vex → ast-index → Grep/Glob
```
…but the *reasons* for the fallback have shifted on this version:
- **Default to vex** for symbol search, usages, callers, callees, implementations, semantic, similar, duplicates, bundle, and branch-diff. The call graph, the precision improvements (T1 AST walk + `--strict`), and the new `bundle`/`paths`/`reachable`/`diff` primitives cover most agent code-search needs in one tool.
- **Fall back to ast-index** when:
  - You want only code-symbol churn on a branch (`changed --base` is narrower than `vex diff`).
  - You want textual matches in comments/docstrings (vex T1 `usages` is now strict and may miss those; `vex grep` is the alternative, but ast-index `usages` is sometimes more convenient).
  - vex is not installed on the host (rare now that prebuilt binaries exist).
- **Fall through to Grep/Glob** for regex, config files (YAML/JSON/TOML), pure prose, or unparsed languages.
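
The "not installed on the host" case of this chain can be checked mechanically. A minimal sketch, assuming the tools are invoked by the names `vex`, `ast-index`, and `grep` on `PATH`; `first_available` is a hypothetical helper, and it covers availability only, not the query-dependent reasons for falling back listed above.

```bash
# first_available TOOL...: print the first tool name found on PATH and
# return 0; return 1 if none of them is installed.
first_available() {
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      return 0
    fi
  done
  return 1
}

# Walk the chain in order of preference: vex, then ast-index, then grep.
search_tool=$(first_available vex ast-index grep) || search_tool=""
echo "code search will use: ${search_tool:-none}"
```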

Neither tool fully resolves: decorator-based dispatch, string-resolved targets (uvicorn factory strings, Celery task names), reflection / `getattr`, dynamic imports, and macro-expanded references. Before any rename or delete, backstop structural results with `vex grep '\bName\b'` regardless of which tool you started with.
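
To see what a whole-word textual backstop catches that a structural query does not, the sketch below uses plain `grep -w` as a stand-in for `vex grep '\bName\b'` (options of `vex grep` beyond the pattern are not covered in this document). The sample file and the `textual_mentions` helper are invented for illustration.

```bash
# Build a throwaway file with one structural reference, one docstring
# mention, one string-resolved target, and one longer, unrelated identifier.
demo=$(mktemp -d)
cat > "$demo/store.py" <<'EOF'
class LegacyStore(BaseJsonStore):
    """Migrates data written by BaseJsonStore."""
    factory = "storage:BaseJsonStore"
    helper = BaseJsonStoreFactory()
EOF

# textual_mentions NAME DIR: whole-word matches of NAME under DIR, printed
# with file and line number. -w limits matches to whole words.
textual_mentions() {
  grep -rnw -- "$1" "$2"
}

textual_mentions BaseJsonStore "$demo"
rm -rf "$demo"
```

The first three lines of the sample match, including the docstring and the string-resolved target that a structural `usages` query would skip. The fourth does not, because `BaseJsonStoreFactory` is a different word.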
## Re-running these benchmarks
If you want to validate on a different repo or newer versions:
```bash
# 1. Build both indices fresh
vex init && vex index # vex structural (set auto_update = true)
vex index --semantic # vex semantic (slow, one-time)
ast-index rebuild # ast-index
# 2. Run identical queries through both
SYM="SomeClassInYourRepo"
vex search "$SYM" --format compact ; ast-index symbol "$SYM"
vex usages "$SYM" --format compact ; ast-index usages "$SYM"
vex callers "$SYM" --format compact ; ast-index callers "$SYM"
# 3. Branch-diff — both tools now support this
vex diff --base master --format compact
ast-index changed --base master
# 4. Multi-hop and bundle (vex-only)
vex paths --from caller_fn --to callee_fn
vex reachable --target some_critical_fn
vex bundle --mode symbol --symbol "$SYM"
vex bundle --mode pr-impact --base master
```
Record the tool versions and timestamps alongside the numbers — see this document's header for the template.
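
One way to keep that record consistent is to generate the header line. This is a sketch: `snapshot_header` is a hypothetical helper, and the versions are passed in by hand because this document does not show how either tool reports its version.

```bash
# snapshot_header VEX_VERSION AST_INDEX_VERSION
# Prints a blockquote line in the same shape as this document's header,
# stamped with today's date.
snapshot_header() {
  printf '> **Snapshot:** %s · **Tested versions:** `vex %s`, `ast-index %s`\n' \
    "$(date +%Y-%m-%d)" "$1" "$2"
}

snapshot_header 1.9.1 3.41.0
```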
## Changes since the 2026-05-18 snapshot
What flipped, what got fixed, what's new:
| Finding from 2026-05-18 | Status on 2026-05-26 |
|---|---|
| ast-index's Python call graph is empty | **Fixed**: now returns real call sites in v3.41.0 |
| vex's `implementations` misses generic-parameterized subclasses | **Fixed**: works as of v1.7.0 |
| vex's `usages` is text-flavored (catches comments/docstrings) | **Reversed**: T1 `usages` is now AST-precise (Phase 11.1 / v1.8.0); ast-index is now the textual one |
| ast-index has `changed --base`, vex does not | **Obsolete**: `vex diff --base <rev>` shipped in 1.7+; tools differ in scope, not capability |
| Windows install of vex requires building from source | **Fixed**: prebuilt Windows binaries; `vex self-update` works |
| 4-round-trip agent loop `show → callers → callees → similar` | **Collapsed**: `vex bundle --mode symbol` is one call |
| Multi-hop call-graph queries unsupported | **Added**: `vex paths` and `vex reachable` |