Code Search: vex vs ast-index — Benchmark Notes
Snapshot: 2026-05-26 · Tested versions: vex 1.9.1, ast-index 3.41.0

These tools evolve quickly. The results below are point-in-time and describe only the versions and the single repo tested. Re-run the benchmarks before citing them for a different repo, for later versions, or after either tool changes its index format.
Heads-up: Several conclusions from the 2026-05-18 snapshot have flipped on this revision. See "Changes since the 2026-05-18 snapshot" at the bottom for a summary.
Capability update for vex 1.16.0 (capabilities only, NOT re-benchmarked): the latency and quality tables below remain pinned to vex 1.9.1, but vex's feature surface has moved on. New since 1.9.1 (no fresh measurements taken here):
- `vex history <Symbol>` (v1.16): query-time git-log walker that returns every historical version of a symbol (`--diff`, `--since`/`--until`, `--author`, `--kind`); opt-in indexed sidecar via `vex index --history`. No ast-index equivalent.
- `vex mcp install`/`uninstall`/`list` (v1.15.0): idempotent MCP-server registration for Claude Code / Cursor (replaces hand-editing the agent config).
- Scope and metadata search filters on `vex search` / `vex usages`: `--kind`, `--include`/`--exclude`, `--since`/`--since-branched`/`--changed-only`, `--visibility`, `--async-only`, `--static-only`, `--sealed-only`, `--why`.
- `vex gpu` execution-provider probe; `vex watch` continuous reindex; `vex init --agents-md`; prebuilt Windows `vex` + `vex-mcp` binaries.

See vex.md for the full current reference (install, GPU/CUDA, config, command set). Re-run the tables below on 1.16.0 before citing the numbers.
Test environment
| Aspect | Value |
|---|---|
| Repo | led-grab (private, mixed-language LED capture/streaming app) |
| Total files indexed | 555 (ast-index); vex indexes a similar set |
| Total symbols indexed | ~15,596 (vex) / ~18,226 (ast-index) |
| Reference edges | n/a in vex status output / 62,625 refs (ast-index) |
| Languages present | Python, Kotlin (Android), TypeScript, JavaScript, plus PowerShell/Bash scripts |
| Host | Single Windows 10 workstation, Git Bash, SSD |
| Index storage | %LOCALAPPDATA%\vex\<hash>\index.vex on Windows (~/.cache/vex/ on Unix) / %LOCALAPPDATA%\ast-index\<hash>\index.db |
The repo size is "small/medium" by both tools' definitions. Don't expect numbers on a 10× larger repo to simply scale by 10×: semantic embedding cost grows with symbol count and call-graph construction with edge count, and neither is guaranteed to grow in proportion to repo size.
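A first-order estimate of the one-time embedding cost on a bigger repo, using the single measurement in the Indexing & footprint table (5 m 20 s, i.e. 320 s, for 15,596 symbols, about 20.5 ms per symbol). This sketch assumes embedding time is linear in symbol count on the same host. Model download, batching, and CPU vs GPU all change the constant, so treat the output as a planning number, not a prediction. `embed_estimate` is a hypothetical helper.

```shell
# embed_estimate MEASURED_SECONDS MEASURED_SYMBOLS TARGET_SYMBOLS
# Linear extrapolation (an assumption) of one-time embedding time, in whole seconds.
embed_estimate() {
  awk -v s="$1" -v n="$2" -v t="$3" 'BEGIN { printf "%d\n", s / n * t + 0.5 }'
}

embed_estimate 320 15596 15596    # prints 320 (sanity check: reproduces the measurement)
embed_estimate 320 15596 155960   # prints 3200, i.e. about 53 minutes for 10x the symbols
```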
Indexing & footprint
| Aspect | vex (structural) | vex (`--semantic`) | ast-index |
|---|---|---|---|
| Cold build time | ~1–2 s | 5 m 20 s (one-time embeddings) | ~1–2 s |
| Symbols | 15,596 | 15,596 | 18,226 |
| Index size on disk | not recorded (smaller than with embeddings) | 26.4 MB (with embeddings) | 10.3 MB |
| Incremental update | `vex update`, or `auto_update = true` in `.vex.toml` (auto-runs before queries when stale) | same | `ast-index update` (incremental) or `rebuild` |
| Call graph | Built into index, ~4 ms queries | same | Now populated for Python (was broken in 3.27) |
| Multi-language | 18+ via tree-sitter | same | 13+ |
| Branch-diff (symbol-level vs git rev) | `vex diff --base <rev>` (new in 1.7+) | same | `ast-index changed --base <rev>` |
| Self-update | `vex self-update` (new; works on Windows/macOS/Linux) | same | none (manual install) |
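To reproduce the "Index size on disk" row, a minimal sketch that totals the index files under a cache root. It assumes the layouts from the Test environment table (`<root>/<hash>/index.vex` for vex, `<root>/<hash>/index.db` for ast-index) and takes the root as an argument rather than guessing it. `index_bytes` is a hypothetical name.

```shell
# index_bytes ROOT NAME: total size in bytes of every file named NAME under ROOT.
# Prints 0 when ROOT is missing or contains no matching file.
index_bytes() {
  # Concatenate all matches and count the bytes; robust to spaces in paths.
  find "$1" -type f -name "$2" -exec cat {} + 2>/dev/null | wc -c | tr -d ' '
}

# Example (Git Bash on Windows; on Unix, vex's root is ~/.cache/vex):
#   index_bytes "$LOCALAPPDATA/vex" index.vex
#   index_bytes "$LOCALAPPDATA/ast-index" index.db
```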
Query latency (warm, sub-100 ms is "fast enough")
| Operation | vex | ast-index | Notes |
|---|---|---|---|
| Symbol definition | ~100 ms | ~30–90 ms | Both fast |
| Usages | ~80 ms | ~35 ms | See "Notable findings" #2 for precision flip |
| Callers (direct) | ~45 ms | ~50 ms | Both now resolve real call sites for Python |
| Implementations / subclasses | ~80 ms | ~50 ms | Both work; vex generics gap fixed in v1.7.0 |
| Existence check (`vex check`) | ~30 ms | n/a (use `symbol`) | vex-only fast multi-symbol existence check |
| Semantic (NL → symbol) | ~300 ms | n/a | only vex (requires `--semantic` index) |
| `similar SymName` | ~110 ms | n/a | only vex |
| Near-duplicate scan | ~18 s whole-repo | n/a | only vex |
| Multi-hop callers (`paths`, `reachable`) | hundreds of ms | n/a | only vex (transitive call graph) |
| Bundle (one-shot show + callers + callees + similar) | ~150 ms | n/a | only vex (`bundle --mode symbol`) |
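The figures above are warm, per-query wall-clock times. A minimal sketch for measuring them the same way: one untimed warm-up run, then the median of N timed runs. It assumes GNU `date` with `%N` (nanoseconds), which Git Bash and Linux provide and stock macOS `date` does not. `median_ms` is a hypothetical name, and the timing includes process start-up, which is what an agent calling the CLI actually pays.

```shell
# median_ms N CMD [ARGS...]: warm up once, then print the median wall-clock time
# in milliseconds over N runs (lower median when N is even). Requires GNU date.
median_ms() {
  n=$1; shift
  "$@" >/dev/null 2>&1                      # warm-up run, not timed
  i=0
  while [ "$i" -lt "$n" ]; do
    start=$(date +%s%N)
    "$@" >/dev/null 2>&1
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))     # nanoseconds -> milliseconds
    i=$((i + 1))
  done | sort -n | awk '{ v[NR] = $1 } END { print v[int((NR + 1) / 2)] }'
}

# Example, with SYM set as in the re-run script below:
#   median_ms 10 vex callers "$SYM" --format compact
#   median_ms 10 ast-index callers "$SYM"
```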
Query quality findings
Six real queries from the test repo (re-run on 2026-05-26): three structural lookups, one branch diff, and two semantic searches.
| Query | vex 1.9.1 | ast-index 3.41.0 | Better fit |
|---|---|---|---|
| `usages BaseJsonStore` | 0 hits (no structural refs; subclasses are caught by `implementations`) | 4 hits, all in comments/docstrings | Depends on intent: vex if you want structural-only, ast-index if you want any textual mention |
| `callers get_latest_frame` | 6 real call sites | 9 hits (incl. 3 false positives in docstrings) | vex (cleaner) |
| `implementations BaseJsonStore` | 2 hits (`_TestStore`, `_LegacyStore`) | n/a (class is closest) | vex |
| `diff --base HEAD~5` | 211 changes (incl. heading/markdown moves) | "No supported files changed" (Python/Kotlin/TS only) | Depends: vex is broader, ast-index narrower by design |
| Semantic: "WLED device discovery over mDNS" | finds `wled_provider.discover`, `wled_client` | n/a | vex only |
| Semantic: "JSON storage migration logic" | finds `BaseJsonStore`, `TestLegacyKeyMigration`, `_LegacyStore` | n/a | vex only |
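To see exactly which hits differ between the tools (for example, the 3 docstring false positives in ast-index's `callers get_latest_frame`), normalize both result lists to one `path:line` per line and compare them as sets. This sketch does not assume either tool's output format: turning each tool's output into `path:line` lines is left to you. `hit_diff` is a hypothetical name.

```shell
# hit_diff A B: print hits found only in A, then hits found only in B.
# Each input file holds one "path:line" per line; duplicates are ignored.
hit_diff() {
  a=$(mktemp) b=$(mktemp)
  sort -u "$1" > "$a"                       # comm needs sorted input
  sort -u "$2" > "$b"
  comm -23 "$a" "$b" | sed 's/^/A only: /'  # lines unique to A
  comm -13 "$a" "$b" | sed 's/^/B only: /'  # lines unique to B
  rm -f "$a" "$b"
}
```

Pass the vex list first and the ast-index list second; the "B only" lines are then the candidates to inspect for comment or docstring matches.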
Notable findings
1. ast-index's Python call graph now works. In the 2026-05-18 run on v3.27.0 it returned 0 callers for several functions; on v3.41.0 it returns real call sites for the same queries, so whatever was broken upstream is fixed. (Watch out: it still counts textual mentions in docstrings as "callers"; see #2.)
2. `usages` precision is now the inverse of the last snapshot.
   - vex 1.9.1: T1-language `usages` (Python/TS/Rust/C#/C++) is an AST identifier walk, optionally backed by persisted reference edges with `--strict` (Phase 11.1 / v1.8.0). It returned 0 hits for `usages BaseJsonStore`, which is correct: there are no structural usages outside subclass declarations (those are picked up by `implementations`).
   - ast-index 3.41.0: returned 4 hits, all of them in comments or docstrings.
   - The old "vex catches comments and docstrings, ast-index doesn't" advice has swapped. Today vex is the stricter tool by default and ast-index is the textual one. Use `vex grep` (regex) or `ast-index usages` if you actually want comment/docstring mentions.
3. vex's `implementations` now handles generic-parameterized subclasses. `class Foo(Base[T])` is detected as of v1.7.0, so the old workaround (`vex pattern 'class $NAME($BASE[$$_]):' --lang python`) is no longer needed for this case. Remaining gap (per CLAUDE.md): decorator-based dispatch is not linked.
4. vex now has `diff --base <rev>` (symbol-level git rev diff). This replaces the previous "ast-index has branch-diff, vex does not" finding. The two tools differ in scope, not in capability: `vex diff` covers all parsed symbol kinds across all indexed languages, including markdown headings (broader and noisier), while `ast-index changed` covers Python/Kotlin/TS/JS code symbols only (narrower and cleaner if you only care about code changes). Use vex when you want everything; use ast-index when you only want code-symbol churn.
5. vex's semantic index still has a one-time setup cost: about 5 minutes to embed ~15k symbols, plus a ~86 MB ONNX model download on first run. It is worth it for natural-language queries and `similar`/`duplicates`, but you have to commit to it up front. After that it lives in the same `index.vex` (~26 MB total, embeddings included).
6. vex now ships prebuilt Windows binaries, and `vex self-update` works on Windows/macOS/Linux in v1.9.1, so there is no more building from source on first install. Update `claude-code-tools.md` § vex accordingly.
7. New vex commands worth knowing (added since v1.5):
   - `vex diff --base <rev>`: symbol-level branch diff (#4 above).
   - `vex paths --from A --to B`: enumerate caller chains between two symbols (multi-hop `callers`).
   - `vex reachable --target T`: find all symbols that transitively reach `T` via the call graph.
   - `vex check sym1,sym2,...`: fast multi-symbol existence check.
   - `vex bundle --mode {symbol,pr-impact,project}`: one call replaces the `show → callers → callees → similar` round trip; `pr-impact` mode bundles changed symbols + transitive callers + reachable tests for code review.
   - `vex eval`: built-in ranking-eval harness for CI regression.
   - `vex capabilities`: machine-readable feature matrix.
   - Since v1.10–v1.16 (see the capability-update note at the top and vex.md): `vex history` (symbol-level git archaeology), `vex mcp install`/`uninstall`/`list` (idempotent MCP registration), `vex gpu` (EP probe), `vex watch` (continuous reindex), and scope/metadata search filters (`--kind`, `--include`/`--exclude`, `--since`/`--changed-only`, `--visibility`, `--async-only`, `--why`).
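
`vex reachable --target T` answers "which symbols can eventually call T?". Conceptually that is a breadth-first walk over reversed call edges. The sketch below illustrates the idea on a plain edge list; it is not how vex implements it, and both the one-`caller callee`-pair-per-line file format and the `reachers` name are assumptions for illustration.

```shell
# reachers EDGES TARGET: print, sorted, every symbol that transitively calls TARGET.
# EDGES holds one "caller callee" pair per line.
reachers() {
  awk -v target="$2" '
    NF == 2 { callers[$2] = callers[$2] " " $1 }   # reverse edge: callee -> its callers
    END {
      seen[target] = 1
      queue[1] = target; head = 1; tail = 1
      while (head <= tail) {                        # breadth-first over callers
        n = split(callers[queue[head]], cs, " ")
        head++
        for (i = 1; i <= n; i++) {
          if (!(cs[i] in seen)) {                   # seen set stops recursion loops
            seen[cs[i]] = 1
            queue[++tail] = cs[i]
            print cs[i]
          }
        }
      }
    }' "$1" | sort
}
```

The `seen` set is what keeps recursive or mutually recursive call chains from looping forever.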
Practical recommendation
The chain in the recommended global CLAUDE.md snippet still applies:
vex → ast-index → Grep/Glob
…but the reasons for the fallback have shifted on this version:
- Default to vex for symbol search, usages, callers, callees, implementations, semantic, similar, duplicates, bundle, and branch-diff. The call graph, the precision improvements (T1 AST walk + `--strict`), and the new `bundle`/`paths`/`reachable`/`diff` primitives cover most agent code-search needs in one tool.
- Fall back to ast-index when:
  - You want only code-symbol churn on a branch (`changed --base` is narrower than `vex diff`).
  - You want textual matches in comments/docstrings (vex T1 `usages` is now strict and may miss those; `vex grep` is the alternative, but `ast-index usages` is sometimes more convenient).
  - vex is not installed on the host (rare now that prebuilt binaries exist).
- Fall through to Grep/Glob for regex, config files (YAML/JSON/TOML), pure prose, or unparsed languages.
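
The vex → ast-index → Grep/Glob chain can be scripted for hosts where not every tool is installed. A minimal sketch that picks the first candidate found on `PATH`; `first_available` is a hypothetical name. It only checks installation and does not decide per-query fit (for example, preferring `ast-index changed --base` for code-only churn), which still needs the judgment above.

```shell
# first_available CMD...: print the first command found on PATH; return 1 if none.
first_available() {
  for c in "$@"; do
    if command -v "$c" >/dev/null 2>&1; then
      echo "$c"
      return 0
    fi
  done
  return 1
}

tool=$(first_available vex ast-index grep)
echo "using: ${tool:-none}"
```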
Neither tool fully resolves decorator-based dispatch, string-resolved targets (uvicorn factory strings, Celery task names), reflection / `getattr`, dynamic imports, or macro-expanded references. Before any rename or delete, backstop the structural results with `vex grep '\bName\b'`, whichever tool you started with.
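On a host without vex, a plain whole-word grep gives a comparable backstop. A minimal sketch, assuming GNU or BSD `grep` (both support `-r`, `-n`, `-w`, `-F`, and `--exclude-dir`); `backstop` is a hypothetical name. It deliberately matches comments, docstrings, and string literals too, since string-resolved and reflective references are exactly what the structural tools miss.

```shell
# backstop NAME [DIR]: list every whole-word, fixed-string occurrence of NAME
# under DIR (default: current directory), skipping the .git directory.
backstop() {
  grep -rnwF --exclude-dir=.git -- "$1" "${2:-.}"
}

# Example: backstop get_latest_frame src
```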
Re-running these benchmarks
If you want to validate on a different repo or newer versions:
# 1. Build both indices fresh
vex init && vex index # vex structural (set auto_update = true)
vex index --semantic # vex semantic (slow, one-time)
ast-index rebuild # ast-index
# 2. Run identical queries through both
SYM="SomeClassInYourRepo"
vex search "$SYM" --format compact ; ast-index symbol "$SYM"
vex usages "$SYM" --format compact ; ast-index usages "$SYM"
vex callers "$SYM" --format compact ; ast-index callers "$SYM"
# 3. Branch-diff — both tools now support this
vex diff --base master --format compact
ast-index changed --base master
# 4. Multi-hop and bundle (vex-only)
vex paths --from caller_fn --to callee_fn
vex reachable --target some_critical_fn
vex bundle --mode symbol --symbol "$SYM"
vex bundle --mode pr-impact --base master
Record the tool versions and timestamps alongside the numbers — see this document's header for the template.
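A minimal sketch for stamping each measurement with the date and tool versions, one tab-separated row per result. It assumes both CLIs report their version via `--version`; that flag is not confirmed in these notes, so check each tool's help and adjust. `tool_version` and `bench_row` are hypothetical helpers.

```shell
# tool_version CMD: first line of "CMD --version" (assumed flag), or a marker if CMD is missing.
tool_version() {
  if command -v "$1" >/dev/null 2>&1; then
    "$1" --version 2>/dev/null | head -n 1
  else
    echo "$1: not installed"
  fi
}

# bench_row TOOL OPERATION MILLISECONDS: UTC date, tool version, operation, latency (TSV).
bench_row() {
  printf '%s\t%s\t%s\t%s\n' "$(date -u +%Y-%m-%d)" "$(tool_version "$1")" "$2" "$3"
}

# Example: bench_row vex "callers get_latest_frame" 45 >> results.tsv
```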
Changes since the 2026-05-18 snapshot
What flipped, what got fixed, what's new:
| Finding from 2026-05-18 | Status on 2026-05-26 |
|---|---|
| ast-index's Python call graph is empty | Fixed: now returns real call sites in v3.41.0 |
| vex's `implementations` misses generic-parameterized subclasses | Fixed: works as of v1.7.0 |
| vex's `usages` is text-flavored (catches comments/docstrings) | Reversed: T1 `usages` is now AST-precise (Phase 11.1 / v1.8.0); ast-index is now the textual one |
| ast-index has `changed --base`, vex does not | Obsolete: `vex diff --base <rev>` shipped in 1.7+; the tools differ in scope, not capability |
| Windows install of vex requires building from source | Fixed: prebuilt Windows binaries; `vex self-update` works |
| 4-round-trip agent loop `show → callers → callees → similar` | Collapsed: `vex bundle --mode symbol` is one call |
| Multi-hop call-graph queries unsupported | Added: `vex paths` and `vex reachable` |