alexei.dolgolyov 0129059830 docs(code-search): add vex vs ast-index benchmark notes
Point-in-time comparison of vex 1.5.0 vs ast-index 3.27.0 on a mixed-language
repo (Python/Kotlin/TS/JS, ~553 files, ~15-17k symbols). Documents:

- Indexing time, footprint, query latency for both tools
- Quality differences on real queries (usages, callers, symbol, semantic)
- Notable findings: ast-index's Python call graph was empty for this repo,
  vex's implementations misses generic-parameterized subclasses, vex usages
  catches comments/docstrings (text-flavored), ast-index uniquely has
  'changed --base <branch>' with no vex equivalent
- Re-run instructions for validating on a different repo or newer versions

Linked from claude-code-tools.md at the end of the vex section so readers
encounter the comparison right after learning about vex. Not surfaced as a
top-level README entry since it's narrower than the other root-level guides.
2026-05-18 00:48:29 +03:00


Code Search: vex vs ast-index — Benchmark Notes

Snapshot: 2026-05-18 · Tested versions: vex 1.5.0, ast-index 3.27.0

These tools evolve quickly. Results below are point-in-time and only describe the versions and the single repo tested. Re-run the benchmarks before citing them on a different repo, on later versions, or after either tool changes its index format.

Test environment

| Aspect | Value |
| --- | --- |
| Repo | led-grab (private, mixed-language LED capture/streaming app) |
| Total files indexed | 527–555 (depends on tool's file filter) |
| Total symbols indexed | ~14,969 (vex) / ~16,785 (ast-index) |
| Languages present | Python, Kotlin (Android), TypeScript, JavaScript, plus PowerShell/Bash scripts |
| Host | Single Windows 10 workstation, Git Bash, SSD |
| Index storage | ~/.cache/vex/ (vex) / %LOCALAPPDATA%\ast-index\ (ast-index) |

The repo size is "small/medium" by both tools' definitions. Numbers on a 10× larger repo will not scale linearly — semantic embeddings in particular grow with symbol count, and call-graph construction grows with edge count.

Indexing & footprint

| Aspect | vex (structural) | vex (--semantic) | ast-index |
| --- | --- | --- | --- |
| Cold build time | 1.6 s | 5 m 20 s (one-time embeddings) | 1.2 s |
| Symbols | 14,969 | 14,969 | 16,785 |
| Index size on disk | 5.8 MB | larger (embeddings) | 9.7 MB |
| Incremental update | vex update, or auto_update = true in .vex.toml | same | rebuild only |
| Call graph | Built into index, ~4 ms queries | same | Present but empty for Python in this repo (see "Notable findings") |
| Multi-language | 18+ via tree-sitter | same | 13+ |
| Branch-diff (changed --base master) | No | No | Yes |

Query latency (warm, sub-100 ms is "fast enough")

| Operation | vex | ast-index | Notes |
| --- | --- | --- | --- |
| Symbol definition | ~107 ms | 35–91 ms | Both fast |
| Usages | ~117 ms (11 hits) | ~35 ms (4 hits) | vex catches comments/docstrings; ast-index returns only structural refs |
| Callers | ~45 ms (6 hits) | ~52 ms (0 hits) | ast-index's Python call graph was empty for this repo |
| Implementations / subclasses | ~200 ms (0 hits) | n/a | vex misses generic-parameterized form class Foo(Base[T]) |
| Existence check | ~50 ms | ~30 ms | Both fine |
| Semantic (NL → symbol) | ~325 ms | n/a | only vex (requires --semantic index) |
| similar SymName | ~110 ms | n/a | only vex |
| Near-duplicate scan | ~18 s whole-repo | n/a | only vex |
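
The notes do not say how the latencies above were measured. One way to collect comparable warm numbers is a small helper that runs a command once untimed, then reports the median of several timed runs. This is a sketch, not the method used for the table: `median_ms` is a hypothetical helper name, it assumes GNU `date` (for `%N`, as shipped with Git Bash), and the result includes process start-up overhead.

```shell
# Print the median wall-clock time, in milliseconds, of a command over N runs.
# Assumes GNU date, which supports %N (nanoseconds).
median_ms() {
  local runs=$1 i=0 start end
  shift
  "$@" >/dev/null 2>&1 || true        # untimed warm-up run
  while [ "$i" -lt "$runs" ]; do
    start=$(date +%s%N)
    "$@" >/dev/null 2>&1 || true      # a failing command is still timed
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
    i=$((i + 1))
  done | sort -n | sed -n "$(( (runs + 1) / 2 ))p"   # lower median for even N
}

# Example, using commands that appear elsewhere in this document:
#   median_ms 5 vex usages BaseJsonStore --format compact
#   median_ms 5 ast-index usages BaseJsonStore
```

Taking the median rather than the mean keeps one slow run (for example, antivirus scanning on Windows) from skewing the figure.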

Query quality findings

Real queries from the test repo:

| Query | vex | ast-index | Better fit |
| --- | --- | --- | --- |
| usages BaseJsonStore | 11 hits incl. tests + imports | 4 hits, misses test files entirely | vex |
| symbol ScreenCapture | 9 hits incl. fields + Kotlin + fn signatures | 5 hits, cleaner (class + imports only) | ast-index (less noise) |
| callers get_latest_frame | 6 real call sites correctly resolved | 0 (broken) | vex |
| implementations BaseJsonStore | 0 (generics bug) | n/a (class is closest) | tie / neither |
| Semantic "WLED device discovery over mDNS" | finds wled_provider.discover, wled_client | n/a | vex only |
| Semantic "JSON storage migration logic" | finds BaseJsonStore, TestLegacyKeyMigration, _LegacyStore | n/a | vex only |

Notable findings

  1. ast-index's call graph was empty for this repo's Python. ast-index callers <fn> returned 0 for several functions that vex correctly identified with 6+ real call sites. Whether this is a Python-language bug, an indexing edge case, or specific to this repo's structure was not investigated further — verify on your own repo before relying on ast-index callers.

  2. vex's usages is text-flavored, not structural. It catches matches in comments, docstrings, and even CLAUDE.md. That can be useful or noisy depending on intent. For "real references only," prefer vex callers / vex callees / vex pattern, or fall back to ast-index symbol, which is stricter.

  3. vex's implementations misses generic-parameterized subclasses. class Foo(Base[T]) is not detected as an implementation of Base. Workaround: use vex pattern 'class $NAME($BASE[$$_]):' --lang python or a plain vex grep.

  4. ast-index has changed --base <branch> — vex does not. This makes ast-index uniquely useful during code review for "which symbols did this branch touch?" without parsing a diff manually.

  5. vex's semantic index has a one-time setup cost: about 5 minutes to embed ~15k symbols, plus a ~86 MB ONNX model download on first run. Worth it for natural-language queries and similar/duplicates, but you must commit to it upfront.

  6. First-time Windows install of vex requires building from source (no prebuilt Windows binary in the v1.5.0 release assets). See claude-code-tools.md § vex.
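
Finding 3 mentions a plain grep as a fallback for generic-parameterized subclasses. A minimal sketch of that fallback with standard `grep -E` (not `vex grep`, whose options are not documented here) could look like the following. `find_generic_subclasses` is a hypothetical helper name, and the regex is an approximation: it only matches class statements that fit on one line, and it assumes the base name is a plain identifier.

```shell
# List Python classes whose base list contains BASE[...], e.g. class Foo(Base[T]).
# Uses plain grep -E; the pattern only handles single-line class statements.
find_generic_subclasses() {
  local base=$1 root=${2:-.}
  # The optional group lets BASE follow other bases ("Mixin, ") or a module
  # prefix ("pkg."), while rejecting longer names such as MyBase.
  grep -rnE --include='*.py' \
    "^[[:space:]]*class[[:space:]]+[A-Za-z_][A-Za-z0-9_]*\(([^)]*[,.[:space:]])?${base}\[" \
    "$root"
}

# Example:
#   find_generic_subclasses BaseJsonStore src/
```

Output is in grep's usual `path:line:text` form, so it can be compared by eye against what vex implementations returns for the same base class.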

Practical recommendation

Use vex as primary, ast-index as fallback for:

  • ast-index changed --base <branch> during code review (no vex equivalent).
  • Stricter symbol/usages when vex's textual matches are too noisy.

This matches the priority chain already in the recommended global CLAUDE.md snippet:

vex → ast-index → Grep/Glob

The chain is not "vex always wins" — each tool has cases where it's the right call.
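
As an illustration of the chain, the sketch below tries each tool in order and falls through when a tool is not installed or prints nothing. `find_symbol` is a hypothetical wrapper, not part of either CLI. It assumes that empty output means "no result", since neither tool's exit codes are described in these notes, and only the final grep step honours the optional directory argument.

```shell
# Look up a symbol with the first tool that is installed and returns output.
# Order follows the chain above: vex, then ast-index, then plain grep.
find_symbol() {
  local sym=$1 root=${2:-.} out
  if command -v vex >/dev/null 2>&1; then
    out=$(vex search "$sym" --format compact 2>/dev/null) || out=
    if [ -n "$out" ]; then printf '%s\n' "$out"; return 0; fi
  fi
  if command -v ast-index >/dev/null 2>&1; then
    out=$(ast-index symbol "$sym" 2>/dev/null) || out=
    if [ -n "$out" ]; then printf '%s\n' "$out"; return 0; fi
  fi
  # Last resort: whole-word text match across the tree.
  grep -rnw -- "$sym" "$root"
}

# Example:
#   find_symbol ScreenCapture
```

A fixed order like this is only a default. Per the findings above, going straight to ast-index symbol can be the better call when vex's matches are too noisy.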

Re-running these benchmarks

If you want to validate on a different repo or newer versions:

```shell
# 1. Build both indices fresh
vex init && vex index                            # vex structural
vex index --semantic                             # vex semantic (slow, one-time)
ast-index rebuild                                # ast-index

# 2. Run identical queries through both
SYM="SomeClassInYourRepo"
vex search "$SYM" --format compact ; ast-index symbol "$SYM"
vex usages "$SYM" --format compact ; ast-index usages "$SYM"
vex callers "$SYM" --format compact ; ast-index callers "$SYM"

# 3. Branch-diff (ast-index only)
ast-index changed --base master
```

Record the tool versions and timestamps alongside the numbers — see this document's header for the template.
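
To keep that header consistent between runs, a small helper can print it in the same shape as the "Snapshot" line at the top of this document. `snapshot_header` is a hypothetical name. The versions are passed in by hand because this sketch does not assume any particular version flag on either CLI.

```shell
# Print a header line shaped like this document's "Snapshot" line.
# Arguments: vex version, ast-index version (both supplied by the caller).
snapshot_header() {
  local vex_version=$1 ast_version=$2
  printf 'Snapshot: %s · Tested versions: vex %s, ast-index %s\n' \
    "$(date +%F)" "$vex_version" "$ast_version"
}

# Example:
#   snapshot_header 1.5.0 3.27.0 > benchmark-notes.txt
```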