alexei.dolgolyov 0129059830 docs(code-search): add vex vs ast-index benchmark notes
Point-in-time comparison of vex 1.5.0 vs ast-index 3.27.0 on a mixed-language
repo (Python/Kotlin/TS/JS, ~553 files, ~15-17k symbols). Documents:

- Indexing time, footprint, query latency for both tools
- Quality differences on real queries (usages, callers, symbol, semantic)
- Notable findings: ast-index's Python call graph was empty for this repo,
  vex's implementations misses generic-parameterized subclasses, vex usages
  catches comments/docstrings (text-flavored), ast-index uniquely has
  'changed --base <branch>' with no vex equivalent
- Re-run instructions for validating on a different repo or newer versions

Linked from claude-code-tools.md at the end of the vex section so readers
encounter the comparison right after learning about vex. Not surfaced as a
top-level README entry since it's narrower than the other root-level guides.
2026-05-18 00:48:29 +03:00

# Code Search: vex vs ast-index — Benchmark Notes
> **Snapshot:** 2026-05-18 · **Tested versions:** `vex 1.5.0`, `ast-index 3.27.0`
>
> These tools evolve quickly. Results below are **point-in-time** and only
> describe the versions and the single repo tested. Re-run the benchmarks before
> citing them on a different repo, on later versions, or after either tool
> changes its index format.
## Test environment
| Aspect | Value |
|---|---|
| Repo | `led-grab` (private, mixed-language LED capture/streaming app) |
| Total files indexed | 527–555 (depends on tool's file filter) |
| Total symbols indexed | ~14,969 (vex) / ~16,785 (ast-index) |
| Languages present | **Python**, **Kotlin** (Android), **TypeScript**, **JavaScript**, plus PowerShell/Bash scripts |
| Host | Single Windows 10 workstation, Git Bash, SSD |
| Index storage | `~/.cache/vex/` (vex) / `%LOCALAPPDATA%\ast-index\` (ast-index) |
The repo size is "small/medium" by both tools' definitions. **Numbers on a 10× larger repo will not scale linearly** — semantic embeddings in particular grow with symbol count, and call-graph construction grows with edge count.
## Indexing & footprint
| Aspect | vex (structural) | vex (`--semantic`) | ast-index |
|---|---|---|---|
| Cold build time | **1.6 s** | 5 m 20 s (one-time embeddings) | 1.2 s |
| Symbols | 14,969 | 14,969 | 16,785 |
| Index size on disk | 5.8 MB | larger (embeddings) | 9.7 MB |
| Incremental update | `vex update`, or `auto_update = true` in `.vex.toml` | same | rebuild only |
| Call graph | Built into index, ~4 ms queries | same | Present but **empty for Python in this repo** (see "Notable findings") |
| Multi-language | 18+ via tree-sitter | same | 13+ |
| Branch-diff (`changed --base master`) | — | — | **Yes** |
## Query latency (warm, sub-100 ms is "fast enough")
| Operation | vex | ast-index | Notes |
|---|---|---|---|
| Symbol definition | ~107 ms | **35–91 ms** | Both fast |
| Usages | ~117 ms (11 hits) | ~35 ms (**4 hits**) | vex catches comments/docstrings; ast-index returns only structural refs |
| Callers | **~45 ms (6 hits)** | ~52 ms (**0 hits**) | ast-index's Python call graph was empty for this repo |
| Implementations / subclasses | ~200 ms (**0 hits**) | n/a | vex misses generic-parameterized form `class Foo(Base[T])` |
| Existence check | ~50 ms | ~30 ms | Both fine |
| Semantic (NL → symbol) | ~325 ms | — | only vex (requires `--semantic` index) |
| `similar SymName` | ~110 ms | — | only vex |
| Near-duplicate scan | ~18 s whole-repo | — | only vex |
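
Warm latencies like the ones in this table can be collected with a small timing helper. The following is a minimal sketch, not the harness used for these numbers: `median_ms` is a hypothetical name, and it assumes GNU `date` with `%N` (nanoseconds) support, which Git Bash and most Linux systems provide.

```bash
# median_ms RUNS CMD [ARGS...]
# Runs CMD RUNS+1 times, discards the first run as a warm-up, and prints the
# median wall-clock time of the remaining runs in whole milliseconds
# (the lower median when RUNS is even).
median_ms() {
  local runs i start end samples
  runs=$1; shift
  i=0
  samples=""
  while [ "$i" -le "$runs" ]; do
    start=$(date +%s%N)
    "$@" >/dev/null 2>&1 || true   # ignore the command's output and exit status
    end=$(date +%s%N)
    if [ "$i" -gt 0 ]; then        # run 0 only warms caches, so skip it
      samples="$samples $(( (end - start) / 1000000 ))"
    fi
    i=$((i + 1))
  done
  # One sample per line, sorted numerically; pick the middle one.
  printf '%s\n' $samples | sort -n | sed -n "$(( (runs + 1) / 2 ))p"
}
```

For example, `median_ms 5 vex usages "$SYM" --format compact` would time the same usages query shown in the re-run instructions. The figure includes process start-up, so it is an upper bound on the query time itself.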
## Query quality findings
Real queries from the test repo:
| Query | vex | ast-index | Better fit |
|---|---|---|---|
| `usages BaseJsonStore` | 11 hits incl. tests + imports | 4 hits, **misses test files entirely** | vex |
| `symbol ScreenCapture` | 9 hits incl. fields + Kotlin + fn signatures | 5 hits, cleaner (class + imports only) | ast-index *(less noise)* |
| `callers get_latest_frame` | 6 real call sites correctly resolved | **0** (broken) | vex |
| `implementations BaseJsonStore` | 0 (generics bug) | n/a (`class` is closest) | tie / neither |
| Semantic `"WLED device discovery over mDNS"` | finds `wled_provider.discover`, `wled_client` | n/a | vex only |
| Semantic `"JSON storage migration logic"` | finds `BaseJsonStore`, `TestLegacyKeyMigration`, `_LegacyStore` | n/a | vex only |
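
To see exactly which hits one tool returns and the other misses (for example, the test files missing from the `usages` row), the two result sets can be diffed. This is a sketch under an assumption: each tool's output has already been normalized by hand or by script into one `path:line` entry per line and saved to a file. Neither tool's native output format is assumed here, and `only_in_first` is a hypothetical helper name.

```bash
# only_in_first A B
# Prints the hits listed in file A that do not appear in file B.
# Both files are expected to hold one normalized "path:line" entry per line.
only_in_first() {
  local a_sorted b_sorted
  a_sorted=$(mktemp)
  b_sorted=$(mktemp)
  sort -u "$1" > "$a_sorted"      # comm needs sorted, de-duplicated input
  sort -u "$2" > "$b_sorted"
  comm -23 "$a_sorted" "$b_sorted" # keep only lines unique to the first file
  rm -f "$a_sorted" "$b_sorted"
}
```

Calling it in both directions (`only_in_first vex.txt ast.txt`, then the reverse) separates "extra hits" from "missed hits", which is more informative than comparing hit counts alone.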
## Notable findings
1. **ast-index's call graph was empty for this repo's Python.** `ast-index callers <fn>` returned 0 for several functions that vex correctly identified with 6+ real call sites. Whether this is a Python-language bug, an indexing edge case, or specific to this repo's structure was not investigated further — verify on your own repo before relying on `ast-index callers`.
2. **vex's `usages` is text-flavored, not structural.** It catches matches in comments, docstrings, and even `CLAUDE.md`. That can be useful or noisy depending on intent. For "real references only," prefer `vex callers` / `vex callees` / `vex pattern`, or fall back to `ast-index symbol` which is stricter.
3. **vex's `implementations` misses generic-parameterized subclasses.** `class Foo(Base[T])` is not detected as an implementation of `Base`. Workaround: use `vex pattern 'class $NAME($BASE[$$_]):' --lang python` or a plain `vex grep`.
4. **ast-index has `changed --base <branch>` — vex does not.** This makes ast-index uniquely useful during code review for "which symbols did this branch touch?" without parsing a diff manually.
5. **vex's semantic index has a one-time setup cost.** On this repo the first run took about 5 minutes to embed ~15k symbols, plus a ~86 MB ONNX model download. Worth it for natural-language queries and `similar`/`duplicates`, but you must commit to it upfront.
6. **First-time Windows install of vex requires building from source** (no prebuilt Windows binary in the v1.5.0 release assets). See `claude-code-tools.md` § vex.
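
For finding 3, a plain-text fallback can catch the generic-parameterized form when neither tool reports it. The sketch below uses standard `grep -E` rather than `vex grep` or `vex pattern`, so it makes no assumptions about vex's flags. `find_subclasses` is a hypothetical helper, and the regex is deliberately simple: it handles `class Foo(Base)`, `class Foo(Base[T])`, and `class Foo(Mixin, Base[K, V])`, but not nested generics such as `Base[Dict[str, int]]`, module-qualified bases such as `mod.Base`, or class statements split across lines.

```bash
# find_subclasses BASE [DIR]
# Lists Python class statements under DIR (default: current directory) that
# name BASE as a base class, with or without a generic parameter like Base[T].
find_subclasses() {
  local base dir
  base=$1
  dir=${2:-.}
  # Pattern: "class Name(" + optional earlier bases + BASE + optional "[...]"
  # + "," or ")". Requiring "(" or a separator before BASE avoids matching
  # longer names that merely end in BASE.
  grep -rnE --include='*.py' \
    "^[[:space:]]*class[[:space:]]+[A-Za-z_][A-Za-z0-9_]*[[:space:]]*\\(([^)]*[,[:space:]])?${base}(\\[[^]]*\\])?[[:space:]]*[,)]" \
    "$dir"
}
```

A call such as `find_subclasses BaseJsonStore src/` prints `file:line:text` matches, which can then be checked by hand against what `vex implementations` returned.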
## Practical recommendation
Use **vex as primary**, **ast-index as fallback** for:
- `ast-index changed --base <branch>` during code review (no vex equivalent).
- Stricter `symbol`/`usages` when vex's textual matches are too noisy.
This matches the priority chain already in the recommended global `CLAUDE.md` snippet:
```
vex → ast-index → Grep/Glob
```
The chain is not "vex always wins" — each tool has cases where it's the right call.
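
As a rough illustration of the chain for a symbol lookup, the sketch below picks the first tool found on `PATH`. It is an assumption about how one might script the chain, not the `CLAUDE.md` snippet itself: `find_symbol` is a hypothetical wrapper, the vex and ast-index invocations are the ones used in the re-run instructions in this document, and selection is by availability only, so it does not capture the per-query trade-offs described above.

```bash
# find_symbol SYM
# Looks up SYM with the first available tool in the chain
# vex -> ast-index -> grep, searching from the current directory.
find_symbol() {
  local sym
  sym=$1
  if command -v vex >/dev/null 2>&1; then
    vex search "$sym" --format compact
  elif command -v ast-index >/dev/null 2>&1; then
    ast-index symbol "$sym"
  else
    # Last resort: recursive, whole-word, fixed-string text search.
    grep -rnwF -- "$sym" .
  fi
}
```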
## Re-running these benchmarks
If you want to validate on a different repo or newer versions:
```bash
# 1. Build both indices fresh
vex init && vex index # vex structural
vex index --semantic # vex semantic (slow, one-time)
ast-index rebuild # ast-index
# 2. Run identical queries through both
SYM="SomeClassInYourRepo"
vex search "$SYM" --format compact ; ast-index symbol "$SYM"
vex usages "$SYM" --format compact ; ast-index usages "$SYM"
vex callers "$SYM" --format compact ; ast-index callers "$SYM"
# 3. Branch-diff (ast-index only)
ast-index changed --base master
```
Record the tool versions and timestamps alongside the numbers — see this document's header for the template.
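
One way to keep that header consistent across re-runs is to generate it. This is a small sketch with a hypothetical helper name, `snapshot_header`. The version strings are passed in by hand because this document does not establish how either tool reports its version.

```bash
# snapshot_header DATE VERSION [VERSION...]
# Prints a blockquote header line in the same shape as this document's
# snapshot line: the date, then each tool version in backticks.
snapshot_header() {
  local date_str sep v
  date_str=$1; shift
  printf '> **Snapshot:** %s · **Tested versions:** ' "$date_str"
  sep=''
  for v in "$@"; do
    printf '%s`%s`' "$sep" "$v"   # comma-separate every version after the first
    sep=', '
  done
  printf '\n'
}
```

For example, `snapshot_header "$(date +%F)" "vex 1.5.0" "ast-index 3.27.0"` stamps the current date next to the versions under test.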