# Code Search: vex vs ast-index — Benchmark Notes

> **Snapshot:** 2026-05-18 · **Tested versions:** `vex 1.5.0`, `ast-index 3.27.0`
>
> These tools evolve quickly. Results below are **point-in-time** and only
> describe the versions and the single repo tested. Re-run the benchmarks before
> citing them on a different repo, on later versions, or after either tool
> changes its index format.

## Test environment

| Aspect | Value |
|---|---|
| Repo | `led-grab` (private, mixed-language LED capture/streaming app) |
| Total files indexed | 527–555 (depends on tool's file filter) |
| Total symbols indexed | ~14,969 (vex) / ~16,785 (ast-index) |
| Languages present | **Python**, **Kotlin** (Android), **TypeScript**, **JavaScript**, plus PowerShell/Bash scripts |
| Host | Single Windows 10 workstation, Git Bash, SSD |
| Index storage | `~/.cache/vex/` (vex) / `%LOCALAPPDATA%\ast-index\` (ast-index) |

The repo size is "small/medium" by both tools' definitions. **Numbers on a 10× larger repo will not scale linearly** — semantic embeddings in particular grow with symbol count, and call-graph construction grows with edge count.

## Indexing & footprint

| Aspect | vex (structural) | vex (`--semantic`) | ast-index |
|---|---|---|---|
| Cold build time | 1.6 s | 5 m 20 s (one-time embeddings) | **1.2 s** |
| Symbols | 14,969 | 14,969 | 16,785 |
| Index size on disk | 5.8 MB | larger (embeddings) | 9.7 MB |
| Incremental update | `vex update`, or `auto_update = true` in `.vex.toml` | same | rebuild only |
| Call graph | Built into index, ~4 ms queries | same | Present but **empty for Python in this repo** (see "Notable findings") |
| Multi-language | 18+ via tree-sitter | same | 13+ |
| Branch-diff (`changed --base master`) | — | — | **Yes** |

## Query latency (warm, sub-100 ms is "fast enough")

| Operation | vex | ast-index | Notes |
|---|---|---|---|
| Symbol definition | ~107 ms | **35–91 ms** | Both fast; vex lands just above the 100 ms bar |
| Usages | ~117 ms (11 hits) | ~35 ms (**4 hits**) | vex catches comments/docstrings; ast-index returns only structural refs |
| Callers | **~45 ms (6 hits)** | ~52 ms (**0 hits**) | ast-index's Python call graph was empty for this repo |
| Implementations / subclasses | ~200 ms (**0 hits**) | n/a | vex misses generic-parameterized form `class Foo(Base[T])` |
| Existence check | ~50 ms | ~30 ms | Both fine |
| Semantic (NL → symbol) | ~325 ms | — | only vex (requires `--semantic` index) |
| `similar SymName` | ~110 ms | — | only vex |
| Near-duplicate scan | ~18 s whole-repo | — | only vex |
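
Warm latencies like the ones above can be reproduced with a small timing wrapper. The sketch below is an assumption about method, not necessarily how the figures in this table were collected: `median_ms` is a hypothetical helper name, and it relies on GNU `date` supporting `%N` (true for Git Bash and most Linux distros, not for stock macOS).

```bash
# Hypothetical helper: run a command N times and print the median wall time in ms.
# Assumes GNU date (%N = nanoseconds). For an even N this prints the lower median.
median_ms() {
  local runs="$1"; shift
  local i=0 start end
  "$@" >/dev/null 2>&1                 # warm-up run, not timed
  while [ "$i" -lt "$runs" ]; do
    start=$(date +%s%N)
    "$@" >/dev/null 2>&1
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
    i=$((i + 1))
  done | sort -n | awk '{ v[NR] = $1 } END { print v[int((NR + 1) / 2)] }'
}

# Example (commands taken from the re-run script in this document):
#   median_ms 5 vex usages "$SYM" --format compact
#   median_ms 5 ast-index usages "$SYM"
```

The result includes process start-up, so treat it as an upper bound on the tool's own query time; start-up overhead can be noticeable under Git Bash on Windows.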

## Query quality findings

Six real queries from the test repo (four structural, two semantic):

| Query | vex | ast-index | Better fit |
|---|---|---|---|
| `usages BaseJsonStore` | 11 hits incl. tests + imports | 4 hits, **misses test files entirely** | vex |
| `symbol ScreenCapture` | 9 hits incl. fields + Kotlin + fn signatures | 5 hits, cleaner (class + imports only) | ast-index *(less noise)* |
| `callers get_latest_frame` | 6 real call sites correctly resolved | **0** (broken) | vex |
| `implementations BaseJsonStore` | 0 (generics bug) | n/a (`class` is closest) | tie / neither |
| Semantic `"WLED device discovery over mDNS"` | finds `wled_provider.discover`, `wled_client` | n/a | vex only |
| Semantic `"JSON storage migration logic"` | finds `BaseJsonStore`, `TestLegacyKeyMigration`, `_LegacyStore` | n/a | vex only |

## Notable findings

1. **ast-index's call graph was empty for this repo's Python.** `ast-index callers <fn>` returned 0 for several functions that vex correctly identified with 6+ real call sites. Whether this is a Python-language bug, an indexing edge case, or specific to this repo's structure was not investigated further — verify on your own repo before relying on `ast-index callers`.

2. **vex's `usages` is text-flavored, not structural.** It catches matches in comments, docstrings, and even `CLAUDE.md`. That can be useful or noisy depending on intent. For "real references only," prefer `vex callers` / `vex callees` / `vex pattern`, or fall back to `ast-index symbol`, which is stricter.

3. **vex's `implementations` misses generic-parameterized subclasses.** `class Foo(Base[T])` is not detected as an implementation of `Base`. Workaround: use `vex pattern 'class $NAME($BASE[$$_]):' --lang python` or a plain `vex grep`.

4. **ast-index has `changed --base <branch>` — vex does not.** This makes ast-index uniquely useful during code review for "which symbols did this branch touch?" without parsing a diff manually.

5. **vex's semantic index has a one-time setup cost.** Expect about 5 minutes to embed ~15k symbols, plus a ~86 MB ONNX model download on first run. Worth it for natural-language queries and `similar`/`duplicates`, but you must commit to it upfront.

6. **First-time Windows install of vex requires building from source** (no prebuilt Windows binary in the v1.5.0 release assets). See `claude-code-tools.md` § vex.
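
The generics gap in finding 3 can also be cross-checked without either tool. The following is a minimal sketch using plain GNU `grep`, not a feature of vex or ast-index; `find_subclasses` is a hypothetical helper name, and it only handles class headers whose base list fits on one line.

```bash
# Hypothetical helper: list Python classes whose base list names BASE,
# with or without a generic parameter: class Foo(Base), class Foo(Base[T]),
# class Foo(Generic[T], Base). Names that merely contain BASE (MyBase,
# BaseJson) are not matched. Multi-line base lists are not matched either.
find_subclasses() {
  local base="$1" root="${2:-.}"
  grep -rnE --include='*.py' \
    "^[[:space:]]*class[[:space:]]+[[:alnum:]_]+\(([^)]*[^[:alnum:]_])?${base}[^[:alnum:]_]" \
    "$root"
}

# Example: find_subclasses BaseJsonStore src/
```

Because this is a textual match, it will also report class headers inside docstrings or commented-out code, so review the hits by hand.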

## Practical recommendation

Use **vex as primary** and fall back to **ast-index** for:

- `ast-index changed --base <branch>` during code review (no vex equivalent).
- Stricter `symbol`/`usages` when vex's textual matches are too noisy.

This matches the priority chain already in the recommended global `CLAUDE.md` snippet:

```
vex → ast-index → Grep/Glob
```

The chain is not "vex always wins" — each tool has cases where it's the right call.

## Re-running these benchmarks

If you want to validate on a different repo or newer versions:

```bash
# 1. Build both indices fresh
vex init && vex index                            # vex structural
vex index --semantic                             # vex semantic (slow, one-time)
ast-index rebuild                                # ast-index

# 2. Run identical queries through both
SYM="SomeClassInYourRepo"
FN="some_function_in_your_repo"                  # callers was benchmarked with a function name
vex search "$SYM" --format compact ; ast-index symbol "$SYM"
vex usages "$SYM" --format compact ; ast-index usages "$SYM"
vex callers "$FN" --format compact ; ast-index callers "$FN"

# 3. Branch-diff (ast-index only)
ast-index changed --base master
```

Record the tool versions and timestamps alongside the numbers — see this document's header for the template.
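
To compare hit counts rather than eyeballing output, a line counter is usually enough. This is a rough sketch: `count_hits` is a hypothetical helper, and it assumes each tool prints one hit per line, which has not been verified for either vex or ast-index. Check for header or summary lines in the output before trusting the numbers.

```bash
# Hypothetical helper: count non-blank output lines of any command.
# Assumes one hit per line (unverified for vex and ast-index).
count_hits() {
  "$@" 2>/dev/null | grep -c '[^[:space:]]'
}

# Example (commands taken from the script above):
#   echo "vex:       $(count_hits vex usages "$SYM" --format compact)"
#   echo "ast-index: $(count_hits ast-index usages "$SYM")"
```

A count of 0 from one tool alongside a non-zero count from the other is the signal to look closer, as with the empty Python call graph noted in the findings.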