diff --git a/claude-code-tools.md b/claude-code-tools.md
index 09639e3..8aac510 100644
--- a/claude-code-tools.md
+++ b/claude-code-tools.md
@@ -160,6 +160,8 @@ Fast hybrid code search — combines tree-sitter AST parsing, FST symbol lookup,
 - **Global (`~/.claude/CLAUDE.md`)** — preferred default once `vex-mcp` is registered at user scope. Vex becomes the recommendation for every project as soon as the index is built.
 - **Project-local (`./CLAUDE.md`)** — for overrides (different fallback chain, monorepo-specific `--filter` paths, language-specific `--kind` defaults, or excluding vex on repos where it isn't set up).
 
+- **vex vs AST Index — when each wins:** Both tools cover similar ground, but they're not interchangeable. For a point-in-time head-to-head on a real mixed-language repo (Python / Kotlin / TypeScript / JavaScript) with measured latency, quality differences, and version-pinned findings, see [`code-search-vex-vs-ast-index.md`](code-search-vex-vs-ast-index.md). Headline: keep `vex` as primary, fall back to `ast-index changed --base ` for code-review diffs (no vex equivalent) and to `ast-index symbol`/`usages` when vex's textual matches are too noisy.
+
 ### Packaged Skills
 
 Packaged `.skill` files available in the [skills/](skills/) directory:
diff --git a/code-search-vex-vs-ast-index.md b/code-search-vex-vs-ast-index.md
new file mode 100644
index 0000000..703da81
--- /dev/null
+++ b/code-search-vex-vs-ast-index.md
@@ -0,0 +1,110 @@
+# Code Search: vex vs ast-index — Benchmark Notes
+
+> **Snapshot:** 2026-05-18 · **Tested versions:** `vex 1.5.0`, `ast-index 3.27.0`
+>
+> These tools evolve quickly. Results below are **point-in-time** and only
+> describe the versions and the single repo tested. Re-run the benchmarks before
+> citing them on a different repo, on later versions, or after either tool
+> changes its index format.
+
+## Test environment
+
+| Aspect | Value |
+|---|---|
+| Repo | `led-grab` (private, mixed-language LED capture/streaming app) |
+| Total files indexed | 527–555 (depends on tool's file filter) |
+| Total symbols indexed | ~14,969 (vex) / ~16,785 (ast-index) |
+| Languages present | **Python**, **Kotlin** (Android), **TypeScript**, **JavaScript**, plus PowerShell/Bash scripts |
+| Host | Single Windows 10 workstation, Git Bash, SSD |
+| Index storage | `~/.cache/vex/` (vex) / `%LOCALAPPDATA%\ast-index\` (ast-index) |
+
+The repo size is "small/medium" by both tools' definitions. **Numbers on a 10× larger repo will not scale linearly** — semantic embeddings in particular grow with symbol count, and call-graph construction grows with edge count.
+
+## Indexing & footprint
+
+| Aspect | vex (structural) | vex (`--semantic`) | ast-index |
+|---|---|---|---|
+| Cold build time | **1.6 s** | 5 m 20 s (one-time embeddings) | 1.2 s |
+| Symbols | 14,969 | 14,969 | 16,785 |
+| Index size on disk | 5.8 MB | larger (embeddings) | 9.7 MB |
+| Incremental update | `vex update`, or `auto_update = true` in `.vex.toml` | same | rebuild only |
+| Call graph | Built into index, ~4 ms queries | same | Present but **empty for Python in this repo** (see "Notable findings") |
+| Multi-language | 18+ via tree-sitter | same | 13+ |
+| Branch-diff (`changed --base master`) | — | — | **Yes** |
+
+## Query latency (warm, sub-100 ms is "fast enough")
+
+| Operation | vex | ast-index | Notes |
+|---|---|---|---|
+| Symbol definition | ~107 ms | **35–91 ms** | Both fast |
+| Usages | ~117 ms (11 hits) | ~35 ms (**4 hits**) | vex catches comments/docstrings; ast-index returns only structural refs |
+| Callers | **~45 ms (6 hits)** | ~52 ms (**0 hits**) | ast-index's Python call graph was empty for this repo |
+| Implementations / subclasses | ~200 ms (**0 hits**) | n/a | vex misses generic-parameterized form `class Foo(Base[T])` |
+| Existence check | ~50 ms | ~30 ms | Both fine |
+| Semantic (NL → symbol) | ~325 ms | — | only vex (requires `--semantic` index) |
+| `similar SymName` | ~110 ms | — | only vex |
+| Near-duplicate scan | ~18 s whole-repo | — | only vex |
+
+## Query quality findings
+
+Six real queries from the test repo:
+
+| Query | vex | ast-index | Better fit |
+|---|---|---|---|
+| `usages BaseJsonStore` | 11 hits incl. tests + imports | 4 hits, **misses test files entirely** | vex |
+| `symbol ScreenCapture` | 9 hits incl. fields + Kotlin + fn signatures | 5 hits, cleaner (class + imports only) | ast-index *(less noise)* |
+| `callers get_latest_frame` | 6 real call sites correctly resolved | **0** (broken) | vex |
+| `implementations BaseJsonStore` | 0 (generics bug) | n/a (`class` is closest) | tie / neither |
+| Semantic `"WLED device discovery over mDNS"` | finds `wled_provider.discover`, `wled_client` | n/a | vex only |
+| Semantic `"JSON storage migration logic"` | finds `BaseJsonStore`, `TestLegacyKeyMigration`, `_LegacyStore` | n/a | vex only |
+
+## Notable findings
+
+1. **ast-index's call graph was empty for this repo's Python.** `ast-index callers ` returned 0 for several functions that vex correctly identified with 6+ real call sites. Whether this is a Python-language bug, an indexing edge case, or specific to this repo's structure was not investigated further — verify on your own repo before relying on `ast-index callers`.
+
+2. **vex's `usages` is text-flavored, not structural.** It catches matches in comments, docstrings, and even `CLAUDE.md`. That can be useful or noisy depending on intent. For "real references only," prefer `vex callers` / `vex callees` / `vex pattern`, or fall back to `ast-index symbol` which is stricter.
+
+3. **vex's `implementations` misses generic-parameterized subclasses.** `class Foo(Base[T])` is not detected as an implementation of `Base`. Workaround: use `vex pattern 'class $NAME($BASE[$$_]):' --lang python` or a plain `vex grep`.
+
+4. **ast-index has `changed --base ` — vex does not.** This makes ast-index uniquely useful during code review for "which symbols did this branch touch?" without parsing a diff manually.
+
+5. **vex's semantic index has a one-time setup cost.** Embedding ~15k symbols takes ~5 minutes, plus a ~86 MB ONNX model download on first run. Worth it for natural-language queries and `similar`/`duplicates`, but you must commit to it upfront.
+
+6. **First-time Windows install of vex requires building from source** (no prebuilt Windows binary in the v1.5.0 release assets). See `claude-code-tools.md` § vex.
+
+## Practical recommendation
+
+Use **vex as primary**, **ast-index as fallback** for:
+
+- `ast-index changed --base ` during code review (no vex equivalent).
+- Stricter `symbol`/`usages` when vex's textual matches are too noisy.
+
+This matches the priority chain already in the recommended global `CLAUDE.md` snippet:
+
+```
+vex → ast-index → Grep/Glob
+```
+
+The chain is not "vex always wins" — each tool has cases where it's the right call.
+
+## Re-running these benchmarks
+
+If you want to validate on a different repo or newer versions:
+
+```bash
+# 1. Build both indices fresh
+vex init && vex index            # vex structural
+vex index --semantic             # vex semantic (slow, one-time)
+ast-index rebuild                # ast-index
+
+# 2. Run identical queries through both
+SYM="SomeClassInYourRepo"
+vex search "$SYM" --format compact ; ast-index symbol "$SYM"
+vex usages "$SYM" --format compact ; ast-index usages "$SYM"
+vex callers "$SYM" --format compact ; ast-index callers "$SYM"
+
+# 3. Branch-diff (ast-index only)
+ast-index changed --base master
+```
+
+Record the tool versions and timestamps alongside the numbers — see this document's header for the template.
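
The re-run instructions ask for timestamps to be recorded alongside the numbers but do not say how the latencies were measured. One possible way to collect comparable warm timings is sketched below. It is not the method used for the tables above, and `time_command` is a hypothetical helper, not part of either tool. It measures whole-process wall-clock time, which includes process startup, so results may differ from any timing a tool reports about itself.

```python
import statistics
import subprocess
import sys
import time
from datetime import datetime, timezone


def time_command(argv, runs=5, warmup=1):
    """Run argv repeatedly; return wall-clock latency stats in milliseconds.

    Hypothetical helper for illustration. Warm-up runs are executed first and
    discarded so the reported samples reflect warm caches.
    """
    for _ in range(warmup):
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        samples.append((time.perf_counter() - start) * 1000.0)

    return {
        "command": " ".join(argv),
        "runs": runs,
        "median_ms": statistics.median(samples),
        "min_ms": min(samples),
        "max_ms": max(samples),
        # UTC timestamp, so the numbers can be tied to a snapshot date.
        "recorded_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }


if __name__ == "__main__":
    # Stand-in command so the sketch runs without either tool installed.
    # Swap in a query from the script above, for example:
    #   ["vex", "usages", "SomeClassInYourRepo", "--format", "compact"]
    #   ["ast-index", "usages", "SomeClassInYourRepo"]
    print(time_command([sys.executable, "-c", "pass"], runs=3))
```

Reporting the median rather than the mean keeps a single slow run (antivirus scan, disk flush) from skewing the figure. `subprocess.run` raises `FileNotFoundError` if the command is not on `PATH`, which is a useful early signal that a tool is not installed.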