Indexed AST Search
Indexed AST search is the warmed-cache version of AST search. Building the index extracts node-level facts (call names, instantiations, class declarations, method declarations) into an on-disk store; later searches use the store to skip files that cannot contain a structural match. The matcher then re-verifies the remaining candidates against the source.
The fact index is exposed under the greph-index ast-index CLI subcommand and the Greph::buildAstIndex() / Greph::searchAstIndexed() facade methods. It also participates in warmed multi-index and manifest-backed set workflows.
Building the index
```bash
# Full build at the current directory
./vendor/bin/greph-index ast-index build .

# Full build at an explicit root
./vendor/bin/greph-index ast-index build path/to/repo

# Use a non-default index directory
./vendor/bin/greph-index ast-index build . --index-dir /tmp/greph-ast-index

# Build a mutable overlay that can refresh cheaply on demand
./vendor/bin/greph-index ast-index build . \
  --lifecycle opportunistic-refresh \
  --auto-refresh-max-files 32 \
  --auto-refresh-max-bytes 1048576
```

The fact index is stored at `<root>/.greph-ast-index/` by default.
A successful build prints a one-line summary:

```text
Built AST index for 2547 files in .greph-ast-index (84931 fact rows, +2547 ~0 -0 =0)
```

The four counters report the files added (+), updated (~), deleted (-), and unchanged (=) since the last build.
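Scripts that wrap the build step sometimes need these counters as numbers. The PHP sketch below parses the summary line with a regular expression. It assumes the exact wording and counter order of the example line above, which could change between releases. It also assumes an index directory without spaces. The helper name parseAstBuildSummary is hypothetical and is not part of Greph.

```php
<?php

/**
 * Hypothetical helper: parse the one-line summary printed by
 * "greph-index ast-index build". Assumes the documented example format
 * and an index directory path without spaces; returns null on mismatch.
 *
 * @return array{files:int, dir:string, facts:int, added:int, updated:int, deleted:int, unchanged:int}|null
 */
function parseAstBuildSummary(string $line): ?array
{
    $pattern = '/^Built AST index for (\d+) files in (\S+) \((\d+) fact rows, \+(\d+) ~(\d+) -(\d+) =(\d+)\)$/';
    if (preg_match($pattern, trim($line), $m) !== 1) {
        return null;
    }

    return [
        'files'     => (int) $m[1],
        'dir'       => $m[2],
        'facts'     => (int) $m[3],
        'added'     => (int) $m[4], // "+" counter
        'updated'   => (int) $m[5], // "~" counter
        'deleted'   => (int) $m[6], // "-" counter
        'unchanged' => (int) $m[7], // "=" counter
    ];
}

$summary = parseAstBuildSummary(
    'Built AST index for 2547 files in .greph-ast-index (84931 fact rows, +2547 ~0 -0 =0)'
);
// $summary['added'] === 2547, $summary['facts'] === 84931
```

Returning null on a mismatch, rather than throwing, lets a wrapper fall back to printing the raw line if the output format changes.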
Refreshing the index
```bash
./vendor/bin/greph-index ast-index refresh .
```

refresh re-walks the indexed root, re-parses changed files, and rewrites only the affected fact rows. The output uses the same four counters as the text index.
Lifecycle profiles behave the same way as warmed text indexes:
- static: never freshness-check or mutate automatically
- manual-refresh: report stale state in stats, but never refresh during search
- opportunistic-refresh: refresh on search only when the changed set is still cheap enough
- strict-stale-check: reject stale searches instead of refreshing
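The PHP sketch below makes the four profiles concrete. It models the decision a search might make once it knows how much has changed since the last build. It is not Greph's implementation, and several parts are assumptions:

- The function name and the action strings are hypothetical.
- "Cheap enough" is taken to mean staying within both the --auto-refresh-max-files and --auto-refresh-max-bytes limits from the build example.
- What opportunistic-refresh does with a change set that is too large is a guess.

```php
<?php

/**
 * Hypothetical model of per-profile behavior when a warmed search runs.
 * Action strings are illustrative only. Requires PHP 8.0+.
 * Assumption: "cheap enough" means both auto-refresh limits are respected;
 * the defaults copy the build example (32 files, 1048576 bytes).
 */
function decideOnSearch(
    string $profile,
    int $changedFiles,
    int $changedBytes,
    int $maxFiles = 32,
    int $maxBytes = 1048576,
): string {
    return match ($profile) {
        // Never freshness-checks, so the change set is ignored entirely.
        'static' => 'search',
        // Staleness is reported through stats; the search itself runs as-is.
        'manual-refresh' => $changedFiles === 0 ? 'search' : 'search-stale',
        'opportunistic-refresh' => match (true) {
            $changedFiles === 0 => 'search',
            $changedFiles <= $maxFiles && $changedBytes <= $maxBytes => 'refresh-then-search',
            // Assumption: a too-large change set waits for an explicit refresh.
            default => 'search-stale',
        },
        'strict-stale-check' => $changedFiles === 0 ? 'search' : 'reject',
        default => throw new InvalidArgumentException("Unknown lifecycle profile: {$profile}"),
    };
}
```

For example, decideOnSearch('opportunistic-refresh', 3, 4096) returns 'refresh-then-search'. With 40 changed files it exceeds the 32-file limit and returns 'search-stale'.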
Querying the index
```bash
./vendor/bin/greph-index ast-index search 'new $CLASS()' src
./vendor/bin/greph-index ast-index search '$obj->$method($$$ARGS)' src
./vendor/bin/greph-index ast-index search --json 'array($$$ITEMS)' src
./vendor/bin/greph-index ast-index search -l 'array($$$ITEMS)' src
./vendor/bin/greph-index ast-index search --trace-plan 'new $CLASS()' src
```

The metavariable syntax is identical to native AST search. See Modes / AST Search for the full grammar.
Options
| Flag | Meaning |
|---|---|
| --lang NAME | AST language. Default: php |
| -j N, --jobs N | Number of parallel workers |
| -l, --files-with-matches | List matching files only |
| --glob GLOB | Include only files whose paths match GLOB |
| --type NAME / --type-not NAME | File type filter |
| --json | Emit JSON output |
| --no-ignore | Ignore .gitignore and .grephignore |
| --hidden | Include hidden files |
| --strict-parse | Fail on parse errors instead of skipping them |
| --fallback MODE | Missing-index behavior: fail (default) or scan |
| --index-dir DIR | Use a non-default index directory |
| --trace-plan | Emit warmed AST planner diagnostics to stderr |
| --show-index-origin | Prefix output with the matching index label in multi-index or set mode |
Missing-index fallback
By default, ast-index search raises an error when no index exists at the requested location. Pass --fallback scan to fall back to a native AST scan instead. This is the right setting for tools that want indexed search when available but should still produce results when the index has not been built yet.
```bash
./vendor/bin/greph-index ast-index search --fallback scan 'new $CLASS()' src
```

Multi-index search
You can search across several warmed AST indexes in one command by repeating --index-dir. This is useful for layouts such as WordPress core plus a plugin or theme.
```bash
./vendor/bin/greph-index ast-index search \
  --index-dir ../wordpress/.greph-ast-index \
  --index-dir ./.greph-ast-index \
  --show-index-origin \
  'new $CLASS()' .
```

When --show-index-origin is enabled, each result is prefixed with the name of the warmed index entry that matched, so you can see which root produced the hit.
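Conceptually, a multi-index search runs one query per warmed index and labels each hit with the index it came from. The PHP sketch below models only that merge-and-label step. The function mergeIndexHits, the input array shape, the index labels, and the bracketed "[label] path:line" layout are all hypothetical. They are not Greph's actual output format.

```php
<?php

/**
 * Hypothetical model of merging hits from several warmed indexes.
 *
 * @param array<string, list<array{file:string, line:int}>> $hitsByIndex
 *        Hits keyed by an index label (e.g. a name for each --index-dir).
 * @return list<string> One output line per hit, optionally prefixed with its origin.
 */
function mergeIndexHits(array $hitsByIndex, bool $showOrigin): array
{
    $lines = [];
    foreach ($hitsByIndex as $label => $hits) {
        foreach ($hits as $hit) {
            $location = $hit['file'] . ':' . $hit['line'];
            // Only prefix with the origin label when the caller asked for it.
            $lines[] = $showOrigin ? sprintf('[%s] %s', $label, $location) : $location;
        }
    }
    return $lines;
}

// Example data: both paths and line numbers are made up for illustration.
$out = mergeIndexHits([
    'wordpress' => [['file' => 'wp-includes/class-wp.php', 'line' => 12]],
    'plugin'    => [['file' => 'src/Plugin.php', 'line' => 40]],
], true);
```

Without the origin prefix, identical relative paths from two roots would be indistinguishable. That ambiguity is the problem the flag addresses.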
Manifest-backed index sets
For stable warmed setups, define a manifest and let greph-index set operate on the grouped indexes:
```bash
./vendor/bin/greph-index set build
./vendor/bin/greph-index set stats --mode ast-index --dry-refresh
./vendor/bin/greph-index set search --mode ast-index --show-index-origin 'new $CLASS()' .
```

The set subcommands share the same lifecycle handling, stale checks, and planner diagnostics as direct ast-index usage.
Planner diagnostics
--trace-plan shows how Greph narrowed the query before the full matcher ran. For ast-index that typically includes:
- the coarse fact signature extracted from the pattern
- how many warmed indexes were searched
- how many files were rejected by the fact filter
- how many candidate files were re-parsed for exact structural verification
What the index stores
For every indexed file the AST fact extractor records a row per "interesting" node. The set of fact kinds covers the constructs that AST patterns most commonly target:
- Function and method calls (by called name)
- Class instantiations (by class name)
- Class, interface, trait, and enum declarations (by name)
- Function declarations (by name)
- Method declarations (by name and visibility)
- Use statements
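The PHP sketch below shows roughly what a fact row holds. It extracts two of the listed kinds, class instantiations and function or method declarations, using PHP's built-in tokenizer. This is a simplification. Greph's extractor records rows per AST node rather than scanning raw tokens, and the row shape ['kind', 'name', 'line'] and the function name extractFacts are hypothetical. It requires PHP 8.0+ for the T_NAME_* tokens.

```php
<?php

/**
 * Hypothetical token-based approximation of two fact kinds:
 *  - 'new'      : class instantiations, keyed by class name
 *  - 'function' : function and method declarations, keyed by name
 * Known limitation: "use function Foo\bar;" imports are misread as declarations.
 *
 * @return list<array{kind:string, name:string, line:int}>
 */
function extractFacts(string $source): array
{
    $tokens = token_get_all($source);
    $count = count($tokens);
    $nameTokens = [T_STRING, T_NAME_QUALIFIED, T_NAME_FULLY_QUALIFIED];
    $facts = [];

    for ($i = 0; $i < $count; $i++) {
        $token = $tokens[$i];
        if (!is_array($token) || ($token[0] !== T_NEW && $token[0] !== T_FUNCTION)) {
            continue;
        }
        $kind = $token[0] === T_NEW ? 'new' : 'function';

        // Skip whitespace and comments to reach the token after the keyword.
        $j = $i + 1;
        while ($j < $count && is_array($tokens[$j])
            && in_array($tokens[$j][0], [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT], true)) {
            $j++;
        }

        // Anonymous classes ("new class"), closures ("function ("), and
        // dynamic instantiations ("new $cls") have no static name: skip them.
        $next = $tokens[$j] ?? null;
        if (is_array($next) && in_array($next[0], $nameTokens, true)) {
            $facts[] = ['kind' => $kind, 'name' => ltrim($next[1], '\\'), 'line' => $token[2]];
        }
    }

    return $facts;
}
```

Because rows only carry a kind, a name, and a location, the store stays small compared with a full serialized AST. This is one reason the index can act as a cheap prefilter.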
When a query arrives, Greph\Ast\AstFactQuery translates the search pattern into a coarse fact signature (for example, "must contain a new expression for some class") and intersects that signature with the fact store. Files that survive the intersection are then re-parsed and matched against the full pattern.
This means the fact index is a prefilter, not a full-fidelity match store. The match step still parses the source, so structural correctness is identical to native AST search; only the candidate set is smaller.
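The PHP sketch below models the prefilter as an intersection over an inverted index. It is not the AstFactQuery API. It works as follows:

- Each file maps to the [kind, name] facts it contains.
- A query signature is a list of required facts, where a null name means "any name of this kind".
- Only files that contain every required fact remain candidates for the re-parse.

The function name prefilterFiles and the kind strings are hypothetical. The candidate and rejected counts roughly mirror what --trace-plan reports.

```php
<?php

/**
 * Hypothetical model of the fact prefilter.
 *
 * @param array<string, list<array{0:string, 1:string}>> $factsByFile
 *        file path => list of [kind, name] facts recorded for that file
 * @param list<array{0:string, 1:?string}> $signature
 *        required facts; a null name matches any name of that kind.
 *        An empty signature keeps every file.
 * @return array{candidates: list<string>, rejected: int}
 */
function prefilterFiles(array $factsByFile, array $signature): array
{
    // Inverted index: kind => name => set of files (file path used as key).
    $postings = [];
    foreach ($factsByFile as $file => $facts) {
        foreach ($facts as [$kind, $name]) {
            $postings[$kind][$name][(string) $file] = true;
        }
    }

    // Start with every file, then intersect with the files holding each required fact.
    $candidates = array_fill_keys(array_map('strval', array_keys($factsByFile)), true);
    foreach ($signature as [$kind, $name]) {
        $matching = [];
        $byName = $postings[$kind] ?? [];
        foreach ($byName as $factName => $files) {
            if ($name === null || (string) $factName === $name) {
                $matching += $files; // union of file sets
            }
        }
        $candidates = array_intersect_key($candidates, $matching);
    }

    $kept = array_map('strval', array_keys($candidates));
    return ['candidates' => $kept, 'rejected' => count($factsByFile) - count($kept)];
}

$facts = [
    'src/A.php' => [['new', 'Foo'], ['call', 'bar']],
    'src/B.php' => [['call', 'bar']],
    'src/C.php' => [['new', 'Baz']],
];
// "new $CLASS()" reduces to "some new fact of any class": A and C survive, B is rejected.
$result = prefilterFiles($facts, [['new', null]]);
```

A signature that is too loose only costs extra parsing, because every candidate is still verified by the full matcher.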
When to use it
Indexed AST mode is the right tool when:
- You run the same structural queries against the same codebase repeatedly.
- The patterns target identifiable shapes (call name, class name, declaration kind) that the fact extractor stores.
- You want lower latency than re-parsing every file.
For patterns whose distinguishing feature is not in the fact set (for example, complex nested expressions without a recognizable head), the cached AST mode is usually a better fit because it skips the parse step entirely without relying on a coarse signature.
Programmatic use
```php
<?php

require __DIR__ . '/vendor/autoload.php';

use Greph\Greph;
use Greph\Ast\AstSearchOptions;
use Greph\Index\IndexLifecycleProfile;

// Build the fact index for the current directory; small change sets refresh on demand.
Greph::buildAstIndex('.', lifecycle: IndexLifecycleProfile::OpportunisticRefresh);

// Find every instantiation under src/ using 4 workers, with planner diagnostics enabled.
$matches = Greph::searchAstIndexed(
    'new $CLASS()',
    'src',
    new AstSearchOptions(jobs: 4, tracePlan: true),
);
```

searchAstIndexed accepts the same AstSearchOptions as native AST search and returns the same `list<AstMatch>` shape. For multi-index or manifest-backed flows, use searchAstIndexedMany(...) and searchAstIndexedSet(...).