Indexed AST Search

Indexed AST search is the warmed-cache version of AST search. The first call extracts node-level facts (call names, instantiations, class declarations, method declarations) into an on-disk store; subsequent calls use the store to skip files that obviously cannot contain a structural match. The matcher then re-verifies the candidates against the source.

The fact index is exposed under the greph-index ast-index CLI subcommand and the Greph::buildAstIndex() / Greph::searchAstIndexed() facade methods. It also participates in warmed multi-index and manifest-backed set workflows.

Building the index

# Full build at the current directory
./vendor/bin/greph-index ast-index build .

# Full build at an explicit root
./vendor/bin/greph-index ast-index build path/to/repo

# Use a non-default index directory
./vendor/bin/greph-index ast-index build . --index-dir /tmp/greph-ast-index

# Build a mutable overlay that can refresh cheaply on demand
./vendor/bin/greph-index ast-index build . \
  --lifecycle opportunistic-refresh \
  --auto-refresh-max-files 32 \
  --auto-refresh-max-bytes 1048576

The fact index is stored at <root>/.greph-ast-index/ by default.

A successful build prints a one-line summary:

Built AST index for 2547 files in .greph-ast-index (84931 fact rows, +2547 ~0 -0 =0)

The four counters (+, ~, -, =) report the files added, updated, deleted, and unchanged since the last build.
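If a script needs those numbers (for example, to fail a CI job when nothing was indexed), the summary line can be parsed. The following is a minimal sketch: parseAstIndexSummary is a hypothetical helper, not part of Greph, and the pattern is inferred from the single sample line above, so it should be treated as an assumption rather than a stable output contract.

```php
<?php
declare(strict_types=1);

/**
 * Parse the one-line build summary into named counters.
 * Hypothetical helper: the expected format is inferred from the sample line only.
 *
 * @return array{files:int, dir:string, factRows:int, added:int, updated:int, deleted:int, unchanged:int}|null
 */
function parseAstIndexSummary(string $line): ?array
{
    $pattern = '/^Built AST index for (\d+) files in (\S+) '
        . '\((\d+) fact rows, \+(\d+) ~(\d+) -(\d+) =(\d+)\)$/';

    if (preg_match($pattern, trim($line), $m) !== 1) {
        return null; // Unrecognized format: let the caller decide what to do.
    }

    return [
        'files'     => (int) $m[1],
        'dir'       => $m[2],
        'factRows'  => (int) $m[3],
        'added'     => (int) $m[4], // "+" counter
        'updated'   => (int) $m[5], // "~" counter
        'deleted'   => (int) $m[6], // "-" counter
        'unchanged' => (int) $m[7], // "=" counter
    ];
}

$summary = parseAstIndexSummary(
    'Built AST index for 2547 files in .greph-ast-index (84931 fact rows, +2547 ~0 -0 =0)'
);
```

Returning null on an unrecognized line, rather than throwing, keeps the helper usable if the summary wording differs between versions.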

Refreshing the index

./vendor/bin/greph-index ast-index refresh .

refresh re-walks the indexed root, re-parses changed files, and rewrites only the fact rows that moved. The output uses the same four counters as the text index.

Lifecycle profiles behave the same way as they do for warmed text indexes:

  • static: never freshness-check or mutate automatically
  • manual-refresh: report stale state in stats, but never refresh during search
  • opportunistic-refresh: refresh on search only when the changed set is still cheap enough
  • strict-stale-check: reject stale searches instead of refreshing
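The four profiles can be read as a small decision table. The sketch below is an illustration of that table, not Greph's implementation: planSearch and its return labels are hypothetical, the default limits mirror the --auto-refresh-max-files and --auto-refresh-max-bytes values from the build example, and two behaviors are assumptions the text does not confirm (that the limits are inclusive, and that an opportunistic search proceeds without refreshing when the changed set is too large).

```php
<?php
declare(strict_types=1);

/**
 * Decide what a search should do for a given lifecycle profile.
 * Hypothetical model of the profile list above; not Greph code.
 *
 * @return string one of 'search', 'refresh-then-search', 'reject'
 */
function planSearch(
    string $profile,
    bool $stale,
    int $changedFiles,
    int $changedBytes,
    int $maxFiles = 32,
    int $maxBytes = 1048576,
): string {
    if (!$stale) {
        return 'search'; // A fresh index is searched as-is under every profile.
    }

    return match ($profile) {
        // Neither profile mutates the index during a search.
        'static', 'manual-refresh' => 'search',
        // Stale searches are rejected outright.
        'strict-stale-check' => 'reject',
        // Refresh only while the changed set stays under both limits.
        // Assumption: limits are inclusive, and an oversized change set
        // falls back to searching without a refresh.
        'opportunistic-refresh' => ($changedFiles <= $maxFiles && $changedBytes <= $maxBytes)
            ? 'refresh-then-search'
            : 'search',
        default => throw new InvalidArgumentException("Unknown lifecycle profile: {$profile}"),
    };
}

$small = planSearch('opportunistic-refresh', true, 5, 2048);
$large = planSearch('opportunistic-refresh', true, 100, 2048);
```

In this model a static index never computes $stale in the first place, since that profile skips the freshness check entirely; the flag is passed here only to keep the function signature uniform.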

Querying the index

./vendor/bin/greph-index ast-index search 'new $CLASS()' src
./vendor/bin/greph-index ast-index search '$obj->$method($$$ARGS)' src
./vendor/bin/greph-index ast-index search --json 'array($$$ITEMS)' src
./vendor/bin/greph-index ast-index search -l 'array($$$ITEMS)' src
./vendor/bin/greph-index ast-index search --trace-plan 'new $CLASS()' src

The metavariable syntax is identical to native AST search. See Modes / AST Search for the full grammar.

Options

Flag                           Meaning
--lang NAME                    AST language. Default: php
-j N, --jobs N                 Number of parallel workers
-l, --files-with-matches       List matching files only
--glob GLOB                    Include only files whose paths match GLOB
--type NAME / --type-not NAME  File type filter
--json                         Emit JSON output
--no-ignore                    Do not apply .gitignore and .grephignore rules
--hidden                       Include hidden files
--strict-parse                 Fail on parse errors instead of skipping them
--fallback MODE                Missing-index behavior: fail (default) or scan
--index-dir DIR                Use a non-default index directory
--trace-plan                   Emit warmed AST planner diagnostics to stderr
--show-index-origin            Prefix output with the matching index label in multi-index or set mode

Missing-index fallback

By default, ast-index search raises an error when no index exists at the requested location. Pass --fallback scan to fall back to a native AST scan instead. This is the right setting for tools that want indexed search when available but should still produce results when the index has not been built yet.

./vendor/bin/greph-index ast-index search --fallback scan 'new $CLASS()' src

Searching multiple indexes

You can search across several warmed AST indexes in one command by repeating --index-dir. This is useful for layouts such as WordPress core plus a plugin or theme.

./vendor/bin/greph-index ast-index search \
  --index-dir ../wordpress/.greph-ast-index \
  --index-dir ./.greph-ast-index \
  --show-index-origin \
  'new $CLASS()' .

When --show-index-origin is enabled, output is prefixed with the matching warmed index entry name so you can see which root produced the hit.

Manifest-backed index sets

For stable warmed setups, define a manifest and let greph-index set operate on the grouped indexes:

./vendor/bin/greph-index set build
./vendor/bin/greph-index set stats --mode ast-index --dry-refresh
./vendor/bin/greph-index set search --mode ast-index --show-index-origin 'new $CLASS()' .

The set subcommands share the same lifecycle handling, stale checks, and planner diagnostics as direct ast-index usage.

Planner diagnostics

--trace-plan shows how Greph narrowed the query before the full matcher ran. For ast-index that typically includes:

  • the coarse fact signature extracted from the pattern
  • how many warmed indexes were searched
  • how many files were rejected by the fact filter
  • how many candidate files were re-parsed for exact structural verification

What the index stores

For every indexed file the AST fact extractor records a row per "interesting" node. The set of fact kinds covers the constructs that AST patterns most commonly target:

  • Function and method calls (by called name)
  • Class instantiations (by class name)
  • Class, interface, trait, and enum declarations (by name)
  • Function declarations (by name)
  • Method declarations (by name and visibility)
  • Use statements

When a query arrives, Greph\Ast\AstFactQuery translates the search pattern into a coarse fact signature (for example, "must contain a new of any class") and intersects it with the fact store. Files that survive the intersection are then re-parsed and matched against the full pattern.

This means the fact index is a prefilter, not a full-fidelity match store. The match step still parses the source, so structural correctness is identical to native AST search; only the candidate set is smaller.
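The prefilter idea can be shown with a toy model. Everything in this sketch is hypothetical: the "kind:name" fact encoding, the in-memory $factStore array, and the candidateFiles and hasFact helpers are illustrative stand-ins, not Greph's on-disk format or the AstFactQuery API. It shows only the first half of the pipeline, narrowing the file list; the real matcher would then re-parse each surviving file.

```php
<?php
declare(strict_types=1);

// Toy fact store: file path => list of "kind:name" facts (hypothetical layout).
$factStore = [
    'src/Mailer.php' => ['new:Transport', 'call:send', 'class:Mailer'],
    'src/Util.php'   => ['call:trim', 'function:slugify'],
    'src/Boot.php'   => ['new:Container', 'call:register'],
];

/**
 * Check one requirement against a file's facts.
 * A requirement ending in ":*" matches any name of that kind.
 */
function hasFact(array $facts, string $need): bool
{
    if (str_ends_with($need, ':*')) {
        $prefix = substr($need, 0, -1); // "new:*" becomes "new:"
        foreach ($facts as $fact) {
            if (str_starts_with($fact, $prefix)) {
                return true;
            }
        }
        return false;
    }

    return in_array($need, $facts, true);
}

/**
 * Return the files whose facts satisfy every required fact.
 * Files that fail any requirement are skipped without being parsed.
 */
function candidateFiles(array $factStore, array $required): array
{
    $candidates = [];
    foreach ($factStore as $file => $facts) {
        foreach ($required as $need) {
            if (!hasFact($facts, $need)) {
                continue 2; // Reject this file and move on to the next one.
            }
        }
        $candidates[] = $file;
    }

    return $candidates;
}

// 'new $CLASS()' reduces to the coarse signature "must contain a new of any class".
$candidates = candidateFiles($factStore, ['new:*']);
```

Here src/Util.php is rejected without being opened because it records no instantiation, which is the saving the fact index is meant to provide. A pattern with no recognizable head would produce an empty requirement list, and every file would survive the filter.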

When to use it

Indexed AST mode is the right tool when:

  • You run the same structural queries against the same codebase repeatedly.
  • The patterns target identifiable shapes (call name, class name, declaration kind) that the fact extractor stores.
  • You want lower latency than re-parsing every file.

For patterns whose distinguishing feature is not in the fact set (for example, complex nested expressions without a recognizable head), the cached AST mode is usually a better fit because it skips the parse step entirely without relying on a coarse signature.

Programmatic use

use Greph\Greph;
use Greph\Ast\AstSearchOptions;
use Greph\Index\IndexLifecycleProfile;

Greph::buildAstIndex('.', lifecycle: IndexLifecycleProfile::OpportunisticRefresh);

$matches = Greph::searchAstIndexed(
    'new $CLASS()',
    'src',
    new AstSearchOptions(jobs: 4, tracePlan: true),
);

searchAstIndexed accepts the same AstSearchOptions as native AST search and returns the same list<AstMatch> shape. For multi-index or manifest-backed flows, use searchAstIndexedMany(...) and searchAstIndexedSet(...).
