Blog Post: Structural Code Analysis with ast-grep in Polyglot Monorepos
Title Options
- “Taming Polyglot Monorepos with ast-grep: A Practical Guide”
- “Fast Structural Code Search Across Languages Using ast-grep”
- “Beyond Grep: Structural Code Analysis for NixOS and Beyond”
- “ast-grep: The Missing Link Between grep and Semantic Analysis”
Target Audience
- NixOS users managing complex configurations
- DevOps engineers with polyglot infrastructure repos
- Kubernetes/Tekton contributors
- Anyone maintaining multi-language codebases
Key Points / Hook
- Polyglot monorepos are hard to analyze with traditional tools
- Text search (grep/ripgrep) gives false positives
- Language-specific tools don’t scale across languages
- ast-grep bridges the gap: fast + accurate + multi-language
Outline
Introduction (200 words)
The Problem:
- Managing a homelab with NixOS = complex monorepo
- Mix of Nix, Bash, Go, Python, YAML, JSON
- Traditional tools fall short:
  - ripgrep: Fast but inaccurate (matches comments, strings)
  - semgrep: Accurate but slow (30s for 400 files)
  - Language-specific: Fragmented, multiple configs
The Solution:
- ast-grep: AST-based pattern matching
- One tool, one config, multiple languages
- Fast enough for interactive use (0.02s)
- Accurate enough to avoid false positives
What is ast-grep? (300 words)
Concept:
- “grep for code structure”
- Parses code into Abstract Syntax Tree
- Matches patterns, not text
- Built in Rust, uses tree-sitter grammars
Example:
# Text search - many false positives
rg "password" --type nix
# → Matches: comments, descriptions, variable names
# Structural search - only assignments
ast-grep -p 'password = $VAL' -l nix
# → Matches: Only actual password assignments
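To make the false-positive problem concrete, here is a minimal sketch that uses plain grep as a stand-in for rg. The sample file contents and the helper name count_text_matches are hypothetical, made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample: one real assignment, plus a comment and a description
# that merely mention the word "password".
sample=$(mktemp)
cat > "$sample" <<'EOF'
{
  # TODO: rotate the password every quarter
  description = "Prompt for a password at login";
  password = "hunter2";
}
EOF

# Count the lines a plain text search reports (grep stands in for rg here).
count_text_matches() {
  grep -c 'password' "$1"
}

echo "text matches: $(count_text_matches "$sample")"
```

Only one of the three reported lines is an assignment, which is the one a structural pattern is meant to isolate.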
Key Features:
- Pattern syntax: $VAR for wildcards, $$$ARGS for lists
- Fix patterns: Interactive refactoring
- YAML rules: Custom linting
- LSP support: Editor integration
- 20+ languages: Nix, Bash, Go, Python, YAML, etc.
Use Case: Home Repository (400 words)
Context:
- NixOS monorepo: 8 hosts, custom modules, tools
- 415 files: 226 Nix, 142 Bash, 47 Go, + more
- Need to maintain consistency across languages
Problems to Solve:
- Inconsistent Nix module options
- Bash scripts without error handling
- Security issues (hardcoded secrets, unsafe patterns)
- Finding deprecated patterns
Implementation:
Created .ast-grep/ with custom rules:
Nix Rules:
- nix-explicit-option-types: Enforce type annotations
- nix-prefer-inherit: Use inherit for cleaner code
- nix-prefer-optional: Conditional list items
Bash Rules:
- bash-require-strict-mode: Enforce set -euo pipefail
- bash-unsafe-rm-rf: Catch dangerous rm -rf
- bash-use-command-over-which: POSIX compliance
Security Rules:
security-unsafe-curl-pipe-sh: Prevent curl | sh
Results:
Files scanned: 415
Scan time: 0.022 seconds
Issues found: 154
- 137 warnings (missing type annotations)
- 10 errors (missing strict mode)
- 5 errors (unsafe rm -rf)
- 1 error (curl | sh pattern)
Impact:
- Found real issues (scripts without error handling)
- Caught security anti-pattern (curl | sh in example)
- Identified 137 places needing better documentation (types)
- Fast enough to run on every save
Real-World Examples (500 words)
Example 1: Finding Unsafe rm -rf
Found in nix-flake-update script:
rm -rf "$WORKTREE_DIR" || true
Flagged as error: “Ensure variable is not empty”
Analysis:
- In this case, safe (variable set at script start with timestamp)
- But the rule is valuable - catches real bugs elsewhere
- Shows ast-grep helps find patterns, human reviews context
Fix applied to other scripts:
# Before
rm -rf $TEMP_DIR
# After
[[ -n "$TEMP_DIR" ]] && rm -rf "$TEMP_DIR"
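The guard can also be wrapped in a small helper so every script gets the same check. This is only a sketch: the function name safe_rm_rf and its extra refusal of "/" are assumptions for illustration, not something taken from the original scripts.

```shell
#!/bin/sh
# safe_rm_rf: hypothetical helper that refuses to delete when the path is
# empty (for example an unset variable) or is the filesystem root.
safe_rm_rf() {
  target=$1
  if [ -z "$target" ] || [ "$target" = "/" ]; then
    echo "refusing to remove '${target}'" >&2
    return 1
  fi
  rm -rf -- "$target"
}
```

A script would then call `safe_rm_rf "$TEMP_DIR"` in place of the bare rm -rf.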
Example 2: Bash Scripts Without Error Handling
Found 10 scripts missing set -euo pipefail:
- install.sh
- keyboard firmware builders
- imperative deployment scripts
Why it matters:
- Scripts continue on errors → silent failures
- Undefined variables → unpredictable behavior
- Pipeline failures hidden
Fix:
#!/usr/bin/env bash
set -euo pipefail # ← Added this line
# Rest of script...
Result: More robust scripts, easier debugging
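The "pipeline failures hidden" point is easy to reproduce. The sketch below runs the same failing pipeline with and without pipefail. The helper name pipeline_status is hypothetical, and bash is invoked explicitly because not every sh supports pipefail.

```shell
#!/bin/sh
# pipeline_status: hypothetical helper that runs `false | true` in a fresh
# bash process, with or without pipefail, and prints the pipeline's exit code.
pipeline_status() {
  if [ "$1" = "strict" ]; then
    bash -c 'set -o pipefail; false | true; echo $?'
  else
    bash -c 'false | true; echo $?'
  fi
}

echo "default:  $(pipeline_status default)"   # failure of `false` is masked
echo "pipefail: $(pipeline_status strict)"    # failure is reported
```

Without pipefail the pipeline reports the status of its last command only, so the failing first stage goes unnoticed.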
Example 3: NixOS Module Type Annotations
Found 137 mkOption calls without type annotations:
# Before
myOption = mkOption {
  default = "value";
  description = "My option";
};
# After
myOption = mkOption {
  type = types.str; # ← Added this
  default = "value";
  description = "My option";
};
Benefits:
- Better documentation
- Type checking catches errors
- Auto-completion in editors
- Consistent style
Example 4: Interactive Refactoring
Suppose we want to standardize on a new function:
ast-grep -p 'oldFunc($$$ARGS)' \
--rewrite 'newFunc($$$ARGS)' \
--interactive
For each match:
- Shows context (surrounding code)
- Shows proposed change
- Asks: Apply? (y/n/q)
vs. sed/awk:
- No risk of matching in comments/strings
- Review each change
- Skip false positives
- Undo via version control (edits land in the working tree as ordinary diffs)
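The comments/strings risk of a textual rewrite can be shown in a few lines. This is a sketch with a hypothetical three-line source file, using sed as the text-based tool from the comparison above.

```shell
#!/bin/sh
# Hypothetical source file: one real call, one mention in a comment,
# one mention inside a string literal.
src=$(mktemp)
cat > "$src" <<'EOF'
// oldFunc(x) is deprecated
result := oldFunc(x)
log.Println("calling oldFunc(x)")
EOF

# A textual rewrite has no notion of syntax, so it touches all three lines.
rewritten=$(sed 's/oldFunc(/newFunc(/g' "$src")
echo "$rewritten"
echo "lines changed: $(echo "$rewritten" | grep -c 'newFunc(')"
```

Only the middle line is a real call. The comment and the log message are the kind of collateral edits that a structural rewrite with per-match review is meant to avoid.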
Beyond Personal Projects: Tekton (400 words)
The Challenge:
- Tekton: Kubernetes-native CI/CD
- Large Go codebase (1500+ files)
- YAML CRDs and examples (500+ files)
- API migration: v1beta1 → v1
Use Case 1: API Migration
Find all v1beta1 usage:
ast-grep -p 'apiVersion: tekton.dev/v1beta1' -l yaml | wc -l
# → 247 files
Interactive migration:
ast-grep -p 'apiVersion: tekton.dev/v1beta1' \
--rewrite 'apiVersion: tekton.dev/v1' \
--interactive \
-l yaml examples/
Review each change, skip generated files.
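Because wc -l counts output lines, a file-level count from a plain text search is a cheap cross-check before starting the migration. This is a sketch: the helper name count_v1beta1_files is hypothetical, and it assumes the manifests use the .yaml extension.

```shell
#!/bin/sh
# count_v1beta1_files: hypothetical helper that counts YAML files (not lines)
# containing the old apiVersion, using a fixed-string text search.
count_v1beta1_files() {
  find "$1" -type f -name '*.yaml' \
    -exec grep -lF 'apiVersion: tekton.dev/v1beta1' {} + | wc -l | tr -d ' '
}
```

Running `count_v1beta1_files examples/` before and after the migration gives a quick signal of how many files remain.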
Use Case 2: Security Scanning
Find hardcoded secrets:
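id: go-no-hardcoded-secrets
message: Potential hardcoded secret
severity: error
language: go
rule:
  any:
    - pattern: password := $SECRET
    - pattern: token := $SECRET
# A metavariable must stand for a whole node, so $SECRET captures the
# string literal with its quotes; the regex accounts for them.
constraints:
  SECRET:
    kind: interpreted_string_literal
    regex: '^"[A-Za-z0-9+/=]{20,}"$'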
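The regex is the part of this rule most likely to need tuning, and it can be exercised on its own. The sketch below applies the 20+ character base64-style class to unquoted values with grep -E. The helper name looks_like_secret and the sample values are made up for illustration.

```shell
#!/bin/sh
# looks_like_secret: hypothetical helper applying the rule's character class
# and length bound to a bare value (no surrounding quotes) via grep -E.
looks_like_secret() {
  printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9+/=]{20,}$'
}

for value in "c2VjcmV0LXRva2VuLXZhbHVlMTIz" "changeme" "not a secret, has spaces!!"; do
  if looks_like_secret "$value"; then
    echo "flagged: $value"
  else
    echo "ignored: $value"
  fi
done
```

Short placeholders such as "changeme" pass through, so the rule favors long opaque tokens over weak passwords. Whether that trade-off is right depends on the codebase.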
Use Case 3: Best Practices
Enforce RBAC markers in controllers:
id: go-require-rbac-markers
message: Add RBAC markers for controller
severity: warning
language: go
rule:
  pattern: |
    func (r *$REC) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
      $$$
    }
  # The marker comment sits above the function, so the function "follows" it.
  not:
    follows:
      kind: comment
      regex: '\+kubebuilder:rbac'
Performance:
- Large repo: ~2000 files
- Full scan: < 2 seconds
- Fast enough for PR checks
Impact:
- Automated API migration guidance
- Caught security issues before code review
- Consistent error handling across codebase
- Saved hours in manual review
Comparison with Other Tools (300 words)
vs. ripgrep:
- ripgrep: 0.005s (faster)
- ast-grep: 0.022s (more accurate)
- Use ripgrep for quick searches
- Use ast-grep for refactoring
vs. semgrep:
- semgrep: 30s (deeper analysis)
- ast-grep: 0.022s (structural patterns)
- Use semgrep for security audits
- Use ast-grep for daily linting
vs. Language-Specific Tools:
- statix (Nix): Deep semantic analysis
- shellcheck (Bash): Shell-specific checks
- golangci-lint (Go): Comprehensive linting
- ast-grep complements, not replaces
Decision Matrix:
Speed needed: ripgrep > ast-grep > semgrep
Accuracy needed: semgrep > ast-grep > ripgrep
Cross-language: ast-grep > semgrep > ripgrep
Refactoring: ast-grep > (IDE tools) > ripgrep
Best Practice: Use Multiple Tools
lint:
statix check . # Nix semantics
shellcheck *.sh # Bash analysis
ast-grep scan # Cross-language patterns
semgrep --config=security # Weekly security audit
Performance Benchmarks (200 words)
Home Repository (415 files):
| Tool | Time | Files/sec |
|---|---|---|
| ripgrep | 0.005s | 83,000 |
| ast-grep | 0.022s | 18,900 |
| semgrep | 30s | 14 |
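The Files/sec column is simply file count divided by wall-clock time. A small sketch of the arithmetic follows; the helper name files_per_sec is hypothetical.

```shell
#!/bin/sh
# files_per_sec: hypothetical helper; divides a file count by wall-clock
# seconds and rounds to the nearest whole number.
files_per_sec() {
  awk -v files="$1" -v secs="$2" 'BEGIN { printf "%.0f\n", files / secs }'
}

files_per_sec 415 0.005   # ripgrep
files_per_sec 415 0.022   # ast-grep
files_per_sec 415 30      # semgrep
```

This prints 83000, 18864 and 14. The table rounds the ast-grep figure to three significant digits.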
Large Codebase (2000 files):
| Tool | Time |
|---|---|
| ripgrep | 0.01s |
| ast-grep | 0.5s |
| semgrep | 10min |
Memory Usage:
- ast-grep: < 50MB
- semgrep: ~500MB
- Language tools vary
Why Speed Matters:
- Interactive use: Need < 1s feedback
- Pre-commit hooks: Need < 5s total
- CI/CD: < 30s ideal for fast iteration
- ast-grep fits all three
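One way to keep a hook honest about these budgets is to measure it. The sketch below wraps any command in a time budget. The helper name run_with_budget is hypothetical, and timing uses whole seconds from date +%s, so it is only meaningful for budgets of a second or more.

```shell
#!/bin/sh
# run_with_budget: hypothetical helper. Runs a command, measures wall-clock
# time in whole seconds, and fails if the command fails or exceeds the budget.
run_with_budget() {
  budget=$1
  shift
  start=$(date +%s)
  "$@" || return 1
  elapsed=$(( $(date +%s) - start ))
  if [ "$elapsed" -gt "$budget" ]; then
    echo "took ${elapsed}s, budget is ${budget}s" >&2
    return 1
  fi
}
```

In a pre-commit hook this could be used as `run_with_budget 5 ast-grep scan`, so a slow rule set shows up as a failed hook rather than as a gradually slower workflow.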
Getting Started (300 words)
Installation:
# Nix
nix profile install nixpkgs#ast-grep
# Or
nix-shell -p ast-grep
Quick Start:
# Search
ast-grep -p 'pattern' -l language file.ext
# Interactive refactor
ast-grep -p 'old' --rewrite 'new' --interactive
# Scan with rules
ast-grep new # Initialize project
ast-grep scan # Run linter
Create Rules:
- Create .ast-grep/sgconfig.yml:
ruleDirs:
  - rules
languageGlobs:
  nix: ["**/*.nix"]
  bash: ["**/*.sh"]
- Create .ast-grep/rules/my-rule.yml:
id: my-rule
message: Your message
severity: warning
language: Nix
rule:
  pattern: $PATTERN
fix: $FIX # optional
- Test:
ast-grep scan --config .ast-grep/sgconfig.yml # --config is needed because sgconfig.yml is not in the project root
Workflow:
# 1. Find patterns
rg "approximate_text" # Quick exploration
ast-grep -p 'exact_pattern' # Accurate search
# 2. Refactor
ast-grep -p 'old' --rewrite 'new' --interactive
# 3. Lint
ast-grep scan # Custom rules
make lint # All tools
# 4. Verify
make test
Tips and Best Practices (300 words)
1. Start Simple
- Begin with one rule
- Test on small directory first
- Iterate based on false positives
2. Use the Playground
- https://ast-grep.github.io/playground.html
- Test patterns before creating rules
- See AST structure
3. Combine with Other Tools
- Don’t replace language-specific linters
- Use ast-grep for custom patterns
- Layer tools: fast → comprehensive
4. Write Good Rules
# Bad: Vague message
message: Fix this code
# Good: Actionable message
message: Use lib.mkEnableOption for boolean options
# Great: With explanation
message: Use lib.mkEnableOption for boolean options
note: |
  mkEnableOption provides:
  - Consistent description format
  - Standard default (false)
  - Better documentation
  Example: enable = mkEnableOption "my service";
5. Severity Levels
- error: Must fix (breaks build/security)
- warning: Should fix (best practices)
- info: Consider fixing (style)
- hint: Optional (suggestions)
6. Interactive Review
- Always use --interactive for refactoring
- Review context, not just the match
- Some patterns are intentional
7. Performance Tuning
- Use specific directories: ast-grep scan systems/
- Filter by severity: --error-only
- Use --json for processing
- Cache results if needed
8. Share Rules
- Create rule repository for your org
- Contribute to ast-grep catalog
- Document why rules exist
Limitations and When NOT to Use (200 words)
ast-grep is NOT for:
- Deep Semantic Analysis
  - Type checking: Use proper type checkers
  - Data flow: Use semgrep or language tools
  - Complex relationships: Use IDEs
- Simple Text Search
  - Quick exploration: Use ripgrep
  - Log searching: Use grep/awk
  - String finding: Use text tools
- Replacing Language Tools
  - Go: Still need golangci-lint
  - Nix: Still need statix
  - Bash: Still need shellcheck
Gotchas:
- Pattern Complexity
  - Some patterns are hard to express
  - Test in playground first
- False Negatives
  - Only matches syntactically valid code
  - Won’t find typos or syntax errors
- Language Support
  - Quality depends on tree-sitter parser
  - Some languages better than others
Solution: Layer Tools
- ast-grep for patterns and refactoring
- Language tools for deep analysis
- Text search for exploration
- All three together for comprehensive coverage
Conclusion (200 words)
Summary:
- ast-grep fills gap between grep and semantic analysis
- Perfect for polyglot monorepos
- Fast enough for interactive use
- Accurate enough to avoid false positives
- Valuable for refactoring and custom linting
Key Takeaways:
- Use ast-grep for structural code search
- Create custom rules for your project
- Combine with language-specific tools
- Interactive refactoring beats sed/awk
- Speed matters for developer experience
Results from Home Repository:
- 154 issues found in 0.022 seconds
- Real bugs caught (missing error handling)
- Better code quality (type annotations)
- Safer scripts (variable checking)
Next Steps:
- Install ast-grep
- Try pattern search on your code
- Create one custom rule
- Integrate into your workflow
- Share rules with community
Resources:
- Official docs: https://ast-grep.github.io/
- Playground: https://ast-grep.github.io/playground.html
- This guide: [link to blog post]
- Example rules: [link to GitHub repo]
Final Thought: In a world of complex polyglot codebases, ast-grep is the structural search tool we’ve been missing. Fast, accurate, and flexible - it’s become an essential part of my development workflow.
Publishing Checklist
- Write full article from outline
- Add code examples with syntax highlighting
- Create diagrams (workflow, comparison matrix)
- Add screenshots (playground, scan results)
- Include benchmark graphs
- Add real repository examples
- Link to example rules on GitHub
- Proofread and edit
- Get feedback from community
- Publish on:
  - Personal blog
  - Dev.to
  - Medium (optional)
  - Hacker News (if appropriate)
  - Reddit (r/NixOS, r/devops)
  - Lobsters
- Share on:
  - Twitter/X
  - Mastodon
  - NixOS Discourse
  - ast-grep Discord
Estimated Length
- Full article: 3000-3500 words
- Reading time: 15-20 minutes
- Code examples: 20-25 snippets
- Images/diagrams: 5-7
Related Content
- Follow-up: “Building a Shared ast-grep Rule Library for NixOS”
- Series: “Code Quality in Polyglot Monorepos”
- Video: “ast-grep Live Demo and Walkthrough”