Blog Post: Structural Code Analysis with ast-grep in Polyglot Monorepos

Title Options

“Taming Polyglot Monorepos with ast-grep: A Practical Guide”
“Fast Structural Code Search Across Languages Using ast-grep”
“Beyond grep: Structural Code Analysis for NixOS Monorepos”
“ast-grep: The Missing Link Between grep and Semantic Analysis”

Target Audience

NixOS users managing complex configurations
DevOps engineers with polyglot infrastructure repos
Kubernetes/Tekton contributors
Anyone maintaining multi-language codebases

Key Points / Hook

Polyglot monorepos are hard to analyze with traditional tools
Text search (grep/ripgrep) gives false positives
Language-specific tools don’t scale across languages
ast-grep bridges the gap: fast + accurate + multi-language

Outline

Introduction (200 words)

The Problem:

Managing a homelab with NixOS = complex monorepo
Mix of Nix, Bash, Go, Python, YAML, JSON
Traditional tools fall short:
- ripgrep: Fast but inaccurate (matches comments, strings)
- semgrep: Accurate but slow (30s for 415 files)
- Language-specific: Fragmented, multiple configs

The Solution:

ast-grep: AST-based pattern matching
One tool, one config, multiple languages
Fast enough for interactive use (0.022s for 415 files)
Accurate enough to avoid false positives

What is ast-grep? (300 words)

Concept:

“grep for code structure”
Parses code into Abstract Syntax Tree
Matches patterns, not text
Built in Rust, uses tree-sitter grammars

Example:

# Text search - many false positives
rg "password" --type nix
# → Matches: comments, descriptions, variable names

# Structural search - only assignments
ast-grep -p 'password = $VAL' -l nix
# → Matches: Only actual password assignments

Key Features:

Pattern syntax: $VAR matches one AST node, $$$ARGS matches zero or more nodes (e.g. argument lists)
Rewrites: --rewrite on the CLI or fix: in rules, with --interactive review
YAML rules: Custom linting
LSP support: Editor integration
20+ languages: Nix, Bash, Go, Python, YAML, etc.

Use Case: Home Repository (400 words)

Context:

NixOS monorepo: 8 hosts, custom modules, tools
415 scanned files: 226 Nix, 142 Bash, 47 Go
Need to maintain consistency across languages

Problems to Solve:

Inconsistent Nix module options
Bash scripts without error handling
Security issues (hardcoded secrets, unsafe patterns)
Finding deprecated patterns

Implementation:

Created .ast-grep/ with custom rules:

Nix Rules:

nix-explicit-option-types: Enforce type annotations
nix-prefer-inherit: Use inherit instead of repeating names (x = x; or x = pkgs.x;)
nix-prefer-optional: Use lib.optional / lib.optionals for conditional list items

Bash Rules:

bash-require-strict-mode: Enforce set -euo pipefail
bash-unsafe-rm-rf: Flag rm -rf on paths built from possibly empty variables
bash-use-command-over-which: Prefer command -v over which (POSIX)
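The command-over-which rule steers scripts toward the idiom below. Here is a minimal sketch. The require_cmd helper is hypothetical. command -v is the POSIX-specified, shell-builtin way to look up a command, while which is a separate program whose presence and output vary between systems.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: fail with a clear message if a dependency is missing.
# `command -v` is a POSIX shell builtin; `which` is an external program.
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "error: required command '$1' not found" >&2
    return 1
  fi
}

require_cmd bash && echo "bash is available"
require_cmd some-missing-tool || echo "some-missing-tool is missing"
```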

Security Rules:

security-unsafe-curl-pipe-sh: Prevent curl | sh
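The usual replacement for curl | sh is to download the script to a file, check it against a checksum obtained out of band, and only then run it. The sketch below shows that pattern. The run_verified helper, the URL, and the hash are hypothetical placeholders, and the script assumes GNU coreutils sha256sum is on PATH.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: run a local script only if its SHA-256 matches
# the expected value; otherwise refuse and return non-zero.
run_verified() {
  local script="$1" expected="$2" actual
  actual="$(sha256sum "$script" | cut -d' ' -f1)"
  if [[ "$actual" != "$expected" ]]; then
    echo "checksum mismatch for $script" >&2
    return 1
  fi
  bash "$script"
}

# Usage (URL and hash are placeholders):
#   curl -fsSL -o install.sh https://example.com/install.sh
#   run_verified install.sh "<published sha256>"
```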

Results:

Files scanned: 415
Scan time: 0.022 seconds
Issues found: 154
- 137 warnings (missing type annotations)
- 10 errors (missing strict mode)
- 5 errors (unsafe rm -rf)
- 1 error (curl | sh pattern)

Impact:

Found real issues (scripts without error handling)
Caught security anti-pattern (curl | sh in example)
Identified 137 places needing better documentation (types)
Fast enough to run on every save

Real-World Examples (500 words)

Example 1: Finding Unsafe rm -rf

Found in nix-flake-update script:

rm -rf "$WORKTREE_DIR" || true

Flagged as error: “Ensure variable is not empty”

Analysis:

Safe in this case: the variable is set from a timestamp at the start of the script
The rule still earns its place: it catches real bugs elsewhere
ast-grep finds the pattern; a human judges the context

Fix applied to other scripts:

# Before
rm -rf $TEMP_DIR

# After
[[ -n "$TEMP_DIR" ]] && rm -rf "$TEMP_DIR"
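Bash also has a built-in guard for this. ${VAR:?message} expands to the variable's value, but if the variable is empty or unset, a non-interactive shell prints the message and exits before rm runs. Below is a minimal sketch; cleanup_dir is a hypothetical helper, and the empty-path case runs in a subshell so that only the subshell exits.

```shell
#!/usr/bin/env bash
set -euo pipefail

# ${dir:?msg} aborts the (sub)shell with msg if dir is empty or unset,
# so rm never receives an empty path.
cleanup_dir() {
  local dir="$1"
  rm -rf -- "${dir:?refusing to rm -rf an empty path}"
}

TEMP_DIR="$(mktemp -d)"        # throwaway directory for the demo
touch "$TEMP_DIR/scratch.txt"
cleanup_dir "$TEMP_DIR"
if [[ ! -e "$TEMP_DIR" ]]; then echo "removed temp dir"; fi

# An empty path kills the subshell before rm runs.
if ( cleanup_dir "" ) 2>/dev/null; then
  echo "unexpected: empty path accepted"
else
  echo "empty path rejected"
fi
```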

Example 2: Bash Scripts Without Error Handling

Found 10 scripts missing set -euo pipefail:

install.sh
keyboard firmware builders
imperative deployment scripts

Why it matters:

Scripts continue on errors → silent failures
Undefined variables → unpredictable behavior
Pipeline failures hidden
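Two of these failure modes take only a few lines to reproduce. The sketch below checks the exit status of false | true with and without pipefail, and checks whether reading an unset variable aborts under set -u. Each check runs in its own subshell with set +e so the options under test don't leak into the calling shell. The function and variable names are illustrative.

```shell
#!/usr/bin/env bash

# Exit status of `false | true`, with pipefail "on" or "off".
pipeline_status() {
  local mode="$1"
  (
    set +e
    if [[ "$mode" == "on" ]]; then set -o pipefail; else set +o pipefail; fi
    false | true
    echo "$?"
  )
}

# Does reading an unset variable abort the shell under `set -u`?
nounset_aborts() {
  if ( set -u; : "$DEMO_UNSET_VARIABLE" ) 2>/dev/null; then
    echo "no"
  else
    echo "yes"
  fi
}

echo "pipefail off: $(pipeline_status off)"  # prints 0: the failure of `false` is hidden
echo "pipefail on:  $(pipeline_status on)"   # prints 1: the pipeline reports the failure
echo "set -u aborts on unset variable: $(nounset_aborts)"
```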

Fix:

#!/usr/bin/env bash
set -euo pipefail  # ← Added this line

# Rest of script...

Result: More robust scripts, easier debugging

Example 3: NixOS Module Type Annotations

Found 137 mkOption calls without type annotations:

# Before
myOption = mkOption {
  default = "value";
  description = "My option";
};

# After
myOption = mkOption {
  type = types.str;  # ← Added this
  default = "value";
  description = "My option";
};

Benefits:

Types show up in the generated option documentation
The module system type-checks assigned values at evaluation time
Auto-completion in editors
Consistent style

Example 4: Interactive Refactoring

Suppose we want to standardize on a new function:

ast-grep -p 'oldFunc($$$ARGS)' \
  --rewrite 'newFunc($$$ARGS)' \
  --interactive

For each match:

Shows context (surrounding code)
Shows proposed change
Asks: Apply? (y/n/q)

vs. sed/awk:

No risk of matching in comments/strings
Review each change
Skip false positives
Only accepted changes are written, so git diff shows exactly what you approved
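The comment/string problem is easy to reproduce with sed. The snippet below uses a hypothetical Go fragment in which oldFunc appears once as a real call, once in a comment, and once inside a string literal. A text rewrite touches all three lines, whereas a structural pattern such as oldFunc($$$ARGS) targets call expressions rather than raw text.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical Go fragment: one real call, one comment, one string literal.
sample='// oldFunc(x) is deprecated
func main() {
  oldFunc(1, 2)
  msg := "oldFunc(x) failed"
  _ = msg
}'

# Plain text substitution, as a sed/awk refactor would do it.
sed_rewrite() {
  sed 's/oldFunc(/newFunc(/g'
}

printf '%s\n' "$sample" | sed_rewrite
# Count rewritten lines: 3, although only one is an actual call.
printf '%s\n' "$sample" | sed_rewrite | grep -c 'newFunc('
```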

Beyond Personal Projects: Tekton (400 words)

The Challenge:

Tekton: Kubernetes-native CI/CD
Large Go codebase (1500+ files)
YAML CRDs and examples (500+ files)
API migration: v1beta1 → v1

Use Case 1: API Migration

Find all v1beta1 usage:

ast-grep -p 'apiVersion: tekton.dev/v1beta1' -l yaml --json=stream | jq -r '.file' | sort -u | wc -l
# → 247 files

Interactive migration:

ast-grep -p 'apiVersion: tekton.dev/v1beta1' \
  --rewrite 'apiVersion: tekton.dev/v1' \
  --interactive \
  -l yaml examples/

Review each change, skip generated files.

Use Case 2: Security Scanning

Find hardcoded secrets:

id: go-no-hardcoded-secrets
message: Potential hardcoded secret
severity: error
language: go
rule:
  any:
    - pattern: password := $SECRET
    - pattern: token := $SECRET
constraints:
  SECRET:
    # $SECRET captures the whole string literal, quotes included
    regex: '^"[A-Za-z0-9+/=]{20,}"$'
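The SECRET regex in the rule above decides what counts as "secret-looking". The sketch below applies the regex's core character class, ^[A-Za-z0-9+/=]{20,}$, to a few hypothetical bare values. It shows what the rule flags and what slips through: short passwords and values with punctuation such as hyphens are not caught.

```shell
#!/usr/bin/env bash
set -euo pipefail

# 20 or more characters from the base64 alphabet (letters, digits, + / =).
SECRET_RE='^[A-Za-z0-9+/=]{20,}$'

looks_like_secret() {
  [[ "$1" =~ $SECRET_RE ]]
}

for value in 'hunter2' 'dGhpcyBpcyBhIHNlY3JldCB0b2tlbg==' 'correct-horse-battery-staple'; do
  if looks_like_secret "$value"; then
    echo "flag: $value"
  else
    echo "skip: $value"
  fi
done
```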

Use Case 3: Best Practices

Enforce RBAC markers in controllers:

id: go-require-rbac-markers
message: Add RBAC markers for controller
severity: warning
language: go
rule:
  pattern: |
    func (r *$REC) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
      $$$
    }
  not:
    follows:
      kind: comment
      regex: '\+kubebuilder:rbac'
      stopBy: end

Performance:

Large repo: ~2000 files
Full scan: < 2 seconds
Fast enough for PR checks

Impact:

Automated API migration guidance
Caught security issues before code review
Consistent error handling across codebase
Saved hours in manual review

Comparison with Other Tools (300 words)

vs. ripgrep:

ripgrep: 0.005s (faster)
ast-grep: 0.022s (more accurate)
Use ripgrep for quick searches
Use ast-grep for refactoring

vs. semgrep:

semgrep: 30s (deeper analysis)
ast-grep: 0.022s (structural patterns)
Use semgrep for security audits
Use ast-grep for daily linting

vs. Language-Specific Tools:

statix (Nix): Deep semantic analysis
shellcheck (Bash): Shell-specific checks
golangci-lint (Go): Comprehensive linting
ast-grep complements, not replaces

Decision Matrix:

Speed needed:     ripgrep > ast-grep > semgrep
Accuracy needed:  semgrep > ast-grep > ripgrep
Cross-language:   ast-grep > semgrep > ripgrep
Refactoring:      ast-grep > (IDE tools) > ripgrep

Best Practice: Use Multiple Tools

lint:
  statix check .              # Nix semantics
  shellcheck *.sh             # Bash analysis
  ast-grep scan               # Cross-language patterns
  semgrep --config=p/security-audit  # Slower: better as a weekly scheduled audit

Performance Benchmarks (200 words)

Home Repository (415 files):

Tool	Time	Files/sec
ripgrep	0.005s	83,000
ast-grep	0.022s	18,900
semgrep	30s	14
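The Files/sec column is files scanned divided by wall-clock seconds. The sketch below reproduces that arithmetic with awk for the 415-file run. For ast-grep it gives 18864, which the table rounds to 18,900.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throughput = files scanned / seconds, rounded to a whole number.
files_per_sec() {
  awk -v n="$1" -v t="$2" 'BEGIN { printf "%.0f\n", n / t }'
}

files_per_sec 415 0.005   # ripgrep: 83000
files_per_sec 415 0.022   # ast-grep: 18864
files_per_sec 415 30      # semgrep: 14
```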

Large Codebase (2000 files):

Tool	Time
ripgrep	0.01s
ast-grep	0.5s
semgrep	10min

Memory Usage:

ast-grep: < 50MB
semgrep: ~500MB
Language tools vary

Why Speed Matters:

Interactive use: Need < 1s feedback
Pre-commit hooks: Need < 5s total
CI/CD: < 30s ideal for fast iteration
ast-grep fits all three
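To stay within a pre-commit budget, one option is to scan only the staged files. Below is a sketch of a .git/hooks/pre-commit hook. It assumes the rules cover only .nix and .sh files, which you should adjust to your own rules. It also assumes that ast-grep scan exits non-zero when an error-severity rule matches, which makes git abort the commit. The function names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Read paths on stdin and keep only extensions we have rules for.
# `|| true` stops grep's "no match" status from failing the pipeline.
filter_scannable() {
  grep -E '\.(nix|sh)$' || true
}

# Added, copied, or modified files in the index.
staged_files() {
  git diff --cached --name-only --diff-filter=ACM
}

run_hook() {
  local files=()
  mapfile -t files < <(staged_files | filter_scannable)
  if (( ${#files[@]} == 0 )); then
    return 0  # nothing relevant staged
  fi
  # Assumed: non-zero exit on error-severity matches blocks the commit.
  ast-grep scan "${files[@]}"
}

# In the installed hook file, end with:  run_hook
```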

Getting Started (300 words)

Installation:

# Nix
nix profile install nixpkgs#ast-grep

# Or
nix-shell -p ast-grep

Quick Start:

# Search
ast-grep -p 'pattern' -l language file.ext

# Interactive refactor
ast-grep -p 'old' --rewrite 'new' --interactive

# Scan with rules
ast-grep new  # Initialize project
ast-grep scan # Run linter

Create Rules:

Create sgconfig.yml at the repository root (ast-grep looks for it in the current directory and its parents):

ruleDirs:
  - .ast-grep/rules
languageGlobs:
  nix: ["**/*.nix"]
  bash: ["**/*.sh"]

Create .ast-grep/rules/my-rule.yml:

id: my-rule
message: Your message
severity: warning
language: Nix
rule:
  pattern: <code pattern using $VAR / $$$ARGS>   # placeholder
fix: <replacement reusing the same metavariables>  # optional

Test:

ast-grep scan

Workflow:

# 1. Find patterns
rg "approximate_text"          # Quick exploration
ast-grep -p 'exact_pattern'    # Accurate search

# 2. Refactor
ast-grep -p 'old' --rewrite 'new' --interactive

# 3. Lint
ast-grep scan                  # Custom rules
make lint                      # All tools

# 4. Verify
make test

Tips and Best Practices (300 words)

1. Start Simple

Begin with one rule
Test on small directory first
Iterate based on false positives

2. Use the Playground

https://ast-grep.github.io/playground.html
Test patterns before creating rules
See AST structure

3. Combine with Other Tools

Don’t replace language-specific linters
Use ast-grep for custom patterns
Layer tools: fast → comprehensive

4. Write Good Rules

# Bad: Vague message
message: Fix this code

# Good: Actionable message
message: Use lib.mkEnableOption for boolean options

# Great: With explanation
message: Use lib.mkEnableOption for boolean options
note: |
  mkEnableOption provides:
  - Consistent description format
  - Standard default (false)
  - Better documentation
  
  Example: enable = mkEnableOption "my service";

5. Severity Levels

error: Must fix (breaks build/security)
warning: Should fix (best practices)
info: Consider fixing (style)
hint: Optional (suggestions)

6. Interactive Review

Always use --interactive for refactoring
Review context, not just the match
Some patterns are intentional

7. Performance Tuning

Use specific directories: ast-grep scan systems/
Run a subset of rules: --filter '<regex matching rule ids>'
Use --json for processing
Cache results if needed

8. Share Rules

Create rule repository for your org
Contribute to ast-grep catalog
Document why rules exist

Limitations and When NOT to Use (200 words)

ast-grep is NOT for:

Deep Semantic Analysis
- Type checking: Use proper type checkers
- Data flow: Use semgrep or language tools
- Complex relationships: Use IDEs
Simple Text Search
- Quick exploration: Use ripgrep
- Log searching: Use grep/awk
- String finding: Use text tools
Replacing Language Tools
- Go: Still need golangci-lint
- Nix: Still need statix
- Bash: Still need shellcheck

Gotchas:

Pattern Complexity
- Some patterns are hard to express
- Test in playground first
False Negatives
- Relies on the tree-sitter parse; code around syntax errors can become ERROR nodes that patterns miss
- Won’t find typos or syntax errors themselves
Language Support
- Quality depends on tree-sitter parser
- Some languages better than others

Solution: Layer Tools

ast-grep for patterns and refactoring
Language tools for deep analysis
Text search for exploration
All three together for comprehensive coverage

Conclusion (200 words)

Summary:

ast-grep fills gap between grep and semantic analysis
Well suited to polyglot monorepos
Fast enough for interactive use
Accurate enough to avoid false positives
Valuable for refactoring and custom linting

Key Takeaways:

Use ast-grep for structural code search
Create custom rules for your project
Combine with language-specific tools
Interactive refactoring beats sed/awk
Speed matters for developer experience

Results from Home Repository:

154 issues found in 0.022 seconds
Real bugs caught (missing error handling)
Better code quality (type annotations)
Safer scripts (variable checking)

Next Steps:

Install ast-grep
Try pattern search on your code
Create one custom rule
Integrate into your workflow
Share rules with community

Resources:

Official docs: https://ast-grep.github.io/
Playground: https://ast-grep.github.io/playground.html
This guide: [link to blog post]
Example rules: [link to GitHub repo]

Final Thought: In a world of complex polyglot codebases, ast-grep is the structural search tool we’ve been missing. Fast, accurate, and flexible, it has become an essential part of my development workflow.

Publishing Checklist

Write full article from outline
Add code examples with syntax highlighting
Create diagrams (workflow, comparison matrix)
Add screenshots (playground, scan results)
Include benchmark graphs
Add real repository examples
Link to example rules on GitHub
Proofread and edit
Get feedback from community
Publish on:
- Personal blog
- Dev.to
- Medium (optional)
- Hacker News (if appropriate)
- Reddit (r/NixOS, r/devops)
- Lobsters
Share on:
- Twitter/X
- Mastodon
- LinkedIn
- NixOS Discourse
- ast-grep Discord

Estimated Length

Full article: 3000-3500 words
Reading time: 15-20 minutes
Code examples: 20-25 snippets
Images/diagrams: 5-7

Follow-up: “Building a Shared ast-grep Rule Library for NixOS”
Series: “Code Quality in Polyglot Monorepos”
Video: “ast-grep Live Demo and Walkthrough”