
Blog Post: Structural Code Analysis with ast-grep in Polyglot Monorepos

Title Options

  1. “Taming Polyglot Monorepos with ast-grep: A Practical Guide”
  2. “Fast Structural Code Search Across Languages Using ast-grep”
  3. “Beyond Grep: Structural Code Analysis for NixOS and Beyond”
  4. “ast-grep: The Missing Link Between grep and Semantic Analysis”

Target Audience

  • NixOS users managing complex configurations
  • DevOps engineers with polyglot infrastructure repos
  • Kubernetes/Tekton contributors
  • Anyone maintaining multi-language codebases

Key Points / Hook

  • Polyglot monorepos are hard to analyze with traditional tools
  • Text search (grep/ripgrep) gives false positives
  • Language-specific tools don’t scale across languages
  • ast-grep bridges the gap: fast + accurate + multi-language

Outline

Introduction (200 words)

The Problem:

  • Managing a homelab with NixOS = complex monorepo
  • Mix of Nix, Bash, Go, Python, YAML, JSON
  • Traditional tools fall short:
    • ripgrep: Fast but inaccurate (matches comments, strings)
    • semgrep: Accurate but slow (30s for 400 files)
    • Language-specific: Fragmented, multiple configs

The Solution:

  • ast-grep: AST-based pattern matching
  • One tool, one config, multiple languages
  • Fast enough for interactive use (0.02s)
  • Accurate enough to avoid false positives

What is ast-grep? (300 words)

Concept:

  • “grep for code structure”
  • Parses code into Abstract Syntax Tree
  • Matches patterns, not text
  • Built in Rust, uses tree-sitter grammars

Example:

# Text search - many false positives
rg "password" --type nix
# → Matches: comments, descriptions, variable names

# Structural search - only assignments
ast-grep -p 'password = $VAL' -l nix
# → Matches: Only actual password assignments
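
To make the difference concrete, here is a minimal sketch that builds a throwaway Nix file (its contents are invented for illustration) and counts text matches with plain grep. The ast-grep call at the end only runs if the binary is on PATH, and it assumes the installed version ships a Nix grammar.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway sample file; the contents are made up for this demo.
demo_dir="$(mktemp -d)"
trap 'rm -rf -- "$demo_dir"' EXIT

cat > "$demo_dir/sample.nix" <<'EOF'
{
  # TODO: rotate the password every quarter
  description = "stores the password hash";
  password = "hunter2";
}
EOF

# Text search counts every line that mentions the word: the comment,
# the description string, and the real assignment.
text_hits="$(grep -c 'password' "$demo_dir/sample.nix")"
echo "text matches: $text_hits"

# Structural search, only attempted when ast-grep is installed.
if command -v ast-grep >/dev/null 2>&1; then
  ast-grep -p 'password = $VAL' -l nix "$demo_dir/sample.nix" || true
fi
```

The grep count is 3 for a file with a single real assignment, which is the kind of noise the structural pattern is meant to avoid.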

Key Features:

  • Pattern syntax: $VAR for wildcards, $$$ARGS for lists
  • Fix patterns: Interactive refactoring
  • YAML rules: Custom linting
  • LSP support: Editor integration
  • 20+ languages: Nix, Bash, Go, Python, YAML, etc.

Use Case: Home Repository (400 words)

Context:

  • NixOS monorepo: 8 hosts, custom modules, tools
  • 415 files: 226 Nix, 142 Bash, 47 Go, + more
  • Need to maintain consistency across languages

Problems to Solve:

  1. Inconsistent Nix module options
  2. Bash scripts without error handling
  3. Security issues (hardcoded secrets, unsafe patterns)
  4. Finding deprecated patterns

Implementation:

Created .ast-grep/ with custom rules:

Nix Rules:

  • nix-explicit-option-types: Enforce type annotations
  • nix-prefer-inherit: Use inherit for cleaner code
  • nix-prefer-optional: Conditional list items

Bash Rules:

  • bash-require-strict-mode: Enforce set -euo pipefail
  • bash-unsafe-rm-rf: Catch dangerous rm -rf
  • bash-use-command-over-which: POSIX compliance

Security Rules:

  • security-unsafe-curl-pipe-sh: Prevent curl | sh

Results:

Files scanned: 415
Scan time: 0.022 seconds
Issues found: 154
- 137 warnings (missing type annotations)
- 10 errors (missing strict mode)
- 5 errors (unsafe rm -rf)
- 1 error (curl | sh pattern)

Impact:

  • Found real issues (scripts without error handling)
  • Caught security anti-pattern (curl | sh in example)
  • Identified 137 places needing better documentation (types)
  • Fast enough to run on every save

Real-World Examples (500 words)

Example 1: Finding Unsafe rm -rf

Found in nix-flake-update script:

rm -rf "$WORKTREE_DIR" || true

Flagged as error: “Ensure variable is not empty”

Analysis:

  • In this case, safe (variable set at script start with timestamp)
  • But the rule is valuable - catches real bugs elsewhere
  • Shows the division of labor: ast-grep finds the pattern, a human reviews the context

Fix applied to other scripts:

# Before
rm -rf $TEMP_DIR

# After
[[ -n "$TEMP_DIR" ]] && rm -rf "$TEMP_DIR"
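
One way to package the same guard for reuse is a small wrapper function. The sketch below is an assumption, not code from the repository: safe_rm_rf is a hypothetical name, and the extra check for "/" goes beyond what the rule message asks for.

```shell
#!/usr/bin/env bash
set -euo pipefail

# safe_rm_rf is a hypothetical helper, not something from the repository.
# It refuses to run when the path is empty or is the filesystem root.
safe_rm_rf() {
  local target="${1-}"
  if [[ -z "$target" || "$target" == "/" ]]; then
    echo "safe_rm_rf: refusing to remove '${target}'" >&2
    return 1
  fi
  rm -rf -- "$target"
}

# Usage: remove a scratch directory created by this script.
scratch="$(mktemp -d)"
safe_rm_rf "$scratch"

# An empty value is rejected instead of silently expanding to nothing.
if ! safe_rm_rf "" 2>/dev/null; then
  echo "empty path rejected"
fi
```

For a one-off call, the parameter expansion form rm -rf -- "${TEMP_DIR:?}" gives a similar guarantee: bash aborts the command when the variable is unset or empty.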

Example 2: Bash Scripts Without Error Handling

Found 10 scripts missing set -euo pipefail:

  • install.sh
  • keyboard firmware builders
  • imperative deployment scripts

Why it matters:

  • Scripts continue on errors → silent failures
  • Undefined variables → unpredictable behavior
  • Pipeline failures hidden

Fix:

#!/usr/bin/env bash
set -euo pipefail  # ← Added this line

# Rest of script...

Result: More robust scripts, easier debugging
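
The effect of the individual flags is easy to observe in isolation. The sketch below runs each case in a child bash process so the failures do not stop the demo itself; DEMO_UNDEFINED_VAR is a made-up name that is assumed not to exist in the environment.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Each case runs in a child bash so the failure does not stop this script.

# Without pipefail, a pipeline reports only the last command's status.
status_default="$(bash -c 'false | true; echo $?')"

# With pipefail, the failing first stage is reported.
status_pipefail="$(bash -c 'set -o pipefail; false | true; echo $?')"

# With nounset (-u), expanding an undefined variable aborts the script.
# DEMO_UNDEFINED_VAR is a made-up name assumed not to be set.
if bash -c 'set -u; echo "$DEMO_UNDEFINED_VAR"; echo "still running"' >/dev/null 2>&1; then
  nounset_result="continued"
else
  nounset_result="aborted"
fi

echo "default=$status_default pipefail=$status_pipefail nounset=$nounset_result"
```

This prints default=0 pipefail=1 nounset=aborted, which maps onto the "pipeline failures hidden" and "undefined variables" bullets above.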

Example 3: NixOS Module Type Annotations

Found 137 mkOption calls without type annotations:

# Before
myOption = mkOption {
  default = "value";
  description = "My option";
};

# After
myOption = mkOption {
  type = types.str;  # ← Added this
  default = "value";
  description = "My option";
};

Benefits:

  • Better documentation
  • Type checking catches errors
  • Auto-completion in editors
  • Consistent style

Example 4: Interactive Refactoring

Suppose we want to standardize on a new function:

ast-grep -p 'oldFunc($$$ARGS)' \
  --rewrite 'newFunc($$$ARGS)' \
  --interactive

For each match:

  1. Shows context (surrounding code)
  2. Shows proposed change
  3. Asks: Apply? (y/n/q)

vs. sed/awk:

  • No risk of matching in comments/strings
  • Review each change
  • Skip false positives
  • Accepted changes are ordinary file edits, so version control can undo them
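
The string-literal risk with sed can be shown on a tiny example. This is a sketch with an invented Go file; the ast-grep step is skipped when the tool is not installed, and it relies on ast-grep printing a diff (without touching the file) when neither --interactive nor --update-all is passed.

```shell
#!/usr/bin/env bash
set -euo pipefail

demo_dir="$(mktemp -d)"
trap 'rm -rf -- "$demo_dir"' EXIT

# Invented Go file: one real call, one mention inside a string literal.
cat > "$demo_dir/main.go" <<'EOF'
package main

func main() {
    println("call oldFunc(x) before exit")
    oldFunc(1, 2)
}
EOF

# sed rewrites both lines, including the text inside the string.
sed_changed="$(sed 's/oldFunc(/newFunc(/g' "$demo_dir/main.go" | grep -c 'newFunc(')"
echo "lines changed by sed: $sed_changed"

# Preview the structural rewrite as a diff; skipped if the tool is missing.
if command -v ast-grep >/dev/null 2>&1; then
  ast-grep -p 'oldFunc($$$ARGS)' --rewrite 'newFunc($$$ARGS)' -l go "$demo_dir/main.go" || true
fi
```

sed reports two changed lines even though only one of them is a call expression.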

Beyond Personal Projects: Tekton (400 words)

The Challenge:

  • Tekton: Kubernetes-native CI/CD
  • Large Go codebase (1500+ files)
  • YAML CRDs and examples (500+ files)
  • API migration: v1beta1 → v1

Use Case 1: API Migration

Find all v1beta1 usage:

ast-grep -p 'apiVersion: tekton.dev/v1beta1' -l yaml | wc -l
# → 247 files

Interactive migration:

ast-grep -p 'apiVersion: tekton.dev/v1beta1' \
  --rewrite 'apiVersion: tekton.dev/v1' \
  --interactive \
  -l yaml examples/

Review each change, skip generated files.
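
When tracking migration progress it helps to separate "files still on v1beta1" from "total occurrences", since multi-document YAML files contain several apiVersion lines. The sketch below uses plain grep on invented manifests as a tool-independent cross-check; it is not the command used to produce the numbers above.

```shell
#!/usr/bin/env bash
set -euo pipefail

demo_dir="$(mktemp -d)"
trap 'rm -rf -- "$demo_dir"' EXIT

# Three invented manifests: two on v1beta1 (one with two documents), one on v1.
printf 'apiVersion: tekton.dev/v1beta1\nkind: Task\n' > "$demo_dir/task.yaml"
printf 'apiVersion: tekton.dev/v1beta1\nkind: Task\n---\napiVersion: tekton.dev/v1beta1\nkind: Pipeline\n' > "$demo_dir/bundle.yaml"
printf 'apiVersion: tekton.dev/v1\nkind: Task\n' > "$demo_dir/new.yaml"

# -l prints each matching file once; -x requires the whole line to match,
# which excludes commented-out or indented mentions.
files_left="$(grep -rlx 'apiVersion: tekton.dev/v1beta1' "$demo_dir" | wc -l | tr -d ' ')"

# -h drops file names so every matching line is counted.
matches_left="$(grep -rhx 'apiVersion: tekton.dev/v1beta1' "$demo_dir" | wc -l | tr -d ' ')"

echo "files: $files_left, matches: $matches_left"
```

Here two files remain to migrate but three lines need to change.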

Use Case 2: Security Scanning

Find hardcoded secrets:

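id: go-no-hardcoded-secrets
message: Potential hardcoded secret
severity: error
language: go
rule:
  any:
    - pattern: password := $SECRET
    - pattern: token := $SECRET
# Meta variables match whole AST nodes, so $SECRET captures the string
# literal including its quotes; the regex accounts for them.
constraints:
  SECRET:
    regex: '^"[A-Za-z0-9+/=]{20,}"$'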
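
Before relying on a regex like this in a rule, it is worth trying it against a few values. The sketch below uses bash's own regex matching with the same character class and length threshold; looks_like_secret is a hypothetical helper and the sample values are invented.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Same character class and length threshold as the rule's regex.
secret_re='^[A-Za-z0-9+/=]{20,}$'

# looks_like_secret is a hypothetical helper for trying values locally.
looks_like_secret() {
  [[ "$1" =~ $secret_re ]]
}

# Sample values are invented for the demo.
for candidate in "dGhpcyBpcyBhIHRlc3Qgc2VjcmV0" "changeme" "postgres://db.internal:5432/app"; do
  if looks_like_secret "$candidate"; then
    echo "flagged: $candidate"
  else
    echo "ignored: $candidate"
  fi
done
```

Only the long base64-style value is flagged. Short passwords such as "changeme" pass through, so the rule is a heuristic for token-shaped strings rather than a complete secret scanner.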

Use Case 3: Best Practices

Enforce RBAC markers in controllers:

id: go-require-rbac-markers
message: Add RBAC markers for controller
severity: warning
language: go
rule:
  pattern: |
    func (r *$REC) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
      $$$
    }
  # Flag the method unless the node right before it is an RBAC marker comment
  not:
    follows:
      kind: comment
      regex: '\+kubebuilder:rbac'

Performance:

  • Large repo: ~2000 files
  • Full scan: < 2 seconds
  • Fast enough for PR checks

Impact:

  • Automated API migration guidance
  • Caught security issues before code review
  • Consistent error handling across codebase
  • Saved hours in manual review

Comparison with Other Tools (300 words)

vs. ripgrep:

  • ripgrep: 0.005s (faster)
  • ast-grep: 0.022s (more accurate)
  • Use ripgrep for quick searches
  • Use ast-grep for refactoring

vs. semgrep:

  • semgrep: 30s (deeper analysis)
  • ast-grep: 0.022s (structural patterns)
  • Use semgrep for security audits
  • Use ast-grep for daily linting

vs. Language-Specific Tools:

  • statix (Nix): Nix-specific lints and anti-pattern checks
  • shellcheck (Bash): Shell-specific checks
  • golangci-lint (Go): Comprehensive linting
  • ast-grep complements these tools; it does not replace them

Decision Matrix:

Speed needed:     ripgrep > ast-grep > semgrep
Accuracy needed:  semgrep > ast-grep > ripgrep
Cross-language:   ast-grep > semgrep > ripgrep
Refactoring:      ast-grep > (IDE tools) > ripgrep

Best Practice: Use Multiple Tools

lint:
  statix check .              # Nix semantics
  shellcheck *.sh             # Bash analysis
  ast-grep scan               # Cross-language patterns
  semgrep --config=security   # Weekly security audit
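
If the lint target should keep going when one tool is missing or fails, a small wrapper script is one option. This is a sketch under assumptions: run_if_present and lint_all are hypothetical names, and the weekly semgrep audit is left out because it runs on a different schedule.

```shell
#!/usr/bin/env bash
set -euo pipefail

# run_if_present is a hypothetical wrapper: it runs a linter when the tool
# is installed and records a failure without stopping the remaining linters.
failures=0
run_if_present() {
  local tool="$1"
  shift
  if ! command -v "$tool" >/dev/null 2>&1; then
    echo "skip: $tool not installed"
    return 0
  fi
  echo "run: $*"
  "$@" || failures=$((failures + 1))
}

# lint_all layers the tools from the lint target; it is only defined here
# and has to be called from the repository root.
lint_all() {
  run_if_present statix statix check .
  run_if_present shellcheck bash -c 'shellcheck ./*.sh'
  run_if_present ast-grep ast-grep scan
  echo "linters failed: $failures"
  [[ "$failures" -eq 0 ]]
}
```

Calling lint_all at the end of the script (or from CI) returns a non-zero status when any installed linter reported problems, while missing tools are reported and skipped.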

Performance Benchmarks (200 words)

Home Repository (415 files):

Tool       Time     Files/sec
ripgrep    0.005s   83,000
ast-grep   0.022s   18,900
semgrep    30s      14
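
The Files/sec column is simply the file count divided by the wall-clock time. A short sketch of that arithmetic, using the 415 files and the timings from the table (files_per_sec is a hypothetical helper name):

```shell
#!/usr/bin/env bash
set -euo pipefail

# files_per_sec FILES SECONDS prints the throughput rounded to an integer.
files_per_sec() {
  awk -v files="$1" -v secs="$2" 'BEGIN { printf "%.0f\n", files / secs }'
}

ripgrep_rate="$(files_per_sec 415 0.005)"
astgrep_rate="$(files_per_sec 415 0.022)"
semgrep_rate="$(files_per_sec 415 30)"

echo "ripgrep=$ripgrep_rate ast-grep=$astgrep_rate semgrep=$semgrep_rate"
```

The unrounded ast-grep figure is 18,864 files per second, which the table rounds to 18,900.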

Large Codebase (2000 files):

Tool       Time
ripgrep    0.01s
ast-grep   0.5s
semgrep    10min

Memory Usage:

  • ast-grep: < 50MB
  • semgrep: ~500MB
  • Language tools vary

Why Speed Matters:

  • Interactive use: Need < 1s feedback
  • Pre-commit hooks: Need < 5s total
  • CI/CD: < 30s ideal for fast iteration
  • ast-grep fits all three
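
These budgets can be enforced rather than just observed. The sketch below is one possible approach: within_budget is a hypothetical helper, and it relies on bash's SECONDS counter, whose one-second resolution suits the 5s and 30s budgets but is too coarse for the sub-second interactive case.

```shell
#!/usr/bin/env bash
set -euo pipefail

# within_budget SECONDS CMD... is a hypothetical helper. It runs the command
# and succeeds only if the wall-clock time stays within the budget.
# The command's own exit status is ignored; only the duration is checked.
within_budget() {
  local budget="$1"
  shift
  local start="$SECONDS"
  "$@" >/dev/null 2>&1 || true
  (( SECONDS - start <= budget ))
}

# Example with a stand-in command; replace `true` with the real lint step.
if within_budget 5 true; then
  echo "pre-commit budget met"
else
  echo "pre-commit budget exceeded"
fi
```

In a pre-commit hook the stand-in would be replaced by the actual scan command, with the hook failing or warning when the budget is exceeded.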

Getting Started (300 words)

Installation:

# Nix
nix profile install nixpkgs#ast-grep

# Or
nix-shell -p ast-grep

Quick Start:

# Search
ast-grep -p 'pattern' -l language file.ext

# Interactive refactor
ast-grep -p 'old' --rewrite 'new' --interactive

# Scan with rules
ast-grep new  # Initialize project
ast-grep scan # Run linter

Create Rules:

  1. Create .ast-grep/sgconfig.yml:

ruleDirs:
  - rules
languageGlobs:
  nix: ["**/*.nix"]
  bash: ["**/*.sh"]

  2. Create .ast-grep/rules/my-rule.yml:

id: my-rule
message: Your message
severity: warning
language: nix
rule:
  pattern: $PATTERN
fix: $FIX  # optional

  3. Test (ast-grep looks for sgconfig.yml in the project root by default, so point it at the config explicitly):

ast-grep scan --config .ast-grep/sgconfig.yml
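
The layout from the steps above can be scaffolded in one go. This sketch writes the same two files into a scratch directory so nothing in a real repository is touched; the rule is still the template with its $PATTERN placeholder, and the scan only runs if ast-grep is installed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scaffold into a scratch directory so nothing in a real repo is touched.
project_dir="$(mktemp -d)"
trap 'rm -rf -- "$project_dir"' EXIT
mkdir -p "$project_dir/.ast-grep/rules"

# ruleDirs entries are resolved relative to the directory holding sgconfig.yml.
cat > "$project_dir/.ast-grep/sgconfig.yml" <<'EOF'
ruleDirs:
  - rules
languageGlobs:
  nix: ["**/*.nix"]
  bash: ["**/*.sh"]
EOF

# Template rule; $PATTERN is a placeholder to replace with a real pattern.
cat > "$project_dir/.ast-grep/rules/my-rule.yml" <<'EOF'
id: my-rule
message: Your message
severity: warning
language: nix
rule:
  pattern: $PATTERN
EOF

rule_count="$(find "$project_dir/.ast-grep/rules" -name '*.yml' | wc -l | tr -d ' ')"
echo "rules scaffolded: $rule_count"

# Only attempt a scan when ast-grep is available.
if command -v ast-grep >/dev/null 2>&1; then
  (cd "$project_dir" && ast-grep scan --config .ast-grep/sgconfig.yml) || true
fi
```

Because ruleDirs is relative to the config file, the rules directory lives at .ast-grep/rules rather than at the repository root.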

Workflow:

# 1. Find patterns
rg "approximate_text"          # Quick exploration
ast-grep -p 'exact_pattern'    # Accurate search

# 2. Refactor
ast-grep -p 'old' --rewrite 'new' --interactive

# 3. Lint
ast-grep scan                  # Custom rules
make lint                      # All tools

# 4. Verify
make test

Tips and Best Practices (300 words)

1. Start Simple

  • Begin with one rule
  • Test on small directory first
  • Iterate based on false positives

2. Use the Playground

3. Combine with Other Tools

  • Don’t replace language-specific linters
  • Use ast-grep for custom patterns
  • Layer tools: fast → comprehensive

4. Write Good Rules

# Bad: Vague message
message: Fix this code

# Good: Actionable message
message: Use lib.mkEnableOption for boolean options

# Great: With explanation
message: Use lib.mkEnableOption for boolean options
note: |
  mkEnableOption provides:
  - Consistent description format
  - Standard default (false)
  - Better documentation
  
  Example: enable = mkEnableOption "my service";

5. Severity Levels

  • error: Must fix (breaks build/security)
  • warning: Should fix (best practices)
  • info: Consider fixing (style)
  • hint: Optional (suggestions)

6. Interactive Review

  • Always use --interactive for refactoring
  • Review context, not just the match
  • Some patterns are intentional

7. Performance Tuning

  • Use specific directories: ast-grep scan systems/
  • Run a subset of rules: --filter with a regex on rule IDs
  • Use --json for processing
  • Cache results if needed

8. Share Rules

  • Create rule repository for your org
  • Contribute to ast-grep catalog
  • Document why rules exist

Limitations and When NOT to Use (200 words)

ast-grep is NOT for:

  1. Deep Semantic Analysis

    • Type checking: Use proper type checkers
    • Data flow: Use semgrep or language tools
    • Complex relationships: Use IDEs
  2. Simple Text Search

    • Quick exploration: Use ripgrep
    • Log searching: Use grep/awk
    • String finding: Use text tools
  3. Replacing Language Tools

    • Go: Still need golangci-lint
    • Nix: Still need statix
    • Bash: Still need shellcheck

Gotchas:

  1. Pattern Complexity

    • Some patterns are hard to express
    • Test in playground first
  2. False Negatives

    • Only matches syntactically valid code
    • Won’t find typos or syntax errors
  3. Language Support

    • Quality depends on tree-sitter parser
    • Some languages better than others

Solution: Layer Tools

  • ast-grep for patterns and refactoring
  • Language tools for deep analysis
  • Text search for exploration
  • All three together for comprehensive coverage

Conclusion (200 words)

Summary:

  • ast-grep fills the gap between grep and semantic analysis
  • Perfect for polyglot monorepos
  • Fast enough for interactive use
  • Accurate enough to avoid false positives
  • Valuable for refactoring and custom linting

Key Takeaways:

  1. Use ast-grep for structural code search
  2. Create custom rules for your project
  3. Combine with language-specific tools
  4. Interactive refactoring beats sed/awk
  5. Speed matters for developer experience

Results from Home Repository:

  • 154 issues found in 0.022 seconds
  • Real bugs caught (missing error handling)
  • Better code quality (type annotations)
  • Safer scripts (variable checking)

Next Steps:

  1. Install ast-grep
  2. Try pattern search on your code
  3. Create one custom rule
  4. Integrate into your workflow
  5. Share rules with community

Resources:

Final Thought: In a world of complex polyglot codebases, ast-grep is the structural search tool we’ve been missing. Fast, accurate, and flexible - it’s become an essential part of my development workflow.


Publishing Checklist

  • Write full article from outline
  • Add code examples with syntax highlighting
  • Create diagrams (workflow, comparison matrix)
  • Add screenshots (playground, scan results)
  • Include benchmark graphs
  • Add real repository examples
  • Link to example rules on GitHub
  • Proofread and edit
  • Get feedback from community
  • Publish on:
    • Personal blog
    • Dev.to
    • Medium (optional)
    • Hacker News (if appropriate)
    • Reddit (r/NixOS, r/devops)
    • Lobsters
  • Share on:
    • Twitter/X
    • Mastodon
    • LinkedIn
    • NixOS Discourse
    • ast-grep Discord

Estimated Length

  • Full article: 3000-3500 words
  • Reading time: 15-20 minutes
  • Code examples: 20-25 snippets
  • Images/diagrams: 5-7

Related Content Ideas

  • Follow-up: “Building a Shared ast-grep Rule Library for NixOS”
  • Series: “Code Quality in Polyglot Monorepos”
  • Video: “ast-grep Live Demo and Walkthrough”