Analyze Workflow

Analyze email patterns, statistics, and trends using mu queries and data processing.

Workflow Steps

1. Understand Analysis Request

Identify what the user wants to analyze:

Volume: Email counts over time
People: Top senders/recipients, communication patterns
Topics: Subject patterns, keyword frequency
Threads: Conversation analysis
Attachments: File type distribution
Response times: Time between emails in threads
Activity patterns: Time of day, day of week

2. Gather Data

Use mu find with appropriate queries and JSON output:

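# Get structured data for analysis
mu find <query> --format=json > emails.json

# Inspect the structure first: field names and nesting may differ between
# mu versions, so adjust the jq paths in this document to match
head -n 20 emails.json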

# Count emails by criteria
mu find <query> | wc -l

# Get specific date ranges
mu find date:20250101..20250131 --format=json

3. Process Data

Use shell tools to analyze:

# Top senders
mu find <query> --format=json | jq -r '.from' | sort | uniq -c | sort -rn

# Emails by month
mu find <query> --format=json | jq -r '.date | strftime("%Y-%m")' | sort | uniq -c

# Attachment types
mu find attach:* --format=json | jq -r '.attachments[]?.name' | sed 's/.*\.//' | tr '[:upper:]' '[:lower:]' | sort | uniq -c

# Average email size
mu find <query> --format=json | jq '.size' | awk '{sum+=$1; n++} END {if (n) print sum/n}'
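Most of the pipelines above end in the same count-and-rank step. A minimal sketch of a reusable helper for it; the name `tally` is hypothetical, and the input is assumed to be one value per line (for example, one jq-extracted field per message):

```shell
# tally [N]: count identical input lines and print the N most frequent (default 10).
tally() {
  sort | uniq -c | sort -rn | head -n "${1:-10}"
}

# Example with literal input instead of mu output.
printf 'alice\nbob\nalice\ncarol\nalice\nbob\n' | tally 2
```

Any of the jq extractions in this section could be piped into `tally` in place of the trailing `sort | uniq -c | sort -rn`.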

4. Visualize Results

Present findings clearly:

Tables: Formatted counts and statistics
Lists: Top N senders, subjects, etc.
Summaries: Key insights and patterns
Comparisons: Personal vs work, this month vs last month
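As one way to produce the tables mentioned above, the following sketch turns `uniq -c` style lines ("count label") into aligned rows with a share-of-total column. The function name `count_table` and the column widths are illustrative choices, not part of mu:

```shell
# count_table: read "count label" lines and print label, count, and percentage of total.
count_table() {
  awk '
    {
      count[NR] = $1          # first field is the count from uniq -c
      $1 = ""                 # drop it so the rest of the line is the label
      sub(/^ +/, "")
      label[NR] = $0
      total += count[NR]
    }
    END {
      for (i = 1; i <= NR; i++)
        printf "%-30s %6d %5.1f%%\n", label[i], count[i], 100 * count[i] / total
    }'
}

# Example with literal counts.
printf '   3 alice@example.com\n   1 bob@example.com\n' | count_table
```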

5. Provide Insights

Interpret the data:

Identify trends (increasing/decreasing volume)
Highlight patterns (busiest times, top correspondents)
Suggest actions (archive old threads, follow up on flagged items)
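To state a trend as a number rather than "increasing" or "decreasing", two counts can be compared as a percentage change. A small sketch, assuming the two counts come from `wc -l` results for consecutive periods; `pct_change` is a hypothetical helper name:

```shell
# pct_change PREV CURR: print the signed percentage change from PREV to CURR.
pct_change() {
  awk -v prev="$1" -v curr="$2" 'BEGIN {
    if (prev == 0) { print "n/a"; exit }   # no baseline to compare against
    printf "%+.1f%%\n", 100 * (curr - prev) / prev
  }'
}

# Example: 200 emails in the earlier period, 250 in the later one.
pct_change 200 250
```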

Common Analysis Patterns

Email Volume Analysis

Count emails by account:

echo "Personal: $(mu find maildir:/icloud/* | wc -l)"
echo "Work: $(mu find maildir:/redhat/* | wc -l)"

Count by month for the year:

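for month in {01..12}; do
  # Month-precision bounds: mu completes partial dates, rounding the start
  # down and the end up, so no invalid dates such as 20250231 are needed
  count=$(mu find date:2025${month}..2025${month} | wc -l)
  echo "2025-${month}: ${count}"
done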

Unread email count:

mu find flag:unread maildir:/icloud/* | wc -l
mu find flag:unread maildir:/redhat/* | wc -l

People Analysis

Top 10 senders:

mu find <query> --format=json | \
  jq -r '.from' | \
  sort | uniq -c | sort -rn | head -10

Email exchange with specific person:

mu find "from:alice@example.com OR to:alice@example.com" | wc -l

Communication frequency over time:

mu find from:alice@example.com --format=json | \
  jq -r '.date | strftime("%Y-%m")' | \
  sort | uniq -c
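Sender counts split apart when the same person appears as `Alice <Alice@Example.com>` and `alice@example.com`. A sketch of a normalization step to run before counting, assuming each input line holds one sender in either bare or `Name <address>` form; `normalize_addr` is a hypothetical name:

```shell
# normalize_addr: reduce each line to a bare, lowercase email address.
normalize_addr() {
  sed -E 's/.*<([^>]+)>.*/\1/; s/^[[:space:]]+//; s/[[:space:]]+$//' |
    tr '[:upper:]' '[:lower:]'
}

# Example with literal input.
printf 'Alice Smith <Alice@Example.com>\n  bob@example.com \n' | normalize_addr
```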

Topic Analysis

Common subject keywords:

mu find <query> --format=json | \
  jq -r '.subject' | \
  tr '[:upper:]' '[:lower:]' | \
  grep -oE '\w+' | \
  sort | uniq -c | sort -rn | head -20
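Raw word counts over subjects tend to be dominated by reply prefixes and short filler words. The following variant is a sketch that filters them out; the stopword list is illustrative only and the function name `subject_keywords` is an assumption:

```shell
# subject_keywords [N]: read subjects on stdin, print the N most common keywords (default 20).
subject_keywords() {
  tr '[:upper:]' '[:lower:]' |
    grep -oE '[[:alnum:]]+' |
    grep -vxE 're|fwd|fw|the|and|for|with|from|your' |   # illustrative stopwords
    awk 'length($0) >= 3' |                               # drop very short tokens
    sort | uniq -c | sort -rn | head -n "${1:-20}"
}

# Example with literal subjects.
printf 'Re: Budget review\nFwd: budget plan\n' | subject_keywords 3
```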

Emails by project (work):

for project in knative kubernetes konflux; do
  count=$(mu find maildir:/redhat/${project}/* | wc -l)
  echo "${project}: ${count}"
done

Attachment Analysis

Emails with attachments (counts messages, not individual attachments):

mu find attach:* | wc -l

Attachment types distribution:

mu find attach:* --format=json | \
  jq -r '.attachments[]?.name' | \
  sed 's/.*\.//' | \
  tr '[:upper:]' '[:lower:]' | \
  sort | uniq -c | sort -rn

Large emails with attachments:

mu find attach:* size:1M.. --format=json | \
  jq -r '"\(.size) \(.subject)"' | \
  sort -rn
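Byte counts are hard to scan in a report. A sketch that converts the leading size field of "size subject" lines into a readable unit, assuming sizes are in bytes; `human_sizes` is a hypothetical helper to apply after sorting:

```shell
# human_sizes: convert a leading byte count on each line to B/KiB/MiB/GiB.
human_sizes() {
  awk '{
    size = $1; $1 = ""; sub(/^ +/, "")        # split off the size, keep the rest
    split("B KiB MiB GiB", unit, " ")
    u = 1
    while (size >= 1024 && u < 4) { size /= 1024; u++ }
    printf "%.1f %s\t%s\n", size, unit[u], $0
  }'
}

# Example with literal input.
printf '1048576 Report Q1\n512 Quick note\n' | human_sizes
```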

Thread Analysis

Thread depth (replies):

mu find <query> --format=json | \
  jq -r 'select(.references != null) | .references | length' | \
  awk '{sum+=$1; n++} END {if (n) print "Avg thread depth:", sum/n}'

Longest threads:

mu find <query> --format=json | \
  jq -r 'select(.references != null) | "\(.references | length) \(.subject)"' | \
  sort -rn | head -10
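Response times within a thread can be estimated from message timestamps. A minimal sketch, assuming the input is the Unix timestamps (epoch seconds) of one thread's messages, one per line; how those are extracted depends on the JSON fields available, and `avg_gap_hours` is a hypothetical name:

```shell
# avg_gap_hours: print the mean gap, in hours, between consecutive timestamps.
avg_gap_hours() {
  sort -n | awk '
    NR > 1 { total += $1 - prev; gaps++ }
    { prev = $1 }
    END { if (gaps) printf "%.1f\n", total / gaps / 3600; else print "n/a" }'
}

# Example: three messages with gaps of one and two hours.
printf '10800\n0\n3600\n' | avg_gap_hours
```

This measures the gap between any two consecutive messages, so it only approximates reply latency when a thread has more than two participants.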

Temporal Analysis

Emails by day of week:

mu find <query> --format=json | \
  jq -r '.date | strftime("%A")' | \
  sort | uniq -c

Emails by hour of day:

mu find <query> --format=json | \
  jq -r '.date | strftime("%H")' | \
  sort | uniq -c | sort -k2 -n

Activity timeline (last 7 days):

for i in {0..6}; do
  # GNU date syntax; the explicit day..day range covers 00:00 through 23:59
  day=$(date -d "-${i} days" +%Y%m%d)
  count=$(mu find date:${day}..${day} | wc -l)
  echo "$(date -d "-${i} days" +%Y-%m-%d): ${count}"
done
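Counting with `uniq -c` omits hours that have no email at all, which hides quiet periods. The sketch below prints all 24 hours with a simple text bar, assuming the input is one two-digit hour per line; `hour_histogram` is a hypothetical name:

```shell
# hour_histogram: read hours (00-23), print every hour with its count and a bar.
hour_histogram() {
  awk '
    { n[$1 + 0]++ }                 # "09" and "9" both map to hour 9
    END {
      for (h = 0; h < 24; h++) {
        bar = ""
        for (i = 0; i < n[h]; i++) bar = bar "#"
        printf "%02d %d %s\n", h, n[h], bar
      }
    }'
}

# Example with literal input.
printf '09\n09\n14\n' | hour_histogram
```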

Best Practices

Performance

Use specific maildir queries to limit scope
Process JSON output for complex analysis
Use streaming tools (jq, awk) for large datasets
Cache results for repeated analysis
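Caching can be as simple as writing a command's output to a file and reusing it while it is fresh. A sketch under that assumption; the helper name `cached` and the `CACHE_MINUTES` variable are hypothetical, and the 60-minute default is arbitrary:

```shell
# cached FILE CMD...: run CMD only if FILE is missing, empty, or older than
# CACHE_MINUTES (default 60), then print the cached output.
cached() {
  file=$1
  shift
  if [ ! -s "$file" ] || [ -n "$(find "$file" -mmin +"${CACHE_MINUTES:-60}" 2>/dev/null)" ]; then
    "$@" > "$file"
  fi
  cat "$file"
}

# Usage (not executed here):
#   cached /tmp/work-emails.json mu find 'maildir:/redhat/*' --format=json | jq ...
```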

Privacy

Aggregate personal and work data separately
Redact email addresses in summaries when appropriate
Be careful with subject content in analysis
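One way to redact addresses in a summary is to mask the local part and keep the domain, which preserves organization-level patterns. A minimal sketch; `redact_addrs` is a hypothetical name and the pattern is a simplification of real address syntax:

```shell
# redact_addrs: replace the local part of every email address with "***".
redact_addrs() {
  sed -E 's/[[:alnum:]._%+-]+@/***@/g'
}

# Example with literal input.
printf '  12 alice.smith@example.com\n' | redact_addrs
```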

Data Quality

Handle missing fields gracefully (use jq select)
Account for timezone differences in date analysis
Normalize data (lowercase, trim) for accurate counts
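Timezone handling matters most for hour-of-day analysis, since the same timestamp lands in different hours depending on the zone. The sketch below uses GNU `date` to convert an epoch timestamp into the hour for a chosen zone; `hour_in_tz` is a hypothetical helper and assumes timestamps are available as epoch seconds:

```shell
# hour_in_tz EPOCH [TZ]: print the two-digit hour of EPOCH in timezone TZ (default UTC).
# Requires GNU date for the -d @EPOCH form.
hour_in_tz() {
  TZ="${2:-UTC}" date -d "@$1" +%H
}

# Example: the same instant in UTC and in a fixed UTC-5 zone (POSIX TZ syntax).
hour_in_tz 0 UTC
hour_in_tz 0 EST5
```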

Tool Recommendations

Use these tools for analysis (all available in nixpkgs):

# In nix-shell
nix-shell -p jq gnugrep gawk coreutils dateutils

# Or using nix-shell shebang in scripts
#!/usr/bin/env nix-shell
#! nix-shell -i bash -p jq gnugrep gawk

jq - JSON processing and querying
awk - Text processing and calculations
grep/sed - Pattern matching and text manipulation
sort/uniq - Counting and deduplication
dateutils - Advanced date manipulation
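Before running an analysis script, it can help to confirm the tools are actually on the PATH. A small sketch; `missing_tools` is a hypothetical helper, and the tool list passed to it is up to the caller:

```shell
# missing_tools TOOL...: print the name of each tool that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Example: report anything missing from the set used in this workflow.
missing_tools jq awk grep sed sort uniq
```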

Examples

Monthly email volume comparison:

echo "Previous 30 days: $(mu find date:60d..30d | wc -l)"
echo "Last 30 days: $(mu find date:30d.. | wc -l)"

Top work correspondents:

echo "Top 10 work email senders:"
mu find maildir:/redhat/* --format=json | \
  jq -r '.from' | \
  sort | uniq -c | sort -rn | head -10

Busiest email hour:

echo "Email activity by hour:"
mu find date:30d.. --format=json | \
  jq -r '.date | strftime("%H")' | \
  sort | uniq -c | sort -rn | head -5

Integration

This workflow is usually one step in a larger sequence:

Search workflow - Initial data gathering
Analyze workflow - Process and analyze data
Present findings to user
Offer drill-down with Search or View workflows