Analyze Workflow

Analyze email patterns, statistics, and trends using mu queries and data processing.

Workflow Steps

1. Understand Analysis Request

Identify what the user wants to analyze:

  • Volume: Email counts over time
  • People: Top senders/recipients, communication patterns
  • Topics: Subject patterns, keyword frequency
  • Threads: Conversation analysis
  • Attachments: File type distribution
  • Response times: Time between emails in threads
  • Activity patterns: Time of day, day of week

2. Gather Data

Use mu find with appropriate queries and JSON output:

# Get structured data for analysis
mu find <query> --format=json > emails.json

# Count emails by criteria
mu find <query> | wc -l

# Get specific date ranges
mu find date:20250101..20250131 --format=json

3. Process Data

Use shell tools to analyze:

# Top senders
mu find <query> --format=json | jq -r '.from' | sort | uniq -c | sort -rn

# Emails by month
mu find <query> --format=json | jq -r '.date | strftime("%Y-%m")' | sort | uniq -c

# Attachment types
mu find 'attach:*' --format=json | jq -r '.attachments[]?.name' | sed 's/.*\.//' | sort | uniq -c

# Average email size
mu find <query> --format=json | jq '.size' | awk '{sum+=$1; n++} END {if (n) print sum/n}'

4. Visualize Results

Present findings clearly:

  • Tables: Formatted counts and statistics
  • Lists: Top N senders, subjects, etc.
  • Summaries: Key insights and patterns
  • Comparisons: Personal vs work, this month vs last month
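
As one way to produce the tables mentioned above, the following sketch turns the "count label" lines emitted by `sort | uniq -c | sort -rn` into an aligned table with each row's share of the total. The helper name `format_counts` and the column widths are assumptions made for illustration; the sample input stands in for real mu output.

```shell
# Format "count label" lines (as produced by `sort | uniq -c | sort -rn`)
# into an aligned table with each row's share of the total.
# `format_counts` is a hypothetical helper name, not part of mu.
format_counts() {
  awk '
    {
      count[NR] = $1
      # Rebuild the label from the remaining fields so spaces survive.
      label[NR] = $2
      for (i = 3; i <= NF; i++) label[NR] = label[NR] " " $i
      total += $1
    }
    END {
      # Print nothing for empty input (also avoids dividing by zero).
      if (total == 0) exit
      printf "%-30s %7s %7s\n", "Item", "Count", "Share"
      for (r = 1; r <= NR; r++)
        printf "%-30s %7d %6.1f%%\n", label[r], count[r], 100 * count[r] / total
    }
  '
}

# Example with inline sample data instead of real mu output:
printf '  12 alice@example.com\n   8 bob@example.com\n' | format_counts
```

Any of the counting pipelines in this document could be piped into such a helper, since they all end in the same "count label" shape.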

5. Provide Insights

Interpret the data:

  • Identify trends (increasing/decreasing volume)
  • Highlight patterns (busiest times, top correspondents)
  • Suggest actions (archive old threads, follow up on flagged items)
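
To back a statement such as "volume is increasing" with a number, a small helper can report the percentage change between two counts. This is a minimal sketch; `pct_change` is a hypothetical name, and its two arguments are assumed to be counts obtained with `wc -l` as elsewhere in this document.

```shell
# Print the percentage change from a previous count to a current count.
# `pct_change` is a hypothetical helper; the inputs would come from `wc -l`.
pct_change() {
  awk -v prev="$1" -v cur="$2" 'BEGIN {
    # A change relative to zero is undefined, so say so instead of dividing.
    if (prev == 0) { print "n/a (no earlier emails)"; exit }
    printf "%+.1f%%\n", 100 * (cur - prev) / prev
  }'
}

pct_change 200 250   # prints +25.0%
pct_change 200 150   # prints -25.0%
```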

Common Analysis Patterns

Email Volume Analysis

Count emails by account:

echo "Personal: $(mu find 'maildir:/icloud/*' | wc -l)"
echo "Work: $(mu find 'maildir:/redhat/*' | wc -l)"

Count by month for the year:

for month in {01..12}; do
  # Month-only dates: mu fills in the first and last day of the month itself
  count=$(mu find "date:2025${month}..2025${month}" | wc -l)
  echo "2025-${month}: ${count}"
done

Unread email count:

mu find flag:unread 'maildir:/icloud/*' | wc -l
mu find flag:unread 'maildir:/redhat/*' | wc -l

People Analysis

Top 10 senders:

mu find <query> --format=json | \
  jq -r '.from' | \
  sort | uniq -c | sort -rn | head -10

Email exchange with specific person:

mu find "from:alice@example.com OR to:alice@example.com" | wc -l

Communication frequency over time:

mu find from:alice@example.com --format=json | \
  jq -r '.date | strftime("%Y-%m")' | \
  sort | uniq -c

Topic Analysis

Common subject keywords:

mu find <query> --format=json | \
  jq -r '.subject' | \
  tr '[:upper:]' '[:lower:]' | \
  grep -oE '\w+' | \
  sort | uniq -c | sort -rn | head -20

Emails by project (work):

for project in knative kubernetes konflux; do
  count=$(mu find "maildir:/redhat/${project}/*" | wc -l)
  echo "${project}: ${count}"
done

Attachment Analysis

Emails with attachments (counts messages, not individual files):

mu find 'attach:*' | wc -l

Attachment types distribution:

mu find attach:* --format=json | \
  jq -r '.attachments[]?.name' | \
  sed 's/.*\.//' | \
  tr '[:upper:]' '[:lower:]' | \
  sort | uniq -c | sort -rn

Large emails with attachments:

mu find attach:* size:1M.. --format=json | \
  jq -r '"\(.size) \(.subject)"' | \
  sort -rn

Thread Analysis

Thread depth (approximated by the number of references per message):

mu find <query> --format=json | \
  jq -r 'select(.references != null) | .references | length' | \
  awk '{sum+=$1; n++} END {if (n) print "Avg replies:", sum/n}'

Longest threads:

mu find <query> --format=json | \
  jq -r 'select(.references != null) | "\(.references | length) \(.subject)"' | \
  sort -rn | head -10

Temporal Analysis

Emails by day of week:

mu find <query> --format=json | \
  jq -r '.date | strftime("%A")' | \
  sort | uniq -c

Emails by hour of day:

mu find <query> --format=json | \
  jq -r '.date | strftime("%H")' | \
  sort | uniq -c | sort -k2 -n

Activity timeline (last 7 days):

for i in {0..6}; do
  date=$(date -d "-${i} days" +%Y%m%d)
  count=$(mu find date:${date} | wc -l)
  echo "$(date -d "-${i} days" +%Y-%m-%d): ${count}"
done

Best Practices

Performance

  • Use specific maildir queries to limit scope
  • Process JSON output for complex analysis
  • Use streaming tools (jq, awk) for large datasets
  • Cache results for repeated analysis
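
For the caching point above, one possible approach is to save a query's output to a file and reuse it on later calls. The sketch below assumes a hypothetical helper `cached` and a hypothetical `CACHE_DIR` location; neither is part of mu.

```shell
# Run a command once and reuse its saved output on later calls.
# `cached` and CACHE_DIR are hypothetical names for this sketch.
CACHE_DIR="${CACHE_DIR:-${TMPDIR:-/tmp}/mu-analysis-cache}"

cached() {
  _name="$1"; shift
  _file="$CACHE_DIR/$_name"
  mkdir -p "$CACHE_DIR"
  # Only a non-empty file counts as a cache hit, so empty results are re-run.
  if [ ! -s "$_file" ]; then
    # Write to a temp file first so a failed command leaves no partial cache.
    "$@" > "$_file.tmp" && mv "$_file.tmp" "$_file" || { rm -f "$_file.tmp"; return 1; }
  fi
  cat "$_file"
}

# Intended usage (not run here because it needs a mu index):
#   cached work.json mu find 'maildir:/redhat/*' --format=json | jq ...
```

The cache never expires on its own; deleting the file (or the whole directory) forces a fresh query, which matters once new mail has been indexed.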

Privacy

  • Aggregate personal and work data separately
  • Redact email addresses in summaries when appropriate
  • Be careful with subject content in analysis
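
Redaction can be applied as a final filter on any summary. The following sketch masks the local part of each address and keeps the domain, on the assumption that per-organization patterns should stay visible; `redact_addresses` is a hypothetical helper name and the pattern is a deliberately simple approximation of an email address.

```shell
# Mask the local part of anything that looks like an email address,
# keeping the domain. `redact_addresses` is a hypothetical helper name.
redact_addresses() {
  sed -E 's/[[:alnum:]._%+-]+@([[:alnum:].-]+)/***@\1/g'
}

# Example with inline sample data:
printf '  12 alice@example.com\n' | redact_addresses   # prints "  12 ***@example.com"
```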

Data Quality

  • Handle missing fields gracefully (use jq select)
  • Account for timezone differences in date analysis
  • Normalize data (lowercase, trim) for accurate counts
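
To illustrate the normalization point, the sketch below reduces sender strings to a bare lowercase address so that "Alice <Alice@Example.COM>" and "alice@example.com" are counted together. `normalize_address` is a hypothetical helper, and it assumes one sender string per input line.

```shell
# Reduce sender strings to a bare, lowercase, trimmed address.
# Steps: keep only the part inside <...> when present, lowercase,
# then strip leading and trailing whitespace.
# `normalize_address` is a hypothetical helper name.
normalize_address() {
  sed -E 's/.*<([^>]+)>.*/\1/' |
    tr '[:upper:]' '[:lower:]' |
    sed -E 's/^[[:space:]]+//; s/[[:space:]]+$//'
}

# Example with inline sample data: both lines collapse to one sender.
printf 'Alice Example <Alice@Example.COM>\n  alice@example.com  \n' |
  normalize_address | sort | uniq -c
```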

Tool Recommendations

Use these tools for analysis (all available in nixpkgs):

# In nix-shell
nix-shell -p jq gnugrep gawk coreutils dateutils

# Or using nix-shell shebang in scripts
#!/usr/bin/env nix-shell
#! nix-shell -i bash -p jq gnugrep gawk

  • jq - JSON processing and querying
  • awk - Text processing and calculations
  • grep/sed - Pattern matching and text manipulation
  • sort/uniq - Counting and deduplication
  • dateutils - Advanced date manipulation

Examples

Email volume comparison (last 30 days vs. the 30 days before):

echo "Previous 30 days: $(mu find date:60d..30d | wc -l)"
echo "Last 30 days: $(mu find date:30d.. | wc -l)"

Top work correspondents:

echo "Top 10 work email senders:"
mu find maildir:/redhat/* --format=json | \
  jq -r '.from' | \
  sort | uniq -c | sort -rn | head -10

Busiest email hour:

echo "Email activity by hour:"
mu find date:30d.. --format=json | \
  jq -r '.date | strftime("%H")' | \
  sort | uniq -c | sort -rn | head -5

Integration

This workflow typically sits in the following sequence:

  1. Search workflow - Initial data gathering
  2. Analyze workflow - Process and analyze data
  3. Present findings to user
  4. Offer drill-down with Search or View workflows