Attorney Enrichment Pipeline

Step-by-step runbook for enriching attorney contact data via the Blitz API. This pipeline finds attorneys at law firms through LinkedIn company profiles, then enriches each with email, phone, education, and work history.

Prerequisites

Before running enrichment, confirm:

  1. BLITZ_API_KEY is set in .env -- check with npx tsx -e "console.log(process.env.BLITZ_API_KEY ? 'SET' : 'MISSING')"
  2. SUPABASE_SERVICE_ROLE_KEY is set in .env -- needed to bypass RLS for writes
  3. Firms table is populated -- the firms table should contain 200 AmLaw firms, each with a linkedin_url value
  4. Sufficient Blitz credits -- check balance:
    curl -H "x-api-key: $BLITZ_API_KEY" https://api.blitz-api.ai/api/blitz/key-info
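
The two environment checks above can be folded into one preflight helper. This is a minimal sketch, not part of scripts/enrich-attorneys.ts: the helper name missingEnvVars is hypothetical, and it assumes the script reads both variables from the environment after .env has been loaded.

```typescript
// Hypothetical preflight helper; the real script may check these differently.
const REQUIRED_VARS = ["BLITZ_API_KEY", "SUPABASE_SERVICE_ROLE_KEY"] as const;

// Returns the names of required variables that are unset or blank.
function missingEnvVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Example with a literal object; a real check would pass process.env instead.
const missing = missingEnvVars({ BLITZ_API_KEY: "abc123" });
console.log(missing.length === 0 ? "SET" : `MISSING: ${missing.join(", ")}`);
// prints "MISSING: SUPABASE_SERVICE_ROLE_KEY"
```

Taking the environment as a parameter keeps the helper easy to test without touching real credentials.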

Running Enrichment

Test Run (Always Do This First)

npx tsx scripts/enrich-attorneys.ts --tier amlaw_10 --test

This processes at most 1 firm and 5 attorneys. Verify that the output looks correct before proceeding.

Full Run

# Enrich all AmLaw 10 firms
npx tsx scripts/enrich-attorneys.ts --tier amlaw_10

# Enrich a specific firm
npx tsx scripts/enrich-attorneys.ts --firm "Kirkland"

# Enrich AmLaw 25 tier
npx tsx scripts/enrich-attorneys.ts --tier amlaw_25

# Skip phone enrichment to save credits (up to 5 credits per attorney)
npx tsx scripts/enrich-attorneys.ts --tier amlaw_50 --skip-phone

# Dry run -- preview only, no API calls
npx tsx scripts/enrich-attorneys.ts --tier amlaw_100 --dry-run

# Limit attorneys per firm
npx tsx scripts/enrich-attorneys.ts --tier amlaw_10 --max 100

Resume a Failed Run

Progress is checkpointed to .enrichment-progress.json as the run proceeds, so an interrupted run can be resumed:

npx tsx scripts/enrich-attorneys.ts --resume

This reads the checkpoint file and skips already-processed firms and attorneys.
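
The skip logic can be sketched as follows. The checkpoint schema is not documented here, so the Checkpoint shape and both function names are assumptions made for illustration; the real .enrichment-progress.json may store different fields. Treating an unreadable file as an empty checkpoint is also a choice made for this sketch.

```typescript
// Assumed checkpoint shape; the real file's schema may differ.
interface Checkpoint {
  completedFirmIds: string[];
  processedLinkedinUrls: string[];
}

// Parses checkpoint text. A missing file (null) or unparseable content
// yields an empty checkpoint, which means "start from the beginning".
function parseCheckpoint(raw: string | null): Checkpoint {
  const empty: Checkpoint = { completedFirmIds: [], processedLinkedinUrls: [] };
  if (raw === null) return empty;
  try {
    const data = JSON.parse(raw) as Partial<Checkpoint>;
    return {
      completedFirmIds: Array.isArray(data.completedFirmIds) ? data.completedFirmIds : [],
      processedLinkedinUrls: Array.isArray(data.processedLinkedinUrls)
        ? data.processedLinkedinUrls
        : [],
    };
  } catch {
    return empty;
  }
}

// Keeps only the firms the checkpoint has not marked as complete.
function remainingFirms<T extends { id: string }>(firms: T[], checkpoint: Checkpoint): T[] {
  const done = new Set(checkpoint.completedFirmIds);
  return firms.filter((firm) => !done.has(firm.id));
}
```

The same Set-based filter would apply to attorneys, keyed on LinkedIn URL instead of firm id.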

Pipeline Flow

The enrichment pipeline runs the steps below in order.

Detailed Steps

  1. Load firms -- Queries firms table filtered by tier or name. Only firms with a valid linkedin_url are processed.

  2. Employee-finder -- Calls POST /v2/search/employee-finder with the firm's LinkedIn company URL and job_function: ["Legal"].

  3. Title filtering -- Results are filtered through isValidAttorneyTitle(), which checks them against known attorney titles from Associate through Equity Partner. Non-attorney staff (paralegals, legal assistants, etc.) are excluded.

  4. Seniority derivation -- Title is mapped to a seniority level:

    • equity_partner -- Equity Partner
    • partner -- Partner, Managing Partner, Income Partner, Shareholder, Member, etc.
    • counsel -- Counsel, Of Counsel, Senior Counsel, Special Counsel, etc.
    • senior_associate -- Senior Associate, Managing Associate, Principal Associate, etc.
    • associate -- Associate, Attorney, Junior Associate, etc.
    • other -- Anything else

  5. Email enrichment -- Calls POST /v2/enrichment/email with person_linkedin_url (NOT linkedin_profile_url). Costs 1 credit if an email is found.

  6. Phone enrichment -- Calls POST /v2/enrichment/phone with person_linkedin_url. Costs 5 credits if a phone is found.

  7. Record building -- Extracts law school, undergrad, prior firms, and work history from the employee-finder response. Builds a full attorney record.

  8. Batch upsert -- Upserts into the attorneys table with ON CONFLICT (linkedin_url), so re-runs do not create duplicate rows.

  9. Checkpoint -- Saves progress to .enrichment-progress.json after each firm completes.
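
Step 4 can be sketched as an ordered list of title patterns where the first match wins. This is an illustration only: deriveSeniority and SENIORITY_RULES are hypothetical names, the patterns cover just the titles listed in step 4, and the pipeline's actual matching rules may differ.

```typescript
type Seniority =
  | "equity_partner"
  | "partner"
  | "counsel"
  | "senior_associate"
  | "associate"
  | "other";

// Ordered rules: more specific titles (Equity Partner, Senior Associate)
// come before the general ones so they are not swallowed by a broader match.
const SENIORITY_RULES: Array<[RegExp, Seniority]> = [
  [/\bequity partner\b/, "equity_partner"],
  [/\b(partner|shareholder|member)\b/, "partner"],
  [/\bcounsel\b/, "counsel"],
  [/\b(senior|managing|principal) associate\b/, "senior_associate"],
  [/\b(associate|attorney)\b/, "associate"],
];

// Maps a job title to a seniority level, or "other" when nothing matches.
function deriveSeniority(title: string): Seniority {
  const normalized = title.trim().toLowerCase();
  for (const [pattern, level] of SENIORITY_RULES) {
    if (pattern.test(normalized)) return level;
  }
  return "other";
}

console.log(deriveSeniority("Managing Partner")); // prints "partner"
console.log(deriveSeniority("Managing Associate")); // prints "senior_associate"
```

Rule order matters here: "Equity Partner" also contains "partner", so the equity rule has to run first.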

After Enrichment: Run Pattern Detection

Once attorneys are enriched, run pattern detection to discover firm website URL patterns for attorney bio pages:

# Test all firms (dry run)
npx tsx scripts/detect-profile-patterns.ts

# Save detected patterns to the database
npx tsx scripts/detect-profile-patterns.ts --save

# Test a single firm
npx tsx scripts/detect-profile-patterns.ts --firm "Kirkland"

# Skip firms that already have patterns
npx tsx scripts/detect-profile-patterns.ts --save --skip-existing

Pattern detection:

  • Tests 37 URL patterns in parallel for each firm
  • Uses real enriched attorneys from the DB as test subjects
  • Validates a pattern with a second attorney before saving
  • Requires at least 2 enriched attorneys per firm
  • Saves the working pattern to firms.profile_url_pattern

Common Patterns by Firm

| Firm | Pattern |
| --- | --- |
| Most firms | /people/{first}-{last} |
| Kirkland & Ellis | /lawyers/{last_initial}/{last}-{first} |
| Gibson Dunn | /lawyers/{last} |
| Latham & Watkins | /people/{last} |
| Cleary Gottlieb | /professionals/{first}-{last} |
| WilmerHale | /bio/{first}-{last} |

Firm profile URL patterns vary significantly -- there is no universal standard. Some firms (e.g., Skadden) use JavaScript-rendered pages that require Puppeteer with stealth plugin instead of simple HTTP requests.
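
To illustrate how a pattern such as /lawyers/{last_initial}/{last}-{first} turns into a candidate path, here is a minimal sketch of placeholder expansion. The function name expandPattern is hypothetical, and lowercasing the names is an assumption about how firm sites build their slugs; names containing spaces, accents, or suffixes would need extra handling that is not shown.

```typescript
// Fills {first}, {last}, and {last_initial} in a profile URL pattern.
// Unknown placeholders are left untouched.
function expandPattern(pattern: string, firstName: string, lastName: string): string {
  const first = firstName.trim().toLowerCase();
  const last = lastName.trim().toLowerCase();
  const values = new Map<string, string>([
    ["first", first],
    ["last", last],
    ["last_initial", last.charAt(0)],
  ]);
  return pattern.replace(
    /\{(\w+)\}/g,
    (match: string, key: string) => values.get(key) ?? match
  );
}

console.log(expandPattern("/lawyers/{last_initial}/{last}-{first}", "Jane", "Doe"));
// prints "/lawyers/d/doe-jane"
```

Pattern detection would append each expanded path to the firm's website domain, request it, and then confirm the hit with a second attorney before saving.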

Cost Estimation

| Step | Cost |
| --- | --- |
| Employee-finder | 1 credit per result |
| Email enrichment | 1 credit (only if found) |
| Phone enrichment | 5 credits (only if found) |
| Typical total | 2--7 credits per attorney |

For a full AmLaw 10 run (~10 firms, ~200 attorneys each, so ~2,000 attorneys at 2--7 credits apiece), budget roughly 5,000--15,000 credits.
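
The same arithmetic can be scripted before a run. This sketch uses the per-step costs from the table above; estimateCredits is a hypothetical helper, and its lower bound assumes no email or phone is found, so it sits below the table's typical figure of 2 credits per attorney.

```typescript
// Credit costs copied from the cost table above.
const FINDER_CREDITS = 1; // charged per employee-finder result
const EMAIL_CREDITS = 1; // charged only when an email is found
const PHONE_CREDITS = 5; // charged only when a phone is found

interface CreditBounds {
  min: number;
  max: number;
}

// Returns lower and upper credit bounds for a number of attorneys.
// min: nothing is found; max: every email (and phone, unless skipped) is found.
function estimateCredits(attorneys: number, skipPhone = false): CreditBounds {
  const maxPerAttorney =
    FINDER_CREDITS + EMAIL_CREDITS + (skipPhone ? 0 : PHONE_CREDITS);
  return { min: attorneys * FINDER_CREDITS, max: attorneys * maxPerAttorney };
}

console.log(estimateCredits(10 * 200)); // prints { min: 2000, max: 14000 }
console.log(estimateCredits(10 * 200, true)); // prints { min: 2000, max: 4000 }
```

The second call shows why --skip-phone matters for budgeting: it lowers the worst case from 7 credits per attorney to 2.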

Rate Limits

The pipeline enforces built-in rate limits to avoid being throttled by the Blitz API:

  • 700ms between individual attorney enrichment calls
  • 3 seconds between firms

Do not reduce these values. The Blitz API will throttle or block requests that exceed its rate limits.
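
A rough sketch of how such delays might be applied, and what they imply for run time. The helper names are hypothetical, and the sketch assumes one 700ms pause per attorney; if the pipeline pauses between the email and phone calls separately, the real floor is higher.

```typescript
const ATTORNEY_DELAY_MS = 700; // pause between attorneys
const FIRM_DELAY_MS = 3000; // pause between firms

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Runs work on each item in order, pausing between items (not before the
// first or after the last). The wait function is injectable for testing.
async function forEachWithDelay<T>(
  items: T[],
  delayMs: number,
  work: (item: T) => Promise<void>,
  wait: (ms: number) => Promise<void> = sleep
): Promise<void> {
  let first = true;
  for (const item of items) {
    if (!first) await wait(delayMs);
    first = false;
    await work(item);
  }
}

// Lower bound on time spent waiting, given the attorney count for each firm.
// Request latency is not included.
function minimumDelayMs(attorneysPerFirm: number[]): number {
  const attorneyGaps = attorneysPerFirm.reduce(
    (sum, count) => sum + Math.max(count - 1, 0),
    0
  );
  const firmGaps = Math.max(attorneysPerFirm.length - 1, 0);
  return attorneyGaps * ATTORNEY_DELAY_MS + firmGaps * FIRM_DELAY_MS;
}
```

For 10 firms of 200 attorneys each, minimumDelayMs returns 1,420,000 ms, so a full AmLaw 10 run spends roughly 24 minutes in delays alone.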

Troubleshooting

| Problem | Cause | Fix |
| --- | --- | --- |
| 422 errors from email/phone | Wrong field name | Use person_linkedin_url, not linkedin_profile_url |
| No results from employee-finder | Firm may lack LinkedIn company page | Verify firms.linkedin_url is correct |
| email_status insert fails | Invalid enum value | Only valid, invalid, catch_all, unknown are allowed |
| Checkpoint corruption | Partial write during crash | Delete .enrichment-progress.json and restart |
| Pattern detection returns nothing | Not enough enriched attorneys | Run enrichment first -- need at least 2 attorneys per firm |
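
Two of the failures above (the 422 errors and the email_status insert failure) can be guarded against in code. The helpers below are hypothetical and only illustrate the idea; in particular, falling back to "unknown" for an unrecognized status is an assumption, not documented pipeline behavior.

```typescript
// Allowed email_status values, copied from the troubleshooting table.
const EMAIL_STATUSES = ["valid", "invalid", "catch_all", "unknown"] as const;
type EmailStatus = (typeof EMAIL_STATUSES)[number];

// Coerces a raw status string into one of the allowed enum values.
// Spaces and hyphens are treated as underscores, so "Catch-All" is accepted.
function normalizeEmailStatus(raw: string | null | undefined): EmailStatus {
  const candidate = (raw ?? "").trim().toLowerCase().replace(/[\s-]+/g, "_");
  const match = EMAIL_STATUSES.find((status) => status === candidate);
  return match ?? "unknown";
}

// Builds a request body using the field name the email and phone
// endpoints expect (person_linkedin_url, not linkedin_profile_url).
function buildEnrichmentBody(linkedinUrl: string): { person_linkedin_url: string } {
  return { person_linkedin_url: linkedinUrl };
}

console.log(normalizeEmailStatus("Catch-All")); // prints "catch_all"
console.log(normalizeEmailStatus("risky")); // prints "unknown"
```

Typing the request body makes the wrong field name a compile-time error instead of a 422 at run time.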

See Debugging Enrichment Issues for more detail.