Attorney Enrichment Pipeline
Step-by-step runbook for enriching attorney contact data via the Blitz API. This pipeline finds attorneys at law firms through LinkedIn company profiles, then enriches each with email, phone, education, and work history.
Prerequisites
Before running enrichment, confirm:
- BLITZ_API_KEY is set in .env -- check with npx tsx -e "console.log(process.env.BLITZ_API_KEY ? 'SET' : 'MISSING')"
- SUPABASE_SERVICE_ROLE_KEY is set in .env -- needed to bypass RLS for writes
- Firms table is populated -- 200 AmLaw firms should exist in firms table with linkedin_url values
- Sufficient Blitz credits -- check balance:
curl -H "x-api-key: $BLITZ_API_KEY" https://api.blitz-api.ai/api/blitz/key-info
Running Enrichment
Test Run (Always Do This First)
npx tsx scripts/enrich-attorneys.ts --tier amlaw_10 --test
This processes 1 firm and 5 attorneys max. Verify output looks correct before proceeding.
Full Run
# Enrich all AmLaw 10 firms
npx tsx scripts/enrich-attorneys.ts --tier amlaw_10
# Enrich a specific firm
npx tsx scripts/enrich-attorneys.ts --firm "Kirkland"
# Enrich AmLaw 25 tier
npx tsx scripts/enrich-attorneys.ts --tier amlaw_25
# Skip phone enrichment to save credits (5 credits/attorney saved)
npx tsx scripts/enrich-attorneys.ts --tier amlaw_50 --skip-phone
# Dry run -- preview only, no API calls
npx tsx scripts/enrich-attorneys.ts --tier amlaw_100 --dry-run
# Limit attorneys per firm
npx tsx scripts/enrich-attorneys.ts --tier amlaw_10 --max 100
Resume a Failed Run
If a run is interrupted, checkpoint data is saved to .enrichment-progress.json:
npx tsx scripts/enrich-attorneys.ts --resume
This reads the checkpoint file and skips already-processed firms and attorneys.
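The skip logic can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the Checkpoint shape and the names parseCheckpoint and skipProcessed are hypothetical, since the layout of .enrichment-progress.json is not documented here.

```typescript
// Hypothetical checkpoint shape; the real file layout may differ.
interface Checkpoint {
  completedFirmIds: string[];
  processedAttorneyUrls: string[];
}

// Parse checkpoint text, falling back to an empty checkpoint when the
// file is missing (raw === null) or cannot be read as JSON.
function parseCheckpoint(raw: string | null): Checkpoint {
  const empty: Checkpoint = { completedFirmIds: [], processedAttorneyUrls: [] };
  if (raw === null) return empty;
  try {
    const data = JSON.parse(raw) as Partial<Checkpoint>;
    return {
      completedFirmIds: Array.isArray(data.completedFirmIds) ? data.completedFirmIds : [],
      processedAttorneyUrls: Array.isArray(data.processedAttorneyUrls)
        ? data.processedAttorneyUrls
        : [],
    };
  } catch {
    return empty;
  }
}

// Keep only the items whose key is not already recorded in the checkpoint.
function skipProcessed<T>(items: T[], done: string[], key: (item: T) => string): T[] {
  const seen = new Set(done);
  return items.filter((item) => !seen.has(key(item)));
}

// Usage: firm f1 is already done, so only f2 remains.
const checkpoint = parseCheckpoint('{"completedFirmIds":["f1"],"processedAttorneyUrls":[]}');
const pendingFirms = skipProcessed([{ id: "f1" }, { id: "f2" }], checkpoint.completedFirmIds, (f) => f.id);
```

Treating an unreadable checkpoint as empty is a design choice of this sketch; it amounts to starting over, which may cost credits for work already done.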
Pipeline Flow
The enrichment pipeline follows this sequence:
Detailed Steps
- Load firms -- Queries firms table filtered by tier or name. Only firms with a valid linkedin_url are processed.
- Employee-finder -- Calls POST /v2/search/employee-finder with the firm's LinkedIn company URL and job_function: ["Legal"].
- Title filtering -- Results are filtered through isValidAttorneyTitle() which checks against known attorney titles from Associate through Equity Partner. Non-attorney staff (paralegals, legal assistants, etc.) are excluded.
- Seniority derivation -- Title is mapped to a seniority level:
  - equity_partner -- Equity Partner
  - partner -- Partner, Managing Partner, Income Partner, Shareholder, Member, etc.
  - counsel -- Counsel, Of Counsel, Senior Counsel, Special Counsel, etc.
  - senior_associate -- Senior Associate, Managing Associate, Principal Associate, etc.
  - associate -- Associate, Attorney, Junior Associate, etc.
  - other -- Anything else
- Email enrichment -- Calls POST /v2/enrichment/email with person_linkedin_url (NOT linkedin_profile_url). Costs 1 credit if an email is found.
- Phone enrichment -- Calls POST /v2/enrichment/phone with person_linkedin_url. Costs 5 credits if a phone is found.
- Record building -- Extracts law school, undergrad, prior firms, and work history from the employee-finder response. Builds a full attorney record.
- Batch upsert -- Inserts to attorneys table with ON CONFLICT (linkedin_url) to handle re-runs safely.
- Checkpoint -- Saves progress to .enrichment-progress.json after each firm completes.
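To illustrate the seniority derivation step, here is a minimal sketch of a title-to-seniority mapping. The function name deriveSeniority and the regular expressions are assumptions covering only the titles listed above; the real script may recognize more titles or match them differently.

```typescript
type Seniority =
  | "equity_partner"
  | "partner"
  | "counsel"
  | "senior_associate"
  | "associate"
  | "other";

// Ordered rules: more specific titles come first, so "Equity Partner" is not
// classified as "partner" and "Senior Associate" is not classified as "associate".
const SENIORITY_RULES: Array<[RegExp, Seniority]> = [
  [/\bnon[-\s]equity partner\b/, "partner"], // assumption: non-equity partners count as partner
  [/\bequity partner\b/, "equity_partner"],
  [/\b(partner|shareholder|member)\b/, "partner"],
  [/\bcounsel\b/, "counsel"],
  [/\b(senior|managing|principal) associate\b/, "senior_associate"],
  [/\b(associate|attorney)\b/, "associate"],
];

function deriveSeniority(title: string): Seniority {
  const normalized = title.trim().toLowerCase();
  for (const [pattern, level] of SENIORITY_RULES) {
    if (pattern.test(normalized)) return level;
  }
  return "other";
}

// Usage
const managing = deriveSeniority("Managing Partner"); // "partner"
const ofCounsel = deriveSeniority("Of Counsel"); // "counsel"
```

Rule order carries the logic here: the first matching rule wins, so reordering the list changes the results.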
After Enrichment: Run Pattern Detection
Once attorneys are enriched, run pattern detection to discover firm website URL patterns for attorney bio pages:
# Test all firms (dry run)
npx tsx scripts/detect-profile-patterns.ts
# Save detected patterns to the database
npx tsx scripts/detect-profile-patterns.ts --save
# Test a single firm
npx tsx scripts/detect-profile-patterns.ts --firm "Kirkland"
# Skip firms that already have patterns
npx tsx scripts/detect-profile-patterns.ts --save --skip-existing
Pattern detection:
- Tests 37 URL patterns in parallel for each firm
- Uses real enriched attorneys from the DB as test subjects
- Validates a pattern with a second attorney before saving
- Requires at least 2 enriched attorneys per firm
- Saves the working pattern to firms.profile_url_pattern
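The detect-then-validate logic described above might look roughly like this. It is a sketch under assumptions: detectPattern, the Probe callback (which stands in for the HTTP check), and the UrlBuilder callback are hypothetical names, not the script's actual API.

```typescript
interface Attorney {
  firstName: string;
  lastName: string;
}

// Hypothetical probe: resolves true when the URL serves an attorney bio page.
type Probe = (url: string) => Promise<boolean>;
// Hypothetical builder: fills a pattern's placeholders for one attorney.
type UrlBuilder = (pattern: string, attorney: Attorney) => string;

async function detectPattern(
  patterns: string[],
  attorneys: Attorney[],
  buildUrl: UrlBuilder,
  probe: Probe,
): Promise<string | null> {
  // A second attorney is needed for validation, so two is the minimum.
  if (attorneys.length < 2) return null;
  const [first, second] = attorneys;

  // Probe every candidate pattern in parallel against the first attorney.
  const hits = await Promise.all(
    patterns.map(async (p) => ((await probe(buildUrl(p, first))) ? p : null)),
  );

  for (const candidate of hits) {
    if (candidate === null) continue;
    // Confirm with the second attorney before accepting the pattern.
    if (await probe(buildUrl(candidate, second))) return candidate;
  }
  return null;
}
```

Validating against a second attorney guards against false positives, such as a site that returns a generic page for any URL under /people/.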
Common Patterns by Firm
| Firm | Pattern |
|---|---|
| Most firms | /people/{first}-{last} |
| Kirkland & Ellis | /lawyers/{last_initial}/{last}-{first} |
| Gibson Dunn | /lawyers/{last} |
| Latham & Watkins | /people/{last} |
| Cleary Gottlieb | /professionals/{first}-{last} |
| WilmerHale | /bio/{first}-{last} |
Firm profile URL patterns vary significantly -- there is no universal standard. Some firms (e.g., Skadden) use JavaScript-rendered pages that require Puppeteer with stealth plugin instead of simple HTTP requests.
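As an illustration of how the placeholders in the table could be filled in, the sketch below expands a pattern for one attorney. The names slugify and expandPattern are hypothetical, and the slug rules (lowercasing, accent stripping, hyphenating) are assumptions; individual firm sites may normalize names differently.

```typescript
interface AttorneyName {
  first: string;
  last: string;
}

// Lowercase, strip accents and punctuation, and join words with hyphens.
function slugify(value: string): string {
  return value
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

function expandPattern(pattern: string, name: AttorneyName): string {
  const first = slugify(name.first);
  const last = slugify(name.last);
  const values = new Map<string, string>([
    ["first", first],
    ["last", last],
    ["last_initial", last.charAt(0)],
  ]);
  // Replace each {placeholder}; unknown placeholders are left untouched.
  return pattern.replace(/\{(\w+)\}/g, (match, key: string) => values.get(key) ?? match);
}

// Usage
const kirklandPath = expandPattern("/lawyers/{last_initial}/{last}-{first}", {
  first: "José",
  last: "Smith",
}); // "/lawyers/s/smith-jose"
```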
Cost Estimation
| Step | Cost |
|---|---|
| Employee-finder | 1 credit per result |
| Email enrichment | 1 credit (only if found) |
| Phone enrichment | 5 credits (only if found) |
| Typical total | 2--7 credits per attorney |
For a full AmLaw 10 run (~10 firms, ~200 attorneys each), budget roughly 5,000--15,000 credits.
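The arithmetic behind that budget can be sketched as below, using the per-attorney range of 2 to 7 credits from the table. The function estimateCredits is hypothetical. The low end assumes an email is found but no phone; the high end assumes both are found for every attorney.

```typescript
interface CostOptions {
  skipPhone?: boolean;
}

// Credit prices taken from the cost table.
const CREDITS = { employeeFinder: 1, email: 1, phone: 5 } as const;

// Returns [low, high] credits for a run.
function estimateCredits(
  firms: number,
  attorneysPerFirm: number,
  options: CostOptions = {},
): [number, number] {
  const attorneys = firms * attorneysPerFirm;
  const low = CREDITS.employeeFinder + CREDITS.email;
  const high = low + (options.skipPhone ? 0 : CREDITS.phone);
  return [attorneys * low, attorneys * high];
}

// Usage: 10 firms with about 200 attorneys each.
const fullRun = estimateCredits(10, 200); // [4000, 14000]
const noPhone = estimateCredits(10, 200, { skipPhone: true }); // [4000, 4000]
```

If the employee-finder also bills for results that title filtering later discards, actual usage would be higher than this estimate, so some headroom is prudent.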
Rate Limits
The pipeline enforces built-in rate limits to avoid being throttled by the Blitz API:
- 700ms between individual attorney enrichment calls
- 3 seconds between firms
Do not reduce these values. The Blitz API will throttle or block requests if you exceed its rate limits.
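For orientation, the pacing described above could be structured like the following sketch. The name runThrottled and the enrich callback are hypothetical stand-ins for the script's internals; the sleep parameter is injectable only so the loop can be exercised without real waiting.

```typescript
const ATTORNEY_DELAY_MS = 700; // pause between individual attorney enrichment calls
const FIRM_DELAY_MS = 3000; // pause between firms

type Sleep = (ms: number) => Promise<void>;
const realSleep: Sleep = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Processes firms sequentially, pausing after each attorney and after each firm.
// `enrich` is a hypothetical callback standing in for the email/phone calls.
async function runThrottled<A>(
  firms: A[][],
  enrich: (attorney: A) => Promise<void>,
  sleep: Sleep = realSleep,
): Promise<void> {
  for (const attorneys of firms) {
    for (const attorney of attorneys) {
      await enrich(attorney);
      await sleep(ATTORNEY_DELAY_MS);
    }
    await sleep(FIRM_DELAY_MS);
  }
}
```

Running everything sequentially keeps the request rate predictable; parallelizing attorney calls would defeat the purpose of the delays.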
Troubleshooting
| Problem | Cause | Fix |
|---|---|---|
| 422 errors from email/phone | Wrong field name | Use person_linkedin_url, not linkedin_profile_url |
| No results from employee-finder | Firm may lack LinkedIn company page | Verify firms.linkedin_url is correct |
| email_status insert fails | Invalid enum value | Only valid, invalid, catch_all, unknown are allowed |
| Checkpoint corruption | Partial write during crash | Delete .enrichment-progress.json and restart |
| Pattern detection returns nothing | Not enough enriched attorneys | Run enrichment first -- need at least 2 attorneys per firm |
See Debugging Enrichment Issues for more detail.
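Two of the issues in the table, the wrong request field name and invalid email_status values, can be guarded against in code. The helpers below are a hypothetical sketch: buildEnrichmentBody and normalizeEmailStatus are illustrative names, and mapping unrecognized statuses to unknown is an assumption about the desired behavior.

```typescript
const EMAIL_STATUSES = ["valid", "invalid", "catch_all", "unknown"] as const;
type EmailStatus = (typeof EMAIL_STATUSES)[number];

// Map any unexpected status to "unknown" so the insert never violates the enum.
function normalizeEmailStatus(raw: unknown): EmailStatus {
  const value = typeof raw === "string" ? raw.trim().toLowerCase() : "";
  const index = (EMAIL_STATUSES as readonly string[]).indexOf(value);
  return index >= 0 ? EMAIL_STATUSES[index] : "unknown";
}

// Build the enrichment request body with the field name the API expects
// (person_linkedin_url, not linkedin_profile_url).
function buildEnrichmentBody(linkedinUrl: string): { person_linkedin_url: string } {
  return { person_linkedin_url: linkedinUrl };
}

// Usage
const status = normalizeEmailStatus("Catch_All"); // "catch_all"
const fallback = normalizeEmailStatus("risky"); // "unknown"
```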