Enrichment Pipeline
The attorney enrichment pipeline discovers attorneys at law firms, enriches their contact information, and populates the attorneys table. It is the core data acquisition process for Client Portal's market intelligence.
Pipeline Overview
Script Usage
# Full run -- all firms with LinkedIn URLs
npx tsx scripts/enrich-attorneys.ts
# Test mode -- 1 firm, 5 attorneys max
npx tsx scripts/enrich-attorneys.ts --test
# Specific firm by name (partial match)
npx tsx scripts/enrich-attorneys.ts --firm "Kirkland"
# By tier
npx tsx scripts/enrich-attorneys.ts --tier amlaw_10
# Max attorneys per firm
npx tsx scripts/enrich-attorneys.ts --max 100
# Skip expensive phone enrichment (saves up to 5 credits/attorney; phone credits are only charged when a number is found)
npx tsx scripts/enrich-attorneys.ts --skip-phone
# Skip email enrichment
npx tsx scripts/enrich-attorneys.ts --skip-email
# Resume from last checkpoint
npx tsx scripts/enrich-attorneys.ts --resume
# Preview only -- no API calls, no DB writes
npx tsx scripts/enrich-attorneys.ts --dry-run
Title Filtering
The isValidAttorneyTitle() function filters employee-finder results to actual attorneys. GHL's job_function: ["Legal"] filter returns legal staff broadly (including paralegals, legal assistants, etc.), so post-filtering is essential.
Valid Title Tiers
| Tier | Titles |
|---|---|
| Associate | Associate, Attorney, Lawyer, Junior Associate |
| Senior Associate | Senior Associate, Managing Associate, Principal Associate, Career Associate |
| Counsel | Counsel, Of Counsel, Senior Counsel, Special Counsel, Principal Counsel |
| Partner | Income Partner, Partner, Managing Partner, Office Managing Partner, Firm Chair, Executive Partner, Shareholder, Member |
| Equity Partner | Equity Partner |
Source: /Users/kaihatchman/Desktop/Contact Criteria - Blitz API - Contact Information.csv
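The post-filtering step can be illustrated with a minimal sketch. This is not the actual isValidAttorneyTitle() implementation, which is not shown here; the name isValidAttorneyTitleSketch and the exact-match strategy are assumptions, and the title list is copied from the table above.

```typescript
// Hypothetical sketch: the real isValidAttorneyTitle() may match differently
// (for example by substring). Titles are taken from the Valid Title Tiers table.
const VALID_TITLES: ReadonlySet<string> = new Set([
  // Associate
  "associate", "attorney", "lawyer", "junior associate",
  // Senior Associate
  "senior associate", "managing associate", "principal associate", "career associate",
  // Counsel
  "counsel", "of counsel", "senior counsel", "special counsel", "principal counsel",
  // Partner
  "income partner", "partner", "managing partner", "office managing partner",
  "firm chair", "executive partner", "shareholder", "member",
  // Equity Partner
  "equity partner",
]);

function isValidAttorneyTitleSketch(title: string): boolean {
  // Normalize case and whitespace, then require an exact match against the table.
  const normalized = title.trim().toLowerCase().replace(/\s+/g, " ");
  return VALID_TITLES.has(normalized);
}

console.log(isValidAttorneyTitleSketch("Of Counsel"));      // true
console.log(isValidAttorneyTitleSketch("Paralegal"));       // false
console.log(isValidAttorneyTitleSketch("Legal Assistant")); // false
```

Exact matching is the strictest choice: it rejects compound titles such as "Partner, Litigation", which a substring-based filter would accept. Which trade-off the production function makes should be confirmed against the script.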
Seniority Derivation
The seniority column on the attorneys table is derived from the job title:
function deriveSeniority(title: string): string {
  const lower = title.toLowerCase();
  if (lower.includes("equity partner")) return "equity_partner";
  if (lower.includes("partner")) return "partner";
  if (lower.includes("counsel")) return "counsel";
  if (lower.includes("senior associate")) return "senior_associate";
  if (lower.includes("associate")) return "associate";
  return "other";
}
The database column is seniority -- NOT seniority_level. Using seniority_level will cause a query error.
Hierarchy (highest to lowest): equity_partner > partner > counsel > senior_associate > associate > other
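To make the substring checks and the hierarchy concrete, the sketch below repeats deriveSeniority() as shown above and adds a ranking helper. SENIORITY_ORDER and seniorityRank are hypothetical names introduced for illustration only.

```typescript
// Copy of deriveSeniority() from this page so the example runs on its own.
function deriveSeniority(title: string): string {
  const lower = title.toLowerCase();
  if (lower.includes("equity partner")) return "equity_partner";
  if (lower.includes("partner")) return "partner";
  if (lower.includes("counsel")) return "counsel";
  if (lower.includes("senior associate")) return "senior_associate";
  if (lower.includes("associate")) return "associate";
  return "other";
}

// Hypothetical helper: lowest to highest, following the documented hierarchy.
const SENIORITY_ORDER: readonly string[] = [
  "other", "associate", "senior_associate", "counsel", "partner", "equity_partner",
];

function seniorityRank(seniority: string): number {
  const index = SENIORITY_ORDER.indexOf(seniority);
  return index === -1 ? 0 : index; // unknown values rank with "other"
}

console.log(deriveSeniority("Managing Partner"));   // partner
console.log(deriveSeniority("Of Counsel"));         // counsel
console.log(deriveSeniority("Managing Associate")); // associate
console.log(deriveSeniority("Shareholder"));        // other
console.log(seniorityRank("partner") > seniorityRank("counsel")); // true
```

Tracing the function as written, some titles from the Valid Title Tiers table land in a different bucket than their tier suggests: "Managing Associate" derives to associate rather than senior_associate, and "Shareholder", "Member", "Firm Chair", "Attorney", and "Lawyer" derive to other because none of the checked substrings appear in them. Whether that is intended should be checked against the script.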
Data Extraction
The employee-finder response contains rich data that gets mapped to attorneys table columns:
From Employee-Finder (no extra API calls)
| Field | Source |
|---|---|
| full_name, first_name, last_name | Direct fields |
| linkedin_url | linkedin_url (unique key) |
| linkedin_headline | headline |
| linkedin_about | about_me |
| linkedin_connections | connections_count |
| job_title | Current experience job_title |
| job_start_date | Current experience job_start_date |
| city, state, country | location object |
| law_school, law_school_year, law_school_degree | Education entries (filtered by known law schools) |
| undergrad_school, undergrad_year, undergrad_degree | Non-law-school education entries |
| other_education | All remaining education entries |
| prior_firms | Non-current experiences at law firms |
| linkedin_work_history | Full experiences array (JSONB) |
| years_experience | Calculated from earliest experience start date |
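The years_experience calculation in the last row can be sketched as follows. The ExperienceSketch shape, the assumption that job_start_date is an ISO date string on every experience entry, and the choice to count whole years are all illustrative; the pipeline's actual rounding is not documented here.

```typescript
// Hypothetical shape: only the field this calculation needs.
interface ExperienceSketch {
  job_start_date?: string | null; // assumed ISO date, e.g. "2015-09-01"
}

function yearsExperienceSketch(experiences: ExperienceSketch[], asOf: Date): number | null {
  // Keep only entries with a parseable start date.
  const starts = experiences
    .map((e) => (e.job_start_date ? Date.parse(e.job_start_date) : NaN))
    .filter((t) => !Number.isNaN(t));
  if (starts.length === 0) return null;

  const earliest = new Date(Math.min(...starts));
  let years = asOf.getUTCFullYear() - earliest.getUTCFullYear();

  // Subtract one if this year's anniversary of the start date has not been reached.
  const beforeAnniversary =
    asOf.getUTCMonth() < earliest.getUTCMonth() ||
    (asOf.getUTCMonth() === earliest.getUTCMonth() &&
      asOf.getUTCDate() < earliest.getUTCDate());
  if (beforeAnniversary) years -= 1;

  return Math.max(0, years);
}

const sample: ExperienceSketch[] = [
  { job_start_date: "2019-01-15" },
  { job_start_date: "2015-09-01" },
  { job_start_date: null },
];
console.log(yearsExperienceSketch(sample, new Date("2026-02-10T00:00:00Z"))); // 10
```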
From Email Enrichment (+1 credit if found)
| Field | Source |
|---|---|
| email_primary | email response field |
| email_status | email_status (valid, invalid, catch_all, unknown) |
| email_source | "blitz_v2" |
| email_work | Same as email_primary (work email assumed) |
| email_all | Array of all found emails |
From Phone Enrichment (+5 credits if found)
| Field | Source |
|---|---|
| phone_mobile | phone response field |
| phone_mobile_found | Boolean: phone was found |
| source_phone | "blitz_v2" |
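The credit figures above give a simple upper bound for budgeting contact enrichment before a run. The sketch below assumes every lookup succeeds (credits are only charged when a result is found, so actual usage should be lower) and does not include any cost for the employee-finder call itself, which is not stated on this page. The function and option names are hypothetical.

```typescript
const EMAIL_CREDITS_IF_FOUND = 1; // from "From Email Enrichment (+1 credit if found)"
const PHONE_CREDITS_IF_FOUND = 5; // from "From Phone Enrichment (+5 credits if found)"

interface ContactEnrichmentOptions {
  skipEmail: boolean; // mirrors --skip-email
  skipPhone: boolean; // mirrors --skip-phone
}

// Worst case: every attorney has both an email and a phone number found.
function maxContactCredits(attorneyCount: number, options: ContactEnrichmentOptions): number {
  const perAttorney =
    (options.skipEmail ? 0 : EMAIL_CREDITS_IF_FOUND) +
    (options.skipPhone ? 0 : PHONE_CREDITS_IF_FOUND);
  return attorneyCount * perAttorney;
}

console.log(maxContactCredits(100, { skipEmail: false, skipPhone: false })); // 600
console.log(maxContactCredits(100, { skipEmail: false, skipPhone: true }));  // 100
```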
Checkpoint System
The pipeline saves progress after each firm to .enrichment-progress.json:
{
"completed_firms": ["uuid-1", "uuid-2"],
"current_firm": "uuid-3",
"current_firm_name": "Kirkland & Ellis",
"completed_attorneys": 45,
"total_credits_used": 312,
"last_updated": "2026-02-10T15:30:00Z"
}
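Use --resume to continue from where a previous run stopped; firms listed in completed_firms are skipped. Without --resume the checkpoint is not consulted and every firm is processed again. Because records are upserted on linkedin_url, a repeat run updates existing rows rather than duplicating them, but it still makes the API calls and spends credits.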
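The resume behavior can be sketched with two pure functions over the checkpoint shape shown above. File I/O is left out, and firmsToProcess and markFirmCompleted are hypothetical names rather than functions taken from the script.

```typescript
// Shape of .enrichment-progress.json as documented above.
interface EnrichmentProgress {
  completed_firms: string[];
  current_firm: string | null;
  current_firm_name: string | null;
  completed_attorneys: number;
  total_credits_used: number;
  last_updated: string;
}

// With --resume, skip firms already recorded as completed; otherwise process all.
function firmsToProcess(
  allFirmIds: string[],
  progress: EnrichmentProgress | null,
  resume: boolean,
): string[] {
  if (!resume || progress === null) return allFirmIds;
  const done = new Set(progress.completed_firms);
  return allFirmIds.filter((id) => !done.has(id));
}

// Return an updated checkpoint after a firm finishes (the input is not mutated).
function markFirmCompleted(
  progress: EnrichmentProgress,
  firmId: string,
  attorneysAdded: number,
  creditsUsed: number,
  now: Date,
): EnrichmentProgress {
  return {
    ...progress,
    completed_firms: [...progress.completed_firms, firmId],
    current_firm: null,
    current_firm_name: null,
    completed_attorneys: progress.completed_attorneys + attorneysAdded,
    total_credits_used: progress.total_credits_used + creditsUsed,
    last_updated: now.toISOString(),
  };
}

const checkpoint: EnrichmentProgress = {
  completed_firms: ["uuid-1", "uuid-2"],
  current_firm: "uuid-3",
  current_firm_name: "Kirkland & Ellis",
  completed_attorneys: 45,
  total_credits_used: 312,
  last_updated: "2026-02-10T15:30:00Z",
};
console.log(firmsToProcess(["uuid-1", "uuid-2", "uuid-3", "uuid-4"], checkpoint, true));
// [ 'uuid-3', 'uuid-4' ]
```

In this sketch the firm that was in progress when the run stopped (uuid-3) is processed again from the start, which is safe because of the upsert on linkedin_url.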
Rate Limits
| Operation | Delay | Reason |
|---|---|---|
| Between attorneys | 700ms | Blitz API rate limit |
| Between firms | 3000ms | Courtesy delay + checkpoint save |
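These delays put a floor on how long a run takes, independent of API latency. The sketch below estimates that floor, assuming the delays apply strictly between items (n - 1 gaps for n items); the script may instead sleep after every item, in which case the figure is slightly higher. The function name is hypothetical.

```typescript
const DELAY_BETWEEN_ATTORNEYS_MS = 700;
const DELAY_BETWEEN_FIRMS_MS = 3000;

// attorneysPerFirm: number of attorneys to enrich at each firm, in run order.
function minimumDelayMs(attorneysPerFirm: number[]): number {
  // One gap between each consecutive pair of attorneys within a firm.
  const attorneyGaps = attorneysPerFirm.reduce(
    (total, count) => total + Math.max(0, count - 1),
    0,
  );
  // One gap between each consecutive pair of firms.
  const firmGaps = Math.max(0, attorneysPerFirm.length - 1);
  return attorneyGaps * DELAY_BETWEEN_ATTORNEYS_MS + firmGaps * DELAY_BETWEEN_FIRMS_MS;
}

console.log(minimumDelayMs([5, 3])); // 7200
console.log(minimumDelayMs([100]));  // 69300
```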
Upsert Strategy
All attorney records are upserted using linkedin_url as the conflict key:
INSERT INTO attorneys (linkedin_url, full_name, ...)
ON CONFLICT (linkedin_url)
DO UPDATE SET full_name = EXCLUDED.full_name, ...
This means re-running enrichment for a firm updates existing records rather than creating duplicates.
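As an illustration of the statement's full shape, the sketch below builds the upsert SQL for a given column list using PostgreSQL-style positional placeholders ($1, $2, ...). buildAttorneyUpsert is a hypothetical helper, not the script's own code, and the column names must come from trusted code since they are interpolated directly.

```typescript
// Hypothetical helper: builds an upsert keyed on linkedin_url.
// Column names are interpolated as-is, so never pass user input here.
function buildAttorneyUpsert(columns: string[]): string {
  if (!columns.includes("linkedin_url")) {
    throw new Error("linkedin_url is required: it is the conflict key");
  }
  const placeholders = columns.map((_, i) => `$${i + 1}`);
  const updates = columns
    .filter((column) => column !== "linkedin_url")
    .map((column) => `${column} = EXCLUDED.${column}`);

  // With nothing besides the key to update, fall back to DO NOTHING.
  const conflictAction =
    updates.length > 0 ? `DO UPDATE SET ${updates.join(", ")}` : "DO NOTHING";

  return [
    `INSERT INTO attorneys (${columns.join(", ")})`,
    `VALUES (${placeholders.join(", ")})`,
    "ON CONFLICT (linkedin_url)",
    conflictAction,
  ].join("\n");
}

console.log(buildAttorneyUpsert(["linkedin_url", "full_name", "job_title"]));
```

The values are then passed separately as query parameters in the same order as the columns, which keeps attorney data out of the SQL string itself.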
Fields Populated vs Remaining
The enrichment pipeline populates 42 of 76 columns on the attorneys table. The remaining 34 fields require firm website scraping (a separate future pipeline).
Populated by Enrichment (42 fields)
Identity, position, LinkedIn data, location, firm details, contact info (email/phone), education, career history, tracking metadata.
Requires Website Scraping (34 fields)
Practice areas, bar admissions, rankings/awards, publications, office phone, firm website profile URL, ICP scoring, and verification flags.
Profile URL Pattern Detection
After enrichment, run the pattern detection script to discover how each firm structures attorney profile URLs on their website:
npx tsx scripts/detect-profile-patterns.ts # Test all firms
npx tsx scripts/detect-profile-patterns.ts --firm "Kirkland" # One firm
npx tsx scripts/detect-profile-patterns.ts --save # Save to DB
npx tsx scripts/detect-profile-patterns.ts --skip-existing # Skip known patterns
npx tsx scripts/detect-profile-patterns.ts --concurrency 5 # Parallel firms
Known Patterns
| Firm | URL Pattern |
|---|---|
| Most firms | /people/{first}-{last} |
| Kirkland & Ellis | /lawyers/{last_initial}/{last}-{first} |
| Gibson Dunn | /lawyers/{last} |
| Latham & Watkins | /people/{last} |
| Cleary Gottlieb | /professionals/{first}-{last} |
| WilmerHale | /bio/{first}-{last} |
The detected pattern is saved to firms.profile_url_pattern and used by future website scraping pipelines.
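A stored pattern can be expanded into a candidate profile path by substituting the placeholders. The sketch below is an assumption about how that substitution might work: buildProfilePath is a hypothetical name, and the lowercasing and hyphenation rules are guesses that individual firm sites may not follow.

```typescript
// Hypothetical helper: expands {first}, {last}, and {last_initial} in a pattern.
function buildProfilePath(pattern: string, firstName: string, lastName: string): string {
  // Assumed normalization: lowercase, non-alphanumeric runs become single hyphens.
  const slug = (value: string): string =>
    value
      .trim()
      .toLowerCase()
      .replace(/[^a-z0-9]+/g, "-")
      .replace(/^-+|-+$/g, "");

  const first = slug(firstName);
  const last = slug(lastName);
  const values = new Map<string, string>([
    ["first", first],
    ["last", last],
    ["last_initial", last.charAt(0)],
  ]);

  // Unknown placeholders are left untouched so they are easy to spot.
  return pattern.replace(/\{(\w+)\}/g, (match: string, key: string) => values.get(key) ?? match);
}

console.log(buildProfilePath("/lawyers/{last_initial}/{last}-{first}", "Jane", "Doe"));
// /lawyers/d/doe-jane
console.log(buildProfilePath("/people/{first}-{last}", "Mary Ann", "O'Neil"));
// /people/mary-ann-o-neil
```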
Monitoring
After an enrichment run, check results:
-- Credits used by firm
SELECT firm_name, COUNT(*) as attorneys, SUM(credits_used) as credits
FROM attorneys
WHERE enriched_at > NOW() - INTERVAL '1 day'
GROUP BY firm_name
ORDER BY credits DESC;
-- Enrichment coverage
SELECT enrichment_status, COUNT(*)
FROM attorneys
GROUP BY enrichment_status;
-- Email and phone hit rate
SELECT
COUNT(*) FILTER (WHERE email_primary IS NOT NULL) as with_email,
COUNT(*) FILTER (WHERE phone_mobile IS NOT NULL) as with_phone,
COUNT(*) as total
FROM attorneys
WHERE enriched_at > NOW() - INTERVAL '1 day';
Related Documentation
- Blitz API Reference -- Endpoint details, field names, and error handling
- Apollo.io -- Alternative enrichment for company-level data