Enrichment Pipeline

The attorney enrichment pipeline discovers attorneys at law firms, enriches their contact information, and populates the attorneys table. It is the core data acquisition process for Client Portal's market intelligence.

Pipeline Overview

Script Usage

# Full run -- all firms with LinkedIn URLs
npx tsx scripts/enrich-attorneys.ts

# Test mode -- 1 firm, 5 attorneys max
npx tsx scripts/enrich-attorneys.ts --test

# Specific firm by name (partial match)
npx tsx scripts/enrich-attorneys.ts --firm "Kirkland"

# By tier
npx tsx scripts/enrich-attorneys.ts --tier amlaw_10

# Max attorneys per firm
npx tsx scripts/enrich-attorneys.ts --max 100

# Skip expensive phone enrichment (saves 5 credits/attorney)
npx tsx scripts/enrich-attorneys.ts --skip-phone

# Skip email enrichment
npx tsx scripts/enrich-attorneys.ts --skip-email

# Resume from last checkpoint
npx tsx scripts/enrich-attorneys.ts --resume

# Preview only -- no API calls, no DB writes
npx tsx scripts/enrich-attorneys.ts --dry-run
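
The flags above could be collected into a single options object along the following lines. This is a minimal sketch, not the script's actual parser: the EnrichOptions shape and the parseEnrichArgs name are hypothetical, and only the flags listed above are recognized.

```typescript
// Hypothetical options shape; field names are illustrative, not taken from the script.
interface EnrichOptions {
  test: boolean;
  firm?: string;
  tier?: string;
  max?: number;
  skipPhone: boolean;
  skipEmail: boolean;
  resume: boolean;
  dryRun: boolean;
}

function parseEnrichArgs(argv: string[]): EnrichOptions {
  const opts: EnrichOptions = {
    test: false,
    skipPhone: false,
    skipEmail: false,
    resume: false,
    dryRun: false,
  };
  let i = 0;
  while (i < argv.length) {
    const flag = argv[i++];
    // Value-taking flags consume the token that follows them.
    const takeValue = (): string => {
      const value = argv[i++];
      if (value === undefined || value.startsWith("--")) {
        throw new Error(`Missing value for ${flag}`);
      }
      return value;
    };
    switch (flag) {
      case "--test": opts.test = true; break;
      case "--firm": opts.firm = takeValue(); break;
      case "--tier": opts.tier = takeValue(); break;
      case "--max": {
        const n = Number(takeValue());
        if (!Number.isInteger(n) || n <= 0) {
          throw new Error("--max expects a positive integer");
        }
        opts.max = n;
        break;
      }
      case "--skip-phone": opts.skipPhone = true; break;
      case "--skip-email": opts.skipEmail = true; break;
      case "--resume": opts.resume = true; break;
      case "--dry-run": opts.dryRun = true; break;
      default: throw new Error(`Unknown flag: ${flag}`);
    }
  }
  return opts;
}
```

In a Node script this would typically be called as parseEnrichArgs(process.argv.slice(2)). Rejecting unknown flags makes a typo such as --skip-phones fail loudly instead of silently spending phone credits.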

Title Filtering

The isValidAttorneyTitle() function filters employee-finder results to actual attorneys. GHL's job_function: ["Legal"] filter returns legal staff broadly (including paralegals, legal assistants, etc.), so post-filtering is essential.

Valid Title Tiers

| Tier | Titles |
| --- | --- |
| Associate | Associate, Attorney, Lawyer, Junior Associate |
| Senior Associate | Senior Associate, Managing Associate, Principal Associate, Career Associate |
| Counsel | Counsel, Of Counsel, Senior Counsel, Special Counsel, Principal Counsel |
| Partner | Income Partner, Partner, Managing Partner, Office Managing Partner, Firm Chair, Executive Partner, Shareholder, Member |
| Equity Partner | Equity Partner |

Source: /Users/kaihatchman/Desktop/Contact Criteria - Blitz API - Contact Information.csv
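
The body of isValidAttorneyTitle() is not shown here, so the following is only a sketch of how the tier table could drive the filter. The matchTitleTier name and the word-boundary, longest-phrase-first matching rule are assumptions; only the tier names and titles come from the table above.

```typescript
// Tier table copied from the document; the matching rule below is an assumption.
const TITLE_TIERS: Record<string, string[]> = {
  "Associate": ["Associate", "Attorney", "Lawyer", "Junior Associate"],
  "Senior Associate": [
    "Senior Associate", "Managing Associate", "Principal Associate", "Career Associate",
  ],
  "Counsel": [
    "Counsel", "Of Counsel", "Senior Counsel", "Special Counsel", "Principal Counsel",
  ],
  "Partner": [
    "Income Partner", "Partner", "Managing Partner", "Office Managing Partner",
    "Firm Chair", "Executive Partner", "Shareholder", "Member",
  ],
  "Equity Partner": ["Equity Partner"],
};

// Flatten to [phrase, tier] pairs, longest phrase first, so that
// "senior associate" is tried before the shorter "associate".
const TITLE_PHRASES: Array<[string, string]> = Object.entries(TITLE_TIERS)
  .flatMap(([tier, titles]) =>
    titles.map((t): [string, string] => [t.toLowerCase(), tier]))
  .sort((a, b) => b[0].length - a[0].length);

// Returns the tier for a job title, or null when no valid title phrase appears.
function matchTitleTier(title: string): string | null {
  const lower = title.toLowerCase();
  for (const [phrase, tier] of TITLE_PHRASES) {
    // Phrases contain only letters and spaces, so no regex escaping is needed.
    if (new RegExp(`\\b${phrase}\\b`).test(lower)) return tier;
  }
  return null;
}
```

A title such as "Paralegal" or "Legal Assistant" matches no phrase and is rejected. Short phrases like "Member" can still match unrelated titles (for example "Staff Member"), so a real filter would likely need an exclusion list as well.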

Seniority Derivation

The seniority column on the attorneys table is derived from the job title:

function deriveSeniority(title: string): string {
  const lower = title.toLowerCase();
  // Order matters: a specific phrase must be checked before the substring it contains.
  if (lower.includes("equity partner")) return "equity_partner";
  if (lower.includes("partner")) return "partner";
  if (lower.includes("counsel")) return "counsel";
  if (lower.includes("senior associate")) return "senior_associate";
  if (lower.includes("associate")) return "associate";
  // Titles with none of the keywords above (e.g. "Shareholder", "Attorney") land here.
  return "other";
}

Column Name

The database column is seniority, not seniority_level. Using seniority_level will cause a query error.

Hierarchy (highest to lowest): equity_partner > partner > counsel > senior_associate > associate > other
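
One way to make this hierarchy usable in code is to encode it as an ordered list and rank values by position. The sketch below assumes nothing beyond the six seniority values listed above; the helper names are hypothetical.

```typescript
// Ordered from most senior to least senior, as in the hierarchy above.
const SENIORITY_ORDER = [
  "equity_partner",
  "partner",
  "counsel",
  "senior_associate",
  "associate",
  "other",
] as const;

type Seniority = (typeof SENIORITY_ORDER)[number];

// Lower rank number means more senior.
function seniorityRank(s: Seniority): number {
  return SENIORITY_ORDER.indexOf(s);
}

// Returns a new array sorted from most senior to least senior.
function sortBySeniority<T extends { seniority: Seniority }>(rows: T[]): T[] {
  return [...rows].sort(
    (a, b) => seniorityRank(a.seniority) - seniorityRank(b.seniority),
  );
}
```

Keeping the order in one constant means a ranking change only has to be made in one place.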

Data Extraction

The employee-finder response contains rich data that gets mapped to attorneys table columns:

From Employee-Finder (no extra API calls)

| Field | Source |
| --- | --- |
| full_name, first_name, last_name | Direct fields |
| linkedin_url | linkedin_url (unique key) |
| linkedin_headline | headline |
| linkedin_about | about_me |
| linkedin_connections | connections_count |
| job_title | Current experience job_title |
| job_start_date | Current experience job_start_date |
| city, state, country | location object |
| law_school, law_school_year, law_school_degree | Education entries (filtered by known law schools) |
| undergrad_school, undergrad_year, undergrad_degree | Non-law-school education entries |
| other_education | All remaining education entries |
| prior_firms | Non-current experiences at law firms |
| linkedin_work_history | Full experiences array (JSONB) |
| years_experience | Calculated from earliest experience start date |
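
As an illustration of the derived years_experience field, the sketch below counts whole years since the earliest experience start date. The Experience shape, the ISO date-string format, and the rounding-down behaviour are assumptions; the pipeline's actual calculation is not shown in this document.

```typescript
// Assumed shape: only the start date matters for this calculation.
interface Experience {
  job_start_date?: string | null; // assumed to be an ISO date string, e.g. "2010-06-01"
}

// Whole years between the earliest parseable start date and `now`.
// Returns null when no experience has a usable start date.
function yearsExperience(experiences: Experience[], now: Date = new Date()): number | null {
  const starts = experiences
    .map((e) => (e.job_start_date ? Date.parse(e.job_start_date) : NaN))
    .filter((t) => !Number.isNaN(t));
  if (starts.length === 0) return null;

  const earliest = new Date(Math.min(...starts));
  let years = now.getUTCFullYear() - earliest.getUTCFullYear();
  // Subtract one if this year's anniversary has not been reached yet.
  const beforeAnniversary =
    now.getUTCMonth() < earliest.getUTCMonth() ||
    (now.getUTCMonth() === earliest.getUTCMonth() &&
      now.getUTCDate() < earliest.getUTCDate());
  if (beforeAnniversary) years -= 1;
  return Math.max(years, 0);
}
```

Passing `now` explicitly keeps the function deterministic in tests.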

From Email Enrichment (+1 credit if found)

| Field | Source |
| --- | --- |
| email_primary | email response field |
| email_status | email_status (valid, invalid, catch_all, unknown) |
| email_source | "blitz_v2" |
| email_work | Same as email_primary (work email assumed) |
| email_all | Array of all found emails |

From Phone Enrichment (+5 credits if found)

| Field | Source |
| --- | --- |
| phone_mobile | phone response field |
| phone_mobile_found | Boolean: phone was found |
| source_phone | "blitz_v2" |
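
The credit costs quoted above (1 credit per email found, 5 per phone found) can be turned into a simple tally. This sketch covers contact enrichment only; the cost of the employee-finder call itself is not stated in this document and is deliberately left out. The type and function names are hypothetical.

```typescript
const EMAIL_CREDITS = 1; // charged only when an email is found
const PHONE_CREDITS = 5; // charged only when a phone number is found

// Hypothetical per-attorney outcome of the two enrichment calls.
interface EnrichmentOutcome {
  emailFound: boolean;
  phoneFound: boolean;
}

// Sums contact-enrichment credits across a batch of attorneys.
function contactCredits(outcomes: EnrichmentOutcome[]): number {
  return outcomes.reduce(
    (sum, o) =>
      sum +
      (o.emailFound ? EMAIL_CREDITS : 0) +
      (o.phoneFound ? PHONE_CREDITS : 0),
    0,
  );
}
```

With --skip-phone every outcome has phoneFound set to false, which caps the contact cost at 1 credit per attorney.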

Checkpoint System

The pipeline saves progress after each firm to .enrichment-progress.json:

{
  "completed_firms": ["uuid-1", "uuid-2"],
  "current_firm": "uuid-3",
  "current_firm_name": "Kirkland & Ellis",
  "completed_attorneys": 45,
  "total_credits_used": 312,
  "last_updated": "2026-02-10T15:30:00Z"
}

Use --resume to continue from where a previous run stopped. The checkpoint tracks completed firms by ID, and --resume skips those firms. Re-running without --resume re-processes all firms; existing records are then updated through the upsert on linkedin_url rather than duplicated.
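
The resume behaviour can be sketched as two pure functions over the checkpoint object. The field names match the JSON above; everything else is an assumption, including the Firm shape, the function names, the nullable current_firm fields, and the idea that completed_attorneys and total_credits_used are running totals. Reading and writing .enrichment-progress.json is left out.

```typescript
// Field names follow the checkpoint file shown above.
interface Checkpoint {
  completed_firms: string[];
  current_firm: string | null;
  current_firm_name: string | null;
  completed_attorneys: number;
  total_credits_used: number;
  last_updated: string;
}

// Hypothetical minimal firm record.
interface Firm {
  id: string;
  name: string;
}

// With --resume, skip firms already listed in the checkpoint; otherwise process all.
function firmsToProcess(all: Firm[], checkpoint: Checkpoint | null, resume: boolean): Firm[] {
  if (!resume || checkpoint === null) return all;
  const done = new Set(checkpoint.completed_firms);
  return all.filter((f) => !done.has(f.id));
}

// Returns the checkpoint to save after a firm finishes (does not mutate the input).
function markFirmCompleted(
  cp: Checkpoint,
  firm: Firm,
  attorneys: number,
  credits: number,
  now: Date = new Date(),
): Checkpoint {
  return {
    ...cp,
    completed_firms: cp.completed_firms.includes(firm.id)
      ? cp.completed_firms
      : [...cp.completed_firms, firm.id],
    current_firm: null,
    current_firm_name: null,
    completed_attorneys: cp.completed_attorneys + attorneys,
    total_credits_used: cp.total_credits_used + credits,
    last_updated: now.toISOString(),
  };
}
```

Keeping these functions free of file I/O makes the skip logic easy to test; the caller would serialize the returned object with JSON.stringify after each firm.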

Rate Limits

| Operation | Delay | Reason |
| --- | --- | --- |
| Between attorneys | 700 ms | Blitz API rate limit |
| Between firms | 3000 ms | Courtesy delay + checkpoint save |
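
These delays put a floor on how long a run can take, independent of API latency. The sketch below estimates that floor, assuming the delays apply only between consecutive items (no delay after the last attorney of a firm or after the last firm); the function name is hypothetical.

```typescript
const ATTORNEY_DELAY_MS = 700; // between attorneys within a firm
const FIRM_DELAY_MS = 3000; // between firms

// Minimum time spent sleeping for a run, given the attorney count of each firm.
function minimumDelayMs(attorneysPerFirm: number[]): number {
  const attorneyGaps = attorneysPerFirm.reduce(
    (sum, n) => sum + Math.max(n - 1, 0),
    0,
  );
  const firmGaps = Math.max(attorneysPerFirm.length - 1, 0);
  return attorneyGaps * ATTORNEY_DELAY_MS + firmGaps * FIRM_DELAY_MS;
}
```

For a single firm capped at 100 attorneys (--max 100), this gives 99 gaps of 700 ms, or roughly 69 seconds of waiting before any request time is counted.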

Upsert Strategy

All attorney records are upserted using linkedin_url as the conflict key:

INSERT INTO attorneys (linkedin_url, full_name, ...)
VALUES ($1, $2, ...)
ON CONFLICT (linkedin_url)
DO UPDATE SET full_name = EXCLUDED.full_name, ...;

This means re-running enrichment for a firm updates existing records rather than creating duplicates.
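
Because the real statement covers many columns, it can help to generate it from a column list. The following is a sketch of that idea, not the pipeline's own code: buildUpsertSql is a hypothetical helper that emits PostgreSQL-style numbered placeholders and updates every column except the conflict key.

```typescript
// Builds a parameterized upsert. Table and column names are interpolated
// directly, so they must be trusted constants, never user input.
function buildUpsertSql(table: string, columns: string[], conflictKey: string): string {
  if (!columns.includes(conflictKey)) {
    throw new Error(`Conflict key ${conflictKey} must be one of the columns`);
  }
  const placeholders = columns.map((_, i) => `$${i + 1}`).join(", ");
  const updates = columns
    .filter((c) => c !== conflictKey)
    .map((c) => `${c} = EXCLUDED.${c}`)
    .join(", ");
  return (
    `INSERT INTO ${table} (${columns.join(", ")}) ` +
    `VALUES (${placeholders}) ` +
    `ON CONFLICT (${conflictKey}) ` +
    `DO UPDATE SET ${updates}`
  );
}
```

The values themselves would be passed separately, in the same order as the column list, to whichever database client executes the query.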

Fields Populated vs Remaining

The enrichment pipeline populates 42 of 76 columns on the attorneys table. The remaining 34 fields require firm website scraping (a separate future pipeline).

Populated by Enrichment (42 fields)

Identity, position, LinkedIn data, location, firm details, contact info (email/phone), education, career history, tracking metadata.

Requires Website Scraping (34 fields)

Practice areas, bar admissions, rankings/awards, publications, office phone, firm website profile URL, ICP scoring, and verification flags.

Profile URL Pattern Detection

After enrichment, run the pattern detection script to discover how each firm structures attorney profile URLs on their website:

npx tsx scripts/detect-profile-patterns.ts                    # Test all firms
npx tsx scripts/detect-profile-patterns.ts --firm "Kirkland"  # One firm
npx tsx scripts/detect-profile-patterns.ts --save             # Save to DB
npx tsx scripts/detect-profile-patterns.ts --skip-existing    # Skip known patterns
npx tsx scripts/detect-profile-patterns.ts --concurrency 5    # Parallel firms

Known Patterns

| Firm | URL Pattern |
| --- | --- |
| Most firms | /people/{first}-{last} |
| Kirkland & Ellis | /lawyers/{last_initial}/{last}-{first} |
| Gibson Dunn | /lawyers/{last} |
| Latham & Watkins | /people/{last} |
| Cleary Gottlieb | /professionals/{first}-{last} |
| WilmerHale | /bio/{first}-{last} |

The detected pattern is saved to firms.profile_url_pattern and used by future website scraping pipelines.
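
A saved pattern can be expanded into a candidate path for a given attorney roughly as follows. This is a sketch under stated assumptions: the placeholder names come from the table above, but lowercasing, hyphenating non-alphanumeric characters, and the expandProfilePattern name are guesses about how a scraper might normalize names, not documented behaviour.

```typescript
// Lowercases and hyphenates a name part ("De La Cruz" becomes "de-la-cruz").
function slugifyNamePart(s: string): string {
  return s
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Fills {first}, {last} and {last_initial}; unknown placeholders are left as-is.
function expandProfilePattern(
  pattern: string,
  name: { first: string; last: string },
): string {
  const first = slugifyNamePart(name.first);
  const last = slugifyNamePart(name.last);
  const values = new Map<string, string>([
    ["first", first],
    ["last", last],
    ["last_initial", last.charAt(0)],
  ]);
  return pattern.replace(
    /\{(\w+)\}/g,
    (whole: string, key: string) => values.get(key) ?? whole,
  );
}
```

The result is a path only; it would still need to be joined to the firm's website origin and checked with a request before being trusted.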

Monitoring

After an enrichment run, check results:

-- Credits used by firm
SELECT firm_name, COUNT(*) as attorneys, SUM(credits_used) as credits
FROM attorneys
WHERE enriched_at > NOW() - INTERVAL '1 day'
GROUP BY firm_name
ORDER BY credits DESC;

-- Enrichment coverage
SELECT enrichment_status, COUNT(*)
FROM attorneys
GROUP BY enrichment_status;

-- Email and phone hit rates
SELECT
  COUNT(*) FILTER (WHERE email_primary IS NOT NULL) AS with_email,
  COUNT(*) FILTER (WHERE phone_mobile IS NOT NULL) AS with_phone,
  COUNT(*) AS total
FROM attorneys
WHERE enriched_at > NOW() - INTERVAL '1 day';
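
The last query returns raw counts; turning them into percentages is a small extra step. The sketch below assumes the three counts have already been converted to numbers, and the HitCounts and hitRates names are hypothetical.

```typescript
// Mirrors the column aliases of the hit-rate query above.
interface HitCounts {
  with_email: number;
  with_phone: number;
  total: number;
}

// Percentages rounded to one decimal place; 0 when nothing was enriched.
function hitRates(c: HitCounts): { emailPct: number; phonePct: number } {
  const pct = (n: number): number =>
    c.total === 0 ? 0 : Math.round((n / c.total) * 1000) / 10;
  return { emailPct: pct(c.with_email), phonePct: pct(c.with_phone) };
}
```

Guarding the zero-total case avoids a NaN result when no attorneys were enriched in the last day.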