Debugging Enrichment Issues
Troubleshooting guide for the attorney enrichment pipeline, Blitz API integration, and pattern detection.
422 from Blitz API
Symptoms: Email or phone enrichment returns HTTP 422 (Unprocessable Entity).
Cause: Using the wrong field name in the request body.
Fix: The Blitz API v2 email and phone endpoints require person_linkedin_url, not linkedin_profile_url:
// Correct (v2)
const response = await fetch('https://api.blitz-api.ai/v2/enrichment/email', {
  method: 'POST',
  headers: {
    'x-api-key': BLITZ_API_KEY,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    person_linkedin_url: 'https://www.linkedin.com/in/john-doe-12345/',
  }),
});

// Wrong -- causes 422
const response = await fetch('https://api.blitz-api.ai/v2/enrichment/email', {
  method: 'POST',
  headers: {
    'x-api-key': BLITZ_API_KEY,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    linkedin_profile_url: 'https://www.linkedin.com/in/john-doe-12345/',
  }),
});
This is the most common Blitz API issue. The v1 API used linkedin_profile_url but v2 changed it to person_linkedin_url.
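For callers that still carry v1-shaped payloads, a small guard can rename the field before the request is sent. The sketch below is illustrative only: toV2EnrichmentBody and the EnrichmentInput type are hypothetical names, not part of the Blitz API or the enrichment scripts, and the sketch assumes the field name is the only difference that matters.

```typescript
// Hypothetical input shape: either the v2 or the old v1 field name may be present.
type EnrichmentInput = {
  person_linkedin_url?: string;
  linkedin_profile_url?: string; // v1 name; v2 rejects it with a 422
};

// Hypothetical helper: always returns a v2-shaped request body.
function toV2EnrichmentBody(input: EnrichmentInput): { person_linkedin_url: string } {
  const url = input.person_linkedin_url ?? input.linkedin_profile_url;
  if (!url) {
    throw new Error('Missing LinkedIn URL: expected person_linkedin_url');
  }
  // Only the v2 field name is emitted, so the v1 name never reaches the API.
  return { person_linkedin_url: url };
}
```

The result can be passed straight to JSON.stringify when building the request body, so a stale v1 field name fails locally instead of surfacing as a 422.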
No Results from Employee-Finder
Symptoms: employee-finder returns an empty results array for a firm.
Possible causes:
1. Firm does not have a LinkedIn company page
Some firms may not have an active LinkedIn company page, or the URL in the firms table may be incorrect.
Verify:
SELECT name, linkedin_url FROM firms WHERE name ILIKE '%<firm-name>%';
Open the linkedin_url in a browser and verify it loads a valid company page.
2. Wrong company_linkedin_url
The employee-finder endpoint expects the full LinkedIn company URL:
// Correct
{ company_linkedin_url: 'https://www.linkedin.com/company/kirkland-ellis-llp' }
// Wrong -- using a person URL instead of company URL
{ company_linkedin_url: 'https://www.linkedin.com/in/john-doe' }
3. No employees with job_function "Legal"
The enrichment script filters with job_function: ["Legal"]. If the firm's employees are not tagged with this job function on LinkedIn, results will be empty. This is rare for law firms but possible for smaller or newer firms.
Workaround: Try without the job function filter and inspect results manually.
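Cause 2 (a person URL or a bare slug passed as company_linkedin_url) can be caught before spending a request. The following is a minimal sketch, assuming company pages always live under /company/<slug>; isCompanyLinkedInUrl is a hypothetical helper, not something the enrichment script is known to provide.

```typescript
// Hypothetical pre-flight check: true only for full linkedin.com/company/<slug> URLs.
const COMPANY_URL_RE =
  /^https?:\/\/([a-z0-9-]+\.)?linkedin\.com\/company\/[^\/?#]+\/?(?:[?#].*)?$/i;

function isCompanyLinkedInUrl(url: string): boolean {
  return COMPANY_URL_RE.test(url.trim());
}
```

Running this over the linkedin_url column of the firms table before a batch is one way to flag rows that would otherwise come back with empty results.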
Pattern Detection Fails
Symptoms: detect-profile-patterns.ts returns no patterns for a firm or reports "not enough attorneys."
Cause: Pattern detection requires enriched attorneys in the database to test URL patterns. It needs at least 2 attorneys per firm to validate a pattern.
Verify attorney count:
SELECT f.name, count(a.id) AS attorney_count
FROM firms f
LEFT JOIN attorneys a ON a.firm_id = f.id
WHERE f.name ILIKE '%<firm-name>%'
GROUP BY f.name;
Fix: Run the enrichment pipeline first to populate attorneys, then run pattern detection:
# Step 1: Enrich the firm
npx tsx scripts/enrich-attorneys.ts --firm "<firm-name>"
# Step 2: Detect patterns
npx tsx scripts/detect-profile-patterns.ts --firm "<firm-name>" --save
Non-Standard URL Patterns
Some firms use unusual URL patterns that are not in the 37 tested patterns:
| Firm | Issue |
|---|---|
| Kirkland & Ellis | Uses /lawyers/{last_initial}/{last}-{first} (non-standard) |
| Skadden | JavaScript-rendered pages -- HTTP requests return empty HTML |
| Some boutiques | No attorney bio pages at all |
For JS-rendered pages, use scrape-attorney-profiles.ts with Puppeteer and the stealth plugin.
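To test a non-standard pattern by hand, the placeholders in the table can be expanded for a known attorney. This is a rough sketch: fillProfilePattern and slugify are hypothetical helpers, the placeholder names are taken from the Kirkland & Ellis row above, and lowercasing and hyphenating names is an assumption about how the firm builds its URLs.

```typescript
type NameParts = { first: string; last: string };

// Assumed slug rule: lowercase, non-alphanumerics collapsed to single hyphens.
function slugify(part: string): string {
  return part.trim().toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/^-+|-+$/g, '');
}

// Replaces {first}, {last}, {first_initial}, {last_initial}; unknown placeholders are left as-is.
function fillProfilePattern(pattern: string, name: NameParts): string {
  const first = slugify(name.first);
  const last = slugify(name.last);
  const values: Record<string, string> = {
    first,
    last,
    first_initial: first.charAt(0),
    last_initial: last.charAt(0),
  };
  return pattern.replace(/\{(\w+)\}/g, (match: string, key: string) => values[key] ?? match);
}

// Example: fillProfilePattern('/lawyers/{last_initial}/{last}-{first}', { first: 'John', last: 'Doe' })
// returns '/lawyers/d/doe-john'
```

If the generated path returns a 404 for several known attorneys, the slug rule is probably wrong for that firm rather than the pattern itself.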
Checkpoint Corruption
Symptoms: The --resume flag causes errors or processes the wrong firms, or the script crashes on startup while reading the checkpoint.
Cause: The .enrichment-progress.json file was partially written during a crash.
Fix: Delete the checkpoint file and restart:
rm .enrichment-progress.json
npx tsx scripts/enrich-attorneys.ts --tier amlaw_10
Inspect the checkpoint (if you want to salvage progress):
cat .enrichment-progress.json | python3 -m json.tool
The checkpoint contains:
- List of completed firm IDs
- List of completed attorney LinkedIn URLs
- Timestamp of last save
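Before deleting a checkpoint, it can help to confirm that it is actually unreadable. The sketch below validates the three pieces of data listed above; the key names (completedFirmIds, completedLinkedinUrls, savedAt) are hypothetical, so adjust them to whatever the real file uses.

```typescript
// Key names are assumptions; the real checkpoint file may use different ones.
interface Checkpoint {
  completedFirmIds: string[];
  completedLinkedinUrls: string[];
  savedAt: string;
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((v) => typeof v === 'string');
}

// Returns the parsed checkpoint, or null if the content is truncated or malformed.
function parseCheckpoint(raw: string): Checkpoint | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // invalid JSON, e.g. a partial write during a crash
  }
  if (typeof data !== 'object' || data === null) return null;
  const { completedFirmIds, completedLinkedinUrls, savedAt } = data as Record<string, unknown>;
  if (!isStringArray(completedFirmIds)) return null;
  if (!isStringArray(completedLinkedinUrls)) return null;
  if (typeof savedAt !== 'string') return null;
  return { completedFirmIds, completedLinkedinUrls, savedAt };
}
```

A null result means the file should be deleted as described above; a non-null result suggests the --resume problem lies elsewhere.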
Credit Tracking
Check remaining credits:
curl -H "x-api-key: $BLITZ_API_KEY" https://api.blitz-api.ai/api/blitz/key-info
Estimate cost before running:
| Operation | Cost per Unit |
|---|---|
| Employee-finder | 1 credit per result returned |
| Email enrichment | 1 credit (only charged if email found) |
| Phone enrichment | 5 credits (only charged if phone found) |
Typical cost per attorney: 2--7 credits.
For a full firm (~200 attorneys): 400--1,400 credits.
For all AmLaw 200 firms: Budget 80,000--280,000 credits (rough estimate).
Use --skip-phone to save up to 5 credits per attorney (phone enrichment is charged only when a number is found) when phone numbers are not immediately needed:
npx tsx scripts/enrich-attorneys.ts --tier amlaw_100 --skip-phone
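The arithmetic behind these estimates can be reproduced with a small calculator. This is a sketch under stated assumptions: the per-unit costs come from the table above, while estimateCredits and the hit-rate inputs (the fraction of attorneys for whom an email or phone number is found) are hypothetical and would have to come from your own past runs.

```typescript
// Per-unit costs copied from the cost table.
const COST = { employeeFinder: 1, email: 1, phone: 5 };

interface EstimateOptions {
  emailHitRate: number; // assumed fraction of attorneys with an email found, 0..1
  phoneHitRate: number; // assumed fraction of attorneys with a phone found, 0..1
  skipPhone?: boolean;  // mirrors the --skip-phone flag
}

function estimateCredits(attorneys: number, opts: EstimateOptions): number {
  const phone = opts.skipPhone ? 0 : COST.phone * opts.phoneHitRate;
  const perAttorney = COST.employeeFinder + COST.email * opts.emailHitRate + phone;
  return Math.ceil(attorneys * perAttorney);
}

// 200 attorneys, every email found, no phones found: 200 * (1 + 1) = 400
// 200 attorneys, every email and phone found:        200 * (1 + 1 + 5) = 1400
```

These two cases reproduce the 400--1,400 range quoted for a 200-attorney firm; real runs should land somewhere in between.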
JS-Rendered Firm Pages
Symptoms: scrape-attorney-profiles.ts returns empty or partial HTML for a firm's website. Pattern detection validates a URL pattern, but scraping gets no data.
Cause: Some firm websites (e.g., Skadden, some boutiques) render attorney bio pages with JavaScript. A simple HTTP GET returns an empty shell page.
Fix: Use Puppeteer with the puppeteer-extra-plugin-stealth package:
import puppeteer from 'puppeteer-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';

puppeteer.use(StealthPlugin());

const browser = await puppeteer.launch({ headless: true });
const page = await browser.newPage();
await page.goto(url, { waitUntil: 'networkidle0' });
const content = await page.content();
await browser.close(); // release the browser process once the HTML is captured
The networkidle0 option treats navigation as complete once there have been no network connections for at least 500 ms, which gives most JS-rendered content time to load.
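To decide whether a firm needs Puppeteer at all, the HTML from a plain HTTP GET can be checked for visible text first. The heuristic below is only a sketch: looksLikeEmptyShell is a hypothetical helper, and the 200-character threshold is an arbitrary assumption that may need tuning per site.

```typescript
// Heuristic: strip scripts, styles, and tags, then measure the remaining visible text.
function looksLikeEmptyShell(html: string, minTextChars = 200): boolean {
  const text = html
    .replace(/<script[\s\S]*?<\/script>/gi, ' ')
    .replace(/<style[\s\S]*?<\/style>/gi, ' ')
    .replace(/<[^>]+>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
  return text.length < minTextChars;
}
```

A true result suggests the page is rendered client-side and is a candidate for the Puppeteer path; a false result does not guarantee that the bio content itself is present.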
Rate Limiting
Symptoms: Blitz API returns 429 Too Many Requests, or responses become empty/slow.
Built-in rate limits in the enrichment script:
- 700ms between individual attorney enrichment calls
- 3 seconds between firms
Do not reduce these values. The Blitz API will throttle or block your API key if you exceed its rate limits.
If you are being rate-limited:
- Wait 5--10 minutes for the rate limit window to reset
- Check if another process is also making Blitz API calls
- Use --resume to continue from where the script left off
- If persistent, contact Blitz API support to check your rate limit tier
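If you wrap your own calls to the API, a backoff policy for 429 responses can complement the fixed delays. The sketch below is an assumption-heavy illustration: retryDelayMs is a hypothetical helper, it is not confirmed that Blitz sends a Retry-After header, and the 5-second base and 10-minute cap are chosen only to match the wait times suggested above.

```typescript
// Hypothetical policy: honor a numeric Retry-After value (seconds) when present,
// otherwise back off exponentially from 5 s, capped at 10 minutes.
function retryDelayMs(attempt: number, retryAfter?: string | null): number {
  const seconds = Number(retryAfter);
  if (retryAfter && Number.isFinite(seconds) && seconds >= 0) {
    return seconds * 1000;
  }
  const baseMs = 5000;
  const capMs = 10 * 60 * 1000;
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```

With attempt counted from 0, the delays grow as 5 s, 10 s, 20 s, and so on until they reach the cap, so a persistent 429 ends up waiting roughly the 10 minutes recommended above.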
email_status Insert Failure
Symptoms: Enrichment script fails when inserting an attorney with email_status set to an invalid value.
Cause: The email_status enum only accepts 4 values:
| Valid Values |
|---|
| valid |
| invalid |
| catch_all |
| unknown |
The Blitz API may return status strings that do not map directly. The enrichment script should normalize the response:
const normalizeEmailStatus = (status: string): string => {
  const normalized = status.toLowerCase();
  if (['valid', 'invalid', 'catch_all', 'unknown'].includes(normalized)) {
    return normalized;
  }
  return 'unknown'; // Default for any unrecognized status
};
Quick Diagnostic Checklist
When an enrichment run fails, check these in order:
- API key valid? -- curl -H "x-api-key: $BLITZ_API_KEY" https://api.blitz-api.ai/api/blitz/key-info
- Credits remaining? -- Same endpoint, check the balance
- Supabase connection? -- Check VITE_SUPABASE_URL and SUPABASE_SERVICE_ROLE_KEY are set
- Correct field names? -- person_linkedin_url for v2 (not linkedin_profile_url)
- Correct column names? -- seniority (not seniority_level), email_status (valid enum values only)
- Rate limited? -- Check for 429 responses in script output
- Checkpoint issue? -- Delete .enrichment-progress.json if corrupted
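The environment-variable items in the checklist can be verified in one step before a run. This is a minimal sketch: missingEnvVars is a hypothetical helper, and the variable names are the ones mentioned in this guide.

```typescript
// Variable names taken from the checklist above.
const REQUIRED_ENV = ['BLITZ_API_KEY', 'VITE_SUPABASE_URL', 'SUPABASE_SERVICE_ROLE_KEY'];

// Returns the names that are unset or blank in the given environment map.
function missingEnvVars(
  env: Record<string, string | undefined>,
  required: string[] = REQUIRED_ENV,
): string[] {
  return required.filter((name) => (env[name] ?? '').trim() === '');
}
```

In a script this would typically be called as missingEnvVars(process.env), exiting early with the returned names if the list is not empty.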