Debugging Enrichment Issues

Troubleshooting guide for the attorney enrichment pipeline, Blitz API integration, and pattern detection.

422 from Blitz API

Symptoms: Email or phone enrichment returns HTTP 422 (Unprocessable Entity).

Cause: Using the wrong field name in the request body.

Fix: The Blitz API v2 email and phone endpoints require person_linkedin_url, not linkedin_profile_url:

// Correct (v2)
const response = await fetch('https://api.blitz-api.ai/v2/enrichment/email', {
  method: 'POST',
  headers: {
    'x-api-key': BLITZ_API_KEY,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    person_linkedin_url: 'https://www.linkedin.com/in/john-doe-12345/',
  }),
});

// Wrong -- causes 422 (v1 field name sent to a v2 endpoint)
const badResponse = await fetch('https://api.blitz-api.ai/v2/enrichment/email', {
  method: 'POST',
  headers: {
    'x-api-key': BLITZ_API_KEY,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    linkedin_profile_url: 'https://www.linkedin.com/in/john-doe-12345/',
  }),
});

This is the most common Blitz API issue. The v1 API used linkedin_profile_url but v2 changed it to person_linkedin_url.
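
One way to keep the legacy field name from reaching a v2 endpoint is to build every request body through a single helper. The sketch below is illustrative only: toV2EnrichmentBody and the EnrichmentInput type are hypothetical names, not part of the enrichment script, and the only facts taken from this guide are the two field names.

```typescript
// Hypothetical helper: accepts either field name and always emits the v2 name.
type EnrichmentInput = {
  person_linkedin_url?: string;
  linkedin_profile_url?: string; // legacy v1 name, rejected by v2 with a 422
};

function toV2EnrichmentBody(input: EnrichmentInput): { person_linkedin_url: string } {
  // Prefer the v2 field; fall back to the v1 field if that is all the caller has.
  const url = input.person_linkedin_url ?? input.linkedin_profile_url;
  if (!url) {
    throw new Error('Missing person_linkedin_url');
  }
  return { person_linkedin_url: url };
}
```

Passing the result to JSON.stringify gives a body that only ever contains person_linkedin_url, so older call sites that still hold the v1 field keep working while they are migrated.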

No Results from Employee-Finder

Symptoms: employee-finder returns an empty results array for a firm.

Possible causes:

1. Firm does not have a LinkedIn company page

Some firms may not have an active LinkedIn company page, or the URL in the firms table may be incorrect.

Verify:

SELECT name, linkedin_url FROM firms WHERE name ILIKE '%<firm-name>%';

Open the linkedin_url in a browser and verify it loads a valid company page.

2. Wrong company_linkedin_url

The employee-finder endpoint expects the full LinkedIn company URL:

// Correct
{ company_linkedin_url: 'https://www.linkedin.com/company/kirkland-ellis-llp' }

// Wrong -- using a person URL instead of company URL
{ company_linkedin_url: 'https://www.linkedin.com/in/john-doe' }
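
A cheap guard before calling employee-finder is to check that the URL really points at a company page. This is a minimal sketch, assuming company pages always live under the /company/ path as in the example above; isCompanyLinkedInUrl is a hypothetical helper, not part of the enrichment script.

```typescript
// Hypothetical check: true only for linkedin.com URLs whose path starts with /company/<slug>.
function isCompanyLinkedInUrl(value: string): boolean {
  let parsed: URL;
  try {
    parsed = new URL(value);
  } catch {
    return false; // not an absolute URL at all
  }
  const hostOk =
    parsed.hostname === 'linkedin.com' || parsed.hostname.endsWith('.linkedin.com');
  const segments = parsed.pathname.split('/').filter(Boolean);
  return hostOk && segments.length >= 2 && segments[0] === 'company';
}
```

Running this over the linkedin_url column before an enrichment run would flag person URLs and bare slugs without spending any credits.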

3. Job function filter excludes the firm's employees

The enrichment script filters with job_function: ["Legal"]. If the firm's employees are not tagged with this job function on LinkedIn, results will be empty. This is rare for law firms but possible for smaller or newer firms.

Workaround: Try without the job function filter and inspect results manually.

Pattern Detection Fails

Symptoms: detect-profile-patterns.ts returns no patterns for a firm or reports "not enough attorneys."

Cause: Pattern detection requires enriched attorneys in the database to test URL patterns. It needs at least 2 attorneys per firm to validate a pattern.

Verify attorney count:

SELECT f.name, count(a.id) AS attorney_count
FROM firms f
LEFT JOIN attorneys a ON a.firm_id = f.id
WHERE f.name ILIKE '%<firm-name>%'
GROUP BY f.name;

Fix: Run the enrichment pipeline first to populate attorneys, then run pattern detection:

# Step 1: Enrich the firm
npx tsx scripts/enrich-attorneys.ts --firm "<firm-name>"

# Step 2: Detect patterns
npx tsx scripts/detect-profile-patterns.ts --firm "<firm-name>" --save

Non-Standard URL Patterns

Some firms use unusual URL patterns that are not in the 37 tested patterns:

| Firm | Issue |
| --- | --- |
| Kirkland & Ellis | Uses /lawyers/{last_initial}/{last}-{first} (non-standard) |
| Skadden | JavaScript-rendered pages -- HTTP requests return empty HTML |
| Some boutiques | No attorney bio pages at all |

For JS-rendered pages, use scrape-attorney-profiles.ts with Puppeteer and the stealth plugin.
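
To test a non-standard pattern by hand, it helps to expand the template for a known attorney and open the resulting URL. The sketch below assumes the {first}, {last}, and {last_initial} placeholders used in the Kirkland & Ellis example; expandPattern and slug are hypothetical helpers, and the way detect-profile-patterns.ts actually expands patterns may differ.

```typescript
type AttorneyName = { first: string; last: string };

// Lowercase the name and collapse anything that is not a-z or 0-9 into single hyphens.
const slug = (s: string): string =>
  s.trim().toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/^-+|-+$/g, '');

// Hypothetical helper: substitute known placeholders, leave unknown ones untouched.
function expandPattern(pattern: string, attorney: AttorneyName): string {
  const first = slug(attorney.first);
  const last = slug(attorney.last);
  const values: Record<string, string> = {
    first,
    last,
    last_initial: last.charAt(0),
  };
  return pattern.replace(/\{(\w+)\}/g, (match, key: string) => values[key] ?? match);
}
```

For example, expandPattern('/lawyers/{last_initial}/{last}-{first}', { first: 'John', last: 'Doe' }) yields '/lawyers/d/doe-john', which can be appended to the firm's domain and checked in a browser.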

Checkpoint Corruption

Symptoms: --resume flag causes errors or processes the wrong firms. Script crashes on startup when reading checkpoint.

Cause: The .enrichment-progress.json file was partially written during a crash.

Fix: Delete the checkpoint file and restart:

rm .enrichment-progress.json
npx tsx scripts/enrich-attorneys.ts --tier amlaw_10

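Inspect the checkpoint before deleting it (if you want to salvage progress):

python3 -m json.tool .enrichment-progress.json

If the file was truncated by the crash, json.tool reports a parse error with the position where the JSON breaks.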

The checkpoint contains:

  • List of completed firm IDs
  • List of completed attorney LinkedIn URLs
  • Timestamp of last save
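
A startup crash on a half-written checkpoint can be avoided by validating the file before trusting it. The following is a minimal sketch, not the script's actual loader: the field names completedFirmIds, completedLinkedInUrls, and savedAt are assumptions (as is the use of string IDs), so compare them against a real checkpoint file before reusing this.

```typescript
// Assumed shape; the real .enrichment-progress.json may use different keys or ID types.
type Checkpoint = {
  completedFirmIds: string[];
  completedLinkedInUrls: string[];
  savedAt: string;
};

const isStringArray = (v: unknown): v is string[] =>
  Array.isArray(v) && v.every((item) => typeof item === 'string');

// Returns the parsed checkpoint, or null if the content is truncated or malformed.
function parseCheckpoint(raw: string): Checkpoint | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // partial write or otherwise invalid JSON
  }
  if (typeof data !== 'object' || data === null) return null;
  const { completedFirmIds, completedLinkedInUrls, savedAt } = data as Record<string, unknown>;
  if (
    !isStringArray(completedFirmIds) ||
    !isStringArray(completedLinkedInUrls) ||
    typeof savedAt !== 'string'
  ) {
    return null;
  }
  return { completedFirmIds, completedLinkedInUrls, savedAt };
}
```

A caller can treat a null result as "no usable checkpoint" and start fresh instead of crashing. Writing the checkpoint to a temporary file and renaming it into place is a common way to avoid partial writes in the first place.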

Credit Tracking

Check remaining credits:

curl -H "x-api-key: $BLITZ_API_KEY" https://api.blitz-api.ai/api/blitz/key-info

Estimate cost before running:

| Operation | Cost per Unit |
| --- | --- |
| Employee-finder | 1 credit per result returned |
| Email enrichment | 1 credit (only charged if email found) |
| Phone enrichment | 5 credits (only charged if phone found) |

Typical cost per attorney: 2--7 credits.

For a full firm (~200 attorneys): 400--1,400 credits.

For all AmLaw 200 firms: Budget 80,000--280,000 credits (rough estimate).

Use --skip-phone to reduce costs by up to 5 credits per attorney (phone enrichment is only charged when a number is found) when phone numbers are not immediately needed:

npx tsx scripts/enrich-attorneys.ts --tier amlaw_100 --skip-phone
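
The budget figures above follow directly from the per-unit costs, and the arithmetic can be sketched as a small function. This is an illustration only; estimateCredits is a hypothetical helper. The low bound assumes one employee-finder result plus one email found per attorney, and the high bound adds a phone number found for every attorney.

```typescript
// Per-unit costs from the table above.
const CREDITS = { employeeFinder: 1, email: 1, phone: 5 } as const;

// Hypothetical estimator: returns the low/high credit range for a run.
function estimateCredits(
  attorneyCount: number,
  skipPhone = false,
): { low: number; high: number } {
  // Low: finder result + email found, no phone charged.
  const low = attorneyCount * (CREDITS.employeeFinder + CREDITS.email);
  // High: additionally a phone found for every attorney (unless phones are skipped).
  const high = low + (skipPhone ? 0 : attorneyCount * CREDITS.phone);
  return { low, high };
}
```

estimateCredits(200) gives { low: 400, high: 1400 } for one firm, and estimateCredits(200 * 200) gives { low: 80000, high: 280000 } for 200 firms of roughly 200 attorneys each. With skipPhone set, the high bound collapses to the low bound.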

JS-Rendered Firm Pages

Symptoms: scrape-attorney-profiles.ts returns empty or partial HTML for a firm's website. Pattern detection validates a URL pattern, but scraping gets no data.

Cause: Some firm websites (e.g., Skadden, some boutiques) render attorney bio pages with JavaScript. A simple HTTP GET returns an empty shell page.

Fix: Use Puppeteer with the puppeteer-extra-plugin-stealth package:

import puppeteer from 'puppeteer-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';

puppeteer.use(StealthPlugin());

const browser = await puppeteer.launch({ headless: true });
const page = await browser.newPage();
// url is the attorney bio page to scrape
await page.goto(url, { waitUntil: 'networkidle0' });
const content = await page.content();
await browser.close(); // release the browser once the HTML has been captured

The networkidle0 option treats navigation as finished once there have been no network connections for at least 500ms, which in most cases means the JS-rendered content has loaded.

Rate Limiting

Symptoms: Blitz API returns 429 Too Many Requests, or responses become empty/slow.

Built-in rate limits in the enrichment script:

  • 700ms between individual attorney enrichment calls
  • 3 seconds between firms

Do not reduce these values. The Blitz API will throttle or block your API key if you exceed their rate limits.

If you are being rate-limited:

  1. Wait 5--10 minutes for the rate limit window to reset
  2. Check if another process is also making Blitz API calls
  3. Use --resume to continue from where the script left off
  4. If persistent, contact Blitz API support to check your rate limit tier
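
If a wrapper around Blitz calls is needed, the waiting step can be automated with exponential backoff on 429 responses. The sketch below is an assumption-laden illustration, not the enrichment script's behavior: backoffDelayMs and withBackoff are hypothetical names, the 5-second base and 10-minute cap are arbitrary choices loosely based on the 5--10 minute window above, and the 700ms floor mirrors the built-in delay.

```typescript
const MIN_DELAY_MS = 700; // never wait less than the script's built-in per-call delay

// Delay before retry number `attempt` (0-based): base * 2^attempt, clamped to [700ms, maxMs].
function backoffDelayMs(attempt: number, baseMs = 5_000, maxMs = 600_000): number {
  const delay = baseMs * 2 ** attempt;
  return Math.max(MIN_DELAY_MS, Math.min(delay, maxMs));
}

const pause = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retries doRequest while it returns 429, up to maxAttempts total calls.
async function withBackoff<T extends { status: number }>(
  doRequest: () => Promise<T>,
  maxAttempts = 5,
  wait: (ms: number) => Promise<void> = pause,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    const res = await doRequest();
    if (res.status !== 429 || attempt + 1 >= maxAttempts) return res;
    await wait(backoffDelayMs(attempt));
  }
}
```

A call such as withBackoff(() => fetch(endpoint, options)) would then retry only on 429 and hand back the last response if the limit persists, at which point resuming later with --resume is the safer option.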

email_status Insert Failure

Symptoms: Enrichment script fails when inserting an attorney with email_status set to an invalid value.

Cause: The email_status enum only accepts 4 values:

Valid Values
valid
invalid
catch_all
unknown

The Blitz API may return status strings that do not map directly. The enrichment script should normalize the response:

const normalizeEmailStatus = (status: string | null | undefined): string => {
  // Treat a missing status as empty so it falls through to 'unknown'
  const normalized = (status ?? '').trim().toLowerCase();
  if (['valid', 'invalid', 'catch_all', 'unknown'].includes(normalized)) {
    return normalized;
  }
  return 'unknown'; // Default for any missing or unrecognized status
};

Quick Diagnostic Checklist

When an enrichment run fails, check these in order:

  1. API key valid? -- curl -H "x-api-key: $BLITZ_API_KEY" https://api.blitz-api.ai/api/blitz/key-info
  2. Credits remaining? -- Same endpoint, check the balance
  3. Supabase connection? -- Check that VITE_SUPABASE_URL and SUPABASE_SERVICE_ROLE_KEY are set
  4. Correct field names? -- person_linkedin_url for v2 (not linkedin_profile_url)
  5. Correct column names? -- seniority (not seniority_level), email_status (valid enum values only)
  6. Rate limited? -- Check for 429 responses in script output
  7. Checkpoint issue? -- Delete .enrichment-progress.json if corrupted
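
The environment checks in this list can be run in one step before an enrichment run starts. This is a small sketch under the assumption that the script reads the Blitz key from a BLITZ_API_KEY environment variable alongside the two Supabase variables named above; missingEnvVars is a hypothetical helper.

```typescript
const REQUIRED_ENV = [
  'BLITZ_API_KEY',
  'VITE_SUPABASE_URL',
  'SUPABASE_SERVICE_ROLE_KEY',
] as const;

// Returns the names of required variables that are unset or blank.
function missingEnvVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV.filter((key) => !env[key]?.trim());
}

// Usage in a Node script (not executed here):
//   const missing = missingEnvVars(process.env);
//   if (missing.length > 0) throw new Error(`Missing env vars: ${missing.join(', ')}`);
```

Failing fast on a missing variable avoids burning the first few API calls of a run on requests that cannot succeed.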