How to Audit Thousands of URLs Programmatically with an SEO API
You manage a site with 500 product pages, 200 blog posts, and a dozen landing pages. Auditing them one by one in a web-based SEO tool would take days. A programmatic approach takes minutes. You write a script, feed it your URLs, and get a structured report with every issue across your entire site. Here is how to build a bulk SEO auditing workflow with the SEOPeek API, including working code in Python and Node.js.
API-Based Auditing vs. Crawler-Based Tools
There are two ways to audit a large site: crawl it or call an API for each URL. Crawlers like Screaming Frog or Sitebulb start from your homepage, follow links, and build a map of your site. They are powerful but heavyweight. You need to install desktop software, configure crawl settings, wait for the crawl to finish, then export results.
An API-based approach is different. You already know which URLs you want to audit. Maybe you have a sitemap, a database of product pages, or a list of URLs from your CMS. You send each URL to the API and get structured JSON back. No crawling, no link discovery, no desktop software.
| Factor | API-based (SEOPeek) | Crawler-based |
|---|---|---|
| Setup | Zero (HTTP request) | Install + configure |
| URL selection | You choose exactly which URLs | Discovers via crawling |
| Speed (1,000 URLs) | ~20 min with concurrency | 30–60 min typical |
| Output format | JSON per URL (scriptable) | CSV/Excel export |
| Automation | Script, CI/CD, cron | Limited (CLI mode) |
| Cost | Free tier or $9/mo | $259/yr (Screaming Frog) |
| Link discovery | No (you provide URLs) | Yes |
Use a crawler when you need to discover pages you did not know existed (orphan pages, broken internal links). Use an API when you have a known list of URLs and want fast, repeatable, scriptable audits.
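Before writing a bulk script, it helps to see the shape of a single result. The fields below (`url`, `score`, `grade`, and a `checks` map with per-check `pass` flags) are the ones the scripts in this guide read; treat the sample payload as illustrative rather than the exact API output:

```python
import json

# Illustrative response shape (inferred from the fields the bulk
# scripts below consume); the real payload may contain more detail.
sample = json.loads("""
{
  "url": "https://example.com",
  "score": 82,
  "grade": "B",
  "checks": {
    "title_tag": {"pass": true},
    "meta_description": {"pass": false}
  }
}
""")

def failing_checks(data):
    """Return the names of checks whose pass flag is false."""
    return [name for name, check in data.get("checks", {}).items()
            if not check.get("pass")]

print(failing_checks(sample))  # ['meta_description']
```

This `failing_checks` helper is the same pattern both bulk scripts use to turn per-check results into a flat list for reporting.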
Bulk Auditing with Python
Here is a complete Python script that reads URLs from a file, audits each one with the SEOPeek API, respects rate limits, and writes results to a CSV report:
```python
import requests
import csv
import time
import sys

API_BASE = "https://seopeek.web.app/api/audit"
REQUEST_DELAY = 0.5  # seconds between requests
OUTPUT_FILE = "seo_audit_report.csv"

def audit_url(url):
    """Audit a single URL and return structured results."""
    try:
        # Pass the URL via params so requests handles encoding
        r = requests.get(API_BASE, params={"url": url}, timeout=10)
        r.raise_for_status()
        return r.json()
    except requests.RequestException as e:
        return {"url": url, "error": str(e)}

def main():
    # Read URLs from file (one per line)
    url_file = sys.argv[1] if len(sys.argv) > 1 else "urls.txt"
    with open(url_file) as f:
        urls = [line.strip() for line in f if line.strip()]
    print(f"Auditing {len(urls)} URLs...")

    results = []
    for i, url in enumerate(urls):
        print(f"  [{i + 1}/{len(urls)}] {url}")
        data = audit_url(url)
        if "error" in data:
            results.append({"url": url, "score": "ERROR", "grade": "-",
                            "failing_checks": data["error"]})
        else:
            # Collect names of failing checks
            failing = [name for name, check in data.get("checks", {}).items()
                       if not check.get("pass")]
            results.append({
                "url": data["url"],
                "score": data["score"],
                "grade": data["grade"],
                "failing_checks": "; ".join(failing) if failing else "None",
            })
        time.sleep(REQUEST_DELAY)

    # Write CSV report
    with open(OUTPUT_FILE, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["url", "score", "grade", "failing_checks"])
        writer.writeheader()
        writer.writerows(results)

    # Print summary
    scores = [r["score"] for r in results if isinstance(r["score"], int)]
    avg = sum(scores) / len(scores) if scores else 0
    below_70 = sum(1 for s in scores if s < 70)
    print(f"\nDone. Report saved to {OUTPUT_FILE}")
    print(f"  Average score: {avg:.1f}")
    print(f"  Pages below 70: {below_70}/{len(scores)}")

if __name__ == "__main__":
    main()
```
Run it with `python audit.py urls.txt`. The script processes URLs sequentially with a half-second delay between requests. For the free tier (50 audits/day), this is sufficient. For larger volumes on a paid plan, you can reduce the delay or add concurrency.
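If you do not already have a `urls.txt`, a sitemap is usually the fastest source. A minimal sketch, assuming a standard sitemaps.org-format `sitemap.xml` (the XML is inlined here for illustration; fetch yours with `requests` or `curl`):

```python
import xml.etree.ElementTree as ET

# Sample sitemap inlined for illustration; real sitemaps use
# the same sitemaps.org namespace and <url><loc> structure.
SITEMAP_XML = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/pricing</loc></url>
</urlset>"""

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(xml_text):
    """Extract every <loc> entry from a sitemaps.org-format sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", NS) if loc.text]

urls = sitemap_urls(SITEMAP_XML)
with open("urls.txt", "w") as f:
    f.write("\n".join(urls))
print(urls)  # ['https://example.com/', 'https://example.com/pricing']
```

Note that sitemap index files (a sitemap of sitemaps) need one extra level of fetching before this extraction step.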
Adding Concurrency with asyncio
For auditing thousands of URLs on a paid plan, sequential requests are too slow. Here is an async version that runs 5 requests in parallel:
```python
import asyncio
import aiohttp

SEMAPHORE = asyncio.Semaphore(5)  # max 5 concurrent requests

async def audit_url(session, url):
    async with SEMAPHORE:
        # params= handles URL encoding, as in the sync version
        async with session.get(
            "https://seopeek.web.app/api/audit", params={"url": url}
        ) as resp:
            return await resp.json()

async def main():
    with open("urls.txt") as f:
        urls = [line.strip() for line in f if line.strip()]
    async with aiohttp.ClientSession() as session:
        tasks = [audit_url(session, url) for url in urls]
        results = await asyncio.gather(*tasks, return_exceptions=True)
    for result in results:
        if isinstance(result, dict):
            print(f"{result.get('url')} — {result.get('score')} ({result.get('grade')})")
        else:
            print(f"Error: {result}")

asyncio.run(main())
```
The semaphore caps concurrency at 5 parallel requests. With each audit returning in roughly 2 seconds, that works out to about 150 URLs per minute while staying well within rate limits.
Bulk Auditing with Node.js
Here is the equivalent workflow in Node.js, using native fetch with a concurrency pool:
```javascript
import { readFileSync, writeFileSync } from "fs";

const API_BASE = "https://seopeek.web.app/api/audit";
const MAX_CONCURRENT = 5;

async function auditUrl(url) {
  const res = await fetch(`${API_BASE}?url=${encodeURIComponent(url)}`);
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  return res.json();
}

async function auditBatch(urls) {
  const results = [];
  for (let i = 0; i < urls.length; i += MAX_CONCURRENT) {
    const batch = urls.slice(i, i + MAX_CONCURRENT);
    const batchResults = await Promise.allSettled(
      batch.map((url) => auditUrl(url))
    );
    // allSettled preserves order, so index j maps back to batch[j]
    batchResults.forEach((result, j) => {
      if (result.status === "fulfilled") {
        results.push(result.value);
      } else {
        results.push({ url: batch[j], error: result.reason.message });
      }
    });
    // Brief pause between batches
    if (i + MAX_CONCURRENT < urls.length) {
      await new Promise((r) => setTimeout(r, 500));
    }
  }
  return results;
}

// Main
const urls = readFileSync("urls.txt", "utf-8")
  .split("\n")
  .map((line) => line.trim())
  .filter(Boolean);
console.log(`Auditing ${urls.length} URLs...`);
const results = await auditBatch(urls);

// Generate report
const report = results.map((r) => ({
  url: r.url,
  score: r.score ?? "ERROR",
  grade: r.grade ?? "-",
  failing: r.checks
    ? Object.entries(r.checks)
        .filter(([, c]) => !c.pass)
        .map(([name]) => name)
        .join("; ")
    : r.error || "",
}));

// CSV output
const header = "url,score,grade,failing_checks";
const rows = report.map(
  (r) => `"${r.url}",${r.score},${r.grade},"${r.failing}"`
);
writeFileSync("seo_audit_report.csv", [header, ...rows].join("\n"));

// Summary (typeof check so a legitimate score of 0 is not dropped)
const scores = results
  .filter((r) => typeof r.score === "number")
  .map((r) => r.score);
const avg = scores.reduce((a, b) => a + b, 0) / scores.length;
console.log(`Average score: ${avg.toFixed(1)}`);
console.log(`Pages below 70: ${scores.filter((s) => s < 70).length}/${scores.length}`);
```
This script processes URLs in batches of 5, pauses 500 ms between batches, and writes a CSV report at the end. Run it with `node audit.mjs`.
Result Aggregation and Reporting
Raw audit data is useful. Aggregated data is actionable. After auditing hundreds of URLs, you want answers to specific questions:
- What is my site-wide average SEO score? This is your baseline. Track it over time to measure improvement.
- Which checks fail most often? If 60% of your pages are missing meta descriptions, that is a systematic issue worth fixing at the template level, not page by page.
- Which pages are the worst offenders? Sort by score ascending to find pages that need the most work. Prioritize high-traffic pages.
- Are there patterns by section? Maybe your blog posts score 85 on average but product pages score 55. That tells you exactly where to focus.
Here is a Python snippet that aggregates results from the CSV report:
```python
import csv
from collections import Counter

with open("seo_audit_report.csv") as f:
    rows = list(csv.DictReader(f))

scores = [int(r["score"]) for r in rows if r["score"] != "ERROR"]
print(f"Total pages: {len(rows)}")
print(f"Average score: {sum(scores) / len(scores):.1f}")
print(f"Pages scoring A (90+): {sum(1 for s in scores if s >= 90)}")
print(f"Pages scoring F (<50): {sum(1 for s in scores if s < 50)}")

# Most common failing checks
all_fails = []
for r in rows:
    if r["failing_checks"] and r["failing_checks"] != "None":
        all_fails.extend(r["failing_checks"].split("; "))

print("\nMost common failures:")
for check, count in Counter(all_fails).most_common(10):
    pct = count / len(rows) * 100
    print(f"  {check}: {count} pages ({pct:.0f}%)")
```
Pro tip: Run this audit weekly on a cron job and save each report with a timestamp. After a month, you can plot your site-wide SEO score over time and see whether your fixes are moving the needle.
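A minimal sketch of that tip: after each run, append the date and site-wide average to a running history CSV (file names here are just examples):

```python
import csv
from datetime import date

def append_history(report_file="seo_audit_report.csv",
                   history_file="score_history.csv"):
    """Append (date, average score, page count) to a history CSV."""
    with open(report_file) as f:
        rows = list(csv.DictReader(f))
    scores = [int(r["score"]) for r in rows if r["score"] != "ERROR"]
    avg = sum(scores) / len(scores) if scores else 0.0
    with open(history_file, "a", newline="") as f:
        csv.writer(f).writerow([date.today().isoformat(), f"{avg:.1f}", len(scores)])
    return avg
```

After a few weeks, `score_history.csv` is ready to plot with any spreadsheet or charting library.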
Rate Limiting and Best Practices
When auditing at scale, you need to be a good API citizen. Here are the patterns that matter:
- Respect rate limits: SEOPeek's free tier allows 50 audits per day. The Pro plan ($9/mo) gives 1,000 per month. The Business plan ($29/mo) gives 10,000. Plan your concurrency and scheduling around your tier.
- Use exponential backoff: If you get a 429 (rate limited) or 5xx response, wait 1 second and retry. Double the wait on each subsequent failure, up to 30 seconds. Do not retry indefinitely.
- Cache results: If you audit the same URL multiple times in a day (for example, in CI and in a nightly report), cache the first result locally. No need to burn quota on duplicate checks.
- Batch by priority: Audit your highest-traffic pages first. If you hit a limit, you have already covered the pages that matter most.
- Log everything: Save the full JSON response for each URL, not just the score. You will want the detailed check results when you go back to fix issues.
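For the caching point, here is a small sketch of a day-level file cache (the directory name and layout are illustrative, not part of the API): any URL audited within the last 24 hours is served from disk instead of burning quota.

```python
import hashlib
import json
import time
from pathlib import Path

CACHE_DIR = Path(".audit_cache")

def cached_audit(url, fetch, max_age=86400):
    """Return a cached audit result if fresh, otherwise call fetch(url)."""
    CACHE_DIR.mkdir(exist_ok=True)
    # sha256 gives a stable filename per URL across runs
    key = CACHE_DIR / (hashlib.sha256(url.encode()).hexdigest() + ".json")
    if key.exists() and time.time() - key.stat().st_mtime < max_age:
        return json.loads(key.read_text())
    data = fetch(url)  # e.g. the audit_url() function from the script above
    key.write_text(json.dumps(data))
    return data
```

Pass in whatever fetch function you already have; the cache layer stays agnostic about how the audit is actually performed.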
Here is a simple retry wrapper in Python:
```python
import time
import requests

def audit_with_retry(url, max_retries=3):
    delay = 1
    for attempt in range(max_retries):
        r = requests.get("https://seopeek.web.app/api/audit",
                         params={"url": url}, timeout=10)
        if r.status_code == 200:
            return r.json()
        if r.status_code == 429 or r.status_code >= 500:
            # Back off on rate limits and server errors, capped at 30s
            print(f"  Got {r.status_code}. Waiting {delay}s...")
            time.sleep(delay)
            delay = min(delay * 2, 30)
        else:
            r.raise_for_status()
    raise Exception(f"Failed after {max_retries} retries: {url}")
```
Building a Recurring Audit Pipeline
A one-time audit gives you a snapshot. A recurring audit gives you a trend. The most effective setup combines a weekly full-site audit with daily checks on your top pages:
- Daily (top 20 pages): Audit your homepage, pricing page, and highest-traffic landing pages. Alert immediately if any score drops more than 10 points from the previous day.
- Weekly (full site): Audit every URL in your sitemap. Generate a summary report with averages, worst pages, and most common failures. Compare to last week.
- On deploy (changed pages): Integrate into your CI/CD pipeline to audit pages affected by each deployment. Catch regressions before they reach production.
Use cron, GitHub Actions scheduled workflows, or any task scheduler to automate this. The scripts above are ready to run on a schedule with no modifications—just point them at your URL list and pipe the output to a file or a Slack webhook.
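For example, two hypothetical crontab entries (paths, file names, and schedules are placeholders) covering the weekly and daily cadences above:

```
# Hypothetical crontab entries -- adjust paths to your setup.
# Weekly full-site audit, Mondays at 06:00:
0 6 * * 1  cd /opt/seo && python audit.py urls.txt >> audit.log 2>&1
# Daily top-pages audit at 07:00:
0 7 * * *  cd /opt/seo && python audit.py top_pages.txt >> audit.log 2>&1
```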
Start Auditing at Scale
50 free audits per day. No API key, no signup. Send a URL, get 20 on-page SEO checks back in under 2 seconds.
Try SEOPeek free →

Conclusion
Manual SEO auditing does not scale. If you have more than a handful of pages, you need a programmatic SEO audit tool that you can script, schedule, and integrate into your existing workflows. The API-based approach gives you structured JSON data for every URL, which you can aggregate, trend, and act on without ever opening a browser.
SEOPeek's API runs 20 on-page checks per URL in under 2 seconds. The free tier covers testing and small sites. Paid plans start at $9/month for teams that need volume. Whether you are an agency auditing client sites, a developer monitoring a large application, or an SEO team tracking improvements over time, the scripts in this guide give you everything you need to get started.