March 28, 2026 · 8 min read

How to Audit Thousands of URLs Programmatically with an SEO API

You manage a site with 500 product pages, 200 blog posts, and a dozen landing pages. Auditing them one by one in a web-based SEO tool would take days. A programmatic approach takes minutes. You write a script, feed it your URLs, and get a structured report with every issue across your entire site. Here is how to build a bulk SEO auditing workflow with the SEOPeek API, including working code in Python and Node.js.

API-Based Auditing vs. Crawler-Based Tools

There are two ways to audit a large site: crawl it or call an API for each URL. Crawlers like Screaming Frog or Sitebulb start from your homepage, follow links, and build a map of your site. They are powerful but heavyweight. You need to install desktop software, configure crawl settings, wait for the crawl to finish, then export results.

An API-based approach is different. You already know which URLs you want to audit. Maybe you have a sitemap, a database of product pages, or a list of URLs from your CMS. You send each URL to the API and get structured JSON back. No crawling, no link discovery, no desktop software.
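At its simplest, one audit is one GET request. Here is a minimal sketch using only the Python standard library; build_audit_request and audit_one are illustrative helper names, and the response fields (url, score, grade, checks) are the ones the scripts later in this article rely on:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://seopeek.web.app/api/audit"

def build_audit_request(page_url: str) -> str:
    """Build the audit endpoint URL, percent-encoding the target page URL."""
    return f"{API_BASE}?{urlencode({'url': page_url})}"

def audit_one(page_url: str) -> dict:
    """Fetch one audit and return the parsed JSON as a dict."""
    with urlopen(build_audit_request(page_url), timeout=10) as resp:
        return json.load(resp)
```

Calling audit_one("https://example.com/") returns a plain dict you can inspect, store, or aggregate; everything else in this guide is a loop around that single call.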

| Factor | API-based (SEOPeek) | Crawler-based |
| --- | --- | --- |
| Setup | Zero (an HTTP request) | Install + configure |
| URL selection | You choose exactly which URLs | Discovers via crawling |
| Speed (1,000 URLs) | ~20 min with concurrency | 30–60 min typical |
| Output format | JSON per URL (scriptable) | CSV/Excel export |
| Automation | Script, CI/CD, cron | Limited (CLI mode) |
| Cost | Free tier or $9/mo | $259/yr (Screaming Frog) |
| Link discovery | No (you provide URLs) | Yes |

Use a crawler when you need to discover pages you did not know existed (orphan pages, broken internal links). Use an API when you have a known list of URLs and want fast, repeatable, scriptable audits.

Bulk Auditing with Python

Here is a complete Python script that reads URLs from a file, audits each one with the SEOPeek API, respects rate limits, and writes results to a CSV report:

import requests
import csv
import time
import sys

API_BASE = "https://seopeek.web.app/api/audit"
CONCURRENCY_DELAY = 0.5  # seconds between requests
OUTPUT_FILE = "seo_audit_report.csv"

def audit_url(url):
    """Audit a single URL and return structured results."""
    try:
        # Pass the URL as a query parameter so it is percent-encoded correctly
        r = requests.get(API_BASE, params={"url": url}, timeout=10)
        r.raise_for_status()
        return r.json()
    except requests.RequestException as e:
        return {"url": url, "error": str(e)}

def main():
    # Read URLs from file (one per line)
    url_file = sys.argv[1] if len(sys.argv) > 1 else "urls.txt"
    with open(url_file) as f:
        urls = [line.strip() for line in f if line.strip()]

    print(f"Auditing {len(urls)} URLs...")
    results = []

    for i, url in enumerate(urls):
        print(f"  [{i+1}/{len(urls)}] {url}")
        data = audit_url(url)

        if "error" in data:
            results.append({"url": url, "score": "ERROR", "grade": "-",
                            "failing_checks": data["error"]})
        else:
            # Collect names of failing checks
            failing = [name for name, check in data.get("checks", {}).items()
                       if not check.get("pass")]
            results.append({
                "url": data["url"],
                "score": data["score"],
                "grade": data["grade"],
                "failing_checks": "; ".join(failing) if failing else "None",
            })

        time.sleep(CONCURRENCY_DELAY)

    # Write CSV report
    with open(OUTPUT_FILE, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["url", "score", "grade", "failing_checks"])
        writer.writeheader()
        writer.writerows(results)

    # Print summary
    scores = [r["score"] for r in results if isinstance(r["score"], int)]
    avg = sum(scores) / len(scores) if scores else 0
    below_70 = sum(1 for s in scores if s < 70)

    print(f"\nDone. Report saved to {OUTPUT_FILE}")
    print(f"  Average score: {avg:.1f}")
    print(f"  Pages below 70: {below_70}/{len(scores)}")

if __name__ == "__main__":
    main()

Run it with python audit.py urls.txt. The script processes URLs sequentially with a half-second delay between requests. For the free tier (50 audits/day), this is sufficient. For larger volumes on a paid plan, you can reduce the delay or add concurrency.
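The script expects a plain text file of URLs. If your list lives in an XML sitemap instead (one of the sources mentioned earlier), a short helper can pull the loc entries out for you. A minimal sketch, assuming a standard sitemaps.org urlset; nested sitemap index files are not handled:

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def urls_from_sitemap(xml_text: str) -> list[str]:
    """Extract every <loc> value from sitemap XML text, in document order."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]
```

Fetch your sitemap however you like (urllib, requests, curl), pass the XML text to urls_from_sitemap, and write the result to urls.txt one URL per line.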

Adding Concurrency with asyncio

For auditing thousands of URLs on a paid plan, sequential requests are too slow. Here is an async version that runs 5 requests in parallel:

import asyncio
import aiohttp

SEMAPHORE = asyncio.Semaphore(5)  # max 5 concurrent requests

async def audit_url(session, url):
    async with SEMAPHORE:
        # params= percent-encodes the URL; raise_for_status surfaces HTTP errors
        async with session.get(
            "https://seopeek.web.app/api/audit", params={"url": url}
        ) as resp:
            resp.raise_for_status()
            return await resp.json()

async def main():
    urls = [u for u in open("urls.txt").read().splitlines() if u.strip()]
    async with aiohttp.ClientSession() as session:
        tasks = [audit_url(session, url) for url in urls]
        results = await asyncio.gather(*tasks, return_exceptions=True)

    for result in results:
        if isinstance(result, dict):
            print(f"{result.get('url')} — {result.get('score')} ({result.get('grade')})")
        else:
            print(f"Error: {result}")

asyncio.run(main())

The semaphore limits concurrency to 5 parallel requests, which keeps you well within rate limits while auditing roughly 150 URLs per minute.

Bulk Auditing with Node.js

Here is the equivalent workflow in Node.js, using native fetch with a concurrency pool:

import { readFileSync, writeFileSync } from "fs";

const API_BASE = "https://seopeek.web.app/api/audit";
const MAX_CONCURRENT = 5;

async function auditUrl(url) {
  const res = await fetch(`${API_BASE}?url=${encodeURIComponent(url)}`);
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  return res.json();
}

async function auditBatch(urls) {
  const results = [];
  for (let i = 0; i < urls.length; i += MAX_CONCURRENT) {
    const batch = urls.slice(i, i + MAX_CONCURRENT);
    const batchResults = await Promise.allSettled(
      batch.map((url) => auditUrl(url))
    );

    // Pair each settled result with its own URL, not the batch's first one
    batchResults.forEach((result, j) => {
      if (result.status === "fulfilled") {
        results.push(result.value);
      } else {
        results.push({ url: batch[j], error: result.reason.message });
      }
    });

    // Brief pause between batches
    if (i + MAX_CONCURRENT < urls.length) {
      await new Promise((r) => setTimeout(r, 500));
    }
  }
  return results;
}

// Main
const urls = readFileSync("urls.txt", "utf-8")
  .split("\n")
  .filter(Boolean);

console.log(`Auditing ${urls.length} URLs...`);
const results = await auditBatch(urls);

// Generate report
const report = results.map((r) => ({
  url: r.url,
  score: r.score ?? "ERROR",
  grade: r.grade ?? "-",
  failing: r.checks
    ? Object.entries(r.checks)
        .filter(([, c]) => !c.pass)
        .map(([name]) => name)
        .join("; ")
    : r.error || "",
}));

// CSV output
const header = "url,score,grade,failing_checks";
const rows = report.map(
  (r) => `"${r.url}",${r.score},${r.grade},"${r.failing}"`
);
writeFileSync("seo_audit_report.csv", [header, ...rows].join("\n"));

// Summary
const scores = results
  .filter((r) => typeof r.score === "number")
  .map((r) => r.score);
const avg = scores.length ? scores.reduce((a, b) => a + b, 0) / scores.length : 0;
console.log(`Average score: ${avg.toFixed(1)}`);
console.log(`Pages below 70: ${scores.filter((s) => s < 70).length}/${scores.length}`);

This script processes URLs in batches of 5, pauses 500ms between batches, and writes a CSV report at the end. Run it with node audit.mjs.

Result Aggregation and Reporting

Raw audit data is useful. Aggregated data is actionable. After auditing hundreds of URLs, you want answers to specific questions: What is the average score across the site? How many pages score an A, and how many fail outright? Which checks fail most often, and on what share of pages?

Here is a Python snippet that aggregates results from the CSV report:

import csv
from collections import Counter

with open("seo_audit_report.csv") as f:
    rows = list(csv.DictReader(f))

scores = [int(r["score"]) for r in rows if r["score"] != "ERROR"]

print(f"Total pages: {len(rows)}")
print(f"Average score: {sum(scores) / len(scores):.1f}")
print(f"Pages scoring A (90+): {sum(1 for s in scores if s >= 90)}")
print(f"Pages scoring F (<50): {sum(1 for s in scores if s < 50)}")

# Most common failing checks
all_fails = []
for r in rows:
    if r["failing_checks"] and r["failing_checks"] != "None":
        all_fails.extend(r["failing_checks"].split("; "))

print("\nMost common failures:")
for check, count in Counter(all_fails).most_common(10):
    pct = count / len(rows) * 100
    print(f"  {check}: {count} pages ({pct:.0f}%)")

Pro tip: Run this audit weekly on a cron job and save each report with a timestamp. After a month, you can plot your site-wide SEO score over time and see whether your fixes are moving the needle.
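That timestamped-report idea can be sketched as follows. save_timestamped and average_scores are hypothetical helper names; the CSV columns match the report format produced by the scripts above:

```python
import csv
from datetime import date
from pathlib import Path

REPORT_DIR = Path("reports")

def save_timestamped(rows, report_dir=REPORT_DIR):
    """Write today's report to reports/audit_YYYY-MM-DD.csv and return the path."""
    report_dir.mkdir(exist_ok=True)
    path = report_dir / f"audit_{date.today().isoformat()}.csv"
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(
            f, fieldnames=["url", "score", "grade", "failing_checks"]
        )
        writer.writeheader()
        writer.writerows(rows)
    return path

def average_scores(report_dir=REPORT_DIR):
    """Return {date: average score} across all saved reports, oldest first."""
    trend = {}
    for path in sorted(report_dir.glob("audit_*.csv")):
        with open(path) as f:
            scores = [int(r["score"]) for r in csv.DictReader(f)
                      if r["score"].isdigit()]  # skip ERROR rows
        if scores:
            trend[path.stem.removeprefix("audit_")] = sum(scores) / len(scores)
    return trend
```

After a few weekly runs, average_scores gives you the date-to-score series to plot.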

Rate Limiting and Best Practices

When auditing at scale, you need to be a good API citizen. The patterns that matter: space requests out rather than firing them all at once, cap concurrency (the semaphore and batching patterns above), set a timeout on every request, and back off exponentially when the API responds with HTTP 429 (Too Many Requests).

Here is a simple retry wrapper in Python:

import time
import requests

def audit_with_retry(url, max_retries=3):
    delay = 1
    for attempt in range(max_retries):
        r = requests.get("https://seopeek.web.app/api/audit",
                         params={"url": url}, timeout=10)
        if r.status_code == 200:
            return r.json()
        if r.status_code == 429:
            print(f"  Rate limited. Waiting {delay}s...")
            time.sleep(delay)
            delay *= 2
        else:
            r.raise_for_status()
    raise Exception(f"Failed after {max_retries} retries: {url}")

Building a Recurring Audit Pipeline

A one-time audit gives you a snapshot. A recurring audit gives you a trend. The most effective setup combines a weekly full-site audit with daily checks on your top pages: the weekly run catches regressions anywhere on the site, while the daily run flags problems on high-value pages within hours instead of days.

Use cron, GitHub Actions scheduled workflows, or any task scheduler to automate this. The scripts above are ready to run on a schedule with no modifications—just point them at your URL list and pipe the output to a file or a Slack webhook.
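For the Slack option, Slack's incoming webhooks accept a JSON payload with a text field. A minimal stdlib-only sketch; the webhook URL is a placeholder you would replace with your own, and format_summary is a hypothetical helper matching the summary the scripts above print:

```python
import json
from urllib.request import Request, urlopen

# Placeholder: replace with your own Slack incoming-webhook URL
SLACK_WEBHOOK = "https://hooks.slack.com/services/T000/B000/XXXX"

def format_summary(avg: float, below_70: int, total: int) -> str:
    """Render the audit summary as a one-line Slack message."""
    return (f"SEO audit complete: average score {avg:.1f}, "
            f"{below_70}/{total} pages below 70.")

def post_to_slack(text: str, webhook: str = SLACK_WEBHOOK) -> None:
    """POST the message as Slack's {"text": ...} JSON payload."""
    req = Request(webhook, data=json.dumps({"text": text}).encode(),
                  headers={"Content-Type": "application/json"})
    urlopen(req, timeout=10)
```

Call post_to_slack(format_summary(avg, below_70, len(scores))) at the end of your scheduled run and the summary lands in your channel.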

Start Auditing at Scale

50 free audits per day. No API key, no signup. Send a URL, get 20 on-page SEO checks back in under 2 seconds.

Try SEOPeek free →

Conclusion

Manual SEO auditing does not scale. If you have more than a handful of pages, you need a programmatic SEO audit tool that you can script, schedule, and integrate into your existing workflows. The API-based approach gives you structured JSON data for every URL, which you can aggregate, trend, and act on without ever opening a browser.

SEOPeek's API runs 20 on-page checks per URL in under 2 seconds. The free tier covers testing and small sites. Paid plans start at $9/month for teams that need volume. Whether you are an agency auditing client sites, a developer monitoring a large application, or an SEO team tracking improvements over time, the scripts in this guide give you everything you need to get started.

More developer APIs from the Peek Suite