March 29, 2026 · 16 min read

Build a Bulk SEO Audit API with FastAPI and SEOPeek

Running SEO audits one URL at a time is fine for spot checks—but when you manage dozens of sites, hundreds of pages, or need to audit an entire sitemap before every deployment, you need a bulk audit API. FastAPI’s native async support, combined with the SEOPeek audit API, makes it possible to audit hundreds of URLs concurrently, export results as CSV or JSON, and process massive batches in the background—all with production-ready Python code you can deploy today.

In this guide
  1. Why you need a bulk SEO audit API
  2. Pydantic models for type-safe requests and responses
  3. Async SEOPeek client with httpx
  4. Batch audit endpoint with concurrency control
  5. CSV and JSON export of audit results
  6. Background task processing for large batches
  7. Caching results with Redis and in-memory TTL
  8. Putting it all together: the complete app
  9. FAQ

1. Why You Need a Bulk SEO Audit API

Individual URL auditing works when you are debugging a single page. It does not scale to real-world SEO operations: managing dozens of sites, checking hundreds of pages at once, or auditing an entire sitemap before every deployment are exactly the scenarios where a bulk audit API pays for itself.

Python is the natural choice for this kind of automation. FastAPI gives you async request handling out of the box, Pydantic provides automatic validation and serialization, and httpx lets you make concurrent HTTP calls without threading complexity. Combined with the SEOPeek API—which returns structured JSON audit results for any URL—you can build a production-grade bulk audit service in under 200 lines of code.

The SEOPeek API audits the rendered HTML of any URL and returns a numeric score, letter grade, and pass/fail results for 20+ on-page SEO checks—title, description, OG tags, canonical URL, heading structure, and structured data. One GET request, JSON response, under 2 seconds.
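The payload shape assumed throughout this guide (and mirrored by the Pydantic models in the next section) looks like the following sketch. The field names match this guide's models; the sample values are illustrative, not real API output:

```python
import json

# Hedged sketch of an audit response; values are made up for illustration.
sample = json.loads("""
{
  "url": "https://example.com",
  "score": 92,
  "grade": "A",
  "checks": [
    {"name": "title", "passed": true, "message": "Title tag present",
     "value": "Example Domain"}
  ],
  "meta": {
    "title": "Example Domain",
    "description": null,
    "canonical": "https://example.com/",
    "ogTitle": null,
    "ogDescription": null,
    "ogImage": null
  },
  "timestamp": "2026-03-29T12:00:00Z"
}
""")

print(sample["score"], sample["grade"])  # -> 92 A
```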

2. Pydantic Models for Type-Safe Requests and Responses

Start by defining strict Pydantic models for every data structure in the system. This gives you automatic request validation, response serialization, and self-documenting API schemas in FastAPI’s built-in Swagger UI:

# models.py
from pydantic import BaseModel, HttpUrl, Field
from typing import Optional
from enum import Enum


class AuditCheck(BaseModel):
    name: str
    passed: bool
    message: str
    value: Optional[str] = None


class PageMeta(BaseModel):
    title: Optional[str] = None
    description: Optional[str] = None
    canonical: Optional[str] = None
    og_title: Optional[str] = Field(None, alias="ogTitle")
    og_description: Optional[str] = Field(None, alias="ogDescription")
    og_image: Optional[str] = Field(None, alias="ogImage")

    model_config = {"populate_by_name": True}


class AuditResult(BaseModel):
    url: str
    score: int
    grade: str
    checks: list[AuditCheck]
    meta: PageMeta
    timestamp: str
    error: Optional[str] = None


class ExportFormat(str, Enum):
    json = "json"
    csv = "csv"


class BulkAuditRequest(BaseModel):
    urls: list[HttpUrl] = Field(
        ...,
        min_length=1,
        max_length=200,
        description="List of URLs to audit (max 200)"
    )
    concurrency: int = Field(
        default=5,
        ge=1,
        le=20,
        description="Max concurrent requests (1-20)"
    )
    export_format: ExportFormat = Field(
        default=ExportFormat.json,
        description="Response format: json or csv"
    )


class BulkAuditResponse(BaseModel):
    total: int
    passed: int
    failed: int
    average_score: float
    results: list[AuditResult]


class JobStatus(str, Enum):
    pending = "pending"
    running = "running"
    completed = "completed"
    failed = "failed"


class BackgroundJobResponse(BaseModel):
    job_id: str
    status: JobStatus
    total_urls: int
    message: str


class JobResultResponse(BaseModel):
    job_id: str
    status: JobStatus
    result: Optional[BulkAuditResponse] = None

Notice that BulkAuditRequest validates URL format using Pydantic’s HttpUrl type, enforces a maximum of 200 URLs, and constrains concurrency between 1 and 20. Any request that violates these constraints gets a 422 response with a clear error message—no manual validation code required.
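You can see that behaviour in isolation with a trimmed-down copy of the model (just the two constrained fields; the full model lives in models.py above):

```python
from pydantic import BaseModel, Field, HttpUrl, ValidationError


# Trimmed-down copy of BulkAuditRequest, just enough to demonstrate
# the constraint validation described above.
class BulkAuditRequest(BaseModel):
    urls: list[HttpUrl] = Field(..., min_length=1, max_length=200)
    concurrency: int = Field(default=5, ge=1, le=20)


# A well-formed request parses cleanly; concurrency falls back to its default.
ok = BulkAuditRequest(urls=["https://example.com"])

# An out-of-range concurrency raises ValidationError,
# which FastAPI translates into a 422 response.
try:
    BulkAuditRequest(urls=["https://example.com"], concurrency=50)
    rejected = False
except ValidationError:
    rejected = True

print(ok.concurrency, rejected)  # -> 5 True
```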

3. Async SEOPeek Client with httpx

The core of the system is an async client that calls the SEOPeek API using httpx. Unlike the requests library, httpx supports async/await natively, which means you can fire off multiple audit requests concurrently without blocking the event loop:

# seopeek_client.py
import httpx
from models import AuditResult, AuditCheck, PageMeta
from datetime import datetime, timezone

SEOPEEK_API = (
    "https://us-central1-todd-agent-prod.cloudfunctions.net"
    "/seopeekApi/api/v1/audit"
)


async def audit_single_url(
    client: httpx.AsyncClient,
    url: str,
) -> AuditResult:
    """Audit a single URL via the SEOPeek API."""
    try:
        response = await client.get(
            SEOPEEK_API,
            params={"url": url},
            timeout=30.0,
        )
        response.raise_for_status()
        data = response.json()

        return AuditResult(
            url=data["url"],
            score=data["score"],
            grade=data["grade"],
            checks=[AuditCheck(**c) for c in data["checks"]],
            meta=PageMeta(**data["meta"]),
            timestamp=data["timestamp"],
        )
    except (httpx.HTTPError, KeyError, ValueError) as exc:
        return AuditResult(
            url=url,
            score=0,
            grade="F",
            checks=[],
            meta=PageMeta(),
            timestamp=datetime.now(timezone.utc).isoformat(),
            error=str(exc),
        )

The client handles errors gracefully—if a URL fails to audit (network timeout, invalid response, API error), it returns a result with score: 0 and the error message attached. This means batch operations never crash halfway through; you always get a complete set of results.
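The never-crash pattern is easy to see in isolation. This sketch uses a stand-in audit function (not the real client) that fails for one URL, and a wrapper that converts the exception into a fallback result, mirroring audit_single_url:

```python
import asyncio


async def flaky_audit(url: str) -> dict:
    # Stand-in for the real API call; fails for one URL to simulate a timeout.
    if "bad" in url:
        raise TimeoutError("simulated network timeout")
    return {"url": url, "score": 90, "error": None}


async def audit_safely(url: str) -> dict:
    """Return a zero-score fallback instead of raising."""
    try:
        return await flaky_audit(url)
    except Exception as exc:
        return {"url": url, "score": 0, "error": str(exc)}


async def main() -> list[dict]:
    urls = ["https://ok.example", "https://bad.example"]
    # gather never sees an exception, so the batch always completes.
    return await asyncio.gather(*(audit_safely(u) for u in urls))


results = asyncio.run(main())
print([r["score"] for r in results])  # -> [90, 0]
```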

4. Batch Audit Endpoint with Concurrency Control

The batch endpoint accepts a list of URLs and audits them concurrently using asyncio.gather. The critical detail is the semaphore—without it, sending 200 URLs simultaneously would overwhelm both the SEOPeek API and your own server. The semaphore limits how many requests are in flight at any given time:

# main.py
import asyncio
import httpx
from fastapi import FastAPI, Query
from models import (
    BulkAuditRequest,
    BulkAuditResponse,
    AuditResult,
)
from seopeek_client import audit_single_url

app = FastAPI(
    title="Bulk SEO Audit API",
    description="Audit hundreds of URLs concurrently with SEOPeek",
    version="1.0.0",
)


async def audit_with_semaphore(
    semaphore: asyncio.Semaphore,
    client: httpx.AsyncClient,
    url: str,
) -> AuditResult:
    """Wrap the audit call with a semaphore to cap concurrency."""
    async with semaphore:
        return await audit_single_url(client, url)


@app.post("/api/audit/bulk", response_model=BulkAuditResponse)
async def bulk_audit(request: BulkAuditRequest):
    """
    Audit multiple URLs concurrently.

    Accepts up to 200 URLs and processes them with
    configurable concurrency (default 5, max 20).
    """
    semaphore = asyncio.Semaphore(request.concurrency)

    async with httpx.AsyncClient() as client:
        tasks = [
            audit_with_semaphore(semaphore, client, str(url))
            for url in request.urls
        ]
        results: list[AuditResult] = await asyncio.gather(*tasks)

    passed = sum(1 for r in results if r.score >= 70)
    avg = sum(r.score for r in results) / len(results)

    return BulkAuditResponse(
        total=len(results),
        passed=passed,
        failed=len(results) - passed,
        average_score=round(avg, 1),
        results=results,
    )

Test it with curl:

curl -X POST http://localhost:8000/api/audit/bulk \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://example.com",
      "https://example.com/about",
      "https://example.com/pricing",
      "https://example.com/blog"
    ],
    "concurrency": 5
  }'

With concurrency: 5, four URLs will complete in roughly the time it takes to audit one. For 100 URLs, the total time drops from roughly 200 seconds (sequential) to about 40 seconds (5-way concurrency). Increase the concurrency value if your SEOPeek plan supports higher throughput.
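You can verify that the semaphore really caps in-flight work with a small self-contained experiment; short sleeps stand in for the audit HTTP calls:

```python
import asyncio


async def tracked_task(sem: asyncio.Semaphore, state: dict) -> None:
    async with sem:
        # Track how many tasks hold the semaphore simultaneously.
        state["in_flight"] += 1
        state["max_seen"] = max(state["max_seen"], state["in_flight"])
        await asyncio.sleep(0.01)  # stand-in for one audit HTTP call
        state["in_flight"] -= 1


async def main() -> int:
    sem = asyncio.Semaphore(5)
    state = {"in_flight": 0, "max_seen": 0}
    await asyncio.gather(*(tracked_task(sem, state) for _ in range(20)))
    return state["max_seen"]


max_concurrent = asyncio.run(main())
print(max_concurrent)  # capped at the Semaphore(5) limit
```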

5. CSV and JSON Export of Audit Results

SEO teams often need audit results in spreadsheet-friendly formats. Add a dedicated export endpoint that returns either JSON or CSV based on a query parameter:

# Add to main.py
import csv
import io
from fastapi.responses import StreamingResponse


@app.post("/api/audit/export")
async def export_audit(request: BulkAuditRequest):
    """
    Audit URLs and export results as JSON or CSV.
    """
    semaphore = asyncio.Semaphore(request.concurrency)

    async with httpx.AsyncClient() as client:
        tasks = [
            audit_with_semaphore(semaphore, client, str(url))
            for url in request.urls
        ]
        results = await asyncio.gather(*tasks)

    if request.export_format == "csv":
        return _results_to_csv(results)

    # Default: JSON
    passed = sum(1 for r in results if r.score >= 70)
    avg = sum(r.score for r in results) / len(results)
    return BulkAuditResponse(
        total=len(results),
        passed=passed,
        failed=len(results) - passed,
        average_score=round(avg, 1),
        results=results,
    )


def _results_to_csv(results: list[AuditResult]) -> StreamingResponse:
    """Convert audit results to a downloadable CSV file."""
    output = io.StringIO()
    writer = csv.writer(output)

    # Header row
    writer.writerow([
        "URL", "Score", "Grade", "Title", "Description",
        "Canonical", "OG Title", "OG Image",
        "Checks Passed", "Checks Failed", "Error",
    ])

    for r in results:
        checks_passed = sum(1 for c in r.checks if c.passed)
        checks_failed = len(r.checks) - checks_passed
        writer.writerow([
            r.url,
            r.score,
            r.grade,
            r.meta.title or "",
            r.meta.description or "",
            r.meta.canonical or "",
            r.meta.og_title or "",
            r.meta.og_image or "",
            checks_passed,
            checks_failed,
            r.error or "",
        ])

    output.seek(0)
    return StreamingResponse(
        iter([output.getvalue()]),
        media_type="text/csv",
        headers={
            "Content-Disposition": "attachment; filename=seo-audit.csv"
        },
    )

To download a CSV report:

curl -X POST http://localhost:8000/api/audit/export \
  -H "Content-Type: application/json" \
  -d '{"urls": ["https://example.com"], "export_format": "csv"}' \
  -o seo-audit.csv

The CSV includes the most important fields—score, grade, title, OG image presence, and error status—so you can open it in Google Sheets or Excel and immediately see which pages need attention.
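Once exported, the report is easy to post-process with nothing but the standard library. Here is a sketch that pulls failing pages out of a CSV in the format produced by _results_to_csv (the two rows are made up for illustration):

```python
import csv
import io

# Two made-up rows in the column layout produced by _results_to_csv above.
report = """URL,Score,Grade,Title,Description,Canonical,OG Title,OG Image,Checks Passed,Checks Failed,Error
https://example.com,92,A,Example,Example site,https://example.com/,Example,,18,2,
https://example.com/old,45,F,,,,,,9,11,
"""

# Collect every URL below the passing threshold used in this guide.
failing = [
    row["URL"]
    for row in csv.DictReader(io.StringIO(report))
    if int(row["Score"]) < 70
]
print(failing)  # -> ['https://example.com/old']
```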

6. Background Task Processing for Large Batches

For batches larger than 50 URLs, processing everything within a single request can time out before results are ready. FastAPI’s BackgroundTasks lets you accept the request immediately and process the audit in the background. The client polls a status endpoint to retrieve results when ready:

# Add to main.py
import uuid
from fastapi import BackgroundTasks
from models import (
    BackgroundJobResponse,
    JobResultResponse,
    JobStatus,
)

# In-memory job store (use Redis in production)
jobs: dict[str, JobResultResponse] = {}


async def _run_bulk_audit(
    job_id: str,
    urls: list[str],
    concurrency: int,
):
    """Background task: run the audit and store results."""
    jobs[job_id].status = JobStatus.running

    try:
        semaphore = asyncio.Semaphore(concurrency)

        async with httpx.AsyncClient() as client:
            tasks = [
                audit_with_semaphore(semaphore, client, url)
                for url in urls
            ]
            results = await asyncio.gather(*tasks)

        passed = sum(1 for r in results if r.score >= 70)
        avg = sum(r.score for r in results) / len(results)

        jobs[job_id].status = JobStatus.completed
        jobs[job_id].result = BulkAuditResponse(
            total=len(results),
            passed=passed,
            failed=len(results) - passed,
            average_score=round(avg, 1),
            results=results,
        )
    except Exception:
        # Mark the job failed; a production version should also log the exception.
        jobs[job_id].status = JobStatus.failed
        jobs[job_id].result = None


@app.post(
    "/api/audit/async",
    response_model=BackgroundJobResponse,
    status_code=202,
)
async def async_bulk_audit(
    request: BulkAuditRequest,
    background_tasks: BackgroundTasks,
):
    """
    Submit a large batch for background processing.

    Returns a job_id immediately. Poll /api/audit/jobs/{job_id}
    to check status and retrieve results.
    """
    job_id = str(uuid.uuid4())

    jobs[job_id] = JobResultResponse(
        job_id=job_id,
        status=JobStatus.pending,
        result=None,
    )

    background_tasks.add_task(
        _run_bulk_audit,
        job_id,
        [str(u) for u in request.urls],
        request.concurrency,
    )

    return BackgroundJobResponse(
        job_id=job_id,
        status=JobStatus.pending,
        total_urls=len(request.urls),
        message=f"Audit queued. Poll /api/audit/jobs/{job_id}",
    )


@app.get("/api/audit/jobs/{job_id}", response_model=JobResultResponse)
async def get_job_status(job_id: str):
    """Check the status of a background audit job."""
    if job_id not in jobs:
        from fastapi import HTTPException
        raise HTTPException(status_code=404, detail="Job not found")

    return jobs[job_id]

The workflow is straightforward: submit URLs to /api/audit/async, receive a job_id, then poll /api/audit/jobs/{job_id} until the status changes to completed. In production, replace the in-memory jobs dictionary with Redis so results survive server restarts and can be shared across multiple workers.
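A polling client is a short loop. The sketch below takes the fetch function as a parameter so it works against any transport; the fake fetcher here simulates the pending → running → completed transition instead of calling the real endpoint:

```python
import time
from typing import Callable


def poll_job(fetch: Callable[[], dict],
             interval: float = 0.01,
             max_polls: int = 100) -> dict:
    """Poll until the job reaches a terminal status."""
    for _ in range(max_polls):
        job = fetch()
        if job["status"] in ("completed", "failed"):
            return job
        time.sleep(interval)
    raise TimeoutError("job did not finish in time")


# Fake fetcher standing in for GET /api/audit/jobs/{job_id}.
responses = iter([
    {"status": "pending", "result": None},
    {"status": "running", "result": None},
    {"status": "completed", "result": {"total": 4, "average_score": 81.5}},
])
final = poll_job(lambda: next(responses))
print(final["status"])  # -> completed
```

In a real client, `fetch` would wrap an httpx or requests GET against the job endpoint; the loop logic stays the same.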

Production tip: For truly large batches (1,000+ URLs), consider using Celery with Redis as a broker instead of FastAPI BackgroundTasks. Celery provides distributed task execution, automatic retries, and result backends—but for most use cases, BackgroundTasks is simpler and sufficient.

7. Caching Results with Redis and In-Memory TTL

SEO metadata does not change every second. Caching audit results avoids redundant API calls and keeps you within your SEOPeek quota. Here are two approaches—in-memory for single-server deployments, and Redis for distributed setups:

Option A: In-Memory TTL Cache

Use the cachetools library for a lightweight caching layer with no extra infrastructure to run. Results are cached for one hour by default:

# cache.py
from cachetools import TTLCache
from models import AuditResult

# Cache up to 1,000 results for 1 hour (3600 seconds)
_cache: TTLCache[str, AuditResult] = TTLCache(
    maxsize=1000,
    ttl=3600,
)


def get_cached(url: str) -> AuditResult | None:
    return _cache.get(url)


def set_cached(url: str, result: AuditResult) -> None:
    _cache[url] = result


def clear_cache() -> None:
    _cache.clear()

Option B: Redis Cache

For multi-worker deployments behind a load balancer, use Redis so all instances share the same cache:

# redis_cache.py
import json
import redis.asyncio as redis
from models import AuditResult

_redis = redis.from_url("redis://localhost:6379", decode_responses=True)
CACHE_TTL = 3600  # 1 hour


async def get_cached(url: str) -> AuditResult | None:
    data = await _redis.get(f"seo:audit:{url}")
    if data:
        return AuditResult.model_validate_json(data)
    return None


async def set_cached(url: str, result: AuditResult) -> None:
    await _redis.set(
        f"seo:audit:{url}",
        result.model_dump_json(),
        ex=CACHE_TTL,
    )


async def clear_cache() -> None:
    keys = []
    async for key in _redis.scan_iter("seo:audit:*"):
        keys.append(key)
    if keys:
        await _redis.delete(*keys)

Now update the audit function to check the cache before hitting the API:

# Updated seopeek_client.py
from cache import get_cached, set_cached  # or redis_cache


async def audit_single_url_cached(
    client: httpx.AsyncClient,
    url: str,
) -> AuditResult:
    """Audit with cache-first strategy."""
    # Check cache first
    cached = get_cached(url)       # use await for Redis
    if cached is not None:
        return cached

    # Cache miss: call the API
    result = await audit_single_url(client, url)

    # Cache successful results only
    if result.error is None:
        set_cached(url, result)    # use await for Redis

    return result

With caching enabled, auditing the same sitemap twice in an hour consumes zero additional API quota. The first run populates the cache, and subsequent runs return instantly from memory or Redis.
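The quota math is easy to demonstrate with a counting stand-in for the API call (a sketch, not the real SEOPeek client or cache module):

```python
import asyncio

api_calls = 0
cache: dict[str, dict] = {}


async def fake_audit(url: str) -> dict:
    global api_calls
    api_calls += 1  # each call here would consume one unit of API quota
    return {"url": url, "score": 88, "error": None}


async def audit_cached(url: str) -> dict:
    """Cache-first lookup mirroring audit_single_url_cached."""
    if url in cache:
        return cache[url]
    result = await fake_audit(url)
    if result["error"] is None:  # cache successful results only
        cache[url] = result
    return result


async def main() -> None:
    sitemap = ["https://example.com", "https://example.com/about"]
    for _ in range(2):            # audit the same sitemap twice
        for url in sitemap:
            await audit_cached(url)


asyncio.run(main())
print(api_calls)  # -> 2: the second pass is served entirely from cache
```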

8. Putting It All Together

Here is the complete project structure and the commands to get it running:

# Project structure
seo-audit-api/
  main.py              # FastAPI app with all endpoints
  models.py            # Pydantic models
  seopeek_client.py    # Async SEOPeek API client
  cache.py             # In-memory TTL cache
  redis_cache.py       # Redis cache (optional)
  requirements.txt     # Dependencies

# requirements.txt
fastapi==0.115.0
uvicorn[standard]==0.30.0
httpx==0.27.0
pydantic==2.9.0
cachetools==5.5.0
redis==5.1.0          # optional, for Redis caching

# Install and run
pip install -r requirements.txt
uvicorn main:app --reload --port 8000

Once running, open http://localhost:8000/docs to see the auto-generated Swagger UI with all endpoints documented, request/response schemas rendered from your Pydantic models, and a “Try it out” button for every route.

Here is a complete Python script that uses your new API to audit an entire sitemap and generate a report:

# scripts/audit_sitemap.py
"""Audit an entire sitemap using the bulk audit API."""
import httpx
import xml.etree.ElementTree as ET
import sys
import json


async def main(sitemap_url: str):
    async with httpx.AsyncClient(timeout=120) as client:
        # 1. Fetch and parse the sitemap
        resp = await client.get(sitemap_url)
        root = ET.fromstring(resp.text)

        ns = {"s": "http://www.sitemaps.org/schemas/sitemap/0.9"}
        urls = [
            loc.text
            for loc in root.findall(".//s:loc", ns)
            if loc.text
        ]

        print(f"Found {len(urls)} URLs in sitemap")

        # 2. Submit to bulk audit API
        result = await client.post(
            "http://localhost:8000/api/audit/bulk",
            json={"urls": urls, "concurrency": 10},
        )
        data = result.json()

        # 3. Print summary
        print(f"\nAverage Score: {data['average_score']}")
        print(f"Passed (>=70): {data['passed']}")
        print(f"Failed (<70):  {data['failed']}")

        # 4. Show worst pages
        worst = sorted(data["results"], key=lambda r: r["score"])
        print("\nWorst 5 pages:")
        for r in worst[:5]:
            print(f"  {r['score']}/100  {r['url']}")

        # 5. Save full report
        with open("audit-report.json", "w") as f:
            json.dump(data, f, indent=2)
        print("\nFull report saved to audit-report.json")


if __name__ == "__main__":
    import asyncio
    url = sys.argv[1] if len(sys.argv) > 1 else "https://example.com/sitemap.xml"
    asyncio.run(main(url))

Run it against any sitemap:

python scripts/audit_sitemap.py https://yoursite.com/sitemap.xml

Start Auditing for Free

50 free audits per day. No API key required. JSON response in under 2 seconds.

Start auditing for free →

Frequently Asked Questions

How many URLs can I audit in a single batch with the SEOPeek API?

The free tier allows 50 audits per day. With the Starter plan ($9/month) you get 1,000 audits per day, and the Pro plan ($29/month) provides 10,000. Your FastAPI batch endpoint can be configured to accept batches of any size, but you should use a semaphore to limit concurrency and stay within your rate limits. The Pydantic model in this guide caps a single request at 200 URLs, which you can adjust based on your plan.

Why use httpx instead of requests for the SEOPeek API?

httpx supports async/await natively, which means you can audit multiple URLs concurrently without blocking the event loop. The requests library is synchronous and would force sequential processing, making batch audits significantly slower. With httpx and asyncio.gather, you can audit 10 URLs in roughly the time it takes requests to audit one. httpx also provides a connection pool, HTTP/2 support, and a nearly identical API to requests, so migration is painless.

Can I use FastAPI BackgroundTasks for large SEO audits?

Yes. For batches larger than 20–30 URLs, it is better to return a 202 Accepted response immediately and process the audit in the background. FastAPI BackgroundTasks runs the audit after the response is sent. Store results in Redis or a database, then have the client poll a status endpoint. For very large batches (1,000+ URLs), consider Celery with a Redis broker for distributed task execution and automatic retries.

How do I cache SEOPeek audit results to avoid redundant API calls?

Use Redis with a TTL (time-to-live) of 1–24 hours depending on how frequently your pages change. For simpler setups, Python’s cachetools library provides an in-memory TTLCache with zero infrastructure requirements. The cache key should be the URL being audited. Only cache successful results—errors should always be retried. Invalidate the cache whenever you deploy content changes by calling the clear_cache() function or deleting the Redis keys.

Does SEOPeek work with other Python frameworks like Django or Flask?

Yes. The SEOPeek API is a standard REST endpoint that works with any HTTP client in any language. This guide uses FastAPI because its native async support makes concurrent batch auditing straightforward, but you can use the same httpx client code with Django (via django-ninja or Django REST Framework), Flask (via Quart for async), or even plain scripts. The seopeek_client.py module has no FastAPI dependency and works anywhere you can await.

More from the Peek Suite

SEOPeek is part of a family of developer tools. Each one solves a specific problem with a single API call: