Build a Bulk SEO Audit API with FastAPI and SEOPeek
Running SEO audits one URL at a time is fine for spot checks—but when you manage dozens of sites, hundreds of pages, or need to audit an entire sitemap before every deployment, you need a bulk audit API. FastAPI’s native async support, combined with the SEOPeek audit API, makes it possible to audit hundreds of URLs concurrently, export results as CSV or JSON, and process massive batches in the background—all with production-ready Python code you can deploy today.
- Why you need a bulk SEO audit API
- Pydantic models for type-safe requests and responses
- Async SEOPeek client with httpx
- Batch audit endpoint with concurrency control
- CSV and JSON export of audit results
- Background task processing for large batches
- Caching results with Redis and in-memory TTL
- Putting it all together: the complete app
- FAQ
1. Why You Need a Bulk SEO Audit API
Individual URL auditing works when you are debugging a single page. It does not scale to real-world SEO operations. Here are the scenarios where a bulk audit API pays for itself:
- Pre-deployment validation — audit every URL in your sitemap before pushing to production, catching missing titles, broken OG images, and canonical URL mismatches before Google sees them
- Agency client reporting — generate a comprehensive SEO health report across all client domains in minutes instead of hours
- Competitor monitoring — track how competitors’ on-page SEO changes over time by auditing their key pages daily
- CMS migration validation — verify that every page preserved its SEO metadata after moving from WordPress to a headless CMS or vice versa
- E-commerce catalog audits — audit thousands of product pages to find pages with missing descriptions, duplicate titles, or no structured data
Python is the natural choice for this kind of automation. FastAPI gives you async request handling out of the box, Pydantic provides automatic validation and serialization, and httpx lets you make concurrent HTTP calls without threading complexity. Combined with the SEOPeek API—which returns structured JSON audit results for any URL—you can build a production-grade bulk audit service in under 200 lines of code.
The SEOPeek API audits the rendered HTML of any URL and returns a numeric score, letter grade, and pass/fail results for 20+ on-page SEO checks—title, description, OG tags, canonical URL, heading structure, and structured data. One GET request, JSON response, under 2 seconds.
2. Pydantic Models for Type-Safe Requests and Responses
Start by defining strict Pydantic models for every data structure in the system. This gives you automatic request validation, response serialization, and self-documenting API schemas in FastAPI’s built-in Swagger UI:
# models.py
from pydantic import BaseModel, HttpUrl, Field
from typing import Optional
from enum import Enum
class AuditCheck(BaseModel):
name: str
passed: bool
message: str
value: Optional[str] = None
class PageMeta(BaseModel):
title: Optional[str] = None
description: Optional[str] = None
canonical: Optional[str] = None
og_title: Optional[str] = Field(None, alias="ogTitle")
og_description: Optional[str] = Field(None, alias="ogDescription")
og_image: Optional[str] = Field(None, alias="ogImage")
    model_config = {"populate_by_name": True}
class AuditResult(BaseModel):
url: str
score: int
grade: str
checks: list[AuditCheck]
meta: PageMeta
timestamp: str
error: Optional[str] = None
class ExportFormat(str, Enum):
json = "json"
csv = "csv"
class BulkAuditRequest(BaseModel):
urls: list[HttpUrl] = Field(
...,
min_length=1,
max_length=200,
description="List of URLs to audit (max 200)"
)
concurrency: int = Field(
default=5,
ge=1,
le=20,
description="Max concurrent requests (1-20)"
)
export_format: ExportFormat = Field(
default=ExportFormat.json,
description="Response format: json or csv"
)
class BulkAuditResponse(BaseModel):
total: int
passed: int
failed: int
average_score: float
results: list[AuditResult]
class JobStatus(str, Enum):
pending = "pending"
running = "running"
completed = "completed"
failed = "failed"
class BackgroundJobResponse(BaseModel):
job_id: str
status: JobStatus
total_urls: int
message: str
class JobResultResponse(BaseModel):
job_id: str
status: JobStatus
result: Optional[BulkAuditResponse] = None
Notice that BulkAuditRequest validates URL format using Pydantic’s HttpUrl type, enforces a maximum of 200 URLs, and constrains concurrency between 1 and 20. Any request that violates these constraints gets a 422 response with a clear error message—no manual validation code required.
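You can exercise these rules without starting the server. Here is a standalone sketch that redeclares just the two constrained fields so it runs on its own:

```python
from pydantic import BaseModel, HttpUrl, Field, ValidationError

# Minimal copy of the constrained fields from BulkAuditRequest
class BulkAuditRequest(BaseModel):
    urls: list[HttpUrl] = Field(..., min_length=1, max_length=200)
    concurrency: int = Field(default=5, ge=1, le=20)

# A valid request: defaults are applied automatically
req = BulkAuditRequest(urls=["https://example.com"])
print(req.concurrency)  # 5

# concurrency=50 violates le=20; FastAPI turns this into a 422 response
try:
    BulkAuditRequest(urls=["https://example.com"], concurrency=50)
except ValidationError as exc:
    print(f"{exc.error_count()} validation error")
```

The same ValidationError that Pydantic raises here is what FastAPI catches and serializes into the 422 response body.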
3. Async SEOPeek Client with httpx
The core of the system is an async client that calls the SEOPeek API using httpx. Unlike the requests library, httpx supports async/await natively, which means you can fire off multiple audit requests concurrently without blocking the event loop:
# seopeek_client.py
import httpx
from models import AuditResult, AuditCheck, PageMeta
from datetime import datetime, timezone
SEOPEEK_API = (
"https://us-central1-todd-agent-prod.cloudfunctions.net"
"/seopeekApi/api/v1/audit"
)
async def audit_single_url(
client: httpx.AsyncClient,
url: str,
) -> AuditResult:
"""Audit a single URL via the SEOPeek API."""
try:
response = await client.get(
SEOPEEK_API,
params={"url": url},
timeout=30.0,
)
response.raise_for_status()
data = response.json()
return AuditResult(
url=data["url"],
score=data["score"],
grade=data["grade"],
checks=[AuditCheck(**c) for c in data["checks"]],
meta=PageMeta(**data["meta"]),
timestamp=data["timestamp"],
)
except (httpx.HTTPError, KeyError, ValueError) as exc:
return AuditResult(
url=url,
score=0,
grade="F",
checks=[],
meta=PageMeta(),
timestamp=datetime.now(timezone.utc).isoformat(),
error=str(exc),
)
The client handles errors gracefully—if a URL fails to audit (network timeout, invalid response, API error), it returns a result with score: 0 and the error message attached. This means batch operations never crash halfway through; you always get a complete set of results.
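This "errors as values" pattern is what keeps asyncio.gather from raising mid-batch. A toy simulation with made-up URLs and trimmed-down result dicts shows the mechanics:

```python
import asyncio

async def audit(url: str) -> dict:
    # Simulates audit_single_url: failures become results, not exceptions
    if "down" in url:
        return {"url": url, "score": 0, "error": "simulated timeout"}
    return {"url": url, "score": 90, "error": None}

async def run_batch(urls: list[str]) -> list[dict]:
    # gather preserves input order and, because audit never raises,
    # always yields exactly one result per URL
    return await asyncio.gather(*(audit(u) for u in urls))

results = asyncio.run(run_batch(["https://ok.example", "https://down.example"]))
print([r["error"] for r in results])  # [None, 'simulated timeout']
```

If you preferred to let exceptions propagate instead, `asyncio.gather(..., return_exceptions=True)` offers a similar guarantee, but returning a typed fallback result keeps the response schema uniform.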
4. Batch Audit Endpoint with Concurrency Control
The batch endpoint accepts a list of URLs and audits them concurrently using asyncio.gather. The critical detail is the semaphore—without it, sending 200 URLs simultaneously would overwhelm both the SEOPeek API and your own server. The semaphore limits how many requests are in flight at any given time:
# main.py
import asyncio
import httpx
from fastapi import FastAPI
from models import (
BulkAuditRequest,
BulkAuditResponse,
AuditResult,
)
from seopeek_client import audit_single_url
app = FastAPI(
title="Bulk SEO Audit API",
description="Audit hundreds of URLs concurrently with SEOPeek",
version="1.0.0",
)
async def audit_with_semaphore(
semaphore: asyncio.Semaphore,
client: httpx.AsyncClient,
url: str,
) -> AuditResult:
"""Wrap audit call with a semaphore for rate limiting."""
async with semaphore:
return await audit_single_url(client, url)
@app.post("/api/audit/bulk", response_model=BulkAuditResponse)
async def bulk_audit(request: BulkAuditRequest):
"""
Audit multiple URLs concurrently.
Accepts up to 200 URLs and processes them with
configurable concurrency (default 5, max 20).
"""
semaphore = asyncio.Semaphore(request.concurrency)
async with httpx.AsyncClient() as client:
tasks = [
audit_with_semaphore(semaphore, client, str(url))
for url in request.urls
]
results: list[AuditResult] = await asyncio.gather(*tasks)
passed = sum(1 for r in results if r.score >= 70)
avg = sum(r.score for r in results) / len(results)
return BulkAuditResponse(
total=len(results),
passed=passed,
failed=len(results) - passed,
average_score=round(avg, 1),
results=results,
)
Test it with curl:
curl -X POST http://localhost:8000/api/audit/bulk \
-H "Content-Type: application/json" \
-d '{
"urls": [
"https://example.com",
"https://example.com/about",
"https://example.com/pricing",
"https://example.com/blog"
],
"concurrency": 5
}'
With concurrency: 5, four URLs will complete in roughly the time it takes to audit one. For 100 URLs, the total time drops from roughly 200 seconds (sequential) to about 40 seconds (5-way concurrency). Increase the concurrency value if your SEOPeek plan supports higher throughput.
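As a rough mental model (assuming a roughly uniform ~2-second latency per audit), total batch time is the number of "waves" times the per-audit latency:

```python
import math

def estimated_batch_seconds(
    n_urls: int,
    concurrency: int,
    per_audit_s: float = 2.0,
) -> float:
    """Estimate batch duration: ceil(n / k) waves of k concurrent audits."""
    return math.ceil(n_urls / concurrency) * per_audit_s

print(estimated_batch_seconds(100, 1))  # 200.0 — sequential
print(estimated_batch_seconds(100, 5))  # 40.0  — 5-way concurrency
```

Real numbers vary with page weight and network conditions, but the shape of the curve is why a concurrency of 5 already buys you roughly a 5x speedup.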
5. CSV and JSON Export of Audit Results
SEO teams often need audit results in spreadsheet-friendly formats. Add a dedicated export endpoint that returns either JSON or CSV based on the export_format field in the request body:
# Add to main.py
import csv
import io
from fastapi.responses import StreamingResponse
@app.post("/api/audit/export")
async def export_audit(request: BulkAuditRequest):
"""
Audit URLs and export results as JSON or CSV.
"""
semaphore = asyncio.Semaphore(request.concurrency)
async with httpx.AsyncClient() as client:
tasks = [
audit_with_semaphore(semaphore, client, str(url))
for url in request.urls
]
results = await asyncio.gather(*tasks)
if request.export_format == "csv":
return _results_to_csv(results)
# Default: JSON
passed = sum(1 for r in results if r.score >= 70)
avg = sum(r.score for r in results) / len(results)
return BulkAuditResponse(
total=len(results),
passed=passed,
failed=len(results) - passed,
average_score=round(avg, 1),
results=results,
)
def _results_to_csv(results: list[AuditResult]) -> StreamingResponse:
"""Convert audit results to a downloadable CSV file."""
output = io.StringIO()
writer = csv.writer(output)
# Header row
writer.writerow([
"URL", "Score", "Grade", "Title", "Description",
"Canonical", "OG Title", "OG Image",
"Checks Passed", "Checks Failed", "Error",
])
for r in results:
checks_passed = sum(1 for c in r.checks if c.passed)
checks_failed = len(r.checks) - checks_passed
writer.writerow([
r.url,
r.score,
r.grade,
r.meta.title or "",
r.meta.description or "",
r.meta.canonical or "",
r.meta.og_title or "",
r.meta.og_image or "",
checks_passed,
checks_failed,
r.error or "",
])
output.seek(0)
return StreamingResponse(
iter([output.getvalue()]),
media_type="text/csv",
headers={
"Content-Disposition": "attachment; filename=seo-audit.csv"
},
)
To download a CSV report:
curl -X POST http://localhost:8000/api/audit/export \
-H "Content-Type: application/json" \
-d '{"urls": ["https://example.com"], "export_format": "csv"}' \
-o seo-audit.csv
The CSV includes the most important fields—score, grade, title, OG image presence, and error status—so you can open it in Google Sheets or Excel and immediately see which pages need attention.
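If you later want to post-process the export programmatically, it round-trips cleanly through the standard library. A standalone sketch (the sample rows are made up; the column order matches _results_to_csv above):

```python
import csv
import io

header = [
    "URL", "Score", "Grade", "Title", "Description",
    "Canonical", "OG Title", "OG Image",
    "Checks Passed", "Checks Failed", "Error",
]
rows = [
    ["https://example.com", "92", "A", "Home", "Welcome", "", "", "", "18", "2", ""],
    ["https://example.com/old", "41", "F", "", "", "", "", "", "5", "15", ""],
]
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(header)
writer.writerows(rows)
buf.seek(0)

# Pull out every page below the passing threshold
failing = [row["URL"] for row in csv.DictReader(buf) if int(row["Score"]) < 70]
print(failing)  # ['https://example.com/old']
```

csv.DictReader keys each row by the header, so a filter like this keeps working even if you append new columns later.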
6. Background Task Processing for Large Batches
For batches beyond a few dozen URLs, a synchronous request can run long enough to hit client or proxy timeouts. FastAPI’s BackgroundTasks lets you accept the request immediately and process the audit in the background. The client polls a status endpoint to retrieve results when ready:
# Add to main.py
import uuid
from fastapi import BackgroundTasks
from models import (
BackgroundJobResponse,
JobResultResponse,
JobStatus,
)
# In-memory job store (use Redis in production)
jobs: dict[str, JobResultResponse] = {}
async def _run_bulk_audit(
job_id: str,
urls: list[str],
concurrency: int,
):
"""Background task: run the audit and store results."""
jobs[job_id].status = JobStatus.running
try:
semaphore = asyncio.Semaphore(concurrency)
async with httpx.AsyncClient() as client:
tasks = [
audit_with_semaphore(semaphore, client, url)
for url in urls
]
results = await asyncio.gather(*tasks)
passed = sum(1 for r in results if r.score >= 70)
avg = sum(r.score for r in results) / len(results)
jobs[job_id].status = JobStatus.completed
jobs[job_id].result = BulkAuditResponse(
total=len(results),
passed=passed,
failed=len(results) - passed,
average_score=round(avg, 1),
results=results,
)
    except Exception:
        # Mark the job failed; clients see status="failed" when polling
        jobs[job_id].status = JobStatus.failed
        jobs[job_id].result = None
@app.post(
"/api/audit/async",
response_model=BackgroundJobResponse,
status_code=202,
)
async def async_bulk_audit(
request: BulkAuditRequest,
background_tasks: BackgroundTasks,
):
"""
Submit a large batch for background processing.
Returns a job_id immediately. Poll /api/audit/jobs/{job_id}
to check status and retrieve results.
"""
job_id = str(uuid.uuid4())
jobs[job_id] = JobResultResponse(
job_id=job_id,
status=JobStatus.pending,
result=None,
)
background_tasks.add_task(
_run_bulk_audit,
job_id,
[str(u) for u in request.urls],
request.concurrency,
)
return BackgroundJobResponse(
job_id=job_id,
status=JobStatus.pending,
total_urls=len(request.urls),
message=f"Audit queued. Poll /api/audit/jobs/{job_id}",
)
@app.get("/api/audit/jobs/{job_id}", response_model=JobResultResponse)
async def get_job_status(job_id: str):
"""Check the status of a background audit job."""
if job_id not in jobs:
from fastapi import HTTPException
raise HTTPException(status_code=404, detail="Job not found")
return jobs[job_id]
The workflow is straightforward: submit URLs to /api/audit/async, receive a job_id, then poll /api/audit/jobs/{job_id} until the status changes to completed. In production, replace the in-memory jobs dictionary with Redis so results survive server restarts and can be shared across multiple workers.
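On the client side, the poll loop is only a few lines. Here is a sketch with the HTTP call abstracted behind a callable so the loop itself is easy to test; in practice, fetch_status would GET /api/audit/jobs/{job_id} and return the parsed JSON:

```python
import time

def poll_job(fetch_status, interval_s: float = 2.0, max_polls: int = 150) -> dict:
    """Poll until the job reaches a terminal state, then return its payload."""
    for _ in range(max_polls):
        payload = fetch_status()
        if payload["status"] in ("completed", "failed"):
            return payload
        time.sleep(interval_s)
    raise TimeoutError("job did not finish in time")

# Simulated run: a fake fetcher whose job completes on the third poll
states = iter([
    {"status": "pending"},
    {"status": "running"},
    {"status": "completed", "result": {"total": 4}},
])
done = poll_job(lambda: next(states), interval_s=0)
print(done["status"])  # completed
```

The max_polls cap matters: without it, a job that was lost to a server restart (remember, the store is in-memory) would leave the client polling forever.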
Production tip: For truly large batches (1,000+ URLs), consider using Celery with Redis as a broker instead of FastAPI BackgroundTasks. Celery provides distributed task execution, automatic retries, and result backends—but for most use cases, BackgroundTasks is simpler and sufficient.
7. Caching Results with Redis and In-Memory TTL
SEO metadata does not change every second. Caching audit results avoids redundant API calls and keeps you within your SEOPeek quota. Here are two approaches—in-memory for single-server deployments, and Redis for distributed setups:
Option A: In-Memory TTL Cache
Use the cachetools library for a zero-dependency caching layer. Results are cached for one hour by default:
# cache.py
from cachetools import TTLCache
from models import AuditResult
# Cache up to 1,000 results for 1 hour (3600 seconds)
_cache: "TTLCache[str, AuditResult]" = TTLCache(
    maxsize=1000,
    ttl=3600,
)
def get_cached(url: str) -> AuditResult | None:
return _cache.get(url)
def set_cached(url: str, result: AuditResult) -> None:
_cache[url] = result
def clear_cache() -> None:
_cache.clear()
Option B: Redis Cache
For multi-worker deployments behind a load balancer, use Redis so all instances share the same cache:
# redis_cache.py
import json
import redis.asyncio as redis
from models import AuditResult
_redis = redis.from_url("redis://localhost:6379", decode_responses=True)
CACHE_TTL = 3600 # 1 hour
async def get_cached(url: str) -> AuditResult | None:
data = await _redis.get(f"seo:audit:{url}")
if data:
return AuditResult.model_validate_json(data)
return None
async def set_cached(url: str, result: AuditResult) -> None:
await _redis.set(
f"seo:audit:{url}",
result.model_dump_json(),
ex=CACHE_TTL,
)
async def clear_cache() -> None:
keys = []
async for key in _redis.scan_iter("seo:audit:*"):
keys.append(key)
if keys:
await _redis.delete(*keys)
Now update the audit function to check the cache before hitting the API:
# Updated seopeek_client.py
from cache import get_cached, set_cached # or redis_cache
async def audit_single_url_cached(
client: httpx.AsyncClient,
url: str,
) -> AuditResult:
"""Audit with cache-first strategy."""
# Check cache first
cached = get_cached(url) # use await for Redis
if cached is not None:
return cached
# Cache miss: call the API
result = await audit_single_url(client, url)
# Cache successful results only
if result.error is None:
set_cached(url, result) # use await for Redis
return result
With caching enabled, auditing the same sitemap twice in an hour consumes zero additional API quota. The first run populates the cache, and subsequent runs return instantly from memory or Redis.
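If you are curious what TTLCache is doing under the hood, the core idea fits in a few lines of stdlib Python. This is a simplified sketch without the size-based eviction that cachetools adds on top:

```python
import time

class TinyTTLCache:
    """Minimal TTL cache: each entry carries its own expiry timestamp."""

    def __init__(self, ttl: float):
        self.ttl = ttl
        self._store: dict[str, tuple[float, object]] = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires, value = entry
        if time.monotonic() > expires:
            # Lazy expiry: stale entries are dropped on access
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (time.monotonic() + self.ttl, value)

cache = TinyTTLCache(ttl=0.05)
cache.set("https://example.com", {"score": 92})
print(cache.get("https://example.com"))  # {'score': 92}
time.sleep(0.06)
print(cache.get("https://example.com"))  # None — entry expired
```

Note the use of time.monotonic() rather than time.time(): wall-clock adjustments (NTP sync, DST) should never shorten or extend an entry's lifetime.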
8. Putting It All Together
Here is the complete project structure and the commands to get it running:
# Project structure
seo-audit-api/
main.py # FastAPI app with all endpoints
models.py # Pydantic models
seopeek_client.py # Async SEOPeek API client
cache.py # In-memory TTL cache
redis_cache.py # Redis cache (optional)
requirements.txt # Dependencies
# requirements.txt
fastapi==0.115.0
uvicorn[standard]==0.30.0
httpx==0.27.0
pydantic==2.9.0
cachetools==5.5.0
redis==5.1.0 # optional, for Redis caching
# Install and run
pip install -r requirements.txt
uvicorn main:app --reload --port 8000
Once running, open http://localhost:8000/docs to see the auto-generated Swagger UI with all endpoints documented, request/response schemas rendered from your Pydantic models, and a “Try it out” button for every route.
Here is a complete Python script that uses your new API to audit an entire sitemap and generate a report:
# scripts/audit_sitemap.py
"""Audit an entire sitemap using the bulk audit API."""
import httpx
import xml.etree.ElementTree as ET
import sys
import json
async def main(sitemap_url: str):
async with httpx.AsyncClient(timeout=120) as client:
# 1. Fetch and parse the sitemap
resp = await client.get(sitemap_url)
root = ET.fromstring(resp.text)
ns = {"s": "http://www.sitemaps.org/schemas/sitemap/0.9"}
urls = [
loc.text
for loc in root.findall(".//s:loc", ns)
if loc.text
]
print(f"Found {len(urls)} URLs in sitemap")
# 2. Submit to bulk audit API
result = await client.post(
"http://localhost:8000/api/audit/bulk",
json={"urls": urls, "concurrency": 10},
)
data = result.json()
# 3. Print summary
print(f"\nAverage Score: {data['average_score']}")
print(f"Passed (>=70): {data['passed']}")
print(f"Failed (<70): {data['failed']}")
# 4. Show worst pages
worst = sorted(data["results"], key=lambda r: r["score"])
print("\nWorst 5 pages:")
for r in worst[:5]:
print(f" {r['score']}/100 {r['url']}")
# 5. Save full report
with open("audit-report.json", "w") as f:
json.dump(data, f, indent=2)
print("\nFull report saved to audit-report.json")
if __name__ == "__main__":
import asyncio
url = sys.argv[1] if len(sys.argv) > 1 else "https://example.com/sitemap.xml"
asyncio.run(main(url))
Run it against any sitemap:
python scripts/audit_sitemap.py https://yoursite.com/sitemap.xml
Start Auditing for Free
50 free audits per day. No API key required. JSON response in under 2 seconds.
Frequently Asked Questions
How many URLs can I audit in a single batch with the SEOPeek API?
The free tier allows 50 audits per day. With the Starter plan ($9/month) you get 1,000 audits per day, and the Pro plan ($29/month) provides 10,000. Your FastAPI batch endpoint can accept any number of URLs, but you should use a semaphore to limit concurrency and stay within your rate limits. The Pydantic model in this guide caps a single request at 200 URLs, which you can adjust based on your plan.
Why use httpx instead of requests for the SEOPeek API?
httpx supports async/await natively, which means you can audit multiple URLs concurrently without blocking the event loop. The requests library is synchronous and would force sequential processing, making batch audits significantly slower. With httpx and asyncio.gather, you can audit 10 URLs in roughly the time it takes requests to audit one. httpx also provides a connection pool, HTTP/2 support, and a nearly identical API to requests, so migration is painless.
Can I use FastAPI BackgroundTasks for large SEO audits?
Yes. For batches larger than 20–30 URLs, it is better to return a 202 Accepted response immediately and process the audit in the background. FastAPI BackgroundTasks runs the audit after the response is sent. Store results in Redis or a database, then have the client poll a status endpoint. For very large batches (1,000+ URLs), consider Celery with a Redis broker for distributed task execution and automatic retries.
How do I cache SEOPeek audit results to avoid redundant API calls?
Use Redis with a TTL (time-to-live) of 1–24 hours depending on how frequently your pages change. For simpler setups, Python’s cachetools library provides an in-memory TTLCache with zero infrastructure requirements. The cache key should be the URL being audited. Only cache successful results—errors should always be retried. Invalidate the cache whenever you deploy content changes by calling the clear_cache() function or deleting the Redis keys.
Does SEOPeek work with other Python frameworks like Django or Flask?
Yes. The SEOPeek API is a standard REST endpoint that works with any HTTP client in any language. This guide uses FastAPI because its native async support makes concurrent batch auditing straightforward, but you can use the same httpx client code with Django (via django-ninja or Django REST Framework), Flask (via Quart for async), or even plain scripts. The seopeek_client.py module has no FastAPI dependency and works anywhere you can await.
More from the Peek Suite
SEOPeek is part of a family of developer tools. Each one solves a specific problem with a single API call:
- OGPeek — Preview and validate Open Graph meta tags for any URL. See exactly how your pages appear when shared on Twitter, Facebook, Slack, and Discord.
- StackPeek — Detect the tech stack of any website. Identify frameworks, CMS, analytics, CDN, and hosting providers in one request.
- CronPeek — Monitor cron jobs, scheduled tasks, and uptime. Get alerted when a job fails or a URL goes down.