Claude Code skill for Cloudflare's new /crawl endpoint.

Open Claude and run:

/cf-crawl https://cloudify.ro --limit 50

Replace the URL and the skill does the rest, including helping you configure your Cloudflare account.

Perfect for RAG pipelines and model training. The skill wraps the /crawl API so you can use it directly inside Claude Code without touching the raw API.
npx claude-code-templates@latest --skill utilities/cf-crawl

---
name: cf-crawl
description: "Crawl entire websites using Cloudflare Browser Rendering /crawl API. Initiates async crawl jobs, polls for completion, and saves results as markdown files. Useful for ingesting documentation sites, knowledge bases, or any web content into your project context. Requires CLOUDFLARE_ACCOUNT_ID and CLOUDFLARE_API_TOKEN environment variables."
---
# Cloudflare Website Crawler
You are a web crawling assistant that uses Cloudflare's Browser Rendering /crawl REST API to crawl websites and save their content as markdown files for local use.
## Prerequisites
The user must have:
1. A Cloudflare account with Browser Rendering enabled
2. `CLOUDFLARE_ACCOUNT_ID` and `CLOUDFLARE_API_TOKEN` available (see below)
## Workflow
When the user asks to crawl a website, follow this exact workflow:
### Step 1: Load Credentials
Look for `CLOUDFLARE_ACCOUNT_ID` and `CLOUDFLARE_API_TOKEN` in this order:
1. **Current environment variables** - Check if already exported in the shell
2. **Project `.env` file** - Read `.env` in the current working directory and extract the values
3. **Project `.env.local` file** - Read `.env.local` in the current working directory
4. **Home directory `.env`** - Read `~/.env` as a last resort
To load from a `.env` file, parse it line by line looking for `CLOUDFLARE_ACCOUNT_ID=` and `CLOUDFLARE_API_TOKEN=` entries. Use this bash approach:
```bash
# Load from .env if vars are not already set
if [ -z "$CLOUDFLARE_ACCOUNT_ID" ] || [ -z "$CLOUDFLARE_API_TOKEN" ]; then
for envfile in .env .env.local "$HOME/.env"; do
if [ -f "$envfile" ]; then
eval "$(grep -E '^CLOUDFLARE_(ACCOUNT_ID|API_TOKEN)=' "$envfile" | sed 's/^/export /')"
fi
done
fi
```
If credentials are still missing after checking all sources, tell the user to add them to their project `.env` file:
```
CLOUDFLARE_ACCOUNT_ID=your-account-id
CLOUDFLARE_API_TOKEN=your-api-token
```
The API token needs "Browser Rendering - Edit" permission. Create one at [Cloudflare Dashboard > API Tokens](https://dash.cloudflare.com/profile/api-tokens).
### Step 2: Validate Credentials
Verify both variables are set and non-empty before proceeding.
### Step 3: Initiate Crawl
Send a POST request to start the crawl job. Choose parameters based on user needs:
```bash
curl -s -X POST "https://api.cloudflare.com/client/v4/accounts/${CLOUDFLARE_ACCOUNT_ID}/browser-rendering/crawl" \
-H "Authorization: Bearer ${CLOUDFLARE_API_TOKEN}" \
-H "Content-Type: application/json" \
-d '{
"url": "<TARGET_URL>",
"limit": <NUMBER_OF_PAGES>,
"formats": ["markdown"],
"options": {
"excludePatterns": ["**/changelog/**", "**/api-reference/**"]
}
}'
```
The response returns a job ID:
```json
{"success": true, "result": "job-uuid-here"}
```
### Step 4: Poll for Completion
Poll the job status every 5 seconds until it completes:
```bash
curl -s -X GET "https://api.cloudflare.com/client/v4/accounts/${CLOUDFLARE_ACCOUNT_ID}/browser-rendering/crawl/<JOB_ID>?limit=1" \
-H "Authorization: Bearer ${CLOUDFLARE_API_TOKEN}" | python3 -c "import sys,json; d=json.load(sys.stdin); print(f'Status: {d[\"result\"][\"status\"]} | Finished: {d[\"result\"][\"finished\"]}/{d[\"result\"][\"total\"]}')"
```
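The 5-second polling loop can be sketched in Python. The HTTP call is passed in as a function so the retry logic stands on its own (`fetch_status` and `poll_until_done` are illustrative names, not part of the API):

```python
import time

# Terminal job statuses, per the list below
TERMINAL = {"completed", "cancelled_due_to_timeout",
            "cancelled_due_to_limits", "errored"}

def poll_until_done(fetch_status, interval=5):
    """Call fetch_status() every `interval` seconds until the crawl
    job leaves the 'running' state; returns the final status payload.
    fetch_status() should return the parsed 'result' object from
    GET .../crawl/<JOB_ID>?limit=1."""
    while True:
        result = fetch_status()
        if result.get("status") in TERMINAL:
            return result
        time.sleep(interval)
```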
Possible job statuses:
- `running` - Still in progress, keep polling
- `completed` - All pages processed
- `cancelled_due_to_timeout` - Exceeded 7-day limit
- `cancelled_due_to_limits` - Hit account limits
- `errored` - Something went wrong
### Step 5: Retrieve Results
Fetch all completed records using pagination (cursor-based):
```bash
curl -s -X GET "https://api.cloudflare.com/client/v4/accounts/${CLOUDFLARE_ACCOUNT_ID}/browser-rendering/crawl/<JOB_ID>?status=completed&limit=50" \
-H "Authorization: Bearer ${CLOUDFLARE_API_TOKEN}"
```
If there are more records, use the `cursor` value from the response:
```bash
curl -s -X GET "https://api.cloudflare.com/client/v4/accounts/${CLOUDFLARE_ACCOUNT_ID}/browser-rendering/crawl/<JOB_ID>?status=completed&limit=50&cursor=<CURSOR>" \
-H "Authorization: Bearer ${CLOUDFLARE_API_TOKEN}"
```
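The cursor-following logic can be isolated as a small generator (a sketch; `fetch_page` is a stand-in for the authenticated GET requests above):

```python
def iter_records(fetch_page):
    """Yield every record across cursor pages. fetch_page(cursor)
    should return the parsed 'result' object from the endpoint above
    ({'records': [...], 'cursor': ...}); a missing cursor ends the loop."""
    cursor = None
    while True:
        result = fetch_page(cursor)
        records = result.get("records", [])
        if not records:
            return
        yield from records
        cursor = result.get("cursor")
        if not cursor:
            return
```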
### Step 6: Save Results
Save each page's markdown content to a local directory. Use a script like:
```bash
# Create output directory
mkdir -p .crawl-output
# Fetch and save all pages
python3 -c "
import json, os, re, sys, urllib.request
account_id = os.environ['CLOUDFLARE_ACCOUNT_ID']
api_token = os.environ['CLOUDFLARE_API_TOKEN']
job_id = '<JOB_ID>'
base = f'https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl/{job_id}'
outdir = '.crawl-output'
os.makedirs(outdir, exist_ok=True)
cursor = None
total_saved = 0
while True:
url = f'{base}?status=completed&limit=50'
if cursor:
url += f'&cursor={cursor}'
req = urllib.request.Request(url, headers={
'Authorization': f'Bearer {api_token}'
})
with urllib.request.urlopen(req) as resp:
data = json.load(resp)
records = data.get('result', {}).get('records', [])
if not records:
break
for rec in records:
page_url = rec.get('url', '')
md = rec.get('markdown', '')
if not md:
continue
# Convert URL to filename
name = re.sub(r'https?://', '', page_url)
name = re.sub(r'[^a-zA-Z0-9]', '_', name).strip('_')[:120]
filepath = os.path.join(outdir, f'{name}.md')
with open(filepath, 'w') as f:
f.write(f'<!-- Source: {page_url} -->\n\n')
f.write(md)
total_saved += 1
cursor = data.get('result', {}).get('cursor')
if cursor is None:
break
print(f'Saved {total_saved} pages to {outdir}/')
"
```
## Parameter Reference
### Core Parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `url` | string | (required) | Starting URL to crawl |
| `limit` | number | 10 | Max pages to crawl (up to 100,000) |
| `depth` | number | 100,000 | Max link depth from starting URL |
| `formats` | array | ["html"] | Output formats: `html`, `markdown`, `json` |
| `render` | boolean | true | `true` = headless browser, `false` = fast HTML fetch |
| `source` | string | "all" | Page discovery: `all`, `sitemaps`, `links` |
| `maxAge` | number | 86400 | Cache validity in seconds (max 604800) |
### Options Object
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `includePatterns` | array | [] | Wildcard patterns to include (`*` and `**`) |
| `excludePatterns` | array | [] | Wildcard patterns to exclude (higher priority) |
| `includeSubdomains` | boolean | false | Follow links to subdomains |
| `includeExternalLinks` | boolean | false | Follow external links |
### Advanced Parameters
| Parameter | Type | Description |
|-----------|------|-------------|
| `jsonOptions` | object | AI-powered structured extraction (prompt, response_format) |
| `authenticate` | object | HTTP basic auth (username, password) |
| `setExtraHTTPHeaders` | object | Custom headers for requests |
| `rejectResourceTypes` | array | Skip: image, media, font, stylesheet |
| `userAgent` | string | Custom user agent string |
| `cookies` | array | Custom cookies for requests |
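Putting the tables together, a request body exercising the common parameters might look like this (all values are illustrative):

```python
import json

# Illustrative crawl payload combining core parameters and options
payload = {
    "url": "https://docs.example.com",
    "limit": 100,                    # max pages to crawl
    "formats": ["markdown"],
    "render": False,                 # static fetch: no browser time used
    "source": "sitemaps",            # discover pages from sitemaps only
    "options": {
        "includePatterns": ["/guides/**"],
        "excludePatterns": ["**/changelog/**"],
        "includeSubdomains": False,
    },
}
print(json.dumps(payload, indent=2))
```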
## Usage Examples
### Crawl documentation site (most common)
```
/cf-crawl https://docs.example.com --limit 50
```
Crawls up to 50 pages, saves as markdown.
### Crawl with filters
```
/cf-crawl https://docs.example.com --limit 100 --include "/guides/**,/api/**" --exclude "/changelog/**"
```
### Fast crawl without JavaScript rendering
```
/cf-crawl https://docs.example.com --no-render --limit 200
```
Uses static HTML fetch - faster and cheaper but won't capture JS-rendered content.
### Crawl and merge into single file
```
/cf-crawl https://docs.example.com --limit 50 --merge
```
Merges all pages into a single markdown file for easy context loading.
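A minimal sketch of the merge step, assuming the per-page files written in Step 6 (`.crawl-output` and the destination filename are this skill's own conventions, not part of the API):

```python
import glob
import os

def merge_markdown(outdir=".crawl-output", dest="crawl-merged.md"):
    """Concatenate every per-page .md file into one document,
    separated by horizontal rules, and return the destination path."""
    parts = []
    for path in sorted(glob.glob(os.path.join(outdir, "*.md"))):
        with open(path) as f:
            parts.append(f.read().strip())
    with open(dest, "w") as f:
        f.write("\n\n---\n\n".join(parts) + "\n")
    return dest
```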
## Argument Parsing
When invoked as `/cf-crawl`, parse the arguments as follows:
- First positional argument: the URL to crawl
- `--limit N` or `-l N`: max pages (default: 20)
- `--depth N` or `-d N`: max depth (default: 100000)
- `--include "pattern1,pattern2"`: include URL patterns
- `--exclude "pattern1,pattern2"`: exclude URL patterns
- `--no-render`: disable JavaScript rendering (faster)
- `--merge`: combine all output into a single file
- `--output DIR` or `-o DIR`: output directory (default: `.crawl-output`)
- `--source sitemaps|links|all`: page discovery method (default: all)
If no URL is provided, ask the user for the target URL.
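One way to sketch the flag-to-payload mapping above (a hypothetical helper, not the skill's actual implementation; only the flags listed above are handled):

```python
import shlex

def parse_cf_crawl(arg_string):
    """Translate /cf-crawl arguments into a crawl request payload
    plus local options (output directory, merge flag)."""
    tokens = shlex.split(arg_string)
    payload = {"limit": 20, "depth": 100000, "formats": ["markdown"],
               "render": True, "source": "all", "options": {}}
    local = {"output": ".crawl-output", "merge": False}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("--limit", "-l"):
            payload["limit"] = int(tokens[i + 1]); i += 2
        elif tok in ("--depth", "-d"):
            payload["depth"] = int(tokens[i + 1]); i += 2
        elif tok == "--include":
            payload["options"]["includePatterns"] = tokens[i + 1].split(","); i += 2
        elif tok == "--exclude":
            payload["options"]["excludePatterns"] = tokens[i + 1].split(","); i += 2
        elif tok == "--no-render":
            payload["render"] = False; i += 1
        elif tok == "--merge":
            local["merge"] = True; i += 1
        elif tok in ("--output", "-o"):
            local["output"] = tokens[i + 1]; i += 2
        elif tok == "--source":
            payload["source"] = tokens[i + 1]; i += 2
        elif not tok.startswith("-"):
            payload["url"] = tok; i += 1
        else:
            raise ValueError(f"unknown flag: {tok}")
    return payload, local
```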
## Important Notes
- The /crawl endpoint respects robots.txt directives including crawl-delay
- Blocked URLs appear with `"status": "disallowed"` in results
- Free plan: 10 minutes of browser time per day
- Job results are available for 14 days after completion
- Max job runtime: 7 days
- Response page size limit: 10 MB per page
- Use `render: false` for static sites to save browser time
- Pattern wildcards: `*` matches any character except `/`, `**` matches including `/`
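The wildcard semantics in the last note can be illustrated with a small translation to regular expressions (the translation is this document's sketch of the stated rules, not Cloudflare's actual matcher):

```python
import re

def glob_to_regex(pattern):
    """Translate the pattern syntax described above: '*' matches any
    run of characters except '/', while '**' matches across '/' too."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("^" + "".join(out) + "$")
```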
# cf-crawl: Cloudflare Setup Guide

Everything you need to configure on the Cloudflare side to use the cf-crawl Claude Code skill.

✅ Skill Validated: The cf-crawl skill uses the real Cloudflare Browser Rendering /crawl REST API, launched in open beta on March 10, 2026. All endpoints, parameters, and workflows in the skill match current Cloudflare documentation.

## 1. Overview
The cf-crawl skill uses Cloudflare's Browser Rendering REST API, specifically the /crawl endpoint. This endpoint is a hosted headless browser service that discovers and renders pages across an entire site, returning content as HTML, Markdown, or structured JSON — with no browser infrastructure to manage yourself.
### What you need from Cloudflare

- A Cloudflare account (free tier works)
- Browser Rendering enabled on your account
- A `CLOUDFLARE_ACCOUNT_ID`: your account identifier
- A `CLOUDFLARE_API_TOKEN` with the `Browser Rendering - Edit` permission
### How billing works

Browser Rendering is billed on browser time used, not requests. The free tier is genuinely usable for documentation crawling:

| Plan | Included free | Overage |
|---|---|---|
| Workers Free Plan | 10 min browser time/day | N/A |
| Workers Paid Plan | 10 hours browser time/month | $0.09/browser-hour |
| Static mode (`render: false`) | No browser time consumed | Free regardless of plan |

💡 Tip: For most documentation sites (static HTML, no heavy JS rendering), use `render: false`. This skips the headless browser entirely, costs nothing beyond the free quota, and is significantly faster.

## 2. Account Setup
### Step 1: Create or log in to your Cloudflare account

1. Go to dash.cloudflare.com and sign in or create a free account.
2. No domain or zone is required; the Browser Rendering REST API works at the account level.

### Step 2: Find your Account ID

Your Account ID is part of the URL path in every API request.

1. Log in to the Cloudflare dashboard.
2. In the left sidebar, click Workers & Pages.
3. Your Account ID is shown in the right-hand sidebar under Account Details.
4. Copy it; it looks like `a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4`.

📋 Note: The Account ID is a 32-character hexadecimal string. Do not confuse it with your email, zone ID, or user ID.
## 3. Create an API Token

The skill requires an API token (not your Global API Key) with one specific permission: Browser Rendering - Edit.

### Step-by-step token creation

1. Go to dash.cloudflare.com/profile/api-tokens.
2. Click Create Token.
3. Click Create Custom Token (do not use a preset template).
4. Give the token a descriptive name, e.g. `cf-crawl-skill`.
5. Under Permissions, configure: Account | Browser Rendering | Edit.
6. Under Account Resources, select Include → All Accounts (or your specific account).
7. Leave Client IP Address Filtering empty unless you want to restrict by IP.
8. Click Continue to summary, then Create Token.
9. Copy the token immediately; Cloudflare only shows it once.

⚠️ Security: Store the token in a `.env` file and never commit it to version control. The skill automatically reads from `.env`, `.env.local`, or `~/.env`, in that order.
## 4. Configure Credentials

The skill auto-discovers credentials from environment files. Create or edit a `.env` file in your project root:

```bash
# .env
CLOUDFLARE_ACCOUNT_ID=a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4
CLOUDFLARE_API_TOKEN=your-token-here
```

The skill checks these locations in order:

| Priority | Location |
|---|---|
| 1st | Current shell environment variables (if already exported) |
| 2nd | `.env` in your project directory |
| 3rd | `.env.local` in your project directory |
| 4th | `~/.env` in your home directory |

✅ Verify: Run `echo $CLOUDFLARE_ACCOUNT_ID` to confirm your env is loaded. If empty, run `source .env` or restart your terminal session.
## 5. Validate Your Setup

Before running a full crawl, test that your credentials work with a quick curl:

```bash
curl -s -X POST \
  "https://api.cloudflare.com/client/v4/accounts/${CLOUDFLARE_ACCOUNT_ID}/browser-rendering/crawl" \
  -H "Authorization: Bearer ${CLOUDFLARE_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "limit": 1, "formats": ["markdown"]}'
```

A successful response returns a job ID:

```json
{"success": true, "result": "abc123de-f456-7890-abcd-ef1234567890"}
```

### Common error responses

| Response | Cause |
|---|---|
| `401 Unauthorized` | Token is invalid or missing the Browser Rendering - Edit permission |
| `403 Forbidden` | Account ID is wrong, or token doesn't cover this account |
| `429 Too Many Requests` | Rate limited (Free plan: 3 req/s, Paid plan: 10 req/s) |
| `"success": false` | Check the `errors` array in the response for a specific message |
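A small helper for checking the response in scripts (the `success`/`result`/`errors` fields are Cloudflare's standard v4 response envelope; the function name is this guide's own):

```python
import json

def extract_job_id(response_text):
    """Return the crawl job ID from a /crawl response, or raise with
    the messages from the API's 'errors' array."""
    data = json.loads(response_text)
    if not data.get("success"):
        messages = [e.get("message", "unknown error")
                    for e in data.get("errors", [])]
        raise RuntimeError("crawl request failed: " + "; ".join(messages))
    return data["result"]
```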
## 6. Limits & Important Notes

| Limit | Value |
|---|---|
| Max pages per crawl | 100,000 (set with the `limit` parameter) |
| Max job runtime | 7 days before auto-cancellation |
| Result retention | 14 days after job completion |
| Max page size | 10 MB per page |
| robots.txt | Honored by default; the crawler is a well-behaved bot |
| Bot detection | Cannot bypass Cloudflare's own bot detection or CAPTCHAs |
| Free plan browser time | 10 minutes/day (REST API) |
| Paid plan browser time | 10 hours/month included |
| Static mode | Use `render: false` for static sites; no browser time consumed |
### Crawl patterns

Use wildcard patterns to focus or exclude parts of a site. For example, to include only docs and guides while excluding the changelog and API reference:

```json
{
  "includePatterns": ["/docs/**", "/guides/**"],
  "excludePatterns": ["**/changelog/**", "**/api-reference/**"]
}
```

`*` matches any character except `/`; `**` matches any character including `/`.
## 7. Skill Validation Report

The cf-crawl skill was independently verified against current Cloudflare documentation as of March 2026.

### ✅ What is accurate

- API endpoint URL: `POST .../browser-rendering/crawl` (confirmed correct)
- Job polling: `GET .../browser-rendering/crawl/{job_id}` (confirmed correct)
- Job statuses: `running`, `completed`, `cancelled_due_to_timeout`, `cancelled_due_to_limits`, `errored` (all confirmed)
- Parameters: `url`, `limit`, `depth`, `formats`, `render`, `source`, `maxAge` (all confirmed)
- Options object: `includePatterns`, `excludePatterns`, `includeSubdomains` (all confirmed)
- Pagination via the `cursor` query parameter (confirmed correct)
- The `?limit=1` polling trick for lightweight status checks (confirmed and recommended in official docs)
- Token permission required: `Browser Rendering - Edit` (confirmed)
- Free tier: 10 minutes browser time/day (confirmed for the Workers Free Plan)

### ⚠️ One nuance to note

Free tier clarification: the skill's docs say "Free plan: 10 minutes of browser time per day"; this is accurate for the Workers Free Plan. The Workers Paid Plan includes 10 hours/month. The /crawl endpoint launched in open beta on March 10, 2026 and is available on both plans.

### New feature not yet in the skill

Cloudflare added incremental crawling alongside the /crawl launch: use `modifiedSince` and `maxAge` to skip pages that haven't changed since a previous crawl. This is useful for recurring crawls of the same site.
## 8. Quick Reference

### Useful URLs

- Cloudflare Dashboard
- Create API Token
- Browser Rendering Docs
- /crawl Endpoint Docs
- Browser Rendering Pricing
- Browser Rendering Limits

### Pre-flight checklist

- Cloudflare account created
- Account ID copied from the Workers & Pages sidebar
- API token created with `Browser Rendering - Edit` permission
- Credentials saved to `.env` in the project root
- Validation curl returned a job ID
- Ready to use `/cf-crawl` in Claude Code