Farcaster Hub Scraper
Protocol-native Farcaster data ingestion for research, analytics, and social graph analysis. Collect casts, reactions, follows, user profiles, and real-time events directly from Farcaster Hubs via HTTP API.
Features
✅ Protocol-First Design - Direct Hub HTTP API integration (no third-party dependencies)
✅ Three Ingestion Modes - Deterministic backfill by FIDs, time-bounded studies, or incremental event tailing
✅ Comprehensive Data - Casts, reactions (likes/recasts), follows, user profiles, and events
✅ Optional Enrichment - Parse Frames/Mini-Apps metadata from embedded URLs
✅ State Checkpointing - Migration-safe, resumable runs with automatic state persistence
✅ Rate Limiting & Retries - Production-grade reliability with exponential backoff
✅ Neynar v2 Support - Optional integration with Neynar hosted hubs
✅ Multiple Views - Pre-configured dataset views for easy data exploration
Who Uses This Actor?
🎯 Target Users
📊 Web3 Data Analysts & Researchers (Dune, Flipside)
- Export Farcaster data to SQL databases for analytics dashboards
- Track protocol growth, user engagement trends, and network effects
- Cross-reference social data with onchain transactions
🛠️ Farcaster Frame/Mini-App Developers
- Monitor Frame engagement and interaction patterns
- Track which users interact with your Mini-Apps
- Analyze viral content and user acquisition funnels
📢 Web3 Marketing Agencies & Brands
- Track influencer campaigns and brand mentions
- Measure content reach and engagement rates
- Identify key opinion leaders in the Farcaster ecosystem
🎓 Academic Researchers
- Study decentralized social network dynamics
- Analyze information diffusion and community formation
- Research Web3 social graph topology
Use Cases by Persona
📊 For Data Analysts
Influencer Ranking Dashboard
{
"mode": "byFids",
"fids": [2, 3, 6833, 5650, 7890],
"include": {"casts": true, "reactions": true, "userData": true},
"maxRecords": 50000
}
→ Export to Dune to calculate engagement rates, follower growth, content velocity
Protocol Growth Metrics
{
"mode": "tailEvents",
"maxRecords": 100000
}
→ Stream all events to track daily active users, network growth, retention
🛠️ For Frame Developers
Frame Interaction Analysis
{
"mode": "byFids",
"fids": [list of users who interacted],
"include": {"casts": true, "reactions": true},
"fetchEmbeds": true
}
→ Identify which casts contain your Frame, track engagement patterns
Real-Time Frame Monitoring
{
"mode": "tailEvents",
"tail": {"fromEventId": "latest"},
"maxRecords": 10000
}
→ Get notified when users interact with your Frames in real-time
📢 For Marketing Agencies
Campaign Performance Tracking
{
"mode": "byFids",
"fids": [brand_account, influencer1, influencer2],
"startTimestamp": 130000000,
"stopTimestamp": 130100000,
"include": {"casts": true, "reactions": true}
}
→ Measure campaign reach during specific time window
Influencer Discovery
{
"mode": "byFids",
"fids": [competitor_followers],
"include": {"links": true, "userData": true, "reactions": true}
}
→ Find high-engagement users in target communities
🎓 For Researchers
Social Network Topology Study
{
"mode": "byFids",
"discoverFids": true,
"shardIds": [0, 1, 2],
"include": {"links": true, "userData": true},
"maxRecords": 500000
}
→ Build complete follow graph for network analysis
Information Diffusion Analysis
{
"mode": "byTime",
"fids": [seed_users],
"startTimestamp": 100000000,
"stopTimestamp": 100500000,
"include": {"casts": true, "reactions": true}
}
→ Track how content spreads through the network over time
Quick Start
Basic Example: Backfill by FIDs
{
"hubBaseUrl": "https://hub.pinata.cloud",
"mode": "byFids",
"fids": [2, 3, 6833],
"include": {
"casts": true,
"reactions": true,
"links": true,
"userData": true
},
"pageSize": 1000,
"maxRecords": 10000
}
Time-Bounded Study
{
"hubBaseUrl": "https://hub.pinata.cloud",
"mode": "byTime",
"fids": [2, 3],
"startTimestamp": 100000000,
"stopTimestamp": 100050000,
"include": {
"casts": true,
"reactions": true
}
}
Real-Time Event Tail
{
"hubBaseUrl": "https://hub.pinata.cloud",
"mode": "tailEvents",
"tail": {
"fromEventId": "0",
"shardIndex": 0
},
"maxRecords": 1000
}
Auto-Discover FIDs via Shard Scan
{
"hubBaseUrl": "https://hub.pinata.cloud",
"mode": "byFids",
"discoverFids": true,
"shardIds": [0, 1],
"include": {
"casts": true,
"userData": true
},
"maxRecords": 5000
}
With Frame/Mini-App Metadata Parsing
{
"hubBaseUrl": "https://hub.pinata.cloud",
"mode": "byFids",
"fids": [2],
"fetchEmbeds": true,
"maxEmbedsPerRun": 100,
"proxy": "RESIDENTIAL",
"include": {
"casts": true
}
}
Input Configuration
Required Fields
| Field | Type | Description | Default |
|---|---|---|---|
| `hubBaseUrl` | string | HTTP endpoint of a Farcaster Hub | `https://hub.pinata.cloud` |
| `mode` | enum | Ingestion mode: `byFids`, `byTime`, `tailEvents` | `byFids` |
Mode-Specific Fields
By FIDs Mode
| Field | Type | Description | Default |
|---|---|---|---|
| `fids` | array<integer> | List of Farcaster IDs to scrape | `[]` |
| `discoverFids` | boolean | Auto-discover FIDs via shard scan | `false` |
| `shardIds` | array<integer> | Shard IDs to scan when discovering | `[]` |
By Time Mode
| Field | Type | Description | Default |
|---|---|---|---|
| `fids` | array<integer> | FIDs to scrape (required) | `[]` |
| `startTimestamp` | integer | Start time (Farcaster epoch seconds) | - |
| `stopTimestamp` | integer | Stop time (Farcaster epoch seconds) | - |
Tail Events Mode
| Field | Type | Description | Default |
|---|---|---|---|
| `tail.fromEventId` | string | Start from event ID (empty = start from 0) | `"0"` |
| `tail.shardIndex` | integer | Shard index to tail (optional) | - |
Entity Filters
| Field | Type | Description | Default |
|---|---|---|---|
| `include.casts` | boolean | Include cast messages | `true` |
| `include.reactions` | boolean | Include reactions (likes/recasts) | `true` |
| `include.links` | boolean | Include follows | `true` |
| `include.userData` | boolean | Include user profiles | `true` |
Optional Features
| Field | Type | Description | Default |
|---|---|---|---|
| `fetchEmbeds` | boolean | Parse embedded URLs for Frames/Mini-Apps | `false` |
| `maxEmbedsPerRun` | integer | Max embeds to fetch per run | `500` |
| `neynarApiKey` | string | Neynar v2 API key (optional) | - |
| `clientApi` | boolean | Enable Farcaster Client API (experimental) | `false` |
| `proxy` | string | Apify Proxy groups or custom URL | - |
Performance & Limits
| Field | Type | Description | Default |
|---|---|---|---|
| `pageSize` | integer | Records per page (max 1000) | `1000` |
| `maxRecords` | integer | Stop after N records (safety limit) | - |
| `requestPerMinute` | integer | Rate limit for Hub API calls | `600` |
Output Schema
The actor produces normalized entities with the following types:
Cast Entity
{
"entity_type": "cast",
"fid": 2,
"hash": "0x1234567890abcdef",
"ts": 123456789,
"ts_iso": "2025-01-15T10:30:00.000Z",
"text": "Hello Farcaster!",
"mentions": [3, 6833],
"parent": {
"castId": { "fid": 2, "hash": "0xabc..." }
},
"embeds": {
"urls": ["https://example.com"],
"castIds": []
},
"derived": {
"urls": ["https://example.com"],
"frame_meta": {
"name": "My App",
"url": "https://app.example.com"
}
},
"ingest_source": "hub_http",
"ingest_ts": "2025-01-15T10:31:00.000Z",
"raw": { /* original Hub message */ }
}
Reaction Entity
{
"entity_type": "reaction",
"fid": 3,
"type": "like",
"target": {
"castId": { "fid": 2, "hash": "0x1234..." }
},
"ts": 123456790,
"ts_iso": "2025-01-15T10:31:00.000Z",
"hash": "0xabcd...",
"ingest_source": "hub_http",
"ingest_ts": "2025-01-15T10:32:00.000Z",
"raw": { /* original Hub message */ }
}
Link Entity (Follow)
{
"entity_type": "link",
"fid": 3,
"targetFid": 2,
"type": "follow",
"ts": 123456791,
"ts_iso": "2025-01-15T10:32:00.000Z",
"hash": "0xdef...",
"ingest_source": "hub_http",
"ingest_ts": "2025-01-15T10:33:00.000Z",
"raw": { /* original Hub message */ }
}
User Data Entity
{
"entity_type": "user_data",
"fid": 2,
"username": "vitalik.eth",
"display": "Vitalik",
"pfp": "https://example.com/pfp.png",
"bio": "Ethereum co-founder",
"url": "https://vitalik.ca",
"location": "Singapore",
"github": "vbuterin",
"twitter": "VitalikButerin",
"ts": 123456792,
"ts_iso": "2025-01-15T10:33:00.000Z",
"ingest_source": "hub_http",
"ingest_ts": "2025-01-15T10:34:00.000Z",
"raw": [ /* original Hub messages */ ]
}
Event Entity (Tail Mode)
{
"entity_type": "event",
"event_id": "12345",
"event_type": "MERGE_MESSAGE",
"ts": 123456793,
"ts_iso": "2025-01-15T10:34:00.000Z",
"shard_index": 0,
"message": { /* hydrated message if MERGE_MESSAGE */ },
"ingest_source": "hub_http",
"ingest_ts": "2025-01-15T10:35:00.000Z",
"raw": { /* original Hub event */ }
}
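For downstream processing in TypeScript, the entity shapes above can be summarized as a discriminated union. This is an illustrative sketch derived from the JSON samples, not an exhaustive contract:

```typescript
// Illustrative types mirroring the JSON samples above; fields are
// abbreviated, not a complete schema.
type EntityBase = {
  ts: number;          // Farcaster epoch seconds
  ts_iso: string;      // ISO 8601
  ingest_source: 'hub_http' | 'neynar_v2' | 'client_api';
  ingest_ts: string;
  raw: unknown;        // original Hub message(s) or event
};

type Entity = EntityBase & (
  | { entity_type: 'cast'; fid: number; hash: string; text: string }
  | { entity_type: 'reaction'; fid: number; type: 'like' | 'recast'; hash: string }
  | { entity_type: 'link'; fid: number; targetFid: number; type: 'follow'; hash: string }
  | { entity_type: 'user_data'; fid: number; username?: string }
  | { entity_type: 'event'; event_id: string; event_type: string }
);
```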
Farcaster Timestamps
Important: Farcaster uses a custom epoch starting at 2021-01-01T00:00:00.000Z.
- All entities include both `ts` (Farcaster epoch seconds) and `ts_iso` (ISO 8601) fields
- Use `ts_iso` for human-readable timestamps and data analysis
- Use `ts` for filtering Hub API requests
Example conversion:
- Farcaster epoch `100000000` = `2024-03-03T09:46:40.000Z`
- Current time: `isoToFarcasterEpoch(new Date().toISOString())`
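The conversion is plain offset arithmetic against the 2021-01-01 epoch. A minimal TypeScript sketch (the helper names mirror the docs but are illustrative, not the actor's internal API):

```typescript
// Farcaster epoch: 2021-01-01T00:00:00.000Z
const FARCASTER_EPOCH_MS = Date.UTC(2021, 0, 1);

function farcasterToIso(ts: number): string {
  // ts is seconds since the Farcaster epoch
  return new Date(FARCASTER_EPOCH_MS + ts * 1000).toISOString();
}

function isoToFarcasterEpoch(iso: string): number {
  return Math.floor((Date.parse(iso) - FARCASTER_EPOCH_MS) / 1000);
}

console.log(farcasterToIso(100_000_000)); // 2024-03-03T09:46:40.000Z
```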
Ingestion Modes Explained
Mode 1: By FIDs (Deterministic Backfill)
Use Case: Research specific users, backfill known accounts
How it works:
- For each FID in the input list (or discovered via shard scan):
- Fetch all casts with pagination
- Fetch all reactions (likes/recasts)
- Fetch all follows
- Fetch user profile data
- Maintains a checkpoint per FID (`lastTs`, `lastPageToken`) for resumable runs
- Optionally discovers FIDs by scanning the specified shards
Best for: User-centric analysis, follower studies, content backfills
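For reference, the per-FID backfill reduces to standard cursor pagination against the Hub HTTP API. A sketch assuming the usual `/v1/castsByFid` response shape (`messages` plus `nextPageToken`); error handling omitted:

```typescript
// Page through all casts for one FID; an empty nextPageToken ends the loop.
async function backfillCasts(hubBaseUrl: string, fid: number, pageSize = 1000) {
  const casts: unknown[] = [];
  let pageToken = '';
  do {
    const url = new URL('/v1/castsByFid', hubBaseUrl);
    url.searchParams.set('fid', String(fid));
    url.searchParams.set('pageSize', String(pageSize));
    if (pageToken) url.searchParams.set('pageToken', pageToken);

    const res = await fetch(url);
    if (!res.ok) throw new Error(`Hub returned ${res.status}`);
    const body = await res.json();

    casts.push(...(body.messages ?? []));
    pageToken = body.nextPageToken ?? ''; // empty token = last page
  } while (pageToken);
  return casts;
}
```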
Mode 2: By Time Window (Targeted Study)
Use Case: Time-bounded analysis (e.g., "all activity during an event")
How it works:
- For each FID, fetch only messages within `startTimestamp` to `stopTimestamp`
- Applies time filters to casts (Hub-native support)
- Filters reactions and links manually (Hub doesn't support time filters)
- Faster than full backfill when studying specific time periods
Best for: Event analysis, temporal studies, A/B testing
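Because the Hub has no native time filter for reactions and links, the client-side filtering step looks roughly like this, assuming messages carry `data.timestamp` in Farcaster epoch seconds (the standard Hub message shape):

```typescript
// Keep only messages whose Farcaster timestamp falls inside the window.
interface HubMessage {
  data?: { timestamp?: number };
}

function filterByWindow<T extends HubMessage>(
  messages: T[],
  startTimestamp: number,
  stopTimestamp: number,
): T[] {
  return messages.filter((m) => {
    const ts = m.data?.timestamp;
    return ts !== undefined && ts >= startTimestamp && ts <= stopTimestamp;
  });
}
```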
Mode 3: Tail Events (Near-Real-Time)
Use Case: Live monitoring, incremental ingestion
How it works:
- Polls `/v1/events` starting from `fromEventId` (or the last checkpoint)
- For `MERGE_MESSAGE` events, hydrates and pushes the message entity
- Updates the `lastEventId` checkpoint per shard
- Sleeps 5s between polls (configurable)
Important: Hubs prune events older than ~3 days. Run frequently (every 1-2 days) to avoid data loss.
Best for: Real-time dashboards, notifications, streaming pipelines
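Conceptually, the tail loop is cursor polling. A sketch assuming the Hub events endpoint accepts `from_event_id` and returns `events` plus a `nextPageEventId` cursor; exact field names may vary by Hub version:

```typescript
// Poll the events feed forever, advancing the cursor after each batch.
async function tailEvents(hubBaseUrl: string, fromEventId: string) {
  let cursor = fromEventId;
  for (;;) {
    const url = new URL('/v1/events', hubBaseUrl);
    url.searchParams.set('from_event_id', cursor);
    const res = await fetch(url);
    const body = await res.json();

    for (const event of body.events ?? []) {
      if (String(event.type).includes('MERGE_MESSAGE')) {
        // hydrate and push the message entity here
      }
    }
    cursor = body.nextPageEventId ?? cursor; // checkpoint per poll
    await new Promise((r) => setTimeout(r, 5_000)); // 5s between polls
  }
}
```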
Optional Features
Frame/Mini-App Metadata Parsing
When `fetchEmbeds: true`, the actor will:
- Extract all unique URLs from cast embeds
- Fetch each URL (up to the `maxEmbedsPerRun` limit)
- Parse `fc:miniapp:*` and `fc:frame:*` meta tags
- Enrich cast entities with a `derived.frame_meta` object
Use Proxy: Set the `proxy` field to avoid rate limits (e.g., `"RESIDENTIAL"` for Apify Proxy)
Performance: Adds ~2-5s per URL. Use `maxEmbedsPerRun` to cap crawling time.
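For context, the meta-tag extraction amounts to scanning fetched HTML for `fc:frame`/`fc:miniapp` tags. A tolerant regex sketch (a proper HTML parser is safer in production):

```typescript
// Collect fc:frame / fc:miniapp meta tags into a flat key-value map.
function parseFrameMeta(html: string): Record<string, string> {
  const meta: Record<string, string> = {};
  for (const tag of html.match(/<meta\s+[^>]*>/gi) ?? []) {
    const name = tag.match(/(?:name|property)=["']((?:fc:frame|fc:miniapp)[^"']*)["']/i);
    const content = tag.match(/content=["']([^"']*)["']/i);
    if (name && content) meta[name[1]] = content[1];
  }
  return meta;
}
```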
Neynar v2 Integration
Provide `neynarApiKey` to use Neynar's hosted Hub endpoints instead of direct Hub HTTP.
Benefits:
- Faster, managed infrastructure
- No self-hosted Hub required
- Additional features (v2 only; v1 EOL March 31, 2025)
Records flagged: All entities get `ingest_source: "neynar_v2"`
Client API (Experimental)
Set `clientApi: true` to enable Warpcast-specific endpoints (e.g., trending, channels).
Warning: Non-protocol data. Records are flagged as `ingest_source: "client_api"` to avoid confusion.
State Checkpointing & Resumability
The actor automatically persists state every 30 seconds and on Apify migration events:
- Per-FID checkpoints: `{ lastTs, lastPageToken }` for resuming mid-pagination
- Per-shard checkpoints: `{ lastEventId }` for event tail mode
- Migration-safe: Survives container restarts and platform migrations
To resume a run:
- Start the actor with the same input
- State is automatically restored
- Scraping continues from last checkpoint
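On Apify, this pattern is typically built on the default key-value store. A sketch using the Apify SDK; the `STATE` key and record shape here are illustrative, not the actor's exact internals:

```typescript
import { Actor } from 'apify';

await Actor.init();

type State = { fids: Record<string, { lastTs: number; lastPageToken: string }> };
const state: State = (await Actor.getValue<State>('STATE')) ?? { fids: {} };

const persist = () => Actor.setValue('STATE', state);
setInterval(persist, 30_000);    // periodic flush every 30 seconds
Actor.on('migrating', persist);  // flush before a platform migration

// ...update state.fids[fid] as pages complete, then:
await persist();
await Actor.exit();
```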
Performance Tips
- Use time filters: Narrow `startTimestamp`/`stopTimestamp` for faster runs
- Batch FIDs: Process related users together to share the dedup cache
- Tune `pageSize`: Larger pages (1000) mean fewer requests, but each request is slower
- Set `maxRecords`: A safety limit prevents runaway costs
- Monitor rate limits: The default 600 req/min is conservative; increase it if your Hub allows
- Schedule tail runs: Run every 1-2 days to avoid event pruning
Limitations & Best Practices
Hub Event Pruning
- Limitation: Hubs prune events older than ~3 days
- Best Practice: Schedule tail runs every 1-2 days for continuous ingestion
Reaction/Link Time Filters
- Limitation: Hub API doesn't support time filters for reactions/links
- Workaround: The actor fetches all records and filters them manually in `byTime` mode (slower)
Embed Fetching
- Limitation: Some URLs may be slow, dead, or behind auth
- Best Practice: Use the `maxEmbedsPerRun` cap and Apify Proxy to avoid timeouts
Rate Limiting
- Default: 600 req/min (conservative)
- Tuning: Increase `requestPerMinute` if your Hub supports higher rates
- Public Hubs: May have stricter limits; monitor 429 responses
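A common client-side pattern for handling 429s is exponential backoff with a cap; a minimal sketch (the actor's internal retry policy may differ):

```typescript
// Retry on 429 and 5xx with doubling delays, capped at 30 seconds.
async function fetchWithBackoff(url: string, maxRetries = 5): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetch(url);
    if (res.status !== 429 && res.status < 500) return res;
    if (attempt >= maxRetries) throw new Error(`Giving up after ${attempt} retries`);
    const delayMs = Math.min(30_000, 1_000 * 2 ** attempt); // 1s, 2s, 4s, ...
    await new Promise((r) => setTimeout(r, delayMs));
  }
}
```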
Pricing & Compute
Approximate compute units (based on default settings):
| Run Type | Records | Compute Units | Notes |
|---|---|---|---|
| Small backfill | <10k | ~0.01 | 2-3 FIDs, no embeds |
| Medium backfill | 100k | ~0.5 | 10-20 FIDs, all entities |
| Large backfill | 1M | ~5 | 100+ FIDs or full shard scan |
| Tail (1 hour) | 1k events | ~0.005 | Near-real-time streaming |
| With embeds | +100 URLs | +0.02 per 100 | Crawlee overhead |
Formula: ~0.5 CU per 100k records (without embeds)
Example Use Cases
Social Graph Analysis
{
"mode": "byFids",
"fids": [2, 3, 6833, 5650],
"include": {
"links": true,
"userData": true
}
}
Output: Follow relationships + user profiles for network analysis
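To turn this output into an edge list for tools like Gephi or NetworkX, you can read the dataset with the Apify client and keep only `link` entities. A sketch; the dataset ID is a placeholder:

```typescript
import { ApifyClient } from 'apify-client';

// Build a source,target CSV of follow edges from the run's dataset.
const client = new ApifyClient({ token: process.env.APIFY_TOKEN });
const { items } = await client.dataset('<DATASET_ID>').listItems();

const edges = items
  .filter((i) => i.entity_type === 'link' && i.type === 'follow')
  .map((i) => `${i.fid},${i.targetFid}`);

console.log(['source,target', ...edges].join('\n'));
```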
Content Research
{
"mode": "byTime",
"fids": [2],
"startTimestamp": 100000000,
"stopTimestamp": 100050000,
"include": {
"casts": true,
"reactions": true
}
}
Output: All casts + reactions during a specific event
Real-Time Dashboard
{
"mode": "tailEvents",
"tail": { "fromEventId": "0" },
"maxRecords": 10000
}
Output: Live stream of all protocol events (schedule every hour)
Frame/Mini-App Catalog
{
"mode": "byFids",
"fids": [2, 3],
"fetchEmbeds": true,
"maxEmbedsPerRun": 200,
"include": {
"casts": true
}
}
Output: Casts with Frame/Mini-App metadata extracted
Troubleshooting
"Failed to connect to Hub"
- Verify `hubBaseUrl` is correct and accessible
- Check that the Hub is running and serving the HTTP API on port 3381
- Try a public Hub: `https://hub.pinata.cloud`
"No data returned"
- Verify FIDs exist and have activity
- Check that the time window isn't too narrow (`byTime` mode)
- Ensure `include.*` filters aren't excluding all data
"Max records limit reached"
- Increase `maxRecords` or remove the limit for a full backfill
- Use checkpointing to resume across multiple runs
"Rate limit errors (429)"
- Decrease `requestPerMinute`
- Use Neynar hosted Hub (better rate limits)
"Event tail missing data"
- Events pruned >3 days ago
- Schedule runs more frequently (every 1-2 days)
- Use `byFids` mode for historical backfill
Data Views
The actor provides pre-configured dataset views:
- Overview: All entities with key identifiers
- Casts: Cast content, timestamps, and URLs
- Reactions: Likes and recasts by FID
- Follows: Follow relationships (social graph edges)
- Users: User profiles and metadata
Access views in Apify Console → Dataset → Views tab
Support
- Email: kontakt@barrierefix.de
- Documentation: Farcaster Hub API Docs
- Issues: Report bugs or request features via email
Version History
- 1.0.0 (2025-01) - Initial release
- Three ingestion modes (byFids, byTime, tailEvents)
- Hub HTTP API integration
- State checkpointing
- Optional Frame/Mini-App parsing
- Neynar v2 support
🔗 Explore More of Our Actors
📰 Content & Publishing
| Actor | Description |
|---|---|
| Notion Marketplace Scraper | Scrape Notion templates and marketplace listings |
| Ghost Newsletter Scraper | Extract Ghost newsletter content and subscriber data |
| Google Play Reviews Scraper | Extract app reviews from Google Play Store |
💬 Social Media & Community
| Actor | Description |
|---|---|
| Reddit Scraper Pro | Monitor subreddits and track keywords with sentiment analysis |
| Discord Scraper Pro | Extract Discord messages and chat history for community insights |
| YouTube Comments Harvester | Comprehensive YouTube comments scraper with channel-wide enumeration |
| YouTube Contact Scraper | Extract YouTube channel contact information for outreach |
| YouTube Shorts Scraper | Scrape YouTube Shorts for viral content research |
License
MIT License - Free for commercial and non-commercial use