Bluesky & Mastodon Scraper API - Decentralized Social Media Data Aggregator
Extract and monitor posts from Bluesky (AT Protocol) and Mastodon (Fediverse) with a unified, normalized JSON API. The most comprehensive social media scraper for decentralized networks - perfect for social listening, brand monitoring, market research, sentiment analysis, and AI training data collection.
🔍 Search by keywords • 👥 Track specific users • 📊 Unified data format • 🪝 Real-time webhooks • 💰 Pay-per-post pricing
🚀 Features
- Multi-Platform Support: Scrape Bluesky and Mastodon simultaneously
- Keyword Search: Find posts mentioning specific terms or phrases
- Handle Tracking: Monitor specific users across platforms
- Date Range Filtering: Historical and real-time post collection
- Unified Schema: Normalized output format across all platforms
- Intelligent Deduplication: Automatic duplicate detection and removal
- Real-Time Webhooks: Send posts to your endpoints as they're discovered
- Language Filtering: Filter posts by language (BCP-47 codes)
- Pay-Per-Event Pricing: Only pay for posts collected, not compute time
- No Authentication Required: Works with public data (optional auth for higher limits)
📊 Supported Platforms
Bluesky (AT Protocol)
- Full keyword search via searchActors workaround
- User feed tracking
- Quote posts, replies, reposts, likes
- Media attachments (images, videos, GIFs)
- Rich metadata (DIDs, handles, timestamps)
Mastodon (Fediverse)
- Multi-instance support (mastodon.social, mas.to, fosstodon.org, etc.)
- Full keyword search across instances
- User timeline tracking
- Boosts, replies, favorites
- Media attachments with alt text
- Instance-specific data
💡 Use Cases
- Social Listening: Track brand mentions and industry keywords
- Market Research: Analyze trends and conversations in your niche
- Sentiment Analysis: Collect data for AI/ML sentiment models
- Brand Monitoring: Monitor your company and competitors
- Academic Research: Study social media behavior and network effects
- Content Discovery: Find engaging content for curation
- Influencer Tracking: Monitor key voices in your industry
🎯 Quick Start
Example 1: Search for AI-related posts
```json
{
  "platforms": ["bluesky", "mastodon"],
  "query": "artificial intelligence",
  "maxItems": 100,
  "languages": ["en"]
}
```
Example 2: Track specific users
```json
{
  "platforms": ["bluesky", "mastodon"],
  "handles": ["jay.bsky.social", "@gargron@mastodon.social"],
  "maxItems": 500
}
```
Example 3: Historical search with date range
```json
{
  "platforms": ["bluesky"],
  "query": "climate change",
  "since": "2025-09-01T00:00:00Z",
  "until": "2025-10-01T00:00:00Z",
  "maxItems": 1000
}
```
Example 4: Real-time monitoring with webhooks
```json
{
  "platforms": ["bluesky", "mastodon"],
  "query": "crypto",
  "emitWebhooks": true,
  "webhooks": [
    {
      "url": "https://your-api.com/webhook",
      "headers": {"Authorization": "Bearer YOUR_TOKEN"},
      "mode": "per_item",
      "platforms": ["bluesky"]
    }
  ]
}
```
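Any of the inputs above can be submitted through the Apify REST API. The sketch below only builds the request for Apify's `run-sync-get-dataset-items` endpoint and does not send it. The actor ID `username~actor-name` and the token are placeholders, and `build_run_request` is a hypothetical helper name, not part of this actor.

```python
import json
import urllib.request

APIFY_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that runs an actor synchronously
    and asks for its dataset items in the response."""
    url = f"{APIFY_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


run_input = {
    "platforms": ["bluesky", "mastodon"],
    "query": "artificial intelligence",
    "maxItems": 100,
    "languages": ["en"],
}

# "username~actor-name" is a placeholder; substitute this actor's real ID.
request = build_run_request("username~actor-name", "YOUR_APIFY_TOKEN", run_input)

# Sending the request needs network access and a valid token:
# with urllib.request.urlopen(request) as response:
#     posts = json.load(response)
```

Long runs may exceed the time limit of a synchronous call, in which case starting the run and reading the dataset afterwards is likely the safer pattern.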
📥 Input Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| `platforms` | Array | ✅ | Platforms to scrape: `["bluesky", "mastodon"]` |
| `query` | String | ❌ | Keywords to search for |
| `handles` | Array | ❌ | Specific user handles to track |
| `since` | String | ❌ | Start date (ISO 8601) |
| `until` | String | ❌ | End date (ISO 8601) |
| `maxItems` | Integer | ❌ | Max posts to collect (default: 1000) |
| `languages` | Array | ❌ | Language codes (e.g., `["en", "de"]`) |
| `includeReplies` | Boolean | ❌ | Include reply posts (default: false) |
| `emitWebhooks` | Boolean | ❌ | Enable webhook delivery |
| `webhooks` | Array | ❌ | Webhook endpoint configurations |
| `blueskyCredentials` | Object | ❌ | Optional auth for higher rate limits |
| `mastodonInstances` | Array | ❌ | Specific Mastodon instances to search |
| `maxConcurrency` | Integer | ❌ | Concurrent requests (default: 5) |
| `dryRun` | Boolean | ❌ | Test mode without storing data |
Note: You must provide at least one of `query` or `handles`; providing both is allowed.
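The rules in the table can be checked locally before starting a run. This is a minimal sketch, not the actor's own validation: `validate_input` is a hypothetical helper, and the 1-20 concurrency range is taken from the Performance section of this README.

```python
from datetime import datetime

SUPPORTED_PLATFORMS = {"bluesky", "mastodon"}


def validate_input(run_input: dict) -> list:
    """Return a list of problems found in an actor input; an empty list means none were found."""
    problems = []

    platforms = run_input.get("platforms") or []
    if not platforms:
        problems.append("platforms is required")
    unknown = sorted(set(platforms) - SUPPORTED_PLATFORMS)
    if unknown:
        problems.append(f"unsupported platforms: {unknown}")

    # At least one of query / handles must be present.
    if not run_input.get("query") and not run_input.get("handles"):
        problems.append("provide query, handles, or both")

    # since / until must parse as ISO 8601 timestamps.
    for key in ("since", "until"):
        value = run_input.get(key)
        if value is not None:
            try:
                datetime.fromisoformat(value.replace("Z", "+00:00"))
            except ValueError:
                problems.append(f"{key} is not an ISO 8601 timestamp")

    concurrency = run_input.get("maxConcurrency", 5)
    if not 1 <= concurrency <= 20:
        problems.append("maxConcurrency must be between 1 and 20")

    return problems
```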
📤 Output Schema
Each post is normalized to a unified format:
```json
{
  "platform": "bluesky",
  "postId": "at://did:plc:xyz/app.bsky.feed.post/3kff...",
  "url": "https://bsky.app/profile/jay.bsky.social/post/3kff...",
  "text": "Building the future of social media...",
  "language": "en",
  "author": {
    "handle": "jay.bsky.social",
    "did": "did:plc:xyz",
    "displayName": "Jay Graber",
    "profileUrl": "https://bsky.app/profile/jay.bsky.social"
  },
  "createdAt": "2025-10-08T10:30:00Z",
  "metrics": {
    "replies": 42,
    "reposts": 128,
    "likes": 567,
    "quotes": 23
  },
  "entities": {
    "hashtags": ["decentralization", "atproto"],
    "mentions": ["@handle1.bsky.social"]
  },
  "media": [
    {
      "type": "image",
      "url": "https://cdn.bsky.app/...",
      "alt": "Screenshot of the app"
    }
  ],
  "source": {
    "instance": null
  },
  "references": {
    "replyTo": null,
    "quotedPost": "at://did:plc:..."
  },
  "ingest_meta": {
    "first_seen_at": "2025-10-08T11:00:00Z",
    "adapter_version": "1.0.0"
  }
}
```
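Because every post shares this shape, downstream code can read it without platform-specific branches. The sketch below flattens one post into a single-level record, for example as a CSV row. `summarize_post` and its output keys are hypothetical, and missing fields are treated defensively because this README does not say which fields are always present.

```python
def summarize_post(post: dict) -> dict:
    """Flatten one normalized post into a single-level record."""
    author = post.get("author") or {}
    metrics = post.get("metrics") or {}
    entities = post.get("entities") or {}
    references = post.get("references") or {}
    return {
        "platform": post.get("platform"),
        "handle": author.get("handle"),
        "created_at": post.get("createdAt"),
        "likes": metrics.get("likes", 0),
        "reposts": metrics.get("reposts", 0),
        "hashtags": ", ".join(entities.get("hashtags", [])),
        "media_count": len(post.get("media") or []),
        # A non-null reference marks the post as a reply or a quote.
        "is_reply": references.get("replyTo") is not None,
        "is_quote": references.get("quotedPost") is not None,
    }
```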
🔐 Authentication
Bluesky (Optional)
Works without authentication for public data. For higher rate limits:

```json
{
  "blueskyCredentials": {
    "identifier": "your-handle.bsky.social",
    "password": "your-app-password"
  }
}
```

Get app password: Settings → App Passwords → Add App Password
Mastodon
No authentication required for public posts.
🌐 Mastodon Instance Support
Auto-Detection
The actor automatically detects Mastodon instances from handles:

```json
{
  "handles": ["@user@mastodon.social", "@dev@fosstodon.org"]
}
```
Manual Configuration
Specify instances explicitly:

```json
{
  "mastodonInstances": ["mastodon.social", "mas.to", "fosstodon.org"]
}
```
🪝 Webhooks
Send posts to your endpoints in real-time:
```json
{
  "emitWebhooks": true,
  "webhooks": [
    {
      "url": "https://api.example.com/posts",
      "headers": {
        "Authorization": "Bearer YOUR_TOKEN",
        "Content-Type": "application/json"
      },
      "secret": "shared-secret-key",
      "mode": "per_item",
      "platforms": ["bluesky", "mastodon"]
    }
  ]
}
```
Webhook Modes:
- `per_item`: Send each post individually
- `batch`: Send posts in batches (coming soon)
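The webhook configuration accepts a `secret`, but this README does not say how the actor uses it. The sketch below assumes, purely for illustration, that the secret signs the raw request body with HMAC-SHA256 and that the hex digest is delivered alongside the payload. It also assumes that a `per_item` payload is one normalized post. `sign_body` and `handle_webhook` are hypothetical names; confirm the actor's actual behavior before relying on this.

```python
import hashlib
import hmac
import json
from typing import Optional


def sign_body(body: bytes, secret: str) -> str:
    """Hex HMAC-SHA256 of the raw body (assumed signing scheme, not confirmed)."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def handle_webhook(body: bytes, signature: str, secret: str) -> Optional[dict]:
    """Return the parsed post if the signature matches, otherwise None."""
    expected = sign_body(body, secret)
    # compare_digest avoids leaking information through comparison timing.
    if not hmac.compare_digest(expected, signature):
        return None
    return json.loads(body)
```

In a real receiver, `body` must be the unmodified request bytes; re-serializing parsed JSON before verifying would change the digest.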
💰 Pricing
Pay-Per-Event Model: Only pay for posts you collect
- $0.002 per post ($2 per 1,000 posts)
- No compute time charges
- No setup fees
- Cancel anytime
Examples:
- 100 posts = $0.20
- 1,000 posts = $2.00
- 10,000 posts = $20.00
- 100,000 posts = $200.00
Simple, transparent pricing - you only pay for what you use.
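The arithmetic behind these examples is a single multiplication by the $0.002 per-post rate. A small sketch, using `Decimal` to avoid floating-point rounding; the helper names are illustrative only.

```python
from decimal import Decimal

PRICE_PER_POST = Decimal("0.002")  # USD per post, from the pricing list above


def estimate_cost(post_count: int) -> Decimal:
    """Cost in USD of collecting post_count posts."""
    if post_count < 0:
        raise ValueError("post_count must not be negative")
    return PRICE_PER_POST * post_count


def max_posts_for_budget(budget_usd) -> int:
    """Largest number of posts that fits within a USD budget."""
    return int(Decimal(str(budget_usd)) // PRICE_PER_POST)


print(f"${estimate_cost(10_000):.2f}")
```

Setting `maxItems` to the result of `max_posts_for_budget` is one way to cap the spend of a single run.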
📅 Scheduling
Run every hour

```
0 * * * *
```

Run daily at midnight

```
0 0 * * *
```

Run every 15 minutes

```
*/15 * * * *
```
🔄 Deduplication
The actor automatically:
- Tracks seen posts with state management
- Skips duplicates across runs
- Cleans up old state entries (30+ days)
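The behavior listed above can be pictured as a keyed store with a 30-day expiry. The following is a conceptual sketch, not the actor's implementation: the `SeenPosts` class, the `(platform, postId)` key, and the in-memory dictionary are all assumptions, and a real run would need to persist the state between runs.

```python
from datetime import datetime, timedelta, timezone

STATE_TTL = timedelta(days=30)


class SeenPosts:
    """Remember which posts were already collected and forget old entries."""

    def __init__(self):
        self._first_seen = {}  # (platform, postId) -> datetime first seen

    def is_new(self, post: dict, now: datetime) -> bool:
        """Record the post and report whether it was unseen until now."""
        key = (post["platform"], post["postId"])
        if key in self._first_seen:
            return False
        self._first_seen[key] = now
        return True

    def prune(self, now: datetime) -> int:
        """Drop entries older than STATE_TTL and return how many were removed."""
        cutoff = now - STATE_TTL
        stale = [key for key, seen in self._first_seen.items() if seen < cutoff]
        for key in stale:
            del self._first_seen[key]
        return len(stale)
```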
⚡ Performance
- Speed: ~100-200 posts/minute per platform
- Rate Limits: Respects platform rate limits automatically
- Concurrency: Configurable (1-20 concurrent requests)
- Memory: ~256MB typical, ~512MB for large runs
🛠️ Advanced Configuration
Language Filtering

```json
{
  "languages": ["en", "de", "ja", "es"]
}
```

Date Range

```json
{
  "since": "2025-09-01T00:00:00Z",
  "until": "2025-10-01T00:00:00Z"
}
```

Include Replies

```json
{
  "includeReplies": true
}
```

Dry Run (Testing)

```json
{
  "dryRun": true
}
```
📊 Dataset Views
The actor provides three pre-configured views in Apify Console:
- Overview: All posts with key metrics
- By Platform: Posts grouped by source
- Top Engagement: Sorted by likes/reposts
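A similar ranking can be reproduced on exported data. The exact formula behind the Top Engagement view is not documented here, so this sketch assumes a plain sum of likes and reposts; `top_engagement` is a hypothetical helper.

```python
def top_engagement(posts: list, limit: int = 10) -> list:
    """Return the posts with the highest likes + reposts, best first."""

    def score(post: dict) -> int:
        metrics = post.get("metrics") or {}
        return metrics.get("likes", 0) + metrics.get("reposts", 0)

    return sorted(posts, key=score, reverse=True)[:limit]
```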
🔍 Search Tips
Keyword Search
- Use specific terms: prefer "machine learning" over the broader "AI"
- Combine keywords: "climate change policy"
- Use quotes for exact phrases (Bluesky only)
Handle Formats
- Bluesky: `jay.bsky.social` or `handle.domain.com`
- Mastodon: `@username@instance.social` or `instance.social/@username`
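The formats above can be told apart mechanically, which is roughly what instance auto-detection has to do. The sketch below is an illustration under that assumption, not the actor's parser: `parse_handle` is a hypothetical name, and anything that does not look like a Mastodon handle is treated as a Bluesky handle.

```python
import re

# @username@instance or username@instance
MASTODON_ACCT = re.compile(r"^@?([^@\s/]+)@([^@\s/]+)$")
# instance/@username, with an optional http(s):// prefix
MASTODON_URL = re.compile(r"^(?:https?://)?([^@\s/]+)/@([^@\s/]+)$")


def parse_handle(handle: str) -> dict:
    """Classify a handle and split Mastodon handles into username and instance."""
    handle = handle.strip()
    match = MASTODON_ACCT.match(handle)
    if match:
        return {"platform": "mastodon", "username": match.group(1), "instance": match.group(2)}
    match = MASTODON_URL.match(handle)
    if match:
        return {"platform": "mastodon", "username": match.group(2), "instance": match.group(1)}
    # Fallback assumption: a bare domain-style handle belongs to Bluesky.
    return {"platform": "bluesky", "username": handle.lstrip("@"), "instance": None}
```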
Date Ranges
- Use ISO 8601 format: `2025-10-08T10:30:00Z`
- Timezone: Always UTC (Z suffix)
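Timestamps in this form can be generated rather than typed by hand. A minimal sketch with the standard library; `to_iso_utc` and `last_days` are illustrative helper names.

```python
from datetime import datetime, timedelta, timezone


def to_iso_utc(moment: datetime) -> str:
    """Format a timezone-aware datetime as ISO 8601 in UTC with a Z suffix."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def last_days(days: int, now: datetime) -> dict:
    """Build since/until values covering the given number of days before now."""
    return {
        "since": to_iso_utc(now - timedelta(days=days)),
        "until": to_iso_utc(now),
    }


window = last_days(7, datetime.now(timezone.utc))
```

The resulting dictionary can be merged into the actor input next to `platforms` and `query`.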
⚠️ Limitations
- Bluesky: Keyword search uses searchActors workaround (may be slower than native search)
- Mastodon: Search quality depends on instance search capabilities
- Rate Limits: Public APIs have rate limits (authentication increases limits)
- Historical Data: Availability depends on platform retention policies
🆘 Support
- Email: kontakt@barrierefix.de
- Issues: Report bugs or request features
- Documentation: Full API docs in source code
📜 License
MIT License - Free to use commercially and privately
🏷️ Tags
`bluesky` `mastodon` `at-protocol` `fediverse` `social-media` `scraper` `aggregator` `decentralized` `web3` `social-listening` `brand-monitoring` `sentiment-analysis` `market-research` `data-collection` `apify`
🔗 Explore More of Our Actors
💬 Social Media & Community
| Actor | Description |
|---|---|
| Reddit Scraper Pro | Monitor subreddits and track keywords with sentiment analysis |
| Discord Scraper Pro | Extract Discord messages and chat history for community insights |
| YouTube Comments Harvester | Comprehensive YouTube comments scraper with channel-wide enumeration |
| YouTube Contact Scraper | Extract YouTube channel contact information for outreach |
| YouTube Shorts Scraper | Scrape YouTube Shorts for viral content research |
🏢 Business Intelligence
| Actor | Description |
|---|---|
| Indeed Salary Analyzer | Get salary data for compensation benchmarking and HR analytics |
| Crunchbase Scraper | Extract company data and funding information for business intelligence |
| Northdata Scraper | Extract German company data from Northdata for business research |
| Shopify Store Intelligence | Analyze Shopify stores for competitive intelligence and market research |
| Apify Store Radar | Monitor Apify Store actors for market intelligence |
Built by Barrierefix | Powered by Apify