Ghost Newsletter Scraper
Extract structured data from any Ghost-powered newsletter - track posts, monitor pricing, analyze publishing patterns, and research the creator economy.
Perfect For
- Content Strategists - Monitor competitor newsletters and track content trends
- Market Researchers - Analyze the creator economy and identify publishing patterns
- Business Development Teams - Find sponsorship opportunities and track pricing changes
- Newsletter Creators - Research successful strategies and analyze posting frequency
- Agencies - Monitor multiple client newsletters and generate performance reports
Quick Start
The simplest way to get started - just add a newsletter URL:
```json
{
  "startUrls": [
    { "url": "https://blog.ghost.org/" }
  ]
}
```
That's it! The scraper will automatically:
- ✅ Detect that it's a Ghost site
- ✅ Find all posts via RSS feed (fastest method)
- ✅ Extract titles, authors, publish dates, and content metadata
- ✅ Get site info like posting frequency and pricing
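If you prefer to trigger runs from your own code instead of the Apify Console, a minimal sketch using the official apify-client Python package looks like this (the token and actor ID are placeholders - substitute the real ones from your Console):

```python
# Minimal sketch: run the actor with the quick-start input and print the results.
# "YOUR_APIFY_TOKEN" and the actor ID are placeholders, not real values.
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

run = client.actor("your-username/ghost-newsletter-scraper").call(
    run_input={"startUrls": [{"url": "https://blog.ghost.org/"}]}
)

# Every extracted record (publication, post, author) lands in the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item.get("type"), "-", item.get("title") or item.get("name"))
```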
What You'll Get
Newsletter Sites
Information about each newsletter:
- Title and description
- Publishing frequency (posts per 30 days)
- Subscription pricing (if available)
- Social media links
- Last post date
Articles & Posts
Full metadata for each article:
- Title, URL, and excerpt
- Author(s) with profile links
- Tags and categories
- Publish and update dates
- Word count and reading time estimate
- OpenGraph and Twitter Card data
Writers & Contributors
Author profiles including:
- Name and bio
- Profile page URL
- Social media links (Twitter, LinkedIn, GitHub, website)
Common Use Cases
1. Competitive Intelligence
Scenario: Track what your competitors are publishing
```json
{
  "startUrls": [
    { "url": "https://competitor1.com" },
    { "url": "https://competitor2.com" }
  ],
  "lookbackDays": 7,
  "outputLevel": "posts"
}
```
Get only new posts from the last week. Combine with n8n to get Slack alerts when competitors publish.
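If you would rather use a plain script than n8n, a rough equivalent might look like the sketch below (the Slack webhook URL, token, and actor ID are placeholders; the input fields match the example above):

```python
# Hedged sketch: weekly competitor crawl + Slack alert via an incoming webhook.
import requests
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")
run = client.actor("your-username/ghost-newsletter-scraper").call(
    run_input={
        "startUrls": [
            {"url": "https://competitor1.com"},
            {"url": "https://competitor2.com"},
        ],
        "lookbackDays": 7,
        "outputLevel": "posts",
    }
)

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    if item.get("type") == "post":
        requests.post(
            "https://hooks.slack.com/services/XXX/YYY/ZZZ",  # placeholder webhook URL
            json={"text": f"New competitor post: {item['title']} ({item['url']})"},
        )
```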
2. Content Research
Scenario: Analyze successful newsletters in your niche
```json
{
  "startUrls": [
    { "url": "https://popular-newsletter.com" }
  ],
  "limitPerSite": 100,
  "emitAuthors": true,
  "fetchPricingSignals": true
}
```
Get the last 100 posts, author info, and pricing to understand their strategy.
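Once the posts are in the dataset, a quick analysis pass is straightforward. The sketch below uses the published_at and word_count_est fields shown under Output Format to estimate posting cadence and typical article length (illustrative only, not part of the actor):

```python
# Summarize posting cadence and article length from post records.
from datetime import datetime
from statistics import median

def summarize(posts):
    dates = sorted(
        datetime.fromisoformat(p["published_at"].replace("Z", "+00:00"))
        for p in posts
        if p.get("published_at")
    )
    span_days = max((dates[-1] - dates[0]).days, 1) if len(dates) > 1 else 1
    return {
        "posts_per_week": round(len(dates) / span_days * 7, 1),
        "median_word_count": median(p.get("word_count_est", 0) for p in posts),
    }

# posts = [item for item in dataset_items if item.get("type") == "post"]
# print(summarize(posts))
```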
3. Sponsorship Prospecting
Scenario: Find newsletters that accept sponsors
```json
{
  "domains": ["newsletter1.com", "newsletter2.com", "newsletter3.com"],
  "fetchPricingSignals": true,
  "outputLevel": "publication",
  "limitPerSite": 1
}
```
Quickly scan multiple newsletters for pricing pages and subscription info.
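After the run, filtering for publications that actually show subscription signals is a small post-processing step; the pricing.has_subscribe and plan_cards fields come from the publication record shown under Output Format:

```python
# Keep only publications that expose a subscribe page or pricing cards.
def has_paid_signals(record):
    pricing = record.get("pricing") or {}
    return record.get("type") == "publication" and bool(
        pricing.get("has_subscribe") or pricing.get("plan_cards")
    )

# prospects = [r for r in dataset_items if has_paid_signals(r)]
```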
4. Publishing Pattern Analysis
Scenario: Track how often newsletters in a category post
```json
{
  "startUrls": [
    { "url": "https://tech-newsletter.com" }
  ],
  "lookbackDays": 90,
  "deltaCrawl": false
}
```
Get 3 months of data to analyze posting frequency and consistency.
Input Options
Start URLs (Required)
Choose one method to specify which newsletters to scrape:
- Start URLs: Full URLs like `https://blog.ghost.org/`
- Domains: Just the domain, like `blog.ghost.org` (`https://` is added automatically)
What to Extract
| Option | What it does |
|---|---|
| What to extract | Choose "Everything" (posts + site info), "Posts only", or "Site info only" |
| Extract author profiles | Get author bios and social links (recommended: ON) |
| Extract pricing | Find subscription prices from /subscribe pages (recommended: ON) |
| Extract tags | Get article tags/categories (experimental, slower) |
Crawl Settings
| Setting | Description | Recommended |
|---|---|---|
| How to find posts | RSS first (fastest), Sitemap first (most complete), or Hybrid | RSS first |
| Max posts per newsletter | Stop after this many posts | 200 |
| Only posts from last X days | Filter by date (0 = all posts) | 0 or 30 |
| Skip already-seen posts | Delta crawling saves costs! | ON |
Filters (Optional)
Use regex patterns to include/exclude specific URLs:
- Include: `[".*/tag/tech/.*"]` - only posts tagged "tech"
- Exclude: `[".*/author/.*"]` - skip author pages
Output Format
All data is saved to a dataset with multiple views for easy filtering:
View 1: All Records
All record types (publications, posts, and authors) in a single unfiltered view
View 2: Newsletter Sites
```json
{
  "type": "publication",
  "domain": "blog.ghost.org",
  "title": "Ghost Blog",
  "post_velocity_30d": 12,
  "last_post_at": "2025-09-28T10:00:00Z",
  "pricing": {
    "has_subscribe": true,
    "plan_cards": [...]
  }
}
```
View 3: Articles & Posts
```json
{
  "type": "post",
  "domain": "blog.ghost.org",
  "title": "How to Build a Newsletter",
  "url": "https://blog.ghost.org/how-to-build/",
  "authors": [{"name": "Jane Doe"}],
  "tags": ["guides", "tutorials"],
  "published_at": "2025-09-15T08:00:00Z",
  "word_count_est": 1240,
  "reading_time_min_est": 6
}
```
View 4: Writers & Contributors
```json
{
  "type": "author",
  "name": "Jane Doe",
  "profile_url": "https://blog.ghost.org/author/jane/",
  "bio": "Writer and creator",
  "social": {
    "twitter": "https://twitter.com/jane",
    "website": "https://janedoe.com"
  }
}
```
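Since every record carries a type field, splitting a run's output into these three views on the client side takes only a few lines; the dataset ID below is a placeholder for a run's default dataset:

```python
# Group dataset records by their "type" field (publication / post / author).
from collections import defaultdict
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")
by_type = defaultdict(list)

for item in client.dataset("DATASET_ID").iterate_items():  # placeholder dataset ID
    by_type[item.get("type", "unknown")].append(item)

print(len(by_type["publication"]), "sites,",
      len(by_type["post"]), "posts,",
      len(by_type["author"]), "authors")
```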
Integrations
n8n Workflows
New Post Alert
- Trigger: Apify Dataset Item webhook (filter: `type=post`)
- OpenAI: Summarize the post
- Slack: Post to #content-monitoring
- Notion: Add to content calendar
Pricing Change Alert
- Trigger: Schedule (daily)
- Get latest dataset (filter: `type=publication`)
- Compare pricing with previous run
- Email: Alert team if prices changed
Weekly Digest
- Trigger: Schedule (Monday 9 AM)
- Get posts from last 7 days
- Group by newsletter
- Email: Send digest to team
Make.com / Zapier
The actor works with any automation tool that supports webhooks or API calls. Use Apify's integration to trigger workflows when new data is found.
How It Works
- Detects Ghost - Automatically identifies Ghost-powered sites using meta tags, Portal scripts, and RSS feeds
- Finds Posts - Uses RSS feeds (fastest), sitemaps, or HTML pagination to discover articles
- Extracts Data - Parses JSON-LD, OpenGraph tags, and HTML to get complete metadata
- Saves Results - Stores everything in a structured dataset with easy-to-use views
- Tracks Changes - Delta crawling means you only pay for new content (saves costs!)
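Delta crawling is standard hash-based deduplication: the scraper remembers a fingerprint of every post it has already emitted and skips unchanged ones on later runs. A simplified illustration of the idea (not the actor's internal code; the updated_at field is assumed here):

```python
import hashlib
import json

def fingerprint(post: dict) -> str:
    # Hash the fields that identify a post version; an updated post yields a new hash.
    raw = json.dumps([post.get("url"), post.get("updated_at")], sort_keys=True)
    return hashlib.sha256(raw.encode()).hexdigest()

seen = set()  # the actor would persist this between runs, e.g. in a key-value store

def is_new(post: dict) -> bool:
    fp = fingerprint(post)
    if fp in seen:
        return False
    seen.add(fp)
    return True
```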
Pricing & Performance
Pricing Model: Pay per item extracted (posts, authors, sites)
Typical Costs:
- Small newsletter (20 posts) = ~25 items
- 10 newsletters monitored daily = ~200 items/day
- Delta crawling reduces repeat costs by 80%+
Speed:
- RSS mode: ~5-10 seconds per newsletter
- Sitemap mode: ~10-20 seconds per newsletter
- Faster than headless browser scrapers by 5-10x
Ethical & Compliant
This scraper:
- ✅ Only accesses public content
- ✅ Respects robots.txt rules
- ✅ Implements rate limiting
- ✅ Identifies itself clearly
- ❌ Never bypasses paywalls
- ❌ Never accesses private content
- ❌ Never logs in or authenticates
Advanced Settings
For power users, we offer:
- Concurrency control - Adjust speed vs. politeness
- Circuit breakers - Auto-stop on errors to save costs
- Proxy support - Use Apify proxy for large-scale scraping
- Browser mode - Enable Playwright for JavaScript-heavy sites (rarely needed)
- Custom User-Agent - Identify your scraper however you want
Most users can ignore these - the defaults work great!
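For orientation only, an input that touches these knobs might look roughly like the sketch below - the field names are hypothetical, so check the input schema in the Apify Console for the real ones:

```python
# Hypothetical advanced input; field names are illustrative, not the actor's schema.
run_input = {
    "startUrls": [{"url": "https://blog.ghost.org/"}],
    "maxConcurrency": 2,                            # politeness vs. speed
    "useBrowser": False,                            # Playwright only for JS-heavy edge cases
    "proxyConfiguration": {"useApifyProxy": True},  # common Apify proxy convention
    "customUserAgent": "my-research-bot/1.0 (contact@example.com)",
}
```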
Limitations
- Ghost only - Only works with Ghost-powered sites (use our detector or check for "Powered by Ghost")
- Public content - Cannot access members-only or premium content
- Static pricing - Pricing detection works on static pages only (not Portal overlay)
- No authentication - Doesn't support logged-in scraping
Troubleshooting
"No posts found"
- Check if the site has an RSS feed at `/rss/` or `/feed/`
- Try changing discovery mode to "Sitemap first" or "Hybrid"
- Verify it's actually a Ghost site (look for "Powered by Ghost" footer)
"Site isn't Ghost"
- The detector looks for Ghost-specific signals
- Some Ghost sites are heavily customized - try anyway, it might still work
- Turn off "Stop if site isn't Ghost" to scrape anyway
"Too slow"
- Use "RSS first" mode (fastest)
- Reduce "Max posts per newsletter"
- Enable "Skip already-seen posts" for repeat runs
"Hitting rate limits"
- Reduce "Max requests per site" (try 2)
- Enable "Respect robots.txt"
- Add delays by reducing concurrency
Support
- Email: kontakt@barrierefix.de
- Issues: Report bugs via Apify Console
- Documentation: This README + input field tooltips
Version History
1.0.0 (2025-10-01)
- Initial release
- Ghost detection with multi-signal verification
- RSS, sitemap, and HTML discovery modes
- Post, site, and author extraction
- Pricing detection for subscription newsletters
- Delta crawling with hash-based deduplication
- Circuit breakers and smart error handling
- n8n integration ready
🔗 Explore More of Our Actors
📰 Content & Publishing
| Actor | Description |
|---|---|
| Notion Marketplace Scraper | Scrape Notion templates and marketplace listings |
| Farcaster Hub Scraper | Scrape Farcaster decentralized social network data |
| Google Play Reviews Scraper | Extract app reviews from Google Play Store |
💬 Social Media & Community
| Actor | Description |
|---|---|
| Reddit Scraper Pro | Monitor subreddits and track keywords with sentiment analysis |
| Discord Scraper Pro | Extract Discord messages and chat history for community insights |
| YouTube Comments Harvester | Comprehensive YouTube comments scraper with channel-wide enumeration |
| YouTube Contact Scraper | Extract YouTube channel contact information for outreach |
| YouTube Shorts Scraper | Scrape YouTube Shorts for viral content research |
License
MIT - Use commercially, modify freely, no attribution required.
Made by Barrierefix - Building tools for the creator economy.