Ghost Newsletter Scraper
Extract structured data from any Ghost-powered newsletter - track posts, monitor pricing, analyze publishing patterns, and research the creator economy.
Perfect For
- Content Strategists - Monitor competitor newsletters and track content trends
- Market Researchers - Analyze the creator economy and identify publishing patterns
- Business Development Teams - Find sponsorship opportunities and track pricing changes
- Newsletter Creators - Research successful strategies and analyze posting frequency
- Agencies - Monitor multiple client newsletters and generate performance reports
Quick Start
The simplest way to get started - just add a newsletter URL:
```json
{
  "startUrls": [
    { "url": "https://blog.ghost.org/" }
  ]
}
```
That's it! The scraper will automatically:
- ✅ Detect that it's a Ghost site
- ✅ Find all posts via RSS feed (fastest method)
- ✅ Extract titles, authors, publish dates, and content metadata
- ✅ Get site info like posting frequency and pricing
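If you prefer to trigger runs from your own code instead of the Apify Console, a minimal sketch using the official apify-client Python package looks like this (the token and actor ID are placeholders - substitute the real ones from your Console):

```python
# Minimal sketch: run the actor with the quick-start input and print the results.
# "YOUR_APIFY_TOKEN" and the actor ID are placeholders, not real values.
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

run = client.actor("your-username/ghost-newsletter-scraper").call(
    run_input={"startUrls": [{"url": "https://blog.ghost.org/"}]}
)

# Every extracted record (publication, post, author) lands in the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item.get("type"), "-", item.get("title") or item.get("name"))
```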
What You'll Get
Newsletter Sites
Information about each newsletter:
- Title and description
- Publishing frequency (posts per 30 days)
- Subscription pricing (if available)
- Social media links
- Last post date
Articles & Posts
Full metadata for each article:
- Title, URL, and excerpt
- Author(s) with profile links
- Tags and categories
- Publish and update dates
- Word count and reading time estimate
- OpenGraph and Twitter Card data
Writers & Contributors
Author profiles including:
- Name and bio
- Profile page URL
- Social media links (Twitter, LinkedIn, GitHub, website)
Common Use Cases
1. Competitive Intelligence
Scenario: Track what your competitors are publishing
```json
{
  "startUrls": [
    { "url": "https://competitor1.com" },
    { "url": "https://competitor2.com" }
  ],
  "lookbackDays": 7,
  "outputLevel": "posts"
}
```
Get only new posts from the last week. Combine with n8n to get Slack alerts when competitors publish.
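If you would rather use a plain script than n8n, a rough equivalent might look like the sketch below (the Slack webhook URL, token, and actor ID are placeholders; the input fields match the example above):

```python
# Hedged sketch: weekly competitor crawl + Slack alert via an incoming webhook.
import requests
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")
run = client.actor("your-username/ghost-newsletter-scraper").call(
    run_input={
        "startUrls": [
            {"url": "https://competitor1.com"},
            {"url": "https://competitor2.com"},
        ],
        "lookbackDays": 7,
        "outputLevel": "posts",
    }
)

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    if item.get("type") == "post":
        requests.post(
            "https://hooks.slack.com/services/XXX/YYY/ZZZ",  # placeholder webhook URL
            json={"text": f"New competitor post: {item['title']} ({item['url']})"},
        )
```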
2. Content Research
Scenario: Analyze successful newsletters in your niche
```json
{
  "startUrls": [
    { "url": "https://popular-newsletter.com" }
  ],
  "limitPerSite": 100,
  "emitAuthors": true,
  "fetchPricingSignals": true
}
```
Get the last 100 posts, author info, and pricing to understand their strategy.
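Once the posts are in the dataset, a quick analysis pass is straightforward. The sketch below uses the published_at and word_count_est fields shown under Output Format to estimate posting cadence and typical article length (illustrative only, not part of the actor):

```python
# Summarize posting cadence and article length from post records.
from datetime import datetime
from statistics import median

def summarize(posts):
    dates = sorted(
        datetime.fromisoformat(p["published_at"].replace("Z", "+00:00"))
        for p in posts
        if p.get("published_at")
    )
    span_days = max((dates[-1] - dates[0]).days, 1) if len(dates) > 1 else 1
    return {
        "posts_per_week": round(len(dates) / span_days * 7, 1),
        "median_word_count": median(p.get("word_count_est", 0) for p in posts),
    }

# posts = [item for item in dataset_items if item.get("type") == "post"]
# print(summarize(posts))
```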
3. Sponsorship Prospecting
Scenario: Find newsletters that accept sponsors
```json
{
  "domains": ["newsletter1.com", "newsletter2.com", "newsletter3.com"],
  "fetchPricingSignals": true,
  "outputLevel": "publication",
  "limitPerSite": 1
}
```
Quickly scan multiple newsletters for pricing pages and subscription info.
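After the run, filtering for publications that actually show subscription signals is a small post-processing step; the pricing.has_subscribe and plan_cards fields come from the publication record shown under Output Format:

```python
# Keep only publications that expose a subscribe page or pricing cards.
def has_paid_signals(record):
    pricing = record.get("pricing") or {}
    return record.get("type") == "publication" and bool(
        pricing.get("has_subscribe") or pricing.get("plan_cards")
    )

# prospects = [r for r in dataset_items if has_paid_signals(r)]
```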
4. Publishing Pattern Analysis
Scenario: Track how often newsletters in a category post
```json
{
  "startUrls": [
    { "url": "https://tech-newsletter.com" }
  ],
  "lookbackDays": 90,
  "deltaCrawl": false
}
```
Get 3 months of data to analyze posting frequency and consistency.
Input Options
Start URLs (Required)
Choose one method to specify which newsletters to scrape:
- Start URLs: Full URLs like `https://blog.ghost.org/`
- Domains: Just the domain, like `blog.ghost.org` (`https://` is added automatically)
What to Extract
| Option | What it does |
|---|---|
| What to extract | Choose "Everything" (posts + site info), "Posts only", or "Site info only" |
| Extract author profiles | Get author bios and social links (recommended: ON) |
| Extract pricing | Find subscription prices from /subscribe pages (recommended: ON) |
| Extract tags | Get article tags/categories (experimental, slower) |
Crawl Settings
| Setting | Description | Recommended |
|---|---|---|
| How to find posts | RSS first (fastest), Sitemap first (most complete), or Hybrid | RSS first |
| Max posts per newsletter | Stop after this many posts | 200 |
| Only posts from last X days | Filter by date (0 = all posts) | 0 or 30 |
| Skip already-seen posts | Delta crawling saves costs! | ON |
Filters (Optional)
Use regex patterns to include/exclude specific URLs:
- Include: `[".*/tag/tech/.*"]` - only posts tagged "tech"
- Exclude: `[".*/author/.*"]` - skip author pages
Output Format
All data is saved to a dataset with multiple views for easy filtering:
View 1: All Records
All record types (publications, posts, and authors) in a single unfiltered view
View 2: Newsletter Sites
```json
{
  "type": "publication",
  "domain": "blog.ghost.org",
  "title": "Ghost Blog",
  "post_velocity_30d": 12,
  "last_post_at": "2025-09-28T10:00:00Z",
  "pricing": {
    "has_subscribe": true,
    "plan_cards": [...]
  }
}
```
View 3: Articles & Posts
```json
{
  "type": "post",
  "domain": "blog.ghost.org",
  "title": "How to Build a Newsletter",
  "url": "https://blog.ghost.org/how-to-build/",
  "authors": [{"name": "Jane Doe"}],
  "tags": ["guides", "tutorials"],
  "published_at": "2025-09-15T08:00:00Z",
  "word_count_est": 1240,
  "reading_time_min_est": 6
}
```
View 4: Writers & Contributors
```json
{
  "type": "author",
  "name": "Jane Doe",
  "profile_url": "https://blog.ghost.org/author/jane/",
  "bio": "Writer and creator",
  "social": {
    "twitter": "https://twitter.com/jane",
    "website": "https://janedoe.com"
  }
}
```
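Since every record carries a type field, splitting a run's output into these three views on the client side takes only a few lines; the dataset ID below is a placeholder for a run's default dataset:

```python
# Group dataset records by their "type" field (publication / post / author).
from collections import defaultdict
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")
by_type = defaultdict(list)

for item in client.dataset("DATASET_ID").iterate_items():  # placeholder dataset ID
    by_type[item.get("type", "unknown")].append(item)

print(len(by_type["publication"]), "sites,",
      len(by_type["post"]), "posts,",
      len(by_type["author"]), "authors")
```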
Integrations
n8n Workflows
New Post Alert
- Trigger: Apify Dataset Item webhook (filter: `type=post`)
- OpenAI: Summarize the post
- Slack: Post to #content-monitoring
- Notion: Add to content calendar
Pricing Change Alert
- Trigger: Schedule (daily)
- Get latest dataset (filter: `type=publication`)
- Compare pricing with previous run
- Email: Alert team if prices changed
Weekly Digest
- Trigger: Schedule (Monday 9 AM)
- Get posts from last 7 days
- Group by newsletter
- Email: Send digest to team
Make.com / Zapier
The actor works with any automation tool that supports webhooks or API calls. Use Apify's integration to trigger workflows when new data is found.
How It Works
- Detects Ghost - Automatically identifies Ghost-powered sites using meta tags, Portal scripts, and RSS feeds
- Finds Posts - Uses RSS feeds (fastest), sitemaps, or HTML pagination to discover articles
- Extracts Data - Parses JSON-LD, OpenGraph tags, and HTML to get complete metadata
- Saves Results - Stores everything in a structured dataset with easy-to-use views
- Tracks Changes - Delta crawling means you only pay for new content (saves costs!)
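Delta crawling is standard hash-based deduplication: the scraper remembers a fingerprint of every post it has already emitted and skips unchanged ones on later runs. A simplified illustration of the idea (not the actor's internal code; the updated_at field is assumed here):

```python
import hashlib
import json

def fingerprint(post: dict) -> str:
    # Hash the fields that identify a post version; an updated post yields a new hash.
    raw = json.dumps([post.get("url"), post.get("updated_at")], sort_keys=True)
    return hashlib.sha256(raw.encode()).hexdigest()

seen = set()  # the actor would persist this between runs, e.g. in a key-value store

def is_new(post: dict) -> bool:
    fp = fingerprint(post)
    if fp in seen:
        return False
    seen.add(fp)
    return True
```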
Pricing & Performance
Pricing Model: Pay per item extracted (posts, authors, sites)
Typical Costs:
- Small newsletter (20 posts) = ~25 items
- 10 newsletters monitored daily = ~200 items/day
- Delta crawling reduces repeat costs by 80%+
Speed:
- RSS mode: ~5-10 seconds per newsletter
- Sitemap mode: ~10-20 seconds per newsletter
- Faster than headless browser scrapers by 5-10x
Ethical & Compliant
This scraper:
- ✅ Only accesses public content
- ✅ Respects robots.txt rules
- ✅ Implements rate limiting
- ✅ Identifies itself clearly
- ❌ Never bypasses paywalls
- ❌ Never accesses private content
- ❌ Never logs in or authenticates
Advanced Settings
For power users, we offer:
- Concurrency control - Adjust speed vs. politeness
- Circuit breakers - Auto-stop on errors to save costs
- Proxy support - Use Apify proxy for large-scale scraping
- Browser mode - Enable Playwright for JavaScript-heavy sites (rarely needed)
- Custom User-Agent - Identify your scraper however you want
Most users can ignore these - the defaults work great!
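For orientation only, an input that touches these knobs might look roughly like the sketch below - the field names are hypothetical, so check the input schema in the Apify Console for the real ones:

```python
# Hypothetical advanced input; field names are illustrative, not the actor's schema.
run_input = {
    "startUrls": [{"url": "https://blog.ghost.org/"}],
    "maxConcurrency": 2,                            # politeness vs. speed
    "useBrowser": False,                            # Playwright only for JS-heavy edge cases
    "proxyConfiguration": {"useApifyProxy": True},  # common Apify proxy convention
    "customUserAgent": "my-research-bot/1.0 (contact@example.com)",
}
```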
Limitations
- Ghost only - Only works with Ghost-powered sites (use our detector or check for "Powered by Ghost")
- Public content - Cannot access members-only or premium content
- Static pricing - Pricing detection works on static pages only (not Portal overlay)
- No authentication - Doesn't support logged-in scraping
Troubleshooting
"No posts found"
- Check if the site has an RSS feed at `/rss/` or `/feed/`
- Try changing discovery mode to "Sitemap first" or "Hybrid"
- Verify it's actually a Ghost site (look for "Powered by Ghost" footer)
"Site isn't Ghost"
- The detector looks for Ghost-specific signals
- Some Ghost sites are heavily customized - try anyway, it might still work
- Turn off "Stop if site isn't Ghost" to scrape anyway
"Too slow"
- Use "RSS first" mode (fastest)
- Reduce "Max posts per newsletter"
- Enable "Skip already-seen posts" for repeat runs
"Hitting rate limits"
- Reduce "Max requests per site" (try 2)
- Enable "Respect robots.txt"
- Add delays by reducing concurrency
Support
- Email: kontakt@barrierefix.de
- Issues: Report bugs via Apify Console
- Documentation: This README + input field tooltips
Version History
1.0.0 (2025-10-01)
- Initial release
- Ghost detection with multi-signal verification
- RSS, sitemap, and HTML discovery modes
- Post, site, and author extraction
- Pricing detection for subscription newsletters
- Delta crawling with hash-based deduplication
- Circuit breakers and smart error handling
- n8n integration ready
🔗 Explore More of Our Actors
📰 Content & Publishing
| Actor | Description |
|---|---|
| Notion Marketplace Scraper | Scrape Notion templates and marketplace listings |
| Farcaster Hub Scraper | Scrape Farcaster decentralized social network data |
| Google Play Reviews Scraper | Extract app reviews from Google Play Store |
💬 Social Media & Community
| Actor | Description |
|---|---|
| Reddit Scraper Pro | Monitor subreddits and track keywords with sentiment analysis |
| Discord Scraper Pro | Extract Discord messages and chat history for community insights |
| YouTube Comments Harvester | Comprehensive YouTube comments scraper with channel-wide enumeration |
| YouTube Contact Scraper | Extract YouTube channel contact information for outreach |
| YouTube Shorts Scraper | Scrape YouTube Shorts for viral content research |
License
MIT - Use commercially, modify freely, no attribution required.
Made by Barrierefix - Building tools for the creator economy.