HeyGen Documentation Scraper
A collection of Python scripts to download and convert HeyGen API documentation to markdown format.
Directory Structure
heygen_docs/
├── scripts/ # All scraping scripts
│ ├── advanced_scraper.py # Advanced scraper with better content extraction
│ ├── simple_scraper.py # Basic HTML to markdown converter
│ ├── extract_markdown.py # Selenium-based scraper (uses Ask AI button)
│ ├── download_heygen_docs.py # Basic HTML downloader
│ └── requirements.txt # Python dependencies
├── advanced_docs/ # Output from advanced_scraper.py
│ ├── *.md # Individual page files
│ ├── index.html # HTML index for navigation
│ └── heygen_complete_docs.md # Combined documentation file
├── scraped_docs/ # Output from simple_scraper.py
│ └── heygen_docs_complete.md # Combined documentation file
└── README.md # This file
Available Scripts
1. advanced_scraper.py (Recommended)
The most reliable of the four scrapers; it downloads every documentation page that is currently reachable.
cd scripts
python3 advanced_scraper.py
Features:
- Downloads 14 of the 21 targeted documentation pages (the remaining 7 return 404 errors; see Pages Not Available)
- Converts HTML to clean markdown
- Creates individual markdown files for each page
- Generates a combined heygen_complete_docs.md file
- Creates an HTML index for easy navigation
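The per-page files and the HTML index described in the features above could be wired together along these lines. This is a minimal sketch, not the code in advanced_scraper.py: the slugify naming scheme, the build_index helper, and the page title are hypothetical, and only the standard library is used.

```python
import html
import re


def slugify(title: str) -> str:
    """Turn a page title into a file stem, e.g. 'Quick Start' -> 'quick_start'."""
    # Hypothetical naming scheme: lowercase, runs of non-alphanumerics become "_".
    return re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")


def build_index(titles: list[str]) -> str:
    """Return a minimal HTML page with one link per per-page markdown file."""
    items = "\n".join(
        f'    <li><a href="{html.escape(slugify(t))}.md">{html.escape(t)}</a></li>'
        for t in titles
    )
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        '<head><meta charset="utf-8"><title>HeyGen Documentation</title></head>\n'
        "<body>\n"
        "  <ul>\n"
        f"{items}\n"
        "  </ul>\n"
        "</body>\n"
        "</html>\n"
    )


if __name__ == "__main__":
    print(build_index(["Quick Start", "Video Translate API", "HeyGen OAuth"]))
```

Escaping both the link target and the visible title keeps titles such as "HeyGen's Webhook Events" from breaking the generated HTML.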
Successfully downloads:
- Quick Start
- Create Avatar Videos
- Customize Video Background
- Using Audio Source as Voice
- Generate Video from Template
- Video Translate API
- Photo Avatars API
- Streaming API Overview
- Streaming Avatar SDK
- Firewall Configuration
- Interactive Avatar Deprecation Notice
- Bulk Video Translation
- HeyGen OAuth
- HeyGen MCP Server
2. simple_scraper.py
Basic scraper with HTML to markdown conversion.
cd scripts
python3 simple_scraper.py
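For readers curious how a basic HTML-to-markdown step can work, here is a minimal sketch built on the standard library's html.parser. It is not the converter used in simple_scraper.py; the MarkdownConverter class and html_to_markdown function are hypothetical, and they handle only a small subset of tags.

```python
from html.parser import HTMLParser


class MarkdownConverter(HTMLParser):
    """Tiny HTML-to-markdown converter for a hypothetical subset of tags:
    h1-h3, p, ul/ol, li, a, and inline code. Other tags pass through as plain text,
    and ordered lists are flattened to bullets."""

    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}

    def __init__(self):
        super().__init__()  # convert_charrefs=True, so "&amp;" arrives as "&"
        self.parts = []     # markdown fragments, joined at the end
        self.href = None    # target of the <a> tag currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self.parts.append("\n\n" + self.HEADINGS[tag])
        elif tag == "p":
            self.parts.append("\n\n")
        elif tag in ("ul", "ol"):
            self.parts.append("\n")
        elif tag == "li":
            self.parts.append("\n- ")
        elif tag == "code":
            self.parts.append("`")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.parts.append("[")

    def handle_endtag(self, tag):
        if tag == "code":
            self.parts.append("`")
        elif tag == "a":
            self.parts.append(f"]({self.href or ''})")
            self.href = None

    def handle_data(self, data):
        self.parts.append(data)

    def markdown(self):
        return "".join(self.parts).strip() + "\n"


def html_to_markdown(html_text: str) -> str:
    converter = MarkdownConverter()
    converter.feed(html_text)
    converter.close()  # flush any buffered text
    return converter.markdown()
```

Real documentation pages contain navigation, scripts, and nested layout, so a production scraper would first select the main content element before converting it.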
3. extract_markdown.py
Uses Selenium to automate the "Ask AI > Copy Markdown" functionality.
cd scripts
python3 extract_markdown.py
Note: Requires Chrome and ChromeDriver to be installed.
4. download_heygen_docs.py
Basic HTML downloader without markdown conversion.
cd scripts
python3 download_heygen_docs.py
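A basic downloader of this kind can be sketched with urllib from the standard library. This is an illustration under assumptions, not the contents of download_heygen_docs.py: fetch_html, download_pages, the User-Agent string, and the {name: url} input format are all hypothetical. The fetch function is a parameter so the loop can be exercised without network access.

```python
import time
import urllib.request
from pathlib import Path
from typing import Callable


def fetch_html(url: str, timeout: float = 30.0) -> str:
    """Fetch a page with urllib and decode it as UTF-8."""
    # Hypothetical User-Agent; identify the scraper honestly to the server.
    request = urllib.request.Request(url, headers={"User-Agent": "heygen-docs-scraper"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def download_pages(
    pages: dict[str, str],
    out_dir: Path,
    fetch: Callable[[str], str] = fetch_html,
    delay: float = 1.0,
) -> list[Path]:
    """Save each page's raw HTML as <name>.html in out_dir, pausing between requests."""
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for i, (name, url) in enumerate(pages.items()):
        if i:
            time.sleep(delay)  # wait before every request after the first
        path = out_dir / f"{name}.html"
        path.write_text(fetch(url), encoding="utf-8")
        saved.append(path)
    return saved
```

Because no conversion happens here, the saved files are the raw HTML exactly as the server returned it.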
Installation
- Install required Python packages:
cd scripts
pip install -r requirements.txt
- For the Selenium-based scraper (extract_markdown.py), also install:
- Google Chrome browser
- ChromeDriver (matching your Chrome version)
Pages Not Available
The following pages returned 404 errors; they may require authentication or may have moved:
- Create Videos with Personal Avatar and Voice
- Create Transparent Avatar Videos in WebM Format
- HeyGen's Webhook Events
- Demo: Create a Vite Project with Streaming SDK
- Demo: Create an iOS App featuring Interactive Avatar
- Zapier Integration
- Personalized Video
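One way to separate pages that return 404 from other failures is sketched below with urllib. The helpers try_fetch and split_available are hypothetical, not functions from the scripts. A page behind a login may answer 401 or 403 rather than 404, so this sketch re-raises every non-404 HTTP error instead of silently counting it as missing.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional


def try_fetch(url: str, opener: Callable = urllib.request.urlopen) -> Optional[str]:
    """Return the page body, or None if the server answers 404 Not Found.

    Other HTTP errors (e.g. 401/403, 429) are re-raised so they are not
    mistaken for missing pages.
    """
    try:
        with opener(url, timeout=30) as response:
            return response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def split_available(
    pages: dict[str, str], opener: Callable = urllib.request.urlopen
) -> tuple[dict[str, str], list[str]]:
    """Partition {title: url} into ({title: html} for found pages, [missing titles])."""
    available, missing = {}, []
    for title, url in pages.items():
        body = try_fetch(url, opener)
        if body is None:
            missing.append(title)
        else:
            available[title] = body
    return available, missing
```

The opener parameter defaults to urllib.request.urlopen and exists only so the logic can be tested with a stand-in.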
Usage
- Navigate to the scripts directory:
cd scripts
- Run the recommended scraper:
python3 advanced_scraper.py
- Find the downloaded documentation in:
  - Individual files: ../advanced_docs/*.md
  - Combined file: ../advanced_docs/heygen_complete_docs.md
  - HTML index: ../advanced_docs/index.html
Output
The scrapers create markdown files with:
- Clean, formatted markdown content
- Source URLs for reference
- Generation timestamps
- Table of contents in combined files
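A combined file with a table of contents, per-page source URLs, and a generation timestamp might be assembled as in the sketch below. It is an assumption-laden illustration, not the scripts' actual output format: the combine and anchor helpers, the "# HeyGen Documentation" title, and the UTC timestamp format are hypothetical. The anchor function only approximates GitHub-style heading anchors.

```python
import re
from datetime import datetime, timezone
from typing import Optional


def anchor(title: str) -> str:
    """Approximate GitHub heading anchors: lowercase, drop punctuation, spaces -> hyphens."""
    return re.sub(r"[^\w\- ]", "", title.strip().lower()).replace(" ", "-")


def combine(pages: list[tuple[str, str, str]], generated_at: Optional[datetime] = None) -> str:
    """Build one markdown document from (title, source_url, body_markdown) tuples."""
    when = (generated_at or datetime.now(timezone.utc)).astimezone(timezone.utc)
    stamp = when.strftime("%Y-%m-%d %H:%M:%S UTC")
    lines = ["# HeyGen Documentation", "", f"Generated: {stamp}", "", "## Table of Contents", ""]
    lines += [f"- [{title}](#{anchor(title)})" for title, _, _ in pages]
    for title, url, body in pages:
        # Each page becomes a level-2 section that records where it came from.
        lines += ["", f"## {title}", "", f"Source: {url}", "", body.strip()]
    return "\n".join(lines) + "\n"
```

Passing generated_at explicitly makes the output reproducible, which is handy when comparing two scraping runs.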
Troubleshooting
- 404 errors: Some documentation pages may be behind authentication or have changed URLs
- Selenium issues: Ensure Chrome and ChromeDriver versions match
- Rate limiting: Scripts include delays between requests to be respectful of the server
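To check the Selenium issue above before running extract_markdown.py, the version strings of Chrome and ChromeDriver can be compared. This sketch assumes that matching major versions is what matters and that both programs accept --version; the executable names (for example google-chrome and chromedriver) vary by operating system, and the helpers are hypothetical.

```python
import re
import subprocess
from typing import Optional


def major_version(version_output: str) -> Optional[int]:
    """Extract the major version from output such as 'Google Chrome 120.0.6099.109'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    return int(match.group(1)) if match else None


def versions_match(chrome_output: str, driver_output: str) -> bool:
    """True when both outputs contain a version and the major versions agree."""
    chrome, driver = major_version(chrome_output), major_version(driver_output)
    return chrome is not None and chrome == driver


def read_version(executable: str) -> str:
    """Run '<executable> --version' and return its standard output."""
    result = subprocess.run(
        [executable, "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

For example, versions_match(read_version("google-chrome"), read_version("chromedriver")) on a Linux machine where both binaries are on the PATH.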
Notes
- All scripts wait 1-2 seconds between requests to avoid overloading the server
- Output directories are created automatically
- Scripts can be run multiple times; existing files will be overwritten
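The three behaviours in these notes (a 1-2 second pause between requests, automatic creation of output directories, and overwriting on re-runs) can be sketched as two small helpers. They are hypothetical and not taken from the scripts; the sleep function is a parameter only so the delays can be observed in tests.

```python
import random
import time
from pathlib import Path
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def polite(
    items: Iterable[T],
    min_delay: float = 1.0,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[T]:
    """Yield items one at a time, sleeping a random min_delay..max_delay seconds between them."""
    for i, item in enumerate(items):
        if i:  # no wait before the first request
            sleep(random.uniform(min_delay, max_delay))
        yield item


def write_output(path: Path, text: str) -> None:
    """Create missing parent directories, then write; an existing file is overwritten."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
```

A scraping loop would then read `for url in polite(urls): ...`, so the delay is applied in one place instead of being repeated in every script.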