HeyGen Documentation Scraper

A collection of Python scripts to download and convert HeyGen API documentation to markdown format.

Directory Structure

heygen_docs/
├── scripts/                   # All scraping scripts
│   ├── advanced_scraper.py    # Advanced scraper with better content extraction
│   ├── simple_scraper.py      # Basic HTML to markdown converter
│   ├── extract_markdown.py    # Selenium-based scraper (uses Ask AI button)
│   ├── download_heygen_docs.py # Basic HTML downloader
│   └── requirements.txt       # Python dependencies
├── advanced_docs/             # Output from advanced_scraper.py
│   └── heygen_complete_docs.md # Combined documentation file
├── scraped_docs/              # Output from simple_scraper.py
│   └── heygen_docs_complete.md # Combined documentation file
└── README.md                  # This file

Available Scripts

1. advanced_scraper.py (Recommended)

The most reliable of the four scrapers: it downloads every documentation page that does not return a 404 error.

cd scripts
python3 advanced_scraper.py

Features:

  • Downloads 14 out of 21 documentation pages
  • Converts HTML to clean markdown
  • Creates individual markdown files for each page
  • Generates a combined heygen_complete_docs.md file
  • Creates an HTML index for easy navigation
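
To illustrate the HTML-to-markdown step listed above, here is a minimal sketch built only on the standard library's html.parser. It is not the code in advanced_scraper.py; the class name MarkdownConverter and the function html_to_markdown are hypothetical, and the sketch handles only headings, paragraphs, list items, and inline code.

```python
import re
from html.parser import HTMLParser


class MarkdownConverter(HTMLParser):
    """Convert a small subset of HTML to markdown (hypothetical helper)."""

    HEADING_PREFIXES = {"h1": "# ", "h2": "## ", "h3": "### "}

    def __init__(self):
        super().__init__()
        self.parts = []  # markdown fragments, joined at the end

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADING_PREFIXES:
            self.parts.append("\n\n" + self.HEADING_PREFIXES[tag])
        elif tag == "p":
            self.parts.append("\n\n")
        elif tag in ("ul", "ol"):
            self.parts.append("\n")  # blank line before the first list item
        elif tag == "li":
            self.parts.append("\n- ")  # ordered lists are flattened to bullets
        elif tag == "code":
            self.parts.append("`")

    def handle_endtag(self, tag):
        if tag == "code":
            self.parts.append("`")

    def handle_data(self, data):
        text = re.sub(r"\s+", " ", data)  # collapse whitespace runs
        if text.strip():  # skip whitespace-only text between tags
            self.parts.append(text)


def html_to_markdown(html: str) -> str:
    """Return a markdown rendering of the given HTML fragment."""
    converter = MarkdownConverter()
    converter.feed(html)
    converter.close()
    return "".join(converter.parts).strip()


sample = "<h1>Quick Start</h1><p>Use <code>api_key</code> here.</p><ul><li>One</li><li>Two</li></ul>"
markdown = html_to_markdown(sample)
```

A real documentation page also needs navigation and footer markup stripped before conversion, which this sketch leaves out.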

Successfully downloads:

  • Quick Start
  • Create Avatar Videos
  • Customize Video Background
  • Using Audio Source as Voice
  • Generate Video from Template
  • Video Translate API
  • Photo Avatars API
  • Streaming API Overview
  • Streaming Avatar SDK
  • Firewall Configuration
  • Interactive Avatar Deprecation Notice
  • Bulk Video Translation
  • HeyGen OAuth
  • HeyGen MCP Server

2. simple_scraper.py

Basic scraper with HTML to markdown conversion.

cd scripts
python3 simple_scraper.py

3. extract_markdown.py

Uses Selenium to automate the "Ask AI > Copy Markdown" functionality.

cd scripts
python3 extract_markdown.py

Note: Requires Chrome and ChromeDriver to be installed.

4. download_heygen_docs.py

Basic HTML downloader without markdown conversion.

cd scripts
python3 download_heygen_docs.py

Installation

  1. Install required Python packages:
cd scripts
pip install -r requirements.txt
  2. For the Selenium-based scraper, additionally install:
    • Google Chrome browser
    • ChromeDriver (matching your Chrome version)

Pages Not Available

The following 7 pages returned 404 errors; they may require authentication or have moved to new URLs:

  1. Create Videos with Personal Avatar and Voice
  2. Create Transparent Avatar Videos in WebM Format
  3. HeyGen's Webhook Events
  4. Demo: Create a Vite Project with Streaming SDK
  5. Demo: Create an iOS App featuring Interactive Avatar
  6. Zapier Integration
  7. Personalized Video

Usage

  1. Navigate to the scripts directory:
cd scripts
  2. Run the recommended scraper:
python3 advanced_scraper.py
  3. Find the downloaded documentation in:
    • Individual files: ../advanced_docs/*.md
    • Combined file: ../advanced_docs/heygen_complete_docs.md
    • HTML index: ../advanced_docs/index.html

Output

The scrapers create markdown files with:

  • Clean, formatted markdown content
  • Source URLs for reference
  • Generation timestamps
  • Table of contents in combined files
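
As a rough sketch of how a combined file with these elements could be assembled: the function names slugify and build_combined, the exact layout, and the example.com URLs are all assumptions for illustration, not the scrapers' actual output format.

```python
import re
from datetime import datetime, timezone


def slugify(title: str) -> str:
    """Turn a page title into a lowercase, hyphenated anchor name."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def build_combined(pages, generated_at=None):
    """Build one markdown document from (title, source_url, body) tuples."""
    generated_at = generated_at or datetime.now(timezone.utc)
    lines = [
        "# HeyGen Documentation",
        "",
        f"Generated: {generated_at.isoformat(timespec='seconds')}",
        "",
        "## Table of Contents",
        "",
    ]
    # Table of contents: one link per page, pointing at its heading anchor.
    for title, _url, _body in pages:
        lines.append(f"- [{title}](#{slugify(title)})")
    # Page sections: heading, source URL, then the converted body.
    for title, url, body in pages:
        lines += ["", f"## {title}", "", f"Source: {url}", "", body.strip()]
    return "\n".join(lines) + "\n"


# Placeholder URLs; the real documentation URLs are not shown in this README.
combined = build_combined(
    [
        ("Quick Start", "https://example.com/docs/quick-start", "Body A"),
        ("HeyGen OAuth", "https://example.com/docs/oauth", "Body B"),
    ],
    generated_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
```

Passing generated_at explicitly keeps the output reproducible; when it is omitted, the current UTC time is used.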

Troubleshooting

  • 404 errors: Some documentation pages may be behind authentication or have changed URLs
  • Selenium issues: Ensure Chrome and ChromeDriver versions match
  • Rate limiting: Scripts pause 1-2 seconds between requests to avoid overloading the server
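
For the Selenium item above, a quick way to check for a mismatch is to compare the major version numbers reported by each binary's --version flag. The sketch below assumes that "matching" means equal major versions and that the binaries are named google-chrome and chromedriver, which varies by platform; the helper names are hypothetical.

```python
import re
import subprocess


def major_version(version_output: str) -> int:
    """Extract the major version from text such as 'Google Chrome 120.0.6099.109'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return int(match.group(1))


def versions_match(chrome_output: str, driver_output: str) -> bool:
    """Return True when Chrome and ChromeDriver report the same major version."""
    return major_version(chrome_output) == major_version(driver_output)


def read_version(command: str) -> str:
    """Run '<command> --version' and return its output (command name is an assumption)."""
    result = subprocess.run(
        [command, "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


same = versions_match("Google Chrome 120.0.6099.109", "ChromeDriver 120.0.6099.71 (abc123)")
different = versions_match("Google Chrome 121.0.6167.85", "ChromeDriver 120.0.6099.71 (abc123)")
```

On a machine with both tools installed, versions_match(read_version("google-chrome"), read_version("chromedriver")) performs the same check against the real binaries.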

Notes

  • All scripts respect rate limiting with 1-2 second delays between requests
  • Output directories are created automatically
  • Scripts can be run multiple times; existing files will be overwritten
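
The behaviors in these notes, together with the 404 handling described earlier, can be sketched as a single download loop. This is an illustration under assumptions, not the scripts' actual code: fetch_url and scrape_all are hypothetical names, the 1.5 second delay is one value within the stated 1-2 second range, and the fetch and sleep parameters exist only so the loop can be exercised without network access.

```python
import time
import urllib.error
import urllib.request
from pathlib import Path


def fetch_url(url: str) -> str:
    """Download a page and return its text."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def scrape_all(pages, out_dir, fetch=fetch_url, delay=1.5, sleep=time.sleep):
    """Download each page in `pages` (filename -> URL) into `out_dir`.

    Returns (saved, missing): filenames written, and filenames whose URL returned 404.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the output directory if needed
    saved, missing = [], []
    for index, (name, url) in enumerate(pages.items()):
        if index:
            sleep(delay)  # pause between requests, not before the first one
        try:
            html = fetch(url)
        except urllib.error.HTTPError as err:
            if err.code == 404:
                missing.append(name)  # record the page and keep going
                continue
            raise  # other HTTP errors are not expected, so surface them
        (out / name).write_text(html, encoding="utf-8")  # overwrites existing files
        saved.append(name)
    return saved, missing
```

Because write_text replaces any existing file, running the loop again simply refreshes the output, which matches the overwrite behavior noted above.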