Why migrating API documentation is so complex, and how our experts handle it
Henry Bassey

6 min read · Jul 01, 2025

In the first part of this documentation guide, I got William to walk us through a better way to create API documentation. If you haven’t read it yet, you should. However, while building documentation from scratch gives you the freedom to define structure, tools, and process, documentation migrations present a different kind of challenge.

You’re working with existing systems, legacy content, and past decisions you didn’t make. That means walking straight into traps like broken components, outdated specs, and a structure that no longer makes sense. A few wrong moves, and teams lose momentum, trust, or even visibility into what their docs are supposed to achieve.

In this guide, I asked William Imoh (CEO), Blessing Anyebe (Content Ops Manager), and Diana Payton (Documentation Manager) to break down how they handle documentation migrations, based on the projects we’ve delivered for various clients.

You’ll learn:

  • How to prepare for a migration without breaking anything mid-process
  • What goes into the migration brief, and why it’ll save you later
  • How to deal with mismatches, specs, and structure gaps
  • The tools and scripts we use
  • How to define “done” and validate success
  • The mistakes teams make, and how to avoid them

Note that we adopt Mintlify for all our documentation projects, and you’ll see several references to the platform in this guide. We partner with Mintlify to help teams deliver Stripe-level docs. Build your docs with Mintlify here.

Why documentation migrations are harder than they seem

Most teams that reach out to us already have documentation that's often years old, sprawling across different systems, and embedded into workflows that no longer match how the product works today. On paper, migrating that documentation should be simple. Just move the content, fix a few links, and update the structure. But that assumption rarely holds.

Some teams are migrating from homegrown tools or legacy CMS setups; others have outgrown markdown-based systems like Docusaurus or ReadMe. The initial problem they report is that the docs are difficult to manage or look dated. But once we dig in, the issues expand into confusing structure, disconnected navigation, slow search, duplicated content, and growing technical debt.

In several cases, we’ve had to rewrite invalid OpenAPI files or replace components that had no equivalent in the new platform. We’ve also seen documentation that looked usable on the frontend but was nearly impossible to maintain because of a poorly organized folder structure. One client’s team had put most of their documentation in a flat directory with no clear hierarchy, making even basic edits risky.

Here's what to keep in mind:

  • Migrations involve system-wide planning and coordination.
  • Existing documentation needs to be audited like legacy software.
  • Structure, folder hierarchy, and maintenance should be reviewed upfront.
  • Broken specs, missing pages, and invalid links are common during migration.
  • Migration leaders must understand what makes documentation useful and reliable.

How do we scope and prepare for a documentation migration?

The worst kind of migration is the one that starts with copying files before understanding what those files are doing. That’s why we don’t touch any documentation until we’ve scoped the system and written a migration brief. Good preparation is what separates a clean migration from a chaotic one.

This is a sample scope of work.

A sample migration scope of work

Here’s how we scope every project before anything moves.

1. Start with a migration brief.

We begin by outlining what is being migrated, what will stay, and where everything is going. This includes existing documentation types, planned structure, tools being introduced or retired, and who owns each section.

If the client hasn’t defined a destination structure yet, we stop here. A migration without a structure plan is a content dump.

In one of our projects, we worked with Mintlify to create a shared brief that listed known issues, grouped documents into core product areas, and assigned reviewers for each group. That one document guided the entire process and made it easier to track progress and catch blind spots.
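
If it helps to make that tracking concrete, here is a minimal, hypothetical sketch of how a slice of such a brief could be captured in Python so progress can be summarized per product area. The section names, owners, and actions are placeholders, not details from a real project.

    # A hypothetical, machine-readable slice of a migration brief.
    # Section names, owners, and actions below are illustrative placeholders.
    MIGRATION_BRIEF = {
        "payments": {
            "owner": "api-team",
            "reviewer": "docs-lead",
            "pages": {
                "accept-a-payment.md": "rewrite",
                "refunds.md": "reuse",
                "legacy-checkout.md": "deprecate",
            },
        },
        "webhooks": {
            "owner": "platform-team",
            "reviewer": "docs-lead",
            "pages": {
                "webhook-events.md": "reorganize",
                "signing.md": "reuse",
            },
        },
    }

    def summarize(brief):
        """Count pages per action so blind spots show up early."""
        counts = {}
        for area in brief.values():
            for action in area["pages"].values():
                counts[action] = counts.get(action, 0) + 1
        return counts

    if __name__ == "__main__":
        print(summarize(MIGRATION_BRIEF))  # e.g. {'rewrite': 1, 'reuse': 2, ...}

Even a simple structure like this makes it obvious when a product area has no reviewer or when most pages are still marked for rewrite.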

2. Audit the current documentation system.

We then run a full audit of what exists by digging into backend folders, metadata, and version histories. The goal is to identify:

  • What types of content are being used (guides, references, changelogs, etc.)
  • Where duplication exists across the site
  • Which pages are outdated or tied to deprecated features
  • Broken links, orphaned content, and structure gaps

A sample of a documentation audit result

This audit helps us surface problems early and avoid porting issues into the new system.
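
As a rough illustration, a small script like the one below can take a first pass at that audit on a markdown export: it inventories pages, flags files without a title, and surfaces duplicated titles, which often point to duplicated content. It’s a minimal sketch that assumes a local folder of .md/.mdx files, not the full audit we run.

    import os
    from collections import defaultdict

    def audit_docs(root="docs"):
        """Inventory markdown pages, flag missing titles, and group duplicate titles."""
        pages = []
        titles = defaultdict(list)

        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                if not name.endswith((".md", ".mdx")):
                    continue
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8") as f:
                    text = f.read()
                # Treat the first level-1 heading as the page title
                title = next(
                    (line[2:].strip() for line in text.splitlines() if line.startswith("# ")),
                    None,
                )
                pages.append(path)
                if title:
                    titles[title].append(path)
                else:
                    print(f"Missing title: {path}")

        duplicates = {t: paths for t, paths in titles.items() if len(paths) > 1}
        print(f"{len(pages)} pages audited, {len(duplicates)} duplicated titles")
        return duplicates

    if __name__ == "__main__":
        audit_docs("docs")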

3. Map the current structure to the target structure.

Next, we create a side-by-side map of what exists versus what we’re building. This helps prevent folder-dump syndrome, where files are technically moved, but nothing makes sense in context. We color-code each section based on action: reuse, rewrite, deprecate, or reorganize. That way, everyone on the team knows what needs work and what can be moved as is.

Before and after structuring docs folders
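
One lightweight way to produce that side-by-side map is a spreadsheet generated from the current file tree, with an action column the team fills in during review. The sketch below is illustrative: it assumes a local folder of markdown files and writes a starter CSV with placeholder target paths.

    import csv
    import os

    ACTIONS = ("reuse", "rewrite", "deprecate", "reorganize")  # filled in during review

    def build_structure_map(source_root="old_docs", out_file="structure_map.csv"):
        """Write a starter CSV mapping current pages to proposed target pages."""
        with open(out_file, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["current_path", "proposed_target_path", "action", "notes"])
            for dirpath, _, filenames in os.walk(source_root):
                for name in sorted(filenames):
                    if not name.endswith((".md", ".mdx")):
                        continue
                    current = os.path.join(dirpath, name)
                    # Default target mirrors the old path; adjust by hand during review
                    target = current.replace(source_root, "docs", 1)
                    writer.writerow([current, target, "", ""])
        print(f"Wrote {out_file}; fill in the action column ({', '.join(ACTIONS)})")

    if __name__ == "__main__":
        build_structure_map()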

4. Identify custom components and formatting quirks.

Many clients use proprietary components, like custom callouts, tab groups, or nested collapsibles. These don’t always carry over cleanly to the new platform.

We flag every custom component used in the old system and decide whether it needs to be replaced, rewritten, or rebuilt. This also applies to styling, link formats, and anchor behavior.

In some migrations, we’ve had to write conversion scripts to avoid hundreds of manual edits. The earlier we identify these quirks, the more reliable the handoff will be.
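
As an example of the kind of conversion script we mean, the sketch below assumes a hypothetical legacy site that uses ':::note'-style admonitions (the syntax Docusaurus uses) and rewrites them into MDX components such as the <Note> and <Warning> callouts Mintlify provides. The patterns would need to be adapted to whatever the old system actually uses.

    import re

    # Map legacy admonition names to MDX component names (adjust per platform)
    COMPONENT_MAP = {"note": "Note", "warning": "Warning", "tip": "Tip"}

    ADMONITION = re.compile(
        r"^:::(note|warning|tip)\n(.*?)\n:::$",
        re.MULTILINE | re.DOTALL,
    )

    def convert_admonitions(markdown: str) -> str:
        """Rewrite ':::note ... :::' blocks into '<Note>...</Note>' MDX components."""
        def replace(match):
            component = COMPONENT_MAP[match.group(1)]
            body = match.group(2).strip()
            return f"<{component}>\n{body}\n</{component}>"
        return ADMONITION.sub(replace, markdown)

    if __name__ == "__main__":
        sample = ":::warning\nThis endpoint is deprecated.\n:::"
        print(convert_admonitions(sample))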

There are cases where clients didn’t send all their docs, and we had to write custom scripts (like the one below) to scrape them ourselves. This gives us control, especially when content is behind dynamic rendering or stored in irregular containers:

    import os
    import time
    from urllib.parse import urlparse

    import html2text
    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait
    from webdriver_manager.chrome import ChromeDriverManager

    # Placeholder URLs; swap in the pages you need to scrape
    URLS = [
        'https://docs',
        'https://docs'
    ]

    DOMAIN = 'docs.example.com'
    VISITED = set()

    markdown_converter = html2text.HTML2Text()
    markdown_converter.ignore_links = False
    markdown_converter.ignore_images = True
    markdown_converter.body_width = 0

    def setup_driver():
        # Headless Chrome so the scraper can run on a server or in CI
        chrome_options = Options()
        chrome_options.add_argument('--headless')
        chrome_options.add_argument('--no-sandbox')
        chrome_options.add_argument('--disable-dev-shm-usage')
        service = Service(ChromeDriverManager().install())
        return webdriver.Chrome(service=service, options=chrome_options)

    def get_content(url):
        try:
            driver = setup_driver()
            print(f"Loading {url}")
            driver.get(url)

            # Wait for dynamically rendered content before parsing
            wait = WebDriverWait(driver, 10)
            wait.until(EC.presence_of_element_located((By.TAG_NAME, "main")))
            time.sleep(2)

            page_source = driver.page_source
            soup = BeautifulSoup(page_source, 'html.parser')

            # Try the common content containers in order of likelihood
            main = soup.find('main')
            if not main:
                main = soup.find('article')
            if not main:
                main = soup.find('div', class_='docMainContainer_gTbr')
            if not main:
                main = soup.find('div', class_='main-content')

            if not main:
                print(f"Could not find main content container for {url}")
                driver.quit()
                return None

            driver.quit()
            return main
        except Exception as e:
            print(f"Error processing {url}: {e}")
            if 'driver' in locals():
                driver.quit()
            return None

    def save_markdown(url, main_html):
        if not main_html:
            print(f"No content to save for {url}")
            return

        # Mirror the URL path as a folder hierarchy under docs_outputs/
        parsed = urlparse(url)
        path_parts = parsed.path.strip('/').split('/')
        base_dir = 'docs_outputs'
        os.makedirs(base_dir, exist_ok=True)
        current_dir = base_dir

        for part in path_parts[:-1]:
            current_dir = os.path.join(current_dir, part)
            os.makedirs(current_dir, exist_ok=True)

        filename = path_parts[-1] + '.md'
        filepath = os.path.join(current_dir, filename)

        try:
            markdown = markdown_converter.handle(str(main_html))
            with open(filepath, 'w', encoding='utf-8') as f:
                f.write(markdown)
            print(f"Saved {filepath}")
        except Exception as e:
            print(f"Failed to save {filepath}: {e}")

    def process_url(url):
        if url in VISITED:
            return
        print(f"Processing {url}")
        VISITED.add(url)
        main = get_content(url)
        if main:
            save_markdown(url, main)

    if __name__ == '__main__':
        for url in URLS:
            process_url(url)

Alternatively, we adopt the Mintlify scraping script when the docs are publicly accessible and follow a clean structure. This is faster to run and easier to plug into the Mintlify project.

Use custom scripts when the export is incomplete or the site is dynamic. Use Mintlify’s script when you need speed and structure that already aligns with expectations.

5. Assign owners for each section.

Finally, every section gets an owner and a technical reviewer. These are individuals who are well-versed in the product area and can identify what doesn’t make sense. Without this layer of review, it's easy to migrate docs that look polished but don’t help the reader.

Mapping existing documentation: methods and judgment

Before a migration begins, we need to know exactly what we’re migrating. That sounds obvious, but most teams don’t have a complete view of their documentation. Some assume their exported files or scraped content are enough. They’re not.

Scraping alone doesn’t reveal the most important pages. It doesn’t surface broken images, incorrect parameters, or poorly rendered components. Moreover, it doesn’t help you decide what belongs in the new system. That takes experience.

In every project, we treat this as a discovery phase. Some pages live on odd URLs, others hide behind subdomains, and many endpoints don’t have public-facing docs yet.

So we crawl the system ourselves. Most of the time, we write Python scripts (like the example below) that scan the sitemap, count paths, extract endpoints from OpenAPI specifications, and flag unusual directories. This provides us with a working inventory of what exists (such as pages, assets, dead links, and hidden content), allowing us to track what we’re dealing with.

    import requests
    from urllib.parse import urljoin
    from bs4 import BeautifulSoup

    def crawl_docs(base_url, max_pages=50):
        visited = set()
        to_visit = {base_url}
        doc_links = []

        while to_visit and len(visited) < max_pages:
            url = to_visit.pop()
            visited.add(url)  # mark up front so failed URLs aren't re-queued
            try:
                response = requests.get(url, timeout=5)
                if response.status_code != 200:
                    print(f"⚠️ Broken: {url} (HTTP {response.status_code})")
                    continue

                soup = BeautifulSoup(response.text, 'html.parser')
                links = [a.get('href') for a in soup.find_all('a') if a.get('href')]

                for link in links:
                    absolute_link = urljoin(base_url, link)
                    if absolute_link.startswith(base_url) and absolute_link not in visited:
                        if "/docs/" in absolute_link:  # or other criteria
                            doc_links.append(absolute_link)
                        to_visit.add(absolute_link)
            except Exception as e:
                print(f"🚨 Failed to crawl {url}: {str(e)}")

        return doc_links

Note: This is a minimal example. Production crawlers should handle authentication, rate limits, and export structured data (e.g., CSV with page metadata).

Some teams have content split across CMSs (like WordPress), markdown files, and HTML fragments from old teams. Each format behaves differently and breaks in different ways.

We treat each source as its own record. We scan them individually, resolve overlaps, and unify the structure. Sometimes we rewrite entire sections. At other times, we stitch pieces together, refine the formatting, and reevaluate the navigation.

What you should know:

  • Scrapes and exports only give you part of the picture. Always run your own crawls.
  • Build basic tools to map pages and spot gaps.
  • Don’t skip manual review. Accuracy and quality depend on it.
  • Organize content in a way that mirrors site navigation for long-term maintenance.
  • This stage involves understanding what exists, what remains usable, and what doesn’t belong.

Not everything will survive migration. Outdated or invalid OpenAPI specs are common; a quick validation sketch follows the list below. When we encounter them, we:

  • Request a corrected version from the client, or
  • Exclude broken sections (with warnings), or
  • Reverse-engineer them from live API behavior.
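
When a quick first pass is enough to tell which route to take, a structural check like the sketch below helps. It only verifies that an OpenAPI 3.x document parses and has the expected top-level sections; a dedicated validator should still be used before publishing.

    import yaml  # pip install pyyaml

    REQUIRED_KEYS = {"openapi", "info", "paths"}  # minimum shape for an OpenAPI 3.x file

    def quick_check(spec_path):
        """Parse the spec and report obviously missing top-level sections."""
        try:
            with open(spec_path, encoding="utf-8") as f:
                spec = yaml.safe_load(f)
        except yaml.YAMLError as e:
            return [f"Spec is not valid YAML/JSON: {e}"]

        if not isinstance(spec, dict):
            return ["Spec did not parse to a mapping"]

        problems = [f"Missing top-level key: {key}" for key in REQUIRED_KEYS - spec.keys()]
        for path, operations in (spec.get("paths") or {}).items():
            if not operations:
                problems.append(f"Path with no operations: {path}")
        return problems

    if __name__ == "__main__":
        for issue in quick_check("openapi.yaml") or ["No structural issues found"]:
            print(issue)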

Some docs rely on components that don’t have a Mintlify equivalent. In those cases, we create custom MDX components to replicate the intended structure or interaction. This usually happens with tabs, collapsible sections, or interactive code blocks.

For broken links, we run Mintlify’s built-in link checker to catch them before handoff. If a link points to a deleted or unreachable resource, we flag it, then either remove it, suggest a replacement, or ask the client whether it should be restored.

If the page throws a 404, we strip the link and document the change in our feedback log.
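
For teams not using Mintlify’s checker, the same idea can be approximated with a short script. The sketch below scans migrated markdown files for external links and reports any that return errors; it is illustrative only and ignores anchors, redirects, and rate limiting.

    import os
    import re

    import requests

    LINK_PATTERN = re.compile(r"\[[^\]]*\]\((https?://[^)\s]+)\)")

    def check_links(root="docs"):
        """Report external markdown links that return HTTP errors."""
        broken = []
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                if not name.endswith((".md", ".mdx")):
                    continue
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8") as f:
                    text = f.read()
                for url in LINK_PATTERN.findall(text):
                    try:
                        status = requests.head(url, allow_redirects=True, timeout=5).status_code
                    except requests.RequestException:
                        status = None
                    if status is None or status >= 400:
                        broken.append((path, url, status))
                        print(f"Broken link in {path}: {url} ({status})")
        return broken

    if __name__ == "__main__":
        check_links("docs")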

Step-by-step documentation migration workflow

Once structure and scope are finalized, we move into execution. This phase is predictable and repeatable, but it’s also where many migrations go off track if rushed or poorly scoped. Our workflow is designed to reduce rework and surface edge cases early.

This is a summary of how we triage documentation during migration:

How we triage documentation during the migration process.

Here’s how we do it:

1. Create the migration brief.

We start by writing a detailed brief. It includes the project goals, current export, target structure, branding requirements, edge cases (like media hosting or component exceptions), and timeline. We note whether to retain existing paths, and how the docs should be deployed and handed off.

This is where we also flag if the client expects to keep the docs live during migration. If so, we advise them to freeze deployments to prevent drift.

2. Set up GitHub and documentation environments.

We provision a clean GitHub repo and a new Mintlify project. If the client already has a repo, we either fork or duplicate it with a clear environment for staging. This setup becomes the ground truth for the entire migration. It’s where all edits, reviews, and handoffs happen.

3. Apply styles, fonts, and branding.

Before moving any content, we replicate the visual identity, including colors, fonts, spacing, logos, and link styles. We also configure metadata, like favicons, social previews, and canonical URLs. This step prevents design rework at the end and gives stakeholders a near-final preview early on.

4. Move the content manually.

We never bulk-import without checking. Every page is migrated by hand. This includes applying formatting rules, rewriting component markup, fixing broken links, standardizing metadata, and flagging any outdated content.

Sometimes we discover that the exported docs are incomplete. In such cases, we use scripts to crawl the live site and patch any missing content before proceeding.

5. Review every page manually.

Each page undergoes a QA review by a senior technical writer and reviewer. We verify structure, clarity, link accuracy, and Markdown/MDX formatting. We rely more on deep reading and experience here than on linters or automation.

6. Handle client feedback quickly.

Feedback is usually minor, typically typos, missed internal links, and small copy changes. We log every edit, triage it immediately, and either implement or clarify the changes. If a decision was made during migration (e.g., splitting a long page), we explain the reasoning and leave it reversible.

7. Transfer project ownership.

Once approved, we either export the project or transfer GitHub ownership. We also ensure that the client’s team understands how the environment is structured, including folder paths, the deployment workflow, how to update pages, and how to handle new additions.

What defines a successful documentation migration?

A migration is only complete when the new documentation is accurate, structured, visually clean, and easy to maintain.

Here’s what defines success for us:

  • The content has been fully migrated and is now live. All pages have been moved, broken links have been fixed, components are working, and nothing is left behind.
  • The new docs are fast to navigate. We look at how quickly someone can find an answer. That’s our internal metric.
  • The docs look good and feel cohesive. Fonts, colors, spacing, and layout all match the client’s brand. But beyond visuals, the structure is clear and consistent.
  • Clients are ready to use and maintain the system. We deliver the working environment, explain how it’s structured, and make sure the team knows how to update their docs going forward.

Common mistakes teams make when migrating documentation

We’ve seen teams run into the same traps during documentation migrations. Here are the most common mistakes and how we’ve learned to avoid them.

Editing production docs during migration

Some teams continue to publish new docs while the migration is underway. This creates a version mismatch and forces us to redo pieces that have already been migrated. We always request a freeze before we begin. Where a total freeze isn’t possible, we freeze the docs in chunks and reconcile any changes in the new version before final delivery.

Ignoring hierarchy and UX

A structure that makes sense to the product team may confuse developers. We’ve had to pause migrations when the client’s proposed structure led to dead ends or buried key guides. We now review and, when needed, restructure early.

Leaving technical writers out of structure decisions

It’s tempting to treat migration as a copy-paste job. But without experienced technical writers involved, issues like redundancy, broken flows, or improper page grouping often go unnoticed.

Relying too much on automation

Automated tools can extract content, but they can’t fix broken formatting, nested components, or outdated specs. We manually validate every page and line to make sure the result holds up in production.

What's next?

We’ve shared how we scope, plan, and execute migrations. You’ve seen how we handle edge cases, such as broken specifications and missing components, and how we make informed decisions about what to rewrite, retire, or reorganize.

But if there’s one thing we’ve learned from every project, it’s that migrations succeed because of process. So if you're planning a documentation migration:

  • Start by understanding what your documentation does, where it breaks, and what it should become.
  • Build a structure that mirrors how users think.
  • Treat maintenance as part of the design. Future contributors should know where things live and why.

If you want to avoid the headache of an unplanned migration, we’re here to help. We’ve done this for teams migrating from Docusaurus, ReadMe, WordPress, custom CMS setups, and markdown archives, each with its quirks and blockers.

We partner with Mintlify to help teams ship documentation that is not only beautiful but usable, scalable, and structured for both humans and AI systems.

If you’re ready to scope your migration, reach out. Or use this guide as your checklist and start mapping what your documentation needs to become.


About the author

Henry Bassey spearheads Content Strategy and Marketing Operations at Hackmamba. He holds an MBA from the prestigious Quantic School of Business and Technology and has a solid technical background. A strong advocate for innovation and thought leadership, he brings that commitment to every piece of content he handles for clients at Hackmamba.
