John Siciliano Published 3/3/2024 Updated 4/9/2024 Some of my content contains affiliate links.

Creating a Hygraph Migration: Gotchas and Helpful Tips

Everything I wish I knew before creating my Hygraph migration scripts. Hope this helps you migrate your data to Hygraph!

Introduction

I migrated my 8 collections and 1,138 records from Webflow to Hygraph.

I'm going to share the gotchas (and how I got around them), tips, and other insights that can help you create and run your Hygraph migration.

Offload This To Me

Want your data migrated to your project 10x faster?

I would love to migrate your data into Hygraph.

Hire me.

HTML To Slate AST

Let's dive into the "gotcha of gotchas" I found: converting HTML to Slate AST, the format Hygraph expects for rich text.

Hygraph uses a special flavor of this format, so you must use their HTML to Slate AST converter.

Images to Links

In the converter, they look for the incoming element type. Then they conditionally convert the element based on whether it's an image, h2, link, etc.

In the case of images, they do the unexpected: convert the images to links.

Without any workarounds, your rich text will arrive with links pointing to the images instead of displaying them.

The code that handles this is in the converter's index.ts file and looks like this:

IMG: el => {
    const href = el.getAttribute('src');
    const title = Boolean(el.getAttribute('alt'))
        ? el.getAttribute('alt')
        : Boolean(el.getAttribute('title'))
            ? el.getAttribute('title')
            : '(Image)';
    if (href === null) return {};
    return {
        type: 'link',
        href: sanitizeUrl(href),
        title,
        openInNewTab: true,
    };
}

Explanation: "When I see an image tag, get the URL to the image and alt text (if it exists) and return a link with that information".

This needs a workaround. So here's how I did it:

Convert HTML to Slate AST
Loop over every node in the AST
Check whether each node is a link and whether that link points to one of your images. I created a function to check if the URL matched my image CDN. If it didn't match, the link was just a regular content link.
Replace each image link with a Slate AST image object (I reverse-engineered its shape). Note that you need to upload the image first and get its ID back for use in the image object, because each image has a corresponding asset record that the AST references.

I created a lot of code to combat this issue, but I'll highlight the important part.

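// The Slate converter turns images into links, so we need to convert them back.
// Links can be nested inside block nodes (paragraphs, list items), so we recurse into children.
convertLinksToImagesInSlateAst = (slateAst: any[]): any[] => {
    return slateAst.map((node) => {
        if (node.type === 'link' && this.isImageLink(node)) {
            // Swap the link for an asset embed that references the uploaded image.
            return {
                type: 'embed',
                nodeType: 'Asset',
                nodeId: 'add node id here', // the ID returned when the image was uploaded
                children: [{
                    text: ''
                }],
            };
        }

        if (Array.isArray(node.children)) {
            return { ...node, children: this.convertLinksToImagesInSlateAst(node.children) };
        }

        return node;
    });
}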
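The snippet above relies on an isImageLink check. Here is a minimal sketch of what such a check could look like, assuming all source images are served from a single CDN host. The host name `cdn.example.com` and the `LinkNode` shape are placeholders for illustration, not values from my actual script.

```typescript
// Hypothetical CDN host; replace it with the host your source images are served from.
const IMAGE_CDN_HOST = "cdn.example.com";

// Minimal shape of the nodes we care about (assumed for this sketch).
interface LinkNode {
  type: string;
  href?: string;
}

// Returns true when a link node's href points at the image CDN.
function isImageLink(node: LinkNode, cdnHost: string = IMAGE_CDN_HOST): boolean {
  if (node.type !== "link" || !node.href) return false;
  try {
    return new URL(node.href).hostname === cdnHost;
  } catch {
    // Relative or malformed URLs are treated as regular content links.
    return false;
  }
}
```

Comparing the parsed hostname rather than doing a substring match avoids false positives when a regular link merely contains the CDN name somewhere in its path.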

Figures

I migrated my content from Webflow, which makes use of figures and figure captions.

I'm a fan of putting a little caption below my images to add some context.

Unfortunately, figures are not handled in the conversion. Therefore, I had to use some workarounds to find the image captions and load them into my custom image caption field defined in my asset model schema.

There are no custom hooks that let you modify how the conversion process goes down, so I found it best to modify the AST manually after the conversion is done. If there is any information that gets stripped out, then it's best to "save" that information to a field that doesn't get stripped, like alt text.

I used the alt text field for this: before the conversion, I stored a JSON.stringify()'d object (including the caption) in the image's alt text, then parsed it back out afterward for later processing.
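To illustrate the idea, here is a minimal sketch of packing and unpacking that metadata. The `ImageMeta` shape and both function names are made up for this example. It assumes the packed string ends up in the link node's title after conversion, since the converter copies alt text into title.

```typescript
// Hypothetical shape of the metadata stashed in the alt attribute.
interface ImageMeta {
  alt: string;
  caption: string;
}

// Before conversion: serialize the alt text and caption into one string.
function packImageMeta(meta: ImageMeta): string {
  return JSON.stringify(meta);
}

// After conversion: recover the metadata from the link node's title.
// Falls back to treating the title as plain alt text when it is not our JSON.
function unpackImageMeta(title: string): ImageMeta {
  try {
    const parsed = JSON.parse(title);
    if (
      parsed !== null &&
      typeof parsed === "object" &&
      typeof parsed.alt === "string" &&
      typeof parsed.caption === "string"
    ) {
      return { alt: parsed.alt, caption: parsed.caption };
    }
  } catch {
    // Not JSON: fall through to the plain-text fallback.
  }
  return { alt: title, caption: "" };
}
```

If you write the packed string into an HTML attribute yourself, remember that its double quotes need to be escaped or set through a DOM API, otherwise the attribute will be cut off at the first quote.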

Note: It was important in my script to separate the migration process into three distinct parts: extract, transform, load. Without this requirement, it may be easier to load the caption directly. But I built the script to be robust and handle different scenarios... like importing your data. Hire me.

Create or Upsert

Creating migrations is an iterative process.

Therefore, you're going to import the same record over and over again.

When you use the Create mutation, it'll throw an error when you try importing the same record twice (assuming you have a field that enforces a unique value, which you likely will).

You could use the Upsert mutation which will update it if it exists and insert/create it if it doesn't.

I, however, did not use this because I wasn't a fan of the DX surrounding it. It seems from the Upsert docs that you basically have to send in both the create and the update data. I would prefer to send one set of data with an upsert command and have Hygraph handle the logic of updating or inserting.

Instead, I believe I disabled the unique field validation to get around this, just for testing purposes.

Disabled set field as unique for migrations
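For reference, here is a rough sketch of what the upsert route could look like, to show why it felt like double work. The model (`Post`), its fields, and the generated input type names (`PostCreateInput`, `PostUpdateInput`) are assumptions for illustration. Check your own project's API playground for the exact names.

```typescript
// Hypothetical record shape for a "Post" model with a unique slug field.
interface PostData {
  title: string;
  slug: string;
}

// Assumed mutation shape: the same data has to be supplied for both branches.
const UPSERT_POST = `
  mutation UpsertPost($slug: String!, $create: PostCreateInput!, $update: PostUpdateInput!) {
    upsertPost(where: { slug: $slug }, upsert: { create: $create, update: $update }) {
      id
    }
  }
`;

// Builds the variables for one record, duplicating the data into create and update.
function buildUpsertVariables(post: PostData) {
  return { slug: post.slug, create: post, update: post };
}
```

A small helper like buildUpsertVariables hides the duplication, which takes most of the sting out of the DX complaint if you do decide to upsert.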

Enumeration Fields Are Alphanumeric

I wanted to use enum fields for things like blog categories but found my categories would have to be alphanumeric (no spaces, no dashes, etc).

So reference fields are the place to create your categories and tags.

Plus references will let you store more information than just the category name such as a description.
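If you want to check ahead of time which of your source values could work as enum values, here is a small sketch based on the GraphQL spec's naming rule (letters, digits, and underscores, not starting with a digit). Hygraph may enforce stricter rules on top of this, so treat it as a first filter only. The function name is made up.

```typescript
// GraphQL names must match this pattern; enum values additionally
// cannot be the reserved words true, false, or null.
const GRAPHQL_NAME = /^[_A-Za-z][_0-9A-Za-z]*$/;
const RESERVED_ENUM_VALUES = ["true", "false", "null"];

// Returns true when the value could be used as a GraphQL enum value.
function isValidEnumValue(value: string): boolean {
  return GRAPHQL_NAME.test(value) && RESERVED_ENUM_VALUES.indexOf(value) === -1;
}
```

Running your category names through a check like this quickly shows whether enums are even an option for your data.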

API Rate Limiting

As with any platform, you can't send all your requests at once. You must throttle them to stay under the limits.

Here are their rate limits, word for word:

Cached requests are not rate limited. Uncached requests that fetch content from the database are rate limited on the different tiers by default, as follows:

Community: 5req/sec
Professional: 25req/sec
Scale: 35req/sec
Enterprise: >50req/sec
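Here is a minimal throttling sketch that stays under limits like the ones above by running requests one at a time with a minimum spacing between them. The function names are made up, and it is deliberately simple (no retries, no parallelism), so treat it as a starting point.

```typescript
// Minimum spacing between request starts so we stay at or below `limitPerSecond`.
function minIntervalMs(limitPerSecond: number): number {
  if (limitPerSecond <= 0) throw new Error("limitPerSecond must be positive");
  return Math.ceil(1000 / limitPerSecond);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs tasks sequentially, waiting out the rest of the interval after each one.
async function runThrottled<T>(
  tasks: Array<() => Promise<T>>,
  limitPerSecond: number
): Promise<T[]> {
  const interval = minIntervalMs(limitPerSecond);
  const results: T[] = [];
  for (const task of tasks) {
    const started = Date.now();
    results.push(await task());
    const elapsed = Date.now() - started;
    if (elapsed < interval) await sleep(interval - elapsed);
  }
  return results;
}
```

On the Community tier you would call something like runThrottled(tasks, 5), where each task is a function that sends one mutation. Sequential execution is slower than the limit strictly allows, but it keeps failures easy to trace to a single record.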

UI Doesn't Always Update

"Why isn't my data showing up?!?!?" - Me

Sometimes, after running a migration and refreshing the dashboard, no new data would show.

I would try:

Refreshing the page
Using the reload content button
And banging my head against a wall

What works is searching for anything. Right when you search, some cache clear request runs, and your data will appear.

So just search for something like "a", wait for the search to run, then backspace, and all the new data will show.

Refreshing the data in Hygraph by searching

Mutations With RichText Must Use Vars

GraphQL insert/create requests can have a key and value. For example: name: "John".

Additionally, you can use variables. For example: name: $name, where you pass the name in as a variable and declare its type, like $name: String! (the exclamation mark means it's required).

In many of my mutations/requests, I would just use template literals like this: name: `${name}`. In other words, I wouldn't pass variables in as parameters; I'd add the data directly to the query string.

However, with rich text, I would get errors that didn't really make sense. They were resolved by passing the content as a variable and declaring its type as RichTextAST!.
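Here is a sketch of what passing rich text as a variable could look like. The model and field names (`createPost`, `title`, `body`) are hypothetical, and wrapping the converted nodes in a `children` property is my assumption about the expected shape, so verify it against your own schema.

```typescript
// Hypothetical mutation for a Post model with a rich text field called `body`.
const CREATE_POST = `
  mutation CreatePost($title: String!, $body: RichTextAST!) {
    createPost(data: { title: $title, body: $body }) {
      id
    }
  }
`;

// Builds the JSON request body. The rich text travels as a variable,
// so it never has to be escaped into the query string.
function buildCreatePostRequest(title: string, slateNodes: unknown[]): string {
  return JSON.stringify({
    query: CREATE_POST,
    variables: { title, body: { children: slateNodes } },
  });
}
```

The returned string is what you would send as the body of a POST request to your content API endpoint. A likely reason variables behave better here is that a nested AST full of quotes and braces is very easy to break when interpolated into a query by hand.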

Mutations With Multiple References Use Special Type

Another gotcha with mutations is passing in multi-reference values. (Honestly, I'm not 100% sure whether this applies only to multi-reference fields or to single-reference fields as well, but I got in the habit of doing the following for any references in my models.)

You must pass in the value as a variable, and more specifically, the variable needs to be declared with a very specific type that looks like this:

[MyReferencedModelIdWhereUniqueInput!]

Change out "MyReferencedModelId" with... the model ID of your reference, and boom, goodbye errors that are difficult to understand.
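As a sketch, the type name can be built from the referenced model's API ID and dropped into the mutation. The `Post` and `Category` models, the `categories` field, and the `connect` syntax shown are assumptions for illustration, so confirm them in your project's API playground.

```typescript
// Builds the variable type for a multi-reference field from the referenced model's API ID.
function whereUniqueListType(modelApiId: string): string {
  return `[${modelApiId}WhereUniqueInput!]`;
}

// Hypothetical mutation: a Post with a multi-reference `categories` field.
function buildCreatePostMutation(referencedModelApiId: string): string {
  const referencesType = whereUniqueListType(referencedModelApiId);
  return `
    mutation CreatePost($title: String!, $categories: ${referencesType}) {
      createPost(data: { title: $title, categories: { connect: $categories } }) {
        id
      }
    }
  `;
}
```

Calling buildCreatePostMutation("Category") declares $categories as [CategoryWhereUniqueInput!], and the matching variable value is an array of objects that each identify one referenced record.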

References Can Use Any Field (Not Just ID)

Here's an awesome trait of the platform: when referencing another record, you don't need to use the ID. You can use any field that enforces unique values.

This makes creating the connection a breeze.

In my Webflow migration, all the references in the source data used IDs. If Hygraph didn't support references to other fields, such as my slug field, I'd have to import the random IDs from Webflow. No thanks.
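Here is a small sketch of that translation step, turning source IDs into slug-based references during the transform phase. The `SourceCategory` shape and the function names are hypothetical, and it assumes the slug field is marked unique in Hygraph.

```typescript
// Hypothetical shape of a category record exported from the source CMS.
interface SourceCategory {
  id: string;
  slug: string;
}

// Maps each source ID to its slug so references can be rewritten.
function buildSlugLookup(records: SourceCategory[]): Record<string, string> {
  const lookup: Record<string, string> = {};
  for (const record of records) lookup[record.id] = record.slug;
  return lookup;
}

// Converts a list of source IDs into slug-based reference objects.
// Throws on unknown IDs so broken references surface before loading.
function toSlugReferences(
  sourceIds: string[],
  lookup: Record<string, string>
): Array<{ slug: string }> {
  return sourceIds.map((id) => {
    const slug = lookup[id];
    if (slug === undefined) throw new Error(`No slug found for source ID ${id}`);
    return { slug };
  });
}
```

The resulting array can be passed as the variable for a multi-reference field, so the random source IDs never need to be stored in Hygraph.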

Published At, Created At, Updated At Date/Times

I was bummed to discover that you apparently can't set the values of the system date fields in the schema, such as:

Published At (#publishedAt)
Created At (#createdAt)
Updated At (#updatedAt)

When migrating content where dates matter, like blogs, it's critical to keep the dates from the source.

Unfortunately, it appears you have to create custom date/time fields (I used a component). But with any custom date fields, you're stuck updating them manually, unless there is a feature that lets you set triggers and actions, such as "when I update this content, set my custom updated field to the current date."

Querying Content

"Why isn't my content showing up in my queries?!?" - Me

I occasionally would run queries to see how my content was stored.

When running a query, you must specify the stage the content is in. My new content was loaded as a draft, therefore to see the data I'd need to run:

{
  myProduct(
    where: { id: "ckdt47uio02al01044grc4ehf" }
    stage: DRAFT
  ) {
    id
  }
}

The content will fail to show in the response unless you specify that you are looking for draft content!

Alternatively, if you publish the content, change the stage in your query to PUBLISHED.

Follow Hygraph's Own Migration Guide

When creating a new migration, I highly recommend checking out Hygraph's guide. This article supplements it, but their guide is a step-by-step walkthrough.

Their high-level process is this:

Understand your current data model (data, schema, relations)
Understand Hygraph's capabilities and create new models and schemas, potentially altering your existing schemas to take advantage of the capabilities of Hygraph. You can use the Management SDK and UI to accomplish this.
Create a migration that extracts, transforms, and loads your data into Hygraph. To load data you'll need to use the Asset Upload API and the Content API.

Happy migrating! And hire me to do your migration :)