John Siciliano Published 4/1/2024 Updated 4/9/2024 Some of my content contains affiliate links.

Contentful Data Import: Gotchas, Best Practices, Helpful Tips

Everything I wish I knew before creating my Contentful migration scripts. Hope this helps you migrate your data to Contentful!

Introduction

Trying to migrate data from the CMS you outgrew to Contentful?

I migrated my content from Webflow to Contentful and documented the tricky areas.

I'll share the gotchas (and how I got around them), tips, and other insights to help you build and run your own Contentful migration.

Let Your Engineers Focus on Engineering

Hire me to migrate your data to Contentful.

Your data gets into Contentful sooner, so your application launches sooner.

Your engineers don't need to learn a one-time task that diverts their time from getting your new app launch-ready.

Nomenclature: Migration Is Not the Same as Import

When you're Googling and ChatGPTing, know there is a difference between migration and import.

Contentful Migration – Modifying existing Contentful data into a new structure (such as combining two fields).

Contentful Import Data – The process of adding data from an external source.

That said, I'll use "migration" for the whole process of extracting data from the source, transforming it, and loading it into Contentful. Technically, Contentful only sees the import step, which is why its docs call it that.

Two Ways to Import Data

JSON File

Format your import-ready data into a JSON file
Use the Contentful CLI to import the JSON file

I did not use this method. I like using API requests to handle my end-to-end migration. This means I retrieve the data from the source and load it in one go with APIs.

Exporting to a JSON file adds another step and doesn't fit well into my workflow.
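If you do go the JSON-file route, the file follows the same shape as a Contentful space export. Here's a minimal sketch of building one. It assumes a hypothetical `blogPost` content type with `title` and `slug` fields and an `en-US` locale, so compare the output against your own export before relying on it:

```typescript
// Sketch: build a JSON file shaped like a Contentful space export.
// The content type ID ("blogPost"), field IDs, and locale are assumptions.
interface SourcePost {
  id: string;
  title: string;
  slug: string;
}

interface ImportEntry {
  sys: {
    id: string;
    contentType: { sys: { type: 'Link'; linkType: 'ContentType'; id: string } };
  };
  fields: Record<string, Record<string, unknown>>;
}

const LOCALE = 'en-US';

function toImportEntry(post: SourcePost): ImportEntry {
  return {
    sys: {
      // Reusing the source ID keeps re-imports pointing at the same entry.
      id: post.id,
      contentType: { sys: { type: 'Link', linkType: 'ContentType', id: 'blogPost' } },
    },
    // Every field value sits under a locale key.
    fields: {
      title: { [LOCALE]: post.title },
      slug: { [LOCALE]: post.slug },
    },
  };
}

function buildImportFile(posts: SourcePost[]): string {
  return JSON.stringify({ entries: posts.map(toImportEntry) }, null, 2);
}

console.log(buildImportFile([{ id: 'post-1', title: 'Hello', slug: 'hello' }]));
```

You would write that string to a file and hand it to the CLI (for example `contentful space import --space-id <id> --content-file data.json`).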

Contentful Management API

Use the Content Management API, which is built for modifying content (as opposed to the Content Delivery API, which is high-performance read-only).
Use a Contentful Management API token
Call the API using a client library, sending it each entry or asset ready to go

I found it helpful to export a Contentful Space with the Contentful CLI export tool. The exported JSON shows exactly how to format your API requests or JSON files (depending on your import method). Export a few entries and use them as a reference so the transformations you apply to incoming data line up with the destination content models and the JSON structure the destination Space expects.
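One way to use that exported sample is to diff its field/locale keys against what your transform produces. A minimal sketch, assuming both sides use the localized `fields` shape from the export; keep in mind that optional fields left empty in the sample won't appear in it:

```typescript
// Sketch: compare a transformed entry's fields with a sample exported entry.
type LocalizedFields = Record<string, Record<string, unknown>>;

// Flatten { title: { 'en-US': 'Hi' } } into a set like {'title.en-US'}.
function fieldPaths(fields: LocalizedFields): Set<string> {
  const paths = new Set<string>();
  for (const [fieldId, byLocale] of Object.entries(fields)) {
    for (const locale of Object.keys(byLocale)) {
      paths.add(`${fieldId}.${locale}`);
    }
  }
  return paths;
}

function compareFieldPaths(sample: LocalizedFields, transformed: LocalizedFields) {
  const expected = fieldPaths(sample);
  const actual = fieldPaths(transformed);
  return {
    // Present in the exported sample but missing from your transform.
    missing: Array.from(expected).filter((p) => !actual.has(p)),
    // Produced by your transform but unknown to the sample (often a typo'd field ID).
    unexpected: Array.from(actual).filter((p) => !expected.has(p)),
  };
}
```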

Can't Directly Call API After Creating Client

With most API clients, you can start calling the API as soon as you create the client.

Here, however, the only parameter you pass when creating the client is the Contentful Management API token.

Once you create the client, you need to make two more calls:

Get the Contentful Space – Unfortunately, it appears you can't pass the Contentful Space ID when creating the client, so you have to pass it to the `client.getSpace()` method to get an instance of the Contentful Space.
Get the Environment – Once you obtain the Contentful Space, you must then get the environment within the target Space.

Then, you use the environment instance to add entries to your Contentful Space.
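Here's a sketch of that client → space → environment chain. With the real SDK, `client` would come from `createClient({ accessToken })` in the contentful-management package. The small interfaces below are stand-ins that model only the calls used here, so the snippet runs on its own. Recent SDK versions also document a "plain" client that accepts default space and environment IDs up front, so check the contentful-management README for your version.

```typescript
// Minimal stand-ins for the SDK objects (assumption: only these calls are needed).
interface EnvironmentLike {
  createEntry(contentTypeId: string, data: { fields: Record<string, unknown> }): Promise<unknown>;
}
interface SpaceLike {
  getEnvironment(environmentId: string): Promise<EnvironmentLike>;
}
interface ClientLike {
  getSpace(spaceId: string): Promise<SpaceLike>;
}

async function getTargetEnvironment(
  client: ClientLike,
  spaceId: string,
  environmentId = 'master', // Contentful's default environment name
): Promise<EnvironmentLike> {
  // Extra call 1: the space ID can't go into createClient, so fetch the space.
  const space = await client.getSpace(spaceId);
  // Extra call 2: entries live in an environment, so fetch that next.
  return space.getEnvironment(environmentId);
}
```

From there, each entry goes through something like `environment.createEntry(contentTypeId, { fields })`.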

No Create Or Update If Exists?

No migration will be perfect the first time. It's an iterative process.

This means you'll import the same record over and over.

But importing content with an ID that already exists throws an error.

This is where a "Create or Update" function/method would be great, i.e., create it if it doesn't exist and update it if it does.

Apparently, the .NET SDK has a `createOrUpdateEntry` method, but I couldn't find an equivalent in the docs for the other SDKs.
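Without that method, you can hand-roll one. The sketch below mirrors the JavaScript SDK calls `getEntry`, `createEntryWithId`, and `entry.update()`, but declares the shapes locally so it runs without the package. It assumes a missing entry surfaces as an `Error` whose `name` is `"NotFound"`, so verify that against your SDK version:

```typescript
type Fields = Record<string, Record<string, unknown>>;

interface EntryLike {
  fields: Fields;
  update(): Promise<EntryLike>;
}

interface EnvironmentLike {
  getEntry(id: string): Promise<EntryLike>;
  createEntryWithId(contentTypeId: string, id: string, data: { fields: Fields }): Promise<EntryLike>;
}

async function createOrUpdateEntry(
  environment: EnvironmentLike,
  contentTypeId: string,
  id: string,
  fields: Fields,
): Promise<EntryLike> {
  let existing: EntryLike;
  try {
    existing = await environment.getEntry(id);
  } catch (error) {
    // Assumed NotFound signal: create the entry with our stable ID.
    if (error instanceof Error && error.name === 'NotFound') {
      return environment.createEntryWithId(contentTypeId, id, { fields });
    }
    throw error; // Anything else (auth, rate limits) is a real failure.
  }
  // The entry exists: overwrite its fields and save (this does not publish).
  existing.fields = fields;
  return existing.update();
}
```

Using stable IDs from the source system is what makes re-running the import safe.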

The missing method points to another problem...

API Docs Aren't Easy to Read/Not Up To Date

I have qualms with the Contentful API docs.

Qualm 1: The doc pages don't show you any code. You must click on a method to see its code in a flyout.
Qualm 2: The UX of the flyout is poor. The code is contained in a small box with an overflow scroll. Finding the request and response examples is annoying, as scrolling is difficult when scrollable boxes are nested in other scrollable boxes.
Qualm 3: By putting the code in hidden flyouts, it's not easy to search the page.
Qualm 4: I use JavaScript, and the docs show the older CommonJS approach rather than ESM, even though ESM works fine in my scripts. That suggests the docs are out of date.

I suppose I need to finish with a suggestion and not just complaints...

Take the docs with a grain of salt. What you're looking for may exist but be hidden in a flyout, or it may not be documented at all.

Default Asset/Media Model Can't Be Extended

Bummer!

However, the default fields include title and description, which can act as alt text and captions.

Default and only Contentful media fields

If you need to add fields to media, create a new content model with your extra fields.
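A sketch of what that looks like on import: an entry of your own content type that links to the uploaded asset and carries the extra fields. The content type (`mediaItem`), its field IDs (`asset`, `credit`), and the `en-US` locale are hypothetical:

```typescript
// Link object that points a reference field at an existing asset.
interface AssetLink {
  sys: { type: 'Link'; linkType: 'Asset'; id: string };
}

function assetLink(assetId: string): AssetLink {
  return { sys: { type: 'Link', linkType: 'Asset', id: assetId } };
}

// Fields for a hypothetical "mediaItem" content type wrapping an asset.
function toMediaItemFields(assetId: string, credit: string, locale = 'en-US') {
  return {
    // Reference field pointing at the uploaded asset.
    asset: { [locale]: assetLink(assetId) },
    // Extra metadata the default asset model has no room for.
    credit: { [locale]: credit },
  };
}
```

You would then pass the result as `fields` when creating the entry, e.g. `environment.createEntry('mediaItem', { fields })`.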

Content Type Must Be Sent In Asset Uploads

When uploading an asset, you can't just provide the URL to the image. You must also specify the content type. I find this an odd step, as Contentful should be able to infer the asset type.

Here's what I do to get the content type:

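private getAssetContentType = async (url: string): Promise<string> => {
    try {
        // A HEAD request returns only the headers, so the file isn't downloaded.
        const response = await fetch(url, { method: 'HEAD' });
        if (!response.ok) {
            throw new Error(`HEAD ${url} failed with status ${response.status}`);
        }
        // Drop parameters such as "; charset=utf-8" to keep just the MIME type.
        const contentType = response.headers.get('content-type')?.split(';')[0].trim();
        if (!contentType) {
            throw new Error(`No content-type header returned for ${url}`);
        }
        return contentType;
    }
    catch (error) {
        console.error('Error fetching image content type:', error);
        // Rethrow the original error instead of an empty `new Error`.
        throw error;
    }
}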
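Some hosts don't answer HEAD requests or leave out the content-type header. A possible fallback, sketched here as an assumption rather than something Contentful requires, is to guess the MIME type from the file extension. The map is deliberately small; extend it for the file types you actually migrate:

```typescript
const MIME_BY_EXTENSION: Record<string, string> = {
  jpg: 'image/jpeg',
  jpeg: 'image/jpeg',
  png: 'image/png',
  gif: 'image/gif',
  webp: 'image/webp',
  svg: 'image/svg+xml',
  pdf: 'application/pdf',
  mp4: 'video/mp4',
};

function guessContentType(url: string): string | undefined {
  // Use only the last path segment so query strings ("?w=800") and dotted
  // directory names don't confuse the extension check.
  const fileName = new URL(url).pathname.split('/').pop() ?? '';
  const dot = fileName.lastIndexOf('.');
  if (dot === -1) return undefined;
  return MIME_BY_EXTENSION[fileName.slice(dot + 1).toLowerCase()];
}
```

In the `catch` of `getAssetContentType`, you could return `guessContentType(url)` when it finds a match and only rethrow when it doesn't.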

Then I call `getAssetContentType` when preparing my asset object:

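// addLocales wraps a value in a locale key and getFileNameFromUrl pulls the
// file name out of the URL; both are helpers not shown here.
private toImageObject = async (assetUrl: string, title: string = '', description: string = ''): Promise<Asset> => {
    const contentType = await this.getAssetContentType(assetUrl);
    return {
        title: this.addLocales(title),
        description: this.addLocales(description),
        file: this.addLocales({
            contentType: contentType,
            // Contentful fetches the file from this URL when the asset is processed.
            upload: assetUrl,
            fileName: getFileNameFromUrl(assetUrl),
        }),
    };
}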
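`getFileNameFromUrl` isn't shown above. A hypothetical version takes the last path segment and decodes percent-escapes, falling back to a placeholder name when the URL has no file segment:

```typescript
function getFileNameFromUrl(url: string, fallback = 'file'): string {
  const lastSegment = new URL(url).pathname.split('/').pop() ?? '';
  let decoded = lastSegment;
  try {
    // Turn "hero%20shot.jpg" into "hero shot.jpg".
    decoded = decodeURIComponent(lastSegment);
  } catch {
    // Malformed percent-encoding: keep the raw segment.
  }
  return decoded || fallback;
}
```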

Can't Import HTML!

This was a shock!

There is no way of mapping HTML directly to the Contentful Rich Text field.

The Rich Text field stores a custom AST (Abstract Syntax Tree), and I couldn't find a library that converts HTML directly to it.
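To give a sense of that AST, here's roughly what "Hello **world**" looks like as a Rich Text document. The types are declared locally to keep the sketch self-contained; the official definitions live in the `@contentful/rich-text-types` package:

```typescript
interface TextNode {
  nodeType: 'text';
  value: string;
  marks: { type: string }[];
  data: Record<string, unknown>;
}

interface BlockNode {
  nodeType: string;
  data: Record<string, unknown>;
  content: Array<BlockNode | TextNode>;
}

const doc: BlockNode = {
  nodeType: 'document',
  data: {},
  content: [
    {
      nodeType: 'paragraph',
      data: {},
      content: [
        { nodeType: 'text', value: 'Hello ', marks: [], data: {} },
        { nodeType: 'text', value: 'world', marks: [{ type: 'bold' }], data: {} },
      ],
    },
  ],
};

// Walk the tree and concatenate text values, e.g. to sanity-check a conversion.
function plainText(node: BlockNode | TextNode): string {
  return 'value' in node ? node.value : node.content.map(plainText).join('');
}

console.log(plainText(doc)); // "Hello world"
```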

There is a library, `@contentful/rich-text-from-markdown`, that converts Markdown to Rich Text.

So many migrations end up going HTML -> Markdown -> Rich Text.

Have fun!

Here's roughly the code that handles this (I stripped out things that aren't relevant to this demo):

// Assumes: import { NodeHtmlMarkdown } from 'node-html-markdown';
//          import { richTextFromMarkdown } from '@contentful/rich-text-from-markdown';
if (fieldType === 'richText') {
    const html = this.incomingFieldValue;
    // Step 1: HTML -> Markdown.
    const markdown = NodeHtmlMarkdown.translate(html);

    // Step 2: Markdown -> Rich Text.
    // The callback is for any nodes that aren't supported by the library.
    // @see https://github.com/contentful/rich-text/tree/master/packages/rich-text-from-markdown#advanced
    const richText = await richTextFromMarkdown(markdown, async (node) => {
        if (node.type === 'image') {
            return await this.imageToRichTextReference(node);
        }
        return null;
    });

    this.output = richText;
}

Beyond the overhead of converting the content twice (HTML to Markdown, then Markdown to Rich Text), the rich text library has another quirk: it strips out images.

There is a callback function that lets you custom-handle stripped-out nodes. I use this callback to upload the images and add a reference/link.
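The `imageToRichTextReference` helper isn't shown, so here is a hypothetical version of the idea: upload the image, then return an `embedded-asset-block` node that links to the new asset. `uploadImage` is an assumed stand-in for whatever creates, processes, and publishes the asset and returns its ID:

```typescript
// The Markdown image node the library hands to the callback (fields used here).
interface MarkdownImageNode {
  type: 'image';
  url: string;
  alt?: string | null;
  title?: string | null;
}

interface EmbeddedAssetBlock {
  nodeType: 'embedded-asset-block';
  data: { target: { sys: { type: 'Link'; linkType: 'Asset'; id: string } } };
  content: [];
}

async function imageToEmbeddedAsset(
  node: MarkdownImageNode,
  uploadImage: (url: string, title: string) => Promise<string>,
): Promise<EmbeddedAssetBlock> {
  // Reuse the alt text as the asset title so it isn't lost.
  const assetId = await uploadImage(node.url, node.alt ?? '');
  return {
    nodeType: 'embedded-asset-block',
    data: { target: { sys: { type: 'Link', linkType: 'Asset', id: assetId } } },
    content: [],
  };
}
```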

Images Need to Be Processed After Upload

Here's what it looks like if you don't process images:

Image failed to load due to no processing

Here's what my (stripped-down) code looks like:

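try {
    // Create the asset record (Contentful hasn't fetched the file yet), then process it.
    const uploadResponse = await this.environment.createAsset({ fields: data });
    await this.runContentfulProcess(uploadResponse);
} catch (error) {
    console.error('Error creating or processing asset:', error);
    throw error;
}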

Because I provide CMS migration services, I'm familiar with the "normal" way of doing things.

Processing images after upload is a first for me, especially when you provide the image URL.

This is the process according to Contentful:

Creating an asset: an entry for the media file.
Processing an asset: Contentful downloads the media file from the supplied URL and processes it.
Publishing an asset: the entry and media file become publicly available.

The first and second items should be combined IMO.

A separate second step should only be needed when you upload a file from your local system instead of providing a URL (this is how Hygraph handles uploads, and Hygraph is one of my two favorite enterprise CMSs).
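Here's a sketch of the full create → process → publish flow. The interfaces mirror the JavaScript SDK calls `createAsset`, `processForAllLocales`, and `publish`, but are declared locally so it runs standalone. I'm assuming the `runContentfulProcess` helper in the earlier snippet does roughly what `processForAllLocales` does here:

```typescript
interface AssetLike {
  sys: { id: string };
  processForAllLocales(): Promise<AssetLike>;
  publish(): Promise<AssetLike>;
}

interface AssetEnvironment {
  createAsset(data: { fields: Record<string, unknown> }): Promise<AssetLike>;
}

async function uploadAsset(
  environment: AssetEnvironment,
  fields: Record<string, unknown>,
): Promise<AssetLike> {
  // 1. Create: an asset record that only points at the upload URL.
  const created = await environment.createAsset({ fields });
  // 2. Process: ask Contentful to download and process the file.
  const processed = await created.processForAllLocales();
  // 3. Publish: make the asset and its file publicly available.
  return processed.publish();
}
```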

All Field Values Need to Be Wrapped With Locale(s)

Every field value, including files, must be nested under a locale key (even if the space has only one locale).

It would be nice to have a default locale that automatically gets set for incoming data that does not have a set locale.
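Until then, a small helper does the wrapping. This is a hypothetical `addLocales`-style sketch; `en-US` is assumed as the default, so substitute your space's default locale code:

```typescript
const DEFAULT_LOCALE = 'en-US';

// Wrap a single value: 'Hello' becomes { 'en-US': 'Hello' }.
function addLocales<T>(value: T, locale: string = DEFAULT_LOCALE): Record<string, T> {
  return { [locale]: value };
}

// Wrap every field of a plain object at once.
function localizeFields(
  fields: Record<string, unknown>,
  locale: string = DEFAULT_LOCALE,
): Record<string, Record<string, unknown>> {
  const wrapped: Record<string, Record<string, unknown>> = {};
  for (const [fieldId, value] of Object.entries(fields)) {
    // Skip undefined so optional fields are omitted rather than sent empty.
    if (value !== undefined) wrapped[fieldId] = addLocales(value, locale);
  }
  return wrapped;
}
```

For example, `localizeFields({ title: 'Hi', slug: 'hi' })` gives `{ title: { 'en-US': 'Hi' }, slug: { 'en-US': 'hi' } }`.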

Happy migrating! And hire me to do your migration :)