@sanity/import

Imports documents from an ndjson-stream to a Sanity dataset

Requirements

  • Node.js >= 20.19.1 (or >= 22.12 for Node 22)

Installing

npm install --save @sanity/import

Usage

import fs from 'node:fs'
import {createClient} from '@sanity/client'
import {sanityImport} from '@sanity/import'

const client = createClient({
  projectId: '<your project id>',
  dataset: '<your target dataset>',
  token: '<token-with-write-perms>',
  useCdn: false,
})

// Input can be a readable stream (for a `.tar.gz` or `.ndjson` file), a folder location (string), or an array of documents
const input = fs.createReadStream('my-documents.ndjson')

const options = {
  /**
   * A Sanity client instance, preconfigured with the project ID and dataset
   * you want to import data to, and with a token that has write access.
   */
  client: client,

  /**
   * Which mutation type to use for creating documents:
   * `create` (default)  - throws an error if a document ID already exists
   * `createOrReplace`   - replaces documents with the same IDs
   * `createIfNotExists` - skips documents whose IDs already exist
   *
   * Optional.
   */
  operation: 'create',

  /**
   * Function called when making progress. Gets called with an object of
   * the following shape:
   * `step` (string) - the current step name of the import process
   * `current` (number) - the current progress of the step, only present on some steps
   * `total` (number) - total items before complete, only present on some steps
   */
  onProgress: (progress) => {
    /* report progress */
  },

  /**
   * Whether or not to allow assets in different datasets. This is usually
   * an error in the export, where asset documents are part of the export.
   *
   * Optional, defaults to `false`.
   */
  allowAssetsInDifferentDataset: false,

  /**
   * Whether or not to allow failing assets due to download/upload errors.
   *
   * Optional, defaults to `false`.
   */
  allowFailingAssets: false,

  /**
   * Whether or not to replace any existing assets with the same hash.
   * Setting this to `true` will regenerate image metadata on the server,
   * but slows down the import.
   *
   * Optional, defaults to `false`.
   */
  replaceAssets: false,

  /**
   * Whether or not to skip cross-dataset references. This may be required
   * when importing a dataset with cross-dataset references to a different
   * project, unless a dataset with the referenced name exists.
   *
   * Optional, defaults to `false`.
   */
  skipCrossDatasetReferences: false,

  /**
   * Whether or not to import system documents (like permissions, custom retention, and content releases).
   * This is usually not necessary, and may cause conflicts if the target dataset
   * already contains these documents. On a new dataset, it is recommended that roles are re-created
   * manually, and that any custom retention policies are re-created manually.
   *
   * Optional, defaults to `false`.
   */
  allowSystemDocuments: false,
}

sanityImport(input, options)
  .then(({numDocs, warnings}) => {
    console.log('Imported %d documents', numDocs)
    // Note: There might be warnings! Check `warnings`
  })
  .catch((err) => {
    console.error('Import failed: %s', err.message)
  })
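The `onProgress` callback receives `step` on every call, but `current` and `total` only on some steps, so a reporter has to handle both cases. The following is a minimal sketch of one way to do that. `formatProgress` is a hypothetical helper and not part of `@sanity/import`.

```javascript
// Hypothetical helper (not part of @sanity/import): turns a progress
// event into a single human-readable line.
function formatProgress({step, current, total}) {
  // `current` and `total` are only present on some steps, so fall back
  // to printing just the step name when either is missing.
  if (typeof current !== 'number' || typeof total !== 'number' || total === 0) {
    return step
  }
  const percent = Math.floor((current / total) * 100)
  return `${step} (${current}/${total}, ${percent}%)`
}

// Usage sketch, assuming `client` is configured as shown above:
// const options = {
//   client,
//   onProgress: (progress) => console.log(formatProgress(progress)),
// }
```

Guarding on `total === 0` avoids a division by zero if a step ever reports an empty total.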

CLI tool

This functionality is built into the sanity package as `sanity dataset import`, and is also available through the `sanity-import` CLI tool included in this package:

$ sanity-import --help

Import documents to a Sanity dataset

USAGE
  $ sanity-import  SOURCE -p <value> -d <value> [-t <value>]
    [--replace | --missing] [--allow-failing-assets]
    [--allow-assets-in-different-dataset] [--replace-assets]
    [--skip-cross-dataset-references] [--allow-system-documents]
    [--asset-concurrency <value>]

ARGUMENTS
  SOURCE  Source file (use "-" for stdin)

FLAGS
  -d, --dataset=<value>                    (required) Dataset to import to
  -p, --project=<value>                    (required) Project ID to import to
  -t, --token=<value>                      Token to authenticate with
      --allow-assets-in-different-dataset  Allow asset documents to reference
                                           different project/dataset
      --allow-failing-assets               Skip assets that cannot be
                                           fetched/uploaded
      --allow-system-documents             Imports system documents
      --asset-concurrency=<value>          Number of parallel asset imports
      --missing                            Skip documents that already exist
      --replace                            Replace documents with the same IDs
      --replace-assets                     Skip reuse of existing assets
      --skip-cross-dataset-references      Skips references to other datasets

DESCRIPTION
  Import documents to a Sanity dataset

EXAMPLES
  Import "./my-dataset.ndjson" into dataset "staging"

    $ sanity-import  -p myPrOj -d staging -t someSecretToken \
      my-dataset.ndjson

  Import into dataset "test" from stdin, read token from env var

    cat my-dataset.ndjson | sanity-import  -p myPrOj -d test -

Environment variables (fallbacks for missing flags)
  --token = SANITY_IMPORT_TOKEN
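The CLI flags line up closely with the options of the programmatic API. The sketch below shows how they could be translated. It assumes that `--replace` corresponds to `createOrReplace` and `--missing` to `createIfNotExists`, which is inferred from the flag descriptions rather than taken from the CLI source. `resolveImportOptions` and the camel-cased flag names are hypothetical.

```javascript
// Hypothetical helper: maps parsed CLI-style flags to a token and an
// options object for `sanityImport`. The flag-to-operation mapping is an
// assumption based on the flag descriptions in the help text.
function resolveImportOptions(flags, env = process.env) {
  // The usage line lists these two flags as mutually exclusive.
  if (flags.replace && flags.missing) {
    throw new Error('Cannot use both --replace and --missing')
  }

  let operation = 'create'
  if (flags.replace) operation = 'createOrReplace'
  if (flags.missing) operation = 'createIfNotExists'

  // --token falls back to the SANITY_IMPORT_TOKEN environment variable.
  const token = flags.token || env.SANITY_IMPORT_TOKEN

  return {
    token, // belongs on the client, not in the import options
    options: {
      operation,
      allowAssetsInDifferentDataset: Boolean(flags.allowAssetsInDifferentDataset),
      allowFailingAssets: Boolean(flags.allowFailingAssets),
      allowSystemDocuments: Boolean(flags.allowSystemDocuments),
      replaceAssets: Boolean(flags.replaceAssets),
      skipCrossDatasetReferences: Boolean(flags.skipCrossDatasetReferences),
    },
  }
}
```

The returned `token` would be passed to `createClient`, and `options` (together with the client) to `sanityImport`.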

Future improvements

  • When documents are imported, record which IDs are actually touched
    • Only upload assets for documents that are still within that window
    • Only strengthen references for documents that are within that window
    • Only count number of imported documents from within that window
  • Asset uploads and strengthening can be done in parallel, but we need a way to cancel the operations if one of them fails
  • Introduce retrying of asset uploads based on hash + indexing delay
  • Validate that dataset exists upon start
  • Reference verification
    • Create a set of all document IDs in import file
    • Create a set of all document IDs in references
    • Create a set of referenced IDs that do not exist locally
    • Batch-wise, check if documents with missing IDs exist remotely
    • When all missing IDs have been cross-checked with the remote API (or a maximum of, say, 100 items have been found missing), reject with a useful error message.
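The local part of the reference verification idea (the first three sub-steps) could be sketched as below. This is not how the package currently works. It assumes references are objects carrying a string `_ref` property, and it leaves out the remote check entirely. `collectRefs` and `findMissingReferences` are hypothetical names.

```javascript
// Recursively collects every string `_ref` value found inside `value`.
function collectRefs(value, refs = new Set()) {
  if (Array.isArray(value)) {
    for (const item of value) collectRefs(item, refs)
  } else if (value && typeof value === 'object') {
    if (typeof value._ref === 'string') refs.add(value._ref)
    for (const child of Object.values(value)) collectRefs(child, refs)
  }
  return refs
}

// Returns the referenced IDs that have no matching document in the import.
// These are the candidates that would still need a remote existence check.
function findMissingReferences(documents) {
  const documentIds = new Set(documents.map((doc) => doc._id))
  const referencedIds = collectRefs(documents)
  return [...referencedIds].filter((id) => !documentIds.has(id))
}

// Usage sketch with made-up documents:
// findMissingReferences([
//   {_id: 'post-1', author: {_type: 'reference', _ref: 'person-1'}},
// ])
```

Using sets keeps the lookup cost constant per ID, which matters for large import files. A real implementation would likely also need to treat asset and cross-dataset references separately.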

License

MIT-licensed. See LICENSE.
