Export, Format, & Import Elasticsearch Data


Leveraging production-grade Elasticsearch data in a localized environment mitigates the risks associated with testing queries and reindexing. Whatever the use case, bulk exporting from Elasticsearch isn't as straightforward as one might expect. This post walks through a reasonable approach to exporting Elasticsearch data, formatting the output, and importing the result into a local Elasticsearch Docker instance.

Exporting Elasticsearch data

Dejavu, "The missing Web UI for Elasticsearch," provides easy-to-use Elasticsearch data export functionality. Depending upon the sensitivity of your data, you can use the public site or check out the contributing guidelines to run the app locally.

Once connected to your target Elasticsearch instance, export as JSON to download your data.

Formatting Exported Data for Import

After downloading a dump of your target Elasticsearch instance's data, we need to format it for import into our local Elasticsearch Docker instance.

The file Dejavu exports may not be valid, parseable JSON, so change the file extension to .js and use module.exports to expose the Elasticsearch data as an array of objects.

dejavuExport.js

module.exports = [
  { ... Dejavu Export }
]

Then, use this script, or an alteration thereof, to format the Dejavu export into a data binary consumable by Elasticsearch.

const fs = require('fs')
const dejavuExport = require('./dejavuExport')

async function appendRecord(record, index) {
  const recordIndex = `{"index":{"_id":"${ index }"}}`
  const input = `${ recordIndex }\n${ JSON.stringify(record) }\n`

  return new Promise((resolve, reject) => {
    fs.appendFile('./output.json', input, (err) => {
      if (err) {
        return reject(err)
      }

      resolve()
    })
  })
}

async function formatBinary() {
  await dejavuExport.reduce(async (prevPromise, record, index) => {
    await prevPromise
    return appendRecord(record, index)
  }, Promise.resolve())
}

formatBinary()
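The script writes each record as a pair of newline-terminated lines — an action line carrying the document's _id, followed by the document source — which is the NDJSON shape the Bulk API expects. A minimal, self-contained sketch of that output, using inline sample records rather than the real export:

```javascript
// Demonstrates the "action + source" line pairs the script appends
// to output.json, one pair per record.
const records = [{ title: 'First record' }, { title: 'Second record' }]

const body = records
  .map((record, index) =>
    `{"index":{"_id":"${index}"}}\n${JSON.stringify(record)}\n`)
  .join('')

console.log(body)
// Each document is preceded by an action line carrying its _id:
// {"index":{"_id":"0"}}
// {"title":"First record"}
// ...
```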

Running Elasticsearch with Docker

First, pull the desired Elasticsearch version; available versions are listed on the docker.elastic.co registry.

docker pull docker.elastic.co/elasticsearch/elasticsearch:7.0.0

Then, run it, exposing ports 9200 and 9300 and enabling single-node discovery:

docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.0.0

And create a new index, then verify it exists:

curl -X PUT "localhost:9200/__my_new_index__?pretty"
curl -X GET "localhost:9200/_cat/indices?v"

Bulk Import Data into Elasticsearch

Use the Elasticsearch Bulk API to import the formatted output.json into your local Docker Elasticsearch instance:

curl -H "Content-Type: application/json" -XPOST "localhost:9200/__my_new_index__/_bulk?pretty&refresh" --data-binary @output.json

After running this command, the new index in your local Docker Elasticsearch instance should contain all of your previously exported data.