Leveraging production-grade Elasticsearch data in a localized environment mitigates the risks of testing queries and reindexing against a live cluster. Whatever the use case, bulk exporting from Elasticsearch isn't as straightforward as one might expect. This post walks through a reasonable approach to exporting Elasticsearch data, formatting the output, and importing the result into a local Elasticsearch Docker instance.
Exporting Elasticsearch Data
Dejavu, "The missing Web UI for Elasticsearch," provides easy to use Elasticsearch data export functionality. Depending upon the sensitivity of your data, you can use the public site or check out the contributing guidelines to run the app locally.
Once connected to your target Elasticsearch instance, export as JSON to download your data.
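The exact shape of the export depends on your index; for illustration, assume a hypothetical export of two documents (the field names here are invented):
[
  { "title": "Widget", "price": 10 },
  { "title": "Gadget", "price": 25 }
]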
Formatting Exported Data for Import
After downloading a dump of the target Elasticsearch instance's data, we need to format it for import into our local Elasticsearch Docker instance.
As the JSON exported from Dejavu may not be valid JSON on its own, use module.exports to expose the Elasticsearch data as an array of objects and change the file extension to .js.
dejavuExport.js
module.exports = [
  { /* ... Dejavu Export ... */ }
]
Then, use this script, or an alteration thereof, to format the Dejavu export into the newline-delimited bulk format Elasticsearch consumes (the resulting file is later passed to curl via --data-binary).
const fs = require('fs')
const dejavuExport = require('./dejavuExport')

// Appends one bulk action line plus one source line for a single record
async function appendRecord(record, index) {
  const recordIndex = `{"index":{"_id":"${ index }"}}`
  const input = `${ recordIndex }\n${ JSON.stringify(record) }\n`
  return new Promise((resolve, reject) => {
    fs.appendFile('./output.json', input, (err) => {
      if (err) {
        return reject(err)
      }
      resolve()
    })
  })
}

// Chains the appends sequentially so records land in order
async function formatBinary() {
  await dejavuExport.reduce(async (prevPromise, record, index) => {
    await prevPromise
    return appendRecord(record, index)
  }, Promise.resolve())
}

formatBinary()
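Given the hypothetical two-record export above, the script produces an output.json in Elasticsearch's newline-delimited bulk format: an action line followed by the document source, one pair per record.
{"index":{"_id":"0"}}
{"title":"Widget","price":10}
{"index":{"_id":"1"}}
{"title":"Gadget","price":25}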
Running Elasticsearch with Docker
First, pull the desired Elasticsearch version. Here's a list of the available ES versions.
docker pull docker.elastic.co/elasticsearch/elasticsearch:7.0.0
Then, run it as a single-node cluster, exposing ports 9200 (HTTP) and 9300 (transport)
docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.0.0
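Elasticsearch can take a little while to start. One way to confirm the node is up before creating an index is the cluster health endpoint:
curl -X GET "localhost:9200/_cluster/health?pretty"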
And create a new index; the second command lists your indices so you can confirm it was created
curl -X PUT "localhost:9200/__my_new_index__?pretty"
curl -X GET "localhost:9200/_cat/indices?v"
Bulk Importing Data into Elasticsearch
curl -H "Content-Type: application/json" -XPOST "localhost:9200/__my_new_index__/_bulk?pretty&refresh" --data-binary @output.json
Use the Elasticsearch Bulk API to import the formatted output.json into your local Docker Elasticsearch instance. After running the following, your local Docker Elasticsearch instance's new index should contain all of your previously exported data.
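To double-check the import, query the document count or run a match-all search against the new index (using the same placeholder index name as above):
curl -X GET "localhost:9200/__my_new_index__/_count?pretty"
curl -X GET "localhost:9200/__my_new_index__/_search?pretty"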