Elasticsearch bulk indexing using Python

pyelasticsearch supports bulk indexing:

bulk_index(index, doc_type, docs, id_field='id', parent_field='_parent'[, other kwargs])

For example,

cities = []
for line in f:                           # f: an open file of tab-separated "id<TAB>city" lines
    fields = line.rstrip().split("\t")
    city = { "id" : fields[0], "city" : fields[1] }
    cities.append(city)                  # append the document, not the list itself
    if len(cities) == 1000:              # send a chunk of 1000 documents at a time
        es.bulk_index(es_index, "city", cities, id_field="id")
        cities = []
if len(cities) > 0:                      # index the final partial chunk
    es.bulk_index(es_index, "city", cities, id_field="id")
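
The same chunking pattern can be applied to a CSV file like the one in the question. The sketch below is illustrative only: the connection URL, file name, index name, and type name are placeholders, and csv.DictReader is used so the CSV's header row supplies the field names.

from pyelasticsearch import ElasticSearch
import csv

es = ElasticSearch('http://localhost:9200/')  # placeholder host/port
es_index = 'accidents-index'                  # placeholder index name

docs = []
with open('accidents.csv') as f:              # placeholder file name
    reader = csv.DictReader(f)                # header row supplies the field names
    for i, row in enumerate(reader):
        row['id'] = i                         # synthetic id; use a real unique column if one exists
        docs.append(row)
        if len(docs) == 1000:                 # flush a chunk of 1000 documents
            es.bulk_index(es_index, 'accidents-type', docs, id_field='id')
            docs = []
if docs:                                      # index the final partial chunk
    es.bulk_index(es_index, 'accidents-type', docs, id_field='id')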
Comments

  • krisdigitx (almost 2 years ago)

    I am trying to index a CSV file with 6M records into Elasticsearch using the Python pyes module. The code below reads one record at a time and pushes it to Elasticsearch... any idea how I can send these as a bulk instead?

    import csv
    from pyes import *
    import sys
    
    header = ['col1','col2','col3','col3', 'col4', 'col5', 'col6']
    
    conn = ES('xx.xx.xx.xx:9200')
    
    counter = 0
    
    reader = csv.reader(open(sys.argv[1], 'rb'))  # assumed: reader was not defined in the original snippet
    
    for row in reader:
        #print len(row)
        if counter >= 0:
            if counter == 0:
                pass
            else:
                colnum = 0
                data = {}
                for j in row:
                    data[header[colnum]] = str(j)
                    colnum += 1
                print data
                print counter
                conn.index(data,'accidents-index',"accidents-type",counter)
        else:
            break
    
        counter += 1
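
    If you want to stay with pyes rather than switch libraries, one way to batch the writes is sketched below. This assumes pyes can queue documents when index() is called with bulk=True and flush the queue with flush_bulk(); the bulk_size argument and the exact method names should be checked against your pyes version, and the host, file, index, and column names are placeholders.

    import csv
    import sys
    from pyes import ES

    header = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7']  # placeholder column names

    # bulk_size is assumed to control how many queued documents trigger a flush
    conn = ES('xx.xx.xx.xx:9200', bulk_size=1000)

    with open(sys.argv[1], 'rb') as csvfile:
        reader = csv.reader(csvfile)
        next(reader)  # skip the header row
        for counter, row in enumerate(reader, start=1):
            data = dict(zip(header, row))
            # bulk=True queues the document instead of issuing one HTTP request per row
            conn.index(data, 'accidents-index', 'accidents-type', counter, bulk=True)

    conn.flush_bulk(forced=True)  # push whatever is still sitting in the queue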
    
  • Alexey Tigarev (over 8 years ago)
    @krisdigitx I don't see why this approach would not work for 6M records. Adjust the number of documents per chunk for best performance and you are fine. Chunks of 1000 documents each are a good starting point.
  • Soubriquet (over 7 years ago)
    What's the limit on the number of documents per bulk request? If I push it to 10,000, will the bulk call be able to handle that? If not, would it be able to adapt and break those 10,000 into chunks?