Elasticsearch bulk indexing using Python

pyelasticsearch supports bulk indexing:

bulk_index(index, doc_type, docs, id_field='id', parent_field='_parent'[, other kwargs])

For example,

cities = []
for line in f:                           # f: an open file of tab-separated "id<TAB>city" lines
    fields = line.rstrip().split("\t")
    city = { "id" : fields[0], "city" : fields[1] }
    cities.append(city)                  # append the document, not the list itself
    if len(cities) == 1000:              # send a chunk of 1000 documents at a time
        es.bulk_index(es_index, "city", cities, id_field="id")
        cities = []
if len(cities) > 0:                      # index the final partial chunk
    es.bulk_index(es_index, "city", cities, id_field="id")
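
The same chunking pattern can be applied to a CSV file like the one in the question. The sketch below is illustrative only: the connection URL, file name, index name, and type name are placeholders, and csv.DictReader is used so the CSV's header row supplies the field names.

from pyelasticsearch import ElasticSearch
import csv

es = ElasticSearch('http://localhost:9200/')  # placeholder host/port
es_index = 'accidents-index'                  # placeholder index name

docs = []
with open('accidents.csv') as f:              # placeholder file name
    reader = csv.DictReader(f)                # header row supplies the field names
    for i, row in enumerate(reader):
        row['id'] = i                         # synthetic id; use a real unique column if one exists
        docs.append(row)
        if len(docs) == 1000:                 # flush a chunk of 1000 documents
            es.bulk_index(es_index, 'accidents-type', docs, id_field='id')
            docs = []
if docs:                                      # index the final partial chunk
    es.bulk_index(es_index, 'accidents-type', docs, id_field='id')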
Comments

  • krisdigitx (almost 2 years ago)

    I am trying to index a CSV file with 6M records into Elasticsearch using the Python pyes module. The code below reads one record at a time and pushes it to Elasticsearch... any idea how I can send these as a bulk instead?

    import csv
    from pyes import *
    import sys
    
    header = ['col1','col2','col3','col3', 'col4', 'col5', 'col6']
    
    conn = ES('xx.xx.xx.xx:9200')
    
    counter = 0
    
    reader = csv.reader(open(sys.argv[1], 'rb'))  # assumed: reader was not defined in the original snippet
    
    for row in reader:
        #print len(row)
        if counter >= 0:
            if counter == 0:
                pass
            else:
                colnum = 0
                data = {}
                for j in row:
                    data[header[colnum]] = str(j)
                    colnum += 1
                print data
                print counter
                conn.index(data,'accidents-index',"accidents-type",counter)
        else:
            break
    
        counter += 1
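
    If you want to stay with pyes rather than switch libraries, one way to batch the writes is sketched below. This assumes pyes can queue documents when index() is called with bulk=True and flush the queue with flush_bulk(); the bulk_size argument and the exact method names should be checked against your pyes version, and the host, file, index, and column names are placeholders.

    import csv
    import sys
    from pyes import ES

    header = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7']  # placeholder column names

    # bulk_size is assumed to control how many queued documents trigger a flush
    conn = ES('xx.xx.xx.xx:9200', bulk_size=1000)

    with open(sys.argv[1], 'rb') as csvfile:
        reader = csv.reader(csvfile)
        next(reader)  # skip the header row
        for counter, row in enumerate(reader, start=1):
            data = dict(zip(header, row))
            # bulk=True queues the document instead of issuing one HTTP request per row
            conn.index(data, 'accidents-index', 'accidents-type', counter, bulk=True)

    conn.flush_bulk(forced=True)  # push whatever is still sitting in the queue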
    
  • Alexey Tigarev (over 8 years ago)
    @krisdigitx I don't see why this approach would not work for 6M records. Adjust the number of documents per chunk for best performance and you are fine. Chunks of 1000 documents each are a good starting point.
  • Soubriquet (over 7 years ago)
    What's the limit on the number of documents per bulk request? If I push it to 10,000, will the bulk call be able to handle that? If not, would it be able to adapt and break those 10,000 into chunks?