Elasticsearch bulk indexing using Python
pyelasticsearch supports bulk indexing:
bulk_index(index, doc_type, docs, id_field='id', parent_field='_parent'[, other kwargs])
For example,
cities = []
for line in f:
    fields = line.rstrip().split("\t")
    city = { "id" : fields[0], "city" : fields[1] }
    cities.append(city)
    if len(cities) == 1000:
        # flush a chunk of 1000 documents
        es.bulk_index(es_index, "city", cities, id_field="id")
        cities = []
# index whatever is left in the final, smaller chunk
if len(cities) > 0:
    es.bulk_index(es_index, "city", cities, id_field="id")
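The same pattern can be adapted to a CSV file like the one in the question below, collecting the rows into dictionaries and flushing them in chunks. This is only a sketch; the host, file name, index name, column list, and the choice of the row number as the document id are assumptions, not part of the original answer:

import csv

from pyelasticsearch import ElasticSearch

es = ElasticSearch('http://xx.xx.xx.xx:9200')   # assumed host
header = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6']   # assumed column names

docs = []
with open('accidents.csv') as f:   # hypothetical input file
    reader = csv.reader(f)
    next(reader)   # skip the header row
    for i, row in enumerate(reader):
        doc = dict(zip(header, row))
        doc['id'] = i   # assumption: the row number is usable as the document id
        docs.append(doc)
        if len(docs) == 1000:   # flush a chunk of 1000 documents
            es.bulk_index('accidents-index', 'accidents-type', docs, id_field='id')
            docs = []

if docs:   # index whatever is left in the final, smaller chunk
    es.bulk_index('accidents-index', 'accidents-type', docs, id_field='id')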
Author: krisdigitx
Updated on July 24, 2022

Comments
- krisdigitx, almost 2 years ago: I am trying to index a CSV file with 6M records into Elasticsearch using the Python pyes module. The code reads a record line by line and pushes it to Elasticsearch... any idea how I can send this as a bulk request?
import csv
from pyes import *
import sys

header = ['col1','col2','col3','col3', 'col4', 'col5', 'col6']

conn = ES('xx.xx.xx.xx:9200')
counter = 0
for row in reader:
    #print len(row)
    if counter >= 0:
        if counter == 0:
            pass
        else:
            colnum = 0
            data = {}
            for j in row:
                data[header[colnum]] = str(j)
                colnum += 1
            print data
            print counter
            conn.index(data,'accidents-index',"accidents-type",counter)
    else:
        break
    counter += 1
- Alexey Tigarev, over 8 years ago: @krisdigitx I don't see why this approach would not work for 6M records. Adjust the number of documents per chunk for best performance and you are fine. Bulks of 1000 documents each are a good starting point.
- Soubriquet, over 7 years ago: What's the limit on the number of documents per bulk request? If I push it to 10,000, will bulk be able to handle that? If not, would it be able to adapt and break that 10,000 into chunks?
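As far as I know there is no fixed document-count limit in bulk_index itself; the practical ceiling is the size of the HTTP request the cluster will accept (governed by Elasticsearch's http.max_content_length setting), and the client does not split an oversized batch automatically. A hedged sketch of splitting manually, where the helper name and the 1,000-document chunk size are just illustrative:

def bulk_index_in_chunks(es, index, doc_type, docs, chunk_size=1000):
    # Hypothetical helper: send a large list of documents as several
    # smaller bulk_index() calls instead of one oversized request.
    for start in range(0, len(docs), chunk_size):
        es.bulk_index(index, doc_type, docs[start:start + chunk_size], id_field='id')

# e.g. 10,000 collected documents would go out as ten requests of 1,000:
# bulk_index_in_chunks(es, 'accidents-index', 'accidents-type', docs)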