Export CSV file from Scrapy (not via the command line)


Solution 1

Why not use an item pipeline?

WriteToCsv.py

   import csv
   from YOUR_PROJECT_NAME_HERE import settings

   def write_to_csv(item):
       # Open in append mode; the with-block closes the file after each row.
       with open(settings.csv_file_path, 'a') as csv_file:
           writer = csv.writer(csv_file, lineterminator='\n')
           writer.writerow([item[key] for key in item.keys()])

   class WriteToCsv(object):
       def process_item(self, item, spider):
           write_to_csv(item)
           return item

settings.py

   ITEM_PIPELINES = {'project.pipelines_path.WriteToCsv.WriteToCsv': A_NUMBER_HIGHER_THAN_ALL_OTHER_PIPELINES}
   csv_file_path = PATH_TO_CSV
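
For concreteness, a filled-in version might look like the following; the module path, the order value 900, and the filename are hypothetical placeholders of my own (pipeline order values are integers in the 0-1000 range, and lower numbers run first):

   ITEM_PIPELINES = {'myproject.pipelines.WriteToCsv.WriteToCsv': 900}
   csv_file_path = 'exported_items.csv'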

If you want items written to a separate CSV file for each spider, you can give your spider a CSV_PATH field. Then, in your pipeline, use your spider's field instead of the path from settings, as sketched below.
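
A minimal sketch of that per-spider variant; the CSV_PATH attribute is the convention suggested above (not a built-in Scrapy setting), and the fallback to the settings path is my own assumption:

   import csv

   import scrapy
   from YOUR_PROJECT_NAME_HERE import settings

   class MySpider(scrapy.Spider):
       name = 'my_spider'
       CSV_PATH = 'my_spider_items.csv'  # per-spider output file

   class WriteToCsv(object):
       def process_item(self, item, spider):
           # Prefer the spider's own path; fall back to the project-wide one.
           path = getattr(spider, 'CSV_PATH', settings.csv_file_path)
           with open(path, 'a') as csv_file:
               writer = csv.writer(csv_file, lineterminator='\n')
               writer.writerow([item[key] for key in item.keys()])
           return item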

This works; I tested it in my project.

HTH

http://doc.scrapy.org/en/latest/topics/item-pipeline.html

Solution 2

That's what Feed Exports are for: http://doc.scrapy.org/en/latest/topics/feed-exports.html

One of the most frequently required features when implementing scrapers is being able to store the scraped data properly and, quite often, that means generating an “export file” with the scraped data (commonly called an “export feed”) to be consumed by other systems.

Scrapy provides this functionality out of the box with the Feed Exports, which allows you to generate a feed with the scraped items, using multiple serialization formats and storage backends.
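
If you need to choose the output file from code (as in the question, where the filename comes from another file), one way is to configure the feed export on the spider itself. A minimal sketch, assuming Scrapy 2.1+ for the FEEDS setting; the spider name and filename are illustrative:

   import scrapy

   class MySpider(scrapy.Spider):
       name = 'spiderName'

       # custom_settings is applied before the crawl starts, so the
       # filename can be computed here (e.g. read from another file).
       custom_settings = {
           'FEEDS': {
               'filename.csv': {'format': 'csv'},
           },
       }

With this in place, a plain scrapy crawl spiderName writes the items to filename.csv, which matches the scenario described in the question.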



Author: Chris

Updated on January 19, 2022

Comments

  • Chris over 2 years

    I can successfully export my items to a CSV file from the command line like this:

       scrapy crawl spiderName -o filename.csv
    

    My question is: what is the easiest way to do the same in code? I need this because I extract the filename from another file. The end scenario should be that I call

      scrapy crawl spiderName
    

    and it writes the items into filename.csv

    • Chris almost 10 years
      @PadraicCunningham Scrapy has its own way of exporting to files. From what I have seen so far, I need to define spider_closed, spider_opened, ... in order to do this, and that seems like a lot of overhead compared to the command-line solution.
  • mayank_io over 8 years
    Scrapy's documentation mentions that writing to a file is better done using Feed Exports, as mentioned in the answer below by @Arthur. Here is a snippet from Scrapy's docs: "The purpose of JsonWriterPipeline is just to introduce how to write item pipelines. If you really want to store all scraped items into a JSON file you should use the Feed exports."
  • rocktheartsm4l over 8 years
    The OP clearly did not want to use an exporter... I mean, the OP clearly gives an example of using an exporter right off the bat... If you wanted to be constructive you could point out that this implementation causes disk I/O on a per-item basis, and that signals could be used to store the data and do one disk op at the end of the crawl (see the sketch after these comments)... Either way, petty downvote. Merry Christmas.
  • mayank_io over 8 years
    Merry Christmas. My apologies, rocktheartsm4l. It looks like I pressed that downvote while scrolling. I am unable to remove the downvote now :-(. It says, though, that I can vote again if the answer is edited. If you can make a small edit to your answer, I can fix my mistake.
  • Eric G about 7 years
    This would be a great answer if it had code or linked to code.
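
For reference, here is a rough sketch of the buffered approach rocktheartsm4l's comment describes. Rather than wiring up the spider_opened/spider_closed signals by hand, an item pipeline can implement the open_spider/close_spider hooks that Scrapy calls for it, buffer rows in memory, and write them in one disk operation when the crawl ends. The class name and the reuse of csv_file_path are my own assumptions:

   import csv
   from YOUR_PROJECT_NAME_HERE import settings

   class BufferedWriteToCsv(object):
       def open_spider(self, spider):
           self.rows = []  # buffer items in memory during the crawl

       def process_item(self, item, spider):
           self.rows.append([item[key] for key in item.keys()])
           return item

       def close_spider(self, spider):
           # A single disk operation for the whole crawl.
           with open(settings.csv_file_path, 'a') as csv_file:
               csv.writer(csv_file, lineterminator='\n').writerows(self.rows)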