Python check if value in csv file

13,676

Solution 1

Read the file into a variable:

with open('urls_list.csv', 'r') as fp:
    s = fp.read()

Check whether each list item is in the file; if not, save it:

missing = []
for url in urls_list:
    if url not in s:
        missing.append(url + '\n')

Write the missing URLs to the file:

if missing:
    with open('urls_list.csv', 'a+') as fp:
        fp.writelines(missing)
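One caveat with the approach above: `url not in s` is a substring test, so a URL that appears as a prefix of a longer line (e.g. `http://google.ru` inside `http://google.ru/maps`) would be treated as already present. A sketch of an exact-line variant using a set, reusing the filename and list from the question:

```python
import os

urls_list = [
    "http://yandex.ru",
    "http://google.ru",
    "http://rambler.ru",
    "http://google.ru",
    "http://gmail.ru",
    "http://mail.ru",
]

# Read existing lines into a set for exact, O(1) membership tests.
seen = set()
if os.path.exists('urls_list.csv'):
    with open('urls_list.csv') as fp:
        seen = {line.strip() for line in fp}

# Append only URLs not already present as a whole line.
with open('urls_list.csv', 'a') as fp:
    for url in urls_list:
        if url not in seen:
            fp.write(url + '\n')
            seen.add(url)  # also skips duplicates inside urls_list itself
```

Run twice, the second run appends nothing, which matches the behaviour the question asks for.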

Solution 2

Considering your file has only one column, the csv module might be overkill.

Here's a version that first reads all the lines from the file, then reopens the file to append the URLs that are not already there:

with open('urls_list.csv', 'r') as fp:
    lines = fp.read()

with open('urls_list.csv', 'a+') as fp:
    for url in urls_list:
        if url in lines:
            print("YEY!")
        else:
            fp.write(url + '\n')
Author: Konstantin Rusanov

Updated on June 04, 2022

Comments

  • Konstantin Rusanov
    Konstantin Rusanov almost 2 years

    I've got a list of URLs, for example:

    urls_list = [
        "http://yandex.ru",
        "http://google.ru",
        "http://rambler.ru",
        "http://google.ru",
        "http://gmail.ru",
        "http://mail.ru"
    ]
    

    I need to open the csv file and check each value from the list: if the value is already in the file, skip to the next value; otherwise, add the value to the file.

    Expected result: on the 1st run, all lines are added (the file is empty); on the 2nd run, nothing happens, because all elements are already in the file.

    I wrote some code, but it works completely incorrectly:

    import csv
    
    
    urls_list = [
        "http://yandex.ru",
        "http://google.ru",
        "http://rambler.ru",
        "http://google.ru",
        "http://gmail.ru",
        "http://mail.ru"
    ]
    
    
    
    with open('urls_list.csv', 'r') as fp:
        for row in fp:
            for url in urls_list:
                if url in row:
                    print "YEY!"
                with open('urls_list.csv', 'a+') as fp:
                    wr = csv.writer(fp, dialect='excel')
                    wr.writerow([url])
    
    • Mauro Baraldi
      Mauro Baraldi over 7 years
      You open the file in read mode, then, while reading it, reopen it to append. That is the root of all the problems.
    • ngulam
      ngulam over 7 years
      As Mauro stated: use a second file to append.
    • Konstantin Rusanov
      Konstantin Rusanov over 7 years
      But I need to check each list element against the csv file: if the element isn't in the file, run some code and write the element to the file; if it is in the file, skip it and go to the next one, and so on...
    • Moses Koledoye
      Moses Koledoye over 7 years
      Do you actually need csv, considering your file has only one column?
    • Konstantin Rusanov
      Konstantin Rusanov over 7 years
      My main goal: 1. I parse an xml sitemap and follow each link. 2. I parse the content and store it in a file. 3. I need to check whether I already parsed a url; if so, skip it and go to the next url.
  • Swapnil B.
    Swapnil B. about 4 years
    Would this work for a csv file with 524,000 lines (or even 1.3B lines)? That would need a large machine with sufficient RAM to hold this much data.
  • wwii
    wwii about 4 years
    @SwapnilB. - you would need to try it, or experiment with something smaller and determine the memory requirements. There are other ways to approach the problem if either the search list or the search space is too large to fit in memory.
  • Swapnil B.
    Swapnil B. about 4 years
    This deserves another post, but this is how I would handle large-scale processing: do lazy reading, process each row (and discard it), and store the processed data in a pickle file. Finally, write distributed jobs (probably using dispy) to process the pickle file. This converts memory-based processing into file-based distributed processing at low cost.
  • wwii
    wwii about 4 years
    @SwapnilB. ... IIRC there are numerous Q&A's here regarding processing large (csv) files.
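The streaming idea discussed in the comments above can be sketched as follows, assuming `urls_list` fits in memory while the file may not (the filename `big_urls.csv` is illustrative): stream the file one line at a time, removing each URL found from a pending set, then append only the URLs that were never encountered.

```python
import os

urls_list = [
    "http://yandex.ru",
    "http://google.ru",
    "http://mail.ru",
]

# URLs not yet confirmed to be in the file.
pending = set(urls_list)

if os.path.exists('big_urls.csv'):
    with open('big_urls.csv') as fp:
        for line in fp:              # only one line held in memory at a time
            pending.discard(line.strip())
            if not pending:          # everything already present; stop early
                break

# Append only the URLs that were never seen, preserving list order.
with open('big_urls.csv', 'a') as fp:
    for url in urls_list:
        if url in pending:
            fp.write(url + '\n')
            pending.discard(url)
```

Memory use is bounded by the size of `urls_list` rather than the file, and the early exit avoids scanning the whole file once every URL has been accounted for.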