Python check if value in csv file

13,676

Solution 1

Read the file into a variable:

with open('urls_list.csv', 'r') as fp:
    s = fp.read()

Check whether each list item is in the file; if not, save it:

missing = []
for url in urls_list:
    if url not in s:
        missing.append(url + '\n')

Write the missing URLs to the file:

if missing:
    with open('urls_list.csv', 'a+') as fp:
        fp.writelines(missing)
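One caveat with the approach above: `url not in s` is a substring test, so a URL that appears as a prefix of a longer line (e.g. `http://google.ru` inside `http://google.ru/maps`) would be treated as already present. A sketch of an exact-line variant using a set, reusing the filename and list from the question:

```python
import os

urls_list = [
    "http://yandex.ru",
    "http://google.ru",
    "http://rambler.ru",
    "http://google.ru",
    "http://gmail.ru",
    "http://mail.ru",
]

# Read existing lines into a set for exact, O(1) membership tests.
seen = set()
if os.path.exists('urls_list.csv'):
    with open('urls_list.csv') as fp:
        seen = {line.strip() for line in fp}

# Append only URLs not already present as a whole line.
with open('urls_list.csv', 'a') as fp:
    for url in urls_list:
        if url not in seen:
            fp.write(url + '\n')
            seen.add(url)  # also skips duplicates inside urls_list itself
```

Run twice, the second run appends nothing, which matches the behaviour the question asks for.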

Solution 2

Considering your file has only one column, the csv module might be overkill.

Here's a version that first reads all the lines from the file, then reopens the file to append the URLs that are not already there:

with open('urls_list.csv', 'r') as fp:
    lines = fp.read()

with open('urls_list.csv', 'a+') as fp:
    for url in urls_list:
        if url in lines:
            print("YEY!")
        else:
            fp.write(url + '\n')
Author: Konstantin Rusanov

Updated on June 04, 2022

Comments

  • Konstantin Rusanov
    Konstantin Rusanov almost 2 years

    I've got a list of URLs, for example:

    urls_list = [
        "http://yandex.ru",
        "http://google.ru",
        "http://rambler.ru",
        "http://google.ru",
        "http://gmail.ru",
        "http://mail.ru"
    ]
    

    I need to open the csv file and check each value from the list: if the value is already in the file, skip to the next value; otherwise, add the value to the file.

    Expected result: on the 1st run, all lines are added (the file is empty); on the 2nd run, nothing happens, because all elements are already in the file.

    I wrote some code, but it works completely incorrectly:

    import csv
    
    
    urls_list = [
        "http://yandex.ru",
        "http://google.ru",
        "http://rambler.ru",
        "http://google.ru",
        "http://gmail.ru",
        "http://mail.ru"
    ]
    
    
    
    with open('urls_list.csv', 'r') as fp:
        for row in fp:
            for url in urls_list:
                if url in row:
                    print "YEY!"
                with open('urls_list.csv', 'a+') as fp:
                    wr = csv.writer(fp, dialect='excel')
                    wr.writerow([url])
    
    • Mauro Baraldi
      Mauro Baraldi over 7 years
      You open the file in read mode, then, while reading it, reopen it to append. That is the root of all the problems.
    • ngulam
      ngulam over 7 years
      As Mauro stated: use a second file to append.
    • Konstantin Rusanov
      Konstantin Rusanov over 7 years
      But I need to check each list element against the csv file: if the element isn't in the file, run some code and write the element to the file; if it is in the file, skip it and go to the next one, and so on...
    • Moses Koledoye
      Moses Koledoye over 7 years
      Do you actually need csv, considering your file has only one column?
    • Konstantin Rusanov
      Konstantin Rusanov over 7 years
      My main goal: 1. I parse an xml sitemap and follow each link. 2. I parse the content and store it in a file. 3. I need to check whether I already parsed a url; if so, skip it and go to the next url.
  • Swapnil B.
    Swapnil B. about 4 years
    Would this work for a csv file with 524,000 lines (or even 1.3B lines)? That would need a large machine with sufficient RAM to hold this much data.
  • wwii
    wwii about 4 years
    @SwapnilB. - you would need to try it, or experiment with something smaller and determine the memory requirements. There are other ways to approach the problem if either the search list or the search space is too large to fit in memory.
  • Swapnil B.
    Swapnil B. about 4 years
    This deserves another post, but this is how I would handle large-scale processing: do lazy reading, process each row (and discard it), and store the processed data in a pickle file. Finally, write distributed jobs (probably using dispy) to process the pickle file. This converts memory-based processing into file-based distributed processing at low cost.
  • wwii
    wwii about 4 years
    @SwapnilB. ... IIRC there are numerous Q&A's here regarding processing large (csv) files.
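The streaming idea discussed in the comments above can be sketched as follows, assuming `urls_list` fits in memory while the file may not (the filename `big_urls.csv` is illustrative): stream the file one line at a time, removing each URL found from a pending set, then append only the URLs that were never encountered.

```python
import os

urls_list = [
    "http://yandex.ru",
    "http://google.ru",
    "http://mail.ru",
]

# URLs not yet confirmed to be in the file.
pending = set(urls_list)

if os.path.exists('big_urls.csv'):
    with open('big_urls.csv') as fp:
        for line in fp:              # only one line held in memory at a time
            pending.discard(line.strip())
            if not pending:          # everything already present; stop early
                break

# Append only the URLs that were never seen, preserving list order.
with open('big_urls.csv', 'a') as fp:
    for url in urls_list:
        if url in pending:
            fp.write(url + '\n')
            pending.discard(url)
```

Memory use is bounded by the size of `urls_list` rather than the file, and the early exit avoids scanning the whole file once every URL has been accounted for.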