How do I remove duplicate words from a list in python without using sets?

13,373

Solution 1

Your code had a couple of logic errors. I fixed them; hope it helps.

fname = "stuff.txt"
AllWords = list()      # every word from the file, duplicates included
ResultList = list()    # unique words, appended in sorted order

with open(fname) as fhand:       # the file is closed automatically
    for line in fhand:
        words = line.split()     # split() with no argument also drops surrounding whitespace
        AllWords.extend(words)   # merge each line's words into one flat list

AllWords.sort()  # sort the combined list

for word in AllWords:             # walk the sorted words
    if word not in ResultList:    # skip words already collected
        ResultList.append(word)   # first occurrence: keep it


print(ResultList)

Tested on Python 3.4; no imports needed.
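The same sort-then-filter logic can be wrapped in a function so it works on any iterable of lines, not just an open file. This is a minimal sketch: the name unique_sorted_words is hypothetical, and io.StringIO stands in for stuff.txt so the example runs without a file on disk.

```python
import io


def unique_sorted_words(lines):
    """Return the unique words from an iterable of lines, sorted (hypothetical helper)."""
    all_words = []
    for line in lines:
        all_words.extend(line.split())  # split() ignores leading/trailing whitespace
    all_words.sort()
    result = []
    for word in all_words:
        if word not in result:  # linear scan of result; fine for small inputs
            result.append(word)
    return result


# io.StringIO can be iterated line by line just like an open text file
sample = io.StringIO("But soft what light\nIt is the east and Juliet is the sun\n")
print(unique_sorted_words(sample))
# ['But', 'It', 'Juliet', 'and', 'east', 'is', 'light', 'soft', 'sun', 'the', 'what']
```

Note that plain sort() puts capitalized words first, because uppercase letters sort before lowercase ones. Also, since all_words is sorted, duplicates end up adjacent, so comparing against result[-1] alone would be enough (the idea behind Solution 5).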

Solution 2

mylist = ['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'and', 'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'is', 'is', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'sun', 'the', 'the', 'the', 'through', 'what', 'window', 'with', 'yonder']
newlist = sorted(set(mylist), key=lambda x:mylist.index(x))
print(newlist)
['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'the', 'through', 'what', 'window', 'with', 'yonder']

newlist contains a list of the set of unique values from mylist, sorted by each item's index in mylist.
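Since the question asked for an approach without sets, the same first-occurrence ordering can be produced with plain lists: keep an item only if it has not appeared earlier in the list. This is a minimal sketch; the name first_occurrences is hypothetical, and the items[:i] check makes it quadratic, which is fine for a few dozen words.

```python
def first_occurrences(items):
    """Keep each item the first time it appears, preserving the original order."""
    return [item for i, item in enumerate(items) if item not in items[:i]]


mylist = ['and', 'and', 'is', 'sun', 'is', 'the', 'sun']
print(first_occurrences(mylist))  # ['and', 'is', 'sun', 'the']
```

For this input the result matches sorted(set(mylist), key=mylist.index), because both keep each word at the position of its first appearance.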

Solution 3

The function below might help.

def remove_duplicate_from_list(temp_list):
    if not temp_list:                  # None or empty input: nothing to dedupe
        return []
    my_list_temp = []
    for word in temp_list:
        if word not in my_list_temp:   # keep only the first occurrence
            my_list_temp.append(word)
    return my_list_temp
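Here is a quick usage check of remove_duplicate_from_list. Unlike Solution 1, it does not sort, so words come back in the order they were first seen, and matching is case-sensitive. The function is repeated here so the snippet runs on its own; the sample sentence is made up from words in the question's text.

```python
def remove_duplicate_from_list(temp_list):
    if not temp_list:                  # None or empty input: nothing to dedupe
        return []
    my_list_temp = []
    for word in temp_list:
        if word not in my_list_temp:   # keep only the first occurrence
            my_list_temp.append(word)
    return my_list_temp


words = "Arise fair sun and kill the envious moon and the sun".split()
print(remove_duplicate_from_list(words))
# ['Arise', 'fair', 'sun', 'and', 'kill', 'the', 'envious', 'moon']
```

To get the alphabetical result the question asked for, sort either the input or the output, e.g. sorted(remove_duplicate_from_list(words)).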

Solution 4

A good alternative to using a set is to use a dictionary. The collections module contains a class called Counter, a specialized dictionary that counts how many times each of its keys is seen. Using it, you could do something like this:

from collections import Counter

wordlist = ['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'and',
            'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'is',
            'is', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'sun',
            'the', 'the', 'the', 'through', 'what', 'window', 'with', 'yonder']

newlist = sorted(Counter(wordlist), 
                 key=lambda w: w.lower())  # case insensitive sort
print(newlist)

Output:

['already', 'and', 'Arise', 'breaks', 'But', 'east', 'envious', 'fair',
 'grief', 'is', 'It', 'Juliet', 'kill', 'light', 'moon', 'pale', 'sick',
 'soft', 'sun', 'the', 'through', 'what', 'Who', 'window', 'with', 'yonder']
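Two related dictionary tricks, as a sketch with a shortened word list. A Counter also remembers how many times each word appeared, which a set throws away. And because regular dicts preserve insertion order (guaranteed from Python 3.7 on), dict.fromkeys removes duplicates while keeping first-seen order without needing Counter at all.

```python
from collections import Counter

wordlist = ['and', 'and', 'and', 'is', 'is', 'sun', 'sun', 'the', 'moon']

counts = Counter(wordlist)
print(counts['and'])          # 3
print(counts.most_common(1))  # [('and', 3)]

# words that appeared more than once
print([w for w, n in counts.items() if n > 1])  # ['and', 'is', 'sun']

# dict keys are unique; insertion order is guaranteed from Python 3.7 on
unique_in_order = list(dict.fromkeys(wordlist))
print(unique_in_order)        # ['and', 'is', 'sun', 'the', 'moon']
```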

Solution 5

This should work: it walks the list and adds each element to a new list only if it differs from the last element added to the new list.

def unique(lst):
    """ Assumes lst is already sorted """
    unique_list = []
    for el in lst:
        # the empty check avoids an IndexError on the very first element
        if not unique_list or el != unique_list[-1]:
            unique_list.append(el)
    return unique_list
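A quick check of unique() shows why the "already sorted" assumption matters. It only compares each element with the last one kept, so duplicates that are not adjacent survive. Sorting first fixes that. The sample lists are made up for illustration.

```python
def unique(lst):
    """Assumes lst is already sorted."""
    unique_list = []
    for el in lst:
        # the empty check avoids an IndexError on the very first element
        if not unique_list or el != unique_list[-1]:
            unique_list.append(el)
    return unique_list


print(unique(['and', 'and', 'is', 'sun', 'sun']))  # ['and', 'is', 'sun']
print(unique(['sun', 'and', 'sun']))               # ['sun', 'and', 'sun'] (not adjacent)
print(unique(sorted(['sun', 'and', 'sun'])))       # ['and', 'sun']
```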

You could also use itertools.groupby, which works similarly:

from itertools import groupby

# lst must already be sorted; groupby only groups adjacent equal items
unique_list = [key for key, _ in groupby(lst)]
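groupby yields each key together with an iterator over its run of equal items, so the same pass can also report how often each word occurred. Here is a sketch with a made-up sample list:

```python
from itertools import groupby

lst = sorted(['the', 'sun', 'and', 'the', 'is', 'sun', 'the'])

# each group is an iterator over one run of equal, adjacent items
counts = [(key, len(list(group))) for key, group in groupby(lst)]
print(counts)  # [('and', 1), ('is', 1), ('sun', 2), ('the', 3)]
```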
Author: MikeT

Updated on June 04, 2022

Comments

  • MikeT
    MikeT almost 2 years

    I have the following Python code, which almost works for me (I'm SO close!). I have a text file from one of Shakespeare's plays that I'm opening. Original text file:

    "But soft what light through yonder window breaks

    It is the east and Juliet is the sun

    Arise fair sun and kill the envious moon

    Who is already sick and pale with grief"

    And the result my code gives me is this:

    ['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'and', 'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'is', 'is', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'sun', 'the', 'the', 'the', 'through', 'what', 'window', 'with', 'yonder']

    So this is almost what I want: it's already in a list sorted the way I want it, but how do I remove the duplicate words? I'm trying to create a new ResultList and append the words to it, but it gives me the above result without getting rid of the duplicate words. If I "print ResultList" it just dumps a ton of words out. The way I have it now is close, but I want to get rid of the extra "and"s, "is"s, "sun"s and "the"s. I want to keep it simple and use append(), but I'm not sure how I can get it to work. I don't want to do anything crazy with the code. What simple thing am I missing from my code in order to remove the duplicate words?

    fname = raw_input("Enter file name: ")  
    fhand = open(fname)
    NewList = list()      #create new list
    ResultList = list()    #create new results list I want to append words to
    
    for line in fhand:
        line.rstrip()       #strip white space
        words = line.split()    #split lines of words and make list
        NewList.extend(words)   #make the list from 4 lists to 1 list
    
        for word in line.split():   #for each word in line.split()
            if words not in line.split():    #if a word isn't in line.split
                NewList.sort()             #sort it
                ResultList.append(words)   #append it, but this doesn't work.
    
    
    print NewList
    #print ResultList (doesn't work the way I want it to)
    
  • MikeT
    MikeT about 9 years
    Thank you. I was wondering if I could do this without using a set. Keeping with the list theme....?
  • TigerhawkT3
    TigerhawkT3 about 9 years
    Why are you trying to avoid sets? They're ideal for unique membership.
  • TigerhawkT3
    TigerhawkT3 about 9 years
    That's what I get for trying to save a few keystrokes here and there while testing. Thanks for pointing it out. :)
  • MikeT
    MikeT about 9 years
    Thank you! That is what I was looking for. I wanted to keep it "simple", and I see what I did wrong now. Once again, thank you kindly, much appreciated.
  • MikeT
    MikeT about 9 years
    Thank you, that is an even better way to alphabetize.
  • MrAlexBailey
    MrAlexBailey about 9 years
    You could also check for and remove the quotes with line = line.strip().strip('"')
  • Paul Rooney
    Paul Rooney about 9 years
    If the same word appeared in the file in a different case, this would report both spellings as separate words. For example, if 'is' and 'Is' both appear in the file, you would get 'is' and 'Is' in the final list. As there are no cases of this in your sample data, I suppose it's not too much of an issue.
  • MikeT
    MikeT about 9 years
    Thank you, that's something I could use in the future.
  • MikeT
    MikeT about 9 years
    Thank you, I mixed that up.
  • MikeT
    MikeT about 9 years
    Much obliged. I didn't know about Counter.
  • MikeT
    MikeT about 9 years
    I'm not looking to avoid sets forever. I'm just new to Python and wanted to progress in such a way that I understand the basics first, then move on to more advanced stuff. But I appreciate your code; I can use it in the future. Thanks again!
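Paul Rooney's point about case can be handled by comparing a normalized form of each word. This is a minimal sketch, assuming the first spelling seen should be kept. The name unique_ignore_case is hypothetical, and str.casefold() is used as the comparison key (str.lower() would also work for plain English text). Lists are used instead of sets to match the question's constraint.

```python
def unique_ignore_case(words):
    """Drop words that differ only by case, keeping the first spelling seen."""
    seen = []    # normalized forms already kept (a list, to avoid sets)
    result = []
    for word in words:
        key = word.casefold()
        if key not in seen:
            seen.append(key)
            result.append(word)
    return result


print(unique_ignore_case(['Is', 'is', 'the', 'The', 'sun']))  # ['Is', 'the', 'sun']
```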