How do I remove repeated elements from ArrayList?

888,294

Solution 1

If you don't want duplicates in a Collection, you should consider why you're using a Collection that allows duplicates. The easiest way to remove repeated elements is to add the contents to a Set (which will not allow duplicates) and then add the Set back to the ArrayList:

Set<String> set = new HashSet<>(yourList);
yourList.clear();
yourList.addAll(set);

Of course, this destroys the ordering of the elements in the ArrayList.

Solution 2

Although converting the ArrayList to a HashSet effectively removes duplicates, if you need to preserve insertion order, I'd rather suggest you to use this variant

// list is some List of Strings
Set<String> s = new LinkedHashSet<>(list);

Then, if you need to get back a List reference, you can use again the conversion constructor.

Solution 3

In Java 8:

List<String> deduped = list.stream().distinct().collect(Collectors.toList());

Please note that the hashCode-equals contract for list members should be respected for the filtering to work properly.

Solution 4

Suppose we have a list of String like:

List<String> strList = new ArrayList<>(5);
// insert up to five items to list.        

Then we can remove duplicate elements in multiple ways.

Prior to Java 8

List<String> deDupStringList = new ArrayList<>(new HashSet<>(strList));

Note: If we want to maintain the insertion order then we need to use LinkedHashSet in place of HashSet

Using Guava

List<String> deDupStringList2 = Lists.newArrayList(Sets.newHashSet(strList));

Using Java 8

List<String> deDupStringList3 = strList.stream().distinct().collect(Collectors.toList());

Note: In case we want to collect the result in a specific list implementation e.g. LinkedList then we can modify the above example as:

List<String> deDupStringList3 = strList.stream().distinct()
                 .collect(Collectors.toCollection(LinkedList::new));

We can use parallelStream also in the above code but it may not give expected performace benefits. Check this question for more.

Solution 5

If you don't want duplicates, use a Set instead of a List. To convert a List to a Set you can use the following code:

// list is some List of Strings
Set<String> s = new HashSet<String>(list);

If really necessary you can use the same construction to convert a Set back into a List.

Share:
888,294
user25778
Author by

user25778

Updated on July 20, 2022

Comments

  • user25778
    user25778 almost 2 years

    I have an ArrayList<String>, and I want to remove repeated strings from it. How can I do this?

  • Shervin Asgari
    Shervin Asgari over 14 years
    Why use ArrayList in parameter? Why not just List? Will that not work?
  • volley
    volley over 14 years
    A List will absolutely work as in-parameter for the first method listed. The method is however optimized for use with a random access list such as ArrayList, so if a LinkedList is passed instead you will get poor performance. For example, setting the n:th element in a LinkedList takes O(n) time, whereas setting the n:th element in a random access list (such as ArrayList) takes O(1) time. Again, though, this is probably overkill... If you need this kind of specialized code it will hopefully be in an isolated situation.
  • volley
    volley over 14 years
    See also LinkedHashSet, if you wish to retain the order.
  • Matt Briançon
    Matt Briançon about 13 years
    Does LinkedHashSet make any guarantees as to which of several duplicates are kept from the list? For instance, if position 1, 3, and 5 are duplicates in the original list, can we assume that this process will remove 3 and 5? Or maybe remove 1 and 3? Thanks.
  • user2756501
    user2756501 about 13 years
    @Matt: yes, it does guarantee that. The docs say: "This linked list defines the iteration ordering, which is the order in which elements were inserted into the set (insertion-order). Note that insertion order is not affected if an element is re-inserted into the set."
  • Chetan
    Chetan about 12 years
    But this will just create the set without duplicates , I want to know which number was duplicate in O(n) time
  • WowBow
    WowBow about 12 years
    Very interesting. I have a different situation here. I am not trying to sort String but another object called AwardYearSource. This class has an int attribute called year. So I want to remove duplicates based on the year. i.e if there is year 2010 mentioned more than once, I want to remove that AwardYearSource object. How can I do that?
  • volley
    volley about 12 years
    Chetan, finding the items in O(n) is possible if the set of possible values is small (think Byte or Short); a BitSet or similar can then be used to store and look up already encountered values in O(1) time. But then again - with such a small value set, doing it in O(n log n) might not be a problem anyway since n is low. (This comment is not applicable to original poster, who needs to do this with String.)
  • Ondrej Bozek
    Ondrej Bozek almost 12 years
    @Chetan finding all duplicates from ArrayList in O(n), its important to have correctly defined equals method on objects which you have in the list (no problem for numbers): public Set<Object> findDuplicates(List<Object> list) { Set<Object> items = new HashSet<Object>(); Set<Object> duplicates = new HashSet<Object>(); for (Object item : list) { if (items.contains(item)) { duplicates.add(item); } else { items.add(item); } } return duplicates; }
  • Ondrej Bozek
    Ondrej Bozek almost 12 years
    @WowBow For example you can define Wrapper object which holds AwardYearSource. And define this Wrapper objects equals method based on AwardYearSources year field. Then you can use Set with these Wrapper objects.
  • shrini1000
    shrini1000 over 11 years
    @WowBow or implement Comparable/Comparator
  • Kevik
    Kevik almost 11 years
    this is great, and it gets even better if you change HashSet to LinkedHashSet
  • Jonik
    Jonik over 10 years
    A good practice would be to define variables using the interface types List and Set (instead of implementation types ArrayList and HashSet as in your example).
  • maaartinus
    maaartinus over 10 years
    It's slow and you might get a ConcurrentModificationException.
  • CarlJohn
    CarlJohn over 10 years
    @maaartinus Have you tried that code ?. It won't produce any exceptions.Also it is pretty fast. I tried the code before posting.
  • maaartinus
    maaartinus over 10 years
    You're right, it doesn't as you iterate the array instead of the list. However, it's slow like hell. Try it with a few millions elements. Compare it to ImmutableSet.copyOf(lst).toList().
  • ashes999
    ashes999 over 10 years
    You can clean this up by using new HashSet(al) instead of initializing it to empty and calling addAll.
  • abarnert
    abarnert over 9 years
    Why would you post a quadratic solution to a question that already has 2-year-old linear and log-linear solutions, that are also simpler?
  • jean d'arme
    jean d'arme over 8 years
    can I add rules for setting what's duplicate to me? For example: when my Object has several values if two of them repeat I consider them as duplicate (other values can be different) and use Set?
  • neo7
    neo7 over 8 years
    This implementation return no element in the list because of the last j--
  • Manash Ranjan Dakua
    Manash Ranjan Dakua over 8 years
    This implementation work's very fine.there is no issue behind this and for this task i am only use one arraylist.so this answer is completely good.before giving negative feedback you shold also add testcase also so that every one can understand the result.Thanks Manash
  • Maytham Fahmi
    Maytham Fahmi over 8 years
    @jonathan-stafford nice and useful peace of code. voteup
  • randers
    randers over 8 years
    This answer lacks two things: 1) It does not use generics, but raw types (ArrayList<T> should be used instead of ArrayList) 2) The explicit iterator creating can be avoided by using a for (T current : l1) { ... }. Even if you wanted to use an Iterator explicitly, iterador is misspelled.
  • Aniket Paul
    Aniket Paul about 8 years
    answers the question I was asked in the interview .. How to remove repeated values from an ArrayList without using Sets. Thanx
  • Patrick M
    Patrick M almost 8 years
    And this implementation runs in quadratic time, compared to the linked hash set implementation running in linear time. (i.e. this takes 10 times longer on a list with 10 elements, 10,000 times longer on a list with 10,000 elements. JDK 6 implementation for ArrayList.contains, JDK8 impl is the same.)
  • Patrick M
    Patrick M almost 8 years
    Internally, indexOf iterates the lst using a for loop.
  • Neon Warge
    Neon Warge almost 8 years
    One reason that I am force to use collection instead of set is due to HTTP API calls. The API returns a POJO of certain objects, I need to act data on it so I really need it not to be duplicated. For some reason, it wasn't part of the API, to return to me a list of non-duplicated data, I cannot wait until it is developed so I must find a way to present a list with no duplicates.
  • r00tandy
    r00tandy over 7 years
    And the OneLiner would be: myArrayList = new ArrayList(new HashSet(myArrayList)); (But just do it if you really need the ArrayList before AND after this line!)
  • Jarred Allen
    Jarred Allen over 7 years
    @jeand'arme If you use a TreeSet instead of a HashSet, you can define your own Comparator to use, and the TreeSet will consider two items to be duplicates if the Comparators .compare(e1, e2) returns 0. Note that this will destroy the existing order of the arraylist.
  • StackFlowed
    StackFlowed over 7 years
    How do i do this for case insensitive distinct ?
  • Muhammad Adil
    Muhammad Adil over 7 years
    Similarly at the bottom of the thread, I have given an answer where I am using Set for Custom Object. In a case if anyone have custom object like "Contact" or "Student" can use that answer that works fine for me.
  • Andy Turner
    Andy Turner over 6 years
    Note that there is an ImmutableSet.asList() method, returning an ImmutableList, if you need it back as a List.
  • Paul
    Paul over 6 years
    @StackFlowed If you don't need to preserve the order of the list you can addAll to new TreeSet<String>(String.CASE_INSENSITIVE_ORDER). The first element added will remain in the set so if your list contains "Dog" and "dog" (in that order) the TreeSet will contain "Dog". If order must be preserved then before the line in the answer put list.replaceAll(String::toUpperCase);.
  • Tushar Gogna
    Tushar Gogna over 6 years
    I liked this solution better.
  • Samir
    Samir about 6 years
    I am getting this error :incompatible types: List<Object> cannot be converted to List<String>
  • Diablo
    Diablo almost 6 years
    Yah, When i typed my previous comments, I was in a impression that parallel streams will give better performance always. But it's a myth. I later learned that there are certain scenarios where parallel streams should be used. In this scenario parallel streams will not give any better performance. and yes parallel streams might not give desired results some cases. List<String> deDupStringList3 = stringList.stream().map(String::toLowerCase).distinct().coll‌​ect(Collectors.toLis‌​t()); should be the suitable solution in this case
  • Giacomo
    Giacomo over 5 years
    what is the time complexity ?
  • ByWaleed
    ByWaleed about 5 years
    I think this is the best way of removing duplicated in an ArrayList. Definitely recommended. Thank you @Nenad for the answer.
  • TheRealChx101
    TheRealChx101 about 5 years
    The problem comes when you have to specifically access an element. For instance when binding an object to a list item view in Android, you are given its index. So Set cannot be used here.
  • Holger
    Holger almost 5 years
    When you use T element = iter.next();, you don’t need the unchecked type casts. Or you use for(T element: list) … instead of dealing with an Iterator manually.
  • Holger
    Holger almost 5 years
    “without using any other data structure”, except another ArrayList. Which will be very inefficient for larger lists.
  • Holger
    Holger almost 5 years
    “…and also not be quadratic in run time” Of course, this is quadratic in run time. It’s even worse than other quadratic solutions.
  • Holger
    Holger almost 5 years
    Calling indexOf on an ArrayList within a forEach is O(n²).
  • Holger
    Holger almost 5 years
    “if you are concerned about the overhead of HashSet” and are ignorant towards the overhead of Collectiors.sort
  • Holger
    Holger almost 5 years
    Not easier than Set<String> strSet = new HashSet<>(strList);
  • Holger
    Holger almost 5 years
    Try ListIterator<String> i = hashList.listIterator(); i.add("hello"); i.add("hello");. Alternatively, you can use hashList.add("a"); hashList.add("b"); hashList.replaceAll(x -> "hello");. And there might be more ways to counteract this approach in the future. The takeaway is that you shouldn’t try to enforce new contracts via subclassing with classes not designed for that. You will find more about this topic when you search for the “Prefer composition over inheritance” OOP rule.
  • Laser Infinite
    Laser Infinite over 4 years
    This is a simple solution in general but how do you remove the duplicates from an Arraylist of int[]?
  • Ajay Mistry
    Ajay Mistry about 4 years
    not working with ArrayList having model instead of string .
  • jvargas
    jvargas about 4 years
    How can I aproach this when the list is an object list
  • Holger
    Holger almost 4 years
    Alternative: Set<Object> set = new HashSet<>(); yourList.removeIf(x -> !set.add(x)); The advantage is that this alternative allows you to decide what to use in the set.add(…) expression, for your particular notion of “duplicate”. It’s also independent of the list’s actual elements type. It also retains the order, regardless of whether the set maintains the order or not. Also usable with a TreeSet (e.g. with custom comparator) instead of HashSet.
  • LiNKeR
    LiNKeR over 3 years
    that makes a lot of sense
  • kfir
    kfir over 2 years
    Perfect - just missing "repeated = false;" in the internal loop after the "if(!repeated) l2.add(c1);" otherwise it return a short list
  • Jwala Kumar
    Jwala Kumar about 2 years
    For those, who have problem in understanding ArrayList<Integer> arrList3 = new ArrayList<Integer>(Arrays.asList(1, 23, 1, 3, 3, 2, 3, 2, 12, 67, 23, 12, 34, 9)); Collection<Integer> set = new HashSet<>(arrList3); arrList3.clear(); arrList3.addAll(set); System.out.println(set);