How to efficiently remove duplicates from an array without using Set

461,898

Solution 1

Since this question is still getting a lot of attention, I decided to answer it by copying this answer from Code Review.SE:

You're following the same philosophy as the bubble sort, which is very, very, very slow. Have you tried this?:

  • Sort your unordered array with quicksort. Quicksort is much faster than bubble sort (I know, you are not sorting, but the algorithm you follow is almost the same as bubble sort to traverse the array).

  • Then start removing duplicates (repeated values will be next to each other). In a for loop you could have two indices: source and destination. (On each loop you copy source to destination unless they are the same, and increment both by 1). Every time you find a duplicate you increment source (and don't perform the copy). @morgano

Solution 2

you can take the help of Set collection

int end = arr.length;
Set<Integer> set = new HashSet<Integer>();

for(int i = 0; i < end; i++){
  set.add(arr[i]);
}

now if you will iterate through this set, it will contain only unique values. Iterating code is like this :

Iterator it = set.iterator();
while(it.hasNext()) {
  System.out.println(it.next());
}

Solution 3

If you are allowed to use Java 8 streams:

Arrays.stream(arr).distinct().toArray();

Solution 4

Note: I am assuming the array is sorted.

Code:

int[] input = new int[]{1, 1, 3, 7, 7, 8, 9, 9, 9, 10};
int current = input[0];
boolean found = false;

for (int i = 0; i < input.length; i++) {
    if (current == input[i] && !found) {
        found = true;
    } else if (current != input[i]) {
        System.out.print(" " + current);
        current = input[i];
        found = false;
    }
}
System.out.print(" " + current);

output:

  1 3 7 8 9 10

Solution 5

Slight modification to the original code itself, by removing the innermost for loop.

public static int[] removeDuplicates(int[] arr){
    int end = arr.length;

    for (int i = 0; i < end; i++) {
        for (int j = i + 1; j < end; j++) {
            if (arr[i] == arr[j]) {                  
                /*int shiftLeft = j;
                for (int k = j+1; k < end; k++, shiftLeft++) {
                    arr[shiftLeft] = arr[k];
                }*/
                arr[j] = arr[end-1];
                end--;
                j--;
            }
        }
    }

    int[] whitelist = new int[end];
    /*for(int i = 0; i < end; i++){
        whitelist[i] = arr[i];
    }*/
    System.arraycopy(arr, 0, whitelist, 0, end);
    return whitelist;
}
Share:
461,898
ashur
Author by

ashur

Updated on July 08, 2022

Comments

  • ashur
    ashur almost 2 years

    I was asked to write my own implementation to remove duplicated values in an array. Here is what I have created. But after tests with 1,000,000 elements it took very long time to finish. Is there something that I can do to improve my algorithm or any bugs to remove ?

    I need to write my own implementation - not to use Set, HashSet etc. Or any other tools such as iterators. Simply an array to remove duplicates.

    public static int[] removeDuplicates(int[] arr) {
    
        int end = arr.length;
    
        for (int i = 0; i < end; i++) {
            for (int j = i + 1; j < end; j++) {
                if (arr[i] == arr[j]) {                  
                    int shiftLeft = j;
                    for (int k = j+1; k < end; k++, shiftLeft++) {
                        arr[shiftLeft] = arr[k];
                    }
                    end--;
                    j--;
                }
            }
        }
    
        int[] whitelist = new int[end];
        for(int i = 0; i < end; i++){
            whitelist[i] = arr[i];
        }
        return whitelist;
    }