codility absolute distinct count from an array

29,219

Solution 1

If the array is sorted you can find duplicates by looking a neightbours. To compare absolute values to need to start at both the start and the end. This avoid creating a new structure.

EDIT: IMHO HashMap/HashSet is O(log(log(n)) due to collisions, it is only O(1) if there is a perfect hash function. I would have thought not creating object which be much much faster but appears to be only 4x fast on my machine.

In summary, you can see that using a Set is simpler, clearer and easier to maintain. It is still very fast and would be the best solution in 98% of cases.

public static void main(String[] args) throws Exception {
    for (int len : new int[]{100 * 1000 * 1000, 10 * 1000 * 1000, 1000 * 1000, 100 * 1000, 10 * 1000, 1000}) {
        int[] nums = new int[len];
        for (int i = 0; i < len; i++)
            nums[i] = (int) (Math.random() * (Math.random() * 2001 - 1000));
        Arrays.sort(nums);

        long timeArray = 0;
        long timeSet = 0;
        int runs = len > 1000 * 1000 ? 10 : len >= 100 * 1000 ? 100 : 1000;
        for (int i = 0; i < runs; i++) {
            long time1 = System.nanoTime();
            int count = countDistinct(nums);
            long time2 = System.nanoTime();
            int count2 = countDistinctUsingSet(nums);
            long time3 = System.nanoTime();
            timeArray += time2 - time1;
            timeSet += time3 - time2;
            assert count == count2;
        }
        System.out.printf("For %,d numbers, using an array took %,d us on average, using a Set took %,d us on average, ratio=%.1f%n",
                len, timeArray / 1000 / runs, timeSet / 1000 / runs, 1.0 * timeSet / timeArray);
    }
}

private static int countDistinct(int[] nums) {
    int lastLeft = Math.abs(nums[0]);
    int lastRight = Math.abs(nums[nums.length - 1]);
    int count = 0;
    for (int a = 1, b = nums.length - 2; a <= b;) {
        int left = Math.abs(nums[a]);
        int right = Math.abs(nums[b]);
        if (left == lastLeft) {
            a++;
            lastLeft = left;
        } else if (right == lastRight) {
            b--;
            lastRight = right;
        } else if (lastLeft == lastRight) {
            a++;
            b--;
            lastLeft = left;
            lastRight = right;
            count++;
        } else if (lastLeft > lastRight) {
            count++;
            a++;
            lastLeft = left;
        } else {
            count++;
            b--;
            lastRight = right;
        }
    }
    count += (lastLeft == lastRight ? 1 : 2);
    return count;
}

private static int countDistinctUsingSet(int[] nums) {
    Set<Integer> s = new HashSet<Integer>();
    for (int n : nums)
        s.add(Math.abs(n));
    int count = s.size();
    return count;
}

prints

For 100,000,000 numbers, using an array took 279,623 us on average, using a Set took 1,270,029 us on average, ratio=4.5

For 10,000,000 numbers, using an array took 28,525 us on average, using a Set took 126,591 us on average, ratio=4.4

For 1,000,000 numbers, using an array took 2,846 us on average, using a Set took 12,131 us on average, ratio=4.3

For 100,000 numbers, using an array took 297 us on average, using a Set took 1,239 us on average, ratio=4.2

For 10,000 numbers, using an array took 42 us on average, using a Set took 156 us on average, ratio=3.7

For 1,000 numbers, using an array took 8 us on average, using a Set took 30 us on average, ratio=3.6


On @Kevin K's point, even Integer can have collision even through it's hash values are unique, it can map to the same bucket as the capacity is limited.

public static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

public static void main(String[] args) throws Exception {
    Map<Integer, Integer> map = new HashMap<Integer, Integer>(32, 2.0f);
    for (int i = 0; i < 10000 && map.size() < 32 * 2; i++) {
        if (hash(i) % 32 == 0)
            map.put(i, i);
    }
    System.out.println(map.keySet());
}

prints

[2032, 2002, 1972, 1942, 1913, 1883, 1853, 1823, 1763, 1729, 1703, 1669, 1642, 1608, 1582, 1548, 1524, 1494, 1456, 1426, 1405, 1375, 1337, 1307, 1255, 1221, 1187, 1153, 1134, 1100, 1066, 1032, 1016, 986, 956, 926, 881, 851, 821, 791, 747, 713, 687, 653, 610, 576, 550, 516, 508, 478, 440, 410, 373, 343, 305, 275, 239, 205, 171, 137, 102, 68, 34, 0]

The values are in reverse order because the HashMap has generated into a LinkedList.

Solution 2

You should have pay attention to the fact that the array is sorted in ascending order.

Lets assume that there are only positive numbers, or the question was not about absolute distinct.

The you could count the Number by iterating trough the list, and increment the counter by one, if the actual element is different from the last. (and +1 for the first element)

If you understand that, you can add the absolute distinct constraint. For example by improving the algorithm by two pointers, one starting from the begin, one from the end. Then you have also to take care that both pointers works like in parallel, so that both pointers will end at 0 or the absoulte lowest number (positive/negative) - This will complicate the whole stuff a bit, but it is possible.

Solution 3

this is what i came up with, not sure if the fact that it contains a for loop inside the while deems it as typical noobie mistake.

    private int getDistict(int[] myaa) {
        int dupcount=0;
        int i = 0;
        int j = myaa.length - 1;
        while (i < j) {
    //      check neighbors 
            if(Math.abs(myaa[i]) == Math.abs(myaa[i+1])) {
                dupcount++;
                i++;
                continue;
            }
//      check the other side
            if(myaa[i] < 0) {
                for(int k = j; Math.abs(myaa[i]) <= Math.abs(myaa[k]); k-- ) {
                    if(Math.abs(myaa[i])==Math.abs(myaa[k])){
                        dupcount++;
                    }
                }               
            }
            i++;
        }
        return myaa.length - dupcount;
    }

Solution 4

int count(vector<int> &A) {

    int len = B.size();
    if (len <= 0)
        return 0;

    // make a copy and calc absolutes of all items
    vector<int> B = vector<int>(A);
    for (int i = 0; i < len; i++) {
        if (B[i] < 0) 
        B[i] = -B[i];
    }

    // and sort so we have a simple absolute count
    sort(B.begin(), B.end());

    int result = 1; //count first number always
    for (int j = 1; j < len; j++) {
        if (B[j] != B[j-1])
            result++;
    }
    return result;

}

Solution 5

Heres what I coded.....Let me know if it can be improved....

import java.util.Arrays;
import java.util.HashSet;

/********
Joel 
Jun 19, 2013
 ********/

public class CodilityTest {

    private int[] inputArray;
    public static int count=0;

    public void setInput(int [] givenIP){
        this.inputArray= new int[givenIP.length];
        for(int i=0; i<givenIP.length;i++)
        { inputArray[i] = givenIP[i];}
    }

    public int[] getInput(){
        return this.inputArray;
    }

    public CodilityTest(){
        count=0;
    }

    public static void main(String[] args) throws Exception {
        // TODO Auto-generated method stub

        CodilityTest o_tc = new CodilityTest();

        int [] x = {1,2,-3,4,-5,-11,-2,3,-4,5};
        int [] y = new int[0];
        o_tc.setInput(x);
        o_tc.getOutput(x);
        System.out.println(count);

        CodilityTest o_tc1 = new CodilityTest();
        o_tc1.getOutput(y);     
    }

    public void getOutput(int [] givenInput)
    {
        if(givenInput == null)
        {System.out.println("Please Enter some variables Sir."); return;}

        if(givenInput.length<1)
        {System.out.println("Please Enter some variables Sir."); return; }

        if(givenInput.length < 2)
        {count+=1; return;}

        HashSet<Integer> o_hs = new HashSet<Integer>();
        for(int numCount=0; numCount<givenInput.length;numCount++)
        {           
            if(o_hs.add(Math.abs(givenInput[numCount])))
            {count+=1;}
        }

    }

}
Share:
29,219
yahh
Author by

yahh

Updated on July 09, 2022

Comments

  • yahh
    yahh almost 2 years

    so i took the codility interview test yesterday and was informed today that i failed, unfortunately i wasnt given any other information by either codility nor the employer as to where i screwed up so i would appreciate some help in knowing where i went wrong. i know codility pays alot of emphasis on how fast the program runs and how it behaves for large numbers. now i didnt copy paste the questions so this the approx of what i remember

    1. count the number of elements in an array a which are absolute distinct, what it means is if an array had -3 and 3 in it these numbers are not distinct because|-3|=|3|. i think an example would clear it up better

    a={-5,-3,0,1,-3} the result would be 4 because there are 4 absolute distinct elements in this array.

    The question also stated that a.length would be <=10000, and most importantly it stated that assume that the array is sorted in ascending order but i didnt really understand why we would need it to be sorted

    IF YOU THINK I MISSED SOMETHING ASK AND I WILL TRY TO CLEAR UP THE QUESTION FURTHER.

    here is my code

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Set;
    
    
    public class test2 {
    
        int test(int[] a){
            Set<Integer> s=new HashSet<Integer>();
    
            for(int i=0;i<a.length;i++){
                s.add(Math.abs(a[i]));
    
            }
            return s.size();
    
        }
    
        public static void main(String[] args) {
            test2 t=new test2();
            int[] a={1,1,1,2,-1};
            System.out.println(t.test(a));
    
        }
    
    }
    
  • yahh
    yahh about 13 years
    but even though i created a new structure i am still taking the same amount of time to solve the problem
  • Vishy
    Vishy about 13 years
    It may appear to be the same amount of time, but if you measure it for a large array of up to 10,000 entries you will find there is a very big relative difference. However, you are right that a simple solution is often the best and fast enough.
  • Ralph
    Ralph about 13 years
    @yahh: the trick is that you iterate only once over the array. You do not need the time to manage the set (insert). -- You need to think in complexity (O-Notation) not in secondes.
  • Vishy
    Vishy about 13 years
    Or create any objects which is the really expensive part.
  • Buhb
    Buhb about 13 years
    Same time complexity, bur a lookup in a set still takes time. Also, the set needs memory. Try it in a profiler, I'm guessing your solution takes twice as long.
  • yahh
    yahh about 13 years
    i do understand the fact that my algorithm would waste space but it would take the same amount of time to solve. the website codility doesnt really bother much with space more with running time
  • Stephen Chung
    Stephen Chung about 13 years
    @yahh, your method is not O(n) time. It requires a lookup for containment within a set, which is probably O(nLogn) if the set if implemented as a tree, or O(n) if implemented as a hash table, but then you'll have the hash function to calculate. Multiply this by huge amounts of data, and the CPU and memory costs will add up quickly. Do not assume something has no or little cost when the costs may be hidden inside its implementation (e.g. in your HashSet).
  • Vishy
    Vishy about 13 years
    I would bet closer to 100x times as long. ;)
  • Buhb
    Buhb about 13 years
    @stephen: insertion and lookup are both O(1) for HashSet.
  • Buhb
    Buhb about 13 years
    @peter: I meant at least twice.
  • Christian
    Christian about 13 years
    @yahh Well I think in an interview what really counts is the BEST way of doing it. This means in terms of space and memory. Ralphs method with two pointers runs in O(n) and takes O(1) memory. That is the best you can do. So this is obviously the correct answer. I suppose only the perfect answer gives the points.
  • Christian
    Christian about 13 years
    by the way: What I learned from those interviews / tests is that an error in the code is no problem. If your code does not compile that's not as interesting as the algorithm / idea behind! In most interviews you write code on a piece of paper, where normally even the best programmers do make syntax errors etc.
  • brain
    brain about 13 years
    I dont think you should assume that the list doesn't contain duplicates.
  • blaze
    blaze about 13 years
    yahh: your code us actually slower. java.util.HashSet has constant time add() only if hash function is perfect and no conflicts occurs. It's O(bucket length) otherwise, so in worst case you have O(n^2) performance.
  • yahh
    yahh about 13 years
    @blaze: shouldnt i assume that when using the hash function for primitives it would work perfectly.
  • blaze
    blaze about 13 years
    @yahh: definitely no. Perfect hashing is when each value hashes to distinct result and your values are "just numbers", without range limitations. Perfect hash function can't do better then return number itself as hash value and it's total waste of memory to allocate 2^N buckets. So you will definitely have some imperfect hashing and possibly buckets reallocations.
  • Buhb
    Buhb about 13 years
    Nice to have numbers on the comparison
  • Nikolay Polivanov
    Nikolay Polivanov over 12 years
    You can implement comparator for sort method so that comparison would be made for absolute values. It would simplify countDistinct algorithm.
  • Vishy
    Vishy over 12 years
    @p.kolya That might not work so well with an int[] without using extra memory.
  • Nikolay Polivanov
    Nikolay Polivanov over 12 years
    @PeterLawrey: if the array is sorted by absolute values next step would be to iterate once through the array counting absolutely distinct values (calling abs everytime) and leaving out absolutely equal ones. It doesn't requires extra memory, has the same complexity, but simplifies algorithm.
  • Vishy
    Vishy over 12 years
    @p.kolya True, if you had a sort which sorted by absolute value, it would make the algo simpler. However, I don't know of any built in one and you can't use a Comparator to do it because it only works on Integer not int
  • Kevin K
    Kevin K almost 11 years
    "IMHO HashMap/HashSet is O(log(log(n)) due to collisions, it is only O(1) if there is a perfect hash function" - In the case of Integers, the hash function is perfect...it's just the value of the integer, so there are no collisions.
  • Vishy
    Vishy almost 11 years
    @KevinK It is not perfect because you don't have an unlimited capacity. With a capacity, even Integer will give you collisions.
  • Vishy
    Vishy almost 11 years
    @KevinK I have added an example to be answer showing how Integers can have significant collisions (in this case every integer has a collision)
  • Kevin K
    Kevin K almost 11 years
    d'oh, I forgot about that! There are no hash code collisions, but of course the map only has a limited number of buckets :) Thanks for taking the time to answer, and sorry for the added confusion!
  • Vishy
    Vishy almost 11 years
    @KevinK +1 Most people don't get as far and think about what you did. It was a very good point to raise.
  • JehandadK
    JehandadK over 8 years
    But the correct answer is 5 for the first set of input!
  • Richard Erickson
    Richard Erickson about 8 years
    welcome to SO and thank you for posting an answer. Please consider expanding it and explaining your code.
  • user2286810
    user2286810 over 3 years
    You got 78% on codility's server:▶example example test✔OK expand allCorrectness tests ▶one_element✔OK ▶two_elements✔OK ▶same_elements✔OK ▶simple✔OK ▶simple_no_zero✘RUNTIME ERROR tested program terminated with exit code 1 ▶simple_no_same✔OK ▶simple_no_negative✔OK ▶simple_no_positive✔OK ▶arith_overlow✔OK ▶medium_chaotic1✘RUNTIME ERROR tested program terminated with exit code 1 ▶medium_chaotic2✔OK expand allPerformance tests ▶long_sequence_no_negative✔OK ▶long_sequence_no_positive✔OK ▶long_sequence✘RUNTIME ERROR tested program terminated with exit code 1
  • Kunal
    Kunal almost 3 years
    This is the smallest and the most precise answer written in java 8, please mail me if the explanation of each step is needed, I will be happy to share