codility absolute distinct count from an array
Solution 1
If the array is sorted you can find duplicates by looking a neightbours. To compare absolute values to need to start at both the start and the end. This avoid creating a new structure.
EDIT: IMHO HashMap/HashSet is O(log(log(n)) due to collisions, it is only O(1) if there is a perfect hash function. I would have thought not creating object which be much much faster but appears to be only 4x fast on my machine.
In summary, you can see that using a Set is simpler, clearer and easier to maintain. It is still very fast and would be the best solution in 98% of cases.
public static void main(String[] args) throws Exception {
for (int len : new int[]{100 * 1000 * 1000, 10 * 1000 * 1000, 1000 * 1000, 100 * 1000, 10 * 1000, 1000}) {
int[] nums = new int[len];
for (int i = 0; i < len; i++)
nums[i] = (int) (Math.random() * (Math.random() * 2001 - 1000));
Arrays.sort(nums);
long timeArray = 0;
long timeSet = 0;
int runs = len > 1000 * 1000 ? 10 : len >= 100 * 1000 ? 100 : 1000;
for (int i = 0; i < runs; i++) {
long time1 = System.nanoTime();
int count = countDistinct(nums);
long time2 = System.nanoTime();
int count2 = countDistinctUsingSet(nums);
long time3 = System.nanoTime();
timeArray += time2 - time1;
timeSet += time3 - time2;
assert count == count2;
}
System.out.printf("For %,d numbers, using an array took %,d us on average, using a Set took %,d us on average, ratio=%.1f%n",
len, timeArray / 1000 / runs, timeSet / 1000 / runs, 1.0 * timeSet / timeArray);
}
}
private static int countDistinct(int[] nums) {
int lastLeft = Math.abs(nums[0]);
int lastRight = Math.abs(nums[nums.length - 1]);
int count = 0;
for (int a = 1, b = nums.length - 2; a <= b;) {
int left = Math.abs(nums[a]);
int right = Math.abs(nums[b]);
if (left == lastLeft) {
a++;
lastLeft = left;
} else if (right == lastRight) {
b--;
lastRight = right;
} else if (lastLeft == lastRight) {
a++;
b--;
lastLeft = left;
lastRight = right;
count++;
} else if (lastLeft > lastRight) {
count++;
a++;
lastLeft = left;
} else {
count++;
b--;
lastRight = right;
}
}
count += (lastLeft == lastRight ? 1 : 2);
return count;
}
private static int countDistinctUsingSet(int[] nums) {
Set<Integer> s = new HashSet<Integer>();
for (int n : nums)
s.add(Math.abs(n));
int count = s.size();
return count;
}
prints
For 100,000,000 numbers, using an array took 279,623 us on average, using a Set took 1,270,029 us on average, ratio=4.5
For 10,000,000 numbers, using an array took 28,525 us on average, using a Set took 126,591 us on average, ratio=4.4
For 1,000,000 numbers, using an array took 2,846 us on average, using a Set took 12,131 us on average, ratio=4.3
For 100,000 numbers, using an array took 297 us on average, using a Set took 1,239 us on average, ratio=4.2
For 10,000 numbers, using an array took 42 us on average, using a Set took 156 us on average, ratio=3.7
For 1,000 numbers, using an array took 8 us on average, using a Set took 30 us on average, ratio=3.6
On @Kevin K's point, even Integer can have collision even through it's hash values are unique, it can map to the same bucket as the capacity is limited.
public static int hash(int h) {
// This function ensures that hashCodes that differ only by
// constant multiples at each bit position have a bounded
// number of collisions (approximately 8 at default load factor).
h ^= (h >>> 20) ^ (h >>> 12);
return h ^ (h >>> 7) ^ (h >>> 4);
}
public static void main(String[] args) throws Exception {
Map<Integer, Integer> map = new HashMap<Integer, Integer>(32, 2.0f);
for (int i = 0; i < 10000 && map.size() < 32 * 2; i++) {
if (hash(i) % 32 == 0)
map.put(i, i);
}
System.out.println(map.keySet());
}
prints
[2032, 2002, 1972, 1942, 1913, 1883, 1853, 1823, 1763, 1729, 1703, 1669, 1642, 1608, 1582, 1548, 1524, 1494, 1456, 1426, 1405, 1375, 1337, 1307, 1255, 1221, 1187, 1153, 1134, 1100, 1066, 1032, 1016, 986, 956, 926, 881, 851, 821, 791, 747, 713, 687, 653, 610, 576, 550, 516, 508, 478, 440, 410, 373, 343, 305, 275, 239, 205, 171, 137, 102, 68, 34, 0]
The values are in reverse order because the HashMap has generated into a LinkedList.
Solution 2
You should have pay attention to the fact that the array is sorted in ascending order.
Lets assume that there are only positive numbers, or the question was not about absolute distinct.
The you could count the Number by iterating trough the list, and increment the counter by one, if the actual element is different from the last. (and +1 for the first element)
If you understand that, you can add the absolute distinct constraint. For example by improving the algorithm by two pointers, one starting from the begin, one from the end. Then you have also to take care that both pointers works like in parallel, so that both pointers will end at 0 or the absoulte lowest number (positive/negative) - This will complicate the whole stuff a bit, but it is possible.
Solution 3
this is what i came up with, not sure if the fact that it contains a for loop inside the while deems it as typical noobie mistake.
private int getDistict(int[] myaa) {
int dupcount=0;
int i = 0;
int j = myaa.length - 1;
while (i < j) {
// check neighbors
if(Math.abs(myaa[i]) == Math.abs(myaa[i+1])) {
dupcount++;
i++;
continue;
}
// check the other side
if(myaa[i] < 0) {
for(int k = j; Math.abs(myaa[i]) <= Math.abs(myaa[k]); k-- ) {
if(Math.abs(myaa[i])==Math.abs(myaa[k])){
dupcount++;
}
}
}
i++;
}
return myaa.length - dupcount;
}
Solution 4
int count(vector<int> &A) {
int len = B.size();
if (len <= 0)
return 0;
// make a copy and calc absolutes of all items
vector<int> B = vector<int>(A);
for (int i = 0; i < len; i++) {
if (B[i] < 0)
B[i] = -B[i];
}
// and sort so we have a simple absolute count
sort(B.begin(), B.end());
int result = 1; //count first number always
for (int j = 1; j < len; j++) {
if (B[j] != B[j-1])
result++;
}
return result;
}
Solution 5
Heres what I coded.....Let me know if it can be improved....
import java.util.Arrays;
import java.util.HashSet;
/********
Joel
Jun 19, 2013
********/
public class CodilityTest {
private int[] inputArray;
public static int count=0;
public void setInput(int [] givenIP){
this.inputArray= new int[givenIP.length];
for(int i=0; i<givenIP.length;i++)
{ inputArray[i] = givenIP[i];}
}
public int[] getInput(){
return this.inputArray;
}
public CodilityTest(){
count=0;
}
public static void main(String[] args) throws Exception {
// TODO Auto-generated method stub
CodilityTest o_tc = new CodilityTest();
int [] x = {1,2,-3,4,-5,-11,-2,3,-4,5};
int [] y = new int[0];
o_tc.setInput(x);
o_tc.getOutput(x);
System.out.println(count);
CodilityTest o_tc1 = new CodilityTest();
o_tc1.getOutput(y);
}
public void getOutput(int [] givenInput)
{
if(givenInput == null)
{System.out.println("Please Enter some variables Sir."); return;}
if(givenInput.length<1)
{System.out.println("Please Enter some variables Sir."); return; }
if(givenInput.length < 2)
{count+=1; return;}
HashSet<Integer> o_hs = new HashSet<Integer>();
for(int numCount=0; numCount<givenInput.length;numCount++)
{
if(o_hs.add(Math.abs(givenInput[numCount])))
{count+=1;}
}
}
}
yahh
Updated on July 09, 2022Comments
-
yahh almost 2 years
so i took the codility interview test yesterday and was informed today that i failed, unfortunately i wasnt given any other information by either codility nor the employer as to where i screwed up so i would appreciate some help in knowing where i went wrong. i know codility pays alot of emphasis on how fast the program runs and how it behaves for large numbers. now i didnt copy paste the questions so this the approx of what i remember
- count the number of elements in an array a which are absolute distinct, what it means is if an array had -3 and 3 in it these numbers are not distinct because|-3|=|3|. i think an example would clear it up better
a={-5,-3,0,1,-3} the result would be 4 because there are 4 absolute distinct elements in this array.
The question also stated that a.length would be <=10000, and most importantly it stated that assume that the array is sorted in ascending order but i didnt really understand why we would need it to be sorted
IF YOU THINK I MISSED SOMETHING ASK AND I WILL TRY TO CLEAR UP THE QUESTION FURTHER.
here is my code
import java.util.HashMap; import java.util.HashSet; import java.util.Set; public class test2 { int test(int[] a){ Set<Integer> s=new HashSet<Integer>(); for(int i=0;i<a.length;i++){ s.add(Math.abs(a[i])); } return s.size(); } public static void main(String[] args) { test2 t=new test2(); int[] a={1,1,1,2,-1}; System.out.println(t.test(a)); } }
-
yahh about 13 yearsbut even though i created a new structure i am still taking the same amount of time to solve the problem
-
Vishy about 13 yearsIt may appear to be the same amount of time, but if you measure it for a large array of up to 10,000 entries you will find there is a very big relative difference. However, you are right that a simple solution is often the best and fast enough.
-
Ralph about 13 years@yahh: the trick is that you iterate only once over the array. You do not need the time to manage the set (insert). -- You need to think in complexity (O-Notation) not in secondes.
-
Vishy about 13 yearsOr create any objects which is the really expensive part.
-
Buhb about 13 yearsSame time complexity, bur a lookup in a set still takes time. Also, the set needs memory. Try it in a profiler, I'm guessing your solution takes twice as long.
-
yahh about 13 yearsi do understand the fact that my algorithm would waste space but it would take the same amount of time to solve. the website codility doesnt really bother much with space more with running time
-
Stephen Chung about 13 years@yahh, your method is not O(n) time. It requires a lookup for containment within a set, which is probably O(nLogn) if the set if implemented as a tree, or O(n) if implemented as a hash table, but then you'll have the hash function to calculate. Multiply this by huge amounts of data, and the CPU and memory costs will add up quickly. Do not assume something has no or little cost when the costs may be hidden inside its implementation (e.g. in your HashSet).
-
Vishy about 13 yearsI would bet closer to 100x times as long. ;)
-
Buhb about 13 years@stephen: insertion and lookup are both O(1) for HashSet.
-
Buhb about 13 years@peter: I meant at least twice.
-
Christian about 13 years@yahh Well I think in an interview what really counts is the BEST way of doing it. This means in terms of space and memory. Ralphs method with two pointers runs in O(n) and takes O(1) memory. That is the best you can do. So this is obviously the correct answer. I suppose only the perfect answer gives the points.
-
Christian about 13 yearsby the way: What I learned from those interviews / tests is that an error in the code is no problem. If your code does not compile that's not as interesting as the algorithm / idea behind! In most interviews you write code on a piece of paper, where normally even the best programmers do make syntax errors etc.
-
brain about 13 yearsI dont think you should assume that the list doesn't contain duplicates.
-
blaze about 13 yearsyahh: your code us actually slower. java.util.HashSet has constant time add() only if hash function is perfect and no conflicts occurs. It's O(bucket length) otherwise, so in worst case you have O(n^2) performance.
-
yahh about 13 years@blaze: shouldnt i assume that when using the hash function for primitives it would work perfectly.
-
blaze about 13 years@yahh: definitely no. Perfect hashing is when each value hashes to distinct result and your values are "just numbers", without range limitations. Perfect hash function can't do better then return number itself as hash value and it's total waste of memory to allocate 2^N buckets. So you will definitely have some imperfect hashing and possibly buckets reallocations.
-
Buhb about 13 yearsNice to have numbers on the comparison
-
Nikolay Polivanov over 12 yearsYou can implement comparator for sort method so that comparison would be made for absolute values. It would simplify
countDistinct
algorithm. -
Vishy over 12 years@p.kolya That might not work so well with an
int[]
without using extra memory. -
Nikolay Polivanov over 12 years@PeterLawrey: if the array is sorted by absolute values next step would be to iterate once through the array counting absolutely distinct values (calling
abs
everytime) and leaving out absolutely equal ones. It doesn't requires extra memory, has the same complexity, but simplifies algorithm. -
Vishy over 12 years@p.kolya True, if you had a sort which sorted by absolute value, it would make the algo simpler. However, I don't know of any built in one and you can't use a Comparator to do it because it only works on
Integer
notint
-
Kevin K almost 11 years"IMHO HashMap/HashSet is O(log(log(n)) due to collisions, it is only O(1) if there is a perfect hash function" - In the case of
Integer
s, the hash function is perfect...it's just the value of the integer, so there are no collisions. -
Vishy almost 11 years@KevinK It is not perfect because you don't have an unlimited capacity. With a capacity, even Integer will give you collisions.
-
Vishy almost 11 years@KevinK I have added an example to be answer showing how Integers can have significant collisions (in this case every integer has a collision)
-
Kevin K almost 11 yearsd'oh, I forgot about that! There are no hash code collisions, but of course the map only has a limited number of buckets :) Thanks for taking the time to answer, and sorry for the added confusion!
-
Vishy almost 11 years@KevinK +1 Most people don't get as far and think about what you did. It was a very good point to raise.
-
JehandadK over 8 yearsBut the correct answer is 5 for the first set of input!
-
Richard Erickson about 8 yearswelcome to SO and thank you for posting an answer. Please consider expanding it and explaining your code.
-
user2286810 over 3 yearsYou got 78% on codility's server:▶example example test✔OK expand allCorrectness tests ▶one_element✔OK ▶two_elements✔OK ▶same_elements✔OK ▶simple✔OK ▶simple_no_zero✘RUNTIME ERROR tested program terminated with exit code 1 ▶simple_no_same✔OK ▶simple_no_negative✔OK ▶simple_no_positive✔OK ▶arith_overlow✔OK ▶medium_chaotic1✘RUNTIME ERROR tested program terminated with exit code 1 ▶medium_chaotic2✔OK expand allPerformance tests ▶long_sequence_no_negative✔OK ▶long_sequence_no_positive✔OK ▶long_sequence✘RUNTIME ERROR tested program terminated with exit code 1
-
Kunal almost 3 yearsThis is the smallest and the most precise answer written in java 8, please mail me if the explanation of each step is needed, I will be happy to share