Groovy: Difference with a.intersect( b ) and b.intersect( a )

13,433

If you look at the source for Collection.intersect, you can see that the logic of the method follows this flow:

for two collections, left and right

  1. Swap left and right if left is smaller than right
  2. Add all of left into a Set (removes duplicates)
  3. For each element in right if it exists in the leftSet, then add it to the results

So, for your last 2 examples;

def array1 = ["hello", "world", "test", "test"]
def array2 = ["world", "world", "world", "test"]

array1.intersect( array2 ) would give (if we wrote that same algorithm in Groovy):

leftSet = new TreeSet( array1 ) // both same size, so no swap
// leftSet = [ 'hello', 'world', 'test' ]
right   = array2
result = right.findAll { e -> leftSet.contains( e ) }

Which (if you run it), you can see means result has the value [world, world, world, test] (as you found). This is because every element in right can be found in the leftSet

Not sure why the first example should return ["world","world"] though...

later...

So, what I think you are looking for would be something like this:

def array1 = ["hello", "world", "test", "test"]
def array2 = ["world", "world", "world", "test"]
def intersect1 = array1.intersect( array2 ) as TreeSet
def intersect2 = array2.intersect( array1 ) as TreeSet
assert intersect1 == intersect2

in order that you cope with the duplicates in the collections, as then both intersect1 and intersect2 will be equal to

[test, world]

later still

I believe this does what you want:

[array1,array2]*.groupBy{it}.with { a, b -> assert a == b }
Share:
13,433
divillysausages
Author by

divillysausages

Updated on June 20, 2022

Comments

  • divillysausages
    divillysausages almost 2 years

    Why in Groovy, when I create 2 lists, is there a difference if I do a.intersect( b ) and b.intersect( a ):

    def list1 = ["hello", "world", "world"];
    def list2 = ["world", "world", "world"];
    
    println( "Intersect list1 with list2: " + list1.intersect( list2 ) );
    println( "Intersect list2 with list1: " + list2.intersect( list1) );
    

    traces:

    Intersect list1 with list2: [world, world, world]
    Intersect list2 with list1: [world, world]
    

    (you can copy it here: http://groovyconsole.appspot.com/ if you want to test it)

    If the arrays all contain unique elements, then it works as normal. Once you begin to add duplicates, it gets weird:

    def list1 = ["hello", "world", "test", "test"];
    def list2 = ["world", "world", "world", "test"];
    
    println( "Intersect list1 with list2: " + list1.intersect( list2 ) );
    println( "Intersect list2 with list1: " + list2.intersect( list1 ) );
    

    traces:

    Intersect list1 with list2: [world, world, world, test]
    Intersect list2 with list1: [world, test, test]
    

    I thought the whole point of intersect() was to give you the common elements, so it didn't matter which order you put them in?

    If this isn't the case, how can I get only the common elements (expect duplicates in the array). E.g. example one should return ["world", "world"] and example two should return ["world", "test"]

    Edit

    To clarify a bit, this code should test that user data is still the same (assuming they disconnected in the middle of something and we want to make sure the data hasn't been tampered with, or is in the same state as before).

    The order of the lists can't be guaranteed (the user could reorder it, but it's still technically the "same"), and duplicates are possible.

    So something like: ["one", "one", "two"] should match ["two", "one", "one"], whereas any addition to the lists, or change in data shouldn't match.

  • divillysausages
    divillysausages almost 13 years
    "Not sure why the first example should return ["world","world"]" - basically i'm trying to see if a list has changed between 2 different times (server code - i want to make sure the user has the same data when he comes back and it's not been tampered with). In the first example, there's 2 occurences of "world" in each list, which is why I was looking for a result like ["world", "world"]
  • divillysausages
    divillysausages almost 13 years
    thanks for the code example, but it won't work with something like array1 as ["hello", "world", "world"];, while array2 is ["hello", "hello", "world"]; - they both contain duplicates, but using TreeSet will assume both lists are the same (as duplicates are removed)
  • tim_yates
    tim_yates almost 13 years
    @divilly Ahhhh... That's true... Can you explain more what it is you're trying to do? And why assert ["hello", "world", "world"] != ["hello", "hello", "world"] isn't enough? I'm not sure intersection is going to help you here...but with our combined powers, we must be able to come up with a working algorithm...
  • divillysausages
    divillysausages almost 13 years
    ok, got it working with the following: 1) check size of array1 and array2 before. if they're different, then fail. 2) def array3 = new ArrayList( array1 );, 3) array2.each { array3.remove( it ); } 4) if array3's size is 0 at the end, they're the same
  • OverZealous
    OverZealous almost 13 years
    @tim_yates For that last one, couldn't you just do assert array1.sort() == array2.sort()? This would be a lot easier to understand what's happening!
  • tim_yates
    tim_yates almost 13 years
    @Over yeah... Though you probably want Groovy 1.8.1+ and array1.sort( false ), as otherwise sort actually changes the order of array1