How would I compare two Lists(Of <CustomClass>) in VB?

16,736

Your example should return false. List<> is ordered collection, and if X contains foo,bar,baz, it is not the same as baz,foo,bar. You are not supposed to change this kind of expected behavior! What you describe is a set or bag, not a list. Bag is similar to list, but doesn't care about the order of items. Then it would obviously return true in your example.

Edit for clarity: List<> is ordered by indices of items, not their values. And this important property of this kind of collection affects its behavior and speed in all operations.

OK, forget about it, and let's say you just want to test it as you defined. The fastest universally-applicable algorithm will be O(n*log2n) if you sort the contents and then compare.

Options:

  1. You can sort both lists to two temp collections, then simply compare their items.
  2. You can sort X to a temp collection, and then sequentially go thru Y and try to find each item of Y in X and remove it. In the end you can see if temp is empty, then X and Y are the same.
  3. If you often compare for equality and not so often insert new items, you can create some new "smart" List, which will sort items on insertion. It will consist of a real List<> and some kind of sorted collection inside.

Rough analysis of complexity:

Options 1. and 2. are O(n) on insertion of n items, and O(n*log2n) on comparison of two lists.

Option 3. is O(n*log2n) on insertion of n items, and O(n) on comparison of two lists.

If you want to get under O(n*log2n), you probably have to go to a different collection than List<>. If you use class HashSet instead (present since .NET 3.5), you can probably get a much faster result. But I haven't tried it in praxis. HashSet is an implementation of unordered collection aka set/bag. It doesn't support comparison of two hashsets, but you can do this on your own this way:

Use foreach to iterate thru items of X and try to find each item in Y.

if(X.Count != Y.Count) return false;
foreach(var i in X) {
  if(Y.Contains(i)==false) return false;
}
return true;

This code is very slow on lists, it's O(n^2). But it should be quite fast on HashSets, O(n). If you don't encounter any more hidden problems, this should fly like a rocket!

In praxis, if you need to stay with List<> and want the speed of HashSet<>, you can rewrite just one of your collection (either X or Y) to a new class which will contain two collections inside - a List<> and a HashSet<> - and will store all items to both of them.

You can also use a classic Dictionary<>, HashTable or some other classic collection of .NET to implement the same thing. They will run faster than comparing two Lists, but probably need more memory and not be as fast as HashSets.

Share:
16,736
Kumba
Author by

Kumba

Look at me still talking when there’s science to do, When I look out there it makes me glad I’m not you, I’ve experiments to run, there is research to be done, On the people who are still alive...

Updated on June 04, 2022

Comments

  • Kumba
    Kumba almost 2 years

    I'm working on implementing the equality operator = for a custom class of mine. The class has one property, Value, which is itself a List(Of OtherClass), where OtherClass is yet another custom class in my project.

    I've already implemented the IComparer, IComparable, IEqualityComparer, and IEquatable interfaces, the operators =, <>, bool and not, and overriden Equals and GetHashCode for OtherClass. This should give me all the tools I need to compare these objects, and various tests comparing two singular instances of these objects so far checks out.

    However, I'm not sure how to approach this when they are in a List. I don't care about the list order. Given:

    Dim x As New List(Of OtherClass) From
        {New OtherClass("foo"),
         New OtherClass("bar"),
         New OtherClass("baz")}
    
    Dim y As New List(Of OtherClass) From
        {New OtherClass("baz"),
         New OtherClass("foo"),
         New OtherClass("bar")}
    

    Then (x = y).ToString should print out True.

    I need to compare the same (not distinct) set of objects in this list. The list shouldn't support dupes of OtherClass, but I'll have to figure out how to add that in later as an exception. Not interested in using LINQ. It looks nice, but in the few examples I've played with, adds a performance overhead in that bugs me. Loops are ugly, but they are fast :)

    A straight code answer is fine, but I'd like to understand the logic needed for such a comparison as well. I'm probably going to have to implement said logic more than a few times down the road.