Is there a faster way to extract unique values from an object collection?


Solution 1

If you want to get or count the distinct areas in the employee list, you can use a set of strings. I'm changing the variable names to match Java conventions. You can get the count afterwards with size(). Ideally, these would be lazy methods.

Imperative Code

public Set<String> areas(final List<Employee> employees) {
    Set<String> areas = new HashSet<>();
    for(final Employee employee: employees) {
        areas.add(employee.getArea());
    }
    return areas;
}
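The question stores its employees in an array with public fields rather than in a list with getters, and targets an older Android API level. A minimal sketch of the same loop for that shape of data, using only pre-Java 7 syntax, might look like the following. The class name DistinctAreas, the Employee constructor, and the sample data are assumptions added for illustration; LinkedHashSet is swapped in for HashSet only to keep the areas in first-seen order.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

class DistinctAreas {
    // Mirrors the Employee class from the question (public fields, no getters).
    // The constructor is an assumption added so the sample data is easy to build.
    static class Employee {
        public String AREA;
        public String EMPLOYEE_ID;
        public String EMPLOYEE_NAME;

        Employee(String area, String id, String name) {
            this.AREA = area;
            this.EMPLOYEE_ID = id;
            this.EMPLOYEE_NAME = name;
        }
    }

    // Adds every AREA to a set; duplicates are ignored by the set itself.
    static String[] distinctAreas(Employee[] employees) {
        Set<String> areas = new LinkedHashSet<String>();
        for (int i = 0; i < employees.length; i++) {
            areas.add(employees[i].AREA);
        }
        return areas.toArray(new String[areas.size()]);
    }

    public static void main(String[] args) {
        Employee[] employeeTress = {
            new Employee("Sales", "1", "Ann"),
            new Employee("IT", "2", "Bob"),
            new Employee("Sales", "3", "Cy"),
        };
        // prints [Sales, IT]
        System.out.println(Arrays.toString(distinctAreas(employeeTress)));
    }
}
```

Each add is a hash lookup, so the loop avoids the repeated linear scans that ArrayList.contains performs.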

Functional Code (Google Guava)

public Set<String> areas(final List<Employee> employees) {
    return Sets.newHashSet(
        Lists.transform(employees, new Function<Employee, String>() {
            public String apply(Employee e) {
                return e.getArea();
            }
        }));
}

Lambdas (Java 8)

public Set<String> areas(final List<Employee> employees) {
    return employees.stream()
        .map(Employee::getArea)
        .collect(Collectors.toSet());
}

Solution 2

Insert all employees into the HashSet. From the definition of Set, they will be all unique.

Set<Employee> unique = new HashSet<Employee>(Arrays.asList(employeeTress));
// unique.toArray() if needed

If you want Employee objects to be considered equal when they have the same AREA, you need to override both the equals() and hashCode() methods in the Employee class, because HashSet uses the hash code to locate candidates before it calls equals().
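As a rough sketch of what "equal when they have the same AREA" could look like, the class below compares and hashes employees on AREA only. The class name UniqueByArea, the trimmed-down Employee with its constructor, and the sample data are assumptions for illustration, not the asker's actual code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class UniqueByArea {
    // Reduced version of the question's Employee; constructor is an assumption.
    static class Employee {
        public String AREA;
        public String EMPLOYEE_NAME;

        Employee(String area, String name) {
            this.AREA = area;
            this.EMPLOYEE_NAME = name;
        }

        // Two employees are equal when their AREA values are equal (null-safe).
        @Override
        public boolean equals(Object obj) {
            if (obj == this) return true;
            if (!(obj instanceof Employee)) return false; // also rejects null
            Employee other = (Employee) obj;
            return AREA == null ? other.AREA == null : AREA.equals(other.AREA);
        }

        // Must agree with equals: equal objects return the same hash code.
        @Override
        public int hashCode() {
            return AREA == null ? 0 : AREA.hashCode();
        }
    }

    // Keeps one employee per distinct AREA.
    static Set<Employee> uniqueByArea(Employee[] employees) {
        return new HashSet<Employee>(Arrays.asList(employees));
    }

    public static void main(String[] args) {
        Employee[] employeeTress = {
            new Employee("Sales", "Ann"),
            new Employee("IT", "Bob"),
            new Employee("Sales", "Cy"),
        };
        // prints 2
        System.out.println(uniqueByArea(employeeTress).size());
    }
}
```

Note the trade-off: with this definition the set keeps only one employee per area and silently drops the rest, so it is only appropriate when one representative per area is what you want. If you only need the area names, a set of strings is simpler.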

Solution 3

You can use a Set to do this, as others have already stated, but if you want items to be considered equal when they have the same AREA then you'll need to override the equals method (and, consistently with it, hashCode) in your Employee class so that it compares itself to others based on that variable.

You need to know a few things before overriding the equals method. There's a discussion about it here: What issues should be considered when overriding equals and hashCode in Java?

Solution 4

Just use a HashSet; it will only add unique elements.

The add(Object) method of HashSet returns true when the object was added and false when an equal element is already present:

Set<Employee> hs = new HashSet<Employee>();

if (!hs.add(i2)) {
    // i2 was already in the set; do some operation here
}
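To make the return value of add concrete, here is a small sketch that uses it to collect repeated values in a single pass. It works on plain strings so that no equals override is needed; the class name DuplicateFinder, the method name, and the sample input are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DuplicateFinder {
    // Returns every repeated occurrence, in the order the repeats are seen.
    // A value that appears three times is therefore listed twice.
    static List<String> duplicates(String[] values) {
        Set<String> seen = new HashSet<String>();
        List<String> dupes = new ArrayList<String>();
        for (String value : values) {
            // add returns false when an equal element is already in the set
            if (!seen.add(value)) {
                dupes.add(value);
            }
        }
        return dupes;
    }

    public static void main(String[] args) {
        String[] areas = {"Sales", "IT", "Sales", "HR", "IT"};
        // prints [Sales, IT]
        System.out.println(duplicates(areas));
    }
}
```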

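You will also need to override the equals method here, together with hashCode so that equal objects produce the same hash code.

@Override
public boolean equals(Object obj) {
    if (obj == this)
        return true;
    if (!(obj instanceof Employee))   // also false when obj is null
        return false;

    // HERE PERFORM YOUR CHECK, for example comparing by NAME
    Employee other = (Employee) obj;
    return NAME == null ? other.NAME == null : NAME.equals(other.NAME);
}

@Override
public int hashCode() {
    // Must be based on the same field(s) that equals compares
    return NAME == null ? 0 : NAME.hashCode();
}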

Also make sure that the hashCode() of the key objects that you put into the collection never changes while the object is in the collection. The best way to ensure this is to make your keys immutable.
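The sketch below illustrates why a changing hash code is a problem. It uses a hypothetical mutable key class, Area, whose hash code depends on a field that is modified after insertion; all names here are assumptions for illustration.

```java
import java.util.HashSet;
import java.util.Set;

class MutableKeyDemo {
    // Hypothetical key type; the field is deliberately left mutable.
    static class Area {
        String name;

        Area(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object obj) {
            return obj instanceof Area && name.equals(((Area) obj).name);
        }

        @Override
        public int hashCode() {
            return name.hashCode();
        }
    }

    // Inserts a key, mutates it, then asks the set whether it still contains it.
    static boolean stillFoundAfterMutation() {
        Set<Area> set = new HashSet<Area>();
        Area key = new Area("Sales");
        set.add(key);              // stored under the hash of "Sales"
        key.name = "Marketing";    // hash code changes while the key is in the set
        return set.contains(key);  // lookup now uses the hash of "Marketing"
    }

    public static void main(String[] args) {
        // prints false: the element is still in the set but can no longer be found
        System.out.println(stillFoundAfterMutation());
    }
}
```

Declaring the field final (and giving the class no setters) rules this situation out at compile time.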

Author by

montelof

Updated on September 15, 2022

Comments

  • montelof over 1 year

    I have a method that extracts values from an object collection holding employee information:

    public class Employee
    {
        public String AREA;
        public String EMPLOYEE_ID;
        public String EMPLOYEE_NAME;
    }
    

    I'd like to get all the distinct areas. I did what I thought would be easiest: check whether the ArrayList already contains the value and, if not, add it. It takes 187ms to complete:

        long startTime = System.currentTimeMillis();
        ArrayList<String> distinct_areas = new ArrayList<String>();
        for (int i = 0; i < this.employeeTress.length; i++)
        {
            if (!distinct_areas.contains(this.employeeTress[i].AREA))
                distinct_areas.add(this.employeeTress[i].AREA);
        }
        String[] unique = new String[distinct_areas.size()];
        distinct_areas.toArray(unique);
        long endTime = System.currentTimeMillis();
        System.out.println("Total execution time: " + (endTime - startTime) + "ms");
    

    Then I tried a different approach to see if it would be faster: sort the array, then compare each item only with the last one added and add it if it differs. It is a little faster, taking 121ms to complete:

        startTime = System.currentTimeMillis();
        String[] vs = new String[this.employeeTress.length];
        for (int i = 0; i < this.employeeTress.length; i++)
        {
            vs[i] = this.employeeTress[i].AREA;
        }
        Arrays.sort(vs);
        ArrayList<String> vsunique = new ArrayList<String>();
        vsunique.add(vs[0]);
        for (int i = 0; i < vs.length; i++)
        {
            if (!vsunique.get(vsunique.size()-1).equals(vs[i]))
            {
                vsunique.add(vs[i]);
            }
        }
        String[] uni = new String[vsunique.size()];
        vsunique.toArray(uni);
        endTime = System.currentTimeMillis();
        System.out.println("Total execution time: " + (endTime - startTime) + "ms");
    

    I'm new to Java and I'd like to know a better way to do this. Note: this code should work on Android Gingerbread (API level 10). Regards.

    • Luiggi Mendoza
      Use a Set instead of a List. Also, I would not worry about this performance improvement until it proves to be a real bottleneck in the application.
  • Luiggi Mendoza over 10 years
    In fact, a Set should hold the Employee data instead of the array the OP is currently using.
  • montelof over 10 years
    The imperative form using HashSet is much faster than checking whether the item is already in the collection. Thank you.
  • Eric Jablow over 10 years
    That's because the JRE is doing the same thing. The other versions simply remove the explicit looping. They don't buy you that much since this isn't a lazy problem; you need to compute the entire thing. Still, you should consider other ways of storing the data. A relational database could store your employees, and with appropriate indexing, the database could work extremely fast. I did fix a typo.