Which is more efficient, a for-each loop, or an iterator?

java collections foreach

198,683

Solution 1

If you are just wandering over the collection to read all of the values, then there is no difference between using an iterator or the new for loop syntax, as the new syntax just uses the iterator underwater.

If however, you mean by loop the old "c-style" loop:

for(int i=0; i<list.size(); i++) {
   Object o = list.get(i);
}

Then the new for loop, or iterator, can be a lot more efficient, depending on the underlying data structure. The reason for this is that for some data structures, get(i) is an O(n) operation, which makes the loop an O(n²) operation. A traditional linked list is an example of such a data structure. All iterators have as a fundamental requirement that next() should be an O(1) operation, making the loop O(n).

To verify that the iterator is used underwater by the new for loop syntax, compare the generated bytecodes from the following two Java snippets. First the for loop:

List<Integer>  a = new ArrayList<Integer>();
for (Integer integer : a)
{
  integer.toString();
}
// Byte code
 ALOAD 1
 INVOKEINTERFACE java/util/List.iterator()Ljava/util/Iterator;
 ASTORE 3
 GOTO L2
L3
 ALOAD 3
 INVOKEINTERFACE java/util/Iterator.next()Ljava/lang/Object;
 CHECKCAST java/lang/Integer
 ASTORE 2 
 ALOAD 2
 INVOKEVIRTUAL java/lang/Integer.toString()Ljava/lang/String;
 POP
L2
 ALOAD 3
 INVOKEINTERFACE java/util/Iterator.hasNext()Z
 IFNE L3

And second, the iterator:

List<Integer>  a = new ArrayList<Integer>();
for (Iterator iterator = a.iterator(); iterator.hasNext();)
{
  Integer integer = (Integer) iterator.next();
  integer.toString();
}
// Bytecode:
 ALOAD 1
 INVOKEINTERFACE java/util/List.iterator()Ljava/util/Iterator;
 ASTORE 2
 GOTO L7
L8
 ALOAD 2
 INVOKEINTERFACE java/util/Iterator.next()Ljava/lang/Object;
 CHECKCAST java/lang/Integer
 ASTORE 3
 ALOAD 3
 INVOKEVIRTUAL java/lang/Integer.toString()Ljava/lang/String;
 POP
L7
 ALOAD 2
 INVOKEINTERFACE java/util/Iterator.hasNext()Z
 IFNE L8

As you can see, the generated byte code is effectively identical, so there is no performance penalty to using either form. Therefore, you should choose the form of loop that is most aesthetically appealing to you, for most people that will be the for-each loop, as that has less boilerplate code.

Solution 2

The difference isn't in performance, but in capability. When using a reference directly you have more power over explicitly using a type of iterator (e.g. List.iterator() vs. List.listIterator(), although in most cases they return the same implementation). You also have the ability to reference the Iterator in your loop. This allows you to do things like remove items from your collection without getting a ConcurrentModificationException.

e.g.

This is ok:

Set<Object> set = new HashSet<Object>();
// add some items to the set

Iterator<Object> setIterator = set.iterator();
while(setIterator.hasNext()){
     Object o = setIterator.next();
     if(o meets some condition){
          setIterator.remove();
     }
}

This is not, as it will throw a concurrent modification exception:

Set<Object> set = new HashSet<Object>();
// add some items to the set

for(Object o : set){
     if(o meets some condition){
          set.remove(o);
     }
}

Solution 3

To expand on Paul's own answer, he has demonstrated that the bytecode is the same on that particular compiler (presumably Sun's javac?) but different compilers are not guaranteed to generate the same bytecode, right? To see what the actual difference is between the two, let's go straight to the source and check the Java Language Specification, specifically 14.14.2, "The enhanced for statement":

The enhanced for statement is equivalent to a basic for statement of the form:

for (I #i = Expression.iterator(); #i.hasNext(); ) {
    VariableModifiers(opt) Type Identifier = #i.next();    
    Statement 
}

In other words, it is required by the JLS that the two are equivalent. In theory that could mean marginal differences in bytecode, but in reality the enhanced for loop is required to:

Invoke the .iterator() method
Use .hasNext()
Make the local variable available via .next()

So, in other words, for all practical purposes the bytecode will be identical, or nearly-identical. It's hard to envisage any compiler implementation which would result in any significant difference between the two.

Solution 4

The foreach underhood is creating the iterator, calling hasNext() and calling next() to get the value; The issue with the performance comes only if you are using something that implements the RandomomAccess.

for (Iterator<CustomObj> iter = customList.iterator(); iter.hasNext()){
   CustomObj custObj = iter.next();
   ....
}

Performance issues with the iterator-based loop is because it is:

allocating an object even if the list is empty (Iterator<CustomObj> iter = customList.iterator(););
iter.hasNext() during every iteration of the loop there is an invokeInterface virtual call (go through all the classes, then do method table lookup before the jump).
the implementation of the iterator has to do at least 2 fields lookup in order to make hasNext() call figure the value: #1 get current count and #2 get total count
inside the body loop, there is another invokeInterface virtual call iter.next(so: go through all the classes and do method table lookup before the jump) and as well has to do fields lookup: #1 get the index and #2 get the reference to the array to do the offset into it (in every iteration).

A potential optimiziation is to switch to an index iteration with the cached size lookup:

for(int x = 0, size = customList.size(); x < size; x++){
  CustomObj custObj = customList.get(x);
  ...
}

Here we have:

one invokeInterface virtual method call customList.size() on the initial creation of the for loop to get the size
the get method call customList.get(x) during the body for loop, which is a field lookup to the array and then can do the offset into the array

We reduced a ton of method calls, field lookups. This you don't want to do with LinkedList or with something that is not a RandomAccess collection obj, otherwise the customList.get(x) is gonna turn into something that has to traverse the LinkedList on every iteration.

This is perfect when you know that is any RandomAccess based list collection.

Solution 5

foreach uses iterators under the hood anyway. It really is just syntactic sugar.

Consider the following program:

import java.util.List;
import java.util.ArrayList;

public class Whatever {
    private final List<Integer> list = new ArrayList<>();
    public void main() {
        for(Integer i : list) {
        }
    }
}

Let's compile it with javac Whatever.java,
And read the disassembled bytecode of main(), using javap -c Whatever:

public void main();
  Code:
     0: aload_0
     1: getfield      #4                  // Field list:Ljava/util/List;
     4: invokeinterface #5,  1            // InterfaceMethod java/util/List.iterator:()Ljava/util/Iterator;
     9: astore_1
    10: aload_1
    11: invokeinterface #6,  1            // InterfaceMethod java/util/Iterator.hasNext:()Z
    16: ifeq          32
    19: aload_1
    20: invokeinterface #7,  1            // InterfaceMethod java/util/Iterator.next:()Ljava/lang/Object;
    25: checkcast     #8                  // class java/lang/Integer
    28: astore_2
    29: goto          10
    32: return

We can see that foreach compiles down to a program which:

Creates iterator using List.iterator()
If Iterator.hasNext(): invokes Iterator.next() and continues loop

As for "why doesn't this useless loop get optimized out of the compiled code? we can see that it doesn't do anything with the list item": well, it's possible for you to code your iterable such that .iterator() has side-effects, or so that .hasNext() has side-effects or meaningful consequences.

You could easily imagine that an iterable representing a scrollable query from a database might do something dramatic on .hasNext() (like contacting the database, or closing a cursor because you've reached the end of the result set).

So, even though we can prove that nothing happens in the loop body… it is more expensive (intractable?) to prove that nothing meaningful/consequential happens when we iterate. The compiler has to leave this empty loop body in the program.

The best we could hope for would be a compiler warning. It's interesting that javac -Xlint:all Whatever.java does not warn us about this empty loop body. IntelliJ IDEA does though. Admittedly I have configured IntelliJ to use Eclipse Compiler, but that may not be the reason why.

View more solutions

198,683

Paul Wagland

I have been a computer geek for pretty much as long as I can remember. I first got into computers when my original hobby of electronics started getting too expensive for my pocket money, and have since managed to turn my hobby into my career. I like to keep up with the latest in technology, and how to apply it to what I am working on. SOreadytohelp

Updated on November 22, 2020

Comments

Paul Wagland over 3 years
Which is the most efficient way to traverse a collection?
```
List<Integer>  a = new ArrayList<Integer>();
for (Integer integer : a) {
  integer.toString();
}
```
or
```
List<Integer>  a = new ArrayList<Integer>();
for (Iterator iterator = a.iterator(); iterator.hasNext();) {
   Integer integer = (Integer) iterator.next();
   integer.toString();
}
```
Please note, that this is not an exact duplicate of this, this, this, or this, although one of the answers to the last question comes close. The reason that this is not a dupe, is that most of these are comparing loops where you call get(i) inside the loop, rather than using the iterator.

As suggested on Meta I will be posting my answer to this question.
- Hassan Syed over 14 years
  
  I would think it does not make a difference since its Java and the templating mechanism is little more than syntactic sugar
- OMG Ponies over 14 years
  
  Potential Duplicate: stackoverflow.com/questions/89891/…
- Paul Wagland over 14 years
  
  @OMG Ponies: I don't believe that this is a duplicate, since that does not compare the loop with the iterator, but rather asks why do the collections return iterators, rather than having the iterators directly on the class themselves.
Mark Elliot over 14 years

Thanks -- the original answer suggested the opposite of your clarification, what you've written makes far more sense.
Paul Wagland over 14 years

This is very true, even though it doesn't directly answer the question I have given it +1 for being informative, and answering the logical follow-on question.
Michael Krauklis over 14 years

I believe he was saying the opposite, that foo.get(i) can be a lot less efficient. Think of LinkedList. If you do a foo.get(i) on the middle of a LinkedList it has to traverse all the previous nodes to get to i. An iterator, on the other hand, will keep a handle to the underlying data structure and will allow you to walk over the nodes one at a time.
Paul Wagland over 14 years

Actually, the test I did was with the Eclipse compiler, but your general point still stands. +1
Paul Wagland over 14 years

@Michael: The answer as it currently stands makes that clear, originally however it was not so well worded, I updated it after Mark's comment to clarify the answer.
Rajesh Pitty over 11 years

please add some illustrative examples to support your statements.
Brett Ryan about 11 years

Not a big thing but a for(int i; i < list.size(); i++) { style loop also must evaluate list.size() at the end of each iteration, if it is used it's sometimes more efficient to cache the result of list.size() first.
andresp over 10 years

Actually, the original statement is also true for the case of ArrayList and all others that implement the RandomAccess interface. The "C-style" loop is faster than the Iterator-based one. docs.oracle.com/javase/7/docs/api/java/util/RandomAccess.htm‌l
Dan over 10 years

One reason to use the old C-style loop rather than the Iterator approach, regardless of whether it's the foreach or the desugar'd version, is garbage. Many data structures instantiate a new Iterator when .iterator() is called, however they can be accessed allocation-free using the C-style loop. This can be important in certain high-performance environments where one is trying to avoid (a) hitting the allocator or (b) garbage collections.
Akash5288 over 10 years

Yes we can access collection elements with foreach loop, but we cannot remove them, but we can remove elements with Iterator.
Leo over 9 years

Just to add to @BrettRyan 's comment, a nice and compact way to cache the size is to do the following: for (int i = 0, len = list.size(); i <len ; i++). The len=list.size() will only get executed once in the beginning of the loop, and the len cached value will be tested each time instead
Leo over 9 years

Just as another comment, for ArrayLists, the for(int i = 0 .... ) loop is about 2x faster than using the iterator or the for (:) approach, so it really does depend on the underlying structure. And as a side note, iterating HashSets is also very expensive (much more than an Array List), so avoid those like the plague (if you can).
Michael Ramos about 9 years

By "underwater" I am taking this to mean Abstraction. Am I right in doing so?
Ashwin over 8 years

@PaulWagland : Hi Paul. Does for each use iterator underneath?
Paul Wagland over 8 years

@ashwin yes, as you can see from the disassembled code.
kaiser over 8 years

@Chandan Sorry but what you've written is wrong. For example: std::vector is also a collection but its access costs O(1). So a traditional for loop over a vector is just O(n). I think you want to say, if the access of the underlaying container has access cost of O(n), so it is for std::list, than there is a complexity of O(n^2). Using iterators in that case will reduce the cost to O(n), because iterators allows direct access to elements.
ydobonebi over 8 years

If you do the time difference calculation, make sure both sets are sorted (or fairly distributed random unsorted) and run the test twice for each set and calculate the second run of each only. Check your timings again with this (its a long explanation as to why you need to run the test twice). You need to demonstrate (perhaps with code) how this is true. Otherwise as far as I've known both are identical in terms of performance, but not capability.
Paul Wagland about 7 years

Your statement about the difference isn't true, the for each loop also uses an iterator underwater, and so has the same behaviour.
eccentricCoder about 7 years

@Pault Wagland, I have modified my answer thank you for pointing out the mistake
Paul Wagland about 7 years

your updates are still not accurate. The two code snippets that you have are defined by the language to be the same. If there is any difference in behaviour that is a bug in the implementation. The only difference is whether or not you have access to the iterator.
eccentricCoder about 7 years

@Paul Wagland Even if you use the default implementation of for each loop that uses an iterator, it will still throw and exception if u try to use remove() method during concurrent operations. Checkout the following for more information here
Paul Wagland about 7 years

with the for each loop, you don't get access to the iterator, so you can't call remove on it. But that is beside the point, in your answer you claim that one is thread safe, while the other isn't. According to the language specification they are equivalent, so they are both only as thread safe as the underlying collections.
eccentricCoder about 7 years

@Sorry my mistake.
denis_lor about 6 years

Totally agree with @andresp, here I posted a more detailed view on it: stackoverflow.com/a/49334587/3564632