How to optimize vlookup for high search count ? (alternatives to VLOOKUP)

44,628

Solution 1

I considered the following alternatives:

  • VLOOKUP array-formula
  • MATCH / INDEX
  • VBA (using a dictionary)

The compared performance is:

  • VLOOKUP simple formula : ~10 minutes
  • VLOOKUP array-formula : ~10 minutes (1:1 performance index)
  • MATCH / INDEX : ~2 minutes (5:1 performance index)
  • VBA (using a dictionary) : ~6 seconds (100:1 performance index)

Using the same reference sheet

1) Lookup sheet: (vlookup array formula version)

         A          B
     1
     2   key51359    {=VLOOKUP(A2:A10001;sheet1!$A$2:$B$100001;2;FALSE)}
     3   key41232    formula in B2
     4   key10102    ... extends to
   ...   ...         ... 
 99999   key4153     ... cell B100001
100000   key12818    ... (select whole range, and press
100001   key35032    ... CTRL+SHIFT+ENTER to make it an array formula)
100002

2) Lookup sheet: (match+index version)

         A           B                                       C
      1
      2  key51359    =MATCH(A2;sheet1!$A$2:$A$100001;)       =INDEX(sheet1!$B$2:$B$100001;B2)
      3  key41232    =MATCH(A3;sheet1!$A$2:$A$100001;)       =INDEX(sheet1!$B$2:$B$100001;B3)
      4  key10102    =MATCH(A4;sheet1!$A$2:$A$100001;)       =INDEX(sheet1!$B$2:$B$100001;B4)
    ...  ...         ...                                     ...
  99999  key4153     =MATCH(A99999;sheet1!$A$2:$A$100001;)   =INDEX(sheet1!$B$2:$B$100001;B99999)
 100000  key12818    =MATCH(A100000;sheet1!$A$2:$A$100001;)  =INDEX(sheet1!$B$2:$B$100001;B100000)
 100001  key35032    =MATCH(A100001;sheet1!$A$2:$A$100001;)  =INDEX(sheet1!$B$2:$B$100001;B100001)
 100002

3) Lookup sheet: (vbalookup version)

       A          B
     1
     2  key51359    {=vbalookup(A2:A50001;sheet1!$A$2:$B$100001;2)}
     3  key41232    formula in B2
     4  key10102    ... extends to
   ...  ...         ...
 50000  key91021    ... 
 50001  key42       ... cell B50001
 50002  key21873    {=vbalookup(A50002:A100001;sheet1!$A$2:$B$100001;2)}
 50003  key31415    formula in B50001 extends to
   ...  ...         ...
 99999  key4153     ... cell B100001
100000  key12818    ... (select whole range, and press
100001  key35032    ... CTRL+SHIFT+ENTER to make it an array formula)
100002

NB : For some (external internal) reason, the vbalookup fails to return more than 65536 data at a time. So I had to split the array formula in two.

and the associated VBA code :

Function vbalookup(lookupRange As Range, refRange As Range, dataCol As Long) As Variant
  Dim dict As New Scripting.Dictionary
  Dim myRow As Range
  Dim I As Long, J As Long
  Dim vResults() As Variant

  ' 1. Build a dictionnary
  For Each myRow In refRange.Columns(1).Cells
    ' Append A : B to dictionnary
    dict.Add myRow.Value, myRow.Offset(0, dataCol - 1).Value
  Next myRow

  ' 2. Use it over all lookup data
  ReDim vResults(1 To lookupRange.Rows.Count, 1 To lookupRange.Columns.Count) As Variant
  For I = 1 To lookupRange.Rows.Count
    For J = 1 To lookupRange.Columns.Count
      If dict.Exists(lookupRange.Cells(I, J).Value) Then
        vResults(I, J) = dict(lookupRange.Cells(I, J).Value)
      End If
    Next J
  Next I

  vbalookup = vResults
End Function

NB: Scripting.Dictionary requires a referenc to Microsoft Scripting Runtime which must be added manually (Tools->References menu in the Excel VBA window)

Conclusion :

In this context, VBA using a dictionary is 100x faster than using VLOOKUP and 20x faster than MATCH/INDEX

Solution 2

You also may want to consider using the “double Vlookup” method (not my idea - seen elsewhere). I tested it on 100,000 lookup values on sheet 2 (randomly sorted) with an identical data set as the one you’ve described on sheet 1, and timed it at just under 4 seconds. The code is also a bit simpler.

Sub FastestVlookup()

    With Sheet2.Range("B1:B100000")
        .FormulaR1C1 = _
        "=IF(VLOOKUP(RC1,Sheet1!R1C1:R100000C1,1)=RC1,VLOOKUP(RC1,Sheet1!R1C1:R100000C2,2),""N/A"")"
        .Value = .Value
    End With

End Sub

Solution 3

Switch to Excel 2013 and use Data Model. There you can define a column with unique ID keys in both tables and bind those two tables with relationship in Pivot Table. Than if absolutely necessary you can use Getpivotdata() to fill the first table. I had a ~250K rows table doing vlookup in the similar ~250K rows table. Stopped Excel calculating it after an hour. With Data Model it took less than 10sec.

Share:
44,628
d-stroyer
Author by

d-stroyer

Software enthousiast, autodidact. 12 years experience

Updated on February 01, 2020

Comments

  • d-stroyer
    d-stroyer about 4 years

    I am looking for alternatives to vlookup, with improved performance within the context of interest.

    The context is the following:

    • I have a data set of {key;data} which is big (~ 100'000 records)
    • I want to perform a lot of VLOOKUP operations on the dataset (typical use is to reorder the whole dataset)
    • My data set has no duplicate keys
    • I am looking only for exact matches (last argument to VLOOKUP is FALSE)

    A schema to explain :

    Reference sheet : ("sheet1")

            A           B
         1
         2  key1        data1
         3  key2        data2
         4  key3        data3
       ...  ...         ...
     99999  key99998    data99998
    100000  key99999    data99999
    100001  key100000   data100000
    100002
    

    Lookup sheet:

            A           B
         1
         2  key51359    =VLOOKUP(A2;sheet1!$A$2:$B$100001;2;FALSE)
         3  key41232    =VLOOKUP(A3;sheet1!$A$2:$B$100001;2;FALSE)
         4  key10102    =VLOOKUP(A3;sheet1!$A$2:$B$100001;2;FALSE)
       ...  ...         ...
     99999  key4153     =VLOOKUP(A99999;sheet1!$A$2:$B$100001;2;FALSE)
    100000  key12818    =VLOOKUP(A100000;sheet1!$A$2:$B$100001;2;FALSE)
    100001  key35032    =VLOOKUP(A100001;sheet1!$A$2:$B$100001;2;FALSE)
    100002
    

    On my Core i7 M 620 @2.67 GHz, this computes in ~10 minutes

    Are there alternatives to VLOOKUP with better performance in this context ?

    • phuclv
      phuclv about 6 years
      with such a large dataset then Excel may not be a good choice. Using a professional database like *SQL or MS access would be much better
  • fatSlave
    fatSlave over 9 years
    I am trying to use your vbalookup to replace vlookup functionality in my excel. I have the data in one tab and various formulas in other tab. if i got it right I should use in the same syntax as vlookup: vlookup(value, range, col). Now here is the issue: it always return me #value as the result. Did u see the same?
  • davejal
    davejal about 7 years
    Could you explain your code a little. What changes should be done?
  • davejal
    davejal about 7 years
    All fields get this value :"#NAME?"
  • Admin
    Admin about 7 years
    If you're getting the "#NAME?" error, chances are you've misspelled "VLOOKUP" in your code, or you have a "(" in the wrong place. Check that & try again.
  • Nick van H.
    Nick van H. over 6 years
    Instead of declaring it as a scripting.dictionary you could also use Set dict = CreateObject("Scripting.Dictionary"), this doesn't require an additional VBA reference and the code is compatible with the default VBA installation
  • Pierre44
    Pierre44 about 6 years
    If someone find an Answer to fatSlave's Question please post it here as well: stackoverflow.com/questions/48209023/…