Excel VBA Performance - 1 million rows - Delete rows containing a value, in less than 1 min
Solution 1
I'm providing the first answer as a reference
Others may find it useful, if there are no other options available
- Fastest way to achieve the result is not to use the Delete operation
- Out of 1 million records it removes 100,000 rows in an average of 33 seconds
.
Sub DeleteRowsWithValuesNewSheet() '100K records 10K to delete
'Test 1: 2.40234375 sec
'Test 2: 2.41796875 sec
'Test 3: 2.40234375 sec
'1M records 100K to delete
'Test 1: 32.9140625 sec
'Test 2: 33.1484375 sec
'Test 3: 32.90625 sec
Dim oldWs As Worksheet, newWs As Worksheet, rowHeights() As Long
Dim wsName As String, t As Double, oldUsedRng As Range
FastWB True: t = Timer
Set oldWs = Worksheets(1)
wsName = oldWs.Name
Set oldUsedRng = oldWs.Range("A1", GetMaxCell(oldWs.UsedRange))
If oldUsedRng.Rows.Count > 1 Then 'If sheet is not empty
Set newWs = Sheets.Add(After:=oldWs) 'Add new sheet
With oldUsedRng
.AutoFilter Field:=1, Criteria1:="<>Test String"
.Copy 'Copy visible data
End With
With newWs.Cells
.PasteSpecial xlPasteColumnWidths
.PasteSpecial xlPasteAll 'Paste data on new sheet
.Cells(1, 1).Select 'Deselect paste area
.Cells(1, 1).Copy 'Clear Clipboard
End With
oldWs.Delete 'Delete old sheet
newWs.Name = wsName
End If
FastWB False: InputBox "Duration: ", "Duration", Timer - t
End Sub
.
At high level:
- It creates a new worksheet, and keeps a reference to the initial sheet
- AutoFilters column 1 on the searched text:
.AutoFilter Field:=1, Criteria1:="<>Test String"
- Copies all (visible) data from initial sheet
- Pastes column widths, formats, and data to the new sheet
- Deletes initial sheet
- Renames the new sheet to the old sheet name
It uses the same helper functions posted in the question
The 99% of the duration is used by the AutoFilter
.
There are a couple limitations I found so far, the first can be addressed:
If there are any hidden rows on the initial sheet, it unhides them
- A separate function is needed to hide them back
- Depending on implementation, it might significantly increase duration
VBA related:
- It changes the Code Name of the sheet; other VBA referring to Sheet1 will be broken (if any)
- It deletes all VBA code associated with the initial sheet (if any)
.
A few notes about using large files like this:
- The binary format (.xlsb) reduce file size dramatically (from 137 Mb to 43 Mb)
Unmanaged Conditional Formatting rules can cause exponential performance issues
- The same for Comments, and Data validation
Reading file or data from network is much slower than working with a locall file
Solution 2
A significant gain in speed can be achieved if the source data do not contain formulas, or if the scenario would allow (or want) the formulas to be converted into hard values during the conditional row deletions.
With the above as a caveat, my solution uses the AdvancedFilter of the range object. It's about twice as fast as DeleteRowsWithValuesNewSheet().
Public Sub ExcelHero()
Dim t#, crit As Range, data As Range, ws As Worksheet
Dim r&, fc As Range, lc As Range, fr1 As Range, fr2 As Range
FastWB True
t = Timer
Set fc = ActiveSheet.UsedRange.Item(1)
Set lc = GetMaxCell
Set data = ActiveSheet.Range(fc, lc)
Set ws = Sheets.Add
With data
Set fr1 = data.Worksheet.Range(fc, fc.Offset(, lc.Column))
Set fr2 = ws.Range(ws.Cells(fc.Row, fc.Column), ws.Cells(fc.Row, lc.Column))
With fr2
fr1.Copy
.PasteSpecial xlPasteColumnWidths: .PasteSpecial xlPasteAll
.Item(1).Select
End With
Set crit = .Resize(2, 1).Offset(, lc.Column + 1)
crit = [{"Column 1";"<>Test String"}]
.AdvancedFilter xlFilterCopy, crit, fr2
.Worksheet.Delete
End With
FastWB False
r = ws.UsedRange.Rows.Count
Debug.Print "Rows: " & r & ", Duration: " & Timer - t & " seconds"
End Sub
Solution 3
On my elderly Dell Inspiron 1564 (Win 7 Office 2007) this:
Sub QuickAndEasy()
Dim rng As Range
Set rng = Range("AA2:AA1000001")
Range("AB1") = Now
Application.ScreenUpdating = False
With rng
.Formula = "=If(A2=""Test String"",0/0,A2)"
.Cells.SpecialCells(xlCellTypeFormulas, xlErrors).EntireRow.Delete
.Clear
End With
Application.ScreenUpdating = True
Range("AC1") = Now
End Sub
took about 10 seconds to run. I am assuming that column AA is available.
EDIT#1:
Please note that this code does not set Calculation to Manual. Performance will improve if the Calculation mode is set to Manual after the "helper" column is allowed to calculate.
paul bica
Over the years I found amazing vba solutions here I'd like to help if I can - minimal questions are most effective . SOReadyToHelp Email [email protected]
Updated on November 26, 2020Comments
-
paul bica over 3 years
I am trying to find a way to filter large data and remove rows in a worksheet, in less than one minute
The goal:
- Find all records containing specific text in column 1, and delete the entire row
- Keep all cell formatting (colors, font, borders, column widths) and formulas as they are
.
Test Data:
:
.
How the code works:
- It starts by turning all Excel features Off
-
If the workbook is not empty and the text value to be removed exists in column 1
- Copies the used range of column 1 to an array
- Iterates over every value in array backwards
-
When it finds a match:
- Appends the cell address to a tmp string in the format
"A11,A275,A3900,..."
- If the tmp variable length is close to 255 characters
- Deletes rows using
.Range("A11,A275,A3900,...").EntireRow.Delete Shift:=xlUp
- Resets tmp to empty and moves on to the next set of rows
- Appends the cell address to a tmp string in the format
- At the end, it turns all Excel features back On
.
The main issue is the Delete operation, and total duration time should be under one minute. Any code-based solution is acceptable as long as it performs under 1 minute.
This narrows the scope to very few acceptable answers. The answers already provided are also very short and easy to implement. One performs the operation in about 30 seconds, so there is at least one answer that provides an acceptable solution, and other may find it useful as well
.
My main initial function:
Sub DeleteRowsWithValuesStrings() Const MAX_SZ As Byte = 240 Dim i As Long, j As Long, t As Double, ws As Worksheet Dim memArr As Variant, max As Long, tmp As String Set ws = Worksheets(1) max = GetMaxCell(ws.UsedRange).Row FastWB True: t = Timer With ws If max > 1 Then If IndexOfValInRowOrCol("Test String", , ws.UsedRange) > 0 Then memArr = .Range(.Cells(1, 1), .Cells(max, 1)).Value2 For i = max To 1 Step -1 If memArr(i, 1) = "Test String" Then tmp = tmp & "A" & i & "," If Len(tmp) > MAX_SZ Then .Range(Left(tmp, Len(tmp) - 1)).EntireRow.Delete Shift:=xlUp tmp = vbNullString End If End If Next If Len(tmp) > 0 Then .Range(Left(tmp, Len(tmp) - 1)).EntireRow.Delete Shift:=xlUp End If .Calculate End If End If End With FastWB False: InputBox "Duration: ", "Duration", Timer - t End Sub
Helper functions (turn Excel features off and on):
Public Sub FastWB(Optional ByVal opt As Boolean = True) With Application .Calculation = IIf(opt, xlCalculationManual, xlCalculationAutomatic) .DisplayAlerts = Not opt .DisplayStatusBar = Not opt .EnableAnimations = Not opt .EnableEvents = Not opt .ScreenUpdating = Not opt End With FastWS , opt End Sub Public Sub FastWS(Optional ByVal ws As Worksheet = Nothing, _ Optional ByVal opt As Boolean = True) If ws Is Nothing Then For Each ws In Application.ActiveWorkbook.Sheets EnableWS ws, opt Next Else EnableWS ws, opt End If End Sub Private Sub EnableWS(ByVal ws As Worksheet, ByVal opt As Boolean) With ws .DisplayPageBreaks = False .EnableCalculation = Not opt .EnableFormatConditionsCalculation = Not opt .EnablePivotTable = Not opt End With End Sub
Finds last cell with data (thanks @ZygD - now I tested it in several scenarios):
Public Function GetMaxCell(Optional ByRef rng As Range = Nothing) As Range 'Returns the last cell containing a value, or A1 if Worksheet is empty Const NONEMPTY As String = "*" Dim lRow As Range, lCol As Range If rng Is Nothing Then Set rng = Application.ActiveWorkbook.ActiveSheet.UsedRange If WorksheetFunction.CountA(rng) = 0 Then Set GetMaxCell = rng.Parent.Cells(1, 1) Else With rng Set lRow = .Cells.Find(What:=NONEMPTY, LookIn:=xlFormulas, _ After:=.Cells(1, 1), _ SearchDirection:=xlPrevious, _ SearchOrder:=xlByRows) If Not lRow Is Nothing Then Set lCol = .Cells.Find(What:=NONEMPTY, LookIn:=xlFormulas, _ After:=.Cells(1, 1), _ SearchDirection:=xlPrevious, _ SearchOrder:=xlByColumns) Set GetMaxCell = .Parent.Cells(lRow.Row, lCol.Column) End If End With End If End Function
Returns the index of a match in the array, or 0 if a match is not found:
Public Function IndexOfValInRowOrCol( _ ByVal searchVal As String, _ Optional ByRef ws As Worksheet = Nothing, _ Optional ByRef rng As Range = Nothing, _ Optional ByRef vertical As Boolean = True, _ Optional ByRef rowOrColNum As Long = 1 _ ) As Long 'Returns position in Row or Column, or 0 if no matches found Dim usedRng As Range, result As Variant, searchRow As Long, searchCol As Long result = CVErr(9999) '- generate custom error Set usedRng = GetUsedRng(ws, rng) If Not usedRng Is Nothing Then If rowOrColNum < 1 Then rowOrColNum = 1 With Application If vertical Then result = .Match(searchVal, rng.Columns(rowOrColNum), 0) Else result = .Match(searchVal, rng.Rows(rowOrColNum), 0) End If End With End If If IsError(result) Then IndexOfValInRowOrCol = 0 Else IndexOfValInRowOrCol = result End Function
.
Update:
Tested 6 solutions (3 tests each): Excel Hero's solution is the fastest so far (removes formulas)
.
Here are the results, fastest to the slowest:
.
Test 1. Total of 100,000 records, 10,000 to be deleted:
1. ExcelHero() - 1.5 seconds 2. DeleteRowsWithValuesNewSheet() - 2.4 seconds 3. DeleteRowsWithValuesStrings() - 2.45 minutes 4. DeleteRowsWithValuesArray() - 2.45 minutes 5. QuickAndEasy() - 3.25 minutes 6. DeleteRowsWithValuesUnion() - Stopped after 5 minutes
.
Test 2. Total of 1 million records, 100,000 to be deleted:
1. ExcelHero() - 16 seconds (average) 2. DeleteRowsWithValuesNewSheet() - 33 seconds (average) 3. DeleteRowsWithValuesStrings() - 4 hrs 38 min (16701.375 sec) 4. DeleteRowsWithValuesArray() - 4 hrs 37 min (16626.3051757813 sec) 5. QuickAndEasy() - 5 hrs 40 min (20434.2104492188 sec) 6. DeleteRowsWithValuesUnion() - N/A
.
Notes:
- ExcelHero method: easy to implement, reliable, extremely fast, but removes formulas
- NewSheet method: easy to implement, reliable, and meets the target
- Strings method: more effort to implement, reliable, but doesn't meet requirement
- Array method: similar to Strings, but ReDims an array (faster version of Union)
- QuickAndEasy: easy to implement (short, reliable and elegant), but doesn't meet requirement
- Range Union: implementation complexity similar to 2 and 3, but too slow
I also made the test data more realistic by introducing unusual values:
- empty cells, ranges, rows, and columns
- special characters, like =[`~!@#$%^&*()_-+{}[]\|;:'",.<>/?, separate and multiple combinations
- blank spaces, tabs, empty formulas, border, font, and other cell formatting
- large and small numbers with decimals (=12.9999999999999 + 0.00000000000000001)
- hyperlinks, conditional formatting rules
- empty formatting inside and outside data ranges
- anything else that might cause data issues
-
Doug Glancy almost 9 yearsAutoFilter seems like the best approach, good call. You must have a powerful computer if you can even open a sheet with 1m rows. You could change the codename back using the VBE object model. It requires that "Access to the VBA object model" be enabled in the front end, so would only work for computers you have control of.
-
paul bica almost 9 years@DougGlancy: AutoFilter works well; I expected the copy / paste to take longer, but it's faster than the filter. My machine is about 6 years old actually - I got it back in 2009, but it's an I7 with 9 Gb mem. VBE would be the only way to fix the codename, but I wouldn't impose it on others
-
Ioannis almost 9 yearsThis is a good idea (+1). A small glitch: if there are cells with errors in the original range, they will be deleted.
-
Gary's Student almost 9 years@Ioannis Thank you for the feedback.........as for the glitch, I assumed that column AA was completely empty and my code ignores errors in the other columns.
-
paul bica almost 9 yearsGreat approach. +1 for simplicity and elegance! However, I'm not sure what data, and how much of it you tested to get 10 seconds, because I copied and pasted your code and it's been running now for more than 10 minutes... I still think the deletion of rows is the problem
-
paul bica almost 9 yearsThank you for the feedback Andrew. 1. Always a good idea to check for the value at start 2, There are some issues trying to accurately determine the last cell with conventional methods 3. and 4. Interacting with the range in a loop is one of the main performance issues in VBA, especially for very large ranges 5. Try manually deleting 200,000 rows from a 500K sheet - it's extremely slow
-
Gary's Student almost 9 years@paulbica Thanks for the feedback..............you are correct, the only good way to compare program performance is to run them on the same dataset on the same computer.
-
paul bica almost 9 yearsThanks Gary - I really like the solution and intend on finding ways to use it (the code is still running) The only change I made was to remove the last 900K of rows, so it's only using 100K. My data also has some formulas with empty values (=""). Not sure if that makes a difference
-
paul bica almost 9 yearsIt just completed, and I know what happened: I removed 900K rows but didn't update your range, so it did it for 1 Million (took almost 1 hour). I did it again with 100K and it completed in 26.04 seconds in 3 tests (avg)
-
Gary's Student almost 9 years@paulbica See my EDIT#1 Once the "helper" column is in place, you can set the Calculation mode to Manual. This may help you gain some speed.
-
paul bica almost 9 yearsThanks - turning off calculation during execution will never harm anything. In my case, I only have those simple formulas so it may not improve much. The main issue still remains with "EntireRow.Delete" - it takes forever. But its simplicity makes it beneficial for all normal situations.
-
paul bica over 8 yearsExcellent work! AdvancedFilter seems to be at least twice as fast as AutoFilter. I'll update the summary at the top to point it out
-
n8. over 8 yearsThis is the process I've used in the past and it is certainly much faster.
-
Conor about 7 yearsThis has crashed my computer and runs incredibly slow.
-
Passer-by almost 4 yearsIt seems that after applying AutoFilter, and then .copy seems still copied everthing, not just visible data, if I try .SpecialCells(xlCellTypeVisible).Copy, it shows the run-time error message "Excel cannot create or use the data range reference because it is too complex", similar to that if I use Auto Filter and then try to copy the filtered data manually. I got about 50k records, and about half of them need to be deleted btw.
-
W-hit about 3 yearson your
crit = [{"Column 1";"<>Test String"}]
line, how would I modify it to filter for multiple criteria?