Conditionally removing duplicates in Excel based on largest value in a Column

31,028

Solution 1

If you don not want to use VBA you can try this:

Select the table and from HOME -> Sort & Filter select Custom Sort.

First sort by word with order : A to Z then Add Level and sort by value with order: Smallest to Largest.

Create a column Key with the following formula (i am assuming you data have headers and start from cell A1. Then in cell C2 put the formula =IF(B2<>B3,0,1) and copy down.

Then copy and paste special as values column C and sort on the column Key Descending or filter on 1

Finally, copy the filtered table and paste it elsewhere then select it all and Data -> Remove Duplicates and your done. Or you can do this last part first and then run the steps above.

Solution 2

if you have a date set with multiple columns, when you delete duplicates from a specific column (i.e. cat, dog, etc), (it also deletes the entire row of that data selection), the delete duplicate function keeps the first duplicate in the column and removes the rest. (this is assuming that you have sorted the data set a-z for the column that you want to remove duplicates from). Therefore, if you sort the data set first with the duplicate column as A-Z, then add a layer with the values column sorted as Highest - Lowest, then when you remove duplicates, automatically, you get to keep the duplicate with the highest value and remove all the rest. Unfortunately, this is a scenario that works specifically for your case. i don't know of any other method where conditional removal would have more options.

Solution 3

Try this short macro:

Sub SaveTheFirstItem()
    Dim N As Long, i As Long

    N = Cells(Rows.Count, 1).End(xlUp).Row
    For i = N To 2 Step -1
        If Cells(i, 2) = Cells(i - 1, 2) Then
            Cells(i, 2).EntireRow.Delete
        End If
    Next i
End Sub

Macros are very easy to install and use:

  1. ALT-F11 brings up the VBE window
  2. ALT-I ALT-M opens a fresh module
  3. paste the stuff in and close the VBE window

If you save the workbook, the macro will be saved with it. If you are using a version of Excel later then 2003, you must save the file as .xlsm rather than .xlsx

To remove the macro:

  1. bring up the VBE window as above
  2. clear the code out
  3. close the VBE window

To use the macro from Excel:

  1. ALT-F8
  2. Select the macro
  3. Touch RUN

To learn more about macros in general, see:

http://www.mvps.org/dmcritchie/excel/getstarted.htm

and

http://msdn.microsoft.com/en-us/library/ee814735(v=office.14).aspx

Macros must be enabled for this to work!

Share:
31,028
owwoow14
Author by

owwoow14

Updated on August 10, 2020

Comments

  • owwoow14
    owwoow14 over 3 years

    I have a 2-column data in Excel that looks like this: The first column is a value, and the Second column contains a corresponding word. However, I want to remove rows in this dataset such that, in the end, for each unique word in Column two, only one row is retained for which the value in the Column one is the largest for that word, removing even those rows having duplicates of the largest value for each unique word and leaving one row for each unique word.

    2   cat
    2   cat
    1   cat
    3   dog
    2   dog
    1   dog
    2   milk
    1   milk
    2   juice
    1   juice
    

    The columns are ordered firstly by Column 2 (therefore the words are in alphabetical order: A-Z) and secondly by Column 1 (from largest to smallest, therefore the largest value is always the first value for each word).

    I have been trying to do this using the advanced filtering in Excel for Mac 2011, but I cannot seem to include the condition of removing all duplicate entries except for the entry that has the largest value in Column 1.

    The desired output is as follows:

    2   cat
    3   dog
    2   milk
    2   juice
    

    I have been searching for answers based on the query conditionally removing duplicates, here and here, however each of the solutions provided seems to be based on a boolean condition (i.e. there is information or not), and not determining which value is the largest as the condition for which to remove the duplicate or not.

    Any ideas on how to go about solving this?