Duplicates can cause lots of problems when working in a large Excel database. That’s when you should know How to Remove Duplicates in Excel. They can result in mistakes in your tallies or counts and can make you report totally wrong information.
Consider the below example –
You need to send a very important post to one of your employees. But when you look at the database, you find that you have the same employee listed in your database two times. Obviously, both the name and the employee ID are the same. However, the residential addresses are different since the person moved house.
Now, which is the correct record(address) that you need to use now?
Difficult right? This is where the need to remove duplicates in Excel arises. Contrary to popular beliefs, removing duplicates in Excel is not so uncommon, especially when working with large amounts of data. When different excel tables are combined, or when several people have access to the same document, you often end up with repeated entries in your spreadsheet. This results in duplication of data.
As you would have realized by now, duplicate records can skew reporting and analysis. Luckily, whether you’re finding duplicates in a single column or looking for duplicate records, Excel can do most of the work for you.
Excel provides two tools for handling this situation and cleaning up your database.
But before we get into that, we need to understand in detail what constitutes duplicate records and why you should remove them in the first place.
What are Duplicate Records in Excel, and Why should you Remove them?
A duplicate value is any value that is identical to another value in a database. In other words, this record occurs more than once. It is often found while working with large databases in Excel. As the database owner, it becomes essential to find and highlight the duplicate values in Excel because the end-user may or may not want to retain them.
A duplicate value can be confined to a single column; any value that occurs more than once in that column is a duplicate. In an Excel data set, duplicates in a single column are very common. The term can also refer to multiple columns or even all columns for a given record.
For instance, a student database might have a date and grade column where multiple values occur more than once in both columns. However, two records that apply the same assignment, date, and grade to the same student might be considered a duplicate and, if so, should be accommodated (or deleted) to avoid incorrect analysis and reporting.
Typically, you may want to find duplicates in Excel for one of the following reasons:
- Duplicates are there by mistake and should be deleted in order to make data accurate.
- All duplicates are important for analysis and should be highlighted in Excel.
Pro Tip: Prior to deleting the duplicates permanently, it is recommended to keep a copy of the original database. This allows one to return to the source data if required.
How to Identify Duplicates in Excel?
You can easily find duplicates in a large dataset without painstakingly checking each and every cell. This could be an entire row or column or a separate user-defined area. You can use your mouse to select an area by dragging the square box. You can also select individual cells. Simply hold down the control key and then click on the selected region.
One of the most optimal and easy ways to identify and detect duplicates in Excel is using the CountIF function. There can be scenarios wherein you wish to identify duplicate values with or without singular occurrences. There may be a slight variation in the application of the function based on the scenario. Here are the different scenarios –
- Check for duplicates with 1st occurrences.
- Search for duplicates without 1st occurrences.
- To find duplicate rows in Excel.
- To find case-sensitive duplicates in Excel.
- To filter duplicates in Excel.
To Find Duplicate Records, including 1st occurrences
In this scenario, you have a list of items in column A that you want to check for duplicate data. The data set can be order number, item name, price, etc.
Here’s a formula to find duplicates in the below sheet, including first occurrences (where A2 is the topmost cell):
=IF(COUNTIF($A$2:$A$8, $A2)>1, “Duplicate”, “Unique”)
As you can see, it marks all the records are duplicates, including the first occurrence.
To search for duplicates in Excel without 1st occurrences.
In case you plan to filter or remove duplicates after finding them, using the above formula is not advisable as it marks all identical records as duplicates. And if you want to keep the unique values in your list, then you cannot delete all duplicate records.
So you need to delete only the 2nd and all subsequent instances. So, the modified Excel duplicate formula by using absolute and relative cell references will be:
=IF(COUNTIF($A$2:$A2, $A2)>1, “Duplicate”, “”)
As you can see in the above screenshot, this formula does not identify the first occurrence of “Lemons “as a duplicate.
To find Duplicate Rows in Excel, including 1st occurrences.
Just like we found duplicate columns in the first two sections, you can also find duplicate rows in Excel. Here is the formula to do that –
=IF(COUNTIFS($A$2:$A$13, $A2, $B$2:$B$13, $B2, $C$2:$C$13, $C2)>1, “Duplicate row”,”Unique”)
To search for duplicates in Excel without 1st occurrences
Like the columns scenario, by using the above formula, Excel marks all similar rows as duplicates. If you wish to keep all the rows unique, you should not delete all duplicate data but delete only the rows after the 2nd duplicate instance. Here is the formula to do that –
=IF(COUNTIF($B$2:$B2, $B2)>1, “Duplicate”, “”)
As you can see in the following screenshot, this formula does not identify the first occurrence of “Lemons” as duplicate:
To Find Case-Sensitive Duplicates in Excel.
In situations when you need to identify exact duplicates, including the text case, use this generic array formula (entered by pressing Ctrl + Shift + Enter):
IF( SUM(( –EXACT(range, uppermost _cell)))<=1, “”, “Duplicate”)
For our sample dataset, the formula goes as follows:
As shown in the screenshot, it treats lowercase and uppercase as different characters (LEMONS and BANANAS are not identified as duplicates).
How to Highlight Duplicates in Excel using the built-in Feature?
For starters, in all Excel versions, there is a predefined rule for highlighting duplicate cells. To use this rule in your worksheets, perform the following steps:
- Select the data you want to check for duplicates. This can be a column, a row, or a range of cells.
- On the Home tab, in the Styles group, click Conditional Formatting > Highlight Cells Rules > Duplicate Values.
The screenshot below depicts the Duplicate Values dialog window. Choose the Red Fill and Dark Red Text format selected by default.
To highlight duplicates using any other color, choose the Custom Format and customize the fill and/or font color.
Note: To highlight unique values, select Unique in the left-hand box.
How to Find Duplicate Rows in Excel?
If the goal is to find duplicate rows in a particular table, then you need to implement a formula that can check each column and detect rows that have duplicate values.
For example: If you have Item No in Column A, Items in Column B, and Price in Column C, then you will want to find duplicate data access rows with the same Item No, Price, and Item Name.
For this, we are going to create a duplicate formula using the COUNTIFS function that allows checking multiple data at a time:
=IF(COUNTIFS($A$2:$A$8,$A2,$B$2:$B$8,$B2,$C$2:$C$8,$C2)>1, “Duplicate row”, “”)
How to Remove Duplicates in Excel?
There are three ways to clear or remove duplicates in Excel. They are –
Method 1. Remove by Excel’s Built-in Remove Duplicates Feature
All modern versions of Excel have a built-in tool for removing duplicates called (not surprisingly) Remove Duplicates. Here are the steps to access it –
- To begin with, select the range in which you want to delete duplicates. To select the entire table, select Control+A.
- Select the Data tab > Data Tools group, and select the Remove Duplicates button.
- The Remove Duplicates dialog box will open. Select the columns to check for duplicates, and click OK.
To delete duplicate rows that have completely equal values in all columns, leave the checkmarks next to all columns selected, like in the screenshot below.
Now all duplicate rows in the selected range are deleted, and you will now see a message which states that the duplicate entries have been removed. You will also see how many unique values remain.
How to enable Macros in Excel
How to use If Formula in Excel
Method 2. Remove Duplicates using Advanced Filter and copy unique records to another location.
You can also separate unique values from duplicate values and copy them to another sheet.
Select the range or the entire table that you want to find duplicate and delete.
Select the Data tab > Sort & Filter group, and click the Advanced button.
In the Advanced Filter dialog window, do the following:
- Choose the Copy to another location option
- Verify the List Range. This should be the range you’ve selected in step 1.
- Enter the range where you wish to copy the unique values and select the Unique records only box.
- Finally, click OK, and the unique values will be copied to a new sheet.
As you have just seen in this post, there are numerous ways to find and remove duplicates in Microsoft Excel, each having its strong points and limitations.
By using the above formulas and techniques, you can make a very time-consuming and tedious task of managing duplicate records seem like a very easy task. And yes, keep your database clean by eliminating all the duplicate records. Otherwise, you will end up with a lot of inaccuracies and inefficient reporting.
EasySheet is a tech blog that features excellent Excel hacks, tips, and tricks. Our mission is to help beginners gain knowledge of Excel with EasySheet and get answers to all of their questions using EasySheet. If you require assistance with Microsoft Office products, such as Excel, Word, PowerPoint, Publisher, or anything else, we are the best place to turn.