Need Genius Help - Finding multiple duplicates between columns - EXCEL

Kim296 · 2025-06-10T15:30:03+0100

I really need some help from a genius. I'm working on a dynamic excel spreadsheet. There are over 1,500 columns with list data below each one. Does anyone know how I can dynamically find two or more like values that appear in any of the 1,500 columns. I don't just want to highlight duplicates. I need to determine if more than two (same values) appear in any of the columns independently of each other. I've attached a sample of the column row values. Any direction in the right way to calculate this using a formula is appreciated. I've tried alot of different ideas over two weeks, but I am coming up empty.

dhookom · 2025-06-10T15:52:04+0100

There are others who I expect will be able to assist but it would us if you actually highlighted in your screenshot some duplicates.

Kim296 · 2025-06-10T16:33:15+0100

dhookom said:
There are others who I expect will be able to assist but it would us if you actually highlighted in your screenshot some duplicates.

Thank you so much for your response.

Every code listed (all 1,500 columns) is a duplicated, so it wouldn't do me any good to highlight the duplicate data. Every code would be highlighted. What I'm looking for is a solution to determine if there are two or more like values (from one column) that are the same between any of the other columns independent of each other. I may not be explaining this well.

For example:
See attached image

I don't know if what I'm asking for is possible??

dhookom · 2025-06-10T17:37:33+0100

That makes more sense now with the image. My method would be brute-force VBA to loop through the values in each column and check values against other columns. Someone might have a much more efficient solution.

Being a database developer, I might append the data values to a table that has the Row, Column, and Value. The data you just provided would create ten records in the table. Once I had all the records in a table, I would attempt to use some type of query or queries.

Kim296 · 2025-06-10T17:59:19+0100

dhookom said:
That makes more sense now with the image. My method would be brute-force VBA to loop through the values in each column and check values against other columns. Someone might have a much more efficient solution.

Being a database developer, I might append the data values to a table that has the Row, Column, and Value. The data you just provided would create ten records in the table. Once I had all the records in a table, I would attempt to use some type of query or queries.

I didn't use VBA (because I'm not that versed on it) but your Row, Column and Value method is how I located all the duplicates between sets. I just don't know how to query through them to determine if "two or more" [unknown] values are the same between sets. Do you know of a resource that I can read to learn this or do you know how to do this within a formula?

dhookom · 2025-06-10T18:30:07+0100

I think you will need to rely on someone else. I'm packing for a two-week holiday without a laptop.

Kim296 · 2025-06-10T18:31:20+0100

dhookom said:
I think you will need to rely on someone else. I'm packing for a two-week holiday without a laptop.

Thank you. Have a great trip!

Deniall · 2025-06-10T22:39:37+0100

I am still having difficulties understanding exactly what you are trying to do. My best guess at this stage is that you are trying to identify codes that appear more than once in one column AND also appear in any other column.

Can you please confirm/clarify/correct/contradict this.

mintjulep · 2025-06-11T07:47:38+0100

@Kim296, your description is confusing to me also.

Your screenshots provide trivial examples compared to the stated size (1,500 x 300) of your data and number of possible permutations to check for (two or more).

Frankly, this sounds like you are asking for help at implementing a possible solution to a problem rather than asking for help to solve the actual problem.

Why don't you take a step back and try to more clearly explain what you have, and what you need.

Deniall · 2025-06-11T09:20:10+0100

Depending perhaps upon the result of the clarifications requested by several of us above, you might be able to attack your problem by creating a subsidiary tabulation that present your information is a different way. We are told that your present table contains about 1500 columns, each of which could contain up to about 300 "codes". You haven't told us what the total number of possible codes is, and all we can be reasonably confident of is that it is unlikely to exceed 1500x300=450,000.

If we can assume the number of possible codes is CONSIDERABLY smaller than this, you MIGHT be able to get whatever it is you are seeking as follows.

Create an unstructured table of 1501 columns and 301 rows.
Prefill the first column with all the possible 300 codes as row headings.
Prefill the first row with "Column1", "Column2", ...., "Column1500" as column headings.
Fill the 450000 detail cells with COUNTIF formulae along the lines of =COUNTIF("EntireColumn","RowHeading")
(which will be a lot less tedious if you get your absolute versus relative referencing correct).

It should then be quite trivial to undertake whatever tests it is that you want.

However this method is somewhat profligate in its cell usage (to put it mildly).

combo · 2025-06-11T11:37:52+0100

Kim296 said:
Every code listed (all 1,500 columns) is a duplicated, so it wouldn't do me any good to highlight the duplicate data. Every code would be highlighted.

But it is how you described the problem. A value can be in any two or more columns.

Whatever your problem is, I would start with defining structured table from your data. Next process it in Power Query:
- add index column to locate row,
- unpivot other columns transformation (relative to added Index column),
- keep it as connection only query.
You will get a starting three column table (Index, column header, value), without empty entries.
In next query you can create aggregation and count values, filter unique entries.
Another query can combine two first queries and return addresses (index and column header) for duplicated entries.

mintjulep · 2025-06-11T11:53:05+0100

How do either of those proposed solutions address "two or more like values"?

mintjulep · 2025-06-11T12:11:09+0100

Seems like set intersections, i.e. inner join, might be useful.

Kim296 · 2025-06-11T13:36:40+0100

Deniall said:
I am still having difficulties understanding exactly what you are trying to do. My best guess at this stage is that you are trying to identify codes that appear more than once in one column AND also appear in any other column.

Can you please confirm/clarify/correct/contradict this.

Yes, that is correct. However there isn't one primary column to compare all to. I need to compare each of the 1,500 columns and extract "two or more" like codes from each columns independently of each other; hence, comparing all columns separately. See the screen shot I attached as an example. Any help with a possible solution is very much appreciated.

Kim296 · 2025-06-11T13:39:11+0100

mintjulep said:
@Kim296, your description is confusing to me also.

Your screenshots provide trivial examples compared to the stated size (1,500 x 300) of your data and number of possible permutations to check for (two or more).

Frankly, this sounds like you are asking for help at implementing a possible solution to a problem rather than asking for help to solve the actual problem.

Why don't you take a step back and try to more clearly explain what you have, and what you need.

Thank you for your reply.

Kim296 · 2025-06-11T13:45:05+0100

Deniall said:
Depending perhaps upon the result of the clarifications requested by several of us above, you might be able to attack your problem by creating a subsidiary tabulation that present your information is a different way. We are told that your present table contains about 1500 columns, each of which could contain up to about 300 "codes". You haven't told us what the total number of possible codes is, and all we can be reasonably confident of is that it is unlikely to exceed 1500x300=450,000.

If we can assume the number of possible codes is CONSIDERABLY smaller than this, you MIGHT be able to get whatever it is you are seeking as follows.

Create an unstructured table of 1501 columns and 301 rows.
Prefill the first column with all the possible 300 codes as row headings.
Prefill the first row with "Column1", "Column2", ...., "Column1500" as column headings.
Fill the 450000 detail cells with COUNTIF formulae along the lines of =COUNTIF("EntireColumn","RowHeading")
(which will be a lot less tedious if you get your absolute versus relative referencing correct).

It should then be quite trivial to undertake whatever tests it is that you want.

However this method is somewhat profligate in its cell usage (to put it mildly).

You're right. This is quite trivial for what I'm trying to accomplish. I've mulled this over for weeks. This is only one tab of a much larger workbook in which I've come to this point. This is a template in which new data (codes) will be introduced each time. There will never be one set of codes. They codes will change everytime with a new query.

I have tried what you are proposing above in previous tabs with success. My issue is trying to find "two" or more like codes within each column. Every code in the total columns and rows are duplicates (between columns). I'm now trying to extract where two or more are alike (in each column) between all columns.

I appreciate everyone's help and questions. I will continue to research this.

Kim296 · 2025-06-11T13:49:31+0100

combo said:
But it is how you described the problem. A value can be in any two or more columns.

Whatever your problem is, I would start with defining structured table from your data. Next process it in Power Query:
- add index column to locate row,
- unpivot other columns transformation (relative to added Index column),
- keep it as connection only query.
You will get a starting three column table (Index, column header, value), without empty entries.
In next query you can create aggregation and count values, filter unique entries.
Another query can combine two first queries and return addresses (index and column header) for duplicated entries.

Thank you for your reply.

combo · 2025-06-11T14:07:33+0100

BTW, it is a result of the queries with your data:

Andrzejek · 2025-06-11T14:12:01+0100

Kim,

Would you be open to a VBA solution?.
I may have (what Duane described as a) "brute-force VBA to loop through the values in each column and check values against other columns."

Kim296 · 2025-06-11T14:17:15+0100

combo said:
BTW, it is a result of the queries with your data:

View attachment 2350

combo said:
BTW, it is a result of the queries with your data:
View attachment 2350

Thank you for your response. I think this might work. Let me give it a go...

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Need Genius Help - Finding multiple duplicates between columns - EXCEL

MIS

Attachments

Programmer

MIS

Attachments

Programmer

MIS

Programmer

MIS

Technical User

Technical User

Technical User

Technical User

Technical User

Technical User

MIS

MIS

MIS

MIS

Technical User

Programmer

MIS

Similar threads

Log in

Part and Inventory Search

Sponsor