Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations MikeeOK on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Need Genius Help - Finding multiple duplicates between columns - EXCEL

Kim296

MIS
Aug 24, 2012
111
US
I really need some help from a genius. I'm working on a dynamic excel spreadsheet. There are over 1,500 columns with list data below each one. Does anyone know how I can dynamically find two or more like values that appear in any of the 1,500 columns. I don't just want to highlight duplicates. I need to determine if more than two (same values) appear in any of the columns independently of each other. I've attached a sample of the column row values. Any direction in the right way to calculate this using a formula is appreciated. I've tried alot of different ideas over two weeks, but I am coming up empty.
 

Attachments

  • screen_shot4.jpg
    screen_shot4.jpg
    54.8 KB · Views: 11
There are others who I expect will be able to assist but it would us if you actually highlighted in your screenshot some duplicates.
 
There are others who I expect will be able to assist but it would us if you actually highlighted in your screenshot some duplicates.
Thank you so much for your response.

Every code listed (all 1,500 columns) is a duplicated, so it wouldn't do me any good to highlight the duplicate data. Every code would be highlighted. What I'm looking for is a solution to determine if there are two or more like values (from one column) that are the same between any of the other columns independent of each other. I may not be explaining this well.

For example:
See attached image

I don't know if what I'm asking for is possible??
 

Attachments

  • screen_shot4.jpg
    screen_shot4.jpg
    54.8 KB · Views: 23
Last edited:
That makes more sense now with the image. My method would be brute-force VBA to loop through the values in each column and check values against other columns. Someone might have a much more efficient solution.

Being a database developer, I might append the data values to a table that has the Row, Column, and Value. The data you just provided would create ten records in the table. Once I had all the records in a table, I would attempt to use some type of query or queries.
 
That makes more sense now with the image. My method would be brute-force VBA to loop through the values in each column and check values against other columns. Someone might have a much more efficient solution.

Being a database developer, I might append the data values to a table that has the Row, Column, and Value. The data you just provided would create ten records in the table. Once I had all the records in a table, I would attempt to use some type of query or queries.

I didn't use VBA (because I'm not that versed on it) but your Row, Column and Value method is how I located all the duplicates between sets. I just don't know how to query through them to determine if "two or more" [unknown] values are the same between sets. Do you know of a resource that I can read to learn this or do you know how to do this within a formula?
 
I think you will need to rely on someone else. I'm packing for a two-week holiday without a laptop.
 
I am still having difficulties understanding exactly what you are trying to do. My best guess at this stage is that you are trying to identify codes that appear more than once in one column AND also appear in any other column.

Can you please confirm/clarify/correct/contradict this.
 
@Kim296, your description is confusing to me also.

Your screenshots provide trivial examples compared to the stated size (1,500 x 300) of your data and number of possible permutations to check for (two or more).

Frankly, this sounds like you are asking for help at implementing a possible solution to a problem rather than asking for help to solve the actual problem.

Why don't you take a step back and try to more clearly explain what you have, and what you need.
 
Depending perhaps upon the result of the clarifications requested by several of us above, you might be able to attack your problem by creating a subsidiary tabulation that present your information is a different way. We are told that your present table contains about 1500 columns, each of which could contain up to about 300 "codes". You haven't told us what the total number of possible codes is, and all we can be reasonably confident of is that it is unlikely to exceed 1500x300=450,000.

If we can assume the number of possible codes is CONSIDERABLY smaller than this, you MIGHT be able to get whatever it is you are seeking as follows.

Create an unstructured table of 1501 columns and 301 rows.
Prefill the first column with all the possible 300 codes as row headings.
Prefill the first row with "Column1", "Column2", ...., "Column1500" as column headings.
Fill the 450000 detail cells with COUNTIF formulae along the lines of =COUNTIF("EntireColumn","RowHeading")
(which will be a lot less tedious if you get your absolute versus relative referencing correct).

It should then be quite trivial to undertake whatever tests it is that you want.

However this method is somewhat profligate in its cell usage (to put it mildly).
 
Every code listed (all 1,500 columns) is a duplicated, so it wouldn't do me any good to highlight the duplicate data. Every code would be highlighted.
But it is how you described the problem. A value can be in any two or more columns.

Whatever your problem is, I would start with defining structured table from your data. Next process it in Power Query:
- add index column to locate row,
- unpivot other columns transformation (relative to added Index column),
- keep it as connection only query.
You will get a starting three column table (Index, column header, value), without empty entries.
In next query you can create aggregation and count values, filter unique entries.
Another query can combine two first queries and return addresses (index and column header) for duplicated entries.
 
Last edited:
I am still having difficulties understanding exactly what you are trying to do. My best guess at this stage is that you are trying to identify codes that appear more than once in one column AND also appear in any other column.

Can you please confirm/clarify/correct/contradict this.
Yes, that is correct. However there isn't one primary column to compare all to. I need to compare each of the 1,500 columns and extract "two or more" like codes from each columns independently of each other; hence, comparing all columns separately. See the screen shot I attached as an example. Any help with a possible solution is very much appreciated.
 
@Kim296, your description is confusing to me also.

Your screenshots provide trivial examples compared to the stated size (1,500 x 300) of your data and number of possible permutations to check for (two or more).

Frankly, this sounds like you are asking for help at implementing a possible solution to a problem rather than asking for help to solve the actual problem.

Why don't you take a step back and try to more clearly explain what you have, and what you need.
Thank you for your reply.
 
Depending perhaps upon the result of the clarifications requested by several of us above, you might be able to attack your problem by creating a subsidiary tabulation that present your information is a different way. We are told that your present table contains about 1500 columns, each of which could contain up to about 300 "codes". You haven't told us what the total number of possible codes is, and all we can be reasonably confident of is that it is unlikely to exceed 1500x300=450,000.

If we can assume the number of possible codes is CONSIDERABLY smaller than this, you MIGHT be able to get whatever it is you are seeking as follows.

Create an unstructured table of 1501 columns and 301 rows.
Prefill the first column with all the possible 300 codes as row headings.
Prefill the first row with "Column1", "Column2", ...., "Column1500" as column headings.
Fill the 450000 detail cells with COUNTIF formulae along the lines of =COUNTIF("EntireColumn","RowHeading")
(which will be a lot less tedious if you get your absolute versus relative referencing correct).

It should then be quite trivial to undertake whatever tests it is that you want.

However this method is somewhat profligate in its cell usage (to put it mildly).
You're right. This is quite trivial for what I'm trying to accomplish. I've mulled this over for weeks. This is only one tab of a much larger workbook in which I've come to this point. This is a template in which new data (codes) will be introduced each time. There will never be one set of codes. They codes will change everytime with a new query.

I have tried what you are proposing above in previous tabs with success. My issue is trying to find "two" or more like codes within each column. Every code in the total columns and rows are duplicates (between columns). I'm now trying to extract where two or more are alike (in each column) between all columns.

I appreciate everyone's help and questions. I will continue to research this.
 
But it is how you described the problem. A value can be in any two or more columns.

Whatever your problem is, I would start with defining structured table from your data. Next process it in Power Query:
- add index column to locate row,
- unpivot other columns transformation (relative to added Index column),
- keep it as connection only query.
You will get a starting three column table (Index, column header, value), without empty entries.
In next query you can create aggregation and count values, filter unique entries.
Another query can combine two first queries and return addresses (index and column header) for duplicated entries.
Thank you for your reply.
 
Kim,

Would you be open to a VBA solution?.
I may have (what Duane described as a) "brute-force VBA to loop through the values in each column and check values against other columns." :)
 

Part and Inventory Search

Sponsor

Back
Top