Find duplicate files from different directories

Hi All,
I have about 200 folders, each containing 6 to 10 sub-folders. About 100 different files have been duplicated and stored across these folders, and I need to find their path locations and filenames.
I'd like your help writing a Tcl or Perl script that can find the duplicated file names and their path locations.
I really appreciate your help.

RE: Find duplicate files from different directories

I personally no longer have a platform that runs Tcl BUT... it's not a hard problem. Look at the "file" and "glob" functions. Start with just the directories, not the sub-directories. Remember that in Tcl, arrays are hashed so you can have something like "filenames(xyz/abc/qrl.ext)=n" where n is the number of occurrences.
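For illustration, here is a minimal, untested sketch of that approach; the top-level path is hypothetical, and the `info exists` branch keeps it working on Tcl versions where `incr` does not auto-create variables:

CODE -->

# Walk a directory tree and count occurrences of each file name
# in the hashed array 'filenames'.
proc countFiles {dir} {
    global filenames
    foreach f [glob -nocomplain -directory $dir *] {
        if {[file isdirectory $f]} {
            countFiles $f                 ;# descend into sub-folders
        } else {
            set name [file tail $f]
            if {[info exists filenames($name)]} {
                incr filenames($name)
            } else {
                set filenames($name) 1
            }
        }
    }
}

countFiles /path/to/top-folder      ;# hypothetical top-level directory
parray filenames                    ;# names with a count > 1 are duplicates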

Bob Rashkin

RE: Find duplicate files from different directories


As Bong already pointed out, you need to create a procedure that searches for files using 'glob'. See the following link.

Furthermore, let's assume you have created your search procedure (or somebody else on this forum with more free time than me has supplied one), and that it returns the following list as an example result:

CODE -->

set searchResults {
  dir1/dir2/dir3/file1.tcl
  dir1/dir3/file1.tcl
  dir1/dir2/file1.tcl
  dir1/dir2/dir3/file2.tcl
  dir1/dir2/dir3/file3.tcl
  dir1/dir3/file2.tcl
  dir1/file3.tcl
  dir1/file4.tcl
}

Then you can use the following script to parse the list for duplicates:

CODE -->

foreach file $searchResults {
  if {[catch {incr filenames([file tail $file],cnt)}]} {set filenames([file tail $file],cnt) 1}
  lappend filenames([file tail $file],paths) [file dirname $file]
}
When the 'parray' command is unleashed onto the array 'filenames', we can view the duplicates:

CODE -->

filenames(file1.tcl,cnt)   = 3
filenames(file1.tcl,paths) = dir1/dir2/dir3 dir1/dir3 dir1/dir2
filenames(file2.tcl,cnt)   = 2
filenames(file2.tcl,paths) = dir1/dir2/dir3 dir1/dir3
filenames(file3.tcl,cnt)   = 2
filenames(file3.tcl,paths) = dir1/dir2/dir3 dir1
filenames(file4.tcl,cnt)   = 1
filenames(file4.tcl,paths) = dir1 
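For completeness, the search procedure assumed above could be sketched as follows. This is untested; the top-level folder 'dir1' and the variable names are hypothetical, and the procedure simply returns a flat list of file paths like the example list above:

CODE -->

# Recursively collect all file paths under 'dir' into a flat list.
proc searchFiles {dir} {
    set results {}
    foreach f [glob -nocomplain -directory $dir *] {
        if {[file isdirectory $f]} {
            set results [concat $results [searchFiles $f]]
        } else {
            lappend results $f
        }
    }
    return $results
}

set allFiles [searchFiles dir1]    ;# hypothetical top-level folder

Entries whose 'cnt' element is 1 are unique; anything greater marks a duplicate, so you can filter on that count when reporting.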

