INTELLIGENT WORK FORUMS
FOR COMPUTER PROFESSIONALS

One nearest neighbor using awk

(OP)
This is what I am trying to do in the AWK language. I am having trouble mainly with step 2. I have shown a sample dataset, but the original dataset has 100 fields and 2000 records.

Algorithm

1) Initialize accuracy = 0.
2) For each record r:
           --Find the closest other record, o, in the dataset using a distance formula over fields c1-c5.
3) If the class value of the closest record o is equal to the class value of the current record r, increment accuracy by 1. Here, the class value is the last field (c6).
4) Finally, compute 100 * accuracy / total_records.
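For illustration, the four steps above can be sketched as a small shell function wrapping an awk program. This is a minimal sketch under assumptions that are not stated in the post: the input has no header line, the class is the last field, every other field is a feature, and the "distance formula" is the sum of absolute differences (the same measure as the distance matrix in the reply). The function name nn_accuracy is hypothetical. The main block only stores the records; all comparisons happen in END. On ties, the first closest record in file order wins.

```shell
# nn_accuracy: hypothetical helper name, not from the thread.
# Assumes: no header line, class in the last field, and Manhattan distance
# (sum of absolute differences) over all other fields.
nn_accuracy() {
    awk '
    {
        # Main block: runs once per record, so it only stores the record.
        for (k = 1; k < NF; k++) x[NR, k] = $k    # features: fields 1 .. NF-1
        cls[NR] = $NF                              # class label: last field
        nfeat = NF - 1                             # assumes every row has the same NF
    }
    END {
        accuracy = 0                               # step 1
        for (r = 1; r <= NR; r++) {                # step 2: each record r
            best = -1
            for (o = 1; o <= NR; o++) {            # compare with every other record o
                if (o == r) continue
                d = 0
                for (k = 1; k <= nfeat; k++) {     # sum of absolute differences
                    diff = x[r, k] - x[o, k]
                    d += (diff < 0) ? -diff : diff
                }
                # Equal distances may differ in the last floating point bits,
                # so only a clearly smaller d replaces the current best;
                # on a tie the first closest record in file order is kept.
                if (best < 0 || d < best - 1e-9) { best = d; closest = o }
            }
            if (best >= 0 && cls[r] == cls[closest])   # step 3
                accuracy++
        }
        if (NR > 0) print 100 * accuracy / NR      # step 4
    }' "$@"
}
```

With the ten numeric rows of the sample saved as data.txt (without the c1..c6 header and the --> notes), `nn_accuracy data.txt` should print 30. For 2000 records with 100 fields this means about 2000 × 1999 distance computations over 99 features each, all inside END, but no distance matrix has to be kept in memory.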

Sample Dataset

 c1  c2  c3  c4  c5  c6  --> Columns
0.6 0.1 0.2 0.3 0.4 0.3  --> row1 & row7 are nearest neighbors in c1
0.1 0.2 0.1 0.1 0.1 0.6      and have the same value in c6 (0.3), so ++accuracy
0.2 0.3 0.1 0.1 0.2 0.6  
0.3 0.4 0.1 0.1 0.3 0.3
0.4 0.5 0.1 0.1 0.9 0.6
0.5 0.6 0.1 0.1 0.8 0.9
0.6 0.7 0.1 0.1 0.7 0.3
0.7 0.8 0.1 0.1 0.6 0.6
0.8 0.9 0.1 0.1 0.5 0.9
0.9 1.0 0.1 0.1 0.4 0.3

 
Code

function abs(x) { return x < 0 ? -x : x }   # awk has no built-in abs()

BEGIN {
   accuracy = 0;
}
{
   # for each record
   for (i = 1; i <= 5; i++)   # for fields 1 to 5 only
   {
      # find closest record (calculating the distance)
      distance = abs($i - other_records)

      # compare values of field 6 for closest and current (each) record
      if (current_record_field_6.value == closest_record_field_6.value)
      {
         ++accuracy;
      }
   }
}
END {
   percentage = 100 * (accuracy / NR);   # NR = total number of records read
   print percentage;
}
 
I am struggling with how to find the closest record for each record in the dataset using AWK. As far as I know, the '{}' block is executed only once for each record.

Any help or suggestions are much appreciated.

RE: One nearest neighbor using awk

Quote (Murlidhar)


I am struggling on how to find the closest record for each record in the dataset using awk. As far as I know '{}' block is only executed once for each record.

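Yes, the {..} block is executed only once for each record. In this block we can store every record in an array, compute the distances between the current record and all previous records, and store the values in a distance matrix. Because the distance is symmetric (d(i,j) = d(j,i)), comparing each record only with the previous ones is enough to fill the whole matrix.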
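A minimal sketch of the "distances to all previous records" idea, as a hypothetical shell function pair_distances: for each new record, the main block compares fields 1 .. NF-1 with every earlier record saved in the array x, prints "i j distance", and only then saves the current record. It assumes the last field is the class (skipped) and uses the sum of absolute differences as the distance.

```shell
# pair_distances: hypothetical helper name, not from the thread.
# Prints "i j distance" for every earlier record j, with two decimals.
pair_distances() {
    awk '
    {
        for (j = 1; j < NR; j++) {                 # only records read so far
            d = 0
            for (k = 1; k < NF; k++) {             # skip the class (last field)
                diff = $k - x[j, k]
                d += (diff < 0) ? -diff : diff
            }
            printf "%d %d %.2f\n", NR, j, d        # same value as d(j, NR)
        }
        for (k = 1; k < NF; k++) x[NR, k] = $k     # save this record for later ones
    }' "$@"
}
```

Since |a - b| = |b - a|, each pair is handled exactly once, when the later of the two records is read. On the first three sample rows this prints `2 1 1.20`, `3 1 1.10` and `3 2 0.30`.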

Then, in the END{..} block, we can evaluate the computed distances and print the results.
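The whole approach (distance matrix filled in the main block, evaluation in END) might look like the sketch below. The function name knn1_ties is hypothetical, and the same assumptions apply: no header line, class in the last field, sum of absolute differences as the distance. When several records share the minimal distance, this sketch counts a hit if any of them has the same class; the tolerance eps = 1e-9 keeps floating point rounding from splitting such ties.

```shell
# knn1_ties: hypothetical helper name, not from the thread.
# Main block: fill a symmetric distance matrix against all earlier records.
# END: find the minimal distance of each record, then count a hit if ANY
# record at that distance has the same class.
knn1_ties() {
    awk '
    {
        for (k = 1; k < NF; k++) x[NR, k] = $k
        cls[NR] = $NF
        for (j = 1; j < NR; j++) {
            d = 0
            for (k = 1; k < NF; k++) {
                diff = x[NR, k] - x[j, k]
                d += (diff < 0) ? -diff : diff
            }
            dist[NR, j] = dist[j, NR] = d          # d(i,j) == d(j,i)
        }
    }
    END {
        eps = 1e-9                                 # tolerance for floating point ties
        for (i = 1; i <= NR; i++) {
            best = -1
            for (j = 1; j <= NR; j++)              # pass 1: minimal distance
                if (j != i && (best < 0 || dist[i, j] < best)) best = dist[i, j]
            hit = 0
            for (j = 1; j <= NR; j++)              # pass 2: all records at that distance
                if (j != i && dist[i, j] <= best + eps && cls[j] == cls[i]) hit = 1
            accuracy += hit
        }
        print "number of records processed: " NR
        if (NR > 0) print "percentage = " 100 * accuracy / NR
    }' "$@"
}
```

On the ten sample rows it should print `number of records processed: 10` and `percentage = 30`. The matrix stores both d(i,j) and d(j,i), so for the 2000 records mentioned in the question it would hold 2000 × 1999 = 3,998,000 entries; recomputing the distances inside END instead avoids that array, at the cost of computing each distance twice.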

I tried it. For the input data posted above I computed this distance matrix (rN means the N-th record; each entry is the sum of the absolute differences of fields c1 to c5):

CODE

REC  r01   r02   r03   r04   r05   r06   r07   r08   r09   r10 
r01 0.00  1.20  1.10  1.00  1.40  1.30  1.20  1.30  1.40  1.50 
r02 1.20  0.00  0.30  0.60  1.40  1.50  1.60  1.70  1.80  1.90 
r03 1.10  0.30  0.00  0.30  1.10  1.20  1.30  1.40  1.50  1.60 
r04 1.00  0.60  0.30  0.00  0.80  0.90  1.00  1.10  1.20  1.30 
r05 1.40  1.40  1.10  0.80  0.00  0.30  0.60  0.90  1.20  1.50 
r06 1.30  1.50  1.20  0.90  0.30  0.00  0.30  0.60  0.90  1.20 
r07 1.20  1.60  1.30  1.00  0.60  0.30  0.00  0.30  0.60  0.90 
r08 1.30  1.70  1.40  1.10  0.90  0.60  0.30  0.00  0.30  0.60 
r09 1.40  1.80  1.50  1.20  1.20  0.90  0.60  0.30  0.00  0.30 
r10 1.50  1.90  1.60  1.30  1.50  1.20  0.90  0.60  0.30  0.00 
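A matrix like the one above can be printed with a sketch such as the following. The function name dist_matrix is hypothetical; it assumes no header line and the class in the last field, which is excluded from the distance. The column spacing is similar to, but not byte-identical with, the table posted here.

```shell
# dist_matrix: hypothetical helper name, not from the thread.
# Prints the full matrix of sums of absolute differences over fields 1 .. NF-1.
dist_matrix() {
    awk '
    {
        for (k = 1; k < NF; k++) x[NR, k] = $k    # features only, class skipped
        nfeat = NF - 1
    }
    END {
        printf "REC"                              # header row: record labels
        for (j = 1; j <= NR; j++) printf "   r%02d", j
        print ""
        for (i = 1; i <= NR; i++) {               # one output row per record
            printf "r%02d", i
            for (j = 1; j <= NR; j++) {
                d = 0
                for (k = 1; k <= nfeat; k++) {
                    diff = x[i, k] - x[j, k]
                    d += (diff < 0) ? -diff : diff
                }
                printf " %5.2f", d
            }
            print ""
        }
    }' "$@"
}
```

Each row of the output corresponds to one record, and the diagonal is 0.00 because every record has distance 0 to itself.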

and I got these results:

CODE

record #01: [ 0.6 0.1 0.2 0.3 0.4 0.3 ]
	closest record(s) with minimal distance =  1.00
		record #04: [ 0.3 0.4 0.1 0.1 0.3 0.3 ]	--> accuracy found

record #02: [ 0.1 0.2 0.1 0.1 0.1 0.6 ]
	closest record(s) with minimal distance =  0.30
		record #03: [ 0.2 0.3 0.1 0.1 0.2 0.6 ]	--> accuracy found

record #03: [ 0.2 0.3 0.1 0.1 0.2 0.6 ]
	closest record(s) with minimal distance =  0.30
		record #02: [ 0.1 0.2 0.1 0.1 0.1 0.6 ]	--> accuracy found
		record #04: [ 0.3 0.4 0.1 0.1 0.3 0.3 ]

record #04: [ 0.3 0.4 0.1 0.1 0.3 0.3 ]
	closest record(s) with minimal distance =  0.30
		record #03: [ 0.2 0.3 0.1 0.1 0.2 0.6 ]

record #05: [ 0.4 0.5 0.1 0.1 0.9 0.6 ]
	closest record(s) with minimal distance =  0.30
		record #06: [ 0.5 0.6 0.1 0.1 0.8 0.9 ]

record #06: [ 0.5 0.6 0.1 0.1 0.8 0.9 ]
	closest record(s) with minimal distance =  0.30
		record #05: [ 0.4 0.5 0.1 0.1 0.9 0.6 ]
		record #07: [ 0.6 0.7 0.1 0.1 0.7 0.3 ]

record #07: [ 0.6 0.7 0.1 0.1 0.7 0.3 ]
	closest record(s) with minimal distance =  0.30
		record #06: [ 0.5 0.6 0.1 0.1 0.8 0.9 ]
		record #08: [ 0.7 0.8 0.1 0.1 0.6 0.6 ]

record #08: [ 0.7 0.8 0.1 0.1 0.6 0.6 ]
	closest record(s) with minimal distance =  0.30
		record #07: [ 0.6 0.7 0.1 0.1 0.7 0.3 ]
		record #09: [ 0.8 0.9 0.1 0.1 0.5 0.9 ]

record #09: [ 0.8 0.9 0.1 0.1 0.5 0.9 ]
	closest record(s) with minimal distance =  0.30
		record #08: [ 0.7 0.8 0.1 0.1 0.6 0.6 ]
		record #10: [ 0.9 1.0 0.1 0.1 0.4 0.3 ]

record #10: [ 0.9 1.0 0.1 0.1 0.4 0.3 ]
	closest record(s) with minimal distance =  0.30
		record #09: [ 0.8 0.9 0.1 0.1 0.5 0.9 ]

number of records processed: 10
percentage = 30 
