ksdh (TechnicalUser)
18 Feb 09 6:41
HI
I have 2 files A and B
A
1
1234
986740982
78264182
89264162
9128635
5
6
7
8

B
1
4
5
6
7

As you can see, some of the records in B are also in A. What is the best way to compare the two files? For example, I want to output the entries that match (or don't match) between the two files. I tried a while loop but could not get what I wanted. Any help is appreciated.

Thanks
feherke (Programmer)
18 Feb 09 6:50
Hi

diff ?

CODE

master # diff A B
2,6c2
< 1234
< 986740982
< 78264182
< 89264162
< 9128635
---
> 4
10d5
< 8

master # diff -y A B
1                                           1
1234                                      | 4
986740982                                 <
78264182                                  <
89264162                                  <
9128635                                   <
5                                           5
6                                           6
7                                           7

Feherke.
http://rootshell.be/~feherke/

ksdh (TechnicalUser)
18 Feb 09 6:56
Feherke
Thanks, but the diff command won't work here.
The files contain thousands of entries.

Basically, I need a command that increments a counter when a number or string from file B is also in file A, and leaves the counter unchanged when it is not.

Here is a typical example: we have a whitelist and a stream of raw data coming in. The raw data contains all the numbers, and the whitelist contains only the x numbers that are allowed to pass through. Passing the raw data through the whitelist filter produces another file. Now I want to check that this filtered file was actually filtered, i.e. that the whitelist works.

I hope I have been able to explain the situation.

Thanks
feherke (Programmer)
18 Feb 09 7:13
Hi

In your example, is file A the raw data and file B the whitelist? Maybe like this?

CODE

# allowed by white list
master # grep -f B -x A
1
5
6
7

# rejected by white list
master # grep -f B -x -v A
1234
986740982
78264182
89264162
9128635
8
Tested with GNU grep.

Feherke.
http://rootshell.be/~feherke/

ksdh (TechnicalUser)
18 Feb 09 7:24
Feherke
Sorry, but the -f option doesn't work on my Solaris:
grep -f IMSI2 -x IMSI1
grep: illegal option -- f

Also, what i want to do here is

A------> Raw file
B------> Whitelist
C------> Filtered file

Let's say each of them contains only numbers.
I want to check how many numbers in the filtered file (which came from the raw file) are in the whitelist. Ideally all of them should be in the whitelist (filter), but I still want to verify that each number in the filtered file is also present in the whitelist, and count the number of entries that matched.


Thanks
feherke (Programmer)
18 Feb 09 7:31
Hi

Show us a sample of the desired output too.

Feherke.
http://rootshell.be/~feherke/

ksdh (TechnicalUser)
18 Feb 09 7:45
All I want to do here is run a for or while loop over file C to compare its entries with file B, but I don't know how to do it. Once that's done, I want to count the number of entries that matched in both files.
The output would just be a count of the entries that matched, a simple number.
Apologies if I could not explain the problem in my previous messages.

Is it clear now?
PHV (MIS)
18 Feb 09 8:53
What have YOU tried so far and where in YOUR code are you stuck ?

Hope This Helps, PH.
FAQ219-2884: How Do I Get Great Answers To my Tek-Tips Questions?
FAQ181-2886: How can I maximize my chances of getting an answer?

elgrandeperro (TechnicalUser)
18 Feb 09 9:11
Do you only care about the entries' existence? If the original order is not significant, the usual approach is to sort both inputs and then run comm on the sorted files with the -3 option. diff gives the contextual difference, which is not what you want if you are checking solely for existence.

If you don't care about multiple entries, use uniq to suppress duplicates on input.
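
The sort-then-comm approach could be sketched roughly as below. The scratch directory and the .sorted file names are made up for illustration, and the sample data is taken from the first post. comm -3 prints the entries found in only one of the files, while comm -12 prints the entries found in both, which is probably closer to the count being asked for.

```shell
# Sample data from the first post, written to a scratch directory (hypothetical paths).
tmp=$(mktemp -d)
printf '%s\n' 1 1234 986740982 78264182 89264162 9128635 5 6 7 8 > "$tmp/A"
printf '%s\n' 1 4 5 6 7 > "$tmp/B"

# comm needs sorted input; sort -u also drops duplicate entries.
sort -u "$tmp/A" > "$tmp/A.sorted"
sort -u "$tmp/B" > "$tmp/B.sorted"

# -3 hides the "common" column: what is left are entries found in only one file.
comm -3 "$tmp/A.sorted" "$tmp/B.sorted"

# -12 hides both "unique" columns: what is left are entries found in both files.
common=$(comm -12 "$tmp/A.sorted" "$tmp/B.sorted")
matched=$(comm -12 "$tmp/A.sorted" "$tmp/B.sorted" | wc -l)
echo "entries in both files: $matched"

rm -rf "$tmp"
```

Because both files are sorted once and read once, this should scale to thousands of entries much better than running grep once per line.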
ksdh (TechnicalUser)
18 Feb 09 9:27
I was trying
cat C | while read line; do grep -i $line B ;done | wc -l

I dont know if i applied the right logic.

Thanks
elgrandeperro (TechnicalUser)
18 Feb 09 12:41
To fix your code, it would be:

cat C | while read line
  do
  egrep -i "^$line$" B > /dev/null
  if [ $? -eq 0 ]
  then echo $line in B
  fi
done

You need to egrep it because you don't want a substring match.
(BTW, I am a marginal shell programmer)
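
Since the goal was a single count, the loop above could be adapted along these lines. This is only a sketch, assuming a POSIX shell (ksh, bash, or /usr/xpg4/bin/sh on Solaris) rather than the old Bourne /bin/sh, and using made-up sample contents for B and C. Plain grep is used because ^ and $ are basic regular expression anchors too.

```shell
# Made-up sample data: B is the whitelist, C is the filtered file.
tmp=$(mktemp -d)
printf '%s\n' 1 4 5 6 7 > "$tmp/B"
printf '%s\n' 1 5 47 7 > "$tmp/C"    # 47 is not whitelisted

count=0
# Feeding C in with "done < file" instead of "cat C | while ..." keeps the
# loop in the current shell in POSIX shells, so $count is still set afterwards.
while read -r line
do
  # The ^ and $ anchors force a whole-line match; \$ keeps the shell from
  # treating the trailing dollar sign specially inside double quotes.
  if grep "^$line\$" "$tmp/B" > /dev/null
  then
    count=$((count + 1))
  fi
done < "$tmp/C"

echo "$count entries of C are in B"
rm -rf "$tmp"
```

Note that this still starts one grep per line of C, so for very large files the comm or grep -f approaches mentioned in this thread are likely to be faster.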
Annihilannic (MIS)
18 Feb 09 18:58
/usr/xpg4/bin/grep supports the -f option.

Annihilannic.
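
A rough sketch of how that could produce the count directly, assuming the XPG4 grep follows POSIX (which specifies -c, -f, -v and -x). The GREP variable and the sample contents of B and C are made up for illustration.

```shell
# Prefer the XPG4 grep where it exists (Solaris); otherwise use grep from PATH.
if [ -x /usr/xpg4/bin/grep ]; then
  GREP=/usr/xpg4/bin/grep
else
  GREP=grep
fi

# Made-up sample data: B is the whitelist, C is the filtered file.
tmp=$(mktemp -d)
printf '%s\n' 1 4 5 6 7 > "$tmp/B"
printf '%s\n' 1 5 47 7 > "$tmp/C"    # 47 is not whitelisted

# -f B  read the patterns from B, one per line
# -x    only count lines that match a pattern as a whole line
# -c    print the number of matching lines instead of the lines themselves
matched=$("$GREP" -c -x -f "$tmp/B" "$tmp/C")

# -v inverts the match: lines of C that are NOT in the whitelist.
unexpected=$("$GREP" -v -x -f "$tmp/B" "$tmp/C")

echo "matched: $matched"
echo "not whitelisted: $unexpected"
rm -rf "$tmp"
```

If the entries could contain regular expression metacharacters, adding -F (fixed strings) would be worth considering.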

ksdh (TechnicalUser)
19 Feb 09 5:14
Elgrandeperro
Thanks for your post. Your script does the job but could you please explain the logic behind  
egrep -i "^$line$" B > /dev/null
  if [ $? -eq 0 ]

What does ^$line$ mean, and what does $? -eq 0 do?

Thanks
elgrandeperro (TechnicalUser)
19 Feb 09 9:30
grep matches substrings, so for instance a grep like:

grep 47
in a file matches
147
1147

etc.  I thought there was an exact match grep, but at least does not have it.

egrep is a regular expression grep. ^ means "begins with", and a trailing $ means end of line. So ^47$ would be "starts with 47 and ends at EOL". We use double quotes in your example because we want the variable to expand.

$? is the return value of the grep. The expected return value of any command is listed near the bottom of its man page. Most return 0 on success; in this case grep returns 0 on a match.
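
A small sketch to try this out, using a made-up file called nums. It also shows that plain grep honors the ^ and $ anchors, so egrep is not strictly required for this.

```shell
# Made-up sample file for illustration.
tmp=$(mktemp -d)
printf '%s\n' 47 147 1147 > "$tmp/nums"

# Unanchored: 47 is found as a substring of every line.
unanchored=$(grep -c '47' "$tmp/nums")

# Anchored: only the line that is exactly 47 matches.
line=47
anchored=$(grep -c "^$line\$" "$tmp/nums")

# $? holds the exit status of the command that just ran: 0 means grep matched.
grep "^$line\$" "$tmp/nums" > /dev/null
found=$?

# grep exits with 1 when nothing matches; "||" runs only in that failure
# case and records the non-zero status.
missing=0
grep '^999$' "$tmp/nums" > /dev/null || missing=$?

echo "unanchored=$unanchored anchored=$anchored found=$found missing=$missing"
rm -rf "$tmp"
```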

 
feherke (Programmer)
19 Feb 09 9:41
Hi

Quote (elgrandeperro):

I thought there was an exact match grep, but at least does not have it.

Quote (Annihilannic):

/usr/xpg4/bin/grep supports the -f option.
And hopefully supports the -x or --line-regexp option too.

Feherke.
http://rootshell.be/~feherke/

ksdh (TechnicalUser)
19 Feb 09 11:15
thanks elgrandeperro
