checking dates and removing lines

siswjh · Nov 12, 2002

I have a Perl script that I am working on. I would like it to open the messages file and check for errors and warnings. I want the check to be date specific, though. I would also like it to remove duplicate lines. This is what I have so far.

Code:

#!/usr/bin/perl -w
open(MESSAGES,"/var/adm/messages") || die "cannot open messages: $!";
open(OUT,">/export/home/siswjh/message1") || die "cannot create message: $!";
while (<MESSAGES>) {
  if (/corrupt/i || /warning/i || /error/i) {
    print OUT $_;
  }
}

This part works fine in that it opens the messages file and puts what I am searching for in the message1 file. It is not date specific and it does not remove duplicate lines. I would appreciate any help.

Jesse

icrf · Nov 12, 2002

Define 'date specific'. We would have to know the format of any given line, and specifically, the format of the date and what you'd want done with it.

About duplicate lines, it would be very compute-intensive to search the entire file for each line in the file. If all duplicate lines were guaranteed to be right after one another, it's easier, needing only trivial lookahead space. When it removes duplicate lines, does it update the messages file it read from, or should it output all unique lines to some specific file (like message1)?
----------------------------------------------------------------------------------
...but I'm just a C man trying to see the light
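
For the easier case described above, where duplicates are always on adjacent lines, remembering one line is enough. A minimal sketch, assuming whole lines are compared and using a hypothetical helper name, drop_adjacent_duplicates:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): collapses runs of identical
# adjacent lines into a single line and returns the result.
sub drop_adjacent_duplicates {
    my @lines = @_;
    my @kept;
    my $previous;    # last line kept; undef until the first line is seen
    for my $line (@lines) {
        next if defined $previous && $line eq $previous;
        push @kept, $line;
        $previous = $line;
    }
    return @kept;
}

# The second "a" is dropped; the last "a" stays because it is not adjacent.
my @sample = ("a\n", "a\n", "b\n", "a\n");
print drop_adjacent_duplicates(@sample);
```

This only works when repeats are neighbours. If they can be scattered through the file, each message has to be remembered, for example as a hash key.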

siswjh · Nov 12, 2002

The date looks like this in the /var/adm/messages file:
Nov 10

It has a time too, but I don't need that, just the date. I want the script to run on a daily basis but only find errors and warnings for that date. For today it would search the messages file for errors, warnings and corrupt for Nov 12.

The duplicate lines, I think, are going to be more difficult because some of the duplicates are going to have a different time stamp. When I say different time stamp I am not talking about the date but the actual time.

I could have a line in the messages file that looks something like this:

Nov 10 04:00:03 lax unix: WARNING: /pci@1f,0/pcisd@3,0 (sd48

but have a duplicate line like this:

Nov 10 04:05:04 lax unix: WARNING: /pci@1f,0/pcisd@3,0 (sd48

Notice that the time stamps differ (04:00:03 versus 04:05:04), which in effect makes this line different.
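
One way to treat those two lines as duplicates is to compare them with the leading stamp removed. This is only a sketch: message_key is a hypothetical helper name, and it assumes every line starts with the "Mon DD HH:MM:SS " prefix shown above. It strips the date as well as the time, which should be harmless when the lines have already been filtered to a single day.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the line without its leading
# "Mon DD HH:MM:SS " stamp, so lines that differ only there compare equal.
sub message_key {
    my ($line) = @_;
    (my $key = $line) =~ s/^\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} //;
    return $key;
}

# Single quotes keep Perl from trying to interpolate the @ signs.
my $first  = 'Nov 10 04:00:03 lax unix: WARNING: /pci@1f,0/pcisd@3,0 (sd48';
my $second = 'Nov 10 04:05:04 lax unix: WARNING: /pci@1f,0/pcisd@3,0 (sd48';

print "same message\n" if message_key($first) eq message_key($second);
```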

I have this running in a Korn shell script, but I wanted to see if it could be done with Perl. I know that most of the Korn shell scripts that I have converted to Perl tend to be a lot smaller and more streamlined. That is what I was trying to do here.

icrf · Nov 12, 2002

Alright, here's what I've come up with. I'm not sure how correctly it works or how portable it might be (I'm working on win32, and I read warnings of possible differences with the time functions).

Code:

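#!/usr/bin/perl -w

use strict;
use POSIX qw(strftime);

# Today's date as syslog writes it, e.g. 'Nov 12'. %e pads single-digit days
# with a space ('Nov  5'); %d would give 'Nov 05', which would not match.
# (%e may not be available in every C library, e.g. on some win32 builds.)
my $today = strftime "%b %e", localtime;
my %seen;	# messages already written, keyed by the text after the time stamp

open(MESSAGES,"/var/adm/messages") || die "cannot open messages: $!";
open(OUT,">/export/home/siswjh/message1") || die "cannot create message: $!";

while(<MESSAGES>)
{
	# $1 is the line minus its date and time, so repeats at other times match
	if(/^$today \d{2}:\d{2}:\d{2} (.*(?:corrupt|warning|error).*)/i)
	{
		unless(exists($seen{$1}))
		{
			print OUT $_;
			$seen{$1} = 1;
		}
	}
}

close MESSAGES;
close OUT;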

The duplicate line trick is just defining each uniquely discovered message as a hash key, and the date is fairly self-explanatory; at worst, check out perldoc's POSIX manual:

http://www.perldoc.com/perl5.6.1/lib/POSIX.html
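
One detail worth checking in the date string: syslog-style files normally pad single-digit days with a space ("Nov  5"), which is what %e produces, while %d pads with a zero. A small sketch with a fixed date so the difference is visible; it assumes the platform's strftime supports %e (the win32 C library of the time may not) and an English or C locale for the month name:

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# strftime(format, sec, min, hour, mday, mon, year):
# mon is 0-based (10 = November), year counts from 1900 (102 = 2002).
my @nov_5_2002 = (0, 0, 0, 5, 10, 102);

my $space_padded = strftime("%b %e", @nov_5_2002);   # day padded with a space
my $zero_padded  = strftime("%b %d", @nov_5_2002);   # day padded with a zero

print "with %e: '$space_padded'\n";
print "with %d: '$zero_padded'\n";
```

For days 10 through 31 the two formats give the same string, so a mismatch would only show up in the first nine days of a month.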

Hope it helps.

siswjh · Nov 12, 2002

Works very nicely. I do have one more question, though. What does the \d{2} do? I mean, I see what it does, but I don't think I quite understand it. I thought \d matched a digit.

icrf · Nov 12, 2002

The {2} is how many of the previous item to match. I think you can set ranges, like {2,5} for two to five matches. I want to say you can even leave one bound open, like {2,} for at least two and maybe more, or {,8} for as many as eight but no more. I'm not certain on the limit bit, but yeah, they count things. Can anyone confirm/refute this?
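
To confirm the counting part: {n}, {n,m} and {n,} are standard Perl quantifiers, and they apply to whatever item comes just before them (here \d). The open lower bound is the uncertain one: as far as I know, {,8} was only accepted as a quantifier from Perl 5.34 on, and older versions treat it as literal text, so {0,8} is the portable spelling. A short sketch, with each pattern anchored by ^ and $ so the count applies to the whole string:

```perl
use strict;
use warnings;

print "exactly two\n"       if     '12'        =~ /^\d{2}$/;
print "two to five\n"       if     '1234'      =~ /^\d{2,5}$/;
print "at least two\n"      if     '123456789' =~ /^\d{2,}$/;
print "at most eight\n"     if     '1234567'   =~ /^\d{0,8}$/;
print "three is not two\n"  unless '123'       =~ /^\d{2}$/;
```

All five lines print. Without the anchors, /\d{2}/ would also match inside '123', because it only needs to find two digits somewhere in the string.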

siswjh · Nov 13, 2002

Thanks for all your help icrf!! I went from a korn script with 93 lines of code to a perl script with 14 lines of code!
