File pattern match

Status
Not open for further replies.

margamo

Technical User
Joined
Feb 1, 2002
Messages
9
Location
US
I am stepping through a file one line at a time. Two samples of the data I want to match are:
1.
(x) [highlight]Testosterone Enanthate[/highlight](x) [highlight]200mg[/highlight]
IM injection ( ) 250mg
( ) 300mg
Patient's maintenance schedule:
(x) [highlight]Every two weeks[/highlight]
( ) Every three weeks
( ) Every month
( ) Every three days

and
2.
(x) [highlight]Testosterone Cypionate[/highlight] (x) [highlight]200mg[/highlight]
IM injection ( ) 250mg
( ) 300mg
Patient's maintenance schedule:
(x) [highlight]Every two weeks[/highlight]
( ) Every three weeks
( ) Every month
( ) Every three days

I use a regular expression to match Testosterone Enanthate or Testosterone Cypionate and push it into an array. Then I want to match the (x) 200mg and the (x) Every two weeks (or wherever the (x) happens to be) and put that data into the array as well. Because the pattern is the same for both drugs, how can I match them appropriately?

Thanks,
Margamo
 
Post your Perl code. Is this school work?

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
This is a work-related project.

I have 1400 files, each with about 300 pages of text, to work through.

This is my code. It works except for the Testosterone Enanthate and Testosterone Cypionate example:

use strict;

my $dir = "t:/test";

opendir(DIRECTORY, $dir) || die("Cannot open directory");
my @thefiles = readdir(DIRECTORY);
closedir(DIRECTORY);

my @array = ();

my $drug = 'testosterone';
my $Date_of_note = 'DATE OF NOTE';
my $date;
my $med;
my $med2;
my $med3;
my $med4;
my $med5;
my $dosage;
my $dosage2;
my $dosage3;
my $duration;
my $duration2;
my $cyprionate;
my $enanthate;

my $n;
foreach my $file (@thefiles) {
    unless ( ($file eq ".") || ($file eq "..") ) {
        open FILE, "$dir/$file" or die "Can't open $file : $!";

        while (my $line = <FILE>) {

            next if $line =~ /^(\s)*$/;    # skip blank lines

            #date
            if ($line =~ m/(DATE OF NOTE:)\s+(\b[\wA-Z][A-Z][A-Z]\s\d\d,\s\d\d\d\d)/) {
                #keep the Date see if it follow by a line with testosterone
                $date = $2;
            }

            #___200____ mg I.M.
            if ($line =~ m/(Testosterone)\s*\_*(\d\d\d)_*\s*(mg)\sI\.*M\./) {
                $med = $1;
                push (@array, "U\t");
                push (@array, $date);
                push (@array, "\t");
                push (@array, $med);
                push (@array, "\t");
                push (@array, $2);
                push (@array, $3);
                push (@array, "\r");
            }

            #Testosterone _____200___mg
            if ($line =~ /(Testosterone)\s\_*(\d\d\d)\_*(mg)/) {
                $med2 = $1;
                push (@array, "V\t");
                push (@array, $date);
                push (@array, "\t");
                push (@array, $med2);
                push (@array, "\t");
                push (@array, $2);
                push (@array, $3);
                push (@array, "\r");
            }

            # (x) 200mg
            if ($line =~ m/(?<=Testosterone IM).*?\(\s*x+\s*\)\s* ((\d\d\dmg))/) {
                $dosage = $2;
            }

            #Testosterone Cypionate
            if ($line =~m/(Testosterone Cypionate)/i) {
                $cyprionate = $1;
            }

            # --- start of highlighted section ---
            if ($line =~ m/\(\s*x+\s*\)\s*(\d+mg)/i) {
                $dosage2 = $1;
                push (@array, "cyp\t");
                push (@array, $date);
                push (@array, "\t");
                push (@array, $cyprionate);
                push (@array, "\t");
                push (@array, $dosage2);
                push (@array, "\r");
            }

            # Testosterone Enanthate
            if ($line =~m/(Testosterone\s+Enanthate)/i) {
                $enanthate = $1;
            }
            next if $line =~m/(Testosterone\s+Enanthate)/i;
            if ($line =~ m/\(\s*x+\s*\)\s*(\d+mg)/i && $enanthate ne undef) {
                $dosage3 = $1;
                push (@array, "ent\t");
                push (@array, $date);
                push (@array, "\t");
                push (@array, $enanthate);
                push (@array, "\t");
                push (@array, $dosage3);
                push (@array, "\r");
            }
            # --- end of highlighted section ---

            # (xx) 200mg
            if ($line =~/\(\s*x+\s*\)\s*(\d+mg)/i) {
                $dosage = $1;
            }

            #(X) Every

            # Testosterone IM
            if ($line =~ m/(Testosterone)\s+IM/) {
                $med4 = $1;
                push (@array, "X\t");
                push (@array, $date);
                push (@array, "\t");
                push (@array, $1);
                push (@array, "\t");
                push (@array, $dosage);
                push (@array, "\r");
            }

            #___200____mg
            if ($line =~ m/(\_*x\_*\d\d\d_*\s*mg)(Testosterone IM)/i) {
                push (@array, "Z\t");
                push (@array, $date);
                push (@array, "\t");
                push (@array, $med4);
                push (@array, "\t");
                push (@array, $1);
                push (@array, "\r");
            }

            if ($line =~ m/(?<=(Testosterone))\s(\d\d\d\smg)\sIM:/i) {
                push (@array, "ZZ\t");
                push (@array, $date);
                push (@array, "\t");
                push (@array, $1);
                push (@array, "\t");
                push (@array, $2);
                push (@array, "\r");
            }

        }

        close FILE;

        $n++;
        print "\n$n:$file: \n";
        print "@array\n";
    }
}
 
I don't have the time right now to look through all that, but this does stand out:

Code:
if ($line =~m/(Testosterone\s+Enanthate)/i) {
    $enanthate = $1;
}
next if $line =~m/(Testosterone\s+Enanthate)/i;
if ($line =~ m/\(\s*x+\s*\)\s*(\d+mg)/i && $enanthate ne undef) {

try like this:

Code:
if ($line =~m/(Testosterone\s+Enanthate)/i) {
    $enanthate = $1;
}
# next if $line =~m/(Testosterone\s+Enanthate)/i;
if ($line =~ m/\(\s*x+\s*\)\s*(\d+mg)/i && $enanthate) {
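
A side note on why `ne undef` is worth dropping, as a minimal sketch. In a string comparison `undef` is treated as the empty string (and triggers an "uninitialized value" warning under `use warnings`), so `$enanthate ne undef` does not really test whether the variable was set. `defined` does. The plain truth test above also works as long as the captured text is never empty or "0". The variable names `$before` and `$after` are only for illustration.

```perl
use strict;
use warnings;

my $enanthate;    # declared but not yet assigned, so its value is undef

# defined() asks directly whether a value has been assigned
my $before = defined $enanthate ? "set" : "not set";

$enanthate = "Testosterone Enanthate";
my $after = defined $enanthate ? "set" : "not set";

print "$before, $after\n";    # prints "not set, set"
```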

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
margamo

Without launching into a line-by-line critique of your code, it has a number of bad 'code smells'. I also notice that you make use of a date field extracted from the file, but this doesn't appear in your sample file contents.

As Kevin says, we don't have time to wade through your code to figure out what you want. So that we can help you, can you answer the following:
1. Is there more than one set of prescription data in each file, or is there only one prescription per file?
2. What data are you trying to extract?
3. What do you want the output to look like?

If you could post a real set of prescription data (suitably anonymised if necessary), it would help a lot. I suspect this can be accomplished with relatively little coding.

Steve

"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work." (Object::PerlDesignPatterns)
 
Thanks for your prompt replies.

I am looking for output that looks like this:
testing  Date          Med           Dosage  Duration
U        SEP 22, 2004  Testosterone  200 mg
U        SEP 02, 2004  Testosterone  200 mg  every 3 weeks
V        JUN 22, 2004  Testosterone  200 mg
U        JUN 03, 2004  Testosterone  200 mg
U        MAY 13, 2004  Testosterone  200 mg

These are progress notes containing forms that people fill out when they give a shot. There are a number of different forms, and they can appear anywhere in the file. There is one file per patient.

Using KevinADC's code:
if ($line =~m/(Testosterone\s+Enanthate)/i) {
    $enanthate = $1;
}
# next if $line =~m/(Testosterone\s+Enanthate)/i;
if ($line =~ m/\(\s*x+\s*\)\s*(\d+mg)/i && $enanthate) {

I get double of everything:
cyp AUG 29, 2002 Testosterone Cypionate 200mg
ent AUG 29, 2002 TESTOSTERONE ENANTHATE 200mg
cyp AUG 16, 2002 Testosterone Cypionate 200mg
ent AUG 16, 2002 TESTOSTERONE ENANTHATE 200mg

I provided the code because I was asked if it is a school project, and to show you that I am working on this. I don't expect you to wade through it. It works except for the highlighted part. I think it helps just to look at the original question.

Thanks a lot,
Margamo
 
Now give this a try. Instead of:

Code:
if ($line =~m/(Testosterone\s+Enanthate)/i) {
    $enanthate = $1;
}
# next if $line =~m/(Testosterone\s+Enanthate)/i;
if ($line =~ m/\(\s*x+\s*\)\s*(\d+mg)/i && $enanthate) {

try this:

Code:
if ($line =~m/(Testosterone\s+Enanthate)/i) {
    $enanthate = $1;
    next;
}
if ($line =~ m/\(\s*x+\s*\)\s*(\d+mg)/i && $enanthate) {
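
One caveat worth checking, assuming the real data looks like the samples in the first post: there the checked dose sits on the same line as the drug name, and `next` jumps to the following line before the dose test runs. A small sketch with two of the sample lines (the `@lines` and `@found` names are only for illustration):

```perl
use strict;
use warnings;

# Two lines copied from the first sample, without the forum highlighting
my @lines = (
    "(x) Testosterone Enanthate(x) 200mg",
    "IM injection ( ) 250mg",
);

my ($enanthate, @found);
for my $line (@lines) {
    if ($line =~ /(Testosterone\s+Enanthate)/i) {
        $enanthate = $1;
        next;    # skips the rest of the loop body for this line
    }
    # Only reached for lines that do not name the drug
    if ($enanthate && $line =~ /\(\s*x+\s*\)\s*(\d+mg)/i) {
        push @found, $1;
    }
}

print scalar(@found), " dose(s) found\n";    # prints "0 dose(s) found"
```

If the dose can share a line with the drug name, the dose test has to run before the `next`, or the `next` has to go.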

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
I guess I solved it myself. I changed the regex to read any word after Testosterone, which covers both Enanthate and Cypionate. Then I just used one if statement instead of two.

Thanks for your help.
Margamo
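
For anyone landing here later, a minimal sketch of that idea. It is not the exact code used above: the `extract_checked` helper and the hash keys are hypothetical, and it assumes the forms look like the two samples in the first post, with the checked schedule always following the checked drug name.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one hash per checked Testosterone form.
sub extract_checked {
    my @lines = @_;
    my (@records, $current);
    for my $line (@lines) {
        # A checked drug name starts a new record. \w+ accepts any word
        # after Testosterone, so Enanthate and Cypionate share one branch.
        if ($line =~ /\(\s*x+\s*\)\s*(Testosterone\s+\w+)/i) {
            $current = { drug => $1 };
            push @records, $current;
        }
        next unless $current;
        # The first checked dose, on this line or a later one.
        if (!defined $current->{dose} && $line =~ /\(\s*x+\s*\)\s*(\d+\s*mg)/i) {
            $current->{dose} = $1;
        }
        # A checked schedule closes the record.
        if ($line =~ /\(\s*x+\s*\)\s*(Every\s+.+?)\s*$/i) {
            $current->{schedule} = $1;
            undef $current;
        }
    }
    return @records;
}

my @sample = (
    "(x) Testosterone Enanthate(x) 200mg",
    "IM injection ( ) 250mg",
    "( ) 300mg",
    "Patient's maintenance schedule:",
    "(x) Every two weeks",
    "( ) Every three weeks",
    "(x) Testosterone Cypionate (x) 200mg",
    "IM injection ( ) 250mg",
    "Patient's maintenance schedule:",
    "( ) Every two weeks",
    "(x) Every month",
);

my @records = extract_checked(@sample);
for my $r (@records) {
    print join("\t", $r->{drug}, $r->{dose} // '', $r->{schedule} // ''), "\n";
}
```

Because the drug name is kept in one place (`$current`), each checked dose and schedule is attached to whichever drug was seen last, which avoids the doubled "cyp"/"ent" rows.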
 