
generate treatment&control with perl

Status
Not open for further replies.

klkot

Programmer
Joined
Jun 6, 2007
Messages
18
Location
US
Hi again Perl gurus,
I've written the following Perl script to assign each fake case ID in a list to a test (T) or development (D) group ("treatment and control"), by the flip of a coin so to speak. This first script does the simple task of producing:
fakeid group
1 T
2 D
3 D
4 T
5 D
6 T
and so on and so forth.
My Perl script to create that dataset follows:

##creating a random test or development generator
@random = ();
for ($i=1; $i<@fakeids; $i+=2){
    $random[$i] = rand();
    if ($random[$i]<0.5){
        print OUTFILE "$fakeids[$i],T\n$fakeids[$i+1],D\n";
    }
    else {
        print OUTFILE "$fakeids[$i],D\n$fakeids[$i+1],T\n";
    }
}

Now I want to print a dataset that adds 10 more columns of T's and D's (or however many I may want in the future) lined up next to the fakeids.
So my new dataset will look like:
fakeid group group group ..............
1 T D D
2 D T T
3 D D T
4 T T D
5 D T D
6 T D T

Does anybody know how I can generate however many groups I want, each with a different column title, alongside the fakeids?
I keep trying a loop but it doesn't seem to work very well, so any help will be much appreciated.
Cheers,
klkot
 
You most likely want to use a hash of arrays, where the hash key is the unique ID and the array associated with that key holds your T/D data. See if this link helps you to understand how to construct such a data set:
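
For what it's worth, here is a minimal sketch of the hash-of-arrays idea, not a definitive solution. The ID list, the column count, the `build_groups` name, and the `group1`, `group2`, ... column titles are all made up for illustration, and each cell is an independent coin flip rather than a balanced pairing.

```perl
use strict;
use warnings;

# Hypothetical inputs: the IDs and the number of group columns are made up.
my @fakeids    = (1 .. 6);
my $groupcount = 3;

# Build the hash of arrays: each ID maps to a reference to its list of T/D draws.
sub build_groups {
    my ($ids, $count) = @_;
    my %hoa;
    for my $id (@$ids) {
        for my $g (1 .. $count) {
            # push creates the array reference automatically on first use
            push @{ $hoa{$id} }, (rand() < 0.5 ? 'T' : 'D');
        }
    }
    return \%hoa;
}

my $hoa = build_groups(\@fakeids, $groupcount);

# Header row with a distinct title per column: group1, group2, ...
print join(',', 'fakeid', map { "group$_" } 1 .. $groupcount), "\n";

# Iterate over @fakeids rather than keys %$hoa to keep the input order.
for my $id (@fakeids) {
    print join(',', $id, @{ $hoa->{$id} }), "\n";
}
```

Changing `$groupcount` is all it takes to get more columns, and looping over `@fakeids` instead of the hash keys avoids the unordered output that `keys` would give.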

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
Okay, thanks Kevin.
So I created it how *I think* it should look, but there's something wrong with the way I've created the hash. Do you know what I've done wrong? Here's my whole code:

open(FH ,"/home/kotterkl/nsqipfakedata")||die "Couldn't open infile";
open(OUTFILE ,">nsqip_test_development")||die "Couldn't open outfile";

@fakeids = ();
$observation = <FH> ; #shaving off header row
while($observation = <FH>) {
    chomp $observation;
    my ($fakeid) = $observation;
    push(@fakeids, $fakeid);
    #creating an array out of the fakeids
}
close(FH);
##creating a random test or development generator
@random = ();
%hoa = ();
foreach my $fakeid (@fakeids) {
    $hoa{$fakeid} = {};
}
for ($i=1; $i<@fakeids; $i++){
    $hoa{$fakeids[$i]} = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0] unless (exists $hoa{$fakeids[$i]});
}

for ($i=1; $i<@fakeids; $i+=2){
    $random[$i] = rand();
    if ($random[$i]<0.5){
        push(@{$hoa{$fakeids[$i]} }, "T");
        push(@{$hoa{$fakeids[$i+1]}},"D");
    }
    else {
        push(@{$hoa{$fakeids[$i]}},"D");
        push(@{$hoa{$fakeids[$i+1]}},"T");
    }
}

foreach my $fakeid (keys(%hoa)) {
    print OUTFILE "$fakeid,";
    print OUTFILE join(",", @{$hoa{$fakeids}});
    print OUTFILE "\n";
}

close(OUTFILE);
 
Yes, as Kevin suggests, it's time to learn more about Perl's complex data structures and control operators. Here's something loosely based on what you devised.

Code:
use strict;

my $recordcount = 6;
my $datacount = 3;
print join('  ', 'fakeid', ('group ') x $datacount), "\n";

for my $i (1..$recordcount) {
	my @data = ();
	for my $j (1..$datacount) {
		push @data, (rand() < .5 ? 'T' : 'D');
	}
	printf join(' ', ('%-7s') x (1 + $datacount)), $i, @data;
	print "\n";
}

And the output:

Code:
>perl scratch.pl
fakeid  group   group   group
1       D       D       D
2       D       D       T
3       T       D       D
4       T       D       D
5       D       D       T
6       D       T       T

- Miller
 
I am not good with theoretical or conceptual data. Post the exact data you are using:

nsqipfakedata

and explain how you are using it.

This looks suspect:

Code:
foreach my $fakeid (@fakeids) {
    $hoa{$fakeid} = {};
}
for ($i=1; $i<@fakeids; $i++){
    $hoa{$fakeids[$i]} = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0] unless (exists $hoa{$fakeids[$i]});
}

The first "foreach" loop looks unnecessary. It creates a hash of empty hashes, which you never use again in your script. The second "for" loop, I think, should look like this:

Code:
for my $id (@fakeids){
   $hoa{$id} = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0] unless (exists $hoa{$id});
}

This assumes $id (each element of @fakeids) is a unique value, probably a number (1, 2, 3, etc). Each hash value will then be a reference to an array.
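
To illustrate the difference between `{}` and `[]` here, a small sketch with made-up IDs and a shortened zero list. As far as I can tell, three things are worth seeing: pre-filling with zeros means `push` appends after them, `push` needs no initialization at all, and a slot that already holds a hash reference cannot be pushed onto. Since the original script stores `{}` under every ID first, the `unless exists` test is never true afterwards, which is probably why the later `push` calls fail.

```perl
use strict;
use warnings;

my @fakeids = (1 .. 4);    # made-up IDs for illustration

# Case 1: pre-filling with zeros. push appends after the existing elements.
my %prefilled;
for my $id (@fakeids) {
    $prefilled{$id} = [0, 0, 0] unless exists $prefilled{$id};
}
push @{ $prefilled{1} }, 'T';
print join(',', @{ $prefilled{1} }), "\n";    # 0,0,0,T

# Case 2: no initialization. push creates the array reference on demand.
my %hoa;
push @{ $hoa{1} }, 'T';
print join(',', @{ $hoa{1} }), "\n";          # T

# Case 3: a slot holding a hash reference cannot be used as an array.
my %wrong = (1 => {});
my $ok = eval { push @{ $wrong{1} }, 'T'; 1 };
print "push onto {} failed: $@" unless $ok;
```

Note also that the original print loop reads `@{$hoa{$fakeids}}` where the loop variable is `$fakeid`; running under `use strict` would flag the undeclared `$fakeids` at compile time.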

One thing to note is that you start the "for" loop at 1 ($i=1), but arrays start at 0 (zero), so in effect you skipped over whatever the first element of the array was. Maybe you wanted to, though.

In the next loop I don't understand why you are processing the loop like this:

Code:
for ($i=1; $i<@fakeids; $i+=2){

which visits only the odd-numbered indices of @fakeids.
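
To make the indexing concrete, here is a rough sketch with six made-up IDs. It shows which indices the original loop visits, followed by one possible alternative that walks the array in pairs starting at index 0. The `assign_pairs` name is hypothetical, and the sketch assumes the intent was for each consecutive pair of IDs to receive one T and one D, which the original post suggests but does not state outright.

```perl
use strict;
use warnings;

my @fakeids = (101 .. 106);    # made-up IDs; indices run 0..5

# The original loop: starts at 1 and steps by 2, so $i takes 1, 3, 5.
my @visited;
for (my $i = 1; $i < @fakeids; $i += 2) {
    push @visited, $i;
}
print "indices visited: @visited\n";    # indices visited: 1 3 5
# Index 0 is never assigned, and $fakeids[$i + 1] is past the end when $i is 5.

# One alternative: start at 0 and give each consecutive pair one T and one D.
sub assign_pairs {
    my @ids = @_;
    my %group;
    for (my $i = 0; $i < @ids; $i += 2) {
        my ($first, $second) = rand() < 0.5 ? ('T', 'D') : ('D', 'T');
        $group{ $ids[$i] } = $first;
        # Guard against an odd-length list, where the last ID has no partner.
        $group{ $ids[$i + 1] } = $second if $i + 1 < @ids;
    }
    return %group;
}

my %group = assign_pairs(@fakeids);
print "$_,$group{$_}\n" for @fakeids;
```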

I see Miller has posted while I was posting and editing my reply. [wink]

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
Thanks!
You were right. It was redundant. I redid it with more thought and didn't need to create any hashes or arrays. I was just making things much harder than necessary.

Sorry and thanx a lot
~klkot
 