Getting the 95% value from an array

netman4u

I am trying to get the 95th percentile from an array of values. In other words, say I have an array of values as in:

29,98,34,52,78,45,67,88,29,13,0,23,13,67,45,34


I want to throw away the highest 5% of the numbers and take the next highest value.

Thanks,

Nick
 
It seems your example is not a good one, since 5% of 16 elements is less than one element, and you can't remove less than one element from an array. But if the array has 20 or more elements, this might be good enough:

Code:
my @values = (29, 98, 34, 52, 78, 45, 67, 88, 29, 13, 0, 23, 13, 67, 45, 34, 33, 66, 44, 22);
my $top_5_percent = int(@values * .05);
if ($top_5_percent >= 1) {
   @values = sort {$a <=> $b} @values;
   pop @values for (1 .. $top_5_percent);
   print "@values";
}
else {
   print "Sample is too small";
}


if you know you will always have a large enough sample:

Code:
my @values = (1 .. 100);
@values = sort {$a <=> $b} @values;
pop @values for (1 .. int(@values * .05));
print "@values";
 
Thanks Kevin,

One little thing: I need to get the 95th percentile number in a scalar rather than an array, for comparison. I assume this is all I change:

Code:
my @values = (29, 98, 34, 52, 78, 45, 67, 88, 29, 13, 0, 23, 13, 67, 45, 34, 33, 66, 44, 22);
my $top_5_percent = int(@values * .05);
if ($top_5_percent >= 1) {
   @values = sort {$a <=> $b} @values;
   $percentile95th = pop @values for (1 .. $top_5_percent);
   print "@values";
}
else {
   print "Sample is too small";
}
 
if you tried it you will see it works, sort of anyway. If the sample will only return one value then it works fine, but if it will return more than one value you have to create a list, such as an array, to store all the returned values. Like so:

Code:
my @values = (1 .. 500);
my $top_5_percent = int(@values * .05);
my @percentile95th = ();
if ($top_5_percent >= 1) {
   @values = sort {$a <=> $b} @values;
   push @percentile95th, pop @values for (1 .. $top_5_percent);
   print "@percentile95th";
}
else {
   print "Sample is too small";
}

then you use @percentile95th to do whatever it is you need to do with the list. It will already be sorted highest to lowest so $percentile95th[0] will have the highest value and $percentile95th[$#percentile95th] will have the lowest value.
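If only a single number is needed for comparison, another option is to skip the list and read the highest value that remains once the top 5% is set aside. This is a minimal sketch, assuming the "throw away the highest 5% and take the next highest value" definition from the original question; the sub name percentile_95 is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the highest value left after discarding
# the top 5% of the samples (undef in scalar context for an empty list).
sub percentile_95 {
    my @sorted = sort { $a <=> $b } @_;
    return unless @sorted;
    my $discard = int(@sorted * 0.05);   # how many values make up the top 5%
    return $sorted[ $#sorted - $discard ];
}

my @values = (29, 98, 34, 52, 78, 45, 67, 88, 29, 13, 0, 23, 13, 67, 45, 34, 33, 66, 44, 22);
my $percentile95th = percentile_95(@values);
print "$percentile95th\n";   # 20 samples: 98 is discarded, so this prints 88
```

With fewer than 20 samples nothing is discarded, so the sub simply returns the maximum; whether that is acceptable depends on how small samples should be treated.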
 
Kevin,

Having some trouble getting it working. Below is the code I am using. The output I am getting from the:

print "$top_5_percent\n";

is:

Code:
0
0
0
0
0
0
0
1
1
1
1
1
1
1
1
1
2
2
2
2
2
2
2
2
2
2
2
 etc. all the way up to 74

Here is my messy code:

Code:
open (LOG, ">$logfile") or die ("Error opening log file $logfile: $!");
open (COMP, "$trans_file") or die ("Error opening element allocation file $trans_file: $!");
my @comp = <COMP>;
chomp @comp;
close (COMP);
#
# Process command line variables
#
my $fromDate    = $ARGV[0];
my $toDate      = $ARGV[1];
my $subjectName = $ARGV[2];

#############################################################################################
#
#  95th Percentile Calculations
#
#############################################################################################
my $line;
my $value;
my $percentile95;
my @rdifetch = ();
my @allvalues = ();
my $element;
my %groups = ();
my %all_stats = ();
my $entry;
#my @elements = ();
#
# Loop through the config file, split each entry into the group, element and allocation.
# Create a hash with the key being the element and the value being the allocated bandwidth
# for use in later lookups.
#
foreach (@comp) {
    next if /^#/;
    next unless $_ =~ /$subjectName/;
    my ($group,$element,$allocation) = split(/,/,$_);
    push (@elements,$element);
    push (@allocation,$allocation);
    $lookup{$element} = $allocation;
}
#
# Loop through the element array and run nhExportData for each element to get the bandwidth
# usage statistics.
#
#print "$NH_BIN/nhExportData -subjName $subjectName -subjType \"group\" -all -fromDate $fromDate -toDate $toDate -vars \"bandwidthIn\",\"bandwidthOut\"\n";
$all_stats{$subjectName} = [ `$NH_BIN/nhExportData -subjName $subjectName -subjType group -all -fromDate $fromDate -toDate $toDate -vars "bandwidthIn","bandwidthOut"` ];
my %select = ();
foreach (@elements) {
    foreach my $line (@{$all_stats{$subjectName}}) {
        #print "$line\n";
        next unless $line =~ /^\"*.?\"/;
        next unless $line =~ /$_/;
        #print "$line\n";
        my ($elementType,$elementName,$aliasName,$speed,$speed2,$sampleTime,$deltaTime,$totalTime,$bandwidthIn,$bandwidthOut) = split(/,/,$line);
        if ($bandwidthIn > $bandwidthOut) {
            $bandwidth = $bandwidthIn;
        }else{
            $bandwidth = $bandwidthOut;
        }
        push (@bandwidth, $bandwidth);
        #print "@bandwidth\n";
        #
        # Calculate 95th percentile for all the bandwidth data for this element 
        #
        my $top_5_percent = int(@bandwidth * .05);
        print "$top_5_percent\n";
        if ($top_5_percent >= 1) {
            @values = sort {$a <=> $b} @bandwidth;
            $percentile95th = pop @values for (1 .. $top_5_percent);
            if ($lookup{$_} > $percentile95th ) {
                $excess = 0;
            }elsif ($lookup{$_} < $percentile95th) {
                $usage = $lookup{$_};
                $excess = ($percentile95th - $lookup{$_});
                 print "@values";
            }else {
                print "Sample is too small";
            }
        }
        $select{$_} = ["$usage","$excess","$lookup{$_}"];
    }
}


Any help is appreciated (as always),

Nick
 
I wish I could help but the code has too many unknowns and there is nothing I am spotting that is obvious.
 
does this help at all... (probably not!)

Code:
#!/usr/bin/perl

@values = qw( 29 98 34 52 78 45 67 88 29 13 0 23 13 67 45 34 );

##############
$percent = 95;
##############

$percentile = 100 / $percent;

map {$total += $_} @values;
$average = $total / scalar @values;

print "The ${percent}th percentile\n";
print "-------------------\n";
print "Average = $average\n\n";

foreach (sort { $a <=> $b } @values) {
  print "$_\n" if $_ * $percentile >= $average;
}

outputs:-

The 95th percentile
-------------------
Average = 44.6875

45
45
52
67
67
78
88
98


Kind Regards
Duncan
 
the nested loops seem like they might be the problem, but I just can't really tell. The code is very hard to read and I just have no idea what some of it is doing.
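One thing that can be said from the posted code alone: the percentile calculation and the print sit inside the inner loop, right after each push, so they run once per sample on a growing array, and @bandwidth is never emptied between elements. Below is a minimal sketch of collecting first and calculating once afterwards. The element names and samples are made up and stand in for the parsed nhExportData lines, whose real format is not shown in the thread.

```perl
use strict;
use warnings;

# Made-up samples standing in for the parsed export lines: [element, bandwidth]
my @samples = map { [ ($_ % 2 ? 'routerA' : 'routerB'), $_ ] } 1 .. 80;

my %count_for;
for my $elem (qw(routerA routerB)) {
    my @bandwidth;                      # fresh array for every element
    for my $sample (@samples) {
        next unless $sample->[0] eq $elem;
        push @bandwidth, $sample->[1];  # only collect inside the inner loop
    }
    # Calculate once per element, after all of its samples are in
    my $top_5_percent = int(@bandwidth * .05);
    $count_for{$elem} = $top_5_percent;
    print "$elem: $top_5_percent of ", scalar(@bandwidth), " samples are in the top 5%\n";
}
```

Written this way, the count is printed once per element instead of once per line of stats data.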
 
I know it's not at all pretty, Kevin, but this job was not scoped right and I have about half the time I need to do this project right. The perl portion is about 40% of the project. The rest is documentation, setting up the test environment in a messed up lab and some quirky, much more complicated than need be configuration file creation.

Everything seems to be working ok (I still need to get to the point when I can actually test if the numbers are right) up to this point:

Code:
        push (@bandwidth, $bandwidth);
        #print "@bandwidth\n";
        #
        # Calculate 95th percentile for all the bandwidth data for this element 
        #

At least the @bandwidth array is loaded with a ton of bandwidth statistics, which is what the next part of the code is looking for. I morphed your code with things I created earlier in my script:

Your nice clean code:
Code:
my @values = (1 .. 500);
my $top_5_percent = int(@values * .05);
my @percentile95th = ();
if ($top_5_percent >= 1) {
   @values = sort {$a <=> $b} @values;
   push @percentile95th, pop @values for (1 .. $top_5_percent);
   print "@percentile95th";
}
else {
   print "Sample is too small";
}


My destructive morph of your code:

Code:
        my $top_5_percent = int(@bandwidth * .05);
        print "$top_5_percent\n";
        if ($top_5_percent >= 1) {
            @values = sort {$a <=> $b} @bandwidth;
            $percentile95th = pop @values for (1 .. $top_5_percent);
            if ($lookup{$_} > $percentile95th ) {
                $excess = 0;
            }elsif ($lookup{$_} < $percentile95th) {
                $usage = $lookup{$_};
                $excess = ($percentile95th - $lookup{$_});
                 print "@values";
            }else {
                print "Sample is too small";
            }
        }
        $select{$_} = ["$usage","$excess","$lookup{$_}"];

What I am thinking is that if I have the 95th percentile of the actual bandwidth I can compare it to the allocated bandwidth I got from a config file here:

Code:
foreach (@comp) {
    next if /^#/;
    next unless $_ =~ /$subjectName/;
    my ($group,$element,$allocation) = split(/,/,$_);
    push (@elements,$element);
    push (@allocation,$allocation);
    $lookup{$element} = $allocation;
}

I am looping through the @elements array again in the outer of those nested foreach loops, so $_ contains the key of the %lookup hash.

Hey, not pretty I know...I think lack of sleep and jet lag lately has been looping me out.

I can post some actual data from the @bandwidth array mañana.

Nick
 
I see you have this back in your code:
Code:
if ($top_5_percent >= 1) {
            @values = sort {$a <=> $b} @bandwidth;
            $percentile95th = pop @values for (1 .. $top_5_percent);

that is not correct, you must use an array there:

Code:
if ($top_5_percent >= 1) {
            @values = sort {$a <=> $b} @bandwidth;
            @percentile95th = pop @values for (1 .. $top_5_percent);

for the reason explained previously.
 
ahh! I corrected that incorrectly! Should be:

Code:
push @percentile95th, pop @values for (1 .. $top_5_percent);

plus I have a suspicion that the data in @bandwidth might need some more munging to sort it like I showed you:

@values = sort {$a <=> $b} @bandwidth;

that assumed @bandwidth just had simple whole or mixed numbers in it.
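If the fields come out of split with quotes, whitespace, or a trailing newline attached, the numeric sort will warn or misorder them. Here is a rough sketch of cleaning the values first. The raw strings are invented, since the real export format was not posted; looks_like_number comes from the core Scalar::Util module.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Invented raw fields; the real nhExportData output may look different
my @bandwidth = ('"1200.5"', " 300\n", 'N/A', '45', '"725"', '');

my @clean;
for my $raw (@bandwidth) {
    (my $value = $raw) =~ s/^[\s"]+|[\s"]+$//g;   # strip surrounding quotes and whitespace
    push @clean, $value if looks_like_number($value);
}

my @values = sort { $a <=> $b } @clean;
print "@values\n";   # 45 300 725 1200.5
```

Entries that are not numbers are silently dropped here; it may be safer to log them instead, so that bad input does not go unnoticed.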

 
maybe if I explained this code it will help you:

Code:
my @values = (1 .. 500);   # just load the array with 1 thru 500
my $top_5_percent = int(@values * .05);   # figure out how many "places" there are in the top five percent (25 for a sample of 500)
my @percentile95th = ();   # just declares the array to hold the top five percent
if ($top_5_percent >= 1) {   # see if the sample was big enough, you need at least a sample of 20 to have 1 place in the top 5 percent
   @values = sort {$a <=> $b} @values;   # sort the array in ascending numeric order (not really necessary in this example as the original array is already in ascending numeric order)
   push @percentile95th, pop @values for (1 .. $top_5_percent);   # load @percentile95th with the last N number of elements from the @values array which are the top five percent
   print "@percentile95th";   # just prints the final data
}
else {
   print "Sample is too small";   # sample was under 20 so we can't get a top 5 percent
}
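The pop loop can also be written as a single splice, which removes the last N elements in one call. This is a small sketch of that alternative, not a change to the approach. splice returns the removed elements in ascending order, so reverse is used to keep the highest-first order that the pop loop produces.

```perl
use strict;
use warnings;

my @values = sort { $a <=> $b } (1 .. 500);
my $top_5_percent = int(@values * .05);          # 25 for a sample of 500

my @percentile95th;
if ($top_5_percent >= 1) {
    # A negative offset counts from the end of the array
    @percentile95th = reverse splice(@values, -$top_5_percent);
    print "highest: $percentile95th[0], lowest of the top 5%: $percentile95th[-1]\n";
    print "next highest remaining value: $values[-1]\n";
}
else {
    print "Sample is too small\n";
}
```

After the splice, $values[-1] is the highest value left below the top 5%, which is the single number the original question asked for.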
 
Nick - does this make the selection process any more clear?

Code:
#!/usr/bin/perl

@values = (29, 98, 34, 52, 78, 45, 67, 88, 29, 13, 0, 23, 13, 67, 45, 34, 33, 66, 44, 22);

@values = sort { $b <=> $a } @values;   # numeric sort, highest first

$to_pick = (100 - 95) * (scalar @values) / 100;

for ($x=0; $x<=$to_pick; $x++) {
  print "$values[$x]\n";
}


Kind Regards
Duncan
 
Thanks for all the replies and patience with me here Duncan and Kevin. After reviewing my code and the nested foreach loops, I realize it is fubar. I have modified my code some below. Let me explain what I am attempting to do:

Code:
$all_stats{$subjectName} = [ `$NH_BIN/nhExportData -subjName $subjectName -subjType group -all -fromDate $fromDate -toDate $toDate -vars "bandwidthIn","bandwidthOut"` ];
my %select = ();
foreach $elem (@elements) {
    foreach my $line (@{$all_stats{$subjectName}}) {
        #print "$line\n";
        next unless $line =~ /^\"*.?\"/;
        next unless $line =~ /$elem/;
        #print "$line\n";
        my ($elementType,$elementName,$aliasName,$speed,$speed2,$sampleTime,$deltaTime,$totalTime,$bandwidthIn,$bandwidthOut) = split(/,/,$line);
        if ($bandwidthIn > $bandwidthOut) {
            $bandwidth = $bandwidthIn;
        }else{
            $bandwidth = $bandwidthOut;
        }
        $bandwidth{$elem} = $bandwidth;
    }
}

When I run the system command here:

Code:
$all_stats{$subjectName} = [ `$NH_BIN/nhExportData -subjName $subjectName -subjType group -all -fromDate $fromDate -toDate $toDate -vars "bandwidthIn","bandwidthOut"` ];

I am loading up the %all_stats hash with polling information from a database. I am primarily concerned with the "bandwidthIn" and "bandwidthOut" data because the larger of these I will use for calculations.

I am running the command for a "group" of "elements"; I have to do it that way. What I end up getting is poll after poll of stats data for ALL elements in the group. I have an array @elements loaded with all the elements in the group the program is running for, and I want to loop through all the stats data, pull out just the data for each element, and save it somewhere to use in calculations. So for each element in @elements, I loop through all the stats data, pick out just that element's data, and load it into a hash, the key being the element and the value being an array of all the stats data picked out for that element. I need to choose the larger of $bandwidthIn and $bandwidthOut and use only that value.

Once I have all the stat data for each individual element, I need to get the 95th percentile of the bandwidth for each element and compare that with the bandwidth allocated to that element which I loaded earlier with:

$lookup{$element} = $allocation;

So once I get $percentile95th[0] (which is the 95th percentile of the actual bandwidth), I need to compare it with the bandwidth allocated to that element and generate a CSV for a program to read and generate a chart (among other things), with the amount exceeding the bandwidth or the amount below the bandwidth.

I hope that explains it better, my code was not doing a good job of explaining itself.
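The plan described above could be sketched roughly as follows. The element names, allocations, and samples are invented, and the CSV columns are only a guess at what the charting program might want. The percentile here is the highest value left after discarding the top 5%, as in the original question.

```perl
use strict;
use warnings;

# Invented data: allocation per element and [element, in, out] samples
my %lookup  = (routerA => 60, routerB => 120);
my @samples = map { ( [ 'routerA', $_, $_ / 2 ], [ 'routerB', $_ / 2, $_ ] ) } 1 .. 100;

# Collect the larger of in/out for each element into a hash of arrays
my %bandwidth;
for my $sample (@samples) {
    my ($elem, $in, $out) = @$sample;
    push @{ $bandwidth{$elem} }, ($in > $out ? $in : $out);
}

my %excess_for;
print "element,percentile95,allocation,excess\n";
for my $elem (sort keys %bandwidth) {
    my @sorted  = sort { $a <=> $b } @{ $bandwidth{$elem} };
    my $discard = int(@sorted * .05);
    my $percentile95 = $sorted[ $#sorted - $discard ];   # highest value below the top 5%
    my $excess = $percentile95 - $lookup{$elem};         # negative means under the allocation
    $excess_for{$elem} = $excess;
    print join(',', $elem, $percentile95, $lookup{$elem}, $excess), "\n";
}
```

Keeping one array per element in %bandwidth avoids the overwrite in `$bandwidth{$elem} = $bandwidth;`, which only ever holds the last sample seen for each element.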

Any help is appreciated,

Nick

 
Ok folks, I think I am getting my perl legs back after 6 months of no coding. I have working code. Thanks again for your patience and help.

Nick
 
A star for me????!!!!! I just don't know what to say! To start with, I would like to thank......
 