Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Chriss Miller on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Percentile Function 1

Status
Not open for further replies.

NevG

Programmer
Oct 10, 2000
162
GB
Hey gang

Can anybody help me with a function that will calculate percentiles from a given series of data, probably a receordset or dataset,
I could probably work this out if I even knew the process of calculating the thing.

Any help much appreciated

Thanks

Nev
 
This may be something you want to avoid. Percentile is a (stastical) measure of the values above / below a data set. It is normally related to a standard distribution curve for the population. I would suggest that you attempt to add a reference to the excel library and use the built in Excel function. If this is not an acceptableapproach for you, you will need to start with a text on stastics to have some understanding of the process.

MichaelRed
m.red@att.net

There is never time to do it right but there is always time to do it over
 
Hi there,
I need to implement this for about 40000 records on ground water stats.
Have you found a method of implememting the Percentile function in VB of an array of data?

Regards
Ed
 
Simple (not necessarily optimal) method is as follows:

Put the values into an array
Sort the array (roll your own sort or search for an implementation of ShellSort - avoid QuickSort)
Calculate the index for the entry representing your percentile as Pctile * UBound(Array) and extract

There are a number of fun parts, like what to do if the index isn't integer, but that should get you started...

HTH
 
hello dear!!

its very easy...
You have a range of data sorted in acsending order....
e.g. 1567, 1634, 1700, 2100, 2230, 2400, 2550

and you want to calculate 75th percentile point... then
All you need to do is to reach to Nth element in the series, where N is explained below-

N=(n-1)*p+1, here n=number of elements in the series and p in our case is .75
So in our case N=((7-1)*0.75)+1 which is 5.5. That mean in the above series 5.5th element is your answer....which you can reach in following way....
1. Reach to 5th element..Store it into a variable called "BasePercent".
2. then get the difference of 6th and 5th elements and multiply that by .5, Store this into a variable called "ExtraPercent".
3. Now add BasePercent and ExtraPercent and store the result into another cariable "RealPercent".....THIS IS YOUR ANSWER......YOU CAN TEST YOUR ANSWER BY PUTTING THE SAME RANGE IN EXCEL AND USING THE EXCEL FORMULA....
thanks

Rohit Pareek,
Ratangarh, India
 
Rohit,


Missing ALL of the bells an whistles (as well as the oft neglected error trapping Stufffffffffffffffff),

the following appears to be a near minmalist implementation. Not that this assumes the value array is presorted, and it will get sorely confused if any weirdness is introduced??????


MichaelRed
m.red@att.net

There is never time to do it right but there is always time to do it over
 
this is the right calculation Rohit.
Just one precision however.
the .5 comes from the fact that
at 5 we had the value 2230.
Now there is 5.5-5=0.5 left to calculate
So i multipliy the difference of 6th and 5th elements by 0.5
 
Hello,
This was a big help to me. I have one question, this formual does not work for a dataset with duplicate values. for example
10
20
20
20
30
30
40
50.
According to Excel, the 25th percentile is 20, 50th is 25 and 75th is 32.5. This formula comes up with 25th percentile is 27.5, 50th is 20 and 75th is 30. How would I get the percentile from this dataset?
Thanks in advance for your help.
 
I calculate Excel as being correct. With the dataset you provide their are 8 values. Remember that the first and last map to 0% and 100%, so there are 7 intervals in between, the first 20 maps to 1/7 (14.3% or so) and the rest follow. So the 25% mark occurs part way between the first and second 20s, but they're both 20 so that's easy. The last 20 and the first 30 lie equidistant either side of the 50% point, so the percentile value is half way between them, or 25.
The 75% point is the interesting one. It lies between the 5/7 and 6/7 point and straight-line interpolation tells us (after a bit of fraction arithmetic) that the value should be 40*1/4 + 30*3/4, or 32.5.
Now if there were NINE points (let's add a 60 value at the end for curiosity) the values would all fall on multiples of 12.5% and the 25, 50 and 75 %iles would be 20, 30 and 40.

HTH,

Mike
 
I think lorimickey is NOT applying Rohit's formula properly. If you apply it to the data set that lorimickey has provided, 25th percentile is NOT 27.5, 50th is NOT 20 and 75th is NOT 30. Let us do some calculation:

N=(n-1)*p+1

Here n = 8, so N=7p+1

Case p=.25:
N=7*(0.25)+1=2.75 => 20+(0.75)*(20-20)=20 is the answer

Case p=.50:
N=7*(0.50)+1=4.5 => 20+(0.50)*(30-20)=25 is the answer

Case p=.75:
N=7*(0.75)+1=6.25 => 30+(0.25)*(40-30)=32.5 is the answer

Thanks Rohit!

 
See below. Try it. It agrees with rparmar70. (and Excel)


Code:
Public Function basPercentileTest()

    Dim MyArray(7) As Single

    MyArray(0) = 10
    MyArray(1) = 20
    MyArray(2) = 20
    MyArray(3) = 20
    MyArray(4) = 30
    MyArray(5) = 30
    MyArray(6) = 40
    MyArray(7) = 50

    Debug.Print basPercentile(0.25, 10, 20, 20, 20, 30, 30, 40, 50)
    Debug.Print basPercentile(0.5, 10, 20, 20, 20, 30, 30, 40, 50)
    Debug.Print basPercentile(0.75, 10, 20, 20, 20, 30, 30, 40, 50)


End Function


Code:
Public Function basPercentile(Pctl As Single, ParamArray MyAray() As Variant) As Single

    'Explination Courtsey of Rohit Pareek, Ratangarh, India
    'Tek-Tips; thread710-170007

    'You have a range of data sorted in acsending order....
    'e.g. 1567, 1634, 1700, 2100, 2230, 2400, 2550

    'and you want to calculate 75th percentile point... then
    'All you need to do is to reach to Nth element in the series,
    'where N is explained below.

    'N = (N-1)* p + 1 ,
    'n = number of elements in the series
    'and p in our case is .75

    'So in our case N=((7-1)*0.75)+1 which is 5.5.
    'That means in the above series 5.5th element is your answer
    '   ....which youcan reach in following way....
    '1. Reach to 5th element..Store it into a variable called "BasePercent".
    '2. Get the difference of 6th and 5th elements and multiply that by .5,
    'Store this into a variable called "ExtraPercent".
    '3. Now add BasePercent and ExtraPercent and store the result into
    'another cariable "RealPercent"

    '.....THIS IS YOUR ANSWER...... YOU CAN TEST YOUR ANSWER BY PUTTING THE SAME
    'RANGE IN EXCEL AND USING THE EXCEL FORMULA....

    'Code by Michael Red    7/26/2002
    'Sample Usage:
    '? baspercentile(0.75, 1567, 1634, 1700, 2100, 2330, 2400, 2550)
    ' 2365

    Dim NthEl As Single         '~ N
    Dim StrtPctle As Single     '~ BasePercent above
    Dim IncrPctle As Single     '~ Invremental Percentage ~~ 0.5
    Dim DeltPctle As Single     '~ Extra Percent

    NumElems = UBound(MyAray) + 1
    NthEl = (NumElems - 1) * (Pctl) + 1
    If (NthEl = Int(NthEl)) Then
        basPercentile = MyAray(NthEl)
     Else
        StrtPctle = MyAray(Int(NthEl - 1))
        IncrPctle = (NthEl - Int(NthEl))
        DeltPctle = (MyAray(Int(NthEl)) - MyAray(Int(NthEl - 1)))
        basPercentile = StrtPctle + (IncrPctle * DeltPctle)
    End If


End Function

MichaelRed
m.red@att.net

Searching for employment in all the wrong places
 
MichaelRed had the right idea with his first response. Add a reference to excel and this becomes much easier.
Call an excel function as such.
prcnt = WorksheetFunction.Percentile(array(), 0.8)
To add all the values to your array use the CSng() function to convert your numbers to singles. You don't need to sort your array.
 
'right (idea)', is -of course- a relative term.




MichaelRed
m.red@att.net

Searching for employment in all the wrong places
 
rightly said MichaelRed......"right idea" is a relative term.
I wansted to give a technical logic to this statistical definition of percentile and assume that the user knows error handling and how to sort an array.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top