Tek-Tips is the largest IT community on the Internet today!


Scanning values from within a file 2

Status
Not open for further replies.

devarshi77

Programmer
Oct 31, 2004
10
US
Hi, I'm new here and need some guidance, as I don't want to swim any longer!
The problem is that I have created a file in R and I want to read p-values from it. The file holds 1000 simulation results. Someone told me that C can do this effortlessly.
This is the final aim of the project I'm working on:

I want to read the file and get back the list of p-values smaller than 0.05.


The file output looks like this (1000 such blocks):
------
One sample Kolmogorov-Smirnov Test of Composite Normality

data: y
ks = 0.1062, p-value = 0.5
alternative hypothesis: True cdf is not the normal distn. with estimated parameters
sample estimates:
mean of x standard deviation of x
0.7940168 0.8786753
------------
If anyone can show me a way out of this, I'd appreciate it.

best
 
I am assuming this is a text file. Try forum68 with your question; I believe you can parse the file into either Access or Excel to do the sorting/selection you want.
 
Hello devarshi77,

I hope this script can help you as a quick-development stopgap. Save it as a .vbs file and double-click it to run. The p-values will be extracted to a text file, one occurrence per line.
Code:
infilespec="d:\test\abc.txt"    [blue]'edit to point to your data file[/blue]
outfilespec="d:\test\def.txt"   [blue]'edit to point to p-value file[/blue]
 
set fso=createobject("scripting.filesystemobject")
if not fso.fileexists(infilespec) then
    set fso=nothing : wscript.echo "Data file does not exist. Operation aborted."
    wscript.quit 9
end if
sdata=fso.opentextfile(infilespec,1,false).readall

spattern="\bp-value\s*=\s*[0-9]+(\.[0-9]+)?\s"
set regex=new regexp
with regex
	.pattern=spattern
	.ignorecase=true
	.global=true
end with
sout=""
set matches=regex.execute(sdata)
for each match in matches
    snum=trim(replace(replace(match,"p-value",""),"=",""))
    if sout<>"" then sout=sout & vbcrlf & snum else sout=snum
next
set regex=nothing

set ots=fso.opentextfile(outfilespec,2,true)
ots.write sout
ots.close : set ots=nothing
set fso=nothing
regards - tsuji
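For readers without Windows Script Host, the same extraction can be sketched in Python. The file paths and function names below are placeholders, and the regex mirrors the pattern in the script above:

```python
import re

# Regex mirroring the VBScript pattern: matches "p-value = 0.5" style
# tokens (case-insensitive) and captures just the number.
PVALUE_RE = re.compile(r"\bp-value\s*=\s*([0-9]+(?:\.[0-9]+)?)", re.IGNORECASE)

def extract_pvalues(text):
    """Return every p-value found in the report text, as strings."""
    return PVALUE_RE.findall(text)

def extract_file(infilespec, outfilespec):
    """Read the whole report file and write one p-value per line."""
    with open(infilespec) as f:
        pvalues = extract_pvalues(f.read())
    with open(outfilespec, "w") as f:
        f.write("\n".join(pvalues))
```

Calling extract_file(r"d:\test\abc.txt", r"d:\test\def.txt") would then reproduce what the VBScript writes.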
 
tsuji,

Since you are reading the thread, what about a conversion to a comma-delimited file so that an import into Excel or Access is easier? The block could be restructured into record fields like this:


data: y
ks = 0.1062
p-value = 0.5
description: "alternative hypothesis: True cdf is not the normal distn. with estimated parameters"
mean of x = 0.7940168
standard deviation of x = 0.8786753

With these six record fields, the output would be a CSV-format file something like:
[tt]
"y","0.1062","0.5","alternative hypothesis: True cdf is not the normal distn. with estimated parameters","0.7940168","0.8786753"
[/tt]

The user would have to make certain that the data did not contain a comma (likely in the description field), but a find-and-replace with a semicolon prior to converting should handle that issue.

I believe you could then directly import the file into Excel or Access.

No? It seems to me a more flexible plan of attack, and it would allow the use of charting and other features of the Office applications.
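As a sketch of that plan, assuming every block keeps the six-line layout shown earlier, Python's csv module can emit fully quoted rows (its quoting also protects any embedded commas in the description field, so the semicolon workaround becomes unnecessary). The field order follows the six-field record proposed above:

```python
import csv
import io
import re

# One report block in the layout shown earlier in the thread.
SAMPLE = """One sample Kolmogorov-Smirnov Test of Composite Normality

data: y
ks = 0.1062, p-value = 0.5
alternative hypothesis: True cdf is not the normal distn. with estimated parameters
sample estimates:
mean of x standard deviation of x
0.7940168 0.8786753
"""

def block_to_record(block):
    """Pull the six proposed fields out of one report block.
    Assumes the layout shown above; returns a list of strings."""
    data = re.search(r"data:\s*(\S+)", block).group(1)
    ks = re.search(r"\bks\s*=\s*([\d.]+)", block).group(1)
    p = re.search(r"p-value\s*=\s*([\d.]+)", block).group(1)
    desc = re.search(r"(alternative hypothesis:.*)", block).group(1)
    # The last line holds mean and standard deviation, whitespace-separated.
    mean, sd = block.strip().splitlines()[-1].split()
    return [data, ks, p, desc, mean, sd]

def blocks_to_csv(blocks):
    """Write one CSV row per block, quoting every field."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL)
    for block in blocks:
        writer.writerow(block_to_record(block))
    return buf.getvalue()
```

Running blocks_to_csv over all 1000 blocks would yield a file ready for direct import into Excel or Access.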



 
Hi bcastner,

If the data blocks are structured as shown, parsing them into comma-delimited form is relatively easy for the 1st and 2nd lines. The 3rd and 4th serve as description headers and can easily be tolerated as further entries. The most annoying part is the 5th and 6th lines, where headers and data are presented column-wise.

If it were not for the structure of the 5th and 6th lines, we could easily transform a block into comma-delimited or similar form, ready for further Office processing.

That said, transforming the full data block can still be done relatively easily, given that the 5th line is always the same and the 6th line's data are space- or tab-delimited.

So I totally agree with you.

I would say it is only a matter of "will" and "utility". I extracted the single useful datum the poster wanted, for quick (and hopefully not totally dirty) development/analysis. As you know, this kind of simulation output for statistical analysis can quickly be discarded in favor of one hypothesis over another, so an extract for quick analysis makes some sense. I don't know...

regards - tsuji
 
tsuji,

That makes sense.

I do not know whether these values are tab-delimited or the output file is using spaces:

mean of x standard deviation of x
0.7940168 0.8786753

The delimiter could be either as you suggest.
 
Hi tsuji and bcastner,
Thanks for getting this started; I was stuck!! Anyway, I tried the scan syntax in S-Plus 2000, and that should hopefully work. The problem is that I have scant knowledge of programming, hence this dilemma!!

The scan syntax in S-Plus 2000 looks like this (I'm sure you have seen something like it):

------------
scan(file="", what=numeric(), n=<<see below>>, sep=<<see below>>,
multi.line=F, flush=F, append=F, skip=0, widths=NULL,
strip.white=<<see below>>)

--------------------

Is it possible to read the p-values somehow and then apply some conditional statement that would output all the p-values less than 0.05 (the significance level), in any file format?
And one more thing: as I don't have C, I could not work with the code you turned in... I'm hoping it will work!!

best
dev
 
Yes, the simulation gets stored as output in the form of a text file.


take care
 
The only suggestion I can make is to avoid this in the output file:

y,0.1062,0.5,"alternative hypothesis: True cdf is not the normal distn. with estimated parameters","0.7940168",0.8786753

The comma should be the delimiter, and all values should be enclosed in "" marks. A CSV file with mixed quoting just does not work.

As in my earlier comment:

"y","0.1062","alternative hypothesis: True cdf is not the normal distn. with estimated parameters","0.7940168","0.8786753"

Then this is an importable CSV. You decide the column types in Excel, or the field types in Access.

 
devarshi77,

If you can tell me what final form of filtered (p < 0.05) data you want to squeeze out of the output file of 1000+ blocks (or fewer), I can help you get it via a purpose-built script. Are you willing to try?

In the previous script, it can already be done by modifying this segment.
Code:
for each match in matches
    snum=trim(replace(replace(match,"p-value",""),"=",""))
    [blue]if cdbl(snum)<0.05 then[/blue]
        if sout<>"" then sout=sout & vbcrlf & snum else sout=snum
    [blue]end if[/blue]
next
The output text file will then contain only the p-values, no other data. If you are happy with that, the script is already done. Try it.

- tsuji
 
tsuji,
I am willing to try it. The output should look like this:

"The null hypothesis was rejected n times"

or

"The power of the test is "

based on the condition that whenever the p-value is less than 0.05, the null hypothesis gets rejected. Actually, the aim of the exercise is to calculate the power of the test.

That would be 'total number of null hypotheses rejected / total number of runs'.

If the null hypothesis is rejected, say, 109 times out of 1000 simulations, then the power would be 0.109.

Low power implies the test isn't any good, and that's what I'm working on!
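That arithmetic can be sketched in Python (the p-value lists here are illustrative):

```python
def power_of_test(pvalues, alpha=0.05):
    """Estimate test power: the fraction of simulation runs whose
    p-value falls below the significance level alpha."""
    rejected = sum(1 for p in pvalues if p < alpha)
    return rejected / len(pvalues)
```

With 109 p-values below 0.05 out of 1000 runs, this returns 0.109, matching the example above.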

Well, tsuji, is that C code that you turned in?
And is it possible to import it into R or S-Plus and run it? I don't have any access to C...


Best, and thanks a ton,

dev
 
And yes, I could import it into Excel and sort the p-values into one column.
Is there a macro that would calculate the power of the test in Excel?
Any inputs will be deeply appreciated!!

best
 
devarshi77,

The script should in large part accomplish what is needed. My only worry is that you have not yet run it and reported back what is lacking.

This is the revised script to produce the report.
Code:
const plimit=0.05    'This is the reject/accept limit

infilespec="d:\test\abc_data.txt"    'edit to point to your data file

set fso=createobject("scripting.filesystemobject")
if not fso.fileexists(infilespec) then
	set fso=nothing : wscript.echo "Data file does not exist. Operation aborted."
	wscript.quit 9
end if

outfilespec=fso.getparentfoldername(infilespec) & "\" & _
	fso.getbasename(infilespec) & "_summary"
if fso.getextensionname(infilespec)<>"" then
	outfilespec=outfilespec & "." & fso.getextensionname(infilespec)
end if
	
sdata=fso.opentextfile(infilespec,1,false).readall

spattern="\bp-value\s*=\s*[0-9]+(\.[0-9]+)?\s"
set regex=new regexp
with regex
	.pattern=spattern
	.ignorecase=true
	.global=true
end with
sout=""
set matches=regex.execute(sdata)
isamsize=matches.count
isam=0 : ireject=0
for each match in matches
	isam=isam+1
	snum=trim(replace(replace(match,"p-value",""),"=",""))
	snum=trim(replace(snum,chr(13),""))
	if cdbl(snum)<plimit then	'reject criteria (CDbl, not CCur: currency rounds to 4 decimal places)
		sout=sout & isam & vbtab & snum & vbcrlf
		ireject=ireject+1
	else
		sout=sout & isam & vbtab & vbtab & snum & vbcrlf
	end if
next
set regex=nothing

if isam<>0 then power=ireject/isam else power="n/a"
sout="The power of the test : " & power & vbcrlf & vbcrlf & sout
sout="The null hypothesis was rejected : " & ireject & vbcrlf & sout
sout="Sample size : " & isam & vbcrlf & sout

set ots=fso.opentextfile(outfilespec,2,true)
ots.write sout
ots.close : set ots=nothing
set fso=nothing
If the data file produced by your statistical package is:
[tt] d:\test\abc_data.txt[/tt]
the script will automatically produce the summary file:
[tt] d:\test\abc_data_summary.txt[/tt]
with a structure like this:
[tt]
Sample size : 1500
The null hypothesis was rejected : 300
The power of the test : 0.2

1 0.5232
2 0.04999
3 0.015
4 0.5
5 etc...
[/tt]
- tsuji
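For readers outside Windows, the same summary can be sketched in Python; the report layout mirrors the script above, and the 0.05 threshold and tab columns are kept:

```python
import re

# Matches "p-value = 0.5" style tokens and captures just the number.
PVALUE_RE = re.compile(r"\bp-value\s*=\s*([0-9]+(?:\.[0-9]+)?)", re.IGNORECASE)

def summarize(report_text, plimit=0.05):
    """Build the same summary as the VBScript: sample size, rejection
    count, power, then one numbered line per p-value (rejected values
    in the first tab column, accepted values in the second)."""
    pvalues = PVALUE_RE.findall(report_text)
    nreject = sum(1 for p in pvalues if float(p) < plimit)
    power = nreject / len(pvalues) if pvalues else "n/a"
    lines = [
        "Sample size : %d" % len(pvalues),
        "The null hypothesis was rejected : %d" % nreject,
        "The power of the test : %s" % power,
        "",
    ]
    for i, p in enumerate(pvalues, 1):
        if float(p) < plimit:
            lines.append("%d\t%s" % (i, p))
        else:
            lines.append("%d\t\t%s" % (i, p))
    return "\n".join(lines)
```

Writing summarize(open(r"d:\test\abc_data.txt").read()) to a file would produce the same report structure as shown above.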
 
tsuji,

Thanks for the script, but will it run in VB opened from Excel?

devarshi
 
devarshi77,

Not without further effort.

- tsuji
 
tsuji,

Thanks for all you have done. I will run it and let you know how it goes!!

best

devarshi
 