stronger correlation formula

(OP)
If I have two series of numbers, series A contains either 1s or 0s, depending on whether a patient took a pill or not. Series B contains random numbers. The series B numbers that coincide with the patient taking a pill average 100, whereas those that coincide with NOT taking a pill average 101. There is a HUGE amount of data, so I am trying to find the formula that will show that there is a strong correlation between the two - that if the patient takes the pill, the most likely result is that their B measurement will shift by 1 point. A standard correlation coefficient shows a low correlation... around 0.15. Any help would be greatly appreciated.
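
A rough way to see why the coefficient stays small: when one variable is 0/1, Pearson's coefficient reduces to the point-biserial form, which scales the difference in group means by the overall spread of B. The symbols are notation introduced here, not taken from the post: M_1 and M_0 are the means of B for A = 1 and A = 0, s_B is the standard deviation of all B values (computed with divisor n), p is the fraction of 1s in series A, and n is the total number of observations.

$$
r = \frac{M_1 - M_0}{s_B}\,\sqrt{p\,(1 - p)},
\qquad
t = \frac{r\,\sqrt{n - 2}}{\sqrt{1 - r^2}}
$$

Assuming about half the patients took the pill (p = 0.5, an assumption), a 1-point difference with |r| = 0.15 implies s_B of roughly 0.5 / 0.15, or about 3.3. So the low r mostly says that 1 point is small next to the scatter in B. The second expression is the equal-variance two-sample t statistic written in terms of r; with a hypothetical n = 10,000 and |r| = 0.15 it gives |t| of about 15, which is far from what chance alone would produce.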

RE: stronger correlation formula

This is more of a statistical test than data mining. Actually, a paired t-test. You have two sets of data, one with the pill, one without. The null hypothesis is that the two data sets have the same mean. Then you (likely) reject that hypothesis at (say) the 90% confidence level using the t-test.

-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright

RE: stronger correlation formula

Correction - this is not a paired t-test, it is an unpaired t-test.  
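
A minimal sketch of an unpaired t-test in Python, using only the standard library. It uses the unequal-variance (Welch) form and, because the data set is described as huge, approximates the p-value with the normal distribution instead of Student's t. The function name `welch_t_test`, the variable names, and the simulated data (standard deviation 3.3, 10,000 rows) are all made up for illustration; they are not the OP's data.

```python
import math
import random
from statistics import NormalDist, mean, variance


def welch_t_test(sample_a, sample_b):
    """Return (t, approximate two-sided p) for an unpaired, unequal-variance t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Standard error of the difference between the two sample means.
    se = math.sqrt(variance(sample_a) / n_a + variance(sample_b) / n_b)
    t = (mean(sample_a) - mean(sample_b)) / se
    # Large-sample shortcut: normal approximation instead of Student's t.
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p


# Simulated stand-in for the two series (hypothetical values).
rng = random.Random(0)
took_pill = [1 if rng.random() < 0.5 else 0 for _ in range(10_000)]   # series A
measurement = [rng.gauss(100 if a == 1 else 101, 3.3) for a in took_pill]  # series B

# Split series B by the 0/1 flag in series A.
with_pill = [b for a, b in zip(took_pill, measurement) if a == 1]
without_pill = [b for a, b in zip(took_pill, measurement) if a == 0]

t_stat, p_value = welch_t_test(with_pill, without_pill)
print(f"t = {t_stat:.2f}, approximate two-sided p = {p_value:.3g}")
```

For small samples the normal shortcut is too optimistic; a statistics package that uses the t distribution would be the safer choice there.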


RE: stronger correlation formula

johnherman, it would have been better had the study been designed properly to allow a paired t-test. The usual strategy is to give the product or a placebo to everyone at day 1, do the analysis, wait until you are certain the effects of the product have gone, then give product to all the former placebo people, and the placebo to all the product people. This way you have paired measurements, and the analysis gets much more sensitive.
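
To illustrate why pairing helps, here is a small Python sketch that simulates a crossover study and computes both a paired and an unpaired t statistic on the same numbers. Everything in it is hypothetical: 200 patients, patient baselines spread with standard deviation 3, a 1-point placebo-versus-pill difference, and measurement noise of 0.5. The helper names `paired_t` and `unpaired_t` are illustrative only.

```python
import math
import random
from statistics import mean, stdev, variance


def paired_t(first, second):
    """t statistic for paired measurements (same patient, two periods)."""
    diffs = [x - y for x, y in zip(first, second)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


def unpaired_t(first, second):
    """Welch t statistic, treating the two lists as independent groups."""
    se = math.sqrt(variance(first) / len(first) + variance(second) / len(second))
    return (mean(first) - mean(second)) / se


# Hypothetical crossover data: each patient is measured once on the pill
# and once on the placebo, so patient-to-patient variation cancels in pairs.
rng = random.Random(1)
baselines = [rng.gauss(100, 3) for _ in range(200)]
on_pill = [b + rng.gauss(0, 0.5) for b in baselines]
on_placebo = [b + 1 + rng.gauss(0, 0.5) for b in baselines]

t_paired = paired_t(on_placebo, on_pill)
t_unpaired = unpaired_t(on_placebo, on_pill)
print(f"paired t = {t_paired:.1f}, unpaired t = {t_unpaired:.1f}")
```

Both statistics share the same numerator (the mean difference). The paired version divides by a much smaller standard error because each patient's baseline drops out of the within-patient difference.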

RE: stronger correlation formula

(OP)
what formula would you enter to get the kind of result I'm looking for?

RE: stronger correlation formula

Data mining is used to find "unknown" trends and relationships in the data. You have a hypothesis regarding a relation in the data and are seeking to prove or disprove it, or, in other words, to determine the degree of confidence with which the data supports your hypothesis. I would venture to guess that every statistical package on the market supports the t-test.


RE: stronger correlation formula

"If I have two series of numbers, series A contains either 1s or 0s, depending on if a patient took a pill or not. Series B contains random numbers. All of the series B numbers that coincide with the patient taking a pill have an average of 100, whereas those that coincide with NOT taking a pill average to 101. There is a HUGE amount of data, so I am trying to find the formula that will show that there is a strong correlation between the two - that if the patient takes the pill, the most likely result is that their B measurement will go up by 1 point. A standard correlative coefficient shows a low correlation... around .15. Any help would be greatly appreciated."

The most commonly used correlation measure (Pearson's correlation) is not well-suited to this problem. I suggest that you measure two things:

1. Magnitude:  The difference between the mean of variable B for variable A 0s and for variable A 1s.

and...

2. Significance: Try a t-test or bootstrap to establish that the difference between the two means is unlikely to be zero.
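
The two measurements above can be sketched in Python as follows: the magnitude is a plain difference of group means, and the significance check is a percentile bootstrap confidence interval for that difference. This is only one of several bootstrap variants. The function name `bootstrap_mean_diff`, the 1,000 resamples, the 95% level, and the simulated data are assumptions made for illustration.

```python
import random
from statistics import fmean


def bootstrap_mean_diff(group_0, group_1, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean(group_1) - mean(group_0)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping its original size.
        resample_0 = rng.choices(group_0, k=len(group_0))
        resample_1 = rng.choices(group_1, k=len(group_1))
        diffs.append(fmean(resample_1) - fmean(resample_0))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_resamples)]
    upper = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Simulated stand-in for variables A and B (hypothetical values).
rng = random.Random(42)
variable_a = [1 if rng.random() < 0.5 else 0 for _ in range(4000)]
variable_b = [rng.gauss(100 if a == 1 else 101, 3.3) for a in variable_a]

b_when_0 = [b for a, b in zip(variable_a, variable_b) if a == 0]
b_when_1 = [b for a, b in zip(variable_a, variable_b) if a == 1]

# 1. Magnitude: difference between the group means.
magnitude = fmean(b_when_1) - fmean(b_when_0)

# 2. Significance: does the bootstrap interval exclude zero?
ci_low, ci_high = bootstrap_mean_diff(b_when_0, b_when_1)
print(f"difference = {magnitude:.2f}, 95% interval = ({ci_low:.2f}, {ci_high:.2f})")
```

If the interval does not contain zero, the difference between the two means is unlikely to be zero at the chosen level.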


 

RE: stronger correlation formula

(OP)
Right, but how would you write this as a formula in a spreadsheet format?
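
For reference, the quantity an unpaired (unequal-variance) t-test computes can be built from per-group means, sample variances, and counts, all of which a spreadsheet can calculate. The notation is introduced here for illustration: \bar{B}_1, s_1^2 and n_1 are the mean, sample variance, and count of the B values where A = 1, and the subscript 0 marks the same quantities where A = 0.

$$
t = \frac{\bar{B}_1 - \bar{B}_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_0^2}{n_0}}}
$$

With a large amount of data, |t| above roughly 2 corresponds to a two-sided significance level of about 5%.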

RE: stronger correlation formula

It's probably not worth the effort to write your own t-test within Excel.  It's been done and statistical packages are relatively cheap. Some stat packages might have Excel compatibility or plug-ins.  Good Luck

