## Theory Help

(OP)

Hi All,

I'm having some difficulty getting my head around an "experiment" I'm trying to run. (This is just for some idea I have related to a thought experiment I'm working on and thought that I could generate a data set based on it, but I'm stuck on how to actually make it work).

The idea is based on Duffey & Saull's "Jar of Life" experiment, in which a jar holds 25 white balls and 5 black balls. All 30 balls are drawn in succession. At the end you have a result that looks something like:

WWWWWBWWWWBWBWWWWWBBWWWWWWWWBW

Where W = white and B = black.

So this sequence is random every time, but there will always be exactly 25 White and 5 Black (this is the part that is doing my head in).

I created a routine to randomize 0 and 1 (where 0 represents white and 1 represents black). I will actually store this in a text field in a table, C30 in length, so later I can just use SQL commands to pluck out the patterns that I want after I run about 1,000,000 iterations of the test.

So I can do something like:

#### CODE

    White = 0
    Black = 1
    APPEND BLANK
    FOR X = 1 TO 30
        * INT((upper - lower + 1) * RAND() + lower) yields lower..upper, here 0 or 1
        REPLACE SEQUENCE WITH ALLTRIM(SEQUENCE) + ALLTRIM(STR(INT((Black - White + 1) * RAND() + White)))
    ENDFOR

I'll of course bury that into a loop that runs 1,000,000 iterations, but this is the idea.

The problem is, HOW do I get it to produce only 25 White outcomes and 5 black outcomes in an iteration???

Best Regards,

Scott

MSc ISM, MIET, MASHRAE, CDCP, CDCS, CDCE, CTDC, CTIA, ATS

"Everything should be made as simple as possible, and no simpler."

## RE: Theory Help

You may additionally want to check the occurrences of "W" and "B" in your sequence

## CODE -->

hth

MK

## RE: Theory Help

I explained the W and B for simplicity, as those are the terms used by the creator, but I just want to use 0 and 1 as it will be much simpler to deal with. This way I just have one value, and I don't have to do any converting. So a 0 will represent White and a 1 will represent Black.

If I REALLY wanted to after all the sequences were run, I could always do an STRTRAN to convert 0 to W and 1 to B, but really not necessary.

So the issue I still have is, how do I do the draw but ensure that only 25 0 and 5 1 are always produced?

Best Regards,

Scott

## RE: Theory Help

But just off the top of my head, how about something like this:

1. Create a 2 col. x 30 row array.

2. Populate the first column with your 25 zeroes and 5 ones.

3. Populate the second column with a random number.

4. ASORT the array on the second column.

5. Concatenate the values in the first column - in the new sequence - into your string.

6. Repeat one million times.
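The steps above can be sketched as follows. This is a minimal illustration in Python rather than VFP, and it is not the poster's code; the function name `draw_sequence` and its parameters are made up for the example, and a list of (value, key) pairs stands in for the two-column array that `ASORT` would sort.

```python
import random

def draw_sequence(whites=25, blacks=5, rng=random):
    """Shuffle a fixed set of balls by sorting them on a random key."""
    # Column 1 holds the ball value, column 2 a random sort key.
    rows = [("0", rng.random()) for _ in range(whites)]
    rows += [("1", rng.random()) for _ in range(blacks)]
    # Sorting on the random key leaves the balls in random order.
    rows.sort(key=lambda row: row[1])
    # Concatenate column 1 in the new order.
    return "".join(value for value, _ in rows)

if __name__ == "__main__":
    print(draw_sequence())
```

Because the set of balls is fixed before the shuffle, every result has exactly 25 zeroes and 5 ones, whatever the random keys turn out to be.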

Mike

__________________________________

Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads

## RE: Theory Help

Please have a look at the code below

## CODE -->

hth

MK

## RE: Theory Help

A quick code on this, by stuffing blues into whites:

## CODE --> VFP

## RE: Theory Help

You can also fill 30 collection elements, 25 with "0" and five with "1", and then draw from that like from a hat. That guarantees the amount you need.

## CODE

Notice, if you generate these combinations randomly, there is a likelihood you generate the same sequence twice. 5 bits set in 30 bits has about 17 million possible combinations. (30 over 5)
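The hat-drawing idea in this post can be sketched roughly like this. It is an illustration in Python, not the VFP collection code that was posted (that code is missing from this copy); `draw_from_hat` is a hypothetical name, and a plain list plays the role of the collection.

```python
import random

def draw_from_hat(whites=25, blacks=5, rng=random):
    """Draw every ball from the hat, one at a time, without replacement."""
    hat = ["0"] * whites + ["1"] * blacks
    drawn = []
    while hat:
        # Pick a random remaining ball and remove it from the hat.
        index = rng.randrange(len(hat))
        drawn.append(hat.pop(index))
    return "".join(drawn)

if __name__ == "__main__":
    print(draw_from_hat())
```

Since each ball is removed once it is drawn, the counts cannot drift: the hat is empty after exactly 30 draws.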

Bye, Olaf.

Olaf Doschke Software Engineering

https://www.doschke.name

## RE: Theory Help

## CODE -->

Bye, Olaf.

## RE: Theory Help

## CODE

Bye, Olaf.

## RE: Theory Help

I did it just based on a simple string not with 25 columns though.

## CODE --> Foxpro

It's based on the fact that in the first line there are 25 Ws, and then we place random Bs until there are 5 Bs in the string.

As I just noticed, it is more or less the string-based version of Olaf's code :)

-Tom

https://www.blogger.com/profile/089031659767875220...

## RE: Theory Help

And since the string length then grows from 25 to 30, you also need to compute int(rand()*len(lcString))+1; don't forget the +1. STUFF() works with start position 0, but then does the same as with start position 1. You are allowed to go to position length+1 to add the new string at the end, so you actually need int(rand()*(len(lcString)+1))+1.
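A minimal sketch of that position range, written in Python for illustration (the VFP code discussed here is missing from this copy, and `insert_blacks` is a hypothetical name). Python strings are 0-based, so the trailing +1 of the VFP expression disappears, but the number of slots is still the current length plus one.

```python
import random

def insert_blacks(whites=25, blacks=5, rng=random):
    """Start with the whites and insert each black at a random slot."""
    seq = "0" * whites
    for _ in range(blacks):
        # A string of length n has n + 1 insertion slots: one before each
        # character, plus one after the last character.
        slot = int(rng.random() * (len(seq) + 1))  # 0 .. len(seq)
        seq = seq[:slot] + "1" + seq[slot:]
    return seq

if __name__ == "__main__":
    print(insert_blacks())
```

With the slot after the last character included, a sequence can end in a 1; leaving it out makes that impossible.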

Bye, Olaf.

## RE: Theory Help

So changing the code to

## CODE -->

should be enough.

-Tom

## RE: Theory Help

Update: corrected a minor error. I forgot that Bittest() is 0-based.

## RE: Theory Help

Regards

Griff

Keep ing

I'm trying to cut down on the use of shrieks (exclamation marks), I'm told they are !good for you.

## RE: Theory Help

## CODE --> VFP

## RE: Theory Help

well, ASORT would also sort the "0" and "1", but just like a cursor sorted by INDEX ON Rand() you can use that universally no matter what characters or bits you want in which combination and the count will match.

The chances that you create a 30-bit number with 5 bits set are about 17 million : 2^30, and that is quite bad. As high as 17 million sounds, 2^30 is ~1 billion, so it comes down to only about 2%, and you'd generate about 50 numbers on average before you get one matching the conditions. That's not so bad, as the other solutions also generate 30 random numbers, but it gets far worse with other conditions, e.g. there are only 30 combinations with just 1 bit set in 2^30. Hit or miss is almost always a miss. The same goes for 29 bits set = 1 bit unset. But these cases are simple, as you just compute 1 position.

Bye, Olaf.

## RE: Theory Help

Regards

Griff

Keep ing

I'm trying to cut down on the use of shrieks (exclamation marks), I'm told they are !good for you.

## RE: Theory Help

Mike

## RE: Theory Help

I wasn't expecting there to be SO many amazing answers to this! I posted this up about 12 hours ago, and then went off forgetting about it for a while, and just came back to Tek-Tips and found all these replies!

First, thanks to all, totally cool, and I love how there are so many approaches to this.

I will try several of them, and see what comes up from this.

Just to mention some context, I'm looking at simulating "failure" scenarios, and wanted to look at the stochastic distribution of failures. Some of you may know, I'm in the thesis phase of my doctorate, and I had this crazy idea last night, and tried to model it, just to get a glimpse of what the result would look like. My "thinking" is that a major catastrophe is represented not by the appearance of a black, but by the appearance of all 5 blacks in a row. But I'm going to be testing this theory, so the number of white and number of black may change, until I can find a typical occurrence. I just wanted to start with a million iterations as a point to see how often this will occur.

When I win the Nobel Prize, I'll make sure to thank everyone here in the acceptance speech. ><

Best Regards,

Scott

## RE: Theory Help

The 1,000,000 iterations only take about 20 seconds to run using Tom Borgamann's very elegant solution.

Thought you guys might like to know, of the first test set of 1,000,000 there were 187 where all 5 occur in a row.

But a quick look at the result set shows this may not be giving me the randomness I need. One interesting thing is the occurrence of 5 1's in a row at the start appeared 17 times, but the appearance of all 5 1's at the end occurs 0 times.

So that's the first test, but I can see this isn't giving me a stochastic distribution. So next, I think I'll give one of Olaf's solutions a try.
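The kind of check described here (how often the five 1s clump together, and how often the clump sits at the very start or the very end) can be sketched as below. This is an illustrative Python tally, not the SQL actually run against the table; `tally_clumps` and the dictionary keys are made-up names.

```python
def tally_clumps(sequences, clump="11111"):
    """Count sequences containing the clump anywhere, at the start, at the end."""
    counts = {"any": 0, "leading": 0, "trailing": 0}
    for seq in sequences:
        if clump in seq:
            counts["any"] += 1
        if seq.startswith(clump):
            counts["leading"] += 1
        if seq.endswith(clump):
            counts["trailing"] += 1
    return counts

if __name__ == "__main__":
    sample = [
        "11111" + "0" * 25,              # clump at the start
        "0" * 25 + "11111",              # clump at the end
        "0" * 10 + "11111" + "0" * 15,   # clump in the middle
        "10" * 5 + "0" * 20,             # no clump
    ]
    print(tally_clumps(sample))
```

For a generator without positional bias, the leading and trailing counts should be of similar size over a large run, which is the symmetry being tested in this post.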

Best Regards,

Scott

## RE: Theory Help

I tried my code for 1,000,000 iterations; it runs on my computer for approximately 8 seconds (including storing the results in a cursor).

These are my benchmarks for you to compare against your findings (I ran the generator 4 times, marked from R1 to R4)

## CODE

## RE: Theory Help

But of course, there are only 26 combinations with all 1s in a row, out of the 17,100,720. Since you rarely get them even in a million tries, the results will be lumpy; you just have to run more iterations to even that out, up to simply creating all combinations.

And Tom, I thought of changing the way to stuff; you can start with the 25 "0"s:

## CODE

## CODE

So:

1. Avoid using position 0, your random number result should be 1 at minimum.

2. Go up to LEN(lcString)+1 to also add a 1 at the right end.

3. Notice RAND() never is 1, it's always <1. Besides that, INT always rounds down, so INT(RAND()*5) will create numbers 0 to 4.

Bye, Olaf.

## RE: Theory Help

So the next update.

I was running Olaf's Second solution, and it was interesting... after a few minutes, I killed it, as it was still running. Wasn't sure why. So I put an update in to show how often 1,000 results were produced MOD(X,1000) = 0 and for a while it was about 1,000 every 6 seconds. I got tired, let it run while I was asleep, and 8 hours later, it was at 314,000. I watched, and the time to increase from 314,000 to 315,000 was about 5 minutes! So it gets slower and slower and slower as it runs. So I decided to terminate that one, and use another approach. So that was amusing.

I'm now testing Olaf's latest approach, and I'm going to re-run Tom's original approach with 17,000,000 iterations and see if I get any 11111 at the end...

Scott

## RE: Theory Help

I can now identify that there are 142,506 possible combinations. 1,000,000 iterations doesn't result in all the combinations, but gets close. But 2 million iterations will.

So this helps with my first phase, which is to get this to work. And then I need to determine the number of variables involved, versus number of failure opportunities. THEN I'm on to something.

Many thanks everyone.

Mike Lewis,

I'm still going to run your test as well. I quite like that idea.

I think it may be faster.

Scott

## RE: Theory Help

Sequences of 11111 leading occurred 306 times in 17 million iterations, but ending in 11111 resulted in 0 outcomes, and the total distribution was 118,755 unique outcomes, and we know there are 142,506 possible results, so that solution isn't a good random distribution.

Scott

## RE: Theory Help

I don't think it has to do with Tom's solution but with the way he implemented the RAND() function. Nevertheless there remain chances that duplicates occur. Please read below from Hacker's Guide to VFP 7.0

If you want you may delete the multiples (code based on Tom's solution)

## CODE -->

hth

MK

## RE: Theory Help

142,506 is a number resulting from combinations calculators, yes. You have to divide by 5! = 5*4*3*2*1 = 120, as that is the number of different sequences of the 5 bits.
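The division by 5! can be checked with a few lines of arithmetic. The sketch below uses Python's standard `math` module purely as a calculator; the variable names are illustrative, and the expected clump frequency assumes a generator that picks every sequence with equal probability.

```python
import math

# Ordered picks of 5 positions out of 30: 30 * 29 * 28 * 27 * 26.
ordered = math.perm(30, 5)

# The 5 ones are interchangeable, so each sequence is counted 5! times.
unordered = ordered // math.factorial(5)

assert ordered == 17_100_720
assert unordered == math.comb(30, 5) == 142_506

# A block of five adjacent ones can start at positions 1..26.
clumped = 30 - 5 + 1
expected_per_million = 1_000_000 * clumped / unordered

if __name__ == "__main__":
    print(ordered, unordered, clumped, round(expected_per_million, 1))
```

So a uniform generator should show all five 1s in a row a little over 180 times per million sequences, and each single sequence (such as a leading or trailing 11111) about 7 times per million.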

And sure, using a collection to compute a combination is slower. My cursor solution would be faster.

Bye, Olaf.

## RE: Theory Help

I don't see an evident pattern biasing the results one way or another, but having a fast generator will provide more data to analyze.

Code + stats for 4 more runs:

## CODE --> VFP

## RE: Theory Help

You got the important one... that it should generate 142,506 in 2,000,000 iterations, which is the maximum number of unique results. I see your last one is missing 1, and I agree that's within a reasonable "variation" for only 2,000,000. In 17 million, as Olaf mentions 2^30, it should always provide the full range. I see the frequency of leading 11111 and trailing 11111 is reasonably the same as well, so I'm comfortable/confident that the randomness is reasonable. That's the critical point here.

My next step is to determine the number of factors that result in a major outage, and the frequency of events before an outage occurs. That frequency will then be represented by the 0's, and the number of incidents leading to an outage will be the 1's. Then I can change the 25 0's and the 5 1's and begin to predict some data about the outages (human errors) I'm looking to model and predict.

Scott

## RE: Theory Help

mjcmkrsr quoted about the random number generator. You can seed it in a way you always get a repeatable random sequence:

## CODE

So this is a typical random number generator that can only be randomized. RAND(-1) makes the seed depend on system time, so do that once and you get different sequences, but only call RAND(-1) once with that -1 seed. System time does not change that fast, so it's also not a good idea to always call RAND(-1); this is what happens then:

## CODE

For certain scientific experiments with real randomness needed there are specific hardware devices for that, TRNG. No programming language I know has a true random number generator. See https://www.random.org/randomness/

It's good enough if you only use it for generating 0s and 1s. It might never create a long run of identical bits that a TRNG would produce, however seldom that happens. A PRNG characteristically uses integer arithmetic that covers a whole range of integers, say 2 billion 32-bit numbers, and then repeats. But once that is converted to the range 0..1 and reduced to a few values like 1-6 or just 0s and 1s, you can forget about that nature; it's random enough.
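The general seeding behaviour described in this post can be illustrated with any pseudo-random generator. The sketch below uses Python's `random.Random` as a stand-in and does not reproduce VFP's `RAND()` semantics; `first_draws`, `reseed_every_call` and the fixed `clock_value` (imitating a system time that has not changed between calls) are assumptions made for the example.

```python
import random

def first_draws(seed, n=5):
    """Seed once, then keep drawing from the same generator."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

def reseed_every_call(clock_value, n=5):
    """Reseed before every draw with a value that has not changed."""
    return [random.Random(clock_value).random() for _ in range(n)]

if __name__ == "__main__":
    print(first_draws(42) == first_draws(42))   # same seed, same sequence
    print(len(set(first_draws(42))))            # distinct values after one seeding
    print(len(set(reseed_every_call(123))))     # reseeding repeats one value
```

Seeding once and then drawing repeatedly gives a varied but repeatable sequence, while reseeding from an unchanged clock before each draw just returns the same number over and over.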

Bye, Olaf.

## RE: Theory Help

Yes, I was aware of the non-randomness of the RAND() function, and particularly TRNG not being software based, but for what I need, this will do fine. The idea here is to examine generally speaking how often the 5 1's would actually "clump" together. My "test" to see if it was giving a reasonable distribution was to look at leading and trailing clumps, and Tom's solution made it clear that it was not giving a distribution that would be reasonable. (It in fact, never comes up with all 5 1's at the end, even when I ran 17 million iterations against it).

I'm not sure why, but I moved on with the other solutions to see if I could get a better result, and I did.

Ironically, part of my simulation is about human error, which isn't "truly random" either. And we can even be quite predictable about it. For instance, any high skill task requires someone with high skill at that task. I can assure you that despite having read recently a great book on brain surgery, I would still have a 100% failure rate should I try to perform brain surgery on you or one of your friends at your choosing.

I'm working to assign particular "elements" to the 1's now, and working to determine how many of these I need to assign to progress my theory. So one may represent skill, another attitude, another concentration, another involuntary reaction, etc. Then when they align... we have big failures. But I need to work out "opportunities", which are represented by the 0's. I'm not sure yet if 25 is enough. I may need to increase it.

Scott

## RE: Theory Help

Has been done already. Not on me, not on a close friend or family member, but from a brain surgeon.

Tom's first solution has that error of choosing too low a range of random numbers; to get a 1 at the right end you have to allow length+1 for the position of STUFF(), that's all. I had written about that.
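The effect of the too-low range can be shown with a small sketch. It is written in Python and rests on an assumption about how the first version picked its positions (the original code is not in this copy): each 1 is inserted before an existing character, never after the last one. `insert_blacks_short_range` is a hypothetical name.

```python
import math
import random

def insert_blacks_short_range(whites=25, blacks=5, rng=random):
    """Insert each black before an existing character, never at the end."""
    seq = "0" * whites
    for _ in range(blacks):
        slot = int(rng.random() * len(seq))  # 0 .. len(seq) - 1
        seq = seq[:slot] + "1" + seq[slot:]
    return seq

# The last character is always an original 0, so only the first 29
# positions can hold the five 1s.
reachable = math.comb(29, 5)

if __name__ == "__main__":
    rng = random.Random(0)
    samples = [insert_blacks_short_range(rng=rng) for _ in range(10_000)]
    print(any(s.endswith("1") for s in samples), reachable)
```

Under that assumption only C(29,5) = 118,755 of the 142,506 sequences can ever appear, which matches the 118,755 unique outcomes and the zero trailing 11111 results reported earlier in the thread for the 17 million iteration run.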

Bye, Olaf.

## RE: Theory Help

Hahahaha... yes, lots of brain surgery done, but not by people who just "read about it".

I hope it turned out good.

Sorry I missed that point about the stuff+1.

I will modify it and try again, as it was the fastest method.

Scott

## RE: Theory Help

Not entering in racing mode, but mine, Tom's and Olaf's methods are variations of the same method.

In fact, Tom's would be the slowest since it admits ineffective STUFF() calls, that is, STUFF() in which a blue goes over a blue. That adds about 7% to calculations (roughly the number of innocuous calls).

STUFF() to insert seems also to be a tiny bit slower than STUFF() to replace, so starting with a 30-character string would be marginally faster than with a 25-character string.

Finally, using variables instead of constants adds time to each run of the experiment (but using variables is the base of experimenting, right?). Change from variables to constants in code, and the gain of speed may be in the order of one third.

## RE: Theory Help

It's already done by Tom, atlopes and me in variations of Tom's original idea.

If you're after performance you'd not use string operations, STUFF does not operate on the memory of an existing string, also not when used to replace instead of insert.

You'd need to use SYS(2600) with 30 bytes allocated or better create a DLL in C++.

Bye, Olaf.
