## Seek help on Generating multiple 8 base-pair long dependent motif

## Seek help on Generating multiple 8 base-pair long dependent motif

(OP)

Hi all!

I am going to generating multiple 8 base-pair long dependent motif binding sites using Perl.

The thread is:

(1)First,randomly select 4 pairs of nucleotides as the group A.

For example, we randomly select 4 pairs of nucleotides, “CA”, “TG”, “CG”, and “TC” from 16 combinations of two nucleoties :

AA,AC,AG,AT,

CA,CC,CG,CT,

GA,GC,GG,GT,

TA,TC,TG,TT

Then randomly select another 4 pairs of nucleotides, for instance, like “AA”,”CA”,”CT” and “TT” as the group B.

Any pair in group A should be different from the one in group B.

(2)To create the motif binding site. Choose “CA”, the first pair from the group A with 85% probability or “AA” from the group B with 15% probability to occupy the 1st position in the binding site, which means that each pair in the group A will be the dominant pair compared to the one in the group B. The sum of probabilities of corresponding pairs in group A and B will be 1.00 (0.85+0.15=1.00).

Repeat this for the next 3 positions and put 4 pairs of nucleotides together, we will end up with a random binding motif like:

CA --- 1st position

CC --- 2nd position

CG --- 3rd position

TC --- 4th position

CACCCGTC----the 8 bp long binding motif site

We also can form another group A like GG,GT,AT,CG and group

B like GA,AA,AC,CC.

Then we can get a random motif like: GGAAATCG

(3)Repeat this to make another 99 binding motif sites.

I had experience of generating 8 base-pair long indepent motif binding sites with the probability of dominant nucleotide as 0.85. But I don't know how to generate two groups of pairs of nucleotides which are not identical. Thank you very much for suggestions!

Alex

The previous code is:

## $len = length of the sequence - 1

$len = 2;

## $ar[i] letter in position i with probability $ap[i]

## other letters have the same probability (in our case 0.05)

## you have to build the arrays

$ar[0] = "T";

$ap[0] = 0.85;

$ar[1] = "C";

$ap[1] = 0.85;

$ar[2] = "A";

$ap[2] = 0.85;

for ($i=0;$i<=$len;$i++) {

$ar_other[$i][0] = "A";

$ar_other[$i][1] = "C";

$ar_other[$i][2] = "G";

if ($ar[$i] eq "A") {

$ar_other[$i][0] = "T";

} elsif ($ar[$i] eq "C") {

$ar_other[$i][1] = "T";

} elsif ($ar[$i] eq "G") {

$ar_other[$i][2] = "T";

}

$p = rand(1);

#print "$p";

if ($p<(1-$ap[$i])) {

$frac = (1-$ap[$i])/3;

if ($p<(1-$ap[$i]-(2*$frac))) {

$ran_seq[$i] = $ar_other[$i][0];

} elsif ($p<(1-$ap[$i]-$frac)) {

$ran_seq[$i] = $ar_other[$i][1];

} else {

$ran_seq[$i] = $ar_other[$i][2];

}

} else {

$ran_seq[$i] = $ar[$i];

}

}

## you have your sequence in the array @ran_seq

for ($i=0;$i<=$len;$i++) {

print "$ran_seq[$i] ";

}

print "\n";

A overly longer code is:

#!/usr/bin/perl

# randommotifs.pl

#

# This program generates randomized instances of pseudomotifs, i.e.

# sequences that are obtained by adding "noise", according to a

specified

# model to a consensus sequence. To keep this reasonably general, we

# proceed through the following steps:

#

# Define consensus motif and alphabet

# For the required number of repetitions

# For each position in motif

# Define an appropriate probability distribution over the

alphabet

# Produce a character according to the distribution

# ...

# ...

#

# The probability distributions are generated to produce a consensus

character

# with a particular "consensus probability" and all other characters

# with equal probabilites (the noise). This is defined in a single

subroutine

# so it is straightforward to change this for a different model of noise

# (e.g. increase the probabilities for "similar" characters). It's easy

# to become creative here and incorporate (biological) knowledge into

the

# noise-model.

#

# Probablities are then converted into cumulative intervals to chose a

# particular character:

#

# Eg: Probabilities A: 0.4, C: 0.3, G: 0.2, T: 0.1

# Intervals A: 0.4 C: 0.7, G: 0.9, T: 1.0

#

# This example can be pictured as

#

# 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

# +----+----+----+----+----+----

+----+----+----+----+

# | A | C | G | T |

# +----+----+----+----+----+----+----+----+----+----+

#

# .. now all we need to do is generate a random number between 0 and 1

# and see in which interval it falls. This then points to the character

to be

# written to output.

#

# B. Steipe, July 2005. Public domain code.

# =====================================================================

use strict;

use warnings;

my $N_Sequences = 100; # Number of sequences to be

generated

my @Motif = split(//,'TTGACATATAAT'); # This example is simply a

concatenation

# of the -35 and -10 consensus

elements

# of prokaryotic promoters;

replace with

# whatever ...

my @Alphabet = split(//,'ACGT'); # Nucleotides; replace with

whatever ...

my $P_Consensus = 0.85; # Replace with whatever ...

# ====== Globals ==========================

my @Probabilities; # Stores the probability of each

character

# ====== Program ==========================

for (my $i=0; $i < $N_Sequences; $i++) {

for (my $j=0; $j < scalar(@Motif); $j++) {

loadConsensusCharacter($Motif[$j]); # Load the character for

this position

addNoiseToDistribution(); # Modify probabilities

according to noise model

convertToIntervals();

print(getRandomCharacter(rand(1.0))); # Output character

}

print "\n";

}

exit();

# ====== Subroutines =======================

#

sub loadConsensusCharacter {

my ($char) = @_;

my $Found = 'FALSE';

for (my $i=0; $i < scalar(@Alphabet); $i++) {

if ( $char eq $Alphabet[$i]) { # found character in i_th

position in Alphabet

$Probabilities[$i] = 1.0;

$Found = 'TRUE';

} else {

$Probabilities[$i] = 0.0;

}

}

if ($Found eq 'FALSE') {

die("Panic: Motif-Character\"$char\" was not found in Alphabet.

Aborting.\n");

}

return();

}

# ==========================================

sub addNoiseToDistribution {

# This implements a very simple noise model:

# We work on an array in which one element is 1.0 and

# all others are 0.0. The element with 1.0 has the same

# index as the consensus character in the Alphabet.

#

# We change that value to the consensus probability and

# we distribute the remaining probability equally among

# all other elements.

#

# It should be straightforward to implement more elaborate

# noise models, or use a position specific scoring matrix

# or something else here.

my $P_NonConsensus = ( 1.0-$P_Consensus) / (scalar(@Alphabet) - 1);

for (my $i=0; $i < scalar(@Probabilities); $i++) {

if ( $Probabilities[$i] == 1.0 ) { # ... this is the consensus

character

$Probabilities[$i] = $P_Consensus;

} else {

$Probabilities[$i] = $P_NonConsensus;

}

}

return();

}

# ==========================================

sub convertToIntervals {

my $Sum = 0;

# Convert the values of the probabilities array to the top

boundaries

# of intervals having widths proportional to their relative

frequency.

# Numbers range from 0 to 1.0 ...

for (my $i=1; $i < scalar(@Probabilities); $i++) {

$Probabilities[$i] += $Probabilities[$i-1];

}

return();

}

# ==========================================

sub getRandomCharacter {

my ($RandomNumber) = @_;

my $i=0;

# Test which interval contains the RandomNumber ...

# The index of the interval points to the Event we should choose.

for ($i=0; $i < scalar(@Probabilities); $i++) {

if ($Probabilities[$i] > $RandomNumber) { last; }

}

return($Alphabet[$i]);

}

I am going to generating multiple 8 base-pair long dependent motif binding sites using Perl.

The thread is:

(1)First,randomly select 4 pairs of nucleotides as the group A.

For example, we randomly select 4 pairs of nucleotides, “CA”, “TG”, “CG”, and “TC” from 16 combinations of two nucleoties :

AA,AC,AG,AT,

CA,CC,CG,CT,

GA,GC,GG,GT,

TA,TC,TG,TT

Then randomly select another 4 pairs of nucleotides, for instance, like “AA”,”CA”,”CT” and “TT” as the group B.

Any pair in group A should be different from the one in group B.

(2)To create the motif binding site. Choose “CA”, the first pair from the group A with 85% probability or “AA” from the group B with 15% probability to occupy the 1st position in the binding site, which means that each pair in the group A will be the dominant pair compared to the one in the group B. The sum of probabilities of corresponding pairs in group A and B will be 1.00 (0.85+0.15=1.00).

Repeat this for the next 3 positions and put 4 pairs of nucleotides together, we will end up with a random binding motif like:

CA --- 1st position

CC --- 2nd position

CG --- 3rd position

TC --- 4th position

CACCCGTC----the 8 bp long binding motif site

We also can form another group A like GG,GT,AT,CG and group

B like GA,AA,AC,CC.

Then we can get a random motif like: GGAAATCG

(3)Repeat this to make another 99 binding motif sites.

I had experience of generating 8 base-pair long indepent motif binding sites with the probability of dominant nucleotide as 0.85. But I don't know how to generate two groups of pairs of nucleotides which are not identical. Thank you very much for suggestions!

Alex

The previous code is:

## $len = length of the sequence - 1

$len = 2;

## $ar[i] letter in position i with probability $ap[i]

## other letters have the same probability (in our case 0.05)

## you have to build the arrays

$ar[0] = "T";

$ap[0] = 0.85;

$ar[1] = "C";

$ap[1] = 0.85;

$ar[2] = "A";

$ap[2] = 0.85;

for ($i=0;$i<=$len;$i++) {

$ar_other[$i][0] = "A";

$ar_other[$i][1] = "C";

$ar_other[$i][2] = "G";

if ($ar[$i] eq "A") {

$ar_other[$i][0] = "T";

} elsif ($ar[$i] eq "C") {

$ar_other[$i][1] = "T";

} elsif ($ar[$i] eq "G") {

$ar_other[$i][2] = "T";

}

$p = rand(1);

#print "$p";

if ($p<(1-$ap[$i])) {

$frac = (1-$ap[$i])/3;

if ($p<(1-$ap[$i]-(2*$frac))) {

$ran_seq[$i] = $ar_other[$i][0];

} elsif ($p<(1-$ap[$i]-$frac)) {

$ran_seq[$i] = $ar_other[$i][1];

} else {

$ran_seq[$i] = $ar_other[$i][2];

}

} else {

$ran_seq[$i] = $ar[$i];

}

}

## you have your sequence in the array @ran_seq

for ($i=0;$i<=$len;$i++) {

print "$ran_seq[$i] ";

}

print "\n";

A overly longer code is:

#!/usr/bin/perl

# randommotifs.pl

#

# This program generates randomized instances of pseudomotifs, i.e.

# sequences that are obtained by adding "noise", according to a

specified

# model to a consensus sequence. To keep this reasonably general, we

# proceed through the following steps:

#

# Define consensus motif and alphabet

# For the required number of repetitions

# For each position in motif

# Define an appropriate probability distribution over the

alphabet

# Produce a character according to the distribution

# ...

# ...

#

# The probability distributions are generated to produce a consensus

character

# with a particular "consensus probability" and all other characters

# with equal probabilites (the noise). This is defined in a single

subroutine

# so it is straightforward to change this for a different model of noise

# (e.g. increase the probabilities for "similar" characters). It's easy

# to become creative here and incorporate (biological) knowledge into

the

# noise-model.

#

# Probablities are then converted into cumulative intervals to chose a

# particular character:

#

# Eg: Probabilities A: 0.4, C: 0.3, G: 0.2, T: 0.1

# Intervals A: 0.4 C: 0.7, G: 0.9, T: 1.0

#

# This example can be pictured as

#

# 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

# +----+----+----+----+----+----

+----+----+----+----+

# | A | C | G | T |

# +----+----+----+----+----+----+----+----+----+----+

#

# .. now all we need to do is generate a random number between 0 and 1

# and see in which interval it falls. This then points to the character

to be

# written to output.

#

# B. Steipe, July 2005. Public domain code.

# =====================================================================

use strict;

use warnings;

my $N_Sequences = 100; # Number of sequences to be

generated

my @Motif = split(//,'TTGACATATAAT'); # This example is simply a

concatenation

# of the -35 and -10 consensus

elements

# of prokaryotic promoters;

replace with

# whatever ...

my @Alphabet = split(//,'ACGT'); # Nucleotides; replace with

whatever ...

my $P_Consensus = 0.85; # Replace with whatever ...

# ====== Globals ==========================

my @Probabilities; # Stores the probability of each

character

# ====== Program ==========================

for (my $i=0; $i < $N_Sequences; $i++) {

for (my $j=0; $j < scalar(@Motif); $j++) {

loadConsensusCharacter($Motif[$j]); # Load the character for

this position

addNoiseToDistribution(); # Modify probabilities

according to noise model

convertToIntervals();

print(getRandomCharacter(rand(1.0))); # Output character

}

print "\n";

}

exit();

# ====== Subroutines =======================

#

sub loadConsensusCharacter {

my ($char) = @_;

my $Found = 'FALSE';

for (my $i=0; $i < scalar(@Alphabet); $i++) {

if ( $char eq $Alphabet[$i]) { # found character in i_th

position in Alphabet

$Probabilities[$i] = 1.0;

$Found = 'TRUE';

} else {

$Probabilities[$i] = 0.0;

}

}

if ($Found eq 'FALSE') {

die("Panic: Motif-Character\"$char\" was not found in Alphabet.

Aborting.\n");

}

return();

}

# ==========================================

sub addNoiseToDistribution {

# This implements a very simple noise model:

# We work on an array in which one element is 1.0 and

# all others are 0.0. The element with 1.0 has the same

# index as the consensus character in the Alphabet.

#

# We change that value to the consensus probability and

# we distribute the remaining probability equally among

# all other elements.

#

# It should be straightforward to implement more elaborate

# noise models, or use a position specific scoring matrix

# or something else here.

my $P_NonConsensus = ( 1.0-$P_Consensus) / (scalar(@Alphabet) - 1);

for (my $i=0; $i < scalar(@Probabilities); $i++) {

if ( $Probabilities[$i] == 1.0 ) { # ... this is the consensus

character

$Probabilities[$i] = $P_Consensus;

} else {

$Probabilities[$i] = $P_NonConsensus;

}

}

return();

}

# ==========================================

sub convertToIntervals {

my $Sum = 0;

# Convert the values of the probabilities array to the top

boundaries

# of intervals having widths proportional to their relative

frequency.

# Numbers range from 0 to 1.0 ...

for (my $i=1; $i < scalar(@Probabilities); $i++) {

$Probabilities[$i] += $Probabilities[$i-1];

}

return();

}

# ==========================================

sub getRandomCharacter {

my ($RandomNumber) = @_;

my $i=0;

# Test which interval contains the RandomNumber ...

# The index of the interval points to the Event we should choose.

for ($i=0; $i < scalar(@Probabilities); $i++) {

if ($Probabilities[$i] > $RandomNumber) { last; }

}

return($Alphabet[$i]);

}

## RE: Seek help on Generating multiple 8 base-pair long dependent motif

## CODE

use strict;

my @fullset = qw(AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT);

my $probab = 0.85;

my $i;

my @major = (); # Cluster of 4 nucleotides (A)

my @minor = (); # Cluster of 4 nucleotides (B)

# Pick 4 random pairs

my @temp = @fullset;

for($i = 0; $i<4; ++$i) {

push @major, $temp[int(rand(scalar(@temp)))];

}

print "@major\n";

# Pick 4 random pairs (excluding matches to @major)

@temp = @fullset;

for($i = 0; $i<4; ++$i) {

my $index = int(rand(scalar(@temp)));

redo unless($major[$i] ne $temp[$index]);

push @minor, $temp[$index];

}

print "@minor\n";

# Generate resultant set based on previously defined probability

my @result = ();

for($i = 0; $i<4; ++$i) {

push @result, rand(1.0) > $probab ? $major[$i] : $minor[$i];

}

print "@result\n";

Trojan.

## RE: Seek help on Generating multiple 8 base-pair long dependent motif

Thank you very much first of all.

What I mentioned the probabilty of 0.85 of pair

in the group A (major group in your case)means

that it can be randomly selected 85 times out of

100 times vs. the pair in group B 15 times out of

100 times.

I found you assigned 0.85 to the $probab in the code.

Can you clarifying this ?

Thank you for help!

Alex

## RE: Seek help on Generating multiple 8 base-pair long dependent motif

## CODE

use strict;

my @fullset = qw(AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT);

my $probab = 0.85;

my $i;

my @major = (); # Cluster of 4 nucleotides (A)

my @minor = (); # Cluster of 4 nucleotides (B)

# Pick 4 random pairs

my @temp = @fullset;

for($i = 0; $i<4; ++$i) {

my $index = int(rand(scalar(@temp)));

push @major, $temp[$index];

splice(@temp, $index, 1);

}

print "@major\n";

# Pick 4 random pairs (excluding matches to @major)

@temp = @fullset;

for($i = 0; $i<4; ++$i) {

my $index = int(rand(scalar(@temp)));

redo unless($major[$i] ne $temp[$index]);

push @minor, $temp[$index];

splice(@temp, $index, 1);

}

print "@minor\n";

# Generate resultant set based on previously defined probability

my @result = ();

for($i = 0; $i<4; ++$i) {

push @result, rand(1.0) > $probab ? $major[$i] : $minor[$i];

}

print "@result\n";

Trojan.

## RE: Seek help on Generating multiple 8 base-pair long dependent motif

The rand(1.0) will generate a random number between 0.0000 and 0.99999 (ish).

Therefore when my code says:

## CODE

This means that 85% of the time (generally) you get the major pair.

Hope that helps explain.

Trojan.

## RE: Seek help on Generating multiple 8 base-pair long dependent motif

"two pairs can be identical in one nucleotide"?

"one nucleotide" means "a position" in a binding site?

## RE: Seek help on Generating multiple 8 base-pair long dependent motif

Can u show a generated motif here?

## RE: Seek help on Generating multiple 8 base-pair long dependent motif

The "splice" that I added was to remove the chance of getting something like this:

AA CG TA CG

It removes the "CG" from the "@temp" list when it's been used so that it cannot be selected again.

I did not know if duplicate pairs were allowed so I gave you both options.

Trojan.

## RE: Seek help on Generating multiple 8 base-pair long dependent motif

(I hope I'm understanding this enough and not making a fool of myself here!!!)

Trojan.

## RE: Seek help on Generating multiple 8 base-pair long dependent motif

I would say "duplicate pairs" are not allowed

to appear.

We will have two groups A and B (major and minor here).

Each of them is different from another pair. So the ideal

result should never be like AA CG TA CG.

Regards,

Alex

## RE: Seek help on Generating multiple 8 base-pair long dependent motif

The motif is in @result and therefore you may only need to print that.

Trojan.

## RE: Seek help on Generating multiple 8 base-pair long dependent motif

## RE: Seek help on Generating multiple 8 base-pair long dependent motif

CAAGACAA

ACAGTTCA

CCGAATTC

GATACTAA

GTTTCACC

TCATCTCA

CCGCCGAT

TATTGTCA

ACGTAACG

TGAGGCTT

CAAATTAC

TCGACGGG

GGTCAGGT

ATAAGGCA

ACGGGCTG

CTCCTCGT

GTAATCAT

GACAAGAA

AACCGTGA

GTCCACGG

AAGCCGGG

CTTGTCCT

GTTCTTAC

AATGCCGG

TAGCAGGA

GTCTTGAG

AGATCTGA

ATTTGATG

TGTTGGCG

CACTGGTT

AGTGAAGG

CTATGATC

GCGATCTA

AATCCAAT

TGATCGCA

TTTGAGAA

TGGTATCT

TCTGGACG

AACCCTGA

TTCCACCT

CGCTCCTG

GACTAGGG

CCAAGCAT

CAACGTTA

TCCTTAAA

TCCCGCAT

TCGCCATT

AAGCTCTC

GTACCGTT

CCAATCCT

CAATCGTC

AGTCGCTA

TACACGAG

TTCGGTAT

ACCAATCG

TGGTGATA

ACCTATTG

TTCCGTAA

CGGCATCT

AAACATGA

CTTATAAT

CGAAGTGT

CCCCCATG

GACTAGAC

CAAAGTTA

AAGTAGTC

TTACGTCA

GACTGGAT

AATATTGG

CTGGGTAT

CTTCAGGT

TCTGTTGA

AAGTGCGT

ACCTTCGT

TAGAGACG

ACAGTCGA

AACGTCTT

GGCATGAT

TGGTTTAT

TTAACTAG

GTAATGGC

CTTGTCAT

TCTGCACA

ACTCAGAA

CTGACATG

ATTCAACC

TAGTCAAC

GACACGGT

AAGACTGC

ACGAGGCC

GTACGGTC

AGCCAGTG

CGTTGATC

CTTCAAAT

ACGTTAGA

CCAACGTC

TTCCGTGC

TTCGGTTC

TCCAATAC

And here's the final code that created it:

## CODE

use strict;

my @fullset = qw(AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT);

my $probab = 0.85;

for(my $j=0; $j<99; $j++) {

my $i;

my @major = (); # Cluster of 4 nucleotides (A)

my @minor = (); # Cluster of 4 nucleotides (B)

# Pick 4 random pairs

my @temp = @fullset;

for($i = 0; $i<4; ++$i) {

my $index = int(rand(scalar(@temp)));

push @major, $temp[$index];

splice(@temp, $index, 1);

}

#print "@major\n";

# Pick 4 random pairs (excluding matches to @major)

@temp = @fullset;

for($i = 0; $i<4; ++$i) {

my $index = int(rand(scalar(@temp)));

redo unless($major[$i] ne $temp[$index]);

push @minor, $temp[$index];

splice(@temp, $index, 1);

}

#print "@minor\n";

# Generate resultant set based on previously defined probability

my @result = ();

for($i = 0; $i<4; ++$i) {

push @result, rand(1.0) > $probab ? $major[$i] : $minor[$i];

}

print join("", @result), "\n";

}

Trojan.

## RE: Seek help on Generating multiple 8 base-pair long dependent motif

Trojan.

## RE: Seek help on Generating multiple 8 base-pair long dependent motif

Alex

## RE: Seek help on Generating multiple 8 base-pair long dependent motif

Good luck.

Trojan.

## RE: Seek help on Generating multiple 8 base-pair long dependent motif

## CODE

use strict;

use Benchmark qw(:all);

my @fullset = qw(AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT);

my $probab = 0.85;

my $reps = 1000;

my $r = timethese($reps, {

'Kevin' => \&Kevin,

'Trojan' => \&Trojan,

});

cmpthese($r);

sub Kevin {

my %results = ();

for (0..99) {

my %base_pairs = ();

my $motif = '';

until (scalar keys %base_pairs == 8) {

my $index = int rand @fullset;

$base_pairs{$fullset[$index]} = $fullset[$index];

}

my @base_pairs = keys %base_pairs;

my @major = @base_pairs[0..3];

my @minor = @base_pairs[4..7];

$motif .= rand(1) > $probab ? $minor[$_] : $major[$_] for (0..3);

redo if exists $results{$motif};

$results{$motif} = $motif;

}

print "$_\n" for keys %results;

}

sub Trojan {

for(my $j=0; $j<99; $j++) {

my $i;

my @major = (); # Cluster of 4 nucleotides (A)

my @minor = (); # Cluster of 4 nucleotides (B)

# Pick 4 random pairs

my @temp = @fullset;

for($i = 0; $i<4; ++$i) {

my $index = int(rand(scalar(@temp)));

push @major, $temp[$index];

splice(@temp, $index, 1);

}

#print "@major\n";

# Pick 4 random pairs (excluding matches to @major)

@temp = @fullset;

for($i = 0; $i<4; ++$i) {

my $index = int(rand(scalar(@temp)));

redo unless($major[$i] ne $temp[$index]);

push @minor, $temp[$index];

splice(@temp, $index, 1);

}

#print "@minor\n";

# Generate resultant set based on previously defined probability

my @result = ();

for($i = 0; $i<4; ++$i) {

push @result, rand(1.0) > $probab ? $major[$i] : $minor[$i];

}

print join("", @result), "\n";

}

}

with the results printed, Trojans code was typically a bit slower than my code (25% approx), but with the results not printed Trojans code was faster than mine (by about the same percentage).

## RE: Seek help on Generating multiple 8 base-pair long dependent motif

## CODE

should be:

## CODE

if the rand() function returns a value greater than the probability factor the minor sequence should be used, not the major sequence, which is what is happening in the first line above.

## RE: Seek help on Generating multiple 8 base-pair long dependent motif

Not quite so sure about your algorithm for selecting the pairs though. Would it be fair to say that with your algorithm, @minor could not possibly contain any pairs that appear

anywherein @major and vice versa?If that's the case, you'll see from the original supplied example that it

shouldbe possible to have the same values, as long as they do not share the same index position.Trojan.

## RE: Seek help on Generating multiple 8 base-pair long dependent motif

I admit I was (still am) confused if identical base-pairs could be in major and minor as long they were not in same index. But since there could never be duplicated base-pairs in the motif (I think) it seemed easier just to make sure both arrays contained no duplicate base-pairs to begin with. That might make a difference in the final outcome of the generated list, but honestly I am not sure.

Also, the implication of my benchmarking test is that the join() function is extremely slow and looks like it can drag down the run time of an entire script. You might want to look into that if you use join() often in code.

I might play around with my code a little to see if I can speed it up.

Regards,

Kevin