3-base periodicity is a well-known non-random feature of DNA. That is to say, a given base tends to recur 3 nucleotides away. If all four bases were equally represented and positions were independent, this should happen about 25% of the time, but I got something that was slightly away from random.
3-base periodicity is a well-known pattern that seems to identify exonic regions. For lack of a better word, I use the word “3mer” for each position where the same base appears again 3 nucleotides away. 3mer is a term used in Dr. Sanford’s DNA Skittle, but I still have to confirm with him whether that is what he means by it.
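To make the counting rule concrete, here is a minimal Java sketch of it. It is only an illustration, not the code linked in this post: the class name `ThreeMerSketch` and method `countThreeMers` are hypothetical, and it assumes that overlapping positions are all counted and that any character other than A, C, G, or T (for example N) is skipped.

```java
import java.util.Arrays;

public class ThreeMerSketch {
    // Index order of the returned array: A, C, G, T.
    public static final String BASES = "ACGT";

    /**
     * Counts, per base, the positions i where seq[i] equals seq[i + 3].
     * Assumption: characters outside A, C, G, T are ignored, and
     * lowercase (soft-masked) bases are treated like uppercase ones.
     */
    public static long[] countThreeMers(String seq) {
        long[] counts = new long[4];
        String s = seq.toUpperCase();
        for (int i = 0; i + 3 < s.length(); i++) {
            char c = s.charAt(i);
            int idx = BASES.indexOf(c);
            if (idx >= 0 && s.charAt(i + 3) == c) {
                counts[idx]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Toy input: three ATG repeats followed by CCC.
        long[] counts = countThreeMers("ATGATGATGCCC");
        System.out.println(Arrays.toString(counts)); // prints [2, 0, 2, 2]
    }
}
```

A whole chromosome would more sensibly be streamed from the FASTA file than held in a single String, but the comparison at offset 3 would stay the same.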
I tried to see how frequently A, T, C, and G repeated every 3 bases. It seems the adenine and thymine 3mers appeared about twice as frequently as cytosine or guanine 3mers, and this seems partly due to the higher A and T frequency. I think this is a legitimate non-random pattern. Here are my numbers for Human Chromosome 1:
guanine count 47,016,562
cytosine count 47,024,413
adenine count 65,570,891
thymine count 65,668,756
guanine 3mer count = 10,798,024
cytosine 3mer count = 10,805,795
adenine 3mer count = 20,297,310
thymine 3mer count = 20,355,586
cg_at_3mer_ratio = 0.5314214023030487
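One way to separate the effect of base composition from any real periodicity is to compare each 3mer count with what independent positions would give. The sketch below is a rough check under stated assumptions, not a definitive analysis: the class `ThreeMerExpectation` and its methods are hypothetical, the total is taken as the sum of the four base counts above, and the expected count for a base is approximated as count × (count / total).

```java
public class ThreeMerExpectation {
    /**
     * Expected number of same-base pairs 3 apart if positions were
     * independent: each occurrence of the base is followed 3 positions
     * later by the same base with probability baseCount / total.
     */
    public static double expected(long baseCount, long total) {
        return (double) baseCount * baseCount / total;
    }

    /** Ratio of C+G 3mers to A+T 3mers. */
    public static double cgAtRatio(long g3, long c3, long a3, long t3) {
        return (double) (g3 + c3) / (a3 + t3);
    }

    public static void main(String[] args) {
        // Counts reported above for Human Chromosome 1, in the order G, C, A, T.
        String[] names = {"guanine", "cytosine", "adenine", "thymine"};
        long[] base = {47_016_562L, 47_024_413L, 65_570_891L, 65_668_756L};
        long[] threeMer = {10_798_024L, 10_805_795L, 20_297_310L, 20_355_586L};

        long total = 0;
        for (long b : base) {
            total += b;
        }
        for (int i = 0; i < base.length; i++) {
            double exp = expected(base[i], total);
            System.out.printf("%s: expected %.0f, observed %d, ratio %.4f%n",
                    names[i], exp, threeMer[i], threeMer[i] / exp);
        }
        System.out.println("cg_at_3mer_ratio = "
                + cgAtRatio(threeMer[0], threeMer[1], threeMer[2], threeMer[3]));
    }
}
```

With these counts the observed-to-expected ratio comes out at roughly 1.06 for adenine and thymine and roughly 1.10 for guanine and cytosine. So most of the twofold A/T versus C/G difference follows from base composition, with a smaller excess left over for every base.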
This was a follow-on to Jean Claude Perez. I didn’t get the golden ratio, but instead I explored a well-acknowledged phenomenon, namely 3-base periodicity. I want to make sure my numbers are correct. Why should this non-random pattern emerge? Is it codon bias or something else? Or do I have a bug in my code?
I provided the Java code that I used here:
I got the Chromosome 1 fasta file from:
Any insights and corrections are especially welcome. Thanks in advance.