String analysis of exon 10 of the CFTR gene and the use of bioinformatics in determining the most accurate DNA indicator for CF prediction
(1) Fremont High School, Sunnyvale, CA
* These authors made equal contributions
Cystic fibrosis (CF) is a deadly disease with no cure that affects over 70,000 people worldwide every year. In this study, we aimed to find an efficient way of diagnosing the disease by identifying the best predictor of a newborn’s predisposition to developing CF, comparing different nucleotide patterns in the CF transmembrane conductance regulator (CFTR) gene. We obtained nucleotide sequences from CF patients and from healthy humans, ran string analyses over each sequence to count occurrences of pre-determined patterns, and compared the counts between the two groups. The pattern whose frequency differed most between the diseased gene and the healthy gene was taken as the best predictor of predisposition to CF. In this experiment, we focused on patterns two to eight bases long. We hypothesized that the eight-base sequence “GGGGGGGG” would be the best predictor because it is among the longest patterns we considered and because G is the least common base, so we reasoned that this sequence would be the least common. Because such a sequence is rare, even a single occurrence of it would be statistically significant. Our approach differs from current research, which focuses on analyzing point mutations rather than the whole exon. By analyzing the whole exon, and therefore more DNA, we propose a more accurate determination technique. This work can be carried forward by applying the same code and scientific process to other exons. Overall, we found that “TTCCACAG” occurs 9.13 times more often in the healthy nucleotide sequence than in the mutant nucleotide sequence. Hence, if we compare the DNA of a newborn to that of their parents and the string “TTCCACAG” occurs more often, we can conclude that the child may have an increased risk of developing CF.
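The string analysis described above can be illustrated with a minimal Python sketch. It is not the code used in this study: the function names count_kmers and rank_patterns, the add-one smoothing used to avoid division by zero, and the two short toy sequences are all assumptions made for illustration. The sketch counts every overlapping pattern of two to eight bases in a healthy and a mutant sequence and ranks the patterns by how strongly their counts differ.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every overlapping substring of length k in the sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def rank_patterns(healthy, mutant, min_k=2, max_k=8):
    """Return (pattern, healthy_count, mutant_count, ratio) tuples,
    sorted so the patterns with the largest discrepancy come first."""
    rows = []
    for k in range(min_k, max_k + 1):
        healthy_counts = count_kmers(healthy, k)
        mutant_counts = count_kmers(mutant, k)
        for pattern in set(healthy_counts) | set(mutant_counts):
            hc = healthy_counts[pattern]
            mc = mutant_counts[pattern]
            # Add-one smoothing is an assumption of this sketch; it keeps the
            # ratio defined when a pattern is absent from one sequence.
            ratio = (hc + 1) / (mc + 1)
            rows.append((pattern, hc, mc, ratio))
    # A ratio far above 1 or far below 1 both indicate a large discrepancy.
    rows.sort(key=lambda row: max(row[3], 1 / row[3]), reverse=True)
    return rows


if __name__ == "__main__":
    # Toy sequences for illustration only, not data from the study.
    healthy_seq = "ATCATCTTTGGTGTT"
    mutant_seq = "ATCATTGGTGTT"
    for pattern, hc, mc, ratio in rank_patterns(healthy_seq, mutant_seq)[:5]:
        print(pattern, hc, mc, round(ratio, 2))
```

With real data, healthy_seq and mutant_seq would be replaced by the exon 10 sequences under comparison, and a single pattern of interest such as “TTCCACAG” could be looked up directly with count_kmers(sequence, 8)["TTCCACAG"].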