How we handle AA/DNA databases:
-
we maintain a DNA alignment for ALL sequences we want in our database.
-
we translate the DNA alignment into an amino acid alignment => each species has 2 alignment entries, e.g. "data/ali_dna" and "data/ali_pro"
-
we align the protein sequences
-
we realign DNA (according to the aligned protein sequences; see ´Realign DNA´)
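The translate and realign steps above can be sketched as follows. This is a simplified illustration, not the program's own implementation: the function names `translate` and `realign_dna` are hypothetical, the standard genetic code is assumed, and the DNA is assumed to start in frame with a length that is a multiple of three.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def strip_gaps(seq):
    """Remove alignment gap characters ('-' and '.')."""
    return seq.replace("-", "").replace(".", "")


def translate(dna):
    """Translate a (possibly gapped) DNA sequence into an ungapped protein."""
    seq = strip_gaps(dna).upper().replace("U", "T")
    # Unknown or ambiguous codons become 'X'.
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def realign_dna(dna, aligned_protein):
    """Thread the DNA codons onto an aligned protein sequence.

    Every residue column takes the next codon; every gap column
    becomes a three-base gap.
    """
    seq = strip_gaps(dna)
    out, pos = [], 0
    for ch in aligned_protein:
        if ch in "-.":
            out.append("---")
        else:
            out.append(seq[pos:pos + 3])
            pos += 3
    return "".join(out)


dna = "ATGGCTAAA"
protein = translate(dna)            # ungapped protein: "MAK"
aligned_protein = "M-AK"            # protein after alignment (gap inserted)
aligned_dna = realign_dna(dna, aligned_protein)
print(aligned_dna)
```

Because the gaps are only inserted in whole-codon units, translating the realigned DNA again yields the same protein sequence as before.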
Here are some reasons why we proceed as described:
-
we want to maintain a DNA alignment, to be able to use the PT-Server (to find next relatives). You'll find a section about that issue in ´The integrated aligners´.
-
we want to keep DNA data, because it contains more information than the translated protein sequences (you cannot create DNA from protein sequences).
-
we want to align sequences using protein alignments, because the alignment of protein sequences is better determined (less ambiguous) than the alignment of the corresponding DNA sequences.
-
we always realign DNA after changing the protein alignment, so that we can perform a new translation from scratch at any time (e.g. if the translation table changes; see ´Translate DNA to Protein´)
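The last point can be illustrated with a small sketch. It assumes the DNA alignment is codon-aligned (as it is after realignment) and retranslates it column by column under two different translation tables. The helper name `translate_aligned` is hypothetical; the two tables are the standard code and the vertebrate mitochondrial code, which differs from it in four codons.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
TABLE_STD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Vertebrate mitochondrial code: same as standard except for these codons.
TABLE_MITO = dict(TABLE_STD, AGA="*", AGG="*", ATA="M", TGA="W")


def translate_aligned(aligned_dna, table):
    """Translate a codon-aligned DNA sequence, keeping gap columns.

    A '---' codon becomes a '-' column; unknown codons become 'X'.
    """
    out = []
    for i in range(0, len(aligned_dna) - 2, 3):
        codon = aligned_dna[i:i + 3].upper()
        out.append("-" if codon == "---" else table.get(codon, "X"))
    return "".join(out)


aligned_dna = "ATG---TGAATA"
print(translate_aligned(aligned_dna, TABLE_STD))   # M-*I
print(translate_aligned(aligned_dna, TABLE_MITO))  # M-WM
```

Both translations keep the gap in the same column, so the protein alignment survives a change of translation table as long as the DNA alignment is kept in sync.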