Last update on 04. Mar 2022.

Parsimony value

OCCURRENCE  

Displayed in the top area of the ARB_PARSIMONY main window.

 

DESCRIPTION  

The parsimony value indicates the quality of a tree's topology.

Basically it counts the minimum number of base mutations that must have occurred if we assume that the current topology represents the actual course of evolution. Smaller values therefore indicate better topologies.
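
Counting the minimum number of mutations on a fixed topology can be illustrated with the classic Fitch algorithm. The following is a minimal sketch only; it is not taken from the ARB sources, and the names `fitch`, `parsimony_value` and the nested-tuple tree representation are assumptions made for this example.

```python
def fitch(tree, column):
    """Return (possible_states, mutations) for one alignment column.

    tree is either a leaf name (str) or a pair (left, right);
    column maps each leaf name to its character at this position.
    """
    if isinstance(tree, str):
        return {column[tree]}, 0
    left_states, left_cost = fitch(tree[0], column)
    right_states, right_cost = fitch(tree[1], column)
    common = left_states & right_states
    if common:
        # Both subtrees agree on at least one state: no extra mutation.
        return common, left_cost + right_cost
    # No shared state: one mutation is needed at this inner node.
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_value(tree, sequences):
    """Sum the per-column mutation counts over the whole alignment."""
    length = len(next(iter(sequences.values())))
    total = 0
    for pos in range(length):
        column = {name: seq[pos] for name, seq in sequences.items()}
        total += fitch(tree, column)[1]
    return total


# Hypothetical toy alignment with four species.
sequences = {"A": "ACGT", "B": "ACGT", "C": "TCGA", "D": "TCGA"}
good = (("A", "B"), ("C", "D"))  # groups identical sequences together
bad = (("A", "C"), ("B", "D"))   # separates identical sequences

print(parsimony_value(good, sequences))  # 2
print(parsimony_value(bad, sequences))   # 4
```

The topology that groups identical sequences needs fewer mutations and therefore gets the smaller (better) value.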

Several parameters influence the absolute parsimony value:

  • If you specify a filter (in the parsimony startup window), only mutations in the remaining, unfiltered alignment columns are counted, i.e. filtering will normally lower the resulting parsimony value.
  • If you specify a weighting mask (in the parsimony startup window), sites with higher weights count more strongly and raise the absolute parsimony value.
  • Adding more species to a tree will normally raise the number of mutations, i.e. a tree with many species has a higher parsimony value than a tree with fewer species (see also ´Reset optimal parsimony´).

If you compare the parsimony values of different topologies, you need to use the same alignment, the same filter, the same weighting mask and the same set of species.
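
The effect of filter and weighting mask on the absolute value can be sketched as follows. This is an illustration under assumptions: the function name `weighted_parsimony` is hypothetical, and the sketch assumes that a weight simply multiplies the mutation count of its column, which this page does not state explicitly.

```python
def weighted_parsimony(column_mutations, column_filter=None, weights=None):
    """Combine per-column mutation counts into one parsimony value.

    column_filter: one bool per column, False removes the column.
    weights: one number per column (assumed to act as a multiplier).
    """
    n = len(column_mutations)
    if column_filter is None:
        column_filter = [True] * n
    if weights is None:
        weights = [1] * n
    return sum(
        mutations * weight
        for mutations, keep, weight in zip(column_mutations, column_filter, weights)
        if keep
    )


# Hypothetical mutation counts for four alignment columns.
mutations = [2, 0, 1, 3]

print(weighted_parsimony(mutations))                                           # 6
print(weighted_parsimony(mutations, column_filter=[True, True, False, True]))  # 5
print(weighted_parsimony(mutations, weights=[1, 1, 1, 2]))                     # 9
```

The same four columns yield three different values, which is why values are only comparable when filter and weighting mask are identical.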

 

Dots  

ARB uses dots ('.') as a special gap type. The meaning of a dot is "might be a gap or a nucleotide/aa". It indicates the lack of any information about the sequence data at the positions where dots are used.

In contrast, a normal gap ('-') clearly states that it is KNOWN that the sequence does NOT CONTAIN any bases at the positions of the gaps; the gaps have only been inserted for alignment purposes.

And, in contrast to a gap, an 'N' (or 'X' for amino acid sequences) clearly states that it is KNOWN that the sequence CONTAINS some nucleotide/aa at that position.

In ARB databases you should use dots at both ends of the alignment. Doing so means: you know that the sequence continues in both directions - it just has not been sequenced completely.

You may also use dots in the middle of the alignment whenever you have strong indications that some gap might in fact be a sequencing error.

 

Mutations against dots  

When the parsimony value is calculated, dots do not cause mutations. They will match any base, gap or other dot.

That means that dots at both sequence ends compensate for some of the negative effects that are normally caused by using sequences of different lengths (e.g. clustering of ´Partial sequences´).
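
The difference between padding a partial sequence with dots and padding it with gaps can be shown with a simplified pairwise comparison. This is only a sketch: the real parsimony value is calculated on the tree, not pairwise, and the function name `count_mismatches` is an assumption for this example.

```python
def count_mismatches(seq_a, seq_b):
    """Count differing positions; a dot ('.') matches anything."""
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and "." not in (a, b)
    )


full_length = "ACGTACGT"
partial_with_dots = "..GTAC.."  # ends unknown (not sequenced)
partial_with_gaps = "--GTAC--"  # ends claimed to contain no bases

print(count_mismatches(full_length, partial_with_dots))  # 0
print(count_mismatches(full_length, partial_with_gaps))  # 4
```

With gaps at the ends, the partial sequence is penalized for every missing position; with dots it is not.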

 

Differences between sequence types  

For nucleotide sequences:

Mutations are simply counted for single nucleotides.

For amino acid sequences:

Mutations are determined on an amino acid basis. This differs from what would be done when using the corresponding DNA alignment:
  • In DNA, several different codons (combinations of 3 nucleotides) may represent the same amino acid. Therefore a mutation may be counted for DNA where no mutation is counted for AA.
  • The parsimony value for amino acid alignments does not count the number of amino acid mutations. It counts the minimum number of nucleotide(!) mutations needed to mutate from one amino acid to another, while assuming that there is no selection pressure when mutating a codon into another codon that translates into the same amino acid (see also EXAMPLES below).

ARB generally uses the "Standard code" to calculate the mutations between different amino acids when determining the parsimony value.
 

NOTES  

The parsimony value is also used to ´Calculate Branch Lengths´.

 

EXAMPLES  

for an amino acid mutation

Imagine an alignment position P and three species, where

  • species F has an 'F' (Phenylalanine) at position P,
  • species Q has a 'Q' (Glutamine) at position P and
  • species L has an 'L' (Leucine) at position P.

These amino acids may be represented by the following codons:

  • F = TTT | TTC
  • Q = CAA | CAG
  • L = TTA | TTG | CTN

Based on the minimum codon distances, the mutation costs used in ARB_PARSIMONY are:

  • F -> Q = 3 mutations
  • F -> L = 1 mutation (e.g. TTT -> TTA)
  • L -> Q = 1 mutation (e.g. CTA -> CAA)
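
The three costs above can be reproduced from the standard genetic code by taking the smallest number of differing nucleotides over all codon pairs. The sketch below is an illustration, not ARB's implementation; the names `CODONS` and `min_codon_distance` are assumptions, and the 64-letter string is the standard code listed in T, C, A, G base order.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated as TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid to the list of codons that encode it.
CODONS = {}
for bases, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS):
    CODONS.setdefault(amino_acid, []).append("".join(bases))


def min_codon_distance(aa_from, aa_to):
    """Minimum number of nucleotide changes between any two codons."""
    return min(
        sum(x != y for x, y in zip(codon_a, codon_b))
        for codon_a in CODONS[aa_from]
        for codon_b in CODONS[aa_to]
    )


print(min_codon_distance("F", "Q"))  # 3
print(min_codon_distance("F", "L"))  # 1
print(min_codon_distance("L", "Q"))  # 1
```

Because the minimum is taken over all synonymous codons, switching between codons of the same amino acid costs nothing, which matches the "no selection pressure" assumption described above.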

This results in the following parsimony values for the possible subtree-rearrangements (R=Rest of whole tree):

   R
   |
   |
   F              pars value = 4
  / \
 /   \
Q     L

   R
   |
   |
   Q              pars value = 4
  / \
 /   \
F     L

   R
   |
   |
   L              pars value = 2 (!)
  / \
 /   \
Q     F

Assuming the third topology (which is the "best" according to the parsimony value) means assuming that the ancestor of Q and F had an L at position P. As no selection pressure is assumed for mutating that 'L' codon (e.g. from 'TTA' into 'CTA'), no mutation penalty is counted for it when calculating the parsimony value.
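
The three values from the diagrams can be recomputed by summing, for each candidate ancestor, the mutation costs to the other two amino acids. This is a minimal sketch that reads each diagram as "inner node above two children" and ignores the contribution of the rest of the tree (R); the names `COST` and `subtree_value` are assumptions for this example.

```python
# Mutation costs from the list above (symmetric, so stored as unordered pairs).
COST = {
    frozenset("FQ"): 3,
    frozenset("FL"): 1,
    frozenset("LQ"): 1,
}


def subtree_value(ancestor, children):
    """Sum of the mutation costs from the ancestor to each child."""
    return sum(COST[frozenset((ancestor, child))] for child in children)


values = {
    ancestor: subtree_value(ancestor, [aa for aa in "FQL" if aa != ancestor])
    for ancestor in "FQL"
}

print(values)                          # {'F': 4, 'Q': 4, 'L': 2}
print(min(values, key=values.get))     # L
```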

 

WARNINGS  

None

 

BUGS  

No bugs known