 More docs on the ARB website. See also index of helppages. Last update on 04. Apr 2019 .
Main topics:
Related topics:

# Algorithm used for consensus tree

## DESCRIPTION

### ARB has its own library for calculating consensus trees, which is used when

• calculating bootstrap trees with NJ (ARB_DIST),
• directly calculating consensus trees using ARB_NT/Tree/Build consensus tree and
• from the command line tool arb_consensus_tree.

### The algorithm works as follows:

• all trees are read and all occurring branches are stored in a branch-pool
• the consensus tree is constructed iteratively by
• picking the "best" branch and adding it to the tree.
• deleting all now impossible branches from the branch pool.

### The best branch (at each time of the iteration) is determined by the following criteria (listed in order of significance):

• inner branches are picked before leaf branches
• branches occurring more often (i.e. branches with higher bootstrap values) are picked first
• branches nearer to the center of the tree are preferred over more distant branches
• longer branches are preferred over shorter branches

## Partial trees

### These two hypothetical branches are added with different probabilities:

• adding missing species to the bigger side of a branch is done with a probability of 1.0
• adding missing species to the smaller side of a branch is done with a probability P calculated as follows:
`P  =  ( S / B ) ^ M`
where
```S  =  number of edges on the SMALL side of the branch + 1
B  =  number of edges on the BIG side of the branch + 1
M  =  number of missing species```

## Branch lengths

### Example:

Imagine merging 3 trees, where species C is missing in one tree and length of the branch 'AB---CD' is 0.3 in the two full trees and the length of the branch 'AB---D' in the partial tree is 0.6.
The probability of adding C to the D-side of 'AB---D' is P=2/3. The summarized weight for that branch is 8/3, the summarized length will be 2*0.3 + 2/3*0.6 = 1.0. Then the average length calculates to 1.0 / (8/3) = 0.375, i.e a bit more than 0.3, but far from 0.6.

## EXAMPLES

### Example for 2 trees:

```A      C                  A      D
\    /                    \    /
+--+                      +--+
/    \                    /    \
B      +--E               B      +--F
|                         |
D                         E```

### These trees contain the following branches:

```A---BCDE                  A---BDEF
B---ACDE                  B---ADEF
C---ABDE                  D---ABEF
D---ABCE                  E---ABDF
E---ABCD                  F---ABDE
AB---CDE                  AB---DEF
ABC---DE                  ABD---EF```

### Consensus tree is build upon the following branch pool:

```A      --- BCDE[F]              A[F]      --- BCDE
B      --- ACDE[F]              B[F]      --- ACDE
C      --- ABDE[F]              C[F]      --- ABDE
D      --- ABCE[F]              D[F]      --- ABCE
E      --- ABCD[F]              E[F]      --- ABCD
A      --- B[C]DEF              A[C]      --- BDEF
B      --- A[C]DEF              B[C]      --- ADEF
D      --- A[C]BEF              D[C]      --- ABEF
E      --- A[C]BDF              E[C]      --- ABDF
F      --- A[C]BDE              F[C]      --- ABDE
AB     --- CDE[F]               AB[F]     --- CDE
ABC[F] --- DE                   ABC       --- DE[F]
AB     --- [C]DEF               AB[C]     --- DEF
AB[C]D --- EF                   ABD       --- [C]EF```

### Bootstrap values are calculated for branches:

```AB --- CDEF          100%            #1
EF --- ABCD           56.2%          #2
DE --- ABCF           50%            #3
ABC --- DEF           33.3%          #4
ABF --- CDE           16.7%          #5
ABD --- CEF           16.7%          #6
ABDE --- CF           12.5%          #7
AF --- BCDE            6.25%         #8
ACDE --- BF            6.25%         #9
ABCE --- DF            6.25%         #10
AC --- BDEF            6.25%         #11
ADEF --- BC            6.25%         #12
ABDF --- CE            6.25%         #13
ABEF --- CD            6.25%         #14
A --- BCDEF          100%            #15
B --- ACDEF          100%            #16
D --- ABCEF          100%            #17
E --- ABCDF          100%            #18
C --- ABDEF           50%            #19
F --- ACBDE           50%            #20```

### That results in the following tree building steps:

`AB---CDEF                    (add branch #1)`
```       CD
/
/
AB--+                        (add branch #2)
\56%
\
EF```
Now it's impossible to insert branch #3 => branch is dropped!
```       C
/
/
AB--+                        (add branch #4)
\33%
\
+---D
/
/56%
EF```
Branches #5 to #14 are now impossible to insert => drop them
Now only leaf branches (#15 to #20) have to be added, which leads to the following final topology:
```A         C
\       /
\     /
+---+
/     \33%
/       \
B         +---D
/
/56%
E---+
|
|
F```