More docs on the ARB website.
See also index of helppages.
Last update on 13. Nov 2025 .

Main topics:

Maximum base frequency

Calculate the Percentage of the Most Frequent Base

OCCURRENCE
DESCRIPTION
NOTES
EXAMPLES
Gaps
Ambiguities
BUGS

OCCURRENCE

ARB_NT/SAI/create SAI/Max Frequency

DESCRIPTION

Finds the most frequent base (or gap) in each column for all marked species. Then the number of all sequences with this base are divided by:

the number of all marked sequences (if not ignoring gaps)
the number of bases in this column (if ignoring gaps)

The resulting percentage is divided by ten and then the second last digit is taken:

0%  - 19%  ->  '1' (does not occur for nucleotides)
20% - 29%  ->  '2'
30% - 39%  ->  '3'
...
90% - 99%  ->  '9'
100%       ->  '0'

NOTES

The result can be used as a conservation profile and filter. Rule of thumb: the higher the number, the more conserved the position (but mind the '0' which means 100%!).

Internally the SAI consists of two lines: the main line called 'data' and a second line called 'dat2'.

The first is used when you use the SAI as conservation profile or filter and contains the SECOND LAST digit of the calculated frequencies.

The second contains the LAST digit of the calculated frequencies. It is not used and does only show up, when you load the SAI into ARB_EDIT4, where it will show both lines.

EXAMPLES

Say one column contains 7 A's 4 G's and 5 Gaps.

ignoring gaps will result in 7/11 == 64 % which is converted to '6'.
otherwise we get 7/16 == 44% which will be indicated by a '4' in the target sequence.

Gaps

If gaps are ignored '-' are treated like '.': both get removed and frequency is calculated on non-gaps only.

If gaps are NOT ignored, '-' are treated like non-gaps, i.e. a column containing only '-' will be assigned a max. frequency of 100%. '.' are treated as gaps.

Ambiguities

Ambiguities are counted proportionally, i.e.

a 'N' counts as 1/4 'A', 1/4 'C', 1/4 'G' and 1/4 'T'
a 'D' counts as 1/3 'A', 1/3 'G' and 1/3 'T'
a 'Y' counts as 1/2 'C' and 1/2 'T'

Example:

A column containing 9 'C' and one 'Y' results in a max. frequency of 95% (=9.5 'C').