Last update on 04. Mar 2022.

    Chimera check


    ARB_NT/Sequence/Chimera check



    Takes sequences, a tree and a column statistic as input and generates a short sequence quality string, which is stored in the database under a user-defined key.

    First the sequences are split into different slices:

    • 2 pieces (front and back half)
    • 5 equally sized pieces
    • user defined pieces
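
    The slicing step can be sketched as follows. This is a minimal illustration, not ARB's implementation: the function name split_into_slices is hypothetical, and it assumes that "user defined pieces" means a user-chosen number of near-equal slices.

```python
def split_into_slices(sequence, n_slices):
    """Split `sequence` into `n_slices` contiguous pieces of near-equal length.

    Hypothetical helper for illustration; ARB may place slice borders differently.
    """
    if n_slices < 1:
        raise ValueError("n_slices must be at least 1")
    length = len(sequence)
    # Slice i covers the half-open range [i*length//n, (i+1)*length//n).
    bounds = [i * length // n_slices for i in range(n_slices + 1)]
    return [sequence[bounds[i]:bounds[i + 1]] for i in range(n_slices)]


seq = "ACGUACGUAC"                   # made-up sequence, for illustration only
halves = split_into_slices(seq, 2)   # front and back half
fifths = split_into_slices(seq, 5)   # 5 equally sized pieces
```

    Concatenating the slices always gives back the original sequence, so no column is counted twice or dropped.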

    The program sums up the weighted mutations for each sequence slice using a maximum likelihood technique.

    For each slice a Student's t-test is performed and its result is written into the XXX portions of the entries mentioned below. The t-test checks whether the likelihood of a specific sequence slice (of one species) follows a t-distribution of the likelihoods for that sequence slice in all examined species.

    Each X encodes the result of the t-test (the "t-value") as follows:

    • if the t-test succeeds, X is a value from '1' to '8' (where '5' is shown as '-')
    • if the t-test fails, '0' or '9' is written to the X's
    • if there is not enough data to perform the t-test, '.' is written to the X's

    Rule of thumb: Values near '0' or '9' indicate regions with an abnormal number of weighted mutations; values near '5' ('-') indicate regions with a normal (i.e. expected) number.
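
    The character rules above can be turned into a small lookup. The sketch below only restates those rules; the function name describe_quality_char and the sample characters are hypothetical, and it does not show how ARB derives a digit from the t-test.

```python
def describe_quality_char(x):
    """Classify one X character of a chimera check report (illustrative helper)."""
    if x == ".":
        return "not enough data"
    if x in ("0", "9"):
        return "t-test failed"
    # '5' is displayed as '-', so both spellings are accepted here.
    if x == "-" or x in set("12345678"):
        return "t-test passed"
    raise ValueError(f"unexpected quality character: {x!r}")


report_part = "-3.9"  # made-up X characters, for illustration only
labels = [describe_quality_char(x) for x in report_part]
```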

    The sequence quality string written into a user-definable species field consists of the following parts:

    • MED is the median of all t-values (0.0 = normal; <5.0 = succeeds t-test (mean); >5.0 = abnormal)
    • SUM is the sum of all t-values
    • aXX shows the quality for 2 pieces
    • bXXXXX shows the quality for 5 pieces
    • cXXXXX...XXXX shows the quality for user defined slices
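
    To pick out the slices that failed the t-test, the a, b and c parts can be extracted from the field value. The sketch below is a hedged illustration: it assumes each part appears as its tag letter followed directly by its X characters, the sample value is made up, and the names extract_slice_parts and abnormal_positions are hypothetical. Check a real field value in your database before relying on this layout.

```python
import re

# Assumed layout: a tag letter ('a', 'b' or 'c') followed by its X characters.
PART_PATTERN = re.compile(r"([abc])([0-9.\-]+)")


def extract_slice_parts(quality):
    """Return a dict mapping 'a', 'b', 'c' to their X characters, where found."""
    return {tag: xs for tag, xs in PART_PATTERN.findall(quality)}


def abnormal_positions(xs):
    """Indices of slices marked '0' or '9' (t-test failed)."""
    return [i for i, x in enumerate(xs) if x in ("0", "9")]


example = "a-9 b--479 c--3-.-9-"  # made-up value, for illustration only
parts = extract_slice_parts(example)
failed_in_b = abnormal_positions(parts["b"])
```

    A sequence whose failed slices cluster at one end (for example only in the back half) is a candidate for closer inspection.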

    Optionally a 'quality' entry may be written to the alignment so that it can be displayed in EDIT4 below the sequence. That quality entry is simply a "blown up" version of the "cXXXXX...XXXX" part of the sequence quality field.
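
    One way to picture the "blown up" entry is to repeat each X character across the alignment columns of its slice. The sketch below assumes near-equal slice widths and uses a hypothetical function name blow_up; the exact column mapping used by ARB may differ.

```python
def blow_up(c_part, alignment_length):
    """Stretch the X characters of the c part to one character per column.

    Illustrative only: assumes the slices divide the alignment into
    contiguous pieces of near-equal width.
    """
    n = len(c_part)
    if n == 0:
        return ""
    # Slice i covers columns [i*length//n, (i+1)*length//n).
    bounds = [i * alignment_length // n for i in range(n + 1)]
    return "".join(c_part[i] * (bounds[i + 1] - bounds[i]) for i in range(n))


stretched = blow_up("-3-9", 8)  # made-up c part over 8 alignment columns
```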

    If 'Keep existing reports' is unchecked, the selected 'species field' and the quality entry in the alignment are removed from ALL species when running a (new) chimera check.

    Clicking the 'Remove them now' button removes them without running a new check. Make sure the selected 'species field' is correct!



    Only sequences which are in the tree are used.

    Slices in high variance regions more easily pass the t-test.

    Slices from sequences with higher overall variance more easily pass the t-test.



    Requires a very large amount of memory!