Last update on 04. Mar 2022.

    Chimera check

    OCCURRENCE  

    ARB_NT/Sequence/Chimera check

     

    DESCRIPTION  

    Takes sequences, a tree, and a column statistic as input and generates a short sequence quality string, which is stored in the database under a user-defined key.

    First the sequences are split into different slices:

    • 2 pieces (front and back half)
    • 5 equally sized pieces
    • user-defined pieces
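
    The three slicings listed above can be sketched in a few lines of Python. This is only an illustration, not ARB's implementation: the function names are hypothetical, and it assumes that "user-defined pieces" means a user-chosen number of roughly equal slices and that any remainder is spread over the slices.

```python
from typing import Dict, List


def split_into_slices(sequence: str, n_slices: int) -> List[str]:
    """Split a sequence into n_slices contiguous pieces of near-equal length."""
    if n_slices < 1:
        raise ValueError("n_slices must be at least 1")
    length = len(sequence)
    # Integer boundaries; a remainder is distributed over the slices
    # (assumption: the help page does not say how ARB handles it).
    bounds = [i * length // n_slices for i in range(n_slices + 1)]
    return [sequence[bounds[i]:bounds[i + 1]] for i in range(n_slices)]


def chimera_slices(sequence: str, user_slices: int) -> Dict[str, List[str]]:
    """Return the three slicings, keyed like the 'a', 'b', 'c' report parts."""
    return {
        "a": split_into_slices(sequence, 2),            # front and back half
        "b": split_into_slices(sequence, 5),            # 5 equally sized pieces
        "c": split_into_slices(sequence, user_slices),  # user-defined pieces
    }
```

    For example, chimera_slices("ACGUACGUAC", 3)["a"] yields the two halves "ACGUA" and "CGUAC".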

    The program sums up the weighted mutations for each sequence slice using a maximum likelihood technique.

    For each slice, a Student's t-test (see http://en.wikipedia.org/wiki/T-test) is performed and its result is written into the X positions of the entries described below. The t-test checks whether the likelihood of a specific sequence slice (of one species) follows the t-distribution of the likelihoods for that sequence slice in all examined species.
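
    This page does not give the exact statistic ARB computes. As a rough illustration of the idea only, the following Python sketch scores one slice likelihood against the likelihoods of the same slice in the other species, using the standard t statistic for a single new observation. The function name and the choice of statistic are assumptions, not ARB's code.

```python
import math
import statistics
from typing import List, Optional


def slice_t_value(value: float, others: List[float]) -> Optional[float]:
    """Score one slice likelihood against the same slice in other species.

    Returns None when there is not enough data (fewer than two reference
    values, or no spread among them).
    """
    n = len(others)
    if n < 2:
        return None
    mean = statistics.fmean(others)
    sd = statistics.stdev(others)  # sample standard deviation
    if sd == 0:
        return None
    # t statistic for a single observation compared with a sample of size n
    return (value - mean) / (sd * math.sqrt(1 + 1 / n))
```

    A score near zero means the slice behaves like the same slice in the other species; a large positive or negative score marks it as unusual.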

    Each X encodes the result of the t-test (the "t-value") as follows:

    • if the t-test succeeds, X is a value from '1' to '8' (where '5' is shown as '-')
    • if the t-test fails, '0' or '9' is written to the X
    • if there is not enough data to perform the t-test, '.' is written to the X

    Rule of thumb: Values near '0' or '9' indicate regions with an abnormal number of weighted mutations; values near '5' ('-') indicate regions with a normal (i.e. expected) number.
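
    When reading reports in a script, the symbol rules above can be captured in a small helper. The following Python sketch uses hypothetical function names and relies only on the rules stated on this page.

```python
from typing import Optional


def symbol_value(symbol: str) -> Optional[int]:
    """Return the numeric value of one X, or None for '.' (not enough data)."""
    if len(symbol) != 1 or symbol not in "0123456789-.":
        raise ValueError(f"unexpected quality symbol: {symbol!r}")
    if symbol == ".":
        return None
    if symbol == "-":
        return 5  # '5' is displayed as '-'
    return int(symbol)


def classify_symbol(symbol: str) -> str:
    """Classify one X as 'no data', 'failed' or 'passed'."""
    value = symbol_value(symbol)
    if value is None:
        return "no data"
    if value in (0, 9):
        return "failed"  # t-test failed
    return "passed"      # '1' to '8', including '-' for 5
```

    For instance, classify_symbol("-") gives "passed" and classify_symbol("9") gives "failed".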

    The sequence quality string written into a user-definable species field has the following format:

    MED SUM aXX bXXXXX cXXXXX...XXXX
    where:
    • MED is the median of all t-values (0.0 = normal; <5.0 = succeeds t-test (mean); >5.0 = abnormal)
    • SUM is the sum of all t-values
    • aXX shows the quality for 2 pieces
    • bXXXXX shows the quality for 5 pieces
    • cXXXXX...XXXX shows the quality for user-defined slices
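
    A field in this format can be taken apart with a few lines of Python. The sketch below assumes the five parts are separated by whitespace and that MED and SUM are plain decimal numbers; the class and function names are hypothetical, and the sample string in the usage note is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class QualityReport:
    median: float      # MED
    total: float       # SUM
    two_pieces: str    # the XX after 'a'
    five_pieces: str   # the XXXXX after 'b'
    user_pieces: str   # the X's after 'c'


def parse_quality_string(text: str) -> QualityReport:
    """Parse a 'MED SUM aXX bXXXXX cXXX...' sequence quality string."""
    fields = text.split()
    if len(fields) != 5:
        raise ValueError("expected 5 whitespace-separated fields")
    med, total, a, b, c = fields
    # Check the prefix letters and the fixed lengths of the 'a' and 'b' parts.
    if not (a.startswith("a") and len(a) == 3):
        raise ValueError("malformed 'a' part")
    if not (b.startswith("b") and len(b) == 6):
        raise ValueError("malformed 'b' part")
    if not (c.startswith("c") and len(c) > 1):
        raise ValueError("malformed 'c' part")
    return QualityReport(float(med), float(total), a[1:], b[1:], c[1:])
```

    For example, parse_quality_string("0.4 12 a-- b-4-6- c--3-9-.-").user_pieces returns "--3-9-.-".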

    Optionally, a 'quality' entry may be written to the alignment so that it can be displayed in EDIT4 below the sequence. That quality entry is simply a "blown up" version of the "cXXXXX...XXXX" part of the sequence quality field.
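
    One way to picture the "blown up" entry is to repeat each symbol across the alignment columns of its slice. The Python sketch below assumes slices of near-equal width in alignment columns; how ARB actually maps slices to columns is not stated here, and the function name is hypothetical.

```python
def blow_up(symbols: str, alignment_length: int) -> str:
    """Stretch one symbol per slice to one symbol per alignment column."""
    if not symbols:
        raise ValueError("symbols must not be empty")
    n = len(symbols)
    # Slice boundaries in alignment columns (assumed near-equal widths).
    bounds = [i * alignment_length // n for i in range(n + 1)]
    return "".join(
        symbol * (bounds[i + 1] - bounds[i]) for i, symbol in enumerate(symbols)
    )
```

    With three slices over nine columns, blow_up("-9-", 9) produces "---999---", so the failing middle slice lines up with its columns.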

    If 'Keep existing reports' is unchecked, the selected 'species field' and the quality entry in the alignment are removed from ALL species when running a (new) chimera check.

    Clicking the 'Remove them now' button removes them without running a new check. Make sure the selected 'species field' is correct!

     

    NOTES  

    Only sequences which are in the tree are used.

    Slices in high variance regions more easily pass the t-test.

    Slices from sequences with higher overall variance more easily pass the t-test.

     

    WARNINGS  

    Needs a lot of memory!

     

    BUGS  

    None