More docs on the ARB website.
See also index of helppages.
Last update on 25. Nov 2018 .
Main topics:
Related topics:

    Graph Aligner


    ARB Editor -> Edit -> Prototypical Graph Aligner



    This is an alternative to the integrated aligners developed for the SILVA project. Similar to those aligners it uses aligned sequences from your current database as a reference to align the selected sequences. Other than them it employs full dynamic programming to create the alignment. It also considers all selected relatives at once, instead of falling back to less similar sequences only if the current sequence is missing bases (e.g. because it is a partial sequence).



    Select the sequences to be aligned as usual ("Current Species", "Selected Species", "Marked Species").

    Select a PT-Server to be used. Make sure it is up to date and contains all sequences you want to be considered as reference.

    HINT: Unless you deselect the "Realign" button in the advanced menu, no sequence will be used as a reference for itself.

    HINT: Sequences with less than 10 gaps are considered not aligned, and also not used as a reference.

    Select a positional variability filter. If possible, use the filter appropriate for the type of sequences you want aligned. Positional variability statistics will be considered when placing the individual bases.

    Decide what to do with possible overhang. If your sequence extends beyond the reference sequences on either side of the alignment, those bases cannot be aligned properly. Three options of handling this situation are supported:

    "keep attached"

    just leave them dangling, directly attached to the last base that could be aligned properly

    "move to edge"

    move them out to the very beginning and end of the alignment. This allows you to easily spot sequences with overhang, and decide what to do yourself. Recommended, but only if you check your sequences after alignment!


    automatically remove these bases.

    Select a protection level higher than that of the sequences if you want the alignment software to actually modify the bases. Choose a lower protection level to execute a "dry run", not changing anything. Note that sequences with a protection level of zero will always be changed.

    The Logging Level option allows you to change the noisiness of the alignment program. All output will be printed to the console from which you started ARB. The Option "debug_graph" may produce several large files for every sequence aligned and is not recommended for the uninitiated.



    If you want to see how the alignment that would be produced by the graph aligner differs from your current alignment, and why the program would act that way, you can set the protection level to "0" and the Logging level to "debug". The output on the console will now include all differing sections of the alignment and the matching parts of the reference sequences.



    Select the "Show advanced options" Button at the top to gain access to the you-may-now-shoot-yourself-in-the-foot-severely dialog window.

    Don't be surprised if the graph aligner crashes after you entered silly values here. No sanity check of your options is done.

    Turn check:

    If selected (default) sequences will be automatically reversed and/or complemented if this will likely improve the alignment.


    If selected, the sequence itself is excluded from the result of the executed PT-Server family search. If deselected, the alignment of an identical sequence found by the PT-Server is copied.

    Load reference sequence from PT Server:

    Do not read alignment data from your current database, but from the database the PT-Server was built from. This makes starting the graph aligner much slower, but allows you to align against external databases or PT-Servers with different sequence names than your current database.

    (Copy and) mark sequence used as reference:

    Mark the sequences that were used as a reference during alignment. This allows you to easily load them into the editor to review the decisions made by the graph aligner. If you also selected the "Load reference" option, sequences will be copied into your current database prior to being marked.

    Gap insertion/extension penalties: (default is 5/2)

    You can change the penalties associated with opening and extending gaps.

    Family search min/min_score/max: (default 15,0.7,40)

    The first value tells the graph aligner how many sequences it should try to always use. The second value determines the minimal identity with the target sequence additional reference sequences should have. The third value selects the maximal number of sequences to be used as a reference.

    Use at least X sequences with at least Y bases: (SSU-default is 1, 1400)

    This option allows you to require that the reference include X sequences of a length larger than or equal to Y.

    Aligner threads / Queue size:

    Up to 4 threads can be used to align simultaneously. If your workstation sports multiple CPUs this will speed up alignment of many sequences. Increase the size of the buffer between the graph aligner components to about 15 when using 4 threads.





    No bugs known