Select the "Show advanced options" button at the top to gain access to the you-may-now-shoot-yourself-in-the-foot-severely dialog window.
Don't be surprised if the graph aligner crashes after you enter silly values here. No sanity checks are performed on your options.
Pos.Var:
Select a positional variability filter. If possible, use the filter appropriate for the type of sequences you want aligned. Positional variability statistics will be considered when placing the individual bases.
Field used for automatic filter selection:
Selects the database field used to choose the positional variability filter automatically. The filter is determined by majority vote over the values of this field in the selected reference sequences. Since the filters are usually computed at domain level, this approach is usually sufficient to select an appropriate filter. For SILVA databases, the field 'tax_slv' contains appropriate data.
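The majority vote can be pictured with a small Python sketch. This is an illustration only, not the code the aligner runs. It assumes the field holds a semicolon-separated taxonomy path whose first element is the domain; the function name pick_filter_by_majority is made up for this example.

```python
from collections import Counter

def pick_filter_by_majority(field_values):
    """Return the most frequent domain among the reference field values.

    Assumption: each value looks like 'Bacteria;Proteobacteria;...' and
    its first element names the domain the filter was computed for.
    """
    domains = [v.split(";")[0].strip() for v in field_values if v and v.strip()]
    if not domains:
        return None  # no usable field values, so no filter can be chosen
    return Counter(domains).most_common(1)[0][0]
```

With two references from Bacteria and one from Archaea, this sketch would pick "Bacteria".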
Turn check:
If selected (default), sequences will be automatically reversed and/or complemented if this is likely to improve the alignment.
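For reference, the four orientations in question can be written down in a few lines of Python. This sketch assumes an RNA alphabet (T is treated like U) and only generates the candidates; how the aligner decides which orientation is "likely" better is not modelled here.

```python
# Complement table for an RNA alphabet; T is treated like U (assumption).
_COMPLEMENT = str.maketrans("ACGUT", "UGCAA")

def orientations(seq):
    """Return the four candidate orientations of an upper-case sequence."""
    complemented = seq.translate(_COMPLEMENT)
    return {
        "original": seq,
        "reversed": seq[::-1],
        "complemented": complemented,
        "reverse_complemented": complemented[::-1],
    }
```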
Realign:
If selected, the sequence itself is excluded from the result of the executed PT-Server family search. If deselected, the alignment of an identical sequence found by the PT-Server is copied.
Gap insertion/extension penalties: (default is 5/2)
You can change the penalties associated with opening and extending gaps.
Match/mismatch scores: (default is 2/-1)
Configures the scores given for a match (should be positive) and a mismatch (should be negative).
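To get a feeling for how these four numbers interact, here is a minimal Python sketch that scores an already aligned pair of sequences. It assumes the common affine convention in which the first position of a gap costs the insertion penalty and every further position costs the extension penalty; whether the graph aligner counts gaps exactly this way is an assumption.

```python
def score_alignment(a, b, match=2, mismatch=-1, gap_open=5, gap_extend=2):
    """Score two aligned sequences of equal length ('-' marks a gap)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    in_gap = False
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            # First gap column pays the opening penalty, later ones extend.
            # Simplification: adjacent gap columns count as one run,
            # regardless of which sequence carries the gap.
            score -= gap_extend if in_gap else gap_open
            in_gap = True
        else:
            score += match if x == y else mismatch
            in_gap = False
    return score
```

With the defaults, score_alignment("ACGUACGU", "ACG--CGA") yields 2: five matches (+10), one mismatch (-1) and one gap of length two (-5 -2).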
Family search min/min_score/max: (default 40/0.7/40)
The first value tells the graph aligner how many sequences it should always try to use. The second value sets the minimal identity with the target sequence that additional reference sequences should have. The third value sets the maximal number of sequences to be used as references.
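One way to read the interplay of the three values is sketched below in Python. This is an interpretation of the description above, not the aligner's actual selection code: the best "min" hits are always taken, further hits are added only while their score stays at or above "min_score", and the list is capped at "max". The function and parameter names are hypothetical.

```python
def select_references(hits, fs_min=40, fs_min_score=0.7, fs_max=40):
    """Pick reference sequences from (name, score) search hits.

    Assumption: hits are ranked by descending score before selection.
    """
    ranked = sorted(hits, key=lambda h: h[1], reverse=True)
    selected = ranked[:fs_min]          # always try to use this many
    for hit in ranked[fs_min:]:
        if len(selected) >= fs_max or hit[1] < fs_min_score:
            break                       # cap reached or hit too dissimilar
        selected.append(hit)
    return selected
```

Under this reading, the identity threshold only matters when "max" is larger than "min".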
Minimal number of full length sequences: (default 1)
Set the minimum number of full length (see "Size of full-length sequences" setting above) reference sequences that must be included in the selected reference set. The search will proceed regardless of other settings until this setting has been satisfied. If it cannot be satisfied by any sequence in the reference database, the query sequence will be discarded. This setting exists to ensure that the entire length of the query sequence will be covered in the presence of partial sequences contained within your reference database.
Family search oligo length/mismatches: (default 10/0)
The first value sets k, the k-mer size used for the reference search. For SSU rRNA sequences, the default of 10 is a good value. For different sequence types, different values may perform better. For 5S, for example, 6 has been shown to be more effective.
The second value allows k-mer matches in the reference database to contain n mismatches. This feature is only supported by the PT-Server search engine and requires substantial additional compute time (in particular for n > 1).
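The two values can be illustrated with a short Python sketch: one helper cuts a sequence into overlapping k-mers, the other checks whether two k-mers agree up to n mismatches. A real search engine uses an index rather than pairwise comparison, so this only shows what is being matched, not how.

```python
def kmers(seq, k=10):
    """Return all overlapping k-mers of seq, in order of occurrence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def matches_with_mismatches(kmer, candidate, n=0):
    """True if two equally long k-mers differ in at most n positions."""
    if len(kmer) != len(candidate):
        return False
    return sum(a != b for a, b in zip(kmer, candidate)) <= n
```

Every allowed mismatch multiplies the number of k-mers that count as a hit, which gives an idea of why n > 1 gets expensive.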
Minimal reference sequence length: (default 150)
Set the minimum length reference sequences are required to have. Sequences shorter than this will not be included in the selection.
Note: If you are working with particularly short reference sequences, you will need to lower this setting to allow any reference sequences to be found.
Alignment bounds: (default 0/0)
These values set the beginning and the end of the gene within the reference alignment. See "Number of references required to touch bounds" for more information.
Number of references required to touch bounds: (default: 0)
Similar to "Minimal number of full length sequences", this option requires n reference sequences to cover the beginning and n to cover the end of the gene within the alignment.
This option is more precise than "Minimal number of full length sequences", but requires that the column numbers for the range in which the full gene is expected be specified via "Alignment bounds" (see above).
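As a rough illustration, the check could look like the Python sketch below. It assumes each reference is described by the first and last alignment column it occupies, and that a reference "touches" a bound when it reaches that column or extends beyond it; both the representation and the function name are assumptions made for this example.

```python
def bounds_satisfied(refs, lower, upper, required):
    """Check that enough references reach both alignment bounds.

    refs: list of (first_column, last_column) per reference sequence.
    """
    touching_start = sum(1 for first, _ in refs if first <= lower)
    touching_end = sum(1 for _, last in refs if last >= upper)
    return touching_start >= required and touching_end >= required
```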
Save used references in 'used_rels': (default is off)
Writes the names of the alignment reference sequences into the field 'used_rels'. This option allows using 'Mark by reference' to highlight the reference sequences used to align a given query sequence.
Store highest identity in 'align_ident_slv': (default is off)
Computes the highest similarity the aligned query sequence has with any of the sequences in the alignment reference set. The value is written to the field 'align_ident_slv'.
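The idea can be sketched in Python as follows. The identity measure used here (matching bases divided by the number of columns in which both sequences carry a base) is an assumption; the help text does not say which definition the aligner uses.

```python
def identity(a, b):
    """Fraction of identical bases over columns where both have a base."""
    pairs = [(x, y) for x, y in zip(a, b) if x not in "-." and y not in "-."]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)

def highest_identity(query, references):
    """Best identity between the aligned query and any aligned reference."""
    return max((identity(query, ref) for ref in references), default=0.0)
```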
Disable fast search: (default is to use fast search)
Use all k-mers occurring in the query sequence in the search. By default, only k-mers starting with an A are used for extra performance.
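The effect of this switch on the set of search k-mers can be shown with a small Python sketch (the function name is hypothetical and the real search works on an index, not on Python lists):

```python
def search_kmers(seq, k=10, fast=True):
    """Return the k-mers of seq that would be used for the search."""
    all_kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if fast:
        # Fast search: keep only the k-mers that start with an A.
        return [km for km in all_kmers if km.startswith("A")]
    return all_kmers
```

For the toy sequence "ACGAUA" with k=3, fast search keeps "ACG" and "AUA" out of the four k-mers "ACG", "CGA", "GAU" and "AUA".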
Score search results by absolute oligo match count: (default is off)
Use absolute (number of shared k-mers) match scores in the k-mer search rather than relative (number of shared k-mers divided by length of reference sequence) match scores.
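The difference between the two scoring modes is easiest to see with numbers. The Python sketch below follows the two definitions given above; the function name and the example figures are made up for illustration.

```python
def match_score(shared_kmers, ref_length, absolute=False):
    """Score a search hit by absolute or relative shared k-mer count."""
    if absolute:
        return shared_kmers
    return shared_kmers / ref_length

# A short reference sharing 300 k-mers vs. a long one sharing 600.
short_hit = (300, 500)
long_hit = (600, 1500)
relative = (match_score(*short_hit), match_score(*long_hit))
absolute = (match_score(*short_hit, absolute=True),
            match_score(*long_hit, absolute=True))
```

Here the relative scores are 0.6 and 0.4, so the short reference ranks first, while the absolute scores are 300 and 600, so the long reference ranks first.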
Suppress warnings about missing 'start' field: (default is off)
This option suppresses warnings about missing 'start' fields, so that SINA can be used with databases that do not use the 'start' field without flooding you with warnings.
SINA command: (default "arb_sina.sh")
If ARB has problems finding the SINA binary for whatever reason, you may specify an explicit path here. Please note that doing so will stop a fat-tarball installation from working!