Last update on 25. Nov 2018.

Nearest relative search

OCCURRENCE

ARB_NT/Search/More search/Search Next Relatives of SELECTED Species in PT Server

ARB_NT/Search/More search/Search Next Relatives of LISTED Species in PT Server

ARB_EDIT4/Edit/Integrated Aligners

 

ALGORITHM

Splits the sequence(s) into short oligos of a given size. These oligos are 'Probe Matched' against the PT_SERVER database. The more hits within the sequence of another species, the more related the other species is.
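
The idea can be sketched in a few lines of Python. This is only an illustration, not ARB's implementation: the function names are hypothetical, and a plain substring test stands in for the probe match that the PT server performs against its database.

```python
def split_into_oligos(sequence, oligo_len=12):
    """Return every overlapping oligo of length oligo_len, in sequence order."""
    return [sequence[i:i + oligo_len]
            for i in range(len(sequence) - oligo_len + 1)]


def rank_relatives(source, targets, oligo_len=12):
    """Rank targets by how many source oligos occur in them (most hits first).

    targets maps a species name to its sequence. The substring test below is
    an assumption made for illustration only; it is not the PT server.
    """
    oligos = split_into_oligos(source, oligo_len)
    scores = {name: sum(1 for oligo in oligos if oligo in seq)
              for name, seq in targets.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank_relatives("ACGUACGUAA",
                         {"far": "GGGGCCCCGG", "near": "ACGUACGUAA"},
                         oligo_len=4)
print(ranking)  # the species sharing more oligos comes first
```

The exact hit counting used for the reported scores is described under 'Absolute hits' below.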

 

PARAMETERS

PT-Server

Select the PT-Server to search

Oligo length

Length of oligos used to perform probe match against the PT server. Default is 12.

Mismatches

Number of mismatches allowed per oligo. Default is 0.
Be careful: the search may become extremely slow when the number of mismatches is raised.

Search mode

Complete:        Match all possible oligos
Quick:           Only match oligos starting with 'A'
The 'Quick mode' works well for many sequence types and is approx. 4 times faster than the 'Complete mode'. For some sequence types it fails completely, e.g. if there are repetitive areas containing many 'AAAAA'.
Relative and absolute scores will be approx. 1/4 of those obtained in complete mode.
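
The difference between the two modes can be illustrated as follows. This is a sketch with a hypothetical helper name; it only shows which oligos are selected for matching, not how ARB implements the search.

```python
def oligos_for_mode(sequence, oligo_len=12, quick=False):
    """Return the oligos that would be matched in complete or quick mode."""
    oligos = [sequence[i:i + oligo_len]
              for i in range(len(sequence) - oligo_len + 1)]
    if quick:
        # Quick mode: keep only the oligos starting with 'A'.
        oligos = [oligo for oligo in oligos if oligo.startswith("A")]
    return oligos


seq = "ACGUACGUACGU"
print(len(oligos_for_mode(seq, 4)))              # 9 oligos in complete mode
print(len(oligos_for_mode(seq, 4, quick=True)))  # 3 oligos in quick mode
```

How many oligos remain in quick mode depends on the base composition of the sequence: a sequence without any 'A' contributes no oligos at all.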

Match score:

absolute:        returns the absolute number of hits
relative:        returns the number of hits relative to some maximum (see score-scaling)
Absolute hits:
Absolute hits are the number of oligos which occur in the source sequence and in the targeted sequences (i.e. in the relatives of the source sequence).
If an oligo occurs multiple times in source or target sequence, it only creates the minimum number of hits (e.g. if it occurs twice in source and three times in a target, only two hits will be counted for that target).
The theoretical maximum for absolute hits is
maxhits = minimumBasecount(source, target) - oligolen + 1
In practice that value is rarely or never reached because several oligos are skipped, namely all oligos containing IUPAC codes, N's or dots. The PT-server likewise does not report matches hitting ambiguous positions or sequence ends.
The number of absolute hits is also affected by other parameters:
  • quick search produces only around 25% of the hits of a complete search (assuming that 25% of all oligos start with an 'A')
  • searching for complement or reverse in addition to forward doubles the number of possible hits. Searching for all 4 reverse/complement-combinations will produce up to 4 times as many possible hits as a plain forward search.
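
The counting rule and the theoretical maximum described above can be sketched like this. The function names are hypothetical, and two simplifications are assumptions made for the example: sequences are passed without gaps, so the base count is simply the sequence length, and only oligos consisting purely of A, C, G, T or U are counted.

```python
from collections import Counter

PLAIN_BASES = set("ACGTU")  # assumption: anything else makes an oligo "skipped"


def oligo_counts(sequence, oligo_len):
    """Count each oligo of the sequence, skipping oligos with other characters."""
    counts = Counter()
    for i in range(len(sequence) - oligo_len + 1):
        oligo = sequence[i:i + oligo_len]
        if set(oligo) <= PLAIN_BASES:
            counts[oligo] += 1
    return counts


def absolute_hits(source, target, oligo_len):
    """Each oligo scores min(occurrences in source, occurrences in target)."""
    src = oligo_counts(source, oligo_len)
    tgt = oligo_counts(target, oligo_len)
    return sum(min(n, tgt[oligo]) for oligo, n in src.items())


def max_hits(source, target, oligo_len):
    """maxhits = minimumBasecount(source, target) - oligolen + 1"""
    return min(len(source), len(target)) - oligo_len + 1


# 'ACG' occurs twice in the source and three times in the target: two hits.
print(absolute_hits("ACGACGT", "ACGACGACG", 3))  # 4
print(max_hits("ACGACGT", "ACGACGACG", 3))       # 5
```

In the second call of the example below, the 'N' removes three of the five source oligos, so the hit count stays well below the theoretical maximum:

```python
print(absolute_hits("ACGNACG", "ACGACG", 3))  # 2
print(max_hits("ACGNACG", "ACGACG", 3))       # 4
```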

Relative score:
The relative score is absolute hits scaled versus a maximum POC (possible oligo count). You can specify which maximum POC to use with the selection button next to the score selection button:
to source POC         maximum possible oligos in source
to target POC         maximum possible oligos in target
to minimum POC        minimum possible oligos in source or target
to maximum POC        maximum possible oligos in source or target
'to source POC' will report ~100% score for partial source versus all full sequences containing the part.
'to target POC' will report ~100% score for all partial target sequences which are contained in the source sequence.
'to minimum POC' will report ~100% score if source is part of target or vice versa (this was the default method in previous ARB versions).
'to maximum POC' will report ~100% score if source and target contain each other, i.e. if they have an identical oligo distribution. If either source or target is missing some bases, the score will be lower.
When using 'quick search mode' the max. relative score will be 25% (if 25% of the oligos start with 'A').
When searching for forward and reverse-complement, the theoretical max. relative score will be 200%. In practice it won't find many hits on the reverse-complement strand, so you'll get similar scores as without reverse-complement. However, especially if you lower the oligo size, you'll probably reach scores above 100%.
The EDIT4 aligner currently always uses 'to minimum POC'.
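
The four scaling options can be compared with a small sketch. It assumes that the POC of a sequence is its base count minus the oligo length plus one, in line with the maxhits formula above; the function and parameter names are hypothetical.

```python
def possible_oligo_count(base_count, oligo_len):
    """Assumed POC: number of oligo start positions in a sequence."""
    return max(base_count - oligo_len + 1, 0)


def relative_score(abs_hits, source_bases, target_bases, oligo_len,
                   scaling="minimum"):
    """Scale absolute hits to the chosen POC and return a percentage."""
    src = possible_oligo_count(source_bases, oligo_len)
    tgt = possible_oligo_count(target_bases, oligo_len)
    poc = {
        "source": src,
        "target": tgt,
        "minimum": min(src, tgt),
        "maximum": max(src, tgt),
    }[scaling]
    return 100.0 * abs_hits / poc if poc else 0.0


# A partial source (500 bases) fully contained in a target of 1500 bases,
# oligo length 12: every one of the 489 source oligos hits.
for scaling in ("source", "target", "minimum", "maximum"):
    print(scaling, round(relative_score(489, 500, 1500, 12, scaling), 1))
```

With these numbers 'to source POC' and 'to minimum POC' give 100%, while 'to target POC' and 'to maximum POC' give about 32.8%, because the target offers 1489 possible oligos.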

Complement:

forward:             Match only forward oligos
reverse:             Match only reverse oligos
complement:          Match only complement oligos
reverse-complement:  Match only reverse-complement oligos
The remaining options are combinations of the above.
The combinations will affect the score, especially for shorter oligos. Please read the section about 'Relative score' above to avoid confusion.
Note: Not available for EDIT4 aligner.
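
For reference, the four basic orientations of an oligo can be derived as in the following sketch. It assumes an RNA alphabet (A, C, G, U) without ambiguity codes; the function name is hypothetical.

```python
# Assumption: plain RNA bases only; other characters are left unchanged.
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def orientations(oligo):
    """Return the forward, reverse, complement and reverse-complement oligo."""
    complement = oligo.translate(_COMPLEMENT)
    return {
        "forward": oligo,
        "reverse": oligo[::-1],
        "complement": complement,
        "reverse-complement": complement[::-1],
    }


for name, variant in orientations("AACG").items():
    print(f"{name:20s}{variant}")
```

For the oligo 'AACG' this yields 'GCAA' (reverse), 'UUGC' (complement) and 'CGUU' (reverse-complement).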

Target range:

Restrict the alignment range in which oligos may match. Hits outside that range will not be considered.
 

NOTES

Special effort is taken to eliminate multi-matches. Past versions did not account for them, which resulted in relative scores far beyond 100%, especially for small oligo lengths.

Now, for example, an oligo occurring 3 times in the source sequence will give at most 3 absolute hits to any target sequence, even if it occurs there far more often.

 

EXAMPLES

None

 

WARNINGS

Use mismatches with care!

 

BUGS

Relative score is not scaled to the maximum possible hits in the target range.