Last update on 04. Mar 2022.

Probe Collection Matching

OCCURRENCE  

ARB_NT/Probes/Match Probes with Specificity

 

DESCRIPTION  

Searches for potential target sites of each named probe in a probe collection within the sequence entries of the corresponding 'PT_SERVER' database (not the current database). Matching is performed with the maximum allowable number of mismatches for each given probe.

Probe collections are created or loaded with the PROBE COLLECTION window. Probes must have unique names, otherwise the match results cannot be reconciled correctly. Similarly, species within the tree must also have unique names for the match processing to work correctly.

To add a probe to the collection, enter the 'Target String' and the 'Probe Name' and press the ADD button. To remove a probe from the collection, select the probe from the 'Probes' list and press the REMOVE button. Pressing the 'FORGET' button will remove all probes from the collection so that you can start afresh.

To save the probe collection to a file, press the 'SAVE' button; to load a previously created collection, press the 'LOAD' button. Probe collections are stored in a simple XML format so they can be easily created with an external text editor. The file format is detailed below.

The 'Match Weighting' matrix specifies how mismatch penalties will be allotted to sequence mismatches. The 'Positional Weighting' parameters adjust the mismatch penalties according to position through the following equations:

S = -ln(10) / 'Width'
P = (((2.0 * 'position') - 'length') / 'length') - 'Bias'
Weight = exp(S * P * P)

where 'position' is the sequence position and 'length' is the probe length. The weighting function gives a bell-shaped curve whose spread is controlled by the 'Width' parameter, whose centre is controlled by the 'Bias' parameter and whose maximum is one.

For the default values of 1 and 0 for 'Width' and 'Bias' respectively, the weighting function has a value of 1 at a position equal to half the probe length and a value of 0.1 at position zero and at the position equal to the probe length.
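The equations above can be sketched in a few lines of Python to check the default behaviour. This is only an illustration of the formulas as written on this page, not the ARB implementation; the function name 'positional_weight' and its argument names are hypothetical.

```python
import math


def positional_weight(position, length, width=1.0, bias=0.0):
    """Evaluate Weight = exp(S * P * P) from the equations above.

    'width' and 'bias' stand for the 'Width' and 'Bias' parameters;
    the function name and signature are illustrative assumptions.
    """
    s = -math.log(10.0) / width
    p = ((2.0 * position - length) / length) - bias
    return math.exp(s * p * p)


# Evaluate the weight at the start, middle and end of an 18-mer probe
# using the default 'Width' = 1 and 'Bias' = 0.
probe_length = 18
for pos in (0, probe_length // 2, probe_length):
    print(pos, round(positional_weight(pos, probe_length), 3))
```

With the defaults this reproduces the values stated above: 1 at the middle of the probe and 0.1 at either end. Increasing 'width' flattens the curve, and a non-zero 'bias' shifts its centre.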

The 'Match Weighting' and 'Positional Weighting' parameters are saved as part of the probe collection XML file.

The MATCH PROBES WITH SPECIFICITY window is used to perform probe collection matching. The 'Probes' list shows the probes in the probe collection to be matched. If the list is empty you can click on the EDIT button to open the PROBE COLLECTION window and create or open a probe collection. The CLEAR button clears any previous match results but leaves the probe collection intact.

You need to select a 'PT_SERVER' from the menu displayed after pressing the 'PT_SERVER' button before you can carry out a probe collection match. Press the MATCH button to carry out the match operation. When the match is complete the number of matches found will be displayed and the complete list of match results can be viewed by pressing the RESULTS button. Be warned that with large probe collections this can be a very large text file.

Match results are displayed in the DENDROGRAM view using a series of vertical bars (one bar per probe) on the left-hand side, indicating regions in the tree where matches occur. Left-clicking on a bar will open a status message telling you which probe the bar corresponds to.

 

Match display control  

What constitutes a match is controlled by the parameters in the MATCH DISPLAY CONTROL window. The controls allow you to test, in real time, the match performance of probe collections without having to re-run the time-consuming match operation.

The 'Mismatch threshold' slider controls the threshold level that dictates whether a partial match will be regarded as a match or a mismatch. The scale of the 'Mismatch threshold' spans the range from zero to the maximum match weight for the found match results.

The 'Clade marked threshold' slider controls the threshold level (between 0 and 100%) that governs whether a clade is marked as matched. For example, setting the slider to 70% means that at least 70% of the species within a clade must match to the degree dictated by the 'Mismatch threshold' before the clade is marked as matched.

In a similar manner, the 'Clade partially marked threshold' slider controls the threshold level (between 0 and 100%) that governs whether a clade is marked as partially matched. Partial clade matches are indicated with a stippled bar whereas for a full match the bar is solid.
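The interaction of the three sliders can be illustrated with a short Python sketch. It is not taken from the ARB sources and rests on assumptions not stated on this page: that a species counts as matched when its mismatch weight is less than or equal to the 'Mismatch threshold', and that both clade thresholds are applied as "at least" comparisons. The function name 'clade_status' and its arguments are hypothetical.

```python
def clade_status(species_weights, mismatch_threshold, marked_pct, partial_pct):
    """Classify a clade as 'marked', 'partial' or 'unmarked'.

    species_weights    -- mismatch weight of the best hit for each species
    mismatch_threshold -- assumed cut-off: weight <= threshold is a match
    marked_pct         -- 'Clade marked threshold' in percent
    partial_pct        -- 'Clade partially marked threshold' in percent
    """
    if not species_weights:
        return "unmarked"
    matched = sum(1 for w in species_weights if w <= mismatch_threshold)
    percent = 100.0 * matched / len(species_weights)
    if percent >= marked_pct:
        return "marked"      # drawn as a solid bar
    if percent >= partial_pct:
        return "partial"     # drawn as a stippled bar
    return "unmarked"


# Three of four species are within the mismatch threshold (75%).
print(clade_status([0.0, 0.5, 1.0, 3.0], 1.0, marked_pct=70, partial_pct=30))
```

Because the classification only re-reads stored match weights, changing a slider can be re-evaluated immediately without repeating the match operation.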

More options are available via 'Marker display settings' (see 'Tree marker display setup').

 

Display interaction  

Click (and drag) on a marker shown in the tree display to show its name and to select the corresponding probe in the probe selection list.

 

PROBE COLLECTION XML  

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE probe_collection>
<probe_collection name="">
    <probe_list>
        <probe seq="AGGUCACACCCGUUCCCA" name="probe1"/>
        <probe seq="AGGUCACACCCGUUCCCG" name="probe2"/>
        <probe seq="AGGUCACACCCGUUCCCT" name="probe3"/>
            .
            .
            .
    </probe_list>
    <match_weighting width="1" bias="0">
        <penalty_matrix values="0 1 1 2 1 0 1 1 1 1 0 1 2 1 1 0"/>
    </match_weighting>
</probe_collection>

The 16 penalty matrix values form a 4 x 4 matrix listed in row-major order.
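As an illustration of the file format, the following Python sketch reads a collection with the standard library and rebuilds the penalty matrix from its row-major value list. It is not part of ARB; the function name 'load_collection' is hypothetical, and because this page does not state which base each row and column represents, the sketch leaves them unlabelled.

```python
import xml.etree.ElementTree as ET

# A shortened, well-formed copy of the example above (two probes only).
XML_BYTES = b"""<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE probe_collection>
<probe_collection name="">
    <probe_list>
        <probe seq="AGGUCACACCCGUUCCCA" name="probe1"/>
        <probe seq="AGGUCACACCCGUUCCCG" name="probe2"/>
    </probe_list>
    <match_weighting width="1" bias="0">
        <penalty_matrix values="0 1 1 2 1 0 1 1 1 1 0 1 2 1 1 0"/>
    </match_weighting>
</probe_collection>
"""


def load_collection(xml_bytes):
    """Return (probes, width, bias, matrix) from a probe collection file."""
    root = ET.fromstring(xml_bytes)

    # Probe names must be unique, so reject duplicates while reading.
    probes = {}
    for probe in root.iter("probe"):
        name = probe.get("name")
        if name in probes:
            raise ValueError(f"duplicate probe name: {name}")
        probes[name] = probe.get("seq")

    weighting = root.find("match_weighting")
    values = [float(v) for v in
              weighting.find("penalty_matrix").get("values").split()]
    if len(values) != 16:
        raise ValueError("expected 16 penalty matrix values")

    # Row-major ordering: the first four values form row 0, and so on.
    matrix = [values[row * 4:(row + 1) * 4] for row in range(4)]
    return probes, float(weighting.get("width")), float(weighting.get("bias")), matrix


probes, width, bias, matrix = load_collection(XML_BYTES)
for row in matrix:
    print(row)
```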

 

NOTES  

The 'PT_SERVER' database ('*.arb' and '*.arb.pt') stored in '$ARBHOME/lib/pts' is used for probe target searching, not the current database.

The 'PT_SERVER' database has to be updated ('ARB_NT/Probes/Probe Admin') before species entries that were added or modified (sequence symbols) after the most recent 'PT_SERVER' database update can be considered for probe target searching.

Probe target searching does not depend on correctly aligned sequences and is not affected by any modifications of database entries except changes of sequence residues.

 

EXAMPLES  

None

 

WARNINGS  

Take care to ensure that all probes in the probe collection and all species in the current database are uniquely named. Otherwise the match results will not be displayed correctly.

 

BUGS  

No bugs known