Last update on 30. Nov 2022.

Search taxonomic groups

OCCURRENCE  

ARB_NT/Tree/Search groups

 

DESCRIPTION  

Lets you find taxonomic groups in trees.

First select which trees shall be searched.


Standard search mode is to 'list' all groups that 'match' the query. Alternatives are:

  • selecting 'dont match' instead of match will invert the overall query
  • selecting 'add', 'keep' or 'remove' instead of 'list' lets you combine the results of multiple consecutive searches.
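
One way to picture the combination of consecutive searches is as set operations on the HITLIST. The following Python sketch assumes that 'add' behaves like a union, 'keep' like an intersection and 'remove' like a difference; this page does not define the modes in detail, so treat that mapping and the function 'combine' as illustrative assumptions.

```python
def combine(hitlist, matches, mode):
    """Combine the current hitlist with the matches of a new search.

    Assumed semantics (not taken from the ARB sources):
    list = replace, add = union, keep = intersection, remove = difference.
    """
    if mode == "list":
        return set(matches)
    if mode == "add":
        return hitlist | matches
    if mode == "keep":
        return hitlist & matches
    if mode == "remove":
        return hitlist - matches
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical group names, for illustration only
hits = combine(set(), {"Bacteria", "Archaea"}, "list")
hits = combine(hits, {"Proteobacteria"}, "add")
hits = combine(hits, {"Archaea"}, "remove")
print(sorted(hits))
```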

Query expressions are handled similarly to those in ´Search Database for Species´. While species-search operates on database fields, groups-search mainly operates on values which are calculated on-the-fly. Please refer to ´Searching´ for general information about query expression syntax and about the combination of multiple query expressions. For details about the search criteria available for group search see the section 'Search criteria' below.

Press ENTER or click 'Search' to start the search.

The HITLIST will display all matching groups. The number of hits is shown above the HITLIST.

Click onto a result to select the group in the main window.

Double click or press ENTER on a result to expand or collapse the selected group.

Below the HITLIST is a radio button which lets you choose the order (and the content) of the displayed results.

The following criteria are available for sorting:
  • by name: sort alphabetically by name of group
  • by nesting: sort numerically by level of group nesting (top level groups like 'Bacteria' have level 0, their direct child groups have level 1, etc.)
  • by size: sort numerically by size (number of group-members)
  • by marked: sort numerically by number of marked species (will not automatically update if you change marks; rerun 'Search' to do so)
  • by marked%: similar to 'by marked', but uses the percentage of marked species
  • by treename: sort alphabetically by name of tree
  • by treeorder: sort by tree (in order defined by ´Tree administration´)
  • by hit: sort by hit-description (the hit description tells you why an item was hit by query)
  • by cluster: only has effect for duplicate search (see section below)
  • by AID: sort by average ingroup distance (see section below)
  • by keeled: keeled groups at top (see ´Keeled groups´)
  • reverse: reverses the previously selected sort order

The most recently selected criterion takes precedence, but previously selected criteria remain active. For example, when you first sort by name and afterwards by treename, the results are grouped by tree, and inside each tree section the groups are sorted by name.
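
This layering of sort criteria behaves like repeated stable sorting. The following Python sketch illustrates the idea only; the 'Hit' record and its fields are hypothetical and do not reflect ARB's internal data structures.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    name: str      # name of the group
    treename: str  # name of the tree containing the group


hits = [
    Hit("Proteobacteria", "tree_b"),
    Hit("Archaea", "tree_b"),
    Hit("Bacteria", "tree_a"),
    Hit("Archaea", "tree_a"),
]

# First sort 'by name', afterwards 'by treename'. Python's sort is stable,
# so hits with equal treename keep their earlier by-name order.
by_name = sorted(hits, key=lambda h: h.name)
by_tree_then_name = sorted(by_name, key=lambda h: h.treename)

result = [(h.treename, h.name) for h in by_tree_then_name]
print(result)
```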

The two buttons above the HITLIST let you modify it:

  • 'Remove' lets you manually remove unwanted hits from the result list (useful before applying one of the actions listed in the section 'Working with listed groups').
  • 'Clear' empties the result list.

 

Search criteria  

Group search supports the following search expressions:

  • 'groupname' matches against the name of each group
  • 'parent' matches against the name of the direct parent group of the queried group
  • 'parent (any)' is true if any parent group of the queried group matches the given expression
  • 'parent (all)' is true if all parent groups of the queried group match the given expression. A common use is to check for exclusion: to ensure that no parent group matches '*bacteria*', check that all parents mismatch '*bacteria*'.
  • 'nesting' queries the nesting level of groups (top-level groups like 'Bacteria' have a nesting level of 0 (zero), their direct child groups have a nesting level of 1, ...)
  • 'folded' queries for folded/unfolded groups (possible values are 0 and 1)
  • 'size' matches against the groupsize (i.e. against the number of species inside a group and its subgroups)
  • 'marked' queries against the number of marked species inside a group
  • 'marked%' queries against the percentage of marked species inside a group
  • 'zombies' queries against the number of zombie species inside a group
  • 'AID' queries against the average ingroup distance (see following section)
  • 'keeled' queries against the keeled state (0=normal,1=keeled upper son,2=keeled lower son; see ´Keeled groups´)
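
The parent-related criteria and the nesting level can be sketched in Python as follows. The hierarchy 'PARENT', the helper names and the case-insensitive wildcard matching via 'fnmatch' are assumptions made for this illustration; they only approximate ARB's query syntax.

```python
from fnmatch import fnmatchcase

# Hypothetical hierarchy: group name -> name of direct parent (None = top level)
PARENT = {
    "Bacteria": None,
    "Proteobacteria": "Bacteria",
    "Gammaproteobacteria": "Proteobacteria",
    "Archaea": None,
    "Euryarchaeota": "Archaea",
}


def parents(group):
    """Return the names of all parent groups, nearest parent first."""
    chain = []
    parent = PARENT[group]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def matches(name, pattern):
    # Assumption of this sketch: wildcard match that ignores case.
    return fnmatchcase(name.lower(), pattern.lower())


def nesting(group):
    """Top-level groups have nesting level 0."""
    return len(parents(group))


def parent_any(group, pattern):
    """True if at least one parent group matches the pattern."""
    return any(matches(p, pattern) for p in parents(group))


def parent_all_mismatch(group, pattern):
    """Exclusion check: True if every parent group mismatches the pattern."""
    return all(not matches(p, pattern) for p in parents(group))


print(nesting("Gammaproteobacteria"))
print(parent_any("Gammaproteobacteria", "*bacteria*"))
print(parent_all_mismatch("Euryarchaeota", "*bacteria*"))
```

In this sketch a top-level group has no parents, so 'parent_all_mismatch' is trivially true for it; whether ARB treats that case the same way is not stated here.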

 

Average ingroup distance (AID)  

For each group, the average distance over all possible pairs of species inside that group is computed from the tree structure. It is available as a criterion for group search and for ordering results.

The distance of two species is defined as the sum of the lengths of all branches connecting the two species.
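
The definition can be written out as a small Python sketch. The toy tree 'TREE', its branch lengths and the function names are made up for illustration, and the sketch assumes non-negative branch lengths; it is not how ARB computes the value internally.

```python
from itertools import combinations

# Hypothetical rooted tree: node -> (parent, length of the branch to the parent)
TREE = {
    "root": (None, 0.0),
    "n1": ("root", 0.1),
    "A": ("n1", 0.2),
    "B": ("n1", 0.3),
    "C": ("root", 0.5),
}


def path_to_root(node):
    """Map the node and each of its ancestors to their distance from the node."""
    dist = {}
    total = 0.0
    while node is not None:
        dist[node] = total
        parent, length = TREE[node]
        total += length
        node = parent
    return dist


def distance(a, b):
    """Sum of the lengths of all branches connecting a and b."""
    up_a = path_to_root(a)
    up_b = path_to_root(b)
    # With non-negative branch lengths, the nearest common ancestor
    # yields the smallest summed distance.
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)


def average_ingroup_distance(species):
    """Average distance over all possible pairs of the given species."""
    pairs = list(combinations(species, 2))
    if not pairs:
        # A group with a single species has no pairs; returning 0.0 is a
        # choice of this sketch.
        return 0.0
    return sum(distance(a, b) for a, b in pairs) / len(pairs)


print(average_ingroup_distance(["A", "B", "C"]))
```

For the toy tree the pairwise distances are 0.5 (A-B), 0.8 (A-C) and 0.9 (B-C), so the average is 2.2 / 3.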

 

Duplicate search  

Next to the 'Search' button is a selector which lets you restrict the listed groups depending on whether they are duplicates or not. The available modes are:

  • 'no' = deactivate duplicate search
  • 'duplicate groups only' = activate duplicate search
  • 'unique groups only' = list all groups not reported by duplicate search

The 'Configure' button provides detailed settings for duplicate search:
  • Min. size of duplicate cluster
    The minimum number of groups that have to be strictly consistent with the given duplication criteria (= core of cluster). If that minimum size isn't reached, these groups will not be listed in the results.
  • Search duplicates
    Defines where duplicates are expected to occur.
    • inside same tree
      All groups of a cluster have to be members of the same tree. Duplicates in other trees probably form their own cluster.
    • in different trees
      The core of the cluster will only consist of groups from different trees (one hit per tree) and the whole cluster will be discarded if the required minimum size isn't reached. For the final result all other duplicates will be added, i.e. there may be more than one hit per tree.
    • anywhere

  • Ignore case?
    Defines whether to ignore case when matching group names or words, or when checking against the list of ignored words.
  • Duplicates are names that
    Defines how duplicate groups are detected. Either
    • by matching the whole name or
    • by matching single/multiple words.

  • Min. number of matching words
    If fewer words match between two compared group names, they are counted as a mismatch. Hits with more matching words are preferred over those with fewer matching words.
  • Word separators
    Defines characters which separate words. Should normally contain a SPACE character.
  • Ignored words
    Specifies a list of words that will be completely ignored when matching wordwise.

Sorting results 'by cluster' will list related duplicate-groups next to each other. It will also add a new column showing the unique IDs of each cluster of groups.
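
How the wordwise settings interact can be sketched in Python. The functions 'words' and 'is_duplicate' and their parameters are hypothetical and merely mirror the settings described above; clustering and the preference for hits with more matching words are not modelled.

```python
import re


def words(name, separators=" ", ignored=(), ignore_case=True):
    """Split a group name into the set of words used for comparison.

    'separators' must contain at least one character.
    """
    ignored = set(ignored)
    if ignore_case:
        name = name.lower()
        ignored = {w.lower() for w in ignored}
    # Any run of separator characters splits the name into words.
    parts = re.split("[" + re.escape(separators) + "]+", name)
    return {w for w in parts if w and w not in ignored}


def is_duplicate(name_a, name_b, min_words=2, **options):
    """True if at least 'min_words' words occur in both group names."""
    common = words(name_a, **options) & words(name_b, **options)
    return len(common) >= min_words


# Hypothetical group names, for illustration only
print(is_duplicate("Bacillus subtilis group", "bacillus Subtilis cluster"))
print(is_duplicate("Bacillus subtilis group", "bacillus Subtilis cluster",
                   ignored=("bacillus",)))
```

In the second call only 'subtilis' remains as a common word, which is fewer than the required two, so the names count as a mismatch.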

 

Working with listed groups  

Right of the HITLIST are several buttons for working with the found results:

  • 'Rename ...' lets you rename taxonomic groups (see ´Rename taxonomic groups´).
  • 'Expand listed' will expand all listed groups and their parent groups.
  • 'Expand listed collapse rest' does the same and additionally folds all other groups.
  • 'Expand parents' will expand the parents of all listed groups, i.e. all listed groups will become visible.
  • 'Collapse listed' will collapse all listed groups.
  • The buttons 'Mark', 'Unmark' and 'Inv' let you change the marks of species contained in listed groups. Use the option-menu below these buttons and select
    • 'selected' to operate on all species contained in the currently selected group,
    • 'any listed' to operate on all species contained in ANY of the listed groups,
    • 'all listed' to operate on all species contained in ALL of the listed groups or
    • 'database' to operate on all species in the database.

  • 'Destroy selected group' will delete that group from the tree. Be aware that this action is currently irreversible (see BUGS below).
  • 'Destroy all listed groups' will delete all groups currently listed in the HITLIST.

 

NOTES  

Please configure auto-focus options in ´Tree Settings´ (esp. auto-unfold) to improve the usability of the group search.

Use the config manager icon (see ´Property/settings configurations´) to store/restore group search and rename settings.

 

EXAMPLES  

Common combinations of expanding and collapsing groups:

  • to collapse ALL groups of ALL searched trees press
    • 'Clear' to empty the HITLIST and
    • 'Expand listed collapse rest'.

  • to expand parent groups of listed and fold the rest press
    • 'Expand listed collapse rest',
    • 'Collapse listed' and
    • 'Expand parents'.


Common combinations for marking specific group members:

  • to mark all species that are contained in some, but NOT in all listed groups use
    • 'Mark' + 'any listed' followed by
    • 'Unmark' + 'all listed'
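
The effect of this marking combination can be checked with plain set operations. In the following Python sketch the listed groups and their member species are made up for illustration.

```python
# Hypothetical listed groups and their member species
listed = {
    "group1": {"sp1", "sp2", "sp3"},
    "group2": {"sp2", "sp3", "sp4"},
}

# Species contained in ANY of the listed groups (union)
any_listed = set().union(*listed.values())
# Species contained in ALL of the listed groups (intersection)
all_listed = set.intersection(*listed.values())

marked = set()
marked |= any_listed  # 'Mark' + 'any listed'
marked -= all_listed  # 'Unmark' + 'all listed'

print(sorted(marked))
```

Only 'sp1' and 'sp4' stay marked, since 'sp2' and 'sp3' are members of both listed groups.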


 

WARNINGS  

Searching for duplicates in wordwise mode while requiring only a few matching words (e.g. 2 words for a tree in SSURef_NR99_128_SILVA) may take a very long time. Requiring more matching words will speed up the search.

 

BUGS  

UNDO does not work for deleting groups (http://bugs.arb-home.de/ticket/480)