Last update on 04. Jul 2023.

Searching

OCCURRENCE  

ARB_NT/Species/Search and Query

ARB_NT/Genome/Search and Query

ARB_NT/Tree/Search groups..

 

DESCRIPTION  

This describes the search feature in ARB as used in the search and query modules listed under OCCURRENCE above.


When we talk about 'items' below, we mean e.g. 'species', 'genes', 'taxonomic groups' etc., depending on which search tool you are currently using.

 

SEARCH FIELD  

Each search expression applies either

  • to a specific item field (e.g. 'full_name') or
  • to some criterion calculated on the fly (e.g. number of marked species inside a taxonomic group) or
  • to any or all item fields, if you select one of the entries in "[...]".

The following special search fields may be available:

  • "[any field]" reports a match if any direct field matches the expression.
  • "[all fields]" reports a match if all direct fields match the expression.
  • "[any recursive]" reports a match if any direct or hierarchical field matches the expression.
  • "[all recursive]" reports a match if all direct and hierarchical fields match the expression.

Notes:
  • searching is much slower with one of the 'recursive' fields, mostly because sequence data is searched as well.
  • "[all fields]" is often used together with "not equal" (see below), making it equivalent to "no field matches expression".
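
The relation between "[any field]", "[all fields]" and "not equal" can be sketched in a few lines of Python. This is only an illustrative model, not ARB's implementation: the item is assumed to be a plain dictionary, the field contents are made up, and `matches` is a hypothetical stand-in for the real match test.

```python
# Illustrative model only: an item is a dict mapping field names to contents.
item = {"name": "EscColi", "full_name": "Escherichia coli", "acc": "X80725"}

def matches(content, expression):
    # Hypothetical stand-in for the match test: exact, case-insensitive.
    return content.lower() == expression.lower()

def any_field(item, expression):
    # "[any field]": at least one direct field matches.
    return any(matches(v, expression) for v in item.values())

def all_fields(item, expression):
    # "[all fields]": every direct field matches.
    return all(matches(v, expression) for v in item.values())

def all_fields_not_equal(item, expression):
    # "[all fields]" with "not equal": every field must fail to match,
    # which is the same as "no field matches the expression".
    return all(not matches(v, expression) for v in item.values())

for expr in ("escherichia coli", "unknown"):
    print(expr, any_field(item, expr), all_fields_not_equal(item, expr))
```

For every expression, `all_fields_not_equal` gives the opposite answer of `any_field`, which is the equivalence stated in the note above.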

 

SEARCH OPERATORS  

There are two kinds of search operators directly available for queries:

  1. the "equal" sign between the field and the match expression means that the selected field (or any field) must match the expression. Clicking on the sign toggles it to a "not equal" sign, which means the selected field must not match the expression.
  2. the search operators at the beginning of the 2nd and 3rd line allow you to connect the 3 search expressions available for each query. Possible values are 'and', 'or' or 'ign'.
    • 'ign' stands for "ignore" (the rest of the line will be ignored)
    • selecting 'and' means both the preceding expression and the expression behind it have to match
    • selecting 'or' means the preceding expression or the expression behind it has to match
      There is no operator precedence; the expressions are combined strictly in order, i.e.
    • "1st and 2nd or 3rd" is interpreted as "(1st and 2nd) or 3rd" and
    • "1st or 2nd and 3rd" is interpreted as "(1st or 2nd) and 3rd"
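
The missing operator precedence can be illustrated with a small Python sketch. The function `combine` is hypothetical and only models the rule described above; it also assumes that 'ign' skips just the expression on its own line. Note that Python itself gives 'and' a higher precedence than 'or', so the two can disagree.

```python
def combine(first, op2, second, op3, third):
    """Combine three match results strictly in order, without precedence."""
    result = first
    for op, value in ((op2, second), (op3, third)):
        if op == "and":
            result = result and value
        elif op == "or":
            result = result or value
        elif op == "ign":
            continue  # assumption: only this line's expression is skipped
        else:
            raise ValueError(f"unknown operator: {op!r}")
    return result

# "1st or 2nd and 3rd" is read as "(1st or 2nd) and 3rd":
in_order = combine(True, "or", False, "and", False)

# Python's own precedence reads it as "1st or (2nd and 3rd)":
python_precedence = True or False and False

print(in_order, python_precedence)
```

If a query needs "1st or (2nd and 3rd)", reordering the lines to "2nd and 3rd or 1st" gives the intended grouping under the in-order rule.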


More search operators are available to connect multiple (consecutive) queries:

  • using 'Add species' provides a global OR operator (uniting the results of the preceding and the next query),
  • using 'Keep species' provides a global AND operator (intersecting the results of the preceding and the next query) and
  • using "that don't match the q." provides a global NOT operator for the next query
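
In terms of sets, the three global operators behave like union, intersection and complement. The following Python sketch shows this with made-up species names; it models the result lists only and is not how ARB stores them.

```python
# All species in a hypothetical database and the hits of two queries.
all_species = {"EscColi", "BacSubti", "PseAerug", "PyrFurio"}
previous_result = {"EscColi", "BacSubti"}
next_query_hits = {"BacSubti", "PseAerug"}

# 'Add species': global OR, unites both results.
added = previous_result | next_query_hits

# 'Keep species': global AND, intersects both results.
kept = previous_result & next_query_hits

# "that don't match the q.": global NOT, everything the next query misses.
not_matching = all_species - next_query_hits

print(sorted(added), sorted(kept), sorted(not_matching))
```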

Results of queries can be transformed into a set of 'marked species' using "Mark listed unmark rest", and the marked species can be stored as ´Species selections (=editor configurations)´. Multiple stored configurations can be logically combined into new sets of marked species. To create a query result from all marked species again, simply use "Search species ... that are marked".

 

MATCH EXPRESSION  

  • Each expression tries to match the complete field content (or the result of the underlying calculation), i.e. searching for 'test' will only match fields whose content is exactly 'test' (not 'my test' or 'testing').
  • If you search for '' (empty expression), all fields without data, i.e. all non-existing fields, will be found.
  • If you want to match all fields that contain some substring, use wildcards:
    • '*'
      will match any number of characters (including no characters).
    • '?'
      will match exactly one character

    If the whole search expression is '*', then it is handled like '?*' (which means 'at least one character'). That means searching for '*' will match any non-empty field.
    Examples:
    '*pseu*'        matches all fields with the substring 'pseu'
    'pyrococcus*'   matches all fields starting with 'pyrococcus'
    '*bact*ther*'   matches all fields with the substring 'bact' followed by 'ther'
                    (there may be many characters in-between or none,
                    i.e. it does match 'bactther' as well as 'Corynebacterium diphtheriae')
  • if the first character is '<' or '>' and the rest is a number, then a numerical comparison is performed:
    • '<7'
      matches all fields containing a number smaller than 7
    • '>10'
      matches all fields containing a number greater than 10

    Be careful:
    Negating '<7' does NOT only match numbers greater than or equal to seven. It also finds all non-numeric contents. Use something like '>6.999' instead.
  • if the first character is '/' then the following regular expression is used for the query (see ´Regular Expressions (REG)´).
  • if the first character is '|' then the following ACI expression is evaluated and the query hits if the result of the evaluation is not "0". See ´ARB Command Interpreter (ACI)´.
  • if the query string is completely empty, it hits if the selected field does not exist (or if a calculation produces no/empty result).
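
The wildcard and numeric rules above can be approximated in Python as follows. This is a sketch under assumptions, not ARB's matcher: `matches` and `wildcard_to_regex` are hypothetical names, a non-existing field is modelled as `None`, matching is case-insensitive as stated under NOTES, and the regular expression ('/') and ACI ('|') prefixes are not handled.

```python
import re

def wildcard_to_regex(expression):
    """Translate '*' and '?' wildcards into a full-match regular expression."""
    if expression == "*":
        expression = "?*"  # a lone '*' means 'at least one character'
    parts = []
    for ch in expression:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)

def matches(content, expression):
    """Return True if the field content matches; content is None if missing."""
    if expression == "":
        return content is None or content == ""
    if content is None:
        return False
    if expression[0] in "<>":
        try:
            limit = float(expression[1:])
        except ValueError:
            limit = None  # rest is not a number: fall back to wildcard match
        if limit is not None:
            try:
                number = float(content)
            except ValueError:
                return False  # non-numeric content never matches '<' or '>'
            return number < limit if expression[0] == "<" else number > limit
    return wildcard_to_regex(expression).fullmatch(content) is not None

print(matches("Corynebacterium diphtheriae", "*bact*ther*"))
print(matches("testing", "test"))
# Negating '<7' also hits non-numeric content, while '>6.999' does not:
print(not matches("n/a", "<7"), matches("n/a", ">6.999"))
```

The last line shows why the warning above recommends '>6.999' over a negated '<7': a non-numeric field fails the numeric comparison, so negating the comparison turns it into a hit.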

 

SORTING RESULTS  

Search results are displayed unsorted by default. You can sort them by selecting a different order with the sort radio button.

The provided sort criteria depend on the kind of query. The following list shows the sort criteria available in ´Search Database for Species´:

unsorted       display items like they are stored in database
by value       sort by content of first query field
by number      same as "by value", but sort numerically
               (for string-type fields this sorts multiple columns of numbers)
by id          sort by unique item id (e.g. 'name' for species)
by parent      sort by globally unique id of parent item (e.g. 'name' of organism for genes)
by marked      sort marked before unmarked items
by hit         sort by (and display) hit description (the hit description tells you
               why an item was hit by query)
reverse        reverses the previously selected sort order

ARB remembers and uses all the sort criteria you apply.

Example: Selecting 'by id' will sort the items by their id (e.g. 'name'). If you select 'by value' afterwards, ARB will sort items by the content of the first query field - if the contents of some items are equal, it will still sort them by name.
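
The effect described in this example is the same as applying stable sorts one after another. The Python sketch below uses made-up items and is only a model of the observable behaviour; how ARB implements it internally is not stated here.

```python
# Hypothetical query results: 'name' is the id, 'value' the first query field.
items = [
    {"name": "PyrFurio", "value": "archaea"},
    {"name": "EscColi", "value": "bacteria"},
    {"name": "BacSubti", "value": "bacteria"},
]

# First the user selects 'by id' ...
by_id = sorted(items, key=lambda item: item["name"])

# ... then 'by value'. Python's sort is stable, so items with equal
# values keep the order established by the earlier 'by id' sort.
by_value = sorted(by_id, key=lambda item: item["value"])

names = [item["name"] for item in by_value]
print(names)  # ['PyrFurio', 'BacSubti', 'EscColi']
```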

 

NOTES  

Wildcard and exact searches are always case-insensitive. Regular expression searches are always case-sensitive.

 

EXAMPLES  

 

WARNINGS  

Using ACI is a bit tricky here, because you cannot see what happens.

Using 'trace(1)' somewhere in the ACI expression starts to print an ACI trace to the console. To view the console refer to ´View ARB logs´.

 

BUGS  

No bugs known