Last update on 04. Mar 2022.

ARB NAMESERVER / Synchronize IDs

OCCURRENCE  

ARB_NT/Species/Synchronize IDs

ARB_MERGE/Check IDs/Synchronize

It is also used by several functions that create new species (e.g. after import).

 

DESCRIPTION  

Automatically creates unique identifiers (= shortnames stored in the field 'name') for species entries in the database. ARB requires every species to have its own unique ID; otherwise ARB will misbehave in many ways!

Individual species entries are normally distinguished and identified by their accession numbers.

The unique IDs are created using information from the 'full_name'.

Usually, the first three letters are taken from the genus designation, the remaining letters from the species name.

These tasks (identification and ID generation) are handled by the so-called NAMESERVER.

If there are duplicated (i.e. indistinguishable) species entries, the different versions are indicated by appending a dot followed by a running number: e.g. "DicTherm.2", "DicTherm.3", ...
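
The naming rule described above can be illustrated with a minimal Python sketch. It is not the NAMESERVER's actual algorithm: the total ID length of 8 characters and the capitalization are assumptions read off the "DicTherm" example, the function names are hypothetical, and the sketch simply numbers any repeated base ID, which simplifies how ARB decides that two entries are indistinguishable.

```python
def make_base_id(full_name: str, genus_len: int = 3, total_len: int = 8) -> str:
    """Build a short ID from 'Genus species' (lengths are assumptions)."""
    parts = full_name.split()
    genus = parts[0] if parts else ""
    species = parts[1] if len(parts) > 1 else ""
    # First three letters come from the genus designation ...
    head = genus[:genus_len].capitalize()
    # ... the remaining letters come from the species name.
    tail = species[: total_len - len(head)].capitalize()
    return head + tail


def assign_ids(full_names):
    """Assign IDs; repeated base IDs become base.2, base.3, ..."""
    seen = {}
    ids = []
    for name in full_names:
        base = make_base_id(name)
        seen[base] = seen.get(base, 0) + 1
        ids.append(base if seen[base] == 1 else f"{base}.{seen[base]}")
    return ids


ids = assign_ids(["Dictyoglomus thermophilum"] * 3 + ["Escherichia coli"])
print(ids)  # ['DicTherm', 'DicTherm.2', 'DicTherm.3', 'EscColi']
```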

 

NOTES  

The IDs are stored with the database. They are protected against change to avoid assigning the same ID to different species.

Accession numbers (stored in the field 'acc') are normally imported from public databases together with the sequence data. If no accession number is found during import (e.g. because the sequence has not yet been published), ARB automatically generates one (= "ARB_" followed by a CRC-32 checksum of the sequence data).
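
To see why identical sequences end up with identical generated accession numbers, consider this small Python sketch. It only mirrors the description above ("ARB_" plus a CRC-32 checksum); the hexadecimal formatting, the ASCII encoding of the sequence, and the function name generated_acc are assumptions, so the values will not necessarily match what ARB itself produces.

```python
import zlib


def generated_acc(sequence: str) -> str:
    """Return 'ARB_' + CRC-32 of the sequence data (formatting is assumed)."""
    data = sequence.encode("ascii")
    checksum = zlib.crc32(data) & 0xFFFFFFFF  # force an unsigned 32-bit value
    return f"ARB_{checksum:08X}"


# Two identical sequences always yield the same generated accession number.
print(generated_acc("ACGUACGU") == generated_acc("ACGUACGU"))  # True
```

Because the checksum depends only on the sequence data, importing the same unpublished sequence twice produces two entries that cannot be told apart by 'acc' alone.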

 

Duplicate IDs  

"Synchronize IDs" will create duplicate names whenever it fails to distinguish between two or more species. If you get a warning about duplicate entries, you REALLY should try to understand why this happens!

The following situations lead to this problem; each is followed by instructions on how to solve it:

  1. You've imported multiple IDENTICAL sequences without accession numbers. The accession numbers generated by ARB will be identical as well and "Synchronize IDs" will complain about duplicate species.
    Consider removing the duplicated species. Normally duplicated information isn't very useful. If this is not an option for you, you may instead manually change the accession numbers of the duplicated species (if you understand the implications).
  2. You've imported several genes from one organism and each of them was assigned the same accession number (the acc of the organism).
    Use an additional field to make your species entries distinguishable (e.g. a field containing the start position of each gene). You may configure whether and which field to use together with the NAMESERVER (see 'Nameserver admin').
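
The effect of the additional field in situation 2 can be sketched in Python as follows. This is an illustration only: the entry layout, the field name 'start', the accession number "X00001" and the helper find_indistinguishable are hypothetical and not part of ARB.

```python
from collections import defaultdict


def find_indistinguishable(species, extra_field=None):
    """Group entries by (acc, extra field); return groups with more than one entry."""
    groups = defaultdict(list)
    for entry in species:
        extra = entry.get(extra_field, "") if extra_field else ""
        groups[(entry["acc"], extra)].append(entry["name"])
    return {key: names for key, names in groups.items() if len(names) > 1}


# Two genes from one organism that share the organism's accession number.
genes = [
    {"name": "geneA", "acc": "X00001", "start": "120"},
    {"name": "geneB", "acc": "X00001", "start": "4510"},
]

print(find_indistinguishable(genes))
# {('X00001', ''): ['geneA', 'geneB']}
print(find_indistinguishable(genes, extra_field="start"))
# {}
```

With the accession number alone both genes fall into one group; once the start position is part of the key, every entry is distinguishable.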

 

NAMESERVER  

The NAMESERVER stores the associations between the unique IDs and species entries (represented by the accession number and optionally an additional field) in the NAMESERVER-database. The standard nameserver uses the file '$ARBHOME/lib/nas/names.dat' as its database.

For more details refer to the active arb_tcp.dat (Tools/Nameserver admin/Configure arb_tcp.dat).

If you have multiple databases containing common species, synchronizing IDs for all these databases will generate identical IDs for identical species (as long as you use the same NAMESERVER-database).
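
The following Python sketch models this behaviour with a toy in-memory store. It is an assumption-laden illustration, not the format of names.dat or the NAMESERVER protocol: the class ToyNameserver, its method get_id and the sample values are hypothetical, and handling of ID collisions between different species is omitted.

```python
class ToyNameserver:
    """Remembers which ID was given to each (acc, additional field) pair."""

    def __init__(self):
        self.ids = {}  # (acc, extra) -> ID

    def get_id(self, acc, proposed_id, extra=""):
        key = (acc, extra)
        # A known species keeps its stored ID; only new species get the proposal.
        if key not in self.ids:
            self.ids[key] = proposed_id
        return self.ids[key]


shared = ToyNameserver()

# Database 1 synchronizes first and registers the species.
id_in_db1 = shared.get_id("X00001", "DicTherm")
# Database 2 contains the same species and asks the same nameserver.
id_in_db2 = shared.get_id("X00001", "DicTher2")

print(id_in_db1, id_in_db2)  # DicTherm DicTherm
```

With two separate stores, the second database would have kept its own proposal, which is why identical IDs are only guaranteed when the same NAMESERVER-database is used.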

 

Central NAMESERVER  

It is possible to link names.dat to a central names.dat, but be aware that temporary inconsistencies may occur if multiple users use the NAMESERVER at the same time.

The NAMESERVER examines names.dat and terminates within 5-10 seconds if the file changes. A message is written to the console window in either case.

Another way to use a central NAMESERVER is to edit '$ARBHOME/lib/arb_tcp.dat' and specify a central host for ARB_NAME_SERVER. This completely avoids inconsistencies, but if too many users try to access that nameserver at the same time, you'll run into DoS (denial-of-service) problems.

 

EXAMPLES  

None

 

WARNINGS  

None

 

BUGS  

No bugs known