Last update on 17 Sep 2019.

Import Foreign Data(bases)

OCCURRENCE

ARB_INTRO <Create and Import>,

ARB_NT/File/Import/Import from external format

 

DESCRIPTION

Reads foreign data(base) formats, creates a new ARB database, and imports the foreign data. A selection of commonly used foreign formats can be automatically identified. Data can be imported from single or multiple files.

Type a source file name into the 'Enter file name of foreign database' subwindow. To load a set of files, use the wildcards '*' (matches any number of characters) and '?' (matches exactly one character). Alternatively, select a file in the directories and files subwindow.
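
The wildcard rules can be illustrated with Python's fnmatch module. This is a minimal sketch that assumes ARB matches '*' and '?' like shell-style globbing; the helper name select_files and the sample file names are hypothetical.

```python
import fnmatch

def select_files(filenames, pattern):
    """Return the names matching a shell-style pattern (hypothetical helper).

    '*' matches any run of characters (including none),
    '?' matches exactly one character. Matching is case-sensitive.
    """
    return [name for name in filenames if fnmatch.fnmatchcase(name, pattern)]

files = ["seq1.gb", "seq2.gb", "seq10.gb", "notes.txt"]
print(select_files(files, "seq?.gb"))  # ['seq1.gb', 'seq2.gb']
print(select_files(files, "*.gb"))     # ['seq1.gb', 'seq2.gb', 'seq10.gb']
```

Note that 'seq?.gb' does not match 'seq10.gb', because '?' stands for exactly one character.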

Choose whether you want to import

  • a full genome flatfile (in GENBANK or EMBL format) or
  • normal sequence files.

In the second case, select the file format from the 'Select foreign database format' subwindow or press the 'AUTO DETECT' button.

If your file type is not in the list and you are only interested in the sequence, try 'universal'.

Enter the name and type of the destination alignment (see ´What is an Alignment?´).

Use different alignment names for different genes so that you can store them in the same database and still distinguish them.
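
Why separate alignment names matter can be seen in a simplified model in which each species entry stores its sequences keyed by alignment name. This is an illustrative sketch, not ARB's actual database layout; the names Species, import_sequence, ali_16s and ali_23s and the sequences are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Species:
    name: str
    # alignment name -> sequence data (simplified model, not ARB's real layout)
    alignments: dict = field(default_factory=dict)

def import_sequence(db, species_name, alignment_name, sequence):
    """Store a sequence under the given alignment name (hypothetical helper).

    Importing into an alignment name that already exists for this species
    replaces the old sequence.
    """
    species = db.setdefault(species_name, Species(species_name))
    species.alignments[alignment_name] = sequence

db = {}
import_sequence(db, "E_coli", "ali_16s", "ACGUACGU")
import_sequence(db, "E_coli", "ali_23s", "GGCCAAUU")  # second gene, own alignment name
print(sorted(db["E_coli"].alignments))  # ['ali_16s', 'ali_23s']

# Storing the second gene under the first gene's alignment name replaces it:
import_sequence(db, "E_coli", "ali_16s", "GGCCAAUU")
print(db["E_coli"].alignments["ali_16s"])  # GGCCAAUU
```

The model only shows that two genes of one species coexist when each uses its own alignment name, and collide when they share one.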

Choose the default protection used for the imported data.

Check "Create selection?" to store the names of all imported species in a species selection (see ´Species selections (=editor configurations)´).

Click the 'FTS' selection button to define or select ´Field transfer sets´.

Press the 'GO' button.

 

Custom import filters

You may create your own private import filters and store them in the directory '~/.arb_prop/filter'. See ´How to define new import formats´ for a description of the import filter definition language.

Press the 'Test' button to modify and test the selected filter (see ´Test import filter´).

If you want to import only parts of the data provided by an existing import filter, use ´Field transfer sets´.

 

NOTES

The following file formats can currently be detected and loaded: GENBANK, RDP: GENBANK, EBI and FastA.

Several uncommon file formats (including AE2, GCG and DSSP) are kept in the directory '$ARBHOME/lib/import/older'. To make them available, copy or symlink them into '$ARBHOME/lib/import' or into your local filter directory '~/.arb_prop/filter/'.
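
Enabling one of the older filters can also be scripted. The following is a minimal sketch using Python's pathlib and shutil; the function name enable_old_filter is hypothetical, and it assumes the filter files sit directly in '$ARBHOME/lib/import/older'.

```python
import os
import shutil
from pathlib import Path

def enable_old_filter(filter_name, arbhome=None, target_dir=None, symlink=True):
    """Make a filter from $ARBHOME/lib/import/older available (hypothetical helper).

    By default the filter is symlinked into the local filter directory
    ~/.arb_prop/filter, which is created if missing. With symlink=False
    the file is copied instead.
    """
    home = Path(arbhome if arbhome is not None else os.environ["ARBHOME"])
    source = (home / "lib" / "import" / "older" / filter_name).resolve()
    if not source.is_file():
        raise FileNotFoundError(f"no such filter: {source}")

    if target_dir is None:
        target_dir = Path.home() / ".arb_prop" / "filter"
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    target = target_dir / filter_name
    if symlink:
        target.symlink_to(source)  # absolute link target, independent of the cwd
    else:
        shutil.copy2(source, target)  # copies contents and metadata
    return target
```

Call it with the file name of the filter as it appears in the 'older' directory. A symlink keeps the filter in sync with ARB updates, while a copy can be edited locally; note that symlink_to raises FileExistsError if a file of that name is already present in the target directory.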

To import a large new database into an existing ARB database, first convert it to ARB format (i.e. import it into a new ARB database) and save it, then merge it into the existing database with the ARB_MERGE tool.

To import other formats, such as PHYLIP or PAUP, into an existing ARB database, use the 'Import sequences using Readseq' function in the 'File' menu of the ARB_NT main window. See ´readseq [docindex]´.

If 'AUTO DETECT' does not find any format, selecting a format manually will most likely not help either (exception: the universal format).

 

WARNINGS

When using 'AUTO DETECT', check whether the correct format was detected. For instance, RDP files may be identified as GenBank; in that case, select 'rdp.ift' manually.

 

BUGS

'AUTO DETECT' looks for certain keywords in the files. If it cannot find these keywords, it rejects the file, even if the file has the correct format. This particularly affects the GCG format.
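
The failure mode described above can be reproduced with a toy keyword-based detector. This is a sketch of the general technique only: the keyword table is an assumption based on common line markers of these formats ('LOCUS' for GenBank, 'ID' for EMBL, '>' for FastA), not ARB's actual detection rules, and detect_format is a hypothetical name.

```python
# Hypothetical keyword table: line prefixes typical for each format.
# These are illustrative assumptions, not ARB's real detection rules.
FORMAT_KEYWORDS = [
    ("genbank", "LOCUS"),
    ("embl", "ID   "),
    ("fasta", ">"),
]

def detect_format(text):
    """Return the first format whose keyword starts some line, else None."""
    lines = text.splitlines()
    for fmt, keyword in FORMAT_KEYWORDS:
        if any(line.startswith(keyword) for line in lines):
            return fmt
    return None  # no keyword found: the file is rejected

print(detect_format("LOCUS       AB000001\nORIGIN\n//\n"))  # genbank
print(detect_format(">seq1\nACGU\n"))                        # fasta
# Sequence data without the expected marker is not recognized:
print(detect_format("seq1 ACGU\n"))                          # None
```

Because only the keywords are inspected, a well-formed file that lacks them is rejected, and in a first-match scheme like this one a file that happens to contain another format's marker is reported as that format.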