Last update on 04. Mar 2022.

    ARB: Database

    OCCURRENCE  

    ARB_NT

     

    DESCRIPTION  

    A central database of sequences and additional information (taken from public databases or supplied by the user) is stored in a binary or ASCII file (*.arb); archive and delta files are planned for future releases. The database reader auto-detects binary or ASCII mode. The advantages of the different file types in brief:

    binary with fast load file:

    (+)  very fast
    (+)  runs on slow and old computers
    (-)  needs a lot of hard disk space
    => for normal operation on old machines

    binary:

    (+)  very fast
    (+)  small (compression rate: 60%-95%)
    => for normal operation

    ASCII:

    (+)  editable by standard text editors
    (+)  information can be extracted by hand
    (-)  needs an extreme amount of hard disk space
    => to check and correct a database

    All ARB tools for database handling and most of the ARB tools for data analysis act directly upon the database. The database is kept consistent at all times: any local modification made by an individual ARB tool is immediately exported to the database and to all other active tools.
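The binary/ASCII auto-detection can be approximated outside ARB for quick checks. The following is a minimal sketch, not ARB's actual reader logic: it only tests for the `/*ARBDB ASCII*/` header that the ASCII grammar in the NOTES section lists as optional. The function name `looks_like_arb_ascii` is hypothetical. Because the header is optional, a `False` result means "no ASCII header found", not "definitely binary".

```python
ASCII_HEADER = b"/*ARBDB ASCII*/"


def looks_like_arb_ascii(path):
    """Return True if the file starts with the optional ASCII header.

    Assumption: the header, when present, is the very first thing in
    the file. Files without it are reported as False even if they are
    ASCII databases.
    """
    with open(path, "rb") as handle:
        return handle.read(len(ASCII_HEADER)) == ASCII_HEADER
```

A caller could use this to decide whether a *.arb file is safe to open in a text editor before doing so.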

     

    NOTES  

    ASCII format

    DATA FORMAT

    [xxx]      means xxx is optional
    [xxx]*     means xxx is optional and can occur many times
    xxx|yyy    means xxx or yyy
    //         means comment

    ARBDB HIERARCHY

    ARBDB is a hierarchical database system. A short description of the hierarchy:
    ARBDB ::=       species_data            // container containing all species
                    presets                 // global alignment and db field information
                    [extended_data]         // all SAIs
                    [tmp]                   // temporary data
                    [tree_data]             // all trees
                    ...                     // user defined entries (programmers)
    species_data::=   [species]*
    extended_data::=  [extended]*
    gene_data::=      [gene]*               // container for genes (species local)
    species::=      'name'                  // species identifier
                    ['full_name']
                    ...                     // (end) user defined fields
                    [ali_xxx]               // the alignment container(s)
                    [gene_data]             // container containing genes
    extended::=                             // analogous to species
    gene::=                                 // analogous to species
    ali_xxx::=      'data'                  // the sequence
                    ...                     // additional sequence information
    presets::=      'use'                   // default alignment
                    [alignment]*
                    [key_data]              // description of the user defined keys
    alignment::=    'alignment_name'        // name of the alignment (prefix 'ali_')
                    'alignment_len'         // length of longest sequence
                    'alignment_write_security' // default write security
                    'alignment_type'        // dna, rna or pro
                    'aligned'               // ==1 when all sequences have the same
                                            // length else 0
    key_data::=     [key]*
    key::=          'key_name'              // name of a user defined field
                    'key_type'              // type (12=string, 3=int)
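The relation between the sequences and the 'alignment_len' and 'aligned' entries can be illustrated with a small sketch. It models species as plain Python dicts shaped like the 'species' container above; this layout and the function name `alignment_summary` are assumptions made for illustration and are not part of any ARB API.

```python
def alignment_summary(species_list, ali_name):
    """Compute 'alignment_len' and 'aligned' for one alignment.

    species_list: dicts shaped like the 'species' container, e.g.
    {"name": "EscCol10", "ali_23all": {"data": "..ACGU.."}}.
    """
    lengths = [
        len(species[ali_name]["data"])
        for species in species_list
        if ali_name in species      # the alignment container is optional
    ]
    return {
        "alignment_name": ali_name,
        # length of the longest sequence (0 if there are none)
        "alignment_len": max(lengths, default=0),
        # 1 when all sequences have the same length, else 0
        # (assumption: an empty alignment counts as aligned)
        "aligned": 1 if len(set(lengths)) <= 1 else 0,
    }
```

For two species whose 'data' entries have lengths 4 and 2, the sketch reports 'alignment_len' 4 and 'aligned' 0.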
    *******************************************
    *************** ASCII  BASIC **************
    *******************************************
    Note:
    • /* xxx */ marks a comment and is not read
    • The data format is described by a grammar. All terminal symbols are surrounded by single quotes (').

    ASCII::=        ['/*ARBDB ASCII*/']
                    [FIELD]*
    FIELD::=        KEY [PROTECTION] [TYPE] VALUE
                    |
                    KEY [PROTECTION] '%%' (%
                            [FIELD]*
                            %) /* Comment */
    KEY::=          'Any string of a-z|A-Z|0-9|"_"'
                    // key length: 2 < |KEY| < 256
    PROTECTION::=   ':''delete protection level''write p.l.''00'
                                    // 00 are reserved for future use
    TYPE::=         '%s'            // STRING
                  | '%i'            // INTEGER
                  | '%f'            // FLOAT
                  | '%N'            // BYTES
                  | '%I'            // BITS
                  | '%F'            // FLOATS
    VALUE::=        '"string"' | '"^Astring^A"' | 'string'  //type = STRING
                    | 'int_number'                          //type = INT
                    | 'real_number'                         //type = FLOAT
                    | 'coded bytestring'                    //type = BYTES,FLOATS,
                                                            //      BITS
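To make the grammar concrete, here is a minimal recursive-descent sketch in Python for a subset of it. It is not ARB's reader. The names `TOKEN`, `tokenize` and `parse_fields` are hypothetical, the `"^Astring^A"` quoting variant is not handled, quoted strings are assumed to be properly closed, and the coded values for '%N', '%I' and '%F' are kept as raw text.

```python
import re

# One alternative per token kind; comments match without a named group.
TOKEN = re.compile(r'''
    /\*.*?\*/               # comment, dropped by tokenize()
  | (?P<open>%%\s*\(%)      # container start
  | (?P<close>%\))          # container end
  | (?P<type>%[sifNIF])     # explicit TYPE marker
  | (?P<prot>:\d{4})        # PROTECTION such as :7600
  | "(?P<quoted>[^"]*)"     # quoted string VALUE
  | (?P<bare>[^\s"]+)       # KEY or unquoted VALUE
''', re.VERBOSE | re.DOTALL)


def tokenize(text):
    """Split ASCII database text into (kind, text) pairs."""
    return [(m.lastgroup, m.group(m.lastgroup))
            for m in TOKEN.finditer(text)
            if m.lastgroup is not None]     # None: a comment matched


def parse_fields(tokens, pos=0):
    """Parse [FIELD]* from tokens[pos:]; return (fields, next_pos)."""
    fields = []
    while pos < len(tokens) and tokens[pos][0] != "close":
        kind, key = tokens[pos]
        if kind != "bare":
            raise ValueError(f"expected KEY, got {key!r}")
        field = {"key": key, "protection": None, "type": None}
        pos += 1
        if pos < len(tokens) and tokens[pos][0] == "prot":
            field["protection"] = tokens[pos][1]
            pos += 1
        if pos >= len(tokens):
            raise ValueError(f"field {key!r} has no value")
        if tokens[pos][0] == "open":        # KEY [PROTECTION] '%%' (% ... %)
            field["children"], pos = parse_fields(tokens, pos + 1)
            if pos >= len(tokens):
                raise ValueError(f"container {key!r} is not closed")
            pos += 1                        # consume '%)'
        else:                               # KEY [PROTECTION] [TYPE] VALUE
            if tokens[pos][0] == "type":
                field["type"] = tokens[pos][1]
                pos += 1
            if pos >= len(tokens):
                raise ValueError(f"field {key!r} has no value")
            value = tokens[pos][1]
            if field["type"] == "%i":
                value = int(value)
            elif field["type"] == "%f":
                value = float(value)
            field["value"] = value
            pos += 1
        fields.append(field)
    return fields, pos
```

Calling `parse_fields(tokenize(text))` on well-formed input returns a list of nested dicts, one per top-level FIELD, plus the index of the first unconsumed token. A result index smaller than the token count indicates a stray `%)` at the top level.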
     

    EXAMPLES  


    *******************************************
    ************** ASCII  EXAMPLE *************
    *******************************************
    /*ARBDB ASCII*/
    species_data    %% (%
            species :5000   %% (%
                    name    :7600           "EscCol10"
                    file            "ecrna3.empro"
                    full_name               "Escherichia coli"
                    acc             "V00331;"
                    ali_23all       :5000   %% (%
                            data    :7500           "...........ACGTUUU...........
                            mark    %I              "---------------++++---------
                            %) /*ali_23all*/
                    %) /*species*/
            species :5000   %% (%
                    name    :7600           "EscCol11"
                    file            "ecrr23s.empro"
                    full_name               "Escherichia coli"
                    ali_23all       :5000   %% (%
                            data    :7500           "...........ACGTUUUGGG.......
                            mark    %I              "---------------++++---------
                            %) /*ali_23all*/
                    %) /*species*/
            %) /*species_data*/
    presets %% (%
            use             "ali_23all"
            max_alignment_len       %i 2000
            alignment_len   %i 0
            max_name_len    %i 9
            alignment       %% (%
                    alignment_name          "ali_23all"
                    alignment_len   %i 4205
                    aligned %i 1
                    alignment_write_security        %i 5
                    alignment_type          "rna"
                    %) /*alignment*/
            key_data        %% (%
                    key     %% (%
                            key_name                "name"
                            key_type        %i 12
                            %) /*key*/
                    key     %% (%
                            key_name                "group_name"
                            key_type        %i 12
                            %) /*key*/
                    key     %% (%
                            key_name                "acc"
                            key_type        %i 12
                            %) /*key*/
                    key     %% (%
                            key_name                "ali_23all/data"
                            key_type        %i 12
                            %) /*key*/
                    key     %% (%
                            key_name                "ali_23all/mark"
                            key_type        %i 6
                            %) /*key*/
                    key     %% (%
                            key_name                "aligned"
                            key_type        %i 12
                            %) /*key*/
                    key     %% (%
                            key_name                "author"
                            key_type        %i 12
                            %) /*key*/
                    %) /*key_data*/
            %) /*presets*/
    tree_data       %% (%
            tree_main       :4400   %% (%
                    nnodes  %i 2
                    tree            "N0.014808,0.015168;N0.000360,0.000360;LEscCol10^ALEscColi^ALEscCol11^A"
                    ruler   %% (%
                            size    %f 0.100000
                            RADIAL  %% (%
                                    ruler_y %f 0.341577
                                    ruler_x %f 0.000000
                                    %) /*RADIAL*/
                            text_x  %f 0.000000
                            text_y  %f 0.000000
                            ruler_width     %i 0
                            LIST    %% (%
                                    ruler_y %f 0.000000
                                    ruler_x %f 0.000000
                                    %) /*LIST*/
                            %) /*ruler*/
                    %) /*tree_main*/
            %) /*tree_data*/
    extended_data   :7000   %% (%
            extended        %% (%
                    name            "HELIX_PAIRS"
                    ali_23all       %% (%
                            data            "............................1a..
                            %) /*ali_23all*/
                    %) /*extended*/
            extended        %% (%
                    name            "gpl5rr"
                    ali_23all       %% (%
                            phyl_options    %N      10000106D02:0C03.0D02-07.87.DB6
                            bits    %I      "-----------------------+++++++++-+-+++
                            floats  %F      10000106D04:0A.C816.425C03.5D.802F.BF03
                            %) /*ali_23all*/
                    %) /*extended*/
            %) /*extended_data*/
    tmp     %% (%
            focus   %% (%
                    species_name            "EscColi"
                    cursor_position %i 323
                    %) /*focus*/
            message         ""
            %) /*tmp*/
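The layout of the example above (one field per line, containers opened with `%% (%` and closed with `%) /*key*/`) can be reproduced with a short writer. This is a sketch under stated assumptions, not ARB's export code: the function name `dump_fields` and the input shape (a list of `(key, value)` pairs, with a nested list standing for a container) are hypothetical, protection levels are omitted, and strings containing a double quote are not handled.

```python
def dump_fields(fields, indent=0):
    """Render (key, value) pairs as ARB-style ASCII lines.

    value may be a str, an int, a float, or a list of (key, value)
    pairs, which is written as a container.
    """
    pad = "\t" * indent
    lines = []
    for key, value in fields:
        if isinstance(value, list):
            lines.append(f"{pad}{key}\t%% (%")
            lines.extend(dump_fields(value, indent + 1))
            # the closing marker is indented like the container's children
            lines.append(f"{pad}\t%) /*{key}*/")
        elif isinstance(value, int):
            lines.append(f"{pad}{key}\t%i {value}")
        elif isinstance(value, float):
            lines.append(f"{pad}{key}\t%f {value:.6f}")
        else:
            lines.append(f'{pad}{key}\t"{value}"')
    return lines
```

Joining the returned lines with newlines after a leading `/*ARBDB ASCII*/` line gives text in the same shape as the example.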
     

    WARNINGS  

    A database stored in ASCII format needs a lot of virtual memory when loaded.