Google
More docs on the ARB website.
See also index of helppages.
Last update on 04. Mar 2022 .
Main topics:
Related topics:

    Optimize database compression

    OCCURRENCE  

    ARB_NT

     

    DESCRIPTION  

    Sequence data normally need's a lot of memory. To be able to handle thousands of sequences we implemented an online compression. All data is compressed most of the time and only uncompressed on demand. As a user you only find smaller database files, that's all. Without understanding the data, the program can compress data only by a limited factor. With the help of a tree aligned sequences can be compressed much better by storing only the differences to a consensus sequence. Once a sequence is compressed using a tree, it will keep the good compression method until it is changed. Then only the older method is used. As long as you change only a few (up to 100) sequences, the database won't grow very much.

    To compress the entire database, the program needs a tree, which should cover most of the sequences. The larger and better the tree, the better the compression.

     

    EXAMPLES  

    10000 aligned 16s sequences need 50 mega-bytes of memory. Without your help ARB will reduce them to 10 mega-bytes, and given a tree not more than 2 mega-bytes will be needed.

     

    NOTES  

    Any major database update, especially inserting or deleting gaps in an alignment, should be followed by a new optimization step.

     

    BUGS  

    No bugs known