Dfam Families Files
===================

Two sets of Dfam families are available for download:

  o The files named Dfam.* include both curated (DF) and uncurated (DR)
    families.

    !!!! This is an 89GB file compressed and should only be downloaded if !!!!
    !!!! the entire Dfam database (curated + uncurated families) is       !!!!
    !!!! needed. Uncompressed, 779GB will be needed to store the file.    !!!!

  o The files named Dfam_curatedonly.* include curated (DF) families only.

Additionally, they are available in several file formats:

  o `*.embl` includes EMBL-formatted consensus sequences and metadata.

  o `*.hmm` includes profile Hidden Markov Models (pHMMs) and metadata
    for use with the hmmer[1] suite of tools.

  o `*.h5` are FamDB[2] files, in the HDF5 file format. FamDB includes
    both consensus sequences and pHMMs, metadata, taxonomy structure and
    nomenclature, indexes, and other features.

An md5sum file ( `*.md5sum` ) is provided for each product for download
validation.

For more information on the metadata in the EMBL and HMM files, see Dfam's
userman.txt [3].

[1]: http://hmmer.org/
[2]: https://github.com/Dfam-consortium/FamDB/
[3]: https://www.dfam.org/releases/Dfam_3.7/userman.txt

Using Dfam with RepeatMasker
============================

RepeatMasker ships with a copy of Dfam (curated families only) in FamDB
format. This can be replaced with a newer version of Dfam, or with the full
set of curated and uncurated families, provided the famdb.py tool in the
package is also updated.

To use Dfam 3.7 with RepeatMasker 4.1.4 or earlier, first download an
updated copy of the famdb.py tool from:

  https://github.com/Dfam-consortium/FamDB

and replace the file in the RepeatMasker directory. Then download the latest
Dfam *.h5 file and rerun the RepeatMasker configure script.
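
Because the full download is very large, it can be worth validating it
against its `*.md5sum` file before use. The following is a minimal Python
sketch of that check, not an official Dfam tool. It assumes the `*.md5sum`
file uses the usual md5sum layout ("<hex digest>  <filename>" on the first
line); the function names and the file names passed on the command line are
illustrative only.

```python
import hashlib
import sys
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that very large downloads are never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_md5(md5sum_path):
    """Read the digest from the first line of an md5sum-style file.

    Assumption: the line looks like "<hex digest>  <filename>".
    """
    first_line = Path(md5sum_path).read_text().strip().splitlines()[0]
    return first_line.split()[0].lower()


def verify_download(data_path, md5sum_path):
    """Return True if the file's MD5 digest matches the recorded one."""
    return md5_of_file(data_path) == expected_md5(md5sum_path)


if __name__ == "__main__":
    # Usage (file names are placeholders):
    #   python verify_md5.py <downloaded file> <matching .md5sum file>
    data_file, md5_file = sys.argv[1], sys.argv[2]
    ok = verify_download(data_file, md5_file)
    print("OK" if ok else "MISMATCH")
    sys.exit(0 if ok else 1)
```

Reading in chunks keeps memory use flat regardless of file size, which
matters for the multi-gigabyte Dfam.* files. Where the `md5sum` command-line
tool is available, `md5sum -c` on the `*.md5sum` file performs the same
check.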