Sequence
--------

    wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.2bit
    wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/centromeres.txt.gz
    gunzip centromeres.txt.gz
    cut -f 2-5 centromeres.txt > centromeres.bed
    twoBitMask hg38.2bit centromeres.bed hg38-centromeresMasked.2bit
    twoBitToFa hg38-centromeresMasked.2bit tmp.fa
    maskOutFa tmp.fa hard dfamseq
    rm tmp.fa

Benchmark
---------

Garlic-1.3

    createFakeSequence.pl -v -m hg19 -s 3099000000 --useBED --dir . -k 4 -o benchmark --no_repeats -l 2.2
    split -n 5 benchmark
    cat xaa > tmp
    echo >> tmp
    echo ">artificial_sequence_2" >> tmp
    cat xab >> tmp
    echo >> tmp
    echo ">artificial_sequence_3" >> tmp
    cat xac >> tmp
    echo >> tmp
    echo ">artificial_sequence_4" >> tmp
    cat xad >> tmp
    echo >> tmp
    echo ">artificial_sequence_5" >> tmp
    cat xae >> tmp
    rm xaa xab xac xad xae
    mv tmp benchmark

Process Into Dfam
-----------------

    /home/roberthubley/dfam/branches/release_2.0/Scripts/update/make_dfamseq.pl -a hg38 /var/lib/XfamProduction/Dfam/dfamseq/hg38
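
The manual split-and-rejoin in the Benchmark section could also be written as a loop. The sketch below is one possible way to do it, not part of Garlic or the Dfam scripts: `resplit_fasta` is a hypothetical helper name, and it assumes GNU `split` (for `-n`), a single-record input FASTA, and a header line that fits entirely inside the first chunk, which holds for a sequence of this size.

```shell
#!/bin/sh
# Sketch only: resplit_fasta is a hypothetical helper, not a Garlic/Dfam tool.
# Splits a single-record FASTA into N byte-sized chunks and rejoins them
# as N records named artificial_sequence_2 .. artificial_sequence_N
# (the first chunk keeps the header already present in the input).
resplit_fasta() {
    in=$1       # input FASTA with one record
    parts=$2    # number of records to produce
    out=$3      # output FASTA

    workdir=$(mktemp -d) || return 1

    # GNU split: -n N writes N chunks of roughly equal byte size,
    # named chunk.aa, chunk.ab, ... so the glob below sorts them in order.
    split -n "$parts" "$in" "$workdir/chunk." || return 1

    i=0
    : > "$out"
    for chunk in "$workdir"/chunk.*; do
        i=$((i + 1))
        if [ "$i" -gt 1 ]; then
            # Chunks are cut by byte count, so the previous one may end
            # mid-line; terminate it before writing the next header.
            echo >> "$out"
            printf '>artificial_sequence_%d\n' "$i" >> "$out"
        fi
        cat "$chunk" >> "$out"
    done

    rm -r "$workdir"
}
```

Under those assumptions, `resplit_fasta benchmark 5 tmp && mv tmp benchmark` would reproduce the manual steps. Because the cut points are byte offsets, a chunk boundary can fall in the middle of a sequence line, so some lines in the output are shorter than the rest.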