Introduction to DAS Tool

张开发
2026/4/13 16:07:09 · 15 min read


DAS Tool

DAS Tool is an automated method that integrates the results of several binning algorithms to obtain an optimized, non-redundant set of bins from a single assembly. Compared with individual binning methods, it reconstructs more near-complete genomes from soil metagenomes [1][2].

Installation

DAS Tool can be installed from the Bioconda repository.

    conda install -c bioconda das_tool

Usage

Basic usage

(Example 1) Run DAS Tool on the binning results of MetaBAT, MaxBin, CONCOCT and tetraESOM.

    $ ./DAS_Tool \
        -i sample_data/sample.human.gut_concoct_scaffolds2bin.tsv,sample_data/sample.human.gut_maxbin2_scaffolds2bin.tsv,sample_data/sample.human.gut_metabat_scaffolds2bin.tsv,sample_data/sample.human.gut_tetraESOM_scaffolds2bin.tsv \
        -l concoct,maxbin,metabat,tetraESOM \
        -c sample_data/sample.human.gut_contigs.fa \
        -o sample_output/DASToolRun1

Here -i lists the bin tables produced by the different binning tools, -l gives the labels, that is, the name of the tool behind each table in the same order, -c gives the contigs used for binning in FASTA format, and -o sets the prefix of the output files.

Note that the -i list must not end with a comma after the last file name.

Input files

bins

A comma-separated list of bin tables.

    -i, --bins methodA.scaffolds2bin,...,methodN.scaffolds2bin

Each table contains scaffold IDs and bin IDs separated by a tab (\t), for example:

    Scaffold_1    bin.01
    Scaffold_8    bin.01
    Scaffold_42   bin.02
    Scaffold_49   bin.03

Contigs

The contigs in FASTA format.

    -c, --contigs contigs.fa

This is the assembly file that was used for binning, for example:

    >Scaffold_1
    ATCATCGTCCGCATCGACGAATTCGGCGAACGAGTACCCCTGACCATCTCCGATTA...
    >Scaffold_2
    GATCGTCACGCAGGCTATCGGAGCCTCGACCCGCAAGCTCTGCGCCTTGGAGCAGG...

(Optional) Proteins

Protein sequences predicted in advance.

    --proteins proteins.faa

The format is as follows:

    >Scaffold_1_1
    MPRKNKKLPRHLLVIRTSAMGDVAMLPHALRALKEAYPEVKVTVATKSLFHPFFEG...
    >Scaffold_1_2
    MANKIPRVPVREQDPKVRATNFEEVCYGYNVEEATLEASRCLNCKNPRCVAACPVN...

Output files

A summary of the output bins, including quality and completeness estimates (_DASTool_Summary.txt).

The bin table selected by DAS Tool after scoring (_DASTool_scaffolds2bin.txt): a TSV file without a header, with the contig name in the first column and the bin name in the second, in the same format as the input tables.

Optional

If --write_bin_evals is set to 1 (default: 1), the quality and completeness estimates of each input bin set are saved (_[method].eval).

If --create_plots is set to 1 (default: 1), plots of the number of high-quality bins and of the score distribution per method are created (_DASTool_hqBins.pdf, _DASTool_scores.pdf).

If --write_bins is set to 1 (default: 0), the bins are written in FASTA format (DASTool_Bins).

Detailed description

    DAS_Tool -i methodA.scaffolds2bin,...,methodN.scaffolds2bin
             -l methodA,...,methodN -c contigs.fa -o myOutput

       -i, --bins               Comma separated list of tab separated scaffolds to bin tables.
       -c, --contigs            Contigs in fasta format.
       -o, --outputbasename     Basename of output files.
       -l, --labels             Comma separated list of binning prediction names. (optional)
       --search_engine          Engine used for single copy gene identification
                                [blast/diamond/usearch]. (default: usearch)
       --write_bin_evals        Write evaluation for each input bin set [0/1]. (default: 1)
       --create_plots           Create binning performance plots [0/1]. (default: 1)
       --write_bins             Export bins as fasta files [0/1]. (default: 0)
       --proteins               Predicted proteins in prodigal fasta format (scaffoldID_geneNo).
                                Gene prediction step will be skipped if given. (optional)
       --score_threshold        Score threshold until selection algorithm will keep selecting
                                bins [0..1]. (default: 0.5)
       --duplicate_penalty      Penalty for duplicate single copy genes per bin (weight b).
                                Only change if you know what you're doing. [0..3] (default: 0.6)
       --megabin_penalty        Penalty for megabins (weight c).
                                Only change if you know what you're doing. [0..3] (default: 0.5)
       --db_directory           Directory of single copy gene database. (default: install_dir/db)
       --resume                 Use existing predicted single copy gene files from a previous
                                run [0/1]. (default: 0)
       --debug                  Write debug information to log file.
       -t, --threads            Number of threads to use. (default: 1)
       -v, --version            Print version number and exit.
       -h, --help               Show this message.

(Example 2) Run DAS Tool again with different parameters. Use the proteins predicted in Example 1 to skip the gene prediction step, disable writing of bin evaluations, set the number of threads to 2 and the score threshold to 0.6. Output files will start with the prefix DASToolRun2:

    $ ./DAS_Tool \
        -i sample_data/sample.human.gut_concoct_scaffolds2bin.tsv,sample_data/sample.human.gut_maxbin2_scaffolds2bin.tsv,sample_data/sample.human.gut_metabat_scaffolds2bin.tsv,sample_data/sample.human.gut_tetraESOM_scaffolds2bin.tsv \
        -l concoct,maxbin,metabat,tetraESOM \
        -c sample_data/sample.human.gut_contigs.fa \
        -o sample_output/DASToolRun2 \
        --proteins sample_output/DASToolRun1_proteins.faa \
        --write_bin_evals 0 \
        --threads 2 \
        --score_threshold 0.6

Preparing the input files

Not every binning tool writes its result as a tab-separated table of scaffold IDs and bin IDs. DAS Tool therefore ships a script, Fasta_to_Contigs2Bin, that converts a set of bins in FASTA format into a "scaffolds2bin" table that can be used as DAS Tool input.

Usage

    $ src/Fasta_to_Contigs2Bin.sh -h
    Fasta_to_Scaffolds2Bin: Converts genome bins in fasta format to scaffolds-to-bin table.

    Usage: Fasta_to_Contigs2Bin.sh -e fasta > my_scaffolds2bin.tsv

       -e, --extension       Extension of fasta files. (default: fasta)
       -i, --input_folder    Folder with bins in fasta format. (default: ./)
       -h, --help            Show this message.

Thanks to Sophilingsky in the comments for pointing out that the script formerly named Fasta_to_Scaffolds2Bin.sh has been renamed to Fasta_to_Contigs2Bin.sh.

Example

    $ ls /maxbin/output/folder
    maxbin.001.fasta maxbin.002.fasta maxbin.003.fasta...

    $ src/Fasta_to_Contigs2Bin.sh -i /maxbin/output/folder -e fasta > maxbin.scaffolds2bin.tsv

    $ head gut_maxbin2_scaffolds2bin.tsv
    NODE_10_length_127450_cov_375.783524 maxbin.001
    NODE_27_length_95143_cov_427.155298 maxbin.001
    NODE_51_length_78315_cov_504.322425 maxbin.001
    NODE_84_length_66931_cov_376.684775 maxbin.001
    NODE_87_length_65653_cov_460.202156 maxbin.001

Problems

The output directory DASTool_output/ has to be created by hand; otherwise no output is written when the run finishes.

The following odd errors also appeared:

    mv: cannot stat ‘DASTool_output/_proteins.faa.scg’: No such file or directory
    mv: cannot stat ‘DASTool_output/_proteins.faa.scg’: No such file or directory
    rm: cannot remove ‘DASTool_output/_proteins.faa.findSCG.b6’: No such file or directory
    rm: cannot remove ‘DASTool_output/_proteins.faa.scg.candidates.faa’: No such file or directory
    rm: cannot remove ‘DASTool_output/_proteins.faa.all.b6’: No such file or directory

The run succeeded after switching to --search_engine diamond.

    DAS_Tool \
        -i MetaBat.scaffolds2bin.tsv,MaxBin.scaffolds2bin.tsv,CONCOCT.scaffolds2bin.tsv \
        -l MetaBat,MaxBin,CONCOCT \
        -c ../scaffold.fa -o DASTool_output/ --write_bins 1 --search_engine diamond --score_threshold 0 \
        -t ${THREAD} \
        --debug

[1] https://www.baidu.com/link?urlJbN0z_QhZbcz05SXOmXghq4KtVaCf00Tbp6YBX3qm3O6AB-yyFw2gN9XISe880jE3sylTvZ4mTI3k-XvDwzTg9D8mefZI0koVLxEVn_M6gk_jaRX6x8BXgfeRqsWaQmHwdeqidf554c6c4000ce067000000065eca2021 (DAS Tool for Genome Reconstruction from Metagenomes)

[2] https://doi.org/10.1038/s41564-018-0171-1 (Christian M. K. Sieber, Alexander J. Probst, Allison Sharrar, Brian C. Thomas, Matthias Hess, Susannah G. Tringe and Jillian F. Banfield (2018). Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy. Nature Microbiology.)
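Three of the pitfalls mentioned in this article (a trailing comma in the -i list, bin tables that are not two tab-separated columns, and an output directory that does not exist yet) can be checked before DAS Tool is started. The following is only a minimal sketch in bash; the helper names join_by_comma, check_scaffolds2bin and prepare_output_dir are made up for illustration and are not part of DAS Tool, and the file names in the commented usage are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; none of these ship with DAS Tool.

# Join all arguments with commas and no trailing comma,
# which is the form the -i and -l lists need.
join_by_comma() {
    local IFS=,
    printf '%s\n' "$*"
}

# Return 0 only if every line of the table has exactly two
# non-empty, tab-separated fields (scaffold ID and bin ID).
check_scaffolds2bin() {
    awk -F'\t' 'NF != 2 || $1 == "" || $2 == "" { bad = 1 } END { exit bad + 0 }' "$1"
}

# Create the directory that an output prefix points into.
# "DASTool_output/" creates DASTool_output; "out/run1" creates out.
prepare_output_dir() {
    case "$1" in
        */) mkdir -p "$1" ;;
        *)  mkdir -p "$(dirname "$1")" ;;
    esac
}

# Example usage (placeholder file names, shown as comments so the
# block can be sourced without DAS Tool installed):
#   tables=(MetaBat.scaffolds2bin.tsv MaxBin.scaffolds2bin.tsv)
#   for t in "${tables[@]}"; do
#       check_scaffolds2bin "$t" || echo "malformed table: $t" >&2
#   done
#   prepare_output_dir DASTool_output/
#   DAS_Tool -i "$(join_by_comma "${tables[@]}")" -l MetaBat,MaxBin \
#       -c ../scaffold.fa -o DASTool_output/
```

Building the list from an array avoids hand-editing a long comma-separated string, and the table check catches space-separated files before DAS Tool reads them.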
