土壤微生物擴增子定序物種分類指派策略之研究 Analysis of taxonomic annotation strategies for soil microbiota amplicon sequencing
2022
A:PS
Title
土壤微生物擴增子定序物種分類指派策略之研究 Analysis of taxonomic annotation strategies for soil microbiota amplicon sequencing
Author
Publication Date
2022
Call Number
A:PS
Summary
16S rRNA gene amplicon sequencing is a high-throughput, standardized approach to DNA barcoding of soil microorganisms and has become a mainstay of microbial community research. DADA2 implements the divisive amplicon denoising algorithm for the Illumina sequencing platform and produces high-resolution amplicon sequence variants (ASVs). Linking microbial binomial nomenclature to these high-resolution ASV data is therefore increasingly important for subsequent community diversity analysis. In this study, soil sequencing data were processed with DADA2 and three taxonomic assignment pipelines were compared. The best assignment performance came from the assignTaxonomy function built into DADA2 combined with the SILVA 138 reference training set that includes species names. A binary classification test was then used to evaluate four DADA2-formatted reference training sets (SILVA 138, SILVA 138.1, GTDB, and RefSeq + RDP) for soil microbial classification. The GTDB training set had the highest sensitivity, and the SILVA 138 and SILVA 138.1 training sets had the best specificity. The RefSeq + RDP training set scored higher than the other training sets on accuracy, coverage, Matthews correlation coefficient, and positive predictive value. Microbial diversity analysis, however, showed that taxonomic assignment with the GTDB training set was the closest to the original ASV data and best reflected the actual soil microbial community composition. These results show that the choice of taxonomic assignment pipeline and of 16S rRNA gene reference training set strongly affects microbial identification. As 16S rRNA gene reference databases continue to be updated, training sets should be chosen carefully and re-evaluated repeatedly so that microbial diversity relationships are described more accurately.
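The binary classification descriptors named in the summary (sensitivity, specificity, accuracy, positive predictive value, and Matthews correlation coefficient) follow standard definitions over a confusion matrix. The sketch below is only an illustration of those definitions, not the authors' evaluation code: the summary does not state how true and false positives were assigned to ASVs, the counts are hypothetical, and the function name binary_metrics is an assumption. "Coverage" is omitted because the summary does not define it.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification descriptors from confusion counts."""

    def ratio(numerator, denominator):
        # Return NaN instead of raising when a denominator is zero.
        return numerator / denominator if denominator else float("nan")

    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),            # true positive rate
        "specificity": ratio(tn, tn + fp),            # true negative rate
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "ppv": ratio(tp, tp + fp),                    # positive predictive value
        "mcc": ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Hypothetical counts for one training set (not data from the study).
example = binary_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Running the same function on the confusion counts of each training set would allow a side-by-side comparison of the kind the summary reports.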
Journal Citation
71(3):267-279, 台灣農業研究 (JOURNAL OF TAIWAN AGRICULTURAL RESEARCH)