• BioME - Bioinformatics Multidisciplinary Environment, Instituto Metrópole Digital (IMD), Universidade Federal do Rio Grande do Norte (UFRN), Natal, RN, Brazil.
• Laboratório de Biodados, Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas (ICB), Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, MG, Brazil.
NCBI Taxonomy is the main taxonomic source for many bioinformatics tools and databases, since all organisms with sequence accessions deposited in INSDC are organized in its hierarchical structure. Despite the extensive use of this data source, an alternative representation of the data as a table would make the information easier to use when processing bioinformatics data. Building such a table is not straightforward: some taxonomic ranks are missing in some lineages, so an algorithm has to propose provisional names to fill every taxonomic rank. To address this issue, we developed an algorithm that takes the tree structure from NCBI Taxonomy and generates a hierarchically complete taxonomic table while maintaining compatibility with the original tree. The algorithm attempts to assign a taxonomic rank to an existing clade or “no rank” node when possible, using its name as part of the created taxonomic-rank name (e.g. Ord_Ornithischia), or interpolates parent nodes when needed (e.g. Cla_of_Ornithischia); both examples come from the lineage of the dinosaur Brachylophosaurus.
The new hierarchical structure was named Taxallnomy because it contains names for all taxonomic ranks; it has 41 hierarchical levels, corresponding to the 41 taxonomic ranks currently found in the NCBI Taxonomy database. From Taxallnomy, users can obtain the complete 41-node taxonomic lineage of any taxon available in the NCBI Taxonomy database, without compromising the original tree information. In this work, we demonstrate its applicability by embedding taxonomic information of a specified rank into a phylogenetic tree and by producing metagenomics profiles. Taxallnomy is applicable to any bioinformatics analysis that depends on information from NCBI Taxonomy. Taxallnomy is updated periodically, but users can also generate it locally with a distributed Perl script, using NCBI Taxonomy as input. All Taxallnomy resources are available at http://bioinfo.icb.ufmg.br/taxallnomy.
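
The two gap-filling moves described in the abstract (promoting an unranked node to a missing rank, and interpolating a parent node) can be illustrated with a minimal Python sketch for a single lineage. This is not the distributed Perl script and not the authors' full algorithm, which works on the whole tree over 41 ranks. The five-rank list, the abbreviations other than Ord and Cla, the bottom-up assignment order, the function name `complete_lineage`, and the truncated toy lineage are all assumptions made for illustration.

```python
# Illustrative subset of ranks, ordered from root to leaf (the real table has 41).
RANKS = ["phylum", "class", "order", "family", "genus"]
# "Cla" and "Ord" appear in the abstract; the other abbreviations are assumed.
ABBR = {"phylum": "Phy", "class": "Cla", "order": "Ord",
        "family": "Fam", "genus": "Gen"}


def complete_lineage(lineage):
    """Fill every rank in RANKS for one lineage.

    lineage: list of (name, rank) pairs ordered from root to leaf; nodes whose
    rank is not in RANKS (e.g. "clade", "no rank") are treated as unranked.
    Returns a dict mapping rank -> name. Ranks below the last ranked node are
    not handled by this sketch and map to None.
    """
    table = {rank: None for rank in RANKS}
    pending = []   # unranked node names seen since the last ranked node
    next_idx = 0   # index of the first rank that is still unfilled
    for name, rank in lineage:
        if rank not in RANKS:
            pending.append(name)
            continue
        idx = RANKS.index(rank)
        below = name  # original name of the nearest filled node below the gap
        # Fill the missing ranks above this node, from the lowest upwards.
        for i in range(idx - 1, next_idx - 1, -1):
            abbr = ABBR[RANKS[i]]
            if pending:
                # Promote the closest unranked node to the missing rank.
                below = pending.pop()
                table[RANKS[i]] = f"{abbr}_{below}"
            else:
                # No unranked node left: interpolate a parent node.
                table[RANKS[i]] = f"{abbr}_of_{below}"
        table[rank] = name
        pending.clear()  # surplus unranked nodes are dropped in this sketch
        next_idx = idx + 1
    return table


# Toy lineage, heavily truncated; not the actual NCBI Taxonomy record.
toy = [
    ("Chordata", "phylum"),
    ("Ornithischia", "clade"),
    ("Hadrosauridae", "family"),
    ("Brachylophosaurus", "genus"),
]
result = complete_lineage(toy)
for rank in RANKS:
    print(f"{rank}\t{result[rank]}")
```

On this toy input, the unranked Ornithischia node is promoted to the missing order (Ord_Ornithischia) and the missing class is interpolated above it (Cla_of_Ornithischia), which mirrors the two naming patterns given in the abstract.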