OrthoDB v12 data dump consists of:

odb12v1_species.tab.gz         OrthoDB organism ids based on NCBI taxonomy ids (mostly species level)
odb12v1_levels.tab.gz          NCBI taxonomy levels (clades) where orthologous groups (OGs) are calculated
odb12v1_level2species.tab.gz   correspondence between level ids and organism ids
odb12v1_genes.tab.gz           OrthoDB genes with some info
odb12v1_gene_xrefs.tab.gz      xrefs associated with OrthoDB genes
odb12v1_OGs.tab.gz             OrthoDB orthologous groups (OGs)
odb12v1_OG2genes.tab.gz        OGs to genes correspondence
odb12v1_OG_pairs.tab.gz        OG hierarchy coded as descendant-antecedent pairs
odb12v1_OG_xrefs.tab.gz        OG associations with GO, COG and InterPro ids
odb12v1_aa_fasta.gz            Fasta-formatted AA sequences, all organisms, all genes
odb12v1_og_aa_fasta.gz         Fasta-formatted AA sequences, all organisms, all genes present in OGs
odb12v1_cds_fasta.gz           Fasta-formatted CDS sequences, all organisms, all genes
odb12v1_dna_fasta.tgz          Fasta-formatted genomic source DNA sequences, all organisms, all genes
README.txt                     Main readme file

The non-fasta files are in tab-separated format without column headers.
The fasta files have headers with the OrthoDB internal gene id as well as a public id.

-----------------------------------------------------------------

odb12v1_species.tab
1. NCBI tax id
2. OrthoDB individual organism id, based on NCBI tax id
3. scientific name inherited from the most relevant NCBI tax id
4. genome assembly id, when available
5. total count of clustered genes in this species
6. total count of the OGs it participates in
7. mapping type, clustered (C) or mapped (M)

odb12v1_levels.tab
1. level NCBI tax id
2. scientific name
3. total non-redundant count of genes in all clustered species underneath
4. total count of OGs built on it
5. total non-redundant count of species underneath

odb12v1_level2species.tab
1. top-most level NCBI tax id, one of {2, 2157, 2759, 10239}
2. OrthoDB organism id
3. number of hops between the top-most level id and the NCBI tax id associated with the organism
4. ordered list of OrthoDB selected intermediate levels from the top-most level to the bottom one

odb12v1_genes.tab
1. OrthoDB unique gene id (not stable between releases)
2. OrthoDB individual organism id, composed of NCBI tax id and suffix
3. protein original sequence id, as downloaded along with the sequence
4. semicolon-separated list of synonyms, evaluated by mapping
5. UniProt id, evaluated by mapping
6. semicolon-separated list of ids from Ensembl, evaluated by mapping
7. NCBI gid or gene name, evaluated by mapping
8. description, evaluated by mapping
9. genomic coordinates relative to genomic DNA, from the source GBFF data
10. genomic DNA id
11. chromosome

odb12v1_gene_xrefs.tab
1. OrthoDB gene id
2. external gene identifier, either mapped or the original sequence id from the genes table
3. external DB name, one of {GOterm, InterPro, NCBIproteinGI, UniProt, ENSEMBL, NCBIgid, NCBIgenename}

odb12v1_OGs.tab
1. OG unique id (not stable, and may be re-used, between releases)
2. level tax_id on which the group was built
3. OG name (the most common gene name within the group)

odb12v1_OG2genes.tab
1. OG unique id
2. OrthoDB gene id

odb12v1_OG_pairs.tab
1. descendant OG id
2. antecedent OG id

odb12v1_OG_xrefs.tab
1. OG unique id
2. external DB or DB section
3. external identifier
4. number of genes in the OG associated with the identifier
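
Example: reading the tab files

Because the tab files carry no header row, column names have to be supplied by whoever reads them. The Python sketch below is one possible way to do that using only the standard library. It is not part of the OrthoDB distribution: the function names (read_odb_table, genes_by_og) and the column labels (og_id, gene_id, level_tax_id, og_name) are hypothetical names chosen here to mirror the column descriptions above, and UTF-8 encoding is assumed.

```python
import csv
import gzip
from collections import defaultdict

# Column labels are illustrative names picked for this sketch;
# the files themselves have no header row.
OG2GENES_COLUMNS = ["og_id", "gene_id"]
OGS_COLUMNS = ["og_id", "level_tax_id", "og_name"]


def read_odb_table(path, columns):
    """Yield one dict per row of a gzipped, headerless, tab-separated file.

    Assumes UTF-8 text; rows are streamed, so the whole file is never
    held in memory by this function.
    """
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        # QUOTE_NONE: treat quote characters as ordinary text, split on tabs only.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if not row:
                continue  # skip blank lines
            yield dict(zip(columns, row))


def genes_by_og(og2genes_path):
    """Group gene ids by OG id, reading an OG2genes-style file."""
    groups = defaultdict(list)
    for record in read_odb_table(og2genes_path, OG2GENES_COLUMNS):
        groups[record["og_id"]].append(record["gene_id"])
    return dict(groups)
```

A call such as genes_by_og("odb12v1_OG2genes.tab.gz") would return a dictionary mapping each OG id to its list of gene ids. The full file may be large, so for real use it is probably better to filter rows while streaming from read_odb_table than to collect every group in memory.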