oggmap: Step 1 - get taxonomic information
This notebook will demonstrate how to get taxonomic information for your query species with oggmap.
Given a species name or taxonomic ID, oggmap extracts the lineage information of the query species. In oggmap version v0.0.1 this is done with the ete3 Python toolkit and the NCBI taxonomy (Huerta-Cepas et al., 2016). In oggmap version v0.0.2 the taxonomic information is extracted with taxadb2 (see the taxadb2 documentation for more information). This information is needed alongside the taxonomic classifications of all species used in the OrthoFinder comparison.
Note: If you need to download or update the NCBI taxonomy database via the ete3 Python package (oggmap version v0.0.1), please use the oggmap command line function ncbitax or run the following code:
Note: If you need to download or update the NCBI taxonomy database via the taxadb2 Python package (oggmap version v0.0.2), please use the oggmap command line function ncbitax or run the following code:
Notebook file
The notebook file can be obtained here:
https://raw.githubusercontent.com/kullrich/oggmap/main/docs/notebooks/query_lineage.ipynb
Import libraries
[1]:
import numpy as np
import pandas as pd
import scanpy as sc
import seaborn as sns
import matplotlib.pyplot as plt
from statannot import add_stat_annotation
# increase dpi
%matplotlib inline
#plt.rcParams['figure.dpi'] = 300
#plt.rcParams['savefig.dpi'] = 300
plt.rcParams['figure.figsize'] = [6, 4.5]
#plt.rcParams['figure.figsize'] = [4.4, 3.3]
Import oggmap python package submodules
[2]:
# import submodules
from oggmap import qlin, gtf2t2g, of2orthomap, orthomap2tei, datasets, ncbitax
Get query species taxonomic lineage information
The oggmap submodule qlin retrieves the taxonomic information for your query species with the qlin.get_qlin() function as follows:
[3]:
# get query species taxonomic lineage information
query_lineage = qlin.get_qlin(q='Caenorhabditis elegans', dbname='taxadb.sqlite')
query name: Caenorhabditis elegans
query taxID: 6239
query kingdom: Eukaryota
query lineage names:
['root(1)', 'cellular organisms(131567)', 'Eukaryota(2759)', 'Opisthokonta(33154)', 'Metazoa(33208)', 'Eumetazoa(6072)', 'Bilateria(33213)', 'Protostomia(33317)', 'Ecdysozoa(1206794)', 'Nematoda(6231)', 'Chromadorea(119089)', 'Rhabditida(6236)', 'Rhabditina(2301116)', 'Rhabditomorpha(2301119)', 'Rhabditoidea(55879)', 'Rhabditidae(6243)', 'Peloderinae(55885)', 'Caenorhabditis(6237)', 'Caenorhabditis elegans(6239)']
query lineage:
[1, 131567, 2759, 33154, 33208, 6072, 33213, 33317, 1206794, 6231, 119089, 6236, 2301116, 2301119, 55879, 6243, 55885, 6237, 6239]
The query_lineage variable now contains the following information in a list:
| Information | Element |
|---|---|
| query name | query_lineage[0] |
| query taxID | query_lineage[1] |
| query lineage | query_lineage[2] |
| query lineage dictionary | query_lineage[3] |
| query lineage zip | query_lineage[4] |
| query lineage names | query_lineage[5] |
| reverse query lineage | query_lineage[6] |
| query kingdom | query_lineage[7] |
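Positional indices are easy to mix up, so it can help to give the eight elements names. The following is a minimal sketch, not part of oggmap: the `QueryLineage` namedtuple and its field names are hypothetical, and the list is an abbreviated stand-in (last two lineage levels only) so that the example runs without a taxonomy database.

```python
from collections import namedtuple

# Hypothetical wrapper; the field names are illustrative and not defined by oggmap.
QueryLineage = namedtuple(
    "QueryLineage",
    [
        "name",              # query_lineage[0]
        "taxid",             # query_lineage[1]
        "lineage",           # query_lineage[2]
        "lineage_dict",      # query_lineage[3]
        "lineage_zip",       # query_lineage[4]
        "lineage_names",     # query_lineage[5]
        "lineage_reversed",  # query_lineage[6]
        "kingdom",           # query_lineage[7]
    ],
)

# Abbreviated stand-in for the list returned by qlin.get_qlin().
# Only the last two lineage levels are included to keep the example short.
stand_in = [
    "Caenorhabditis elegans",
    6239,
    [6237, 6239],
    {6237: "Caenorhabditis", 6239: "Caenorhabditis elegans"},
    [(6237, "Caenorhabditis"), (6239, "Caenorhabditis elegans")],
    None,  # a table (PSnum, PStaxID, PSname) in the real return value
    [6239, 6237],
    "Eukaryota",
]

ql = QueryLineage(*stand_in)
print(ql.name, ql.taxid, ql.kingdom)
```

With a real result, `QueryLineage(*query_lineage)` would wrap the list in the same way, assuming it has exactly the eight elements listed above.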
[4]:
#query name
query_lineage[0]
[4]:
'Caenorhabditis elegans'
[5]:
#query taxID
query_lineage[1]
[5]:
6239
[6]:
#query lineage
query_lineage[2]
[6]:
[1,
131567,
2759,
33154,
33208,
6072,
33213,
33317,
1206794,
6231,
119089,
6236,
2301116,
2301119,
55879,
6243,
55885,
6237,
6239]
[7]:
#query lineage dictionary
query_lineage[3]
[7]:
{1: 'root',
131567: 'cellular organisms',
2759: 'Eukaryota',
33154: 'Opisthokonta',
33208: 'Metazoa',
6072: 'Eumetazoa',
33213: 'Bilateria',
33317: 'Protostomia',
1206794: 'Ecdysozoa',
6231: 'Nematoda',
119089: 'Chromadorea',
6236: 'Rhabditida',
2301116: 'Rhabditina',
2301119: 'Rhabditomorpha',
55879: 'Rhabditoidea',
6243: 'Rhabditidae',
55885: 'Peloderinae',
6237: 'Caenorhabditis',
6239: 'Caenorhabditis elegans'}
[8]:
#query lineage zip
query_lineage[4]
[8]:
[(1, 'root'),
(131567, 'cellular organisms'),
(2759, 'Eukaryota'),
(33154, 'Opisthokonta'),
(33208, 'Metazoa'),
(6072, 'Eumetazoa'),
(33213, 'Bilateria'),
(33317, 'Protostomia'),
(1206794, 'Ecdysozoa'),
(6231, 'Nematoda'),
(119089, 'Chromadorea'),
(6236, 'Rhabditida'),
(2301116, 'Rhabditina'),
(2301119, 'Rhabditomorpha'),
(55879, 'Rhabditoidea'),
(6243, 'Rhabditidae'),
(55885, 'Peloderinae'),
(6237, 'Caenorhabditis'),
(6239, 'Caenorhabditis elegans')]
[9]:
#query lineage names
query_lineage[5]
[9]:
| | PSnum | PStaxID | PSname |
|---|---|---|---|
| 0 | 0 | 1 | root |
| 1 | 1 | 131567 | cellular organisms |
| 2 | 2 | 2759 | Eukaryota |
| 3 | 3 | 33154 | Opisthokonta |
| 4 | 4 | 33208 | Metazoa |
| 5 | 5 | 6072 | Eumetazoa |
| 6 | 6 | 33213 | Bilateria |
| 7 | 7 | 33317 | Protostomia |
| 8 | 8 | 1206794 | Ecdysozoa |
| 9 | 9 | 6231 | Nematoda |
| 10 | 10 | 119089 | Chromadorea |
| 11 | 11 | 6236 | Rhabditida |
| 12 | 12 | 2301116 | Rhabditina |
| 13 | 13 | 2301119 | Rhabditomorpha |
| 14 | 14 | 55879 | Rhabditoidea |
| 15 | 15 | 6243 | Rhabditidae |
| 16 | 16 | 55885 | Peloderinae |
| 17 | 17 | 6237 | Caenorhabditis |
| 18 | 18 | 6239 | Caenorhabditis elegans |
[10]:
#reverse query lineage
query_lineage[6]
[10]:
[6239,
6237,
55885,
6243,
55879,
2301119,
2301116,
6236,
119089,
6231,
1206794,
33317,
33213,
6072,
33208,
33154,
2759,
131567,
1]
[11]:
#query kingdom
query_lineage[7]
[11]:
'Eukaryota'
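The outputs above are different views of the same lineage. The sketch below rebuilds the dictionary, the zip list, the reversed lineage and the printed `name(taxID)` labels from the taxID list and the names shown above. It only illustrates how the elements relate to each other and makes no claim about how oggmap computes them internally; all variable names are illustrative.

```python
# Lineage taxIDs and names copied from the output above.
lineage = [1, 131567, 2759, 33154, 33208, 6072, 33213, 33317, 1206794,
           6231, 119089, 6236, 2301116, 2301119, 55879, 6243, 55885,
           6237, 6239]
names = ['root', 'cellular organisms', 'Eukaryota', 'Opisthokonta',
         'Metazoa', 'Eumetazoa', 'Bilateria', 'Protostomia', 'Ecdysozoa',
         'Nematoda', 'Chromadorea', 'Rhabditida', 'Rhabditina',
         'Rhabditomorpha', 'Rhabditoidea', 'Rhabditidae', 'Peloderinae',
         'Caenorhabditis', 'Caenorhabditis elegans']

# Pair each taxID with its name (compare query_lineage[4]).
lineage_zip = list(zip(lineage, names))

# Map taxID -> name (compare query_lineage[3]).
lineage_dict = dict(lineage_zip)

# Species-to-root order (compare query_lineage[6]).
reverse_lineage = lineage[::-1]

# Labels in the "name(taxID)" form printed by qlin.get_qlin().
labels = [f"{name}({taxid})" for taxid, name in lineage_zip]

# The list position of a taxID corresponds to the PSnum column shown above.
psnum = {taxid: i for i, taxid in enumerate(lineage)}

print(labels[-1], "PSnum:", psnum[6239])
```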
Get query species lineage as a tree object
[13]:
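from io import StringIO
from Bio import Phylo
# get the lineage of the query taxID as a tree object
lineage_tree = qlin.get_lineage_topo(qt='6239', dbname='taxadb.sqlite')
# write the tree in Newick format to an in-memory buffer
newick_str = StringIO()
Phylo.write(lineage_tree, newick_str, "newick")
# rewind the buffer and return its content as a string
newick_str.seek(0)
newick_str.read()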
[13]:
'(((((((((((((((((((18/6239/Caenorhabditis_elegans:0.00000):0.00000,17/6237/Caenorhabditis:0.00000):0.00000,16/55885/Peloderinae:0.00000):0.00000,15/6243/Rhabditidae:0.00000):0.00000,14/55879/Rhabditoidea:0.00000):0.00000,13/2301119/Rhabditomorpha:0.00000):0.00000,12/2301116/Rhabditina:0.00000):0.00000,11/6236/Rhabditida:0.00000):0.00000,10/119089/Chromadorea:0.00000):0.00000,9/6231/Nematoda:0.00000):0.00000,8/1206794/Ecdysozoa:0.00000):0.00000,7/33317/Protostomia:0.00000):0.00000,6/33213/Bilateria:0.00000):0.00000,5/6072/Eumetazoa:0.00000):0.00000,4/33208/Metazoa:0.00000):0.00000,3/33154/Opisthokonta:0.00000):0.00000,2/2759/Eukaryota:0.00000):0.00000,1/131567/cellular_organisms:0.00000):0.00000,0/1/root:0.00000):0.00000;\n'
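Each node label in the Newick string above appears to follow the pattern `PSnum/taxID/name`, with spaces in names replaced by underscores. Assuming that pattern holds, the labels can be pulled back out with a regular expression. The sketch below uses only the Python standard library and a copy of the string printed above, so it runs without oggmap or Biopython. It is a quick text-level check, not a general Newick parser.

```python
import re

# Newick string copied from the output above.
newick = (
    "(((((((((((((((((((18/6239/Caenorhabditis_elegans:0.00000):0.00000,"
    "17/6237/Caenorhabditis:0.00000):0.00000,"
    "16/55885/Peloderinae:0.00000):0.00000,"
    "15/6243/Rhabditidae:0.00000):0.00000,"
    "14/55879/Rhabditoidea:0.00000):0.00000,"
    "13/2301119/Rhabditomorpha:0.00000):0.00000,"
    "12/2301116/Rhabditina:0.00000):0.00000,"
    "11/6236/Rhabditida:0.00000):0.00000,"
    "10/119089/Chromadorea:0.00000):0.00000,"
    "9/6231/Nematoda:0.00000):0.00000,"
    "8/1206794/Ecdysozoa:0.00000):0.00000,"
    "7/33317/Protostomia:0.00000):0.00000,"
    "6/33213/Bilateria:0.00000):0.00000,"
    "5/6072/Eumetazoa:0.00000):0.00000,"
    "4/33208/Metazoa:0.00000):0.00000,"
    "3/33154/Opisthokonta:0.00000):0.00000,"
    "2/2759/Eukaryota:0.00000):0.00000,"
    "1/131567/cellular_organisms:0.00000):0.00000,"
    "0/1/root:0.00000):0.00000;\n"
)

# Assumed label layout: <PSnum>/<taxID>/<name>, followed by ":" and a branch length.
label_pattern = re.compile(r"(\d+)/(\d+)/([^:,()]+):")

# Convert the numeric fields to int, restore spaces in names, sort root-first.
levels = sorted(
    (int(psnum), int(taxid), name.replace("_", " "))
    for psnum, taxid, name in label_pattern.findall(newick)
)

for psnum, taxid, name in levels:
    print(psnum, taxid, name)
```

For anything beyond a quick check, reading the tree with `Bio.Phylo` as in the cell above is the more robust route.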
If you would like to continue, please have a look at the documentation of Step 2 - gene age class assignment to gain further insights.