{ "cells": [ { "cell_type": "markdown", "id": "f791b70f", "metadata": {}, "source": [ "# oggmap: Step 1 - get taxonomic information\n", "\n", "This notebook will demonstrate how to get taxonomic information for your query species with `oggmap`.\n", "\n", "Given a species name or taxonomic ID, the query species lineage information is in `oggmap` version `v0.0.1` extracted with the help of the `ete3` python toolkit and the `NCBI taxonomy` ([Huerta-Cepas et al., 2016](https://doi.org/10.1093/molbev/msw046)). In `oggmap` version `v0.0.2` the taxonomic information is ectracted with `taxadb2` (see here for more information [taxadb2](https://pypi.org/project/taxadb2/)). This information is needed alongside with the taxonomic classifications for all species used in the OrthoFinder comparison." ] }, { "cell_type": "markdown", "id": "b722f9d1", "metadata": {}, "source": [ "__Note:__ If you need to download or update the NCBI taxonomy database via the `ete3` python package and `oggmap` version `v0.0.1`. Please use the `oggmap` command line function [ncbitax](https://oggmap.readthedocs.io/en/latest/tutorials/commandline.ncbitax.html) or run the following code:" ] }, { "cell_type": "raw", "id": "07acee84", "metadata": {}, "source": [ "# command line\n", "oggmap ncbitax -u\n", "# import submodule\n", "from oggmap import ncbitax\n", "ncbitax.update_ncbi()" ] }, { "cell_type": "markdown", "id": "6427785c", "metadata": {}, "source": [ "__Note:__ If you need to download or update the NCBI taxonomy database via the `taxadb2` python package and `oggmap` version `v0.0.2`. Please use the `oggmap` command line function [ncbitax](https://oggmap.readthedocs.io/en/latest/tutorials/commandline.ncbitax.html) or run the following code:" ] }, { "cell_type": "raw", "id": "88c15ebd", "metadata": {}, "source": [ "# command line\n", "oggmap ncbitax -u -outdir taxadb -t taxa -dbname taxadb.sqlite \n", "# import submodule\n", "import sys\n", "from oggmap import ncbitax\n", "outdir = 'taxadb'\n", "dbname = 'taxadb.sqlite'\n", "sys.argv = ['ncbitax', '-u', '-outdir', outdir, '-t', 'taxa', '-dbname', dbname]\n", "update_parser = ncbitax.define_parser()\n", "update_args, unknown_args = update_parser.parse_known_args()\n", "ncbitax.update_ncbi(update_args)" ] }, { "cell_type": "markdown", "id": "3087d74b", "metadata": {}, "source": [ "## Notebook file\n", "\n", "Notebook file can be obtained here:\n", "\n", "[https://raw.githubusercontent.com/kullrich/oggmap/main/docs/notebooks/query_lineage.ipynb](https://raw.githubusercontent.com/kullrich/oggmap/main/docs/notebooks/query_lineage.ipynb)" ] }, { "cell_type": "markdown", "id": "4aeb52cf", "metadata": {}, "source": [ "## Import libraries" ] }, { "cell_type": "code", "execution_count": 1, "id": "c437db69", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import scanpy as sc\n", "import seaborn as sns\n", "import matplotlib.pyplot as plt\n", "from statannot import add_stat_annotation\n", "# increase dpi\n", "%matplotlib inline\n", "#plt.rcParams['figure.dpi'] = 300\n", "#plt.rcParams['savefig.dpi'] = 300\n", "plt.rcParams['figure.figsize'] = [6, 4.5]\n", "#plt.rcParams['figure.figsize'] = [4.4, 3.3]" ] }, { "cell_type": "markdown", "id": "3b5c6174", "metadata": {}, "source": [ "## Import oggmap python package submodules" ] }, { "cell_type": "code", "execution_count": 2, "id": "c8e631aa", "metadata": {}, "outputs": [], "source": [ "# import submodules\n", "from oggmap import qlin, gtf2t2g, of2orthomap, orthomap2tei, datasets, ncbitax" ] }, { "cell_type": "markdown", "id": "b49a7a9d", "metadata": {}, "source": [ "## Get query species taxonomic lineage information\n", "\n", "The `oggmap` submodule `qlin` helps to get taxonomic information for you with the `qlin.get_qlin()` function as follows:" ] }, { "cell_type": "code", "execution_count": 3, "id": "cb83ffa1", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "query name: Caenorhabditis elegans\n", "query taxID: 6239\n", "query kingdom: Eukaryota\n", "query lineage names: \n", "['root(1)', 'cellular organisms(131567)', 'Eukaryota(2759)', 'Opisthokonta(33154)', 'Metazoa(33208)', 'Eumetazoa(6072)', 'Bilateria(33213)', 'Protostomia(33317)', 'Ecdysozoa(1206794)', 'Nematoda(6231)', 'Chromadorea(119089)', 'Rhabditida(6236)', 'Rhabditina(2301116)', 'Rhabditomorpha(2301119)', 'Rhabditoidea(55879)', 'Rhabditidae(6243)', 'Peloderinae(55885)', 'Caenorhabditis(6237)', 'Caenorhabditis elegans(6239)']\n", "query lineage: \n", "[1, 131567, 2759, 33154, 33208, 6072, 33213, 33317, 1206794, 6231, 119089, 6236, 2301116, 2301119, 55879, 6243, 55885, 6237, 6239]\n" ] } ], "source": [ "# get query species taxonomic lineage information\n", "query_lineage = qlin.get_qlin(q='Caenorhabditis elegans', dbname='taxadb.sqlite')" ] }, { "cell_type": "markdown", "id": "44618b4b", "metadata": {}, "source": [ "The `query_lineage` variable now contains the following information in a list:\n", "- query name `query_lineage[0]`\n", "- query taxID `query_lineage[1]`\n", "- query lineage `query_lineage[2]`\n", "- query lineage dictionary `query_lineage[3]`\n", "- query lineage zip `query_lineage[4]`\n", "- query lineage names `query_lineage[5]`\n", "- reverse query lineage `query_lineage[6]`\n", "- query kingdom `query_lineage[7]`" ] }, { "cell_type": "code", "execution_count": 4, "id": "bc18d648", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Caenorhabditis elegans'" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#query name\n", "query_lineage[0]" ] }, { "cell_type": "code", "execution_count": 5, "id": "81911f0d", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "6239" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#query taxID\n", "query_lineage[1]" ] }, { "cell_type": "code", "execution_count": 6, "id": "584d95f6", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1,\n", " 131567,\n", " 2759,\n", " 33154,\n", " 33208,\n", " 6072,\n", " 33213,\n", " 33317,\n", " 1206794,\n", " 6231,\n", " 119089,\n", " 6236,\n", " 2301116,\n", " 2301119,\n", " 55879,\n", " 6243,\n", " 55885,\n", " 6237,\n", " 6239]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#query lineage\n", "query_lineage[2]" ] }, { "cell_type": "code", "execution_count": 7, "id": "53a9e6aa", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{1: 'root',\n", " 131567: 'cellular organisms',\n", " 2759: 'Eukaryota',\n", " 33154: 'Opisthokonta',\n", " 33208: 'Metazoa',\n", " 6072: 'Eumetazoa',\n", " 33213: 'Bilateria',\n", " 33317: 'Protostomia',\n", " 1206794: 'Ecdysozoa',\n", " 6231: 'Nematoda',\n", " 119089: 'Chromadorea',\n", " 6236: 'Rhabditida',\n", " 2301116: 'Rhabditina',\n", " 2301119: 'Rhabditomorpha',\n", " 55879: 'Rhabditoidea',\n", " 6243: 'Rhabditidae',\n", " 55885: 'Peloderinae',\n", " 6237: 'Caenorhabditis',\n", " 6239: 'Caenorhabditis elegans'}" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#query lineage dictionary\n", "query_lineage[3]" ] }, { "cell_type": "code", "execution_count": 8, "id": "70f24083", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[(1, 'root'),\n", " (131567, 'cellular organisms'),\n", " (2759, 'Eukaryota'),\n", " (33154, 'Opisthokonta'),\n", " (33208, 'Metazoa'),\n", " (6072, 'Eumetazoa'),\n", " (33213, 'Bilateria'),\n", " (33317, 'Protostomia'),\n", " (1206794, 'Ecdysozoa'),\n", " (6231, 'Nematoda'),\n", " (119089, 'Chromadorea'),\n", " (6236, 'Rhabditida'),\n", " (2301116, 'Rhabditina'),\n", " (2301119, 'Rhabditomorpha'),\n", " (55879, 'Rhabditoidea'),\n", " (6243, 'Rhabditidae'),\n", " (55885, 'Peloderinae'),\n", " (6237, 'Caenorhabditis'),\n", " (6239, 'Caenorhabditis elegans')]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#query lineage zip\n", "query_lineage[4]" ] }, { "cell_type": "code", "execution_count": 9, "id": "6c911c5e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PSnumPStaxIDPSname
001root
11131567cellular organisms
222759Eukaryota
3333154Opisthokonta
4433208Metazoa
556072Eumetazoa
6633213Bilateria
7733317Protostomia
881206794Ecdysozoa
996231Nematoda
1010119089Chromadorea
11116236Rhabditida
12122301116Rhabditina
13132301119Rhabditomorpha
141455879Rhabditoidea
15156243Rhabditidae
161655885Peloderinae
17176237Caenorhabditis
18186239Caenorhabditis elegans
\n", "
" ], "text/plain": [ " PSnum PStaxID PSname\n", "0 0 1 root\n", "1 1 131567 cellular organisms\n", "2 2 2759 Eukaryota\n", "3 3 33154 Opisthokonta\n", "4 4 33208 Metazoa\n", "5 5 6072 Eumetazoa\n", "6 6 33213 Bilateria\n", "7 7 33317 Protostomia\n", "8 8 1206794 Ecdysozoa\n", "9 9 6231 Nematoda\n", "10 10 119089 Chromadorea\n", "11 11 6236 Rhabditida\n", "12 12 2301116 Rhabditina\n", "13 13 2301119 Rhabditomorpha\n", "14 14 55879 Rhabditoidea\n", "15 15 6243 Rhabditidae\n", "16 16 55885 Peloderinae\n", "17 17 6237 Caenorhabditis\n", "18 18 6239 Caenorhabditis elegans" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#query lineage names\n", "query_lineage[5]" ] }, { "cell_type": "code", "execution_count": 10, "id": "b1edb441", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[6239,\n", " 6237,\n", " 55885,\n", " 6243,\n", " 55879,\n", " 2301119,\n", " 2301116,\n", " 6236,\n", " 119089,\n", " 6231,\n", " 1206794,\n", " 33317,\n", " 33213,\n", " 6072,\n", " 33208,\n", " 33154,\n", " 2759,\n", " 131567,\n", " 1]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#reverse query lineage\n", "query_lineage[6]" ] }, { "cell_type": "code", "execution_count": 11, "id": "f73f8b28", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Eukaryota'" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#query kingdom\n", "query_lineage[7]" ] }, { "cell_type": "markdown", "id": "f55223cc", "metadata": {}, "source": [ "## Get query species lineage as a tree object" ] }, { "cell_type": "code", "execution_count": 13, "id": "84b76ec6", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'(((((((((((((((((((18/6239/Caenorhabditis_elegans:0.00000):0.00000,17/6237/Caenorhabditis:0.00000):0.00000,16/55885/Peloderinae:0.00000):0.00000,15/6243/Rhabditidae:0.00000):0.00000,14/55879/Rhabditoidea:0.00000):0.00000,13/2301119/Rhabditomorpha:0.00000):0.00000,12/2301116/Rhabditina:0.00000):0.00000,11/6236/Rhabditida:0.00000):0.00000,10/119089/Chromadorea:0.00000):0.00000,9/6231/Nematoda:0.00000):0.00000,8/1206794/Ecdysozoa:0.00000):0.00000,7/33317/Protostomia:0.00000):0.00000,6/33213/Bilateria:0.00000):0.00000,5/6072/Eumetazoa:0.00000):0.00000,4/33208/Metazoa:0.00000):0.00000,3/33154/Opisthokonta:0.00000):0.00000,2/2759/Eukaryota:0.00000):0.00000,1/131567/cellular_organisms:0.00000):0.00000,0/1/root:0.00000):0.00000;\\n'" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import sys\n", "from Bio import Phylo\n", "from io import StringIO\n", "lineage_tree = qlin.get_lineage_topo(qt='6239', dbname='taxadb.sqlite')\n", "newick_str = StringIO()\n", "Phylo.write(lineage_tree, newick_str, \"newick\")\n", "newick_str.seek(0)\n", "newick_str.read()" ] }, { "cell_type": "markdown", "id": "ed2f9d7c", "metadata": {}, "source": [ "If you like to continue, please have a look at the documentation of [Step 2 - gene age class assignment](https://oggmap.readthedocs.io/en/latest/tutorials/get_orthomap.html) to get further insides." ] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:oggmap_env]", "language": "python", "name": "conda-env-oggmap_env-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.16" } }, "nbformat": 4, "nbformat_minor": 5 }