{ "cells": [ { "cell_type": "markdown", "id": "d717b014", "metadata": {}, "source": [ "# oggmap: Step 4 - TEI calculation\n", "\n", "This notebook demonstrates how to add a transcriptome evolutionary index (TEI) to scRNA data." ] }, { "cell_type": "markdown", "id": "eeb58ae3", "metadata": {}, "source": [ "## Notebook file\n", "\n", "The notebook file can be obtained here:\n", "\n", "[https://raw.githubusercontent.com/kullrich/oggmap/main/docs/notebooks/add_tei.ipynb](https://raw.githubusercontent.com/kullrich/oggmap/main/docs/notebooks/add_tei.ipynb)" ] }, { "cell_type": "markdown", "id": "fd9f9c62", "metadata": {}, "source": [ "## Import libraries" ] }, { "cell_type": "code", "execution_count": 1, "id": "e81e5626", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import scanpy as sc\n", "import seaborn as sns\n", "import matplotlib.pyplot as plt\n", "from statannot import add_stat_annotation\n", "# increase dpi\n", "%matplotlib inline\n", "#plt.rcParams['figure.dpi'] = 300\n", "#plt.rcParams['savefig.dpi'] = 300\n", "plt.rcParams['figure.figsize'] = [6, 4.5]\n", "#plt.rcParams['figure.figsize'] = [4.4, 3.3]" ] }, { "cell_type": "markdown", "id": "966bf9fe", "metadata": {}, "source": [ "## Import oggmap Python package submodules" ] }, { "cell_type": "code", "execution_count": 2, "id": "3d39256b", "metadata": {}, "outputs": [], "source": [ "# import submodules\n", "from oggmap import qlin, gtf2t2g, of2orthomap, orthomap2tei, datasets, ncbitax" ] }, { "cell_type": "markdown", "id": "dd9ad0db", "metadata": {}, "source": [ "## Step 0, Step 1, Step 2 and Step 3" ] }, { "cell_type": "markdown", "id": "0c112b1b", "metadata": {}, "source": [ "To proceed to Step 4 (TEI calculation), one needs the results from Step 0, Step 1, Step 2 and Step 3.\n", "\n", "The query species in this part is: __*Danio rerio*__ (zebrafish).\n", "\n", "Please have a look at the documentation of [Step 0 - run 
OrthoFinder](https://oggmap.readthedocs.io/en/latest/tutorials/orthofinder.html) to learn which information and files are required to extract gene age classes from [OrthoFinder](https://github.com/davidemms/OrthoFinder) results.\n", "\n", "[Step 1 - get taxonomic information](https://oggmap.readthedocs.io/en/latest/tutorials/query_lineage.html) introduced how to extract query lineage information with `oggmap` and the `qlin.get_qlin()` function.\n", "\n", "[Step 2 - gene age class assignment](https://oggmap.readthedocs.io/en/latest/tutorials/get_orthomap.html) introduced how to extract an orthomap (gene age class) from [OrthoFinder](https://github.com/davidemms/OrthoFinder) results with `oggmap` and the `of2orthomap.get_orthomap()` function, and how to import pre-calculated orthomaps with the `orthomap2tei.read_orthomap()` function.\n", "\n", "[Step 3 - map gene/transcript IDs](https://oggmap.readthedocs.io/en/latest/tutorials/geneset_overlap.html) introduced how to extract gene IDs from a `GTF` file with `oggmap` and the `gtf2t2g.parse_gtf()` function, how to check the overlap between gene IDs with the `orthomap2tei.geneset_overlap()` function, and how to use the `orthomap2tei.replace_by()` function to, for example, reduce isoform IDs to gene IDs." 
] }, { "cell_type": "markdown", "id": "8d8488bd", "metadata": {}, "source": [ "### Step 0 - run OrthoFinder\n", "\n", "For this part of the documentation, all mandatory [OrthoFinder](https://github.com/davidemms/OrthoFinder) ([Emms and Kelly, 2019](https://doi.org/10.1186/s13059-019-1832-y)) results have been pre-calculated.\n", "\n", "Please have a look at the documentation of [Step 0 - run OrthoFinder](https://oggmap.readthedocs.io/en/latest/tutorials/orthofinder.html) for further insights.\n", "\n", "The results are available here:\n", "\n", "https://doi.org/10.5281/zenodo.7242264\n", "\n", "or can be accessed with the `datasets` submodule of `oggmap`:\n", "\n", "`datasets.ensembl113_last(datapath='data')` (download folder set to `'data'`)." ] }, { "cell_type": "markdown", "id": "b7809c37", "metadata": {}, "source": [ "### Step 1 - get taxonomic information\n", "\n", "Please have a look at the documentation of [Step 1 - get taxonomic information](https://oggmap.readthedocs.io/en/latest/tutorials/query_lineage.html) for further insights." 
] }, { "cell_type": "code", "execution_count": 3, "id": "c5dc5c7f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "query name: Danio rerio\n", "query taxID: 7955\n", "query kingdom: Eukaryota\n", "query lineage names: \n", "['root(1)', 'cellular organisms(131567)', 'Eukaryota(2759)', 'Opisthokonta(33154)', 'Metazoa(33208)', 'Eumetazoa(6072)', 'Bilateria(33213)', 'Deuterostomia(33511)', 'Chordata(7711)', 'Craniata(89593)', 'Vertebrata(7742)', 'Gnathostomata(7776)', 'Teleostomi(117570)', 'Euteleostomi(117571)', 'Actinopterygii(7898)', 'Actinopteri(186623)', 'Neopterygii(41665)', 'Teleostei(32443)', 'Osteoglossocephalai(1489341)', 'Clupeocephala(186625)', 'Otomorpha(186634)', 'Ostariophysi(32519)', 'Otophysi(186626)', 'Cypriniphysae(186627)', 'Cypriniformes(7952)', 'Cyprinoidei(30727)', 'Danionidae(2743709)', 'Danioninae(2743711)', 'Danio(7954)', 'Danio rerio(7955)']\n", "query lineage: \n", "[1, 131567, 2759, 33154, 33208, 6072, 33213, 33511, 7711, 89593, 7742, 7776, 117570, 117571, 7898, 186623, 41665, 32443, 1489341, 186625, 186634, 32519, 186626, 186627, 7952, 30727, 2743709, 2743711, 7954, 7955]\n" ] } ], "source": [ "# get query species taxonomic lineage information\n", "query_lineage = qlin.get_qlin(q='Danio rerio', dbname='data/taxadb.sqlite')" ] }, { "cell_type": "markdown", "id": "29f7d754", "metadata": {}, "source": [ "### Step 2 - gene age class assignment\n", "\n", "Here, `oggmap` uses the query species information and [OrthoFinder](https://github.com/davidemms/OrthoFinder) results to extract the oldest common tree node per orthogroup along a species tree and to assign this node as the gene age to the corresponding genes.\n", "\n", "Please have a look at the documentation of [Step 2 - gene age class assignment](https://oggmap.readthedocs.io/en/latest/tutorials/get_orthomap.html) for further insights." 
] }, { "cell_type": "markdown", "id": "86828cf6", "metadata": {}, "source": [ "### Step 3 - map gene/transcript IDs\n", "\n", "To link gene age assignments from an orthomap to the genes or transcripts of an scRNA dataset, one needs to check the overlap of the annotated gene names. With the `gtf2t2g` submodule of `oggmap` and the `gtf2t2g.parse_gtf()` function, one can extract gene and transcript names from a given gene feature file (`GTF`).\n", "\n", "Please have a look at the documentation of [Step 3 - map gene/transcript IDs](https://oggmap.readthedocs.io/en/latest/tutorials/geneset_overlap.html) for further insights." ] }, { "cell_type": "markdown", "id": "a6d05faf", "metadata": {}, "source": [ "Here, the pre-calculated orthomap for *Danio rerio* (zebrafish) obtained via Step 0, Step 1, Step 2 and Step 3 is loaded as follows:" ] }, { "cell_type": "code", "execution_count": 4, "id": "e57a22b1", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "100% [..........................................................................] 1901053 / 1901053" ] }, { "data": { "text/plain": [ "'data/zebrafish_ensembl_113_orthomap.tsv'" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "datasets.zebrafish_ensembl113_orthomap(datapath='data')" ] }, { "cell_type": "code", "execution_count": 5, "id": "5d445639", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|  | seqID | Orthogroup | PSnum | PStaxID | PSname | PScontinuity | geneID |
|---|---|---|---|---|---|---|---|
| 0 | ENSDART00000013359.10 | OG0000000 | 10 | 7742 | Vertebrata | 0.909091 | ENSDARG00000013014 |
| 1 | ENSDART00000014270.5 | OG0000000 | 10 | 7742 | Vertebrata | 0.909091 | ENSDARG00000094080 |
| 2 | ENSDART00000099395.5 | OG0000000 | 10 | 7742 | Vertebrata | 0.909091 | ENSDARG00000068659 |
| 3 | ENSDART00000145885.2 | OG0000000 | 10 | 7742 | Vertebrata | 0.909091 | ENSDARG00000094125 |
| 4 | ENSDART00000159675.3 | OG0000000 | 10 | 7742 | Vertebrata | 0.909091 | ENSDARG00000101806 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 24956 | ENSDART00000191935.1 | OG0025717 | 25 | 30727 | Cyprinoidei | 1.000000 | ENSDARG00000114540 |
| 24957 | ENSDART00000171005.2 | OG0025718 | 22 | 186626 | Otophysi | 0.666667 | ENSDARG00000102218 |
| 24958 | ENSDART00000143229.2 | OG0025719 | 29 | 7955 | Danio rerio | 1.000000 | ENSDARG00000069978 |
| 24959 | ENSDART00000143837.3 | OG0025719 | 29 | 7955 | Danio rerio | 1.000000 | ENSDARG00000078193 |
| 24960 | ENSDART00000143384.2 | OG0025720 | 22 | 186626 | Otophysi | 0.666667 | ENSDARG00000092452 |

24961 rows × 7 columns
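
As a minimal sketch of how the orthomap above can be summarised, the following plain-Python example counts genes per gene age class (`PSnum`/`PSname`). The six rows are copied from the table above purely for illustration; the variable names `orthomap_rows` and `genes_per_age` are assumptions and not part of `oggmap`.

```python
from collections import Counter

# Illustrative subset of the orthomap above: (geneID, PSnum, PSname).
orthomap_rows = [
    ("ENSDARG00000013014", 10, "Vertebrata"),
    ("ENSDARG00000094080", 10, "Vertebrata"),
    ("ENSDARG00000114540", 25, "Cyprinoidei"),
    ("ENSDARG00000102218", 22, "Otophysi"),
    ("ENSDARG00000069978", 29, "Danio rerio"),
    ("ENSDARG00000078193", 29, "Danio rerio"),
]

# Count how many genes fall into each gene age class.
genes_per_age = Counter((psnum, psname) for _, psnum, psname in orthomap_rows)

# Report from the oldest class (lowest PSnum) to the youngest.
for (psnum, psname), n_genes in sorted(genes_per_age.items()):
    print(f"{psnum}\t{psname}\t{n_genes}")
```

A lower `PSnum` corresponds to an older node in the query lineage printed in Step 1 (for example, `Vertebrata` sits at position 10 and `Danio rerio` at position 29 when counting from `root` as 0).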
|  | tei |
|---|---|
| hpf3.3_ZFHIGH_WT_DS5_AAAAGTTGCCTC | 5.514485 |
| hpf3.3_ZFHIGH_WT_DS5_AAACAAGTGTAT | 5.503633 |
| hpf3.3_ZFHIGH_WT_DS5_AAACACCTCGTC | 5.488072 |
| hpf3.3_ZFHIGH_WT_DS5_AAATGAGGTTTN | 5.50889 |
| hpf3.3_ZFHIGH_WT_DS5_AACCCTCTCGAT | 5.574979 |
| ... | ... |
| hpf24_DEW057_TGACACAACAG_GCCACATC | 5.189284 |
| hpf24_DEW057_CTTACGGG_AACCTGAC | 5.399853 |
| hpf24_DEW057_TGAACATCTAT_GACGATGG | 5.154028 |
| hpf24_DEW057_TGAGGTTTCTC_CTCAGAAT | 5.050597 |
| hpf24_DEW057_ACGTGCTAG_CAAGTCAT | 5.344325 |

71203 rows × 1 columns
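
The per-cell `tei` values in the table above can be read as an expression-weighted mean of gene age: for a cell $s$, $\mathrm{TEI}_s = \sum_i ps_i \, e_{is} / \sum_i e_{is}$, where $ps_i$ is the gene age class (`PSnum`) of gene $i$ and $e_{is}$ its expression in that cell. The sketch below illustrates this formula in plain Python on hypothetical toy data. The gene names, cell names, counts and the helper `tei()` are assumptions made for illustration; this is not the `oggmap` implementation, which may apply additional normalisation or filtering.

```python
import math

# Hypothetical gene age classes (PSnum) per gene.
gene_age = {"geneA": 1, "geneB": 10, "geneC": 29}

# Hypothetical expression counts per cell.
expression = {
    "cell_1": {"geneA": 8.0, "geneB": 2.0, "geneC": 0.0},
    "cell_2": {"geneA": 1.0, "geneB": 1.0, "geneC": 2.0},
}


def tei(cell_expression, gene_age):
    """Return the expression-weighted mean gene age of one cell."""
    # Only genes that have both an expression value and an age assignment count.
    shared = [gene for gene in cell_expression if gene in gene_age]
    total = sum(cell_expression[gene] for gene in shared)
    if total == 0:
        return math.nan  # undefined when no aged gene is expressed
    weighted = sum(gene_age[gene] * cell_expression[gene] for gene in shared)
    return weighted / total


tei_per_cell = {cell: tei(counts, gene_age) for cell, counts in expression.items()}
print(tei_per_cell)
```

In this toy example `cell_1` is dominated by the old `geneA` and therefore gets a low index, while `cell_2` expresses the young `geneC` more strongly and gets a higher one.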