{ "cells": [ { "cell_type": "markdown", "id": "1837a910", "metadata": {}, "source": [ "# Case study: re-analysis of hydra (*Hydra vulgaris*) single-cell data\n", "\n", "This notebook demonstrates scRNA-seq processing with orthomap using hydra scRNA data from [Cazet et al., 2022](https://doi.org/10.1101/2022.06.21.496857).\n", "\n", "scRNA data were obtained from https://research.nhgri.nih.gov/HydraAEP, converted into Scanpy `AnnData` objects ([Wolf et al., 2018](https://doi.org/10.1186/s13059-017-1382-0)) and are available here:\n", "\n", "https://doi.org/10.5281/zenodo.7366178\n", "\n", "or can be accessed with the `datasets` submodule of `oggmap`\n", "\n", "`datasets.cazet22(datapath='data')` (download folder set to `'data'`)." ] }, { "cell_type": "markdown", "id": "4cb20c22", "metadata": {}, "source": [ "## Notebook file\n", "\n", "The notebook file can be obtained here:\n", "\n", "[https://raw.githubusercontent.com/kullrich/oggmap/main/docs/notebooks/hvulgaris_example.ipynb](https://raw.githubusercontent.com/kullrich/oggmap/main/docs/notebooks/hvulgaris_example.ipynb)" ] }, { "cell_type": "markdown", "id": "e8514a1e", "metadata": {}, "source": [ "## Steps\n", "\n", "To process the scRNA data, we will do the following:\n", "\n", "0. Use pre-calculated gene age classification\n", "1. Get query species taxonomic lineage information\n", "2. Get query species orthomap\n", "3. Map OrthoFinder gene names and scRNA gene/transcript names\n", "4. Get TEI values and add them to scRNA dataset\n", "5. Get partial TEI values to visualize gene age class contributions\n", "6. 
Process scRNA data and visualize TEI" ] }, { "cell_type": "markdown", "id": "1e19f53a", "metadata": {}, "source": [ "## Import libraries" ] }, { "cell_type": "code", "execution_count": 1, "id": "31b6bd96", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import scanpy as sc\n", "import seaborn as sns\n", "import matplotlib.pyplot as plt\n", "from statannot import add_stat_annotation\n", "# increase dpi\n", "%matplotlib inline\n", "#plt.rcParams['figure.dpi'] = 300\n", "#plt.rcParams['savefig.dpi'] = 300\n", "#plt.rcParams['figure.figsize'] = [6, 4.5]\n", "plt.rcParams['figure.figsize'] = [4.4, 3.3]" ] }, { "cell_type": "markdown", "id": "66de61db", "metadata": {}, "source": [ "## Import oggmap Python package submodules" ] }, { "cell_type": "code", "execution_count": 2, "id": "419c452d", "metadata": {}, "outputs": [], "source": [ "# import submodules\n", "from oggmap import qlin, gtf2t2g, of2orthomap, orthomap2tei, datasets" ] }, { "cell_type": "markdown", "id": "8eddce09", "metadata": {}, "source": [ "## Step 0 - Use pre-calculated gene age classification\n", "\n", "The orthomap was pre-calculated ([Cazet et al., 2022](https://doi.org/10.1101/2022.06.21.496857)) and obtained from https://research.nhgri.nih.gov/HydraAEP; it is also available here:\n", "\n", "https://doi.org/10.5281/zenodo.7242263\n", "\n", "or can be accessed with the `datasets` submodule of `oggmap`\n", "\n", "`datasets.cazet22_orthomap('data')` (download folder set to `'data'`).\n", "\n", "If you want to use your own OrthoFinder results:\n", "\n", "`oggmap` can extract gene age classification from existing OrthoFinder results and link them with scRNA data.\n", "\n", "A detailed how-to is available here:\n", "\n", "https://orthomap.readthedocs.io/en/latest/tutorials/orthofinder.html" ] }, { "cell_type": "code", "execution_count": 3, "id": "b8ad737f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "100% 
[........................................................] 264477 / 264477" ] }, { "data": { "text/plain": [ "'data/Cazet2022_Orthomap.tsv'" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# download pre-calculated orthomap into data folder\n", "datasets.cazet22_orthomap('data')" ] }, { "cell_type": "markdown", "id": "43adeaa5", "metadata": {}, "source": [ "## Step 1 - Get query species taxonomic lineage information\n", "\n", "Given a species name or taxonomic ID, the query species lineage information is extracted with the help of the `ete3` Python toolkit and the `NCBI taxonomy` ([Huerta-Cepas et al., 2016](https://doi.org/10.1093/molbev/msw046)). This information is needed alongside the taxonomic classifications for all species used in the OrthoFinder comparison.\n", "\n", "The `oggmap` submodule `qlin` retrieves this information with the `qlin.get_qlin()` function as follows:" ] }, { "cell_type": "code", "execution_count": 4, "id": "72567473", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "query name: Hydra vulgaris\n", "query taxID: 6087\n", "query kingdom: Eukaryota\n", "query lineage names: \n", "['root(1)', 'cellular organisms(131567)', 'Eukaryota(2759)', 'Opisthokonta(33154)', 'Metazoa(33208)', 'Eumetazoa(6072)', 'Cnidaria(6073)', 'Hydrozoa(6074)', 'Hydroidolina(37516)', 'Anthoathecata(406427)', 'Aplanulata(1612408)', 'Hydridae(6080)', 'Hydra(6083)', 'Hydra vulgaris(6087)']\n", "query lineage: \n", "[1, 131567, 2759, 33154, 33208, 6072, 6073, 6074, 37516, 406427, 1612408, 6080, 6083, 6087]\n" ] } ], "source": [ "# get query species taxonomic lineage information\n", "query_lineage = qlin.get_qlin(q='Hydra vulgaris')" ] }, { "cell_type": "markdown", "id": "9228bdf0", "metadata": {}, "source": [ "## Step 2 - Gene age class assignment (query species orthomap)\n", "\n", "Orthomap was pre-calculated ([Cazet et al., 2022](https://doi.org/10.1101/2022.06.21.496857)) 
and obtained from https://research.nhgri.nih.gov/HydraAEP; it is also available here:\n", "\n", "\n", "https://github.com/cejuliano/brown_hydra_genomes/blob/main/06_geneAge/geneAge.csv\n", "\n", "and here:\n", "\n", "https://doi.org/10.5281/zenodo.7242263\n", "\n", "or can be accessed with the `datasets` submodule of `oggmap`\n", "\n", "`datasets.cazet22_orthomap('data')` (download folder set to `'data'`).\n", "\n", "The pre-calculated orthomap can be imported with the `read_orthomap` function from the `orthomap2tei` submodule as follows:" ] }, { "cell_type": "code", "execution_count": 5, "id": "660dc0c8", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|  | age | ID | ageN |
|---|---|---|---|
| 0 | N36 | G013495 | 11 |
| 1 | N36 | G012562 | 11 |
| 2 | N36 | G013704 | 11 |
| 3 | N36 | G012561 | 11 |
| 4 | N36 | G013496 | 11 |
| ... | ... | ... | ... |
| 19944 | N0 | G000765 | 1 |
| 19945 | N0 | G000767 | 1 |
| 19946 | N0 | G001616 | 1 |
| 19947 | N0 | G015670 | 1 |
| 19948 | N0 | G024576 | 1 |

19949 rows × 3 columns

|  | orig.ident | nCount_RNA | nFeature_RNA | nCount_SCT | nFeature_SCT | integrated_snn_res.0.7 | seurat_clusters | curatedIdent | mg1 | mg2 | ... | ZNF449;MA1656.1 | ZNF652;MA1657.1 | FOXA3;MA1683.1 | ceh-38;MA1699.1 | Clamp;MA1700.1 | elt-2;MA1701.1 | Pdp1;MA1702.1 | pqm-1;MA1703.1 | zip-8;MA1704.1 | TAI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TTTCCGTAGAAN-D01-D1_S1 | D01-D1_S1 | 64276.0 | 6699 | 8505.0 | 3008 | 11 | 11 | Ec_Head | 0.000000 | 0.000000 | ... | 0.005337 | 0.000000 | 0.058842 | 0.021006 | 0.238216 | 0.136273 | 0.000000 | 0.000000 | 0.914948 | 3.797976 |
| CAGTACCCGCTT-D01-D1_S1 | D01-D1_S1 | 63988.0 | 6380 | 7836.0 | 2400 | 9 | 9 | En_Foot | 0.070943 | 0.000000 | ... | 0.000000 | 0.000000 | 0.465768 | 0.000000 | 0.003879 | 0.050755 | 0.000000 | 0.001250 | 0.235164 | 3.969117 |
| CTTTTCCGATGA-D01-D1_S1 | D01-D1_S1 | 69511.0 | 6770 | 8178.0 | 2645 | 0 | 0 | En_BodyCol/SC | 0.000000 | 0.013072 | ... | 0.000000 | 0.000000 | 1.003803 | 0.097580 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.434233 | 3.526038 |
| GCTCCCGCCCGC-D01-D1_S1 | D01-D1_S1 | 69530.0 | 6241 | 7915.0 | 2251 | 6 | 6 | En_Head | 0.022236 | 0.007755 | ... | 0.001358 | 0.000000 | 1.322596 | 0.676596 | 0.000000 | 0.005780 | 0.000000 | 0.000000 | 0.646697 | 3.636835 |
| TTTATGATTAGG-D01-D1_S1 | D01-D1_S1 | 65456.0 | 6867 | 7889.0 | 2920 | 0 | 0 | En_BodyCol/SC | 0.000000 | 0.000000 | ... | 0.001410 | 0.006779 | 0.509137 | 0.000000 | 0.073598 | 0.001549 | 0.008877 | 0.022202 | 0.272550 | 3.520175 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| GGGTCGCCGTGC-D12-N2_S2 | D12-N2_S2 | 501.0 | 351 | 804.0 | 379 | 24 | 24 | I_En1N | 0.033858 | 0.000000 | ... | 0.099983 | 0.000000 | 0.051087 | 0.000000 | 0.058318 | 0.000000 | 0.013654 | 1.106784 | 0.072763 | 4.124720 |
| GGCGTCTGTGCG-D12-N2_S2 | D12-N2_S2 | 510.0 | 309 | 823.0 | 334 | 12 | 12 | I_DesmoNB | 0.000000 | 0.000000 | ... | 0.000000 | 0.986436 | 0.023317 | 0.024707 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.022734 | 4.331113 |
| AGGGTTCGCTCA-D12-N2_S2 | D12-N2_S2 | 524.0 | 356 | 816.0 | 380 | 23 | 23 | I_Ec1N | 0.000000 | 0.015268 | ... | 0.000425 | 0.000000 | 0.007155 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.107668 | 4.511764 |
| GGTGGGTTATAC-D12-N2_S2 | D12-N2_S2 | 652.0 | 383 | 862.0 | 386 | 24 | 24 | I_En1N | 0.085185 | 0.004089 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.025188 | 0.000000 | 1.879879 | 0.023827 | 4.127208 |
| GGGTAAAGGCGG-D12-N2_S2 | D12-N2_S2 | 515.0 | 331 | 840.0 | 338 | 28 | 28 | I_En2N | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.188288 | 0.319323 | 0.037424 | 0.234783 | 0.089618 | 5.029578 |

29339 rows × 675 columns

|  | g1_g2_overlap | g1_ratio | g2_ratio |
|---|---|---|---|
| 0 | 19949 | 0.989583 | 1.0 |

|  | tei |
|---|---|
| TTTCCGTAGAAN-D01-D1_S1 | 1.885530 |
| CAGTACCCGCTT-D01-D1_S1 | 2.552758 |
| CTTTTCCGATGA-D01-D1_S1 | 2.821474 |
| GCTCCCGCCCGC-D01-D1_S1 | 2.825493 |
| TTTATGATTAGG-D01-D1_S1 | 2.162769 |
| ... | ... |
| GGGTCGCCGTGC-D12-N2_S2 | 2.435130 |
| GGCGTCTGTGCG-D12-N2_S2 | 2.513725 |
| AGGGTTCGCTCA-D12-N2_S2 | 2.879771 |
| GGTGGGTTATAC-D12-N2_S2 | 2.387097 |
| GGGTAAAGGCGG-D12-N2_S2 | 1.959223 |

29339 rows × 1 columns

| curatedIdent | Ec_BasalDisk | Ec_BodyCol/SC | Ec_Head | Ec_Peduncle | Ec_Tentacle | En_BodyCol/SC | En_Foot | En_Head | En_Tentacle | I_DesmoNB | ... | I_GranGl | I_ISC | I_IsoNB | I_IsoNC | I_MaleGC | I_Neuro | I_SpumMucGl | I_StenoNB | I_StenoNC | I_ZymoGl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ps |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| 1 | 0.560994 | 0.595713 | 0.590387 | 0.581276 | 0.562467 | 0.589827 | 0.580900 | 0.581904 | 0.562355 | 0.589530 | ... | 0.598142 | 0.670901 | 0.574888 | 0.554668 | 0.635307 | 0.661613 | 0.605504 | 0.612284 | 0.553138 | 0.592371 |
| 2 | 0.511210 | 0.496005 | 0.495925 | 0.498049 | 0.499676 | 0.508730 | 0.507193 | 0.513416 | 0.521005 | 0.444571 | ... | 0.502875 | 0.469989 | 0.453770 | 0.505122 | 0.497674 | 0.467740 | 0.484118 | 0.436852 | 0.505544 | 0.496516 |
| 3 | 0.036718 | 0.033811 | 0.034563 | 0.037433 | 0.038390 | 0.040587 | 0.043163 | 0.040554 | 0.036169 | 0.028546 | ... | 0.036281 | 0.027780 | 0.028548 | 0.044000 | 0.035234 | 0.028230 | 0.036847 | 0.027264 | 0.044341 | 0.034395 |
| 4 | 0.195327 | 0.172565 | 0.173816 | 0.187209 | 0.207873 | 0.169317 | 0.175720 | 0.175979 | 0.186915 | 0.151476 | ... | 0.191275 | 0.105603 | 0.173582 | 0.177877 | 0.110640 | 0.114570 | 0.174782 | 0.149452 | 0.188667 | 0.219986 |
| 5 | 0.076601 | 0.064414 | 0.065810 | 0.067613 | 0.073942 | 0.068400 | 0.071836 | 0.071641 | 0.082624 | 0.183921 | ... | 0.065494 | 0.044966 | 0.181550 | 0.107528 | 0.053688 | 0.046995 | 0.065424 | 0.154116 | 0.119037 | 0.060822 |
| 6 | 0.137453 | 0.108081 | 0.109838 | 0.116427 | 0.129939 | 0.089112 | 0.097962 | 0.095880 | 0.108046 | 0.205680 | ... | 0.096518 | 0.066193 | 0.232879 | 0.136739 | 0.080290 | 0.069968 | 0.098696 | 0.173043 | 0.133907 | 0.123893 |
| 7 | 0.175829 | 0.164752 | 0.167630 | 0.174248 | 0.198587 | 0.126557 | 0.128408 | 0.136689 | 0.153862 | 0.231249 | ... | 0.155013 | 0.112669 | 0.181249 | 0.236929 | 0.142587 | 0.120460 | 0.127831 | 0.154073 | 0.215567 | 0.153125 |
| 8 | 0.311870 | 0.248873 | 0.260274 | 0.267582 | 0.317446 | 0.287415 | 0.318531 | 0.302624 | 0.348600 | 0.228224 | ... | 0.223682 | 0.128542 | 0.264267 | 0.338612 | 0.174242 | 0.161667 | 0.280838 | 0.255724 | 0.331506 | 0.216531 |
| 9 | 0.033367 | 0.031295 | 0.032025 | 0.035780 | 0.039247 | 0.045665 | 0.049970 | 0.042696 | 0.049330 | 0.019145 | ... | 0.023680 | 0.012810 | 0.025420 | 0.022936 | 0.026645 | 0.013645 | 0.019248 | 0.015464 | 0.023421 | 0.025568 |
| 10 | 0.087663 | 0.077823 | 0.098993 | 0.092296 | 0.078132 | 0.034018 | 0.034585 | 0.037312 | 0.040021 | 0.024821 | ... | 0.046094 | 0.023452 | 0.043020 | 0.047926 | 0.031439 | 0.035920 | 0.069036 | 0.034344 | 0.044832 | 0.037622 |
| 11 | 0.082774 | 0.055791 | 0.057948 | 0.065420 | 0.068868 | 0.098293 | 0.102523 | 0.085102 | 0.096652 | 0.039194 | ... | 0.043754 | 0.027352 | 0.045534 | 0.064072 | 0.044852 | 0.031666 | 0.049687 | 0.044516 | 0.073683 | 0.050072 |

11 rows × 32 columns
\n", "