Case study: re-analysis of hydra (Hydra vulgaris) single-cell data
This notebook demonstrates scRNA-seq processing with oggmap using hydra scRNA data from (Cazet et al., 2022).
scRNA data were obtained from https://research.nhgri.nih.gov/HydraAEP, converted into Scanpy AnnData objects (Wolf et al., 2018) and are available here:
https://doi.org/10.5281/zenodo.7366178
or can be accessed with the dataset submodule of oggmap
datasets.cazet22(datapath='data') (download folder set to 'data').
Notebook file
Notebook file can be obtained here:
https://raw.githubusercontent.com/kullrich/oggmap/main/docs/notebooks/hvulgaris_example.ipynb
Steps
To process the scRNA data, we will do the following:
Use pre-calculated gene age classification
Get query species taxonomic lineage information
Get query species orthomap
Map OrthoFinder gene names and scRNA gene/transcript names
Get TEI values and add them to scRNA dataset
Get partial TEI values to visualize gene age class contributions
Process scRNA data and visualize TEI
Import libraries
[1]:
import numpy as np
import pandas as pd
import scanpy as sc
import seaborn as sns
import matplotlib.pyplot as plt
from statannot import add_stat_annotation
# increase dpi
%matplotlib inline
#plt.rcParams['figure.dpi'] = 300
#plt.rcParams['savefig.dpi'] = 300
#plt.rcParams['figure.figsize'] = [6, 4.5]
plt.rcParams['figure.figsize'] = [4.4, 3.3]
Import oggmap python package submodules
[2]:
# import submodules
from oggmap import qlin, gtf2t2g, of2orthomap, orthomap2tei, datasets
Step 0 - Use pre-calculated gene age classification
The orthomap was pre-calculated (Cazet et al., 2022) and obtained from https://research.nhgri.nih.gov/HydraAEP; it is also available here:
https://doi.org/10.5281/zenodo.7242263
or can be accessed with the dataset submodule of oggmap
datasets.cazet22_orthomap('data') (download folder set to 'data').
If you want to use your own OrthoFinder results:
oggmap can extract gene age classification from existing OrthoFinder results and link them with scRNA data.
A detailed how-to is available here:
https://orthomap.readthedocs.io/en/latest/tutorials/orthofinder.html
[3]:
# download pre-calculated orthomap into data folder
datasets.cazet22_orthomap('data')
100% [........................................................] 264477 / 264477
[3]:
'data/Cazet2022_Orthomap.tsv'
Step 1 - get query species taxonomic lineage information
Given a species name or taxonomic ID, the query species lineage information is extracted with the help of the ete3 python toolkit and the NCBI taxonomy (Huerta-Cepas et al., 2016). This information is needed alongside the taxonomic classifications of all species used in the OrthoFinder comparison.
The oggmap submodule qlin provides this information via the qlin.get_qlin() function as follows:
[4]:
# get query species taxonomic lineage information
query_lineage = qlin.get_qlin(q='Hydra vulgaris')
query name: Hydra vulgaris
query taxID: 6087
query kingdom: Eukaryota
query lineage names:
['root(1)', 'cellular organisms(131567)', 'Eukaryota(2759)', 'Opisthokonta(33154)', 'Metazoa(33208)', 'Eumetazoa(6072)', 'Cnidaria(6073)', 'Hydrozoa(6074)', 'Hydroidolina(37516)', 'Anthoathecata(406427)', 'Aplanulata(1612408)', 'Hydridae(6080)', 'Hydra(6083)', 'Hydra vulgaris(6087)']
query lineage:
[1, 131567, 2759, 33154, 33208, 6072, 6073, 6074, 37516, 406427, 1612408, 6080, 6083, 6087]
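To illustrate why the lineage is needed: in phylostratigraphy-style approaches, a gene's age class is commonly taken as the oldest node on the query lineage that is still shared with a species in its orthologous group. The sketch below shows that idea in plain Python under this assumption. The function names and the two truncated partner lineages are hypothetical and are not part of oggmap, and the pre-calculated orthomap used in this notebook follows its own node labels.

```python
# Query lineage as printed above (root -> Hydra vulgaris).
query_lineage = [1, 131567, 2759, 33154, 33208, 6072, 6073, 6074,
                 37516, 406427, 1612408, 6080, 6083, 6087]


def youngest_common_index(query, other):
    """Return the index in `query` of the last node shared with `other`."""
    idx = -1
    for i, (q, o) in enumerate(zip(query, other)):
        if q != o:
            break
        idx = i
    return idx


def gene_age_class(query, partner_lineages):
    """Oldest shared node over all partners; the query species itself if none."""
    if not partner_lineages:
        return len(query) - 1
    return min(youngest_common_index(query, p) for p in partner_lineages)


# Hypothetical, truncated partner lineages (for illustration only).
another_cnidarian = [1, 131567, 2759, 33154, 33208, 6072, 6073, 6101]
a_bilaterian = [1, 131567, 2759, 33154, 33208, 6072, 33213]

age_idx = gene_age_class(query_lineage, [another_cnidarian, a_bilaterian])
print(age_idx, query_lineage[age_idx])
```

In this toy case the most distant partner shares the lineage up to Eumetazoa (6072), so the gene would be assigned to that node.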
Step 2 - gene age class assignment (query species orthomap)
The orthomap was pre-calculated (Cazet et al., 2022) and obtained from https://research.nhgri.nih.gov/HydraAEP; it is also available here:
https://github.com/cejuliano/brown_hydra_genomes/blob/main/06_geneAge/geneAge.csv
and here:
https://doi.org/10.5281/zenodo.7242263
or can be accessed with the dataset submodule of oggmap
datasets.cazet22_orthomap('data') (download folder set to 'data').
The pre-calculated orthomap can be imported with the read_orthomap function from the orthomap2tei submodule as follows:
[5]:
# get query species orthomap
# download pre-calculated orthomap here: https://doi.org/10.5281/zenodo.7242263
# or download with datasets.cazet22_orthomap('data')
query_orthomap = orthomap2tei.read_orthomap(orthomapfile='data/Cazet2022_Orthomap.tsv')
query_orthomap
[5]:
| | age | ID | ageN |
|---|---|---|---|
| 0 | N36 | G013495 | 11 |
| 1 | N36 | G012562 | 11 |
| 2 | N36 | G013704 | 11 |
| 3 | N36 | G012561 | 11 |
| 4 | N36 | G013496 | 11 |
| ... | ... | ... | ... |
| 19944 | N0 | G000765 | 1 |
| 19945 | N0 | G000767 | 1 |
| 19946 | N0 | G001616 | 1 |
| 19947 | N0 | G015670 | 1 |
| 19948 | N0 | G024576 | 1 |
19949 rows × 3 columns
Gene age assignments per query species lineage node
Given an orthomap, one can get an overview of the gene age assignments per query species lineage node.
The oggmap submodule of2orthomap and its of2orthomap.get_counts_per_ps() function show the distribution of the gene age classes, which can be further visualized as follows:
[6]:
# count genes per gene age class (ageN)
counts_per_ps = of2orthomap.get_counts_per_ps(omap_df=query_orthomap,
                                              psnum_col='ageN',
                                              pstaxid_col=None,
                                              psname_col=None)
# bar plot of the counts per gene age class (ageN)
ax = counts_per_ps.plot.bar(y='counts', x='ageN')
ax.set_title('H. vulgaris - Number of genes per gene age class')
plt.show()
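Conceptually, the overview is a frequency table over the gene age column. A minimal plain-Python sketch with hypothetical toy values in the style of the `ageN` column (not the real data, and not the oggmap implementation):

```python
from collections import Counter

# Toy gene age assignments (hypothetical values in the style of 'ageN').
age_n = [11, 11, 11, 4, 4, 1, 1, 1, 1, 2]

counts = Counter(age_n)
# Sort by gene age class, oldest (1) first.
counts_per_ps = sorted(counts.items())
for ps, n in counts_per_ps:
    print(f"ageN={ps}\tcounts={n}")
```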
Step 3 - map OrthoFinder gene names and scRNA gene/transcript names
To link gene age assignments from an orthomap with the genes or transcripts of an scRNA dataset, one needs to check the overlap of the annotated gene names. With the gtf2t2g submodule of oggmap and the parse_gtf function, one can extract gene and transcript names from a given gene feature file (GTF).
Here, the pre-calculated orthomap gene names already overlap, so no GTF import is necessary (Cazet et al., 2022).
If gene or transcript IDs in your orthomap and scRNA data do not match directly, please see this detailed how-to on matching them:
https://oggmap.readthedocs.io/en/latest/tutorials/geneset_overlap.html
Import the scRNA dataset of the query species
Here, data is used like in the original publication (Cazet et al., 2022).
scRNA data were downloaded from https://research.nhgri.nih.gov/HydraAEP, converted into Scanpy AnnData objects (Wolf et al., 2018) and are available here:
https://doi.org/10.5281/zenodo.7366178
or can be accessed with the dataset submodule of oggmap
datasets.cazet22('data') (download folder set to 'data').
[7]:
# load scRNA data
# download hydra scRNA data here: https://doi.org/10.5281/zenodo.7366178
# or download with datasets.cazet22(datapath='data')
#hvulgaris_data = datasets.cazet22('data')
hvulgaris_data = sc.read(filename='data/aepAtlasNonDub.h5ad')
Get an overview of observations
[8]:
hvulgaris_data
[8]:
AnnData object with n_obs × n_vars = 29339 × 20159
obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'nCount_SCT', 'nFeature_SCT', 'integrated_snn_res.0.7', 'seurat_clusters', 'curatedIdent', 'mg1', 'mg2', 'mg3', 'mg4', 'mg5', 'mg6', 'mg7', 'mg8', 'mg9', 'mg10', 'mg11', 'mg12', 'mg13', 'mg14', 'mg15', 'mg16', 'mg17', 'mg18', 'mg19', 'mg20', 'mg21', 'mg22', 'mg23', 'mg24', 'mg25', 'mg26', 'mg27', 'mg28', 'mg29', 'mg30', 'mg31', 'mg32', 'mg33', 'mg34', 'mg35', 'mg36', 'mg37', 'mg38', 'mg39', 'mg40', 'mg41', 'mg42', 'mg43', 'mg44', 'mg45', 'mg46', 'mg47', 'mg48', 'mg49', 'mg50', 'mg51', 'mg52', 'mg53', 'mg54', 'mg55', 'mg56', 'RUNX1;MA0002.2', 'TFAP2A;MA0003.4', 'Arnt;MA0004.1', 'Ahr::Arnt;MA0006.1', 'TBXT;MA0009.2', 'br(var.2);MA0011.1', 'br(var.4);MA0013.1', 'PAX5;MA0014.3', 'NR2F1;MA0017.2', 'CREB1;MA0018.4', 'dl;MA0022.1', 'E2F1;MA0024.3', 'NFIL3;MA0025.2', 'Eip74EF;MA0026.1', 'ELK1;MA0028.2', 'FOXF2;MA0030.1', 'FOXD1;MA0031.1', 'FOXC1;MA0032.2', 'GATA2;MA0036.3', 'GFI1;MA0038.2', 'KLF4;MA0039.4', 'Foxd3;MA0041.1', 'HLF;MA0043.3', 'HNF1A;MA0046.2', 'FOXA2;MA0047.3', 'NHLH1;MA0048.2', 'hb;MA0049.1', 'IRF2;MA0051.1', 'MEF2A;MA0052.4', 'MAX;MA0058.3', 'MAX::MYC;MA0059.1', 'NFYA;MA0060.3', 'GABPA;MA0062.3', 'NKX2-5;MA0063.2', 'Pparg::Rxra;MA0065.2', 'Pax2;MA0067.1', 'PAX6;MA0069.1', 'RORA;MA0071.1', 'RREB1;MA0073.1', 'ELK4;MA0076.2', 'SOX9;MA0077.1', 'Sox17;MA0078.1', 'SP1;MA0079.4', 'SPI1;MA0080.5', 'SPIB;MA0081.2', 'SRF;MA0083.3', 'SRY;MA0084.1', 'sna;MA0086.2', 'TAL1::TCF3;MA0091.1', 'Hand1::Tcf3;MA0092.1', 'YY1;MA0095.2', 'ETS1;MA0098.3', 'FOS::JUN;MA0099.3', 'MYB;MA0100.3', 'REL;MA0101.1', 'CEBPA;MA0102.4', 'ZEB1;MA0103.3', 'MYCN;MA0104.4', 'NFKB1;MA0105.4', 'TP53;MA0106.3', 'RELA;MA0107.1', 'TBP;MA0108.2', 'HLTF;MA0109.1', 'Spz1;MA0111.1', 'HNF4A;MA0114.4', 'Znf423;MA0116.1', 'NFIC::TLX1;MA0119.1', 'Nkx3-2;MA0122.3', 'Nkx3-1;MA0124.2', 'Nobox;MA0125.1', 'ZNF354C;MA0130.1', 'HINFP;MA0131.2', 'Lhx3;MA0135.1', 'ELF5;MA0136.2', 'STAT1;MA0137.3', 'REST;MA0138.2', 'CTCF;MA0139.1', 'ESRRB;MA0141.3', 'SOX2;MA0143.4', 
'TFCP2;MA0145.3', 'Zfx;MA0146.2', 'MYC;MA0147.3', 'FOXA1;MA0148.4', 'HNF1B;MA0153.2', 'INSM1;MA0155.1', 'FEV;MA0156.2', 'FOXO3;MA0157.2', 'RARA::RXRA;MA0159.1', 'NR4A2;MA0160.1', 'NFIC;MA0161.2', 'EGR1;MA0162.4', 'PLAG1;MA0163.1', 'Vsx2;MA0180.1', 'Deaf1;MA0185.1', 'brk;MA0213.1', 'exd;MA0222.1', 'inv;MA0229.1', 'pan;MA0237.2', 'Bgb::run;MA0242.1', 'slbo;MA0244.1', 'tin;MA0247.2', 'twi;MA0249.2', 'vnd;MA0253.1', 'z;MA0255.1', 'ESR2;MA0258.2', 'ARNT::HIF1A;MA0259.1', 'che-1;MA0260.1', 'lin-14;MA0261.1', 'ceh-10::ttx-3;MA0263.1', 'ceh-22;MA0264.1', 'SOX10;MA0442.2', 'btd;MA0443.1', 'D;MA0445.1', 'fkh;MA0446.1', 'gt;MA0447.1', 'h;MA0449.1', 'hkb;MA0450.1', 'kni;MA0451.1', 'odd;MA0454.1', 'opa;MA0456.1', 'slp1;MA0458.1', 'tll;MA0459.1', 'Atoh1;MA0461.2', 'BATF::JUN;MA0462.2', 'BHLHE40;MA0464.2', 'CDX2;MA0465.2', 'Crx;MA0467.1', 'E2F3;MA0469.3', 'E2F4;MA0470.2', 'E2F6;MA0471.2', 'ELF1;MA0473.3', 'ERG;MA0474.2', 'FLI1;MA0475.2', 'FOS;MA0476.1', 'FOSL1;MA0477.2', 'FOSL2;MA0478.1', 'Foxo1;MA0480.1', 'FOXP1;MA0481.3', 'Gfi1b;MA0483.1', 'HNF4G;MA0484.2', 'HSF1;MA0486.2', 'JUN;MA0488.1', 'JUN(var.2);MA0489.1', 'JUNB;MA0490.2', 'JUND;MA0491.2', 'JUND(var.2);MA0492.1', 'Klf1;MA0493.1', 'MEF2C;MA0497.1', 'MEIS1;MA0498.2', 'MYOD1;MA0499.2', 'MYOG;MA0500.2', 'MAF::NFE2;MA0501.1', 'NFYB;MA0502.2', 'Nkx2-5(var.2);MA0503.1', 'Nr5a2;MA0505.1', 'NRF1;MA0506.1', 'POU2F2;MA0507.1', 'PRDM1;MA0508.3', 'RFX1;MA0509.2', 'RFX5;MA0510.2', 'RUNX2;MA0511.2', 'SMAD2::SMAD3::SMAD4;MA0513.1', 'Sox3;MA0514.1', 'Sox6;MA0515.1', 'Stat5a::Stat5b;MA0519.1', 'Tcf12;MA0521.1', 'TCF3;MA0522.3', 'TCF7L2;MA0523.1', 'TFAP2C;MA0524.2', 'USF2;MA0526.3', 'ZBTB33;MA0527.1', 'ZNF263;MA0528.2', 'cnc::maf-S;MA0530.1', 'CTCF;MA0531.1', 'Stat92E;MA0532.1', 'su(Hw);MA0533.1', 'Mad;MA0535.1', 'blmp-1;MA0537.1', 'daf-12;MA0538.1', 'dpy-27;MA0540.1', 'efl-1;MA0541.1', 'elt-3;MA0542.1', 'eor-1;MA0543.1', 'snpc-4;MA0544.1', 'hlh-1;MA0545.1', 'pha-4;MA0546.1', 'Bach1::Mafk;MA0591.1', 'ESRRA;MA0592.3', 'FOXP2;MA0593.1', 
'HOXA9;MA0594.2', 'SREBF1;MA0595.1', 'SREBF2;MA0596.1', 'THAP1;MA0597.1', 'EHF;MA0598.3', 'KLF5;MA0599.1', 'RFX2;MA0600.2', 'Arntl;MA0603.1', 'Atf1;MA0604.1', 'ATF3;MA0605.2', 'Bhlha15;MA0607.1', 'Creb3l2;MA0608.1', 'Dux;MA0611.1', 'EMX1;MA0612.2', 'FOXG1;MA0613.1', 'Foxj2;MA0614.1', 'HES2;MA0616.2', 'LIN54;MA0619.1', 'MITF;MA0620.3', 'mix-a;MA0621.1', 'Mlxip;MA0622.1', 'NEUROG1;MA0623.2', 'NFATC1;MA0624.1', 'NFATC3;MA0625.1', 'Npas2;MA0626.1', 'POU2F3;MA0627.2', 'Rhox11;MA0629.1', 'SHOX;MA0630.1', 'TCFL5;MA0632.2', 'Twist2;MA0633.1', 'ALX3;MA0634.1', 'BHLHE41;MA0636.1', 'CENPB;MA0637.1', 'CREB3;MA0638.1', 'DBP;MA0639.1', 'ELF3;MA0640.2', 'ELF4;MA0641.1', 'EN2;MA0642.1', 'ESX1;MA0644.1', 'ETV6;MA0645.1', 'GRHL1;MA0647.1', 'GSC;MA0648.1', 'HEY2;MA0649.1', 'IRF8;MA0652.1', 'IRF9;MA0653.1', 'ISX;MA0654.1', 'JDP2;MA0655.1', 'JDP2(var.2);MA0656.1', 'KLF13;MA0657.1', 'LHX6;MA0658.1', 'MIXL1;MA0662.1', 'MLX;MA0663.1', 'MLXIPL;MA0664.1', 'MSC;MA0665.1', 'MYF6;MA0667.1', 'NEUROD2;MA0668.1', 'NEUROG2;MA0669.1', 'NFIX;MA0671.1', 'NKX2-8;MA0673.1', 'NKX6-1;MA0674.1', 'OLIG2;MA0678.1', 'ONECUT1;MA0679.2', 'PHOX2B;MA0681.2', 'PITX1;MA0682.2', 'POU4F2;MA0683.1', 'RUNX3;MA0684.2', 'SP4;MA0685.1', 'SPDEF;MA0686.1', 'SPIC;MA0687.1', 'TBX2;MA0688.1', 'TBX21;MA0690.1', 'TFAP4;MA0691.1', 'TFEB;MA0692.1', 'VDR;MA0693.2', 'ZBTB7B;MA0694.1', 'ZBTB7C;MA0695.1', 'ZIC1;MA0696.1', 'ZIC3;MA0697.1', 'ZBTB18;MA0698.1', 'LBX2;MA0699.1', 'LHX2;MA0700.2', 'LMX1B;MA0703.2', 'MEOX2;MA0706.1', 'MNX1;MA0707.1', 'MSX2;MA0708.1', 'NOTO;MA0710.1', 'OTX1;MA0711.1', 'OTX2;MA0712.2', 'PHOX2A;MA0713.1', 'PITX3;MA0714.1', 'PROP1;MA0715.1', 'PRRX1;MA0716.1', 'RAX2;MA0717.1', 'RAX;MA0718.1', 'RHOXF1;MA0719.1', 'Shox2;MA0720.1', 'UNCX;MA0721.1', 'VENTX;MA0724.1', 'NR3C2;MA0727.1', 'Nr2f6(var.2);MA0728.1', 'RARA(var.2);MA0730.1', 'BCL6B;MA0731.1', 'GLIS2;MA0736.1', 'GLIS3;MA0737.1', 'HIC2;MA0738.1', 'Hic1;MA0739.1', 'KLF14;MA0740.1', 'KLF16;MA0741.1', 'Klf12;MA0742.1', 'SCRT1;MA0743.2', 'SCRT2;MA0744.2', 
'SNAI2;MA0745.2', 'SP3;MA0746.2', 'SP8;MA0747.1', 'YY2;MA0748.2', 'ZBTB7A;MA0750.2', 'ZIC4;MA0751.1', 'ONECUT2;MA0756.1', 'ONECUT3;MA0757.1', 'E2F7;MA0758.1', 'ELK3;MA0759.1', 'ERF;MA0760.1', 'ETV1;MA0761.2', 'ETV2;MA0762.1', 'ETV3;MA0763.1', 'ETV4;MA0764.2', 'ETV5;MA0765.2', 'GATA5;MA0766.2', 'LEF1;MA0768.1', 'TCF7;MA0769.2', 'HSF2;MA0770.1', 'HSF4;MA0771.1', 'IRF7;MA0772.1', 'MEIS2;MA0774.1', 'MEIS3;MA0775.1', 'MYBL2;MA0777.1', 'PAX1;MA0779.1', 'PAX9;MA0781.1', 'PKNOX2;MA0783.1', 'POU1F1;MA0784.1', 'POU3F1;MA0786.1', 'POU3F2;MA0787.1', 'POU3F3;MA0788.1', 'POU4F1;MA0790.1', 'POU4F3;MA0791.1', 'POU6F2;MA0793.1', 'PROX1;MA0794.1', 'TGIF1;MA0796.1', 'TGIF2;MA0797.1', 'RFX3;MA0798.2', 'RFX4;MA0799.1', 'EOMES;MA0800.1', 'MGA;MA0801.1', 'TBR1;MA0802.1', 'TBX15;MA0803.1', 'TBX19;MA0804.1', 'TBX1;MA0805.1', 'TBX4;MA0806.1', 'TBX5;MA0807.1', 'TEAD3;MA0808.1', 'TFAP2A(var.2);MA0810.1', 'TFAP2B;MA0811.1', 'TFAP2C(var.2);MA0814.2', 'Ascl2;MA0816.1', 'BHLHE23;MA0817.1', 'BHLHE22;MA0818.1', 'CLOCK;MA0819.1', 'FIGLA;MA0820.1', 'HES5;MA0821.1', 'HES7;MA0822.1', 'HEY1;MA0823.1', 'MNT;MA0825.1', 'OLIG1;MA0826.1', 'OLIG3;MA0827.1', 'SREBF2(var.2);MA0828.1', 'SREBF1(var.2);MA0829.2', 'TCF4;MA0830.2', 'TFE3;MA0831.2', 'Tcf21;MA0832.1', 'ATF4;MA0833.2', 'ATF7;MA0834.1', 'BATF3;MA0835.2', 'CEBPD;MA0836.2', 'CEBPE;MA0837.1', 'CEBPG;MA0838.1', 'CREB3L1;MA0839.1', 'Creb5;MA0840.1', 'NFE2;MA0841.1', 'TEF;MA0843.1', 'FOXB1;MA0845.1', 'FOXC2;MA0846.1', 'FOXD2;MA0847.2', 'Foxj3;MA0851.1', 'FOXK1;MA0852.2', 'Alx4;MA0853.1', 'Alx1;MA0854.1', 'Rarg;MA0859.1', 'TP73;MA0861.1', 'GMEB2;MA0862.1', 'MTF1;MA0863.1', 'E2F2;MA0864.2', 'E2F8;MA0865.1', 'SOX4;MA0867.2', 'SOX8;MA0868.2', 'Sox1;MA0870.1', 'TFEC;MA0871.2', 'BSX;MA0876.1', 'BARHL1;MA0877.2', 'CDX1;MA0878.2', 'Dlx1;MA0879.1', 'Dmbx1;MA0883.1', 'EVX1;MA0887.1', 'GBX1;MA0889.1', 'GBX2;MA0890.1', 'GSC2;MA0891.1', 'HESX1;MA0894.1', 'HMBOX1;MA0895.1', 'Hmx2;MA0897.1', 'Hmx3;MA0898.1', 'HOXB13;MA0901.2', 'HOXC10;MA0905.1', 'HOXD13;MA0909.2', 
'ISL2;MA0914.1', 'dve;MA0915.1', 'Ets21C;MA0916.1', 'gcm2;MA0917.1', 'fkh-2;MA0920.1', 'ceh-48;MA0921.1', 'ces-2;MA0922.1', 'unc-86;MA0926.1', 'zfh-2;MA0928.1', 'HES1;MA1099.2', 'ASCL1;MA1100.2', 'BACH2;MA1101.2', 'CTCFL;MA1102.2', 'FOXK2;MA1103.2', 'GATA6;MA1104.2', 'GRHL2;MA1105.2', 'HIF1A;MA1106.1', 'KLF9;MA1107.2', 'MXI1;MA1108.2', 'NEUROD1;MA1109.1', 'NR4A1;MA1112.2', 'PBX3;MA1114.1', 'POU5F1;MA1115.1', 'RELB;MA1117.1', 'SIX1;MA1118.1', 'SIX2;MA1119.1', 'SOX13;MA1120.1', 'TEAD2;MA1121.1', 'TFDP1;MA1122.1', 'TWIST1;MA1123.2', 'ZNF24;MA1124.1', 'FOS::JUN(var.2);MA1126.1', 'FOSB::JUN;MA1127.1', 'FOSL1::JUN;MA1128.1', 'FOSL1::JUN(var.2);MA1129.1', 'FOSL2::JUN;MA1130.1', 'FOSL2::JUN(var.2);MA1131.1', 'JUN::JUNB;MA1132.1', 'JUN::JUNB(var.2);MA1133.1', 'FOS::JUNB;MA1134.1', 'FOSB::JUNB;MA1135.1', 'FOSB::JUNB(var.2);MA1136.1', 'FOSL1::JUNB;MA1137.1', 'FOSL2::JUNB;MA1138.1', 'FOSL2::JUNB(var.2);MA1139.1', 'JUNB(var.2);MA1140.2', 'FOS::JUND;MA1141.1', 'FOSL1::JUND;MA1142.1', 'FOSL1::JUND(var.2);MA1143.1', 'FOSL2::JUND;MA1144.1', 'FOSL2::JUND(var.2);MA1145.1', 'NR1H4::RXRA;MA1146.1', 'PPARA::RXRA;MA1148.1', 'RARA::RXRG;MA1149.1', 'RORB;MA1150.1', 'RORC;MA1151.1', 'SOX15;MA1152.1', 'Smad4;MA1153.1', 'ZSCAN4;MA1155.1', 'IRF3;MA1418.1', 'IRF4;MA1419.1', 'TCF7L1;MA1421.1', 'atf-7;MA1438.1', 'elt-6;MA1439.1', 'fkh-9;MA1440.1', 'unc-30;MA1443.1', 'daf-16;MA1446.1', 'fos-1;MA1448.1', 'nhr-6;MA1451.1', 'dmrt99B;MA1455.1', 'grh;MA1457.1', 'M1BP;MA1459.1', 'pho;MA1460.1', 'sv;MA1461.1', 'vfl;MA1462.1', 'ARGFX;MA1463.1', 'ARNT2;MA1464.1', 'ATF6;MA1466.1', 'ATOH1(var.2);MA1467.1', 'ATOH7;MA1468.1', 'BHLHA15(var.2);MA1472.1', 'CDX4;MA1473.1', 'CREB3L4;MA1474.1', 'DMRTA2;MA1478.1', 'DMRTC2;MA1479.1', 'DPRX;MA1480.1', 'DRGX;MA1481.1', 'ELF2;MA1483.1', 'ETS2;MA1484.1', 'FERD3L;MA1485.1', 'FOXE1;MA1487.1', 'FOXN3;MA1489.1', 'GLI3;MA1491.1', 'HES6;MA1493.1', 'HNF4A(var.2);MA1494.1', 'HOXA1;MA1495.1', 'HOXB7;MA1501.1', 'IKZF1;MA1508.1', 'IRF6;MA1509.1', 'KLF10;MA1511.1', 'KLF11;MA1512.1', 
'KLF15;MA1513.1', 'KLF17;MA1514.1', 'KLF2;MA1515.1', 'KLF3;MA1516.1', 'KLF6;MA1517.1', 'MAF;MA1520.1', 'MAZ;MA1522.1', 'MSANTD3;MA1523.1', 'MSGN1;MA1524.1', 'NFATC4;MA1525.1', 'NFIC(var.2);MA1527.1', 'NHLH2;MA1529.1', 'NR1D1;MA1531.1', 'NR1I2;MA1533.1', 'NR1I3;MA1534.1', 'NR2C1;MA1535.1', 'NR2C2(var.2);MA1536.1', 'NR2F6(var.3);MA1539.1', 'NR5A1;MA1540.1', 'OSR1;MA1542.1', 'OVOL1;MA1544.1', 'OVOL2;MA1545.1', 'PAX3(var.2);MA1546.1', 'PITX2;MA1547.1', 'RXRG(var.2);MA1556.1', 'SNAI1;MA1558.1', 'SNAI3;MA1559.1', 'SOHLH2;MA1560.1', 'SOX12;MA1561.1', 'SOX14;MA1562.1', 'SOX18;MA1563.1', 'SP9;MA1564.1', 'TBX18;MA1565.1', 'TBX6;MA1567.1', 'TCF21(var.2);MA1568.1', 'TFAP4(var.2);MA1570.1', 'TGIF2LX;MA1571.1', 'TGIF2LY;MA1572.1', 'THRB(var.2);MA1575.1', 'THRB(var.3);MA1576.1', 'VEZF1;MA1578.1', 'ZBTB6;MA1581.1', 'ZFP57;MA1583.1', 'ZIC5;MA1584.1', 'ZKSCAN1;MA1585.1', 'ZNF135;MA1587.1', 'ZNF136;MA1588.1', 'ZNF274;MA1592.1', 'ZNF382;MA1594.1', 'ZNF460;MA1596.1', 'ZNF528;MA1597.1', 'ZNF684;MA1600.1', 'ZNF75D;MA1601.1', 'Ebf2;MA1604.1', 'Foxf1;MA1606.1', 'Foxl2;MA1607.1', 'Plagl1;MA1615.1', 'Prdm15;MA1616.1', 'Ptf1a;MA1618.1', 'Ptf1a(var.2);MA1619.1', 'Ptf1a(var.3);MA1620.1', 'Rbpjl;MA1621.1', 'Smad2::Smad3;MA1622.1', 'Stat2;MA1623.1', 'Stat5b;MA1625.1', 'Wt1;MA1627.1', 'Zic1::Zic2;MA1628.1', 'Zic2;MA1629.1', 'Znf281;MA1630.1', 'ASCL1(var.2);MA1631.1', 'ATF2;MA1632.1', 'BATF;MA1634.1', 'BHLHE22(var.2);MA1635.1', 'EBF3;MA1637.1', 'HAND2;MA1638.1', 'MYF5;MA1641.1', 'NEUROG2(var.2);MA1642.1', 'NFIB;MA1643.1', 'NFYC;MA1644.1', 'NKX2-2;MA1645.1', 'OSR2;MA1646.1', 'PRDM4;MA1647.1', 'TCF12(var.2);MA1648.1', 'ZBTB12;MA1649.1', 'ZBTB14;MA1650.1', 'ZFP42;MA1651.1', 'ZNF148;MA1653.1', 'ZNF16;MA1654.1', 'ZNF341;MA1655.1', 'ZNF449;MA1656.1', 'ZNF652;MA1657.1', 'FOXA3;MA1683.1', 'ceh-38;MA1699.1', 'Clamp;MA1700.1', 'elt-2;MA1701.1', 'Pdp1;MA1702.1', 'pqm-1;MA1703.1', 'zip-8;MA1704.1', 'TAI'
var: 'features', 'id'
[9]:
hvulgaris_data.obs
[9]:
| | orig.ident | nCount_RNA | nFeature_RNA | nCount_SCT | nFeature_SCT | integrated_snn_res.0.7 | seurat_clusters | curatedIdent | mg1 | mg2 | ... | ZNF449;MA1656.1 | ZNF652;MA1657.1 | FOXA3;MA1683.1 | ceh-38;MA1699.1 | Clamp;MA1700.1 | elt-2;MA1701.1 | Pdp1;MA1702.1 | pqm-1;MA1703.1 | zip-8;MA1704.1 | TAI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TTTCCGTAGAAN-D01-D1_S1 | D01-D1_S1 | 64276.0 | 6699 | 8505.0 | 3008 | 11 | 11 | Ec_Head | 0.000000 | 0.000000 | ... | 0.005337 | 0.000000 | 0.058842 | 0.021006 | 0.238216 | 0.136273 | 0.000000 | 0.000000 | 0.914948 | 3.797976 |
| CAGTACCCGCTT-D01-D1_S1 | D01-D1_S1 | 63988.0 | 6380 | 7836.0 | 2400 | 9 | 9 | En_Foot | 0.070943 | 0.000000 | ... | 0.000000 | 0.000000 | 0.465768 | 0.000000 | 0.003879 | 0.050755 | 0.000000 | 0.001250 | 0.235164 | 3.969117 |
| CTTTTCCGATGA-D01-D1_S1 | D01-D1_S1 | 69511.0 | 6770 | 8178.0 | 2645 | 0 | 0 | En_BodyCol/SC | 0.000000 | 0.013072 | ... | 0.000000 | 0.000000 | 1.003803 | 0.097580 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.434233 | 3.526038 |
| GCTCCCGCCCGC-D01-D1_S1 | D01-D1_S1 | 69530.0 | 6241 | 7915.0 | 2251 | 6 | 6 | En_Head | 0.022236 | 0.007755 | ... | 0.001358 | 0.000000 | 1.322596 | 0.676596 | 0.000000 | 0.005780 | 0.000000 | 0.000000 | 0.646697 | 3.636835 |
| TTTATGATTAGG-D01-D1_S1 | D01-D1_S1 | 65456.0 | 6867 | 7889.0 | 2920 | 0 | 0 | En_BodyCol/SC | 0.000000 | 0.000000 | ... | 0.001410 | 0.006779 | 0.509137 | 0.000000 | 0.073598 | 0.001549 | 0.008877 | 0.022202 | 0.272550 | 3.520175 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| GGGTCGCCGTGC-D12-N2_S2 | D12-N2_S2 | 501.0 | 351 | 804.0 | 379 | 24 | 24 | I_En1N | 0.033858 | 0.000000 | ... | 0.099983 | 0.000000 | 0.051087 | 0.000000 | 0.058318 | 0.000000 | 0.013654 | 1.106784 | 0.072763 | 4.124720 |
| GGCGTCTGTGCG-D12-N2_S2 | D12-N2_S2 | 510.0 | 309 | 823.0 | 334 | 12 | 12 | I_DesmoNB | 0.000000 | 0.000000 | ... | 0.000000 | 0.986436 | 0.023317 | 0.024707 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.022734 | 4.331113 |
| AGGGTTCGCTCA-D12-N2_S2 | D12-N2_S2 | 524.0 | 356 | 816.0 | 380 | 23 | 23 | I_Ec1N | 0.000000 | 0.015268 | ... | 0.000425 | 0.000000 | 0.007155 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.107668 | 4.511764 |
| GGTGGGTTATAC-D12-N2_S2 | D12-N2_S2 | 652.0 | 383 | 862.0 | 386 | 24 | 24 | I_En1N | 0.085185 | 0.004089 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.025188 | 0.000000 | 1.879879 | 0.023827 | 4.127208 |
| GGGTAAAGGCGG-D12-N2_S2 | D12-N2_S2 | 515.0 | 331 | 840.0 | 338 | 28 | 28 | I_En2N | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.188288 | 0.319323 | 0.037424 | 0.234783 | 0.089618 | 5.029578 |
29339 rows × 675 columns
Helper functions to match gene names
The orthomap2tei submodule contains the orthomap2tei.geneset_overlap() helper function to check for gene name overlap between the constructed orthomap from OrthoFinder results and a given scRNA dataset.
[10]:
# check overlap of orthomap <seqID> and scRNA data <var_names>
orthomap2tei.geneset_overlap(geneset1=hvulgaris_data.var_names,
                             geneset2=query_orthomap['ID'])
[10]:
| | g1_g2_overlap | g1_ratio | g2_ratio |
|---|---|---|---|
| 0 | 19949 | 0.989583 | 1.0 |
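The reported values are consistent with a plain set intersection: 19949 shared IDs out of 20159 scRNA genes gives 19949 / 20159 ≈ 0.989583, and all 19949 orthomap IDs are found in the scRNA data (ratio 1.0). The sketch below reproduces that arithmetic on toy IDs. The function name `geneset_overlap_sketch` and the gene IDs are hypothetical, and this is not the oggmap implementation.

```python
def geneset_overlap_sketch(geneset1, geneset2):
    """Return the overlap size and the fraction of each set that is shared."""
    s1, s2 = set(geneset1), set(geneset2)
    shared = s1 & s2
    return {
        'g1_g2_overlap': len(shared),
        'g1_ratio': len(shared) / len(s1) if s1 else 0.0,
        'g2_ratio': len(shared) / len(s2) if s2 else 0.0,
    }


# Toy gene IDs (hypothetical).
scrna_genes = ['G000001', 'G000002', 'G000003', 'G000004']
orthomap_genes = ['G000001', 'G000002', 'G000003']

result = geneset_overlap_sketch(scrna_genes, orthomap_genes)
print(result)
```

A `g1_ratio` below 1 means that some scRNA genes have no gene age assignment. A `g2_ratio` below 1 would point to orthomap IDs missing from the scRNA data, which often indicates a naming mismatch.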
Step 4 - Get TEI values and add them to scRNA dataset
Now that the gene names in the orthomap and the scRNA adata object correspond to each other, one can calculate the transcriptome evolutionary index (TEI) and add it to the scRNA dataset (adata object).
The TEI measure represents the weighted arithmetic mean (expression levels as weights for the phylostratum value) over all evolutionary age categories, denoted as phylostrata.
\({TEI_s = \sum (e_{is} * ps_i) / \sum e_{is}}\)
, where \({TEI_s}\) denotes the TEI value in developmental stage \({s}\) (in this notebook, a single cell), \({e_{is}}\) denotes the gene expression level of gene \({i}\) in stage \({s}\), \({ps_i}\) denotes the corresponding phylostratum of gene \({i}\), \({i = 1,...,N}\), and \({N}\) is the total number of genes.
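As a worked example of the formula, the following plain-Python sketch computes the TEI of one toy cell with three genes. The helper name `tei` and all values are hypothetical; this only illustrates the weighted mean, not the oggmap implementation.

```python
def tei(expression, phylostrata):
    """Expression-weighted mean of phylostratum values for one cell."""
    total = sum(expression)
    if total == 0:
        raise ValueError('cell has no expression; TEI is undefined')
    return sum(e * ps for e, ps in zip(expression, phylostrata)) / total


# Toy cell with three genes (hypothetical values).
expression = [10.0, 30.0, 60.0]   # e_is
phylostrata = [1, 2, 11]          # ps_i

tei_value = tei(expression, phylostrata)
print(tei_value)  # (10*1 + 30*2 + 60*11) / 100
```

Because the most highly expressed gene belongs to the youngest class (11), the TEI is pulled towards the younger end of the scale.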
Note: If two different isoforms of a gene belong to different orthologous groups, their gene ages might differ, because each age is based on the oldest ortholog found in the corresponding orthologous group. Since both isoforms share the same gene name, their gene ages would clash. In this case, the get_tei function keeps either the keep='min' or the keep='max' gene age; the default is keep='min', in other words the 'older' gene age.
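The effect of the keep option can be sketched as collapsing duplicated gene names to a single age. The function `resolve_gene_age` and the gene names below are hypothetical and only illustrate the rule described in the note.

```python
def resolve_gene_age(gene_ids, gene_ages, keep='min'):
    """Collapse duplicated gene IDs to one age ('min' = older, 'max' = younger)."""
    if keep not in ('min', 'max'):
        raise ValueError("keep must be 'min' or 'max'")
    pick = min if keep == 'min' else max
    resolved = {}
    for gene, age in zip(gene_ids, gene_ages):
        resolved[gene] = pick(resolved[gene], age) if gene in resolved else age
    return resolved


# Two isoforms of geneA fall into different gene age classes.
gene_ids = ['geneA', 'geneA', 'geneB']
gene_ages = [2, 7, 4]

older = resolve_gene_age(gene_ids, gene_ages, keep='min')
younger = resolve_gene_age(gene_ids, gene_ages, keep='max')
print(older, younger)
```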
To be able to re-use the original count data, they are added as a new layer to the adata object. This is useful because later on the count data can be used to extract either the relative expression per gene age class or re-calculate other metrics.
This can be done either on un-normalized counts or on normalized and log-transformed data.
[11]:
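# store a copy of the original counts so that later in-place processing of adata.X does not alter them
hvulgaris_data.layers['counts'] = hvulgaris_data.X.copy()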
add TEI to adata object
Using the orthomap2tei submodule of oggmap and the orthomap2tei.get_tei() function, transcriptome evolutionary index (TEI) values are calculated and directly added to the existing adata object (add_obs=True).
By default, the adata.X counts are used (layer=None); alternatively, another layer of the adata object can be selected with the layer option. The values can be pre-processed with the normalize_total and log1p options.
If add_obs=True the resulting TEI values are added to the existing adata object as a new observation with the name set with the obs_name option.
If add_var=True the gene age values are added to the existing adata object as a new variable with the name set with the var_name option.
Note: Genes not assigned to any gene age class will get a missing assignment.
To calculate bootstrap TEI values per cell, set boot=True; gene age classes will then be randomly chosen prior to calculating TEI values, bt=10 times.
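One plausible reading of the bootstrap option is a permutation scheme: the gene age classes are shuffled across genes, the TEI is recomputed, and this is repeated bt times to obtain a null distribution per cell. The sketch below illustrates that idea in plain Python. The permutation scheme, the function names and the toy values are assumptions for illustration and may differ from what oggmap does internally.

```python
import random


def tei(expression, phylostrata):
    """Expression-weighted mean of phylostratum values for one cell."""
    return sum(e * ps for e, ps in zip(expression, phylostrata)) / sum(expression)


def bootstrap_tei(expression, phylostrata, bt=10, seed=0):
    """Recompute TEI `bt` times with gene ages randomly reassigned to genes."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    values = []
    for _ in range(bt):
        shuffled = list(phylostrata)
        rng.shuffle(shuffled)
        values.append(tei(expression, shuffled))
    return values


# Toy cell (hypothetical values).
expression = [10.0, 30.0, 60.0]
phylostrata = [1, 2, 11]

boot_values = bootstrap_tei(expression, phylostrata, bt=10)
print(boot_values)
```

Comparing the observed TEI with such permuted values gives a rough sense of whether the observed value is unusual for the given expression profile.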
[12]:
# add TEI values to existing adata object
orthomap2tei.get_tei(adata=hvulgaris_data,
                     gene_id=query_orthomap['ID'],
                     gene_age=query_orthomap['ageN'],
                     keep='min',
                     layer=None,
                     add_var=True,
                     var_name='Phylostrata',
                     add_obs=True,
                     obs_name='tei',
                     boot=False,
                     bt=10,
                     normalize_total=False,
                     log1p=False,
                     target_sum=1e6)
[12]:
| | tei |
|---|---|
| TTTCCGTAGAAN-D01-D1_S1 | 1.885530 |
| CAGTACCCGCTT-D01-D1_S1 | 2.552758 |
| CTTTTCCGATGA-D01-D1_S1 | 2.821474 |
| GCTCCCGCCCGC-D01-D1_S1 | 2.825493 |
| TTTATGATTAGG-D01-D1_S1 | 2.162769 |
| ... | ... |
| GGGTCGCCGTGC-D12-N2_S2 | 2.435130 |
| GGCGTCTGTGCG-D12-N2_S2 | 2.513725 |
| AGGGTTCGCTCA-D12-N2_S2 | 2.879771 |
| GGTGGGTTATAC-D12-N2_S2 | 2.387097 |
| GGGTAAAGGCGG-D12-N2_S2 | 1.959223 |
29339 rows × 1 columns
Step 5 - downstream analysis
Once the gene age data has been added to the scRNA dataset, one can e.g. plot the corresponding transcriptome evolutionary index (TEI) values by any given observation pre-defined in the scRNA dataset.
Here, we plot them against the assigned cell types (curatedIdent) of hydra using the scanpy sc.pl.violin() function as follows:
Violin plot of TEI per cell type
[13]:
sc.pl.violin(adata=hvulgaris_data,
             keys=['tei'],
             groupby='curatedIdent',
             rotation=90,
             palette='Paired',
             stripplot=False,
             inner='box')
Get partial TEI values to visualize gene age class contributions
Partial TEI values can give an idea about which gene age class contributed most to the global TEI pattern.
In detail, each gene gets a TEI contribution profile as follows:
\({TEI_{is} = f_{is} * ps_i}\)
, where \({TEI_{is}}\) is the partial TEI value of gene \({i}\), \({f_{is} = e_{is} / \sum e_{is}}\) and \({ps_i}\) is the phylostratum of gene \({i}\).
\({TEI_{is}}\) values are combined per \({ps}\).
The partial TEI values combined per stratum give an overall impression of the contribution of each stratum to the global TEI pattern.
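A small worked example may help: for one toy cell, the sketch below computes the relative expression of each gene, the partial TEI per gene, and then sums both per gene age class. All values are hypothetical, and the code illustrates the formula only, not the oggmap implementation.

```python
from collections import defaultdict

# Toy cell with four genes (hypothetical values).
expression = [10.0, 30.0, 40.0, 20.0]   # e_is
phylostrata = [1, 2, 2, 11]             # ps_i

total = sum(expression)
freq = [e / total for e in expression]                    # f_is
partial = [f * ps for f, ps in zip(freq, phylostrata)]    # TEI_is

# Combine partial TEI values and frequencies per gene age class.
pstrata_sum = defaultdict(float)
pstrata_freq = defaultdict(float)
for ps, p, f in zip(phylostrata, partial, freq):
    pstrata_sum[ps] += p
    pstrata_freq[ps] += f

print(dict(pstrata_sum))    # contribution of each class to the TEI
print(dict(pstrata_freq))   # share of expression in each class
```

Summing the per-class contributions recovers the TEI of the cell (3.7 for these values), and the per-class frequencies sum to 1.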
One can either start from counts (adata.X), which is the default (layer=None), or from any other layer defined by the layer option.
In addition, the counts can be normalized and log-transformed prior to calculating partial TEI values (normalize_total=False, log1p=False, target_sum=1e6).
Further, these values can be combined per given observation, e.g. cell type (group_by_obs='curatedIdent').
The get_pstrata function of the orthomap2tei submodule returns two matrices: the first contains the sum of the partial TEI values per gene age class and the second the corresponding frequencies.
Both can be further processed by returning the cumulative sum (cumsum) over the gene age classes; to get it, set the option cumsum=True. For the first matrix, the cumsum ends at the TEI value per cell, or at the mean TEI value per group if an observation is chosen with the group_by_obs option. For the second (frequency) matrix, the cumsum ends at 1.
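The cumulative sums can be illustrated on one column of each matrix. The sketch uses hypothetical per-class values for a single cell, ordered from the oldest to the youngest gene age class.

```python
from itertools import accumulate

# Per-class values for one toy cell (hypothetical), oldest class first.
age_classes = [1, 2, 11]
partial_tei = [0.1, 1.4, 2.2]     # column of the first matrix
frequencies = [0.1, 0.7, 0.2]     # column of the second matrix

cum_tei = list(accumulate(partial_tei))
cum_freq = list(accumulate(frequencies))

for ps, t, f in zip(age_classes, cum_tei, cum_freq):
    print(f"ps<={ps}\tcumulative TEI={t:.2f}\tcumulative frequency={f:.2f}")
```

The last cumulative TEI entry equals the TEI of the cell, and the last cumulative frequency equals 1 (up to floating-point rounding).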
With the standard_scale option, either gene age classes (standard_scale=0, rows) or cells or groups (standard_scale=1, columns) can be scaled: the minimum is subtracted and each value is divided by the maximum. By default, no scaling is applied (standard_scale=None).
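The scaling described here is a min-max scaling along one axis. A minimal sketch on a toy matrix (rows are gene age classes, columns are cell types), with the hypothetical helper `min_max_scale`:

```python
def min_max_scale(values):
    """Subtract the minimum, then divide by the maximum of the shifted values."""
    lo = min(values)
    shifted = [v - lo for v in values]
    hi = max(shifted)
    if hi == 0:                      # constant input: nothing to scale
        return [0.0 for _ in shifted]
    return [v / hi for v in shifted]


# Toy matrix (hypothetical): 2 gene age classes x 3 cell types.
matrix = [[0.2, 0.4, 0.6],
          [1.0, 3.0, 2.0]]

# Scale each row (analogous to standard_scale=0).
scaled_rows = [min_max_scale(row) for row in matrix]

# Scale each column (analogous to standard_scale=1).
columns = [list(col) for col in zip(*matrix)]
scaled_columns = [min_max_scale(col) for col in columns]
scaled_cols = [list(row) for row in zip(*scaled_columns)]

print(scaled_rows)
print(scaled_cols)
```

Row-wise scaling makes the pattern of each gene age class across cell types comparable, even when the classes differ strongly in their absolute contribution.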
Scaling each gene age class (standard_scale=0) reproduces findings from the original publication (Cazet et al., 2022); note that the call below leaves the values unscaled (standard_scale=None).
The resulting data will be visualized in the downstream section.
[14]:
hvulgaris_pstrata = orthomap2tei.get_pstrata(adata=hvulgaris_data,
                                             gene_id=query_orthomap['ID'],
                                             gene_age=query_orthomap['ageN'],
                                             keep='min',
                                             layer=None,
                                             cumsum=False,
                                             group_by_obs='curatedIdent',
                                             obs_fillna='__NaN',
                                             obs_type='mean',
                                             standard_scale=None,
                                             normalize_total=True,
                                             log1p=True,
                                             target_sum=1e6)
hvulgaris_pstrata[0]
[14]:
| ps / curatedIdent | Ec_BasalDisk | Ec_BodyCol/SC | Ec_Head | Ec_Peduncle | Ec_Tentacle | En_BodyCol/SC | En_Foot | En_Head | En_Tentacle | I_DesmoNB | ... | I_GranGl | I_ISC | I_IsoNB | I_IsoNC | I_MaleGC | I_Neuro | I_SpumMucGl | I_StenoNB | I_StenoNC | I_ZymoGl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.560994 | 0.595713 | 0.590387 | 0.581276 | 0.562467 | 0.589827 | 0.580900 | 0.581904 | 0.562355 | 0.589530 | ... | 0.598142 | 0.670901 | 0.574888 | 0.554668 | 0.635307 | 0.661613 | 0.605504 | 0.612284 | 0.553138 | 0.592371 |
| 2 | 0.511210 | 0.496005 | 0.495925 | 0.498049 | 0.499676 | 0.508730 | 0.507193 | 0.513416 | 0.521005 | 0.444571 | ... | 0.502875 | 0.469989 | 0.453770 | 0.505122 | 0.497674 | 0.467740 | 0.484118 | 0.436852 | 0.505544 | 0.496516 |
| 3 | 0.036718 | 0.033811 | 0.034563 | 0.037433 | 0.038390 | 0.040587 | 0.043163 | 0.040554 | 0.036169 | 0.028546 | ... | 0.036281 | 0.027780 | 0.028548 | 0.044000 | 0.035234 | 0.028230 | 0.036847 | 0.027264 | 0.044341 | 0.034395 |
| 4 | 0.195327 | 0.172565 | 0.173816 | 0.187209 | 0.207873 | 0.169317 | 0.175720 | 0.175979 | 0.186915 | 0.151476 | ... | 0.191275 | 0.105603 | 0.173582 | 0.177877 | 0.110640 | 0.114570 | 0.174782 | 0.149452 | 0.188667 | 0.219986 |
| 5 | 0.076601 | 0.064414 | 0.065810 | 0.067613 | 0.073942 | 0.068400 | 0.071836 | 0.071641 | 0.082624 | 0.183921 | ... | 0.065494 | 0.044966 | 0.181550 | 0.107528 | 0.053688 | 0.046995 | 0.065424 | 0.154116 | 0.119037 | 0.060822 |
| 6 | 0.137453 | 0.108081 | 0.109838 | 0.116427 | 0.129939 | 0.089112 | 0.097962 | 0.095880 | 0.108046 | 0.205680 | ... | 0.096518 | 0.066193 | 0.232879 | 0.136739 | 0.080290 | 0.069968 | 0.098696 | 0.173043 | 0.133907 | 0.123893 |
| 7 | 0.175829 | 0.164752 | 0.167630 | 0.174248 | 0.198587 | 0.126557 | 0.128408 | 0.136689 | 0.153862 | 0.231249 | ... | 0.155013 | 0.112669 | 0.181249 | 0.236929 | 0.142587 | 0.120460 | 0.127831 | 0.154073 | 0.215567 | 0.153125 |
| 8 | 0.311870 | 0.248873 | 0.260274 | 0.267582 | 0.317446 | 0.287415 | 0.318531 | 0.302624 | 0.348600 | 0.228224 | ... | 0.223682 | 0.128542 | 0.264267 | 0.338612 | 0.174242 | 0.161667 | 0.280838 | 0.255724 | 0.331506 | 0.216531 |
| 9 | 0.033367 | 0.031295 | 0.032025 | 0.035780 | 0.039247 | 0.045665 | 0.049970 | 0.042696 | 0.049330 | 0.019145 | ... | 0.023680 | 0.012810 | 0.025420 | 0.022936 | 0.026645 | 0.013645 | 0.019248 | 0.015464 | 0.023421 | 0.025568 |
| 10 | 0.087663 | 0.077823 | 0.098993 | 0.092296 | 0.078132 | 0.034018 | 0.034585 | 0.037312 | 0.040021 | 0.024821 | ... | 0.046094 | 0.023452 | 0.043020 | 0.047926 | 0.031439 | 0.035920 | 0.069036 | 0.034344 | 0.044832 | 0.037622 |
| 11 | 0.082774 | 0.055791 | 0.057948 | 0.065420 | 0.068868 | 0.098293 | 0.102523 | 0.085102 | 0.096652 | 0.039194 | ... | 0.043754 | 0.027352 | 0.045534 | 0.064072 | 0.044852 | 0.031666 | 0.049687 | 0.044516 | 0.073683 | 0.050072 |
11 rows × 32 columns
[16]:
plt.rcParams['figure.figsize'] = [6.5, 4.5]
ax = sns.lineplot(hvulgaris_pstrata[0].transpose(), palette='tab20', dashes=False)
ax.legend(fontsize=3, title='age class')
ax.set_title('H. vulgaris - Contribution of gene age classes to global TEI')
ax.set_xlabel('cell type')
ax.set_ylabel('TEI')
sns.move_legend(ax, 'upper left', bbox_to_anchor=(1, 1))
plt.xticks(rotation=90)
plt.show()
plt.rcParams['figure.figsize'] = [4.4, 3.3]
Heatmap partial TEI per gene age class
[17]:
sns.clustermap(data=hvulgaris_pstrata[0],
               row_cluster=False,
               col_cluster=True,
               cmap='viridis')
[17]:
<seaborn.matrix.ClusterGrid at 0x7fb95c3b7430>
Heatmap partial TEI cumsum per gene age class and cell type - first matrix
[18]:
sns.clustermap(data=hvulgaris_pstrata[0].cumsum(0),
               row_cluster=False,
               col_cluster=True,
               cmap='viridis')
[18]:
<seaborn.matrix.ClusterGrid at 0x7fb98ef385b0>
Heatmap partial TEI per gene age class and cell type - second matrix (frequencies)
[19]:
sns.clustermap(data=hvulgaris_pstrata[1],
               row_cluster=False,
               col_cluster=True,
               cmap='viridis')
[19]:
<seaborn.matrix.ClusterGrid at 0x7fb98eeb4040>
Heatmap partial TEI cumsum per gene age class and cell type - second matrix (frequencies)
[20]:
sns.clustermap(data=hvulgaris_pstrata[1].cumsum(0),
               row_cluster=False,
               col_cluster=True,
               cmap='viridis')
[20]:
<seaborn.matrix.ClusterGrid at 0x7fb95770e610>
Color UMAP/TSNE by TEI
Following the basic tutorial of the Scanpy python toolkit (Wolf et al., 2018), one can highlight TEI values on a dimensional reduction of the scRNA dataset, like PCA, UMAP or TSNE.
Filtering
[21]:
sc.pp.filter_genes(hvulgaris_data, min_cells=3)
sc.pp.filter_cells(hvulgaris_data, min_genes=200)
Normalization, Log transformation and Scaling
[22]:
sc.pp.normalize_total(hvulgaris_data, target_sum=1e6)
sc.pp.log1p(hvulgaris_data)
sc.pp.scale(hvulgaris_data, max_value=10)
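For orientation, the normalization and log transformation above can be written out for a single cell: the counts are rescaled so that they sum to target_sum (here 1e6, i.e. counts per million) and then transformed with log(1 + x). The plain-Python sketch below uses hypothetical counts and is not the scanpy implementation; the subsequent sc.pp.scale step (zero mean and unit variance per gene, clipped at max_value) is not shown.

```python
import math


def normalize_total(counts, target_sum=1e6):
    """Rescale the counts of one cell so that they sum to `target_sum`."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [c * target_sum / total for c in counts]


# Toy counts of one cell for three genes (hypothetical).
counts = [1, 3, 6]

cpm = normalize_total(counts, target_sum=1e6)
log_cpm = [math.log1p(v) for v in cpm]

print(cpm)
print(log_cpm)
```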
PCA and Neighbor calculations
[23]:
sc.tl.pca(hvulgaris_data, svd_solver='arpack')
sc.pl.pca(hvulgaris_data, color=['curatedIdent', 'tei'])
[24]:
sc.pp.neighbors(hvulgaris_data)
Embedding the neighborhood graph
[25]:
plt.rcParams['figure.figsize'] = [6.5, 4.5]
sc.tl.paga(hvulgaris_data, groups='curatedIdent')
sc.pl.paga(hvulgaris_data, title='H. vulgaris - cell type - PAGA graph')
plt.rcParams['figure.figsize'] = [4.4, 3.3]
[26]:
plt.rcParams['figure.figsize'] = [6.5, 4.5]
sc.pl.paga(hvulgaris_data, title='H. vulgaris - cell type - PAGA graph', color=['tei'])
plt.rcParams['figure.figsize'] = [4.4, 3.3]
UMAP
[27]:
sc.tl.umap(hvulgaris_data,
           init_pos='paga')
sc.pl.umap(hvulgaris_data,
           title='H. vulgaris - cell type - UMAP',
           color=['curatedIdent'])
Color UMAP by TEI
[28]:
#plt.rcParams['figure.figsize'] = [7.5, 4.5]
sc.pl.umap(hvulgaris_data,
           title='H. vulgaris - TEI - UMAP',
           color=['tei'],
           color_map='viridis',
           vmin='p5',
           vmax='p95')
#plt.rcParams['figure.figsize'] = [6, 4.5]
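In scanpy plotting functions, strings such as vmin='p5' and vmax='p95' set the colour limits to the 5th and 95th percentile of the plotted values, which keeps a few extreme cells from dominating the colour scale. The sketch below shows the equivalent computation on toy values using only the standard library; the values are hypothetical, and linear interpolation between data points is assumed.

```python
from statistics import quantiles

# Toy TEI values (hypothetical): 0.0, 1.0, ..., 100.0
tei_values = [float(v) for v in range(101)]

# 99 cut points; index 4 is the 5th and index 94 the 95th percentile.
cuts = quantiles(tei_values, n=100, method='inclusive')
p5, p95 = cuts[4], cuts[94]

# Values outside [p5, p95] are drawn with the colours of the limits.
clipped = [min(max(v, p5), p95) for v in tei_values]

print(p5, p95)
```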
3D-UMAP
[29]:
plt.rcParams['figure.figsize'] = [7.5, 4.5]
# 3D UMAP coloured by cell type
sc.tl.umap(hvulgaris_data,
           n_components=3)
sc.pl.umap(hvulgaris_data,
           title='H. vulgaris - cell type - UMAP',
           color=['curatedIdent'],
           projection='3d')
plt.rcParams['figure.figsize'] = [4.4, 3.3]
[30]:
plt.rcParams['figure.figsize'] = [7.5, 4.5]
# 3D UMAP coloured by TEI
sc.pl.umap(hvulgaris_data,
           title='H. vulgaris - TEI - UMAP',
           color=['tei'],
           color_map='viridis',
           vmin='p5',
           vmax='p95',
           projection='3d')
plt.rcParams['figure.figsize'] = [4.4, 3.3]
Please have a look at the documentation for other case studies.