Tutorials
This section contains a variety of tutorials that should help get you started with the oggmap package.
Getting started
If you are running oggmap for the first time, we recommend either following the individual oggmap steps or starting with the zebrafish case study (using OrthoFinder results) or the nematode case study (using pre-calculated orthomaps). Both case studies cover all essential steps.
Pre-calculated orthomaps
In addition to extracting gene age classes from OrthoFinder results, oggmap can parse and extract gene age classes from pre-calculated orthologous group databases, like eggNOG or plaza.
If your query species is part of one of these databases, it might be sufficient to use the gene age classes directly from them and skip the time-consuming step of orthologous group detection with OrthoFinder or any other related tool (see benchmark of tools at Quest for Orthologs).
Note
Gene age class assignment for any query species relies on taxonomic sampling that ideally covers every species tree node from the root (origin of life) up to the query species. The pre-calculated orthologous group databases might lack species information for certain tree nodes. Orthologous group detection algorithms do not account for missing species, so such gaps influence the taxonomic completeness score.
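To make the effect of taxonomic sampling visible, one can count how many genes are assigned to each node of the query lineage. The following is a minimal sketch on toy data, not part of the oggmap API: the helper name `lineage_coverage` is hypothetical, and the only assumption about the orthomap is a `PStaxID` column as in the orthomap tables shown in this section.

```python
import pandas as pd


def lineage_coverage(orthomap, lineage_taxids):
    """Count genes assigned to each node of the query lineage.

    Hypothetical helper; assumes the orthomap has a 'PStaxID' column.
    """
    counts = orthomap['PStaxID'].value_counts()
    return pd.Series(
        [int(counts.get(taxid, 0)) for taxid in lineage_taxids],
        index=lineage_taxids,
        name='gene_count',
    )


# toy orthomap and a shortened toy lineage (root ... query species)
toy_orthomap = pd.DataFrame({
    'seqID': ['g1', 'g2', 'g3', 'g4'],
    'PStaxID': [131567, 131567, 2759, 6239],
})
toy_lineage = [1, 131567, 2759, 33154, 6239]

coverage = lineage_coverage(toy_orthomap, toy_lineage)
# lineage nodes without any assigned gene
empty_nodes = coverage[coverage == 0].index.tolist()
print(coverage)
print(empty_nodes)
```

An empty node does not by itself prove missing species, since it is also possible that no gene family originated at that node. It is only a hint to check which species the database provides for that part of the tree.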
Note
To link gene age classes and expression data, use the same genome annotation version for both the orthologous group detection and the gene expression counting. Using the same genome annotation ensures that no gene is missed in either data set and reduces errors during gene ID mapping.
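A quick way to check whether two data sets are based on compatible annotations is to compare their gene IDs before any mapping. The sketch below uses plain Python sets on toy IDs. The function name `id_overlap` and the example IDs are illustrative assumptions, not oggmap functions or real data.

```python
def id_overlap(orthomap_ids, expression_ids):
    """Summarize the overlap between two collections of gene IDs.

    Hypothetical helper for illustration only.
    """
    orthomap_ids = set(orthomap_ids)
    expression_ids = set(expression_ids)
    shared = orthomap_ids & expression_ids
    return {
        'shared': len(shared),
        'only_orthomap': len(orthomap_ids - expression_ids),
        'only_expression': len(expression_ids - orthomap_ids),
        # fraction of expression IDs that also have a gene age class
        'fraction_expression_covered': (
            len(shared) / len(expression_ids) if expression_ids else 0.0
        ),
    }


# toy IDs; in practice these come from the orthomap and the scRNA data
toy_orthomap_ids = ['geneA', 'geneB', 'geneC']
toy_expression_ids = ['geneA', 'geneB', 'geneD', 'geneE']

report = id_overlap(toy_orthomap_ids, toy_expression_ids)
print(report)
```

A low covered fraction usually points to different annotation versions or to different ID types (gene versus transcript versus protein IDs) rather than to genuinely missing genes.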
eggNOG database version 6.0 orthomaps
includes 1322 species
Extracted orthomaps for all Eukaryota from eggNOG database version 6.0 can be downloaded here:
eggnog6_eukaryota_orthomaps.tsv.zip
# to get eggnog6_eukaryota_orthomaps on Linux run:
wget https://zenodo.org/records/14911022/files/eggnog6_eukaryota_orthomaps.tsv.zip
# on Mac:
curl https://zenodo.org/records/14911022/files/eggnog6_eukaryota_orthomaps.tsv.zip --remote-name
To get an orthomap for e.g. the species Caenorhabditis elegans (taxID: 6239):
import pandas as pd
from oggmap import qlin, gtf2t2g, of2orthomap, orthomap2tei, datasets, ncbitax
eggnog6_eukaryota_orthomaps = pd.read_csv('eggnog6_eukaryota_orthomaps.tsv.zip', delimiter='\t')
query_lineage = qlin.get_qlin(q='Caenorhabditis elegans', dbname='taxadb.sqlite')
>>> query name: Caenorhabditis elegans
query taxID: 6239
query kingdom: Eukaryota
query lineage names:
['root(1)', 'cellular organisms(131567)', 'Eukaryota(2759)', 'Opisthokonta(33154)',
'Metazoa(33208)', 'Eumetazoa(6072)', 'Bilateria(33213)', 'Protostomia(33317)',
'Ecdysozoa(1206794)', 'Nematoda(6231)', 'Chromadorea(119089)', 'Rhabditida(6236)',
'Rhabditina(2301116)', 'Rhabditomorpha(2301119)', 'Rhabditoidea(55879)',
'Rhabditidae(6243)', 'Peloderinae(55885)', 'Caenorhabditis(6237)', 'Caenorhabditis elegans(6239)']
query lineage:
[1, 131567, 2759, 33154, 33208, 6072, 33213, 33317, 1206794, 6231, 119089, 6236,
2301116, 2301119, 55879, 6243, 55885, 6237, 6239]
# query_lineage[1] holds the query taxID (here: 6239)
query_orthomap = eggnog6_eukaryota_orthomaps[eggnog6_eukaryota_orthomaps['taxID']==query_lineage[1]]
query_orthomap
>>> taxID name seqID ... PStaxID PSname PScontinuity
13301320 6239 Caenorhabditis elegans 6239.C55B7.6a.1 ... 131567 cellular organisms 1.0
13301321 6239 Caenorhabditis elegans 6239.F14D12.5.1 ... 131567 cellular organisms 1.0
13301322 6239 Caenorhabditis elegans 6239.F41D9.5.1 ... 131567 cellular organisms 1.0
13301323 6239 Caenorhabditis elegans 6239.K12G11.1.1 ... 131567 cellular organisms 1.0
13301324 6239 Caenorhabditis elegans 6239.K12G11.2.1 ... 131567 cellular organisms 1.0
... ... ... ... ... ... ... ...
13319237 6239 Caenorhabditis elegans 6239.R09E12.8.1 ... 6237 Caenorhabditis 1.0
13319238 6239 Caenorhabditis elegans 6239.F39H2.1.1 ... 119089 Chromadorea 1.0
13319239 6239 Caenorhabditis elegans 6239.C32D5.9.1 ... 2759 Eukaryota 1.0
13319240 6239 Caenorhabditis elegans 6239.ZK593.6a.1 ... 2759 Eukaryota 1.0
13319241 6239 Caenorhabditis elegans 6239.F29C12.3b.1 ... 6231 Nematoda 1.0
[17922 rows x 8 columns]
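Once the orthomap of a single species has been extracted, a natural next step is to count how many genes fall into each gene age class. The following is a minimal pandas sketch on toy data. The helper name `genes_per_age_class` is hypothetical, and it only assumes the `taxID`, `PStaxID` and `PSname` columns visible in the output above.

```python
import pandas as pd


def genes_per_age_class(orthomaps, taxid):
    """Count genes per gene age class for one species.

    Hypothetical helper; assumes 'taxID', 'PStaxID' and 'PSname' columns.
    """
    query = orthomaps[orthomaps['taxID'] == taxid]
    return (
        query.groupby(['PStaxID', 'PSname'])
        .size()
        .reset_index(name='gene_count')
    )


# toy table with two species; real data would be the downloaded orthomaps
toy_orthomaps = pd.DataFrame({
    'taxID': [6239, 6239, 6239, 7955],
    'seqID': ['g1', 'g2', 'g3', 'g4'],
    'PStaxID': [131567, 131567, 2759, 2759],
    'PSname': ['cellular organisms', 'cellular organisms',
               'Eukaryota', 'Eukaryota'],
})

summary = genes_per_age_class(toy_orthomaps, 6239)
print(summary)
```

With the real table, the same call could be made with `eggnog6_eukaryota_orthomaps` and the query taxID.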
plaza database version 5.0 orthomaps
The plaza database offers two different sets of gene family clusters, either homologous (HOMFAM) or orthologous gene families (ORTHOFAM).
plaza dicots database version 5.0
includes 98 species
Extracted orthomaps for all dicots (HOMFAM and ORTHOFAM) from plaza dicots database version 5.0 can be downloaded here:
plaza_v5_dicots_HOMFAM_orthomaps.tsv.zip
plaza_v5_dicots_ORTHOFAM_orthomaps.tsv.zip
# to get extracted orthomaps for all dicots on Linux run:
wget https://zenodo.org/records/14911022/files/plaza_v5_dicots_HOMFAM_orthomaps.tsv.zip
wget https://zenodo.org/records/14911022/files/plaza_v5_dicots_ORTHOFAM_orthomaps.tsv.zip
# on Mac:
curl https://zenodo.org/records/14911022/files/plaza_v5_dicots_HOMFAM_orthomaps.tsv.zip --remote-name
curl https://zenodo.org/records/14911022/files/plaza_v5_dicots_ORTHOFAM_orthomaps.tsv.zip --remote-name
plaza monocots database version 5.0
includes 52 species
Extracted orthomaps for all monocots (HOMFAM and ORTHOFAM) from plaza monocots database version 5.0 can be downloaded here:
plaza_v5_monocots_HOMFAM_orthomaps.tsv.zip
plaza_v5_monocots_ORTHOFAM_orthomaps.tsv.zip
# to get extracted orthomaps for all monocots on Linux run:
wget https://zenodo.org/records/14911022/files/plaza_v5_monocots_HOMFAM_orthomaps.tsv.zip
wget https://zenodo.org/records/14911022/files/plaza_v5_monocots_ORTHOFAM_orthomaps.tsv.zip
# on Mac:
curl https://zenodo.org/records/14911022/files/plaza_v5_monocots_HOMFAM_orthomaps.tsv.zip --remote-name
curl https://zenodo.org/records/14911022/files/plaza_v5_monocots_ORTHOFAM_orthomaps.tsv.zip --remote-name
To get an orthomap for e.g. the species Arabidopsis thaliana (taxID: 3702):
import pandas as pd
from oggmap import qlin, gtf2t2g, of2orthomap, orthomap2tei, datasets
plaza_v5_dicots_HOMFAM_orthomaps = pd.read_csv('plaza_v5_dicots_HOMFAM_orthomaps.tsv.zip', delimiter='\t')
query_lineage = qlin.get_qlin(q='Arabidopsis thaliana', dbname='taxadb.sqlite')
>>> query name: Arabidopsis thaliana
query taxID: 3702
query kingdom: Eukaryota
query lineage names:
['root(1)', 'cellular organisms(131567)', 'Eukaryota(2759)', 'Viridiplantae(33090)',
'Streptophyta(35493)', 'Streptophytina(131221)', 'Embryophyta(3193)', 'Tracheophyta(58023)',
'Euphyllophyta(78536)', 'Spermatophyta(58024)', 'Magnoliopsida(3398)', 'Mesangiospermae(1437183)',
'eudicotyledons(71240)', 'Gunneridae(91827)', 'Pentapetalae(1437201)', 'rosids(71275)',
'malvids(91836)', 'Brassicales(3699)', 'Brassicaceae(3700)', 'Camelineae(980083)',
'Arabidopsis(3701)', 'Arabidopsis thaliana(3702)']
query lineage:
[1, 131567, 2759, 33090, 35493, 131221, 3193, 58023, 78536, 58024, 3398, 1437183,
71240, 91827, 1437201, 71275, 91836, 3699, 3700, 980083, 3701, 3702]
# query_lineage[1] holds the query taxID (here: 3702)
query_orthomap = plaza_v5_dicots_HOMFAM_orthomaps[plaza_v5_dicots_HOMFAM_orthomaps['taxID']==query_lineage[1]]
query_orthomap
>>> shortname common_name taxID ... PStaxID PSname PScontinuity
290600 ath Arabidopsis_thaliana 3702 ... 131221 Streptophytina 1.0
290601 ath Arabidopsis_thaliana 3702 ... 131221 Streptophytina 1.0
290602 ath Arabidopsis_thaliana 3702 ... 131221 Streptophytina 1.0
290603 ath Arabidopsis_thaliana 3702 ... 131221 Streptophytina 1.0
290604 ath Arabidopsis_thaliana 3702 ... 131221 Streptophytina 1.0
... ... ... ... ... ... ... ...
318250 ath Arabidopsis_thaliana 3702 ... 3702 Arabidopsis thaliana 1.0
318251 ath Arabidopsis_thaliana 3702 ... 3702 Arabidopsis thaliana 1.0
318252 ath Arabidopsis_thaliana 3702 ... 3702 Arabidopsis thaliana 1.0
318253 ath Arabidopsis_thaliana 3702 ... 3702 Arabidopsis thaliana 1.0
318254 ath Arabidopsis_thaliana 3702 ... 3702 Arabidopsis thaliana 1.0
[27655 rows x 9 columns]
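Since plaza provides both HOMFAM and ORTHOFAM clusters, it can be informative to compare the gene age class that each set assigns to the same gene. The sketch below merges two toy orthomaps on a gene identifier. Note that the gene identifier column is elided in the output above, so the column name `geneID` used here is a placeholder assumption, and `compare_assignments` is a hypothetical helper, not an oggmap function.

```python
import pandas as pd


def compare_assignments(homfam, orthofam, gene_col='geneID'):
    """Merge two orthomaps and flag genes with identical age class.

    'gene_col' is a placeholder name; adjust it to the real column.
    """
    merged = homfam[[gene_col, 'PSname']].merge(
        orthofam[[gene_col, 'PSname']],
        on=gene_col,
        suffixes=('_HOMFAM', '_ORTHOFAM'),
    )
    merged['same_class'] = (
        merged['PSname_HOMFAM'] == merged['PSname_ORTHOFAM']
    )
    return merged


# toy orthomaps; g3 is only present in the HOMFAM set
toy_homfam = pd.DataFrame({
    'geneID': ['g1', 'g2', 'g3'],
    'PSname': ['Streptophytina', 'Arabidopsis thaliana', 'Embryophyta'],
})
toy_orthofam = pd.DataFrame({
    'geneID': ['g1', 'g2'],
    'PSname': ['Streptophytina', 'Brassicaceae'],
})

comparison = compare_assignments(toy_homfam, toy_orthofam)
print(comparison)
```

The default inner merge keeps only genes present in both sets, so genes that lack an assignment in one set do not appear in the comparison.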
oggmap - Steps
This section covers the main steps of oggmap, from extracting gene age information for a query species to linking the extracted gene age classes with the expression data of single-cell data sets.
Step 0 - run OrthoFinder: This tutorial introduces how to run your own OrthoFinder analysis.
Step 1 - get taxonomic information: This tutorial introduces how to get taxonomic information.
Step 2 - gene age class assignment: This tutorial introduces how to extract an orthomap (gene age class) from OrthoFinder results or how to import pre-calculated orthomaps.
Step 3 - map gene/transcript IDs: This tutorial introduces how to match gene or transcript IDs between an orthomap and scRNA data.
Step 4 - TEI calculation: This tutorial introduces how to add a transcriptome evolutionary index (short: TEI) to scRNA data.
Step 4 - other evolutionary indices: This tutorial introduces how to use other evolutionary indices like nucleotide diversity to calculate TEI.
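As background for Step 4, a transcriptome index of this kind is commonly defined as the expression-weighted mean of the gene age classes: for a cell c, the sum over genes of (gene age class x expression) divided by the total expression of that cell. The NumPy sketch below illustrates this definition on toy data. It is not the oggmap implementation, and the function name `tei_per_cell` as well as the cells x genes layout are assumptions made for this example.

```python
import numpy as np


def tei_per_cell(expression, gene_age):
    """Expression-weighted mean gene age class per cell.

    expression: array of shape (cells, genes)
    gene_age:   array of shape (genes,)
    Illustrative sketch, not the oggmap implementation.
    """
    expression = np.asarray(expression, dtype=float)
    gene_age = np.asarray(gene_age, dtype=float)
    total = expression.sum(axis=1)
    weighted = expression @ gene_age
    # cells without any counts get NaN instead of a division error
    return np.divide(
        weighted, total,
        out=np.full_like(total, np.nan),
        where=total > 0,
    )


# toy data: 4 cells x 3 genes, gene age classes 1 (old) to 3 (young)
toy_expression = np.array([
    [10, 0, 0],
    [0, 0, 10],
    [5, 5, 0],
    [0, 0, 0],
])
toy_gene_age = [1, 2, 3]

tei = tei_per_cell(toy_expression, toy_gene_age)
print(tei)
```

A cell that expresses only old genes gets a low value, a cell that expresses only young genes gets a high value, and mixed expression lies in between. The same weighting works with any other per-gene measure in place of the gene age class.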
oggmap - Downstream analysis
This section contains different downstream analysis options (Step 5).
plotting results: This tutorial introduces some basic concepts of plotting results.
relative expression: This tutorial introduces relative expression per gene age class and its contribution to the global TEI per cell or cell type.
partial TEI values: This tutorial introduces partial TEI and its contribution to the global TEI per cell or cell type.
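The idea behind partial values can be sketched as follows: each gene contributes (gene age class x expression) / (total expression of the cell) to the index of that cell, and summing these contributions per gene age class shows how much each class adds to the global value. The NumPy example below illustrates this on toy data. It is a sketch under these assumptions, not the oggmap implementation, and the function name `partial_tei` is hypothetical.

```python
import numpy as np


def partial_tei(expression, gene_age):
    """Contribution of each gene age class to the index of each cell.

    expression: array of shape (cells, genes), every cell with counts > 0
    gene_age:   array of shape (genes,)
    Returns the sorted classes and an array of shape (cells, classes).
    Illustrative sketch, not the oggmap implementation.
    """
    expression = np.asarray(expression, dtype=float)
    gene_age = np.asarray(gene_age)
    classes = np.unique(gene_age)
    total = expression.sum(axis=1, keepdims=True)
    # per-gene contribution to the index of each cell
    contrib = expression * gene_age / total
    partial = np.stack(
        [contrib[:, gene_age == k].sum(axis=1) for k in classes],
        axis=1,
    )
    return classes, partial


# toy data: 3 cells x 3 genes, two gene age classes
toy_expression = np.array([
    [10, 0, 0],
    [5, 5, 0],
    [2, 2, 6],
])
toy_gene_age = [1, 2, 2]

classes, partial = partial_tei(toy_expression, toy_gene_age)
print(classes)
print(partial)
# the row sums give the global index per cell
print(partial.sum(axis=1))
```

Because the contributions of all classes add up to the global value, a cumulative sum along the class axis shows at which gene age class most of the signal of a cell is reached.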
Case studies
hematopoiesis - case study: Notebook - Mus musculus hematopoiesis scRNA data example.
nematode embryogenesis - case study: Notebook - Caenorhabditis elegans embryogenesis scRNA data example.
zebrafish embryogenesis - case study: Notebook - Danio rerio embryogenesis scRNA data example.
frog embryogenesis - case study: Notebook - Xenopus tropicalis embryogenesis scRNA data example.
mouse - case study: Notebook - Mus musculus embryogenesis scRNA data example.
hydra - case study: Notebook - Hydra vulgaris cell atlas scRNA data example.
Note
A demo dataset is available for each of the tutorial notebooks above. These datasets allow you to begin exploring oggmap even if you do not have any data at any step in the analysis pipeline.
Command line functions
Command line functions: This section highlights all oggmap functions that can be run via the command line.
myTAI - Function correspondence
Correspondence of myTAI and oggmap functions: This tutorial covers which oggmap functions correspond to myTAI functions.
Prerequisites
This tutorial assumes that you have basic Python programming experience. In particular, we assume you are familiar with using Jupyter notebooks.
To better understand plotting and data access, we recommend getting familiar with the Python libraries pandas, matplotlib and seaborn.
oggmap is a Python package, but part of it can be run on the command line. For the installation of oggmap, we recommend using Anaconda (see here). If you are not familiar with Anaconda or Python environment management, please use our pre-built Docker image.
Code and data availability
We provide links for the notebook in each tutorial section.
You can download the demo input data in the notebooks using the oggmap.datasets module.