oggmap.orthomap2tei module

Author: Kristian K Ullrich
Date: February 2025
Email: ullrich@evolbio.mpg.de
License: GPL-3
oggmap.orthomap2tei.add_gene_age2adata_var(adata, gene_id, gene_age, keep='min', var_name='Phylostrata')
This function adds gene age values to an existing AnnData object.

Parameters:

adata (AnnData) – AnnData object of shape n_obs × n_vars. Rows correspond to cells and columns to genes.

gene_id (list) – Expects GeneID column from orthomap DataFrame.

gene_age (list) – Expects Phylostratum column from orthomap DataFrame.

keep (str) – In case of duplicated GeneIDs with different Phylostrata assignments, either keep ‘min’ or ‘max’ value.

var_name (str) – Variable name to be used for gene age values in existing AnnData object.

Returns:

Altered AnnData.

Return type:

AnnData

Example
>>> import scanpy as sc
>>> from oggmap import datasets, orthomap2tei
>>> # download pre-calculated orthomap
>>> #query_orthomap = orthomap2tei.read_orthomap(orthomapfile='Sun2021_Orthomap.tsv')
>>> sun21_orthomap_file = datasets.sun21_orthomap(datapath='.')
>>> # load query species orthomap
>>> query_orthomap = orthomap2tei.read_orthomap(orthomapfile=sun21_orthomap_file)
>>> # download and load scRNA data
>>> #packer19_small = sc.read('packer19_small.h5ad')
>>> packer19_small = datasets.packer19_small(datapath='.')
>>> # add gene age values to existing adata object
>>> orthomap2tei.add_gene_age2adata_var(
>>>     adata=packer19_small,
>>>     gene_id=query_orthomap['GeneID'],
>>>     gene_age=query_orthomap['Phylostratum'])
>>> packer19_small.var
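The effect of the keep parameter on duplicated GeneIDs can be illustrated with plain pandas. This is a minimal sketch under stated assumptions, not oggmap's implementation: the toy orthomap, the helper name resolve_duplicates and the gene names in var_names are hypothetical.

```python
import pandas as pd

# Hypothetical orthomap in which GeneID 'g2' is assigned to two phylostrata.
orthomap = pd.DataFrame({
    'GeneID': ['g1', 'g2', 'g2', 'g3'],
    'Phylostratum': [1, 3, 5, 2],
})


def resolve_duplicates(df, keep='min'):
    """Keep one phylostratum per GeneID: the smallest ('min') or the largest ('max')."""
    if keep not in ('min', 'max'):
        raise ValueError("keep must be 'min' or 'max'")
    grouped = df.groupby('GeneID')['Phylostratum']
    return grouped.min() if keep == 'min' else grouped.max()


age_min = resolve_duplicates(orthomap, keep='min')
age_max = resolve_duplicates(orthomap, keep='max')

# Align the gene ages with (hypothetical) variable names; genes without an
# assignment end up as NaN.
var_names = pd.Index(['g3', 'g2', 'g9'])
ages = age_min.reindex(var_names)
print(ages)
```

Genes that are absent from the orthomap ('g9' here) receive NaN in this sketch.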
oggmap.orthomap2tei.geneset_overlap(geneset1, geneset2)
This function shows the overlap of two lists, e.g. to check <GeneID> from an orthomap against <adata.var_names> from an AnnData object.

Parameters:

geneset1 (list) – List of gene or transcript names set 1.

geneset2 (list) – List of gene or transcript names set 2.

Returns:

Overlap.

Return type:

pandas.DataFrame

Example
>>> from oggmap import orthomap2tei
>>> geneset1 = ['g1.1', 'g1.2', 'g2.1', 'g3.1', 'g3.2']
>>> geneset2 = ['g1.1', 'g2.1', 'g3.1']
>>> orthomap2tei.geneset_overlap(geneset1, geneset2)
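What such an overlap check amounts to can be sketched with a set lookup and a small summary table. The column names of the summary DataFrame are assumptions chosen for illustration and may differ from what geneset_overlap actually returns.

```python
import pandas as pd

geneset1 = ['g1.1', 'g1.2', 'g2.1', 'g3.1', 'g3.2']
geneset2 = ['g1.1', 'g2.1', 'g3.1']

# Test membership against a set, but keep the order of geneset1.
lookup = set(geneset2)
shared = [g for g in geneset1 if g in lookup]

# Hypothetical summary layout (column names are illustrative only).
summary = pd.DataFrame({
    'n_geneset1': [len(set(geneset1))],
    'n_geneset2': [len(set(geneset2))],
    'n_shared': [len(shared)],
    'fraction_of_geneset1': [len(shared) / len(set(geneset1))],
})
print(summary)
```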
oggmap.orthomap2tei.get_bins(tobin_df, bincol, q=[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9], method='median_unbiased')

This function sorts the values of a DataFrame column and returns the bins (categories) into which they fall.

Parameters:

tobin_df (pandas.DataFrame) – DataFrame which contains the column that should be binned.

bincol (str) – Name of the column that should be binned.

q (list) – Array of inner quantiles to be used for binning. Border quantiles will be set automatically.

method (str) – This parameter specifies the method to use for estimating the quantile.

Returns:

DataFrame with an additional column which contains the binned values as categories.

Return type:

pandas.DataFrame
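Quantile-based binning of this kind can be sketched with numpy.quantile and pandas.cut. This is an illustration of the idea only, not the code behind get_bins: the toy data, the column name expr_binned and the use of pandas.cut are assumptions (the method argument of numpy.quantile requires NumPy 1.22 or newer).

```python
import numpy as np
import pandas as pd

# Hypothetical data: twenty expression values in one column.
tobin_df = pd.DataFrame({'expr': np.arange(1.0, 21.0)})
q = [0.25, 0.5, 0.75]  # inner quantiles

# Add the border quantiles 0 and 1, then estimate the bin edges.
edges = np.quantile(tobin_df['expr'], [0.0] + q + [1.0], method='median_unbiased')

# Assign every value to a bin; include_lowest keeps the minimum in the first bin.
tobin_df['expr_binned'] = pd.cut(tobin_df['expr'], bins=edges, include_lowest=True)
print(tobin_df['expr_binned'].value_counts(sort=False))
```

With heavily tied data the estimated edges may not be unique, in which case pandas.cut raises an error unless duplicates are handled explicitly.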
oggmap.orthomap2tei.get_e50(adata, gene_id, gene_age, keep='min', layer=None, group_by_var=None, var_type='mean', group_by_obs=None, obs_type='mean', standard_scale=None, normalize_total=False, log1p=False, target_sum=1000000.0, min_expr=1, max_expr=None)
Parameters:

adata

gene_id

gene_age

keep

layer

group_by_var

var_type

group_by_obs

obs_type

standard_scale

normalize_total

log1p

target_sum

min_expr

max_expr

Returns:

Example
>>>
oggmap.orthomap2tei.get_ematrix(adata, layer=None, group_by_var=None, var_type='mean', var_fillna='__NaN', group_by_obs=None, obs_type='mean', obs_fillna='__NaN', standard_scale=None, normalize_total=True, log1p=True, target_sum=1000000.0, chunk_size=100000)
This function computes expression profiles for all genes or group of genes ‘group_by_var’ (default: None).

The expression values are first combined per var type ‘var_type’ (default: mean).

The resulting values can be combined per observation group ‘group_by_obs’, e.g. pre-defined cell types (default: None), according to the selected observation type ‘obs_type’ (default: ‘mean’), and further scaled between 0 and 1 (default: None), either per var (standard_scale=0) or per obs (standard_scale=1).

In detail, if standard_scale axis is set to None, the var_type mean/median/sum expression is being computed over cells and, if group_by_obs is not None, combined per given obs group by mean/median/sum.

In detail, if standard_scale axis is set to 0, the mean/median/sum relative expression profile is being computed over cells and, if group_by_obs is not None, combined per given obs group by mean/median/sum as follows:

f_c = (e_c - e_min)/(e_max - e_min)

where e_min and e_max denote the minimum and maximum mean/median/sum expression level over cells c.

In detail, if standard_scale axis is set to 1, the mean/median/sum relative expression profile is being computed over gene age classes (phylostrata) and, if group_by_obs is not None, combined per given obs group by mean/median/sum as follows:

f_ps = (e_ps - e_min)/(e_max - e_min)

where e_min and e_max denote the minimum and maximum mean/median/sum expression level over gene age classes (phylostrata ps).

This linear transformation corresponds to a shift by e_min and a subsequent rescaling by e_max - e_min. As a result, the relative expression level f_c of the cell c, or f_ps of the phylostratum ps, with minimum e_c or e_ps is 0, whereas that of the cell or phylostratum with maximum e_c or e_ps is 1; the relative expression levels of all other cells c or phylostrata ps lie between 0 and 1.
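The min-max transformation f = (e - e_min)/(e_max - e_min) can be written in a few lines of pandas. The following is a sketch with a hypothetical profile and a hypothetical helper min_max; which of the two axes corresponds to standard_scale=0 or standard_scale=1 in oggmap should be checked against the parameter description.

```python
import pandas as pd

# Hypothetical profile: rows are phylostrata, columns are observation groups.
profile = pd.DataFrame(
    [[2.0, 4.0, 6.0], [1.0, 1.5, 3.0]],
    index=['ps1', 'ps2'], columns=['t1', 't2', 't3'])


def min_max(df, axis):
    """Rescale to [0, 1]: axis=0 within each column, axis=1 within each row."""
    lo = df.min(axis=axis)
    hi = df.max(axis=axis)
    other = 1 - axis  # axis along which lo and hi are aligned
    return df.sub(lo, axis=other).div(hi - lo, axis=other)


by_row = min_max(profile, axis=1)
by_col = min_max(profile, axis=0)
print(by_row)
```

A row or column with constant values has e_max - e_min = 0 and yields NaN in this sketch.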

Parameters:

adata (AnnData) – AnnData object of shape n_obs × n_vars. Rows correspond to cells and columns to genes.

layer (str) – Layer to work on instead of X. If None, X is used.

group_by_var (str) – AnnData variable to be used as a group to combine count values.

var_type (str) – Specify how values should be combined per variable group. Possible values are ‘mean’, ‘median’, ‘sum’, ‘min’ and ‘max’.

var_fillna (str) – Specify how NaN values should be named for variable.

group_by_obs (str) – AnnData observation to be used as a group to combine count values.

obs_type (str) – Specify how values should be combined per observation group. Possible values are ‘mean’, ‘median’, ‘sum’, ‘min’ and ‘max’.

obs_fillna (str) – Specify how NaN values should be named for observation.

standard_scale (int) – Whether or not to standardize the given axis (0: columns, 1: rows) between 0 and 1, meaning for each variable or group, subtract the minimum and divide by the difference between maximum and minimum.

normalize_total (bool) – Normalize counts per cell.

log1p (bool) – Logarithmize the data matrix.

target_sum (float) – After normalization, each observation (cell) has a total count equal to target_sum.

chunk_size (int) – Number of chunks.

Returns:

Expression profile DataFrame.

Return type:

pandas.DataFrame

Example
>>> import scanpy as sc
>>> import matplotlib.pyplot as plt
>>> import seaborn as sns
>>> from oggmap import orthomap2tei, datasets
>>> # download pre-calculated orthomap
>>> #query_orthomap = orthomap2tei.read_orthomap(orthomapfile='Sun2021_Orthomap.tsv')
>>> sun21_orthomap_file = datasets.sun21_orthomap(datapath='.')
>>> # load query species orthomap
>>> query_orthomap = orthomap2tei.read_orthomap(orthomapfile=sun21_orthomap_file)
>>> # download and load scRNA data
>>> #packer19_small = sc.read('packer19_small.h5ad')
>>> packer19_small = datasets.packer19_small(datapath='.')
>>> # add gene age values to existing adata object
>>> orthomap2tei.add_gene_age2adata_var(
>>>     adata=packer19_small,
>>>     gene_id=query_orthomap['GeneID'],
>>>     gene_age=query_orthomap['Phylostratum'])
>>> packer19_small_ematrix_grouped = orthomap2tei.get_ematrix(
>>>     adata=packer19_small,
>>>     group_by_var='Phylostrata',
>>>     group_by_obs='embryo.time.bin')
>>> packer19_small_ematrix_grouped.transpose().plot.line(stacked=True, cmap='Accent')
>>> plt.show()
>>> sns.heatmap(packer19_small_ematrix_grouped, annot=True, cmap='viridis')
>>> plt.show()
>>> # normalize counts (transcript per million - tpm)
>>> packer19_small_ematrix_grouped_tpm = orthomap2tei.get_ematrix(
>>>     adata=packer19_small,
>>>     group_by_var='Phylostrata',
>>>     group_by_obs='embryo.time.bin',
>>>     normalize_total=True)
>>> packer19_small_ematrix_grouped_tpm.transpose().plot.line(stacked=True, cmap='Accent')
>>> plt.show()
>>> sns.heatmap(packer19_small_ematrix_grouped_tpm, annot=True, cmap='viridis')
>>> plt.show()
oggmap.orthomap2tei.get_geneset_overlap(geneset1, geneset2)
This function returns the overlap of two lists, e.g. to check <GeneID> from an orthomap against <adata.var_names> from an AnnData object.

Parameters:

geneset1 (list) – List of gene or transcript names set 1.

geneset2 (list) – List of gene or transcript names set 2.

Returns:

Overlap.

Return type:

list

Example
>>> from oggmap import orthomap2tei
>>> geneset1 = ['g1.1', 'g1.2', 'g2.1', 'g3.1', 'g3.2']
>>> geneset2 = ['g1.1', 'g2.1', 'g3.1']
>>> orthomap2tei.get_geneset_overlap(geneset1, geneset2)
oggmap.orthomap2tei.get_pmatrix(adata, gene_id, gene_age, keep='min', layer=None, layer_name='pmatrix', add_var=True, add_obs=True, normalize_total=True, log1p=True, target_sum=1000000.0, chunk_size=100000)
This function computes the partial transcriptome evolutionary index (TEI) values for each single gene.

Prior to TEI calculation, counts can be normalized (default: True) to a total count number (default: 1e6) and log transformed (default: True).

In detail, each gene gets a TEI contribution profile as follows:

TEI_is = f_is * ps_i

where TEI_is is the partial TEI value of gene i, f_is = e_is / sum_i(e_is) and ps_i is the phylostratum of gene i.

The partial TEI matrix can be used to perform different cluster analyses and also gives an overall impression of the contribution of each gene to the global TEI pattern.
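The partial TEI definition can be checked on a small dense matrix. This is a NumPy sketch with hypothetical values; it leaves out normalization, log transformation and the chunked processing of a real AnnData object.

```python
import numpy as np

# Hypothetical expression matrix: rows are cells (s), columns are genes (i).
expr = np.array([[10.0, 0.0, 30.0],
                 [5.0, 5.0, 10.0]])
ps = np.array([1, 2, 4])  # phylostratum of each gene

# f_is = e_is / sum_i(e_is): share of each gene in the total expression of a cell.
f = expr / expr.sum(axis=1, keepdims=True)

# TEI_is = f_is * ps_i: partial TEI value of gene i in cell s.
pmatrix = f * ps

# Summing the partial values over all genes gives the TEI of each cell.
tei = pmatrix.sum(axis=1)
print(pmatrix)
print(tei)
```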

Parameters:

adata (AnnData) – AnnData object of shape n_obs × n_vars. Rows correspond to cells and columns to genes.

gene_id (list) – Expects GeneID column from orthomap DataFrame.

gene_age (list) – Expects Phylostratum column from orthomap DataFrame.

keep (str) – In case of duplicated GeneIDs with different Phylostrata assignments, either keep ‘min’ (ascending pre-sorting) or ‘max’ (descending pre-sorting) value.

layer (str) – Layer to work on instead of X. If None, X is used.

layer_name (str) – Layer to add to existing AnnData object.

add_var (bool) – Add original variables to new AnnData object.

add_obs (bool) – Add original observations to new AnnData object.

normalize_total (bool) – Normalize counts per cell prior to TEI calculation.

log1p (bool) – Logarithmize the data matrix prior to TEI calculation.

target_sum (float) – After normalization, each observation (cell) has a total count equal to target_sum.

chunk_size (int) – Number of chunks.

Returns:

Partial transcriptome evolutionary index (TEI) values.

Return type:

AnnData

Example
>>> import scanpy as sc
>>> from oggmap import orthomap2tei, datasets
>>> # download pre-calculated orthomap
>>> #query_orthomap = orthomap2tei.read_orthomap(orthomapfile='Sun2021_Orthomap.tsv')
>>> sun21_orthomap_file = datasets.sun21_orthomap(datapath='.')
>>> # load query species orthomap
>>> query_orthomap = orthomap2tei.read_orthomap(orthomapfile=sun21_orthomap_file)
>>> # download and load scRNA data
>>> #packer19_small = sc.read('packer19_small.h5ad')
>>> packer19_small = datasets.packer19_small(datapath='.')
>>> # get pmatrix as new adata object
>>> packer19_small_pmatrix = orthomap2tei.get_pmatrix(
>>>     adata=packer19_small,
>>>     gene_id=query_orthomap['GeneID'],
>>>     gene_age=query_orthomap['Phylostratum'])
oggmap.orthomap2tei.get_pstrata(adata, gene_id, gene_age, keep='min', layer=None, cumsum=False, group_by_obs=None, obs_fillna='__NaN', obs_type='mean', standard_scale=None, normalize_total=True, log1p=True, target_sum=1000000.0, chunk_size=100000)
This function computes the partial transcriptome evolutionary index (TEI) values combined for each stratum.

The resulting values can be combined per observation group, e.g. pre-defined cell types (default: None), according to the selected observation type (default: ‘mean’), and further scaled between 0 and 1 (default: None), either per var (standard_scale=0) or per obs (standard_scale=1).

Prior to TEI calculation, counts can be normalized (default: True) to a total count number (default: 1e6) and log transformed (default: True).

In detail, each gene gets a TEI contribution profile as follows:

TEI_is = f_is * ps_i

where TEI_is is the partial TEI value of gene i, f_is = e_is / sum_i(e_is) and ps_i is the phylostratum of gene i.

The TEI_is values are combined per phylostratum ps.

The partial TEI values combined per strata give an overall impression of the contribution of each stratum to the global TEI pattern.
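Combining partial TEI values per stratum can be sketched with a pandas groupby. The toy data and variable names below are hypothetical, and the orientation of the result (rows as strata, columns as cells) is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical expression values: rows are cells, columns are genes.
expr = pd.DataFrame(
    [[10.0, 0.0, 30.0, 10.0], [5.0, 5.0, 10.0, 0.0]],
    index=['cell1', 'cell2'], columns=['g1', 'g2', 'g3', 'g4'])
ps = pd.Series([1, 2, 2, 4], index=expr.columns)  # phylostratum per gene

# Partial TEI per gene: f_is * ps_i.
pmatrix = expr.div(expr.sum(axis=1), axis=0).mul(ps, axis=1)

# Sum the partial TEI values of all genes that share a phylostratum.
pstrata = pmatrix.T.groupby(ps).sum()

# Share of each stratum in the global TEI of a cell.
percent = pstrata.div(pstrata.sum(axis=0), axis=1)

# Cumulative sum over strata; the last row equals the global TEI.
cumulative = pstrata.cumsum(axis=0)
print(pstrata)
```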

Parameters:

adata (AnnData) – AnnData object of shape n_obs × n_vars. Rows correspond to cells and columns to genes.

gene_id (list) – Expects GeneID column from orthomap DataFrame.

gene_age (list) – Expects Phylostratum column from orthomap DataFrame.

keep (str) – In case of duplicated GeneIDs with different Phylostrata assignments, either keep ‘min’ (ascending pre-sorting) or ‘max’ (descending pre-sorting) value.

layer (str) – Layer to work on instead of X. If None, X is used.

cumsum (bool) – Return cumsum.

group_by_obs (str) – AnnData observation to be used as a group to combine partial transcriptome evolutionary index (TEI) values.

obs_fillna (str) – Specify how NaN values should be named for observation.

obs_type (str) – Specify how values should be combined per observation group. Possible values are ‘mean’, ‘median’, ‘sum’, ‘min’ and ‘max’.

standard_scale (int) – Whether or not to standardize the given axis (0: columns, 1: rows) between 0 and 1, meaning for each variable or group, subtract the minimum and divide by the difference between maximum and minimum.

normalize_total (bool) – Normalize counts per cell prior to TEI calculation.

log1p (bool) – Logarithmize the data matrix prior to TEI calculation.

target_sum (float) – After normalization, each observation (cell) has a total count equal to target_sum.

chunk_size (int) – Number of chunks.

Returns:

List of two DataFrames. The first DataFrame contains the summed partial TEI values per stratum. The second DataFrame contains the summed partial TEI values divided by the corresponding global TEI value, which represents the percentage of the global TEI per stratum.

Return type:

list

Example
>>> import scanpy as sc
>>> import matplotlib.pyplot as plt
>>> import seaborn as sns
>>> from oggmap import orthomap2tei, datasets
>>> # download pre-calculated orthomap
>>> #query_orthomap = orthomap2tei.read_orthomap(orthomapfile='Sun2021_Orthomap.tsv')
>>> sun21_orthomap_file = datasets.sun21_orthomap(datapath='.')
>>> # load query species orthomap
>>> query_orthomap = orthomap2tei.read_orthomap(orthomapfile=sun21_orthomap_file)
>>> # download and load scRNA data
>>> #packer19_small = sc.read('packer19_small.h5ad')
>>> packer19_small = datasets.packer19_small(datapath='.')
>>> # get pstrata
>>> packer19_small_pstrata = orthomap2tei.get_pstrata(
>>>     adata=packer19_small,
>>>     gene_id=query_orthomap['GeneID'],
>>>     gene_age=query_orthomap['Phylostratum'])
>>> # get cumsum over strata
>>> packer19_small_pstrata_cumsum = orthomap2tei.get_pstrata(
>>>     adata=packer19_small,
>>>     gene_id=query_orthomap['GeneID'],
>>>     gene_age=query_orthomap['Phylostratum'],
>>>     cumsum=True)
>>> # group by embryo.time.bin observation
>>> packer19_small_pstrata_grouped = orthomap2tei.get_pstrata(
>>>     adata=packer19_small,
>>>     gene_id=query_orthomap['GeneID'],
>>>     gene_age=query_orthomap['Phylostratum'],
>>>     group_by_obs='embryo.time.bin')
>>> # plot strata as lines
>>> packer19_small_pstrata_grouped[0].transpose().plot.line(stacked=True, cmap='Accent')
>>> plt.show()
>>> # plot heatmap using partial TEI values
>>> sns.heatmap(packer19_small_pstrata_grouped[0], annot=True, cmap='viridis')
>>> plt.show()
>>> # plot heatmap using partial TEI percent
>>> sns.heatmap(packer19_small_pstrata_grouped[1], annot=True, cmap='viridis')
>>> plt.show()
oggmap.orthomap2tei.get_rematrix(adata, gene_id, gene_age, keep='min', layer=None, use='counts', var_type='mean', group_by_obs=None, obs_fillna='__NaN', obs_type='mean', standard_scale=None, normalize_total=True, log1p=True, target_sum=1000000.0, chunk_size=100000)
This function computes relative expression profiles.

In detail, if standard_scale axis is set to None, the mean/median/sum expression is being computed over cells and, if group_by_obs is not None, combined per given obs group by mean/median/sum.

In detail, if standard_scale axis is set to 0, the mean/median/sum relative expression profile is being computed over cells and, if group_by_obs is not None, combined per given obs group by mean/median/sum as follows:

f_c = (e_c - e_min)/(e_max - e_min)

where e_min and e_max denote the minimum and maximum mean/median/sum expression level over cells c.

In detail, if standard_scale axis is set to 1, the mean/median/sum relative expression profile is being computed over gene age classes (phylostrata) and, if group_by_obs is not None, combined per given obs group by mean/median/sum as follows:

f_ps = (e_ps - e_min)/(e_max - e_min)

where e_min and e_max denote the minimum and maximum mean/median/sum expression level over gene age classes (phylostrata ps).

This linear transformation corresponds to a shift by e_min and a subsequent rescaling by e_max - e_min. As a result, the relative expression level f_c of the cell c, or f_ps of the phylostratum ps, with minimum e_c or e_ps is 0, whereas that of the cell or phylostratum with maximum e_c or e_ps is 1; the relative expression levels of all other cells c or phylostrata ps lie between 0 and 1.

Parameters:

adata (AnnData) – The annotated data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to genes.

gene_id (list) – Expects GeneID column from orthomap DataFrame.

gene_age (list) – Expects Phylostratum column from orthomap DataFrame.

keep (str) – In case of duplicated GeneIDs with different Phylostrata assignments, either keep ‘min’ (ascending pre-sorting) or ‘max’ (descending pre-sorting) value.

layer (str) – Layer to work on instead of X. If None, X is used.

use (str) – Specify if counts from adata.X (default) should be combined per age group to calculate the relative expression or if the corresponding ‘pmatrix’ (partial TEI values, see get_pmatrix) or ‘wmatrix’ (gene aged weighted expression) should be used. If layer is not None adata.X refers to adata.layers[layer].

var_type (str) – Specify how values should be combined per variable group. Possible values are ‘mean’, ‘median’, ‘sum’, ‘min’ and ‘max’.

group_by_obs (str) – AnnData observation to be used as a group to combine count values.

obs_fillna (str) – Specify how NaN values should be named for observation.

obs_type (str) – Specify how values should be combined per observation group. Possible values are ‘mean’, ‘median’, ‘sum’, ‘min’ and ‘max’.

standard_scale (int) – Whether or not to standardize the given axis (0: columns, 1: rows) between 0 and 1, meaning for each variable or group, subtract the minimum and divide by the difference between maximum and minimum.

normalize_total (bool) – Normalize counts per cell prior to TEI calculation.

log1p (bool) – Logarithmize the data matrix prior to TEI calculation.

target_sum (float) – After normalization, each observation (cell) has a total count equal to target_sum.

chunk_size (int) – Number of chunks.

Returns:

Relative expression profile DataFrame.

Return type:

pandas.DataFrame

Example
>>> import scanpy as sc
>>> import matplotlib.pyplot as plt
>>> import seaborn as sns
>>> from oggmap import orthomap2tei, datasets
>>> # download pre-calculated orthomap
>>> #query_orthomap = orthomap2tei.read_orthomap(orthomapfile='Sun2021_Orthomap.tsv')
>>> sun21_orthomap_file = datasets.sun21_orthomap(datapath='.')
>>> # load query species orthomap
>>> query_orthomap = orthomap2tei.read_orthomap(orthomapfile=sun21_orthomap_file)
>>> # download and load scRNA data
>>> #packer19_small = sc.read('packer19_small.h5ad')
>>> packer19_small = datasets.packer19_small(datapath='.')
>>> # get rematrix
>>> packer19_small_rematrix = orthomap2tei.get_rematrix(
>>>     adata=packer19_small,
>>>     gene_id=query_orthomap['GeneID'],
>>>     gene_age=query_orthomap['Phylostratum'])
>>> # group by embryo.time.bin observation
>>> packer19_small_rematrix_grouped = orthomap2tei.get_rematrix(
>>>     adata=packer19_small,
>>>     gene_id=query_orthomap['GeneID'],
>>>     gene_age=query_orthomap['Phylostratum'],
>>>     group_by_obs='embryo.time.bin')
>>> # plot heatmap using relative expression values
>>> sns.heatmap(packer19_small_rematrix_grouped, cmap='viridis')
>>> plt.show()
>>> # group by embryo.time.bin observation and scale over rows
>>> packer19_small_rematrix_grouped_rows = orthomap2tei.get_rematrix(
>>>     adata=packer19_small,
>>>     gene_id=query_orthomap['GeneID'],
>>>     gene_age=query_orthomap['Phylostratum'],
>>>     group_by_obs='embryo.time.bin',
>>>     standard_scale=0)
>>> # plot heatmap using relative expression values
>>> sns.heatmap(packer19_small_rematrix_grouped_rows, cmap='viridis')
>>> plt.show()
>>> # group by embryo.time.bin observation and scale over columns
>>> packer19_small_rematrix_grouped_columns = orthomap2tei.get_rematrix(
>>>     adata=packer19_small,
>>>     gene_id=query_orthomap['GeneID'],
>>>     gene_age=query_orthomap['Phylostratum'],
>>>     group_by_obs='embryo.time.bin',
>>>     standard_scale=1)
>>> # plot heatmap using relative expression values
>>> sns.heatmap(packer19_small_rematrix_grouped_columns, cmap='viridis')
>>> plt.show()
oggmap.orthomap2tei.get_tei(adata, gene_id, gene_age, keep='min', layer=None, add_var=True, var_name='Phylostrata', add_obs=True, obs_name='tei', boot=False, bt=10, normalize_total=True, log1p=True, target_sum=1000000.0, chunk_size=100000)
This function computes the phylogenetically based transcriptome evolutionary index (TEI) similar to Domazet-Loso & Tautz, 2010.

The TEI measure represents the weighted arithmetic mean (expression levels as weights for the phylostratum value) over all evolutionary age categories, denoted as phylostrata.

TEI_s = sum_i(e_is * ps_i) / sum_i(e_is)

where TEI_s denotes the TEI value in developmental stage s, e_is denotes the gene expression level of gene i in stage s, and ps_i denotes the corresponding phylostratum of gene i, i = 1,…,N, where N is the total number of genes.

If the parameter boot is set to True, the strata values are sampled and the global TEI is calculated bt times.
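The weighted mean and a resampling of the strata values can be sketched in NumPy. The helper tei_sketch and the toy values are hypothetical. Resampling by permuting the phylostratum assignments across genes is an assumption about what "sampled" means here; get_tei may draw its samples differently.

```python
import numpy as np

# Hypothetical expression matrix: rows are cells (s), columns are genes (i).
expr = np.array([[10.0, 0.0, 30.0],
                 [5.0, 5.0, 10.0]])
ps = np.array([1, 2, 4])  # phylostratum of each gene


def tei_sketch(expr, ps):
    """TEI_s = sum_i(e_is * ps_i) / sum_i(e_is) for every row s."""
    return (expr * ps).sum(axis=1) / expr.sum(axis=1)


observed = tei_sketch(expr, ps)

# Assumed resampling scheme: permute the strata values across genes bt times.
rng = np.random.default_rng(0)
bt = 10
boot_tei = np.vstack([tei_sketch(expr, rng.permutation(ps)) for _ in range(bt)])
print(observed)
print(boot_tei.shape)
```

Each row of boot_tei holds one resampled TEI value per cell, which can be compared with the observed values.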

Parameters:

adata (AnnData) – AnnData object of shape n_obs × n_vars. Rows correspond to cells and columns to genes.

gene_id (list) – Expects GeneID column from orthomap DataFrame.

gene_age (list) – Expects Phylostratum column from orthomap DataFrame.

keep (str) – In case of duplicated GeneIDs with different Phylostrata assignments, either keep ‘min’ or ‘max’ value.

layer (str) – Layer to work on instead of X. If None, X is used.

add_var (bool) – Add gene age values as variable to existing AnnData object using var_name.

var_name (str) – Variable name to be used for gene age values in existing AnnData object.

add_obs (bool) – Add TEI values as observation to existing AnnData object using obs_name.

obs_name (str) – Observation name to be used for TEI values in existing AnnData object.

boot (bool) – Specify if bootstrap TEI values should be calculated and returned as DataFrame.

bt (int) – Number of bootstrap replicates to calculate.

normalize_total (bool) – Normalize counts per cell prior to TEI calculation.

log1p (bool) – Logarithmize the data matrix prior to TEI calculation.

target_sum (float) – After normalization, each observation (cell) has a total count equal to target_sum.

chunk_size (int) – Number of chunks.

Returns:

Transcriptome evolutionary index (TEI) values.

Return type:

pandas.DataFrame

Example
>>> import scanpy as sc
>>> import matplotlib.pyplot as plt
>>> import seaborn as sns
>>> import numpy as np
>>> from statannotations.Annotator import Annotator
>>> from oggmap import datasets, orthomap2tei
>>> # download pre-calculated orthomap
>>> #query_orthomap = orthomap2tei.read_orthomap(orthomapfile='Sun2021_Orthomap.tsv')
>>> sun21_orthomap_file = datasets.sun21_orthomap(datapath='.')
>>> # load query species orthomap
>>> query_orthomap = orthomap2tei.read_orthomap(orthomapfile=sun21_orthomap_file)
>>> # download and load scRNA data
>>> #packer19_small = sc.read('packer19_small.h5ad')
>>> packer19_small = datasets.packer19_small(datapath='.')
>>> # add TEI values to existing adata object
>>> orthomap2tei.get_tei(
>>>     adata=packer19_small,
>>>     gene_id=query_orthomap['GeneID'],
>>>     gene_age=query_orthomap['Phylostratum'],
>>>     add_var=True,
>>>     add_obs=True)
>>> # plot tei boxplot grouped by embryo.time.bin observation
>>> ax = sns.boxplot(
>>>     x='embryo.time.bin',
>>>     y='tei',
>>>     data=packer19_small.obs)
>>> test_results = Annotator(
>>>     ax,
>>>     x='embryo.time.bin',
>>>     y='tei',
>>>     pairs=orthomap2tei._get_pairwise_comb_self(
>>>         list1=packer19_small.obs['embryo.time.bin'].value_counts().index),
>>>     data=packer19_small.obs)
>>> test_results.configure(
>>>     test='Mann-Whitney',
>>>     text_format='star',
>>>     loc='outside',
>>>     verbose=2)
>>> test_results.apply_and_annotate()
>>> plt.show()
>>> # plot tei violinplot for each cell.type grouped by cell.type and embryo.time.bin observation
>>> # create new observation as a combination from embryo.time.bin and cell.type
>>> packer19_small.obs['etb_cell.type'] = packer19_small.obs[['embryo.time.bin', 'cell.type']].apply(
>>>     lambda x: str(x.iloc[0]) + '_' + str(x.iloc[1]), axis=1)
>>> # convert into category
>>> packer19_small.obs['etb_cell.type'] = packer19_small.obs['etb_cell.type'].astype('category')
>>> # reorder categories by the leading embryo.time.bin value
>>> etb_categories = list(packer19_small.obs['etb_cell.type'].value_counts().index)
>>> etb_order = np.argsort([int(x.split('_')[0]) for x in etb_categories])
>>> packer19_small.obs['etb_cell.type'] = packer19_small.obs['etb_cell.type'].cat.reorder_categories(
>>>     [etb_categories[i] for i in etb_order])
>>> for c in packer19_small.obs['cell.type'].value_counts().index:
>>>     plt.figure()
>>>     sns.violinplot(
>>>         x=packer19_small.obs[packer19_small.obs['cell.type'].isin([c])]['etb_cell.type'].cat
>>>         .remove_unused_categories(),
>>>         y='tei',
>>>         data=packer19_small.obs[packer19_small.obs['cell.type'].isin([c])])
>>> plt.show()
>>> # get 10 bootstrap TEI values
>>> orthomap2tei.get_tei(
>>>     adata=packer19_small,
>>>     gene_id=query_orthomap['GeneID'],
>>>     gene_age=query_orthomap['Phylostratum'],
>>>     boot=True,
>>>     bt=10)
oggmap.orthomap2tei.mergeby_from_counts(adata, layer=None, group_by_var=None, var_fillna='__NaN', group_by_obs=None, obs_fillna='__NaN', level='obs', min_expr=None, max_expr=None, normalize_total=False, log1p=False, target_sum=1000000.0, chunk_size=100000)
This function groups all counts of an existing AnnData object as an array based on variable or observation groups. The resulting pandas.DataFrame can be used to e.g. apply statistics or visualize the groups more easily.

Parameters:

adata (AnnData) – The annotated data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to genes.

layer (str) – Layer to work on instead of X. If None, X is used.

group_by_var (str) – AnnData variable to be used as a group to combine count values.

var_fillna (str) – Specify how NaN values should be named for variable.

group_by_obs (str) – AnnData observation to be used as a group to combine count values.

obs_fillna (str) – Specify how NaN values should be named for observation.

level (str) – Specify if observation or variable should be used as primary group (only affects output orientation).

min_expr (float) – Specify minimal expression to be included.

max_expr (float) – Specify maximal expression to be included.

normalize_total (bool) – Normalize counts per cell.

log1p (bool) – Logarithmize the data matrix.

target_sum (float) – After normalization, each observation (cell) has a total count equal to target_sum.

chunk_size (int) – Number of chunks.

Returns:

List of three DataFrames. The first DataFrame contains the grouped data (each cell contains a numpy.ndarray). The second DataFrame contains the original variable assignment and groupings, and the third the original observation assignment and groupings.

Return type:

list

Example
>>> import scanpy as sc
>>> import matplotlib.pyplot as plt
>>> import seaborn as sns
>>> from oggmap import orthomap2tei, datasets
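>>> # download pre-calculated orthomap
>>> sun21_orthomap_file = datasets.sun21_orthomap(datapath='.')
>>> # load query species orthomap
>>> query_orthomap = orthomap2tei.read_orthomap(orthomapfile=sun21_orthomap_file)
>>> # download and load scRNA data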
>>> #packer19_small = sc.read('packer19_small.h5ad')
>>> packer19_small = datasets.packer19_small(datapath='.')
>>> # add gene age values to existing adata object
>>> orthomap2tei.add_gene_age2adata_var(
>>>     adata=packer19_small,
>>>     gene_id=query_orthomap['GeneID'],
>>>     gene_age=query_orthomap['Phylostratum'])
>>> # get group counts
>>> packer19_small_group_counts = orthomap2tei.mergeby_from_counts(
>>>     adata=packer19_small,
>>>     group_by_var='Phylostrata',
>>>     group_by_obs='embryo.time.bin')
oggmap.orthomap2tei.mergeby_from_dataframe(df, col_group=None, col_fillna='__NaN', row_group=None, row_fillna='__NaN', level='col', min_expr=None, max_expr=None)

This function groups all values of an existing pandas.DataFrame as an array based on column or row groups. The resulting pandas.DataFrame can be used to e.g. apply statistics or visualize the groups more easily.

Parameters:

df (pandas.DataFrame) – DataFrame to be grouped

col_group (numpy.ndarray) – numpy.ndarray containing group assignment for columns (needs to be same length and same order as Columns of df).

col_fillna (str) – Specify how NaN values should be named for columns.

row_group (numpy.ndarray) – numpy.ndarray containing group assignment for row (needs to be same length and same order as Index of df).

row_fillna (str) – Specify how NaN values should be named for rows.

level (str) – Specify if col or row should be used as primary group (only affects output orientation).

min_expr (float) – Specify minimal expression to be included.

max_expr (float) – Specify maximal expression to be included.

Returns:

List of three DataFrames. The first DataFrame contains the grouped data (each cell contains a numpy.ndarray). The second DataFrame contains the original column assignment and groupings, and the third the original row assignment and groupings.

Return type:

list
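No example is given for this function, so the following sketch only illustrates the grouping idea: the values of each (row group, column group) block are collected into one flat numpy.ndarray. The toy DataFrame and group labels are hypothetical, a plain dictionary stands in for the returned DataFrames, and the min_expr/max_expr filters are left out.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame: rows are cells, columns are genes.
df = pd.DataFrame(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]],
    index=['c1', 'c2', 'c3'], columns=['g1', 'g2', 'g3'])
# Group assignments in the same order as the index and columns of df.
row_group = np.array(['early', 'early', 'late'])
col_group = np.array(['ps1', 'ps2', 'ps2'])

# Collect all values of each (row group, column group) block as one flat array.
grouped = {}
for r in np.unique(row_group).tolist():
    for c in np.unique(col_group).tolist():
        block = df.loc[row_group == r, col_group == c]
        grouped[(r, c)] = block.to_numpy().ravel()

# The arrays can then be summarized, e.g. by their mean per group pair.
means = pd.Series({key: values.mean() for key, values in grouped.items()}).unstack()
print(means)
```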
oggmap.orthomap2tei.read_orthomap(orthomapfile)
This function reads a pre-calculated orthomap file <GeneID><tab><Phylostratum>.

Parameters:

orthomapfile (str) – File name of pre-calculated orthomap file.

Returns:

Orthomap.

Return type:

pandas.DataFrame

Example
>>> from oggmap import orthomap2tei, datasets
>>> # download pre-calculated orthomap
>>> #query_orthomap = orthomap2tei.read_orthomap(orthomapfile='Sun2021_Orthomap.tsv')
>>> sun21_orthomap_file = datasets.sun21_orthomap(datapath='.')
>>> # load query species orthomap
>>> query_orthomap = orthomap2tei.read_orthomap(orthomapfile=sun21_orthomap_file)
>>> query_orthomap
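The file layout <GeneID><tab><Phylostratum> can be read with plain pandas as well. The sketch below assumes a header line with exactly these two column names, which the description above does not confirm; the in-memory file content is hypothetical.

```python
import io

import pandas as pd

# Hypothetical file content; a header line is assumed here.
tsv = "GeneID\tPhylostratum\ng1\t1\ng2\t3\n"

# Parse the tab-separated text into a DataFrame.
orthomap = pd.read_csv(io.StringIO(tsv), sep='\t')
print(orthomap)
```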
oggmap.orthomap2tei.replace_by(x_orig, xmatch, xreplace, keep=False)
This function assumes that <x_orig> and <xmatch> match and will return <xreplace> sorted by <x_orig>. It is mandatory that <xmatch> and <xreplace> have the same length and reflect pairs: <xmatch[0]> is the original value and <xreplace[0]> is the corresponding new value.

Parameters:

x_orig (list) – List of original values to be used for sorting.

xmatch (list) – List of matches to the original values. Each xmatch position pairs with xreplace position.

xreplace (list) – List of replace values. Each xreplace position pairs with xmatch position.

keep (bool) – Keep x_orig value instead of NaN if no match is found.

Returns:

Replacement ordered by the original values.

Return type:

list

Example
>>> from oggmap import orthomap2tei
>>> geneset1 = ['g1.1', 'g1.2', 'g2.1', 'g3.1', 'g3.2']
>>> geneset2 = ['g1.1', 'g2.1', 'g3.1', 'g5.1']
>>> transcriptset2 = ['t1.1', 't2.1', 't3.1', 't5.1']
>>> orthomap2tei.replace_by(
>>>     x_orig=geneset1,
>>>     xmatch=geneset2,
>>>     xreplace=transcriptset2)
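The pairing of <xmatch> and <xreplace> behaves like a dictionary lookup. The following sketch uses a hypothetical helper, replace_by_sketch, and is not the library's implementation.

```python
def replace_by_sketch(x_orig, xmatch, xreplace, keep=False):
    """Return the xreplace values ordered by x_orig (NaN or x_orig value if unmatched)."""
    if len(xmatch) != len(xreplace):
        raise ValueError('xmatch and xreplace must have the same length')
    # Pair every match with its replacement value.
    mapping = dict(zip(xmatch, xreplace))
    # Unmatched values become NaN, or stay unchanged when keep=True.
    return [mapping.get(x, x if keep else float('nan')) for x in x_orig]


geneset1 = ['g1.1', 'g1.2', 'g2.1', 'g3.1', 'g3.2']
geneset2 = ['g1.1', 'g2.1', 'g3.1', 'g5.1']
transcriptset2 = ['t1.1', 't2.1', 't3.1', 't5.1']

replaced = replace_by_sketch(geneset1, geneset2, transcriptset2)
kept = replace_by_sketch(geneset1, geneset2, transcriptset2, keep=True)
print(replaced)
print(kept)
```

If <xmatch> contains duplicates, the dictionary in this sketch keeps the last pair.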