oggmap.datasets module
Author: Kristian K Ullrich
Date: February 2025
Email: ullrich@evolbio.mpg.de
License: GPL-3
- oggmap.datasets.broccoli_example(datapath='.')
Broccoli results (default settings) for translated coding sequences (CDS) from four plant species (keeping only longest isoforms): A. lyrata, A. thaliana, C. hirsuta and C. rubella.
All files can be obtained from here: https://doi.org/10.5281/zenodo.7242263
- Parameters:
datapath (str) – Path to save dataset.
- Returns:
Paths to the Broccoli.GeneCount file, the OrthoGroups file, and the species list file.
- Return type:
list of str
Example
>>> from oggmap import datasets
>>> datasets.broccoli_example(datapath='.')
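The returned list holds three paths. A minimal sketch for giving them names is shown here, assuming the list follows the order stated under Returns (gene count file, orthogroups file, species list file). The `OrthologyFiles` container and the `unpack_orthology_paths` helper are hypothetical and not part of oggmap.

```python
from typing import NamedTuple, Sequence


class OrthologyFiles(NamedTuple):
    """Hypothetical container for the three returned paths (not part of oggmap)."""

    gene_count: str
    orthogroups: str
    species_list: str


def unpack_orthology_paths(paths: Sequence[str]) -> OrthologyFiles:
    """Name the three paths, assuming the order documented under Returns."""
    if len(paths) != 3:
        raise ValueError(f"expected 3 paths, got {len(paths)}")
    return OrthologyFiles(*paths)


# Usage (needs oggmap installed and network access, so it is left commented out):
# from oggmap import datasets
# files = unpack_orthology_paths(datasets.broccoli_example(datapath='.'))
# print(files.orthogroups)
```

The same helper should apply to the other loaders in this module that document a three-element list of paths as their return value.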
- oggmap.datasets.cazet22(datapath='.')
scRNA count data for Hydra vulgaris from:
Cazet, Jack, Stefan Siebert, Hannah Morris Little, Philip Bertemes, Abby S. Primack, Peter Ladurner, Matthias Achrainer et al. (2022) New Hydra genomes reveal conserved principles of hydrozoan transcriptional regulation. bioRxiv, 2022.06.21.496857.
All files can be obtained from here: https://doi.org/10.5281/zenodo.7366178
- Parameters:
datapath (str) – Path to save dataset.
- Returns:
AnnData object.
- Return type:
AnnData
Example
>>> from oggmap import datasets
>>> datasets.cazet22(datapath='.')
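Every loader in this module takes a `datapath` directory. Whether a missing directory is created automatically is not stated here, so the sketch below creates it before calling the loader. `fetch_into` is a hypothetical helper, not part of oggmap, and it only assumes that the loader accepts a `datapath` keyword as documented.

```python
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_into(loader: Callable[..., T], datapath: str) -> T:
    """Create `datapath` if it does not exist, then call `loader(datapath=...)`.

    Hypothetical helper; it forwards whatever the loader returns.
    """
    target = Path(datapath)
    target.mkdir(parents=True, exist_ok=True)
    return loader(datapath=str(target))


# Usage (needs oggmap installed and network access, so it is left commented out):
# from oggmap import datasets
# adata = fetch_into(datasets.cazet22, "data/cazet22")
```

Keeping each dataset in its own subdirectory makes it easier to see which files belong to which loader.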
- oggmap.datasets.cazet22_orthomap(datapath='.')
Pre-calculated orthomap for Hydra vulgaris from:
Cazet, Jack, Stefan Siebert, Hannah Morris Little, Philip Bertemes, Abby S. Primack, Peter Ladurner, Matthias Achrainer et al. (2022) New Hydra genomes reveal conserved principles of hydrozoan transcriptional regulation. bioRxiv, 2022.06.21.496857.
- Parameters:
datapath (str) – Path to save dataset.
- Returns:
Path to Orthomap file.
- Return type:
str
Example
>>> from oggmap import datasets
>>> datasets.cazet22_orthomap(datapath='.')
- oggmap.datasets.ensembl105(datapath='.')
OrthoFinder results (-S diamond_ultra_sens) for all translated coding sequences (CDS) from Ensembl release-105 (keeping only longest isoforms) and Xtropicalisv9.0.Named.primaryTrs.pep.fa from www.xenbase.org.
All files can be obtained from here: https://doi.org/10.5281/zenodo.7242263
- Parameters:
datapath (str) – Path to save dataset.
- Returns:
Paths to the Orthogroups.GeneCount file, the OrthoGroups file, and the species list file.
- Return type:
list of str
Example
>>> from oggmap import datasets
>>> datasets.ensembl105(datapath='.')
- oggmap.datasets.ensembl110_last(datapath='.')
OrthoFinder results (-S last) for all translated coding sequences (CDS) from Ensembl release-110 (keeping only longest isoforms) and Xtropicalisv9.0.Named.primaryTrs.pep.fa from www.xenbase.org.
All files can be obtained from here: https://doi.org/10.5281/zenodo.7242263
- Parameters:
datapath (str) – Path to save dataset.
- Returns:
Paths to the Orthogroups.GeneCount file, the OrthoGroups file, and the species list file.
- Return type:
list of str
Example
>>> from oggmap import datasets
>>> datasets.ensembl110_last(datapath='.')
- oggmap.datasets.ensembl113_last(datapath='.')
OrthoFinder results (-S last) for all translated coding sequences (CDS) from Ensembl release-113 (keeping only longest isoforms) and Xtropicalisv9.0.Named.primaryTrs.pep.fa from www.xenbase.org.
All files can be obtained from here: https://doi.org/10.5281/zenodo.7242263
- Parameters:
datapath (str) – Path to save dataset.
- Returns:
Paths to the Orthogroups.GeneCount file, the OrthoGroups file, and the species list file.
- Return type:
list of str
Example
>>> from oggmap import datasets
>>> datasets.ensembl113_last(datapath='.')
- oggmap.datasets.ma21_fst(datapath='.')
Pre-calculated TajimaD, NormalizedPi, FayWu and Fst for Caenorhabditis elegans from:
Ma, F., Lau, C.Y. and Zheng, C., 2021. Large genetic diversity and strong positive selection in F-box and GPCR genes among the wild isolates of Caenorhabditis elegans. Genome Biology and Evolution, 13(5), p.evab048.
- Parameters:
datapath (str) – Path to save dataset.
- Returns:
Path to diversity file.
- Return type:
str
Example
>>> from oggmap import datasets
>>> datasets.ma21_fst(datapath='.')
- oggmap.datasets.mouse_ensembl105_gtf(datapath='.')
Download GTF for species Mus musculus from Ensembl release-105: https://ftp.ensembl.org/pub/release-105/gtf/mus_musculus/
- Parameters:
datapath (str) – Path to save dataset.
- Returns:
Path to GTF file.
- Return type:
str
Example
>>> from oggmap import datasets
>>> datasets.mouse_ensembl105_gtf(datapath='.')
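The returned path points to a GTF file. As an illustration of what can be done with it, the sketch below builds a gene ID to gene name mapping using only the standard library. It relies on the general GTF layout (nine tab-separated columns, with `key "value";` pairs in the last column) and on the `gene_id` and `gene_name` attributes used in Ensembl GTF files. The helper names `gene_id_to_name` and `read_gtf` are hypothetical and not part of oggmap, and gzip handling is an assumption triggered only when the path ends in `.gz`.

```python
import gzip
import re
from typing import Dict, Iterable

# Matches attribute pairs such as: gene_id "ENSMUSG00000102693"
_ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')


def gene_id_to_name(lines: Iterable[str]) -> Dict[str, str]:
    """Map gene_id to gene_name for every 'gene' feature in GTF lines.

    Genes without a gene_name attribute fall back to their gene_id.
    """
    mapping: Dict[str, str] = {}
    for line in lines:
        if line.startswith("#"):
            continue  # header or comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "gene":
            continue  # malformed line or non-gene feature
        attributes = dict(_ATTRIBUTE.findall(fields[8]))
        gene_id = attributes.get("gene_id")
        if gene_id is not None:
            mapping[gene_id] = attributes.get("gene_name", gene_id)
    return mapping


def read_gtf(path: str) -> Dict[str, str]:
    """Open a plain or gzip-compressed GTF file and build the mapping."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return gene_id_to_name(handle)


# Usage (needs oggmap installed and network access, so it is left commented out):
# from oggmap import datasets
# names = read_gtf(datasets.mouse_ensembl105_gtf(datapath='.'))
```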
- oggmap.datasets.mouse_ensembl110_gtf(datapath='.')
Download GTF for species Mus musculus from Ensembl release-110: https://ftp.ensembl.org/pub/release-110/gtf/mus_musculus/
- Parameters:
datapath (str) – Path to save dataset.
- Returns:
Path to GTF file.
- Return type:
str
Example
>>> from oggmap import datasets
>>> datasets.mouse_ensembl110_gtf(datapath='.')
- oggmap.datasets.mouse_ensembl113_gtf(datapath='.')
Download GTF for species Mus musculus from Ensembl release-113: https://ftp.ensembl.org/pub/release-113/gtf/mus_musculus/
- Parameters:
datapath (str) – Path to save dataset.
- Returns:
Path to GTF file.
- Return type:
str
Example
>>> from oggmap import datasets
>>> datasets.mouse_ensembl113_gtf(datapath='.')
- oggmap.datasets.mouse_synonyms(datapath='.')
Mus musculus gene synonyms from here: https://github.com/mustafapir/geneName
- Parameters:
datapath (str) – Path to save dataset.
- Returns:
Path to Synonyms file.
- Return type:
str
Example
>>> from oggmap import datasets
>>> datasets.mouse_synonyms(datapath='.')
- oggmap.datasets.mytai_example(datapath='.')
Expression count data for Arabidopsis thaliana from:
Drost, H.G., Janitza, P., Grosse, I. and Quint, M., 2017. Cross-kingdom comparison of the developmental hourglass. Current Opinion in Genetics & Development, 45, pp.69-75.
All files can be obtained from here: https://doi.org/10.5281/zenodo.7242263
- Parameters:
datapath (str) – Path to save dataset.
- Returns:
AnnData object.
- Return type:
AnnData
Example
>>> from oggmap import datasets
>>> datasets.mytai_example(datapath='.')
- oggmap.datasets.packer19(datapath='.')
scRNA count data for Caenorhabditis elegans from:
Packer, J.S., Zhu, Q., Huynh, C., Sivaramakrishnan, P., Preston, E., Dueck, H., Stefanik, D., Tan, K., Trapnell, C., Kim, J. and Waterston, R.H., 2019. A lineage-resolved molecular atlas of C. elegans embryogenesis at single-cell resolution. Science, 365(6459), p.eaax1971.
All files can be obtained from here: https://doi.org/10.5281/zenodo.7245547
- Parameters:
datapath (str) – Path to save dataset.
- Returns:
AnnData object.
- Return type:
AnnData
Example
>>> from oggmap import datasets
>>> datasets.packer19(datapath='.')
- oggmap.datasets.packer19_small(datapath='.')
scRNA count data for Caenorhabditis elegans from:
Packer, J.S., Zhu, Q., Huynh, C., Sivaramakrishnan, P., Preston, E., Dueck, H., Stefanik, D., Tan, K., Trapnell, C., Kim, J. and Waterston, R.H., 2019. A lineage-resolved molecular atlas of C. elegans embryogenesis at single-cell resolution. Science, 365(6459), p.eaax1971.
All files can be obtained from here: https://doi.org/10.5281/zenodo.7245547
- Parameters:
datapath (str) – Path to save dataset.
- Returns:
AnnData object.
- Return type:
AnnData
Example
>>> from oggmap import datasets
>>> datasets.packer19_small(datapath='.')
- oggmap.datasets.qiu22_frog(datapath='.')
Combined scRNA count data for Xenopus tropicalis from:
Qiu, C., Cao, J., Martin, B.K., Li, T., Welsh, I.C., Srivatsan, S., Huang, X., Calderon, D., Noble, W.S., Disteche, C.M. and Murray, S.A., 2022. Systematic reconstruction of cellular trajectories across mouse embryogenesis. Nature genetics, 54(3), pp.328-341.
Original scRNA count data from:
Briggs, J.A., Weinreb, C., Wagner, D.E., Megason, S., Peshkin, L., Kirschner, M.W. and Klein, A.M., 2018. The dynamics of gene expression in vertebrate embryogenesis at single-cell resolution. Science, 360(6392), p.eaar5780.
All files can be obtained from here: https://doi.org/10.5281/zenodo.7244440
- Parameters:
datapath (str) – Path to save dataset.
- Returns:
AnnData object.
- Return type:
AnnData
Example
>>> from oggmap import datasets
>>> datasets.qiu22_frog(datapath='.')
- oggmap.datasets.qiu22_mouse(datapath='.')
Combined scRNA count data for Mus musculus from:
Qiu, C., Cao, J., Martin, B.K., Li, T., Welsh, I.C., Srivatsan, S., Huang, X., Calderon, D., Noble, W.S., Disteche, C.M. and Murray, S.A., 2022. Systematic reconstruction of cellular trajectories across mouse embryogenesis. Nature genetics, 54(3), pp.328-341.
Original scRNA count data from:
Mohammed, H., Hernando-Herraez, I., Savino, A., Scialdone, A., Macaulay, I., Mulas, C., Chandra, T., Voet, T., Dean, W., Nichols, J. and Marioni, J.C., 2017. Single-cell landscape of transcriptional heterogeneity and cell fate decisions during mouse early gastrulation. Cell reports, 20(5), pp.1215-1228.
Cheng, S., Pei, Y., He, L., Peng, G., Reinius, B., Tam, P.P., Jing, N. and Deng, Q., 2019. Single-cell RNA-seq reveals cellular heterogeneity of pluripotency transition and X chromosome dynamics during early mouse development. Cell reports, 26(10), pp.2593-2607.
Pijuan-Sala, B., Griffiths, J.A., Guibentif, C., Hiscock, T.W., Jawaid, W., Calero-Nieto, F.J., Mulas, C., Ibarra-Soria, X., Tyser, R.C., Ho, D.L.L. and Reik, W., 2019. A single-cell molecular map of mouse gastrulation and early organogenesis. Nature, 566(7745), pp.490-495.
Cao, J., Spielmann, M., Qiu, X., Huang, X., Ibrahim, D.M., Hill, A.J., Zhang, F., Mundlos, S., Christiansen, L., Steemers, F.J. and Trapnell, C., 2019. The single-cell transcriptional landscape of mammalian organogenesis. Nature, 566(7745), pp.496-502.
All files can be obtained from here: https://doi.org/10.5281/zenodo.7244567
- Parameters:
datapath (str) – Path to save dataset.
- Returns:
AnnData object.
- Return type:
AnnData
Example
>>> from oggmap import datasets
>>> datasets.qiu22_mouse(datapath='.')
- oggmap.datasets.qiu22_zebrafish(datapath='.')
Combined scRNA count data for Danio rerio from:
Qiu, C., Cao, J., Martin, B.K., Li, T., Welsh, I.C., Srivatsan, S., Huang, X., Calderon, D., Noble, W.S., Disteche, C.M. and Murray, S.A., 2022. Systematic reconstruction of cellular trajectories across mouse embryogenesis. Nature genetics, 54(3), pp.328-341.
Original scRNA count data from:
Farrell, J.A., Wang, Y., Riesenfeld, S.J., Shekhar, K., Regev, A. and Schier, A.F., 2018. Single-cell reconstruction of developmental trajectories during zebrafish embryogenesis. Science, 360(6392), p.eaar3131.
Wagner, D.E., Weinreb, C., Collins, Z.M., Briggs, J.A., Megason, S.G. and Klein, A.M., 2018. Single-cell mapping of gene expression landscapes and lineage in the zebrafish embryo. Science, 360(6392), pp.981-987.
All files can be obtained from here: https://doi.org/10.5281/zenodo.7243602
- Parameters:
datapath (str) – Path to save dataset.
- Returns:
AnnData object.
- Return type:
AnnData
Example
>>> from oggmap import datasets
>>> datasets.qiu22_zebrafish(datapath='.')
- oggmap.datasets.sun21_orthomap(datapath='.')
Pre-calculated orthomap for Caenorhabditis elegans from:
Sun, S., Rödelsperger, C. and Sommer, R.J., 2021. Single worm transcriptomics identifies a developmental core network of oscillating genes with deep conservation across nematodes. Genome research, 31(9), pp.1590-1601.
- Parameters:
datapath (str) – Path to save dataset.
- Returns:
Path to Orthomap file.
- Return type:
str
Example
>>> from oggmap import datasets
>>> datasets.sun21_orthomap(datapath='.')
- oggmap.datasets.ws288(datapath='.')
OrthoFinder results (-S last) for all translated coding sequences (CDS) from WormBase release-WS288, WormBase ParaSite release-WBPS18 (keeping only longest isoforms) and dd_Smed_v6.pcf.contigs.fasta (transdecoder and miniprothint peptides) from https://planmine.mpibpc.mpg.de.
All files can be obtained from here: https://doi.org/10.5281/zenodo.7242263
- Parameters:
datapath (str) – Path to save dataset.
- Returns:
Paths to the Orthogroups.GeneCount file, the OrthoGroups file, and the species list file.
- Return type:
list of str
Example
>>> from oggmap import datasets
>>> datasets.ws288(datapath='.')
- oggmap.datasets.zebrafish_ensembl105_gtf(datapath='.')
Download GTF for species Danio rerio from Ensembl release-105: https://ftp.ensembl.org/pub/release-105/gtf/danio_rerio/
- Parameters:
datapath (str) – Path to save dataset.
- Returns:
Path to GTF file.
- Return type:
str
Example
>>> from oggmap import datasets
>>> datasets.zebrafish_ensembl105_gtf(datapath='.')
- oggmap.datasets.zebrafish_ensembl105_orthomap(datapath='.')
Pre-calculated and gene ID matched orthomap for Danio rerio extracted from OrthoFinder results:
OrthoFinder results for all translated coding sequences (CDS) from Ensembl release-105 (keeping only longest isoforms) and Xtropicalisv9.0.Named.primaryTrs.pep.fa from www.xenbase.org.
All files can be obtained from here: https://doi.org/10.5281/zenodo.7242263
Gene ID matching was done using the following GTF file for species Danio rerio from Ensembl release-105: https://ftp.ensembl.org/pub/release-105/gtf/danio_rerio/
- Parameters:
datapath (str) – Path to save dataset.
- Returns:
Path to Orthomap file.
- Return type:
str
Example
>>> from oggmap import datasets
>>> datasets.zebrafish_ensembl105_orthomap(datapath='.')
- oggmap.datasets.zebrafish_ensembl110_gtf(datapath='.')
Download GTF for species Danio rerio from Ensembl release-110: https://ftp.ensembl.org/pub/release-110/gtf/danio_rerio/
- Parameters:
datapath (str) – Path to save dataset.
- Returns:
Path to GTF file.
- Return type:
str
Example
>>> from oggmap import datasets
>>> datasets.zebrafish_ensembl110_gtf(datapath='.')
- oggmap.datasets.zebrafish_ensembl110_orthomap(datapath='.')
Pre-calculated and gene ID matched orthomap for Danio rerio extracted from OrthoFinder results:
OrthoFinder results for all translated coding sequences (CDS) from Ensembl release-110 (keeping only longest isoforms) and Xtropicalisv9.0.Named.primaryTrs.pep.fa from www.xenbase.org.
All files can be obtained from here: https://doi.org/10.5281/zenodo.7242263
Gene ID matching was done using the following GTF file for species Danio rerio from Ensembl release-110: https://ftp.ensembl.org/pub/release-110/gtf/danio_rerio/
- Parameters:
datapath (str) – Path to save dataset.
- Returns:
Path to Orthomap file.
- Return type:
str
Example
>>> from oggmap import datasets
>>> datasets.zebrafish_ensembl110_orthomap(datapath='.')
- oggmap.datasets.zebrafish_ensembl113_gtf(datapath='.')
Download GTF for species Danio rerio from Ensembl release-113: https://ftp.ensembl.org/pub/release-113/gtf/danio_rerio/
- Parameters:
datapath (str) – Path to save dataset.
- Returns:
Path to GTF file.
- Return type:
str
Example
>>> from oggmap import datasets
>>> datasets.zebrafish_ensembl113_gtf(datapath='.')
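The GTF loaders in this module follow a regular naming pattern (`mouse_ensembl105_gtf`, `zebrafish_ensembl113_gtf`, and so on). A small sketch for selecting one by species and release is shown here. `gtf_loader_name` is a hypothetical helper, not part of oggmap, and the allowed values are simply the species and release combinations listed in this module's documentation.

```python
SUPPORTED_SPECIES = ("mouse", "zebrafish")
SUPPORTED_RELEASES = (105, 110, 113)


def gtf_loader_name(species: str, release: int) -> str:
    """Build the documented loader name, e.g. 'mouse_ensembl110_gtf'."""
    if species not in SUPPORTED_SPECIES:
        raise ValueError(f"unsupported species: {species!r}")
    if release not in SUPPORTED_RELEASES:
        raise ValueError(f"unsupported release: {release!r}")
    return f"{species}_ensembl{release}_gtf"


# Usage (needs oggmap installed and network access, so it is left commented out):
# from oggmap import datasets
# loader = getattr(datasets, gtf_loader_name("zebrafish", 113))
# gtf_path = loader(datapath='.')
```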
- oggmap.datasets.zebrafish_ensembl113_orthomap(datapath='.')
Pre-calculated and gene ID matched orthomap for Danio rerio extracted from OrthoFinder results:
OrthoFinder results for all translated coding sequences (CDS) from Ensembl release-113 (keeping only longest isoforms) and Xtropicalisv9.0.Named.primaryTrs.pep.fa from www.xenbase.org.
All files can be obtained from here: https://doi.org/10.5281/zenodo.7242263
Gene ID matching was done using the following GTF file for species Danio rerio from Ensembl release-113: https://ftp.ensembl.org/pub/release-113/gtf/danio_rerio/
- Parameters:
datapath (str) – Path to save dataset.
- Returns:
Path to Orthomap file.
- Return type:
str
Example
>>> from oggmap import datasets
>>> datasets.zebrafish_ensembl113_orthomap(datapath='.')