PyMINEr is a Python package for analyzing scRNAseq data (although really, you could use it for anything that's a 2D matrix!). To install it, you'll first need Python 3 (preferably >=3.7). You can get Python from Anaconda, or you can install it at the command line. Once Python is set up, install PyMINEr with pip:
python3 -m pip install bio-pyminer
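To confirm that the install worked, a quick check like the one below can help. This is only a minimal sketch: the 3.7 threshold comes from the recommendation above, the module name `pyminer` is taken from the import statement used later on this page, and the helper name `check_environment` is made up for illustration.

```python
import importlib.util
import sys


def check_environment(min_version=(3, 7), module_name="pyminer"):
    """Return (python_ok, module_found) for the current interpreter."""
    # sys.version_info compares like a tuple, e.g. (3, 10, ...) >= (3, 7)
    python_ok = sys.version_info >= min_version
    # find_spec returns None when a top-level module cannot be located
    module_found = importlib.util.find_spec(module_name) is not None
    return python_ok, module_found


if __name__ == "__main__":
    python_ok, module_found = check_environment()
    print("Python >= 3.7:", python_ok)
    print("pyminer importable:", module_found)
```

If the second line prints False, the package was most likely installed into a different Python environment than the one you are running.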
Sometimes getting all the right dependencies installed can be a pain, and the dependencies of one program can break another. To get around this, you can use Docker. There are plenty of tutorials out there on this. Once you're familiar with how to use Docker, you can pull the latest PyMINEr Docker image and start a container like so:
docker pull scottyler89/pyminer
docker run -it --name first_try -v <path_to_data>:/data scottyler89/pyminer
Then, when you're finished using PyMINEr, you can leave the Docker container by typing "exit" at the command line.
Using PyMINEr with Scanpy
If you're already familiar with Scanpy for scRNAseq analysis, then using PyMINEr with Scanpy should be super easy! All you have to do is read in and process your data however you'd like, then run:
from pyminer.pyminer import pyminer_analysis
pm_analysis = pyminer_analysis(adata=adata,
analysis_dir = "pyminer_analysis/",
This gives you a PyMINEr results object that should be very easy to navigate and explore, because the help and descriptions are built into the print function! Printing the object shows a summary like this:
PyMINEr object: Each of the elements contained within the PyMINEr object is its own class. You can print each of these objects to get the help section on it.
<object>.gene_anno: An object that contains tables annotating the input genes.
<object>.clust: This object has lots of info on the clusters, differential expression, pathway analysis, etc. Just do print(<object>.clust) to get the details!
<object>.goi: If you used genes of interest, the directory of the output plots and the shortest paths of all other genes to them are located here.
<object>.network: The co-expression network and details on gene-module usage, module pathway analysis, etc.
<object>.ap: Details of the autocrine/paracrine signaling between clusters and pathway analysis of those signaling networks.
There is also the PyMINEr website file that provides a walk-through of the results: /home/scott/Downloads/temp/pyminer_analysis/PyMINEr_summary.html
A basic example of using PyMINEr
Here I'll walk you through a basic example of how to use PyMINEr with a scRNAseq dataset using the default parameters. Think of this as the "Hello World" for using PyMINEr.
The input file is: here
Genes of interest: here
Note that typically, the expression input you give PyMINEr should be the log-transformed and normalized expression matrix.
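To make "log-transformed and normalized" concrete, here is a minimal sketch of one common recipe: scale each cell to a fixed total (counts per 10,000) and then take log(1 + x). This particular recipe, the scale factor, the genes-as-rows layout, and the function name `log_normalize` are all assumptions for illustration; PyMINEr does not mandate this exact normalization. It is written in plain Python to stay dependency-free; in practice you would more likely use NumPy, or Scanpy's `sc.pp.normalize_total` followed by `sc.pp.log1p`.

```python
import math


def log_normalize(counts, scale=10_000):
    """Library-size normalize each column (cell), then apply log(1 + x).

    counts: list of rows (genes), each a list of per-cell raw counts.
    """
    n_cells = len(counts[0])
    # Total counts per cell, i.e. the column sums
    totals = [sum(row[j] for row in counts) for j in range(n_cells)]
    normalized = []
    for row in counts:
        normalized.append([
            # Cells with zero total counts are left at 0.0 to avoid dividing by zero
            math.log1p(value / totals[j] * scale) if totals[j] > 0 else 0.0
            for j, value in enumerate(row)
        ])
    return normalized


if __name__ == "__main__":
    raw = [
        [10, 0, 5],   # gene A across three cells
        [30, 20, 5],  # gene B across three cells
    ]
    for row in log_normalize(raw):
        print([round(v, 3) for v in row])
```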
The gProfiler-accepted species codes are listed here: https://biit.cs.ut.ee/gprofiler/page/organism-list
Interpreting the Results
Here I'll give you a walk-through on how to interpret the PyMINEr results.
PyMINEr for Large Datasets
Sometimes you've got more data than your computer can actually handle all at once. To address this issue, we have a script that will convert your file to a PyMINEr-compatible HDF5 file. I'll show you how to do it here. As a basic example, we'll start out with the same input file as before. I'll also show you how to use the genes of interest option, using this file.
To convert a tab-delimited text file into a PyMINEr-compatible HDF5 file, type:
This makes the HDF5 file as well as the corresponding row and column annotation text files, which need to be passed to PyMINEr with the arguments: -ids <ID_list.txt> -cols <column_IDs.txt>
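Before passing the ID files along, it can be worth checking that they line up with the original text file. The sketch below compares line counts only, and it rests on several assumptions that are not confirmed here: each ID file holds one ID per line, the first line of the tab-delimited table is a header whose first cell is a corner label (not a sample name), and the corner label is not repeated in the column ID file. All function names are hypothetical.

```python
import csv


def count_ids(path):
    """Count non-blank lines, assuming the file holds one ID per line."""
    with open(path) as handle:
        return sum(1 for line in handle if line.strip())


def table_shape(path):
    """Return (n_rows, n_cols) of a tab-delimited table.

    Assumes the first line is a header whose first cell is a corner label,
    and that the first column of every later line is a row ID.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        n_rows = sum(1 for row in reader if row)
    return n_rows, len(header) - 1


def ids_consistent(table_path, row_ids_path, col_ids_path):
    """True when the ID-file line counts match the table's dimensions."""
    n_rows, n_cols = table_shape(table_path)
    return count_ids(row_ids_path) == n_rows and count_ids(col_ids_path) == n_cols
```

For example, `ids_consistent("expression.tsv", "ID_list.txt", "column_IDs.txt")` would compare a table against its two ID files; the file names here are placeholders. If your files follow a different layout, adjust the header handling in `table_shape` accordingly.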
Using custom cell types/sample groups
If you're running PyMINEr on a traditional dataset with a priori groups (e.g., WT vs KO), you can provide these groupings to PyMINEr. If you're doing scRNAseq but want to use a cell type identification algorithm not included in PyMINEr, you can provide those groupings here as well. Above is the tutorial for how to do that.