MANAclust: unsupervised multi-omic/clinical clustering

Merged Affinity Network Association clustering

MANAclust is a Python-based command-line tool that integrates multi-omic and clinical data for unsupervised clustering. Check out the paper for more details!

How to install MANAclust

You can install MANAclust from pip with the command line below:

Or you can download the source code below and install it by following the instructions on the repository site.


Get the code


MANAclust is freely available under the AGPLv3 license. It can be obtained here.

How do I use it?

Basic "Hello World" example

To try it out, you can download a dummy cohort here! (It's also in the "test" directory of the repository; see "Get the code".) The files are labeled as either categorical or numeric for clarity. You can run MANAclust on it with the following command line:

Input

MANAclust can take as input numeric, categorical, or mixed categorical/numeric data files in tab-delimited format. MANAclust automatically matches sample/subject IDs across datasets: the samples don't have to be in the same order, and a sample doesn't even have to appear in every dataset, because MANAclust is fully missing-data compatible.
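As a rough illustration of what matching IDs across datasets means, the Python sketch below aligns hypothetical in-memory datasets (each a dict from sample ID to that sample's features) on the union of their sample IDs, recording None wherever a sample is absent from a dataset. The function name `align_samples` and the dict layout are assumptions made for illustration; this is not MANAclust's internal code.

```python
from typing import Dict, List, Optional

# Hypothetical helper (not part of MANAclust): align datasets by sample ID.
def align_samples(datasets: List[Dict[str, dict]]) -> Dict[str, List[Optional[dict]]]:
    """Return sample ID -> per-dataset records, with None where a sample is missing."""
    # Union of all sample IDs, so row order in each input file does not matter.
    all_ids = sorted(set().union(*(d.keys() for d in datasets)))
    return {sid: [d.get(sid) for d in datasets] for sid in all_ids}

# Toy data: s2 has no clinical record and s3 has no expression record.
rna = {"s2": {"GAPDH": 5.1}, "s1": {"GAPDH": 4.0}}
clinical = {"s1": {"sex": "F"}, "s3": {"sex": "M"}}
aligned = align_samples([rna, clinical])
```

Keeping the union (rather than the intersection) of IDs is what lets samples with partial data stay in the analysis.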

Categorical Datasets

Categorical datasets can be anything that doesn't fit nicely as floating-point numbers: genetic data, environmental surveys, medical records, lab test results, etc. The format for categorical data is subject/sample IDs in the first column and one variable in each of the following columns, with each row corresponding to a subject/sample. These files are passed after the "-cat" flag at the command line. If your dataset has missing data, you can specify which text marks a missing value with the "-md" flag.
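To make the categorical layout concrete, here is a minimal Python sketch of a reader for it, assuming the first row is a header of variable names. The `missing` argument plays the same role as the "-md" flag, but the function name `read_categorical` and the default token "NA" are assumptions for illustration, not MANAclust's own parser.

```python
import csv
import io
from typing import Dict, Optional

# Hypothetical reader (not MANAclust's code) for the categorical layout:
# column 1 = subject/sample ID, remaining columns = variables, one row per subject.
def read_categorical(text: str, missing: str = "NA") -> Dict[str, Dict[str, Optional[str]]]:
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    variables = header[1:]  # assumed header row; first cell labels the ID column
    table: Dict[str, Dict[str, Optional[str]]] = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        sample_id, values = row[0], row[1:]
        # Cells equal to the missing-data token become None.
        table[sample_id] = {
            var: (None if val == missing else val)
            for var, val in zip(variables, values)
        }
    return table

example = "subject\tsmoker\tgenotype\nP1\tyes\tAA\nP2\tNA\tAG\n"
cat = read_categorical(example, missing="NA")
```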

Numeric Datasets

Numeric datasets use the traditional 'omics' layout: tab-delimited text files with the features (genes, loci, bacterial IDs, etc.) in the rows and their IDs in the first column. The first row is the sample ID header; i.e., each column after the first holds one sample's expression, abundance, etc. Note that this orientation is the transpose of the categorical format. These files are passed after the "-num" flag at the command line.
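Because numeric files are oriented features-by-samples while categorical files are samples-by-variables, some transposition is needed before the two can be combined per sample. The sketch below shows one way to do that in plain Python; `read_numeric` is a hypothetical name, and for brevity it assumes every cell parses as a number (no missing-value handling), which real data may not satisfy.

```python
import csv
import io
from typing import Dict

# Hypothetical reader (not MANAclust's code) for the numeric 'omics' layout:
# first row = sample ID header, first column = feature IDs, cells = values.
def read_numeric(text: str) -> Dict[str, Dict[str, float]]:
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    samples = header[1:]
    # Transpose into sample ID -> {feature: value}.
    by_sample: Dict[str, Dict[str, float]] = {s: {} for s in samples}
    for row in reader:
        if not row:
            continue  # skip blank lines
        feature, values = row[0], row[1:]
        for sample, val in zip(samples, values):
            by_sample[sample][feature] = float(val)
    return by_sample

example = "gene\tS1\tS2\nTP53\t2.5\t0.0\nMYC\t1.0\t3.25\n"
num = read_numeric(example)
```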

Test Datasets

If you have datasets (categorical or numeric) that you DON'T want used for clustering but do want analyzed, to see whether the identified clusters differ with respect to them, you can pass these in with the "-test_num" and "-test_cat" flags.
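The statistical test MANAclust applies to test datasets isn't described on this page. As one standard way of asking whether clusters differ by a categorical variable, the sketch below computes a Pearson chi-square statistic from the cluster-by-category contingency table, using only samples present in both mappings (matching the missing-data tolerance described above). The function name is hypothetical, and the sketch stops at the statistic rather than computing a p-value.

```python
from collections import Counter
from typing import Dict, Hashable

# Hypothetical helper (not MANAclust's method): Pearson chi-square statistic
# for cluster label vs. a categorical test variable.
def chi_square_statistic(clusters: Dict[str, Hashable], test_var: Dict[str, Hashable]) -> float:
    shared = [s for s in clusters if s in test_var]  # drop samples missing either value
    n = len(shared)
    observed = Counter((clusters[s], test_var[s]) for s in shared)
    row_totals = Counter(clusters[s] for s in shared)
    col_totals = Counter(test_var[s] for s in shared)
    stat = 0.0
    for r, r_tot in row_totals.items():
        for c, c_tot in col_totals.items():
            expected = r_tot * c_tot / n  # count expected under independence
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat

clusters = {"s1": 0, "s2": 0, "s3": 1, "s4": 1, "s5": 1}
separated = {"s1": "yes", "s2": "yes", "s3": "no", "s4": "no"}  # s5 has no test value
mixed = {"s1": "yes", "s2": "no", "s3": "yes", "s4": "no"}
```

A larger statistic means the test variable's distribution departs more strongly from what independence from the clusters would predict.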

Output

MANAclust can take as input numeric, categorical, or mixed datasets (in mixed datasets, numeric variables are digitized and treated as categorical). Its analysis and output cover:

Categorical and numeric feature selection
Affinity matrix construction and integration
Unsupervised clustering for joint multi-omic cluster identification
Consensus group identification for all input datasets
Identification of differential abundance of all features for all datasets across final clusters and consensus groups
Identification of the consensus groups that are significantly concordant and discordant with one another
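How MANAclust digitizes numeric variables in mixed datasets isn't specified here. As an assumption-laden illustration of the general idea, the sketch below bins a numeric variable into quantile-based categories using only the standard library (`statistics.quantiles`, Python 3.8+). The function name, the bin labels, and the choice of three equal-frequency bins are all hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles
from typing import List

# Hypothetical digitization (not necessarily MANAclust's scheme): turn a numeric
# variable into categorical labels using equal-frequency (quantile) cut points.
def digitize(values: List[float], n_bins: int = 3) -> List[str]:
    cuts = quantiles(values, n=n_bins)  # n_bins - 1 cut points
    # bisect_right counts how many cut points each value meets or exceeds.
    return [f"bin{bisect_right(cuts, v)}" for v in values]

labels = digitize([1, 2, 3, 4, 5, 6])
```

Quantile bins keep roughly equal numbers of samples per category, which avoids near-empty categories when a variable is skewed.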

Understanding the Output

To help you understand the output in more detail, MANAclust creates an HTML page called "MANAclust_summary.html" in the main output directory. It walks you through all of the results and figures and explains what each one means.
