Merged Affinity Network Association clustering
MANAclust is a Python-based command-line tool that integrates multi-omics and clinical data for unsupervised clustering. Check out the paper for more details!
How do I use it?
Basic "Hello World" example
In the package (found in "Get the code"), there is a directory called "test" that contains files you can use as example input. The files are labeled as either categorical or numeric for clarity. You can use MANAclust with the following command line:
MANAclust can take as input numeric, categorical, or mixed categorical/numeric data files in tab-delimited format. MANAclust will automatically match the correct sample/subject IDs with each other across datasets; they don't have to be in the same order, and each sample doesn't even have to be in each dataset because MANAclust is fully missing-data compatible.
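To make the ID matching concrete, here is a minimal sketch of the general idea, not MANAclust's actual code. The dataset names, sample IDs, and the `align_samples` helper are all hypothetical. It takes the union of sample IDs across datasets and records which samples each dataset actually contains, so order and missing samples do not matter.

```python
# Illustrative only: toy datasets keyed by sample ID (all names are made up).
def align_samples(datasets):
    """Return the sorted union of sample IDs and a per-dataset presence mask."""
    all_ids = sorted(set().union(*(data.keys() for data in datasets.values())))
    presence = {
        name: [sample_id in data for sample_id in all_ids]
        for name, data in datasets.items()
    }
    return all_ids, presence


datasets = {
    "expression": {"S3": 1.2, "S1": 0.4},                # S2 is absent here
    "survey": {"S1": "yes", "S2": "no", "S3": "yes"},    # different order
}
all_ids, presence = align_samples(datasets)
print(all_ids)
print(presence)
```

A presence mask like this is one simple way to keep track of which samples are missing from a dataset without dropping them from the analysis.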
Categorical datasets can be anything that doesn't fit nicely as floating-point numbers: genetic data, environmental surveys, medical records, lab test results, etc. The format for categorical data is subject/sample IDs in the first column and the variables in each of the following columns, so each row corresponds to one subject/sample. These files are fed in after the "-cat" flag at the command line. If your dataset has missing data, you can specify which text indicates a missing value with the "-md" flag.
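As a rough illustration of the categorical layout, the sketch below parses a small tab-delimited table with subjects in rows. The variable names, the "NA" missing-data token, and the presence of a header row naming the variables are assumptions made for this example, and `read_categorical` is a hypothetical helper, not part of MANAclust.

```python
import csv
import io

MISSING = "NA"  # assumption: the same text you would pass with "-md"

# Toy categorical file: IDs in the first column, one variable per column.
categorical_text = (
    "subject\tsmoker\tgenotype\n"
    "S1\tyes\tAA\n"
    "S2\tNA\tAG\n"
    "S3\tno\tNA\n"
)


def read_categorical(handle, missing=MISSING):
    """Parse subjects-in-rows data into {subject: {variable: value or None}}."""
    reader = csv.reader(handle, delimiter="\t")
    variables = next(reader)[1:]  # skip the ID column header
    table = {}
    for row in reader:
        subject, values = row[0], row[1:]
        table[subject] = {
            var: (None if val == missing else val)
            for var, val in zip(variables, values)
        }
    return table


cat_table = read_categorical(io.StringIO(categorical_text))
print(cat_table["S2"])
```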
Numeric datasets are in traditional 'omics' form: tab-delimited text files with the features (genes, loci, bacteria IDs, etc.) in the rows, where the first column holds the feature IDs. The first row is the sample ID header, i.e. each column after the first holds one sample's expression, abundance, etc. These files are fed in after the "-num" flag at the command line.
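Note that the numeric layout is transposed relative to the categorical one: features are in rows and samples are in columns. The sketch below reads a toy file in that orientation; the gene and sample names are invented, and `read_numeric` is a hypothetical helper written for illustration rather than MANAclust's own reader.

```python
import csv
import io

# Toy numeric file: feature IDs in the first column, one sample per column.
numeric_text = (
    "gene\tS1\tS2\tS3\n"
    "GeneA\t5.1\t0.0\t2.3\n"
    "GeneB\t1.0\t7.5\t4.2\n"
)


def read_numeric(handle):
    """Parse features-in-rows data into {sample: {feature: float}}."""
    reader = csv.reader(handle, delimiter="\t")
    sample_ids = next(reader)[1:]  # header row holds the sample IDs
    per_sample = {sample_id: {} for sample_id in sample_ids}
    for row in reader:
        feature_id = row[0]
        for sample_id, value in zip(sample_ids, row[1:]):
            per_sample[sample_id][feature_id] = float(value)
    return per_sample


num_table = read_numeric(io.StringIO(numeric_text))
print(num_table["S2"])
```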
If there are datasets (categorical or numeric) that you DON'T want used for clustering but do want analyzed, to see whether the identified clusters differ in them, you can feed these in with the "-test_num" and "-test_cat" flags.
MANAclust can take as input numeric, categorical, or mixed datasets (numeric variables are digitized and treated as categorical in this case). It outputs the results of the following steps:
Categorical and numeric feature selection
Affinity matrix construction and integration
Unsupervised clustering for joint multi-omic cluster identification
Consensus group identification for all input datasets
Identification of differential abundance of all features for all datasets across final clusters and consensus groups
Identification of the consensus groups that are significantly concordant and discordant with one another
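For mixed datasets, the digitization of numeric variables mentioned above can be pictured with the sketch below. How MANAclust actually bins values is not described here, so the equal-width scheme, the number of bins, and the `digitize` helper are assumptions chosen only to show what "treating a numeric variable as categorical" can look like.

```python
def digitize(values, n_bins=3):
    """Map each number to an equal-width bin label; missing values stay None."""
    present = [v for v in values if v is not None]
    low, high = min(present), max(present)
    width = (high - low) / n_bins or 1.0  # fall back to 1.0 if all values are equal
    labels = []
    for v in values:
        if v is None:
            labels.append(None)
        else:
            # Clamp so the maximum value falls in the last bin.
            index = min(int((v - low) / width), n_bins - 1)
            labels.append(f"bin{index}")
    return labels


ages = [21.0, 35.0, None, 64.0, 50.0]  # hypothetical numeric column
age_labels = digitize(ages)
print(age_labels)
```

Once binned this way, a numeric column can be handled with the same machinery as any other categorical variable, including missing values.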
Understanding the Output
To help you understand the output in more detail, MANAclust creates an HTML file called "MANAclust_summary.html" in the main output directory. It walks you through all the results and figures and explains what they mean.