Each cell can intrinsically have thousands of dimensions, e.g. cell cycle, gene regulation, splicing, velocity, etc. For computational convenience, we tend to reduce the dimensionality before clustering single cells to identify common or rare cell types and subtypes. Although this conversion into a low-dimensional space makes OMICs-based cell clustering computationally feasible, the loss of information is a real challenge: the resulting clusters may not recapitulate the optimal cellular composition for a given tissue.
Most recent breakthrough studies in single-cell OMICs used such reduction techniques (e.g. principal component analysis) and then applied t-SNE or UMAP for clustering and visualization. These studies essentially captured the strong, readily detectable cell clusters with high variance, leaving out rare cell clusters and subtypes. Dimensionality reduction is not the only complication. For clustering, the number of cell types, the number of cells in each cell type or subtype, and their underlying regulatory features are all unknown. These unknown parameters create an exponentially large search space and make the clustering problem NP-hard, meaning that no polynomial-time algorithm is known that guarantees an optimal solution. This is why reduction is a convenient way to narrow the search space enough to obtain a suboptimal clustering.
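To give a rough sense of how large that search space is, the sketch below counts the ways a set of labelled cells can be split into non-empty clusters when the number of clusters is unknown. It is only an illustration of scale, not a proof of hardness, and it ignores the unknown regulatory features, which would enlarge the space further. The function names `stirling2` and `count_partitions` are made up for this example and do not come from any single-cell toolkit.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n: int, k: int) -> int:
    """Ways to split n labelled cells into exactly k non-empty clusters."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    # The n-th cell either starts a new cluster on its own,
    # or joins one of the k clusters formed by the other n - 1 cells.
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)


def count_partitions(n: int) -> int:
    """Total partitions of n cells over every possible number of clusters."""
    return sum(stirling2(n, k) for k in range(n + 1))


if __name__ == "__main__":
    # Hypothetical, tiny cell counts; real datasets have thousands of cells.
    for n_cells in (5, 10, 20):
        print(n_cells, count_partitions(n_cells))
```

Even at 20 cells the count is already in the tens of trillions, so exhaustively scoring every candidate clustering is out of reach long before a dataset reaches realistic size.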
Clustering in low dimensions can detect only the signals preserved through the major variances between cells. This in turn loses the underlying small variances that are crucial for detecting subtypes and rare cell types. Another major problem is the interpretation of clusters, which current solutions lack: they give us no molecularly relevant information about why clusters are different or similar. Clustering in this case seems more like an optimization problem than a machine learning pattern recognition problem. It is time to go deep and find unsupervised clustering solutions that work on high-dimensional OMICs data, applying different branches of artificial intelligence algorithms.
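The point about small variances can be made concrete with a toy calculation. The sketch below assumes an idealized, noise-free marker gene that is raised by a fixed amount in one subtype and is zero in every other cell; the cell counts, the shift, and the function name `marker_variance` are all hypothetical choices for illustration, not values from real data.

```python
from statistics import pvariance


def marker_variance(n_cells: int, n_subtype: int, shift: float) -> float:
    """Population variance of one idealized marker gene.

    The gene equals `shift` in the subtype cells and 0.0 in all others
    (a noise-free assumption made only for this illustration).
    """
    values = [shift] * n_subtype + [0.0] * (n_cells - n_subtype)
    return pvariance(values)


if __name__ == "__main__":
    # Same expression shift, different subtype abundance (hypothetical numbers).
    common = marker_variance(n_cells=1000, n_subtype=500, shift=2.0)
    rare = marker_variance(n_cells=1000, n_subtype=10, shift=2.0)
    print("common subtype variance:", common)
    print("rare subtype variance:", rare)
    print("ratio:", common / rare)
```

With the same shift in expression, the marker of a subtype that makes up 1% of cells contributes roughly 25 times less variance than the marker of a subtype that makes up half of them, so a reduction that keeps only the highest-variance directions will tend to discard it.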