A new paper from our lab revisited how well mRNA levels reflect protein variance across different tumors and normal tissues, using data from the Clinical Proteomic Tumor Analysis Consortium (CPTAC).
mRNA levels in the cell are very commonly measured as a readout of genome output. In the last few years, scientists have increasingly recognized that mRNA and protein levels are imperfectly correlated. A robust correlation exists between protein and mRNA across different genes (i.e., highly abundant mRNAs tend to produce highly abundant proteins within a cell). However, the fold change of a particular mRNA upon perturbation or disease often correlates only modestly with the change in the protein it encodes.
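To make the difference between across-gene and across-sample correlation concrete, here is a minimal Python sketch on made-up numbers (nothing here comes from CPTAC or from the paper). It assumes log-scale values in which each gene's protein tracks its baseline mRNA abundance, while the sample-to-sample protein changes are deliberately chosen to be unrelated to the mRNA changes.

```python
import numpy as np

# Hypothetical toy data: 4 genes x 6 samples, values on a log10 scale.
# Baseline abundances span three orders of magnitude across genes.
baseline = np.array([1.0, 2.0, 3.0, 4.0])[:, None]

# Per-sample deviations (e.g., perturbation or disease effects).
# The protein deviations are chosen to be uncorrelated with the mRNA ones.
mrna_dev = 0.1 * np.array([1, -1, 1, -1, 1, -1])
prot_dev = 0.1 * np.array([1, 1, -1, -1, 0, 0])

log_mrna = baseline + mrna_dev          # shape (4, 6)
log_prot = baseline + 0.5 + prot_dev    # protein scales with baseline mRNA


def pearson(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


# Across genes: pool every gene/sample pair, so baseline differences dominate.
across_genes = pearson(log_mrna.ravel(), log_prot.ravel())

# Across samples: correlate each gene's mRNA and protein profiles separately.
across_samples = [pearson(log_mrna[g], log_prot[g]) for g in range(log_mrna.shape[0])]

print(f"across-gene r = {across_genes:.3f}")
print("per-gene across-sample r =", [f"{r:.3f}" for r in across_samples])
```

Pooling all genes gives an r close to 1 because baseline abundance spans orders of magnitude, while every per-gene r is essentially 0. A high across-gene correlation therefore says little about whether a change in a given mRNA predicts a change in its protein.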
This unexplained protein variance may be attributed, in part, to the many post-transcriptional and post-translational regulatory mechanisms that modulate protein concentration in the cell. For example, large multi-protein complexes are known to buffer gene copy number changes: subunits produced in excess of the assembly stoichiometry cannot fold properly and are degraded by the cell. Other mechanisms, such as gene-specific mRNA translation rates, also contribute to the biophysical and biological barriers that uncouple the mRNA and protein layers of biological information. These mechanisms are usually not directly revealed by transcriptomics data, which aim to measure differentially expressed mRNA. As a result, there is a need to critically examine the mRNA-protein relationship, and refining methods to predict protein changes from mRNA data is a goal relevant to many fields of biomedical research.
In a recent paper, we built on prior work and trained multiple machine learning models to predict across-sample protein variance from RNA-seq data. Predictability varied enormously from gene to gene, and up to one third of proteins were poorly predicted by their mRNA. Surprisingly, many proteins correlated poorly with their cognate mRNA but strongly with another transcript, usually (though not always) one encoding a known protein-protein interaction partner. The data suggest that degradation of supernumerary interactors is a driver of protein levels. This was already known for large complexes, but we found the phenomenon to be widespread: it also affects many small, stable complexes, including propionyl-CoA carboxylase, the mitochondrial calcium uniporter, and calcineurin.
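The pattern of a protein tracking a partner's transcript rather than its own can be reproduced with a toy stoichiometric model. This Python sketch is an illustration under stated assumptions, not the models used in the paper: hypothetical genes A and B form an obligate 1:1 complex, A's mRNA is always in excess, surplus unassembled subunits are fully degraded, and the protein of an unrelated hypothetical gene C is simply proportional to its mRNA. For each protein, the code scans all transcripts for the best Pearson correlation.

```python
import numpy as np

# Hypothetical toy model (not data from the paper): genes A and B form an
# obligate 1:1 complex; gene C is unrelated. Eight samples per gene.
mrna = {
    "A": np.array([21, 19, 19, 21, 21, 19, 19, 21], dtype=float),  # always in excess
    "B": np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float),          # limiting subunit
    "C": np.array([5, 3, 6, 2, 7, 4, 8, 1], dtype=float),
}

# Assumption: unassembled surplus subunits are degraded, so each subunit's
# protein level equals the amount of assembled complex, which is set by the
# less abundant partner. C's protein simply tracks its own mRNA.
complex_level = np.minimum(mrna["A"], mrna["B"])
protein = {"A": complex_level, "B": complex_level.copy(), "C": 2.0 * mrna["C"]}


def pearson(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


def best_transcript(gene):
    """Return (cognate r, best-correlating transcript, its r) for one protein."""
    scores = {t: pearson(protein[gene], m) for t, m in mrna.items()}
    best = max(scores, key=scores.get)
    return scores[gene], best, scores[best]


results = {g: best_transcript(g) for g in protein}
for gene, (cognate_r, best, best_r) in results.items():
    print(f"protein {gene}: cognate r = {cognate_r:+.2f}, "
          f"best = mRNA {best} (r = {best_r:+.2f})")
```

In this toy setting, protein A shows essentially no correlation with its own mRNA but a perfect correlation with mRNA B, while proteins B and C are best explained by their cognate transcripts. Real data are far noisier: translation rates, degradation kinetics, and measurement error all blur the picture.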
Importantly, we found that prior selection of mRNA features, i.e., pre-selecting genes that may predict the level of a protein, not only improved protein predictions but may also help identify new protein-level driver genes. For example, using a directed graph model, we predict that LACTB mRNA may have an outsized effect on mitochondrial ribosome protein abundance. The paper, “widespread post-transcriptional regulation of protein abundance by interacting partners” by Himangi Srivastava, Mike Lippincott, Jordan Currie, and the Maggie Lam Lab, is published in PLoS Computational Biology.
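The benefit of choosing features from prior knowledge can be illustrated in a similar toy setting. The Python sketch below compares the leave-one-out R² of an ordinary least-squares model that uses only the cognate mRNA against one that also includes interaction partners drawn from a hypothetical `partners` table. The data, the table, and the choice of linear regression are all assumptions for illustration; the paper trained several machine learning models and describes its actual feature-selection procedure.

```python
import numpy as np

# Hypothetical toy data (not from the paper): protein A is buffered by its
# limiting complex partner B, so A's own mRNA carries no information.
mrna = {
    "A": np.array([21, 19, 19, 21, 21, 19, 19, 21], dtype=float),
    "B": np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float),
}
protein_a = np.minimum(mrna["A"], mrna["B"])

# Hypothetical prior knowledge, e.g. a protein-protein interaction table.
partners = {"A": ["B"]}


def design_matrix(genes):
    """Stack the chosen mRNA features next to an intercept column."""
    cols = [np.ones_like(protein_a)] + [mrna[g] for g in genes]
    return np.column_stack(cols)


def loo_r2(X, y):
    """Leave-one-out R^2 for an ordinary least-squares fit."""
    preds = np.empty_like(y)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i          # hold out sample i
        coef, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        preds[i] = X[i] @ coef                 # predict the held-out sample
    return 1.0 - np.sum((y - preds) ** 2) / np.sum((y - y.mean()) ** 2)


cognate_only = loo_r2(design_matrix(["A"]), protein_a)
with_partners = loo_r2(design_matrix(["A"] + partners["A"]), protein_a)
print(f"cognate mRNA only:      R^2 = {cognate_only:.2f}")
print(f"cognate + PPI partners: R^2 = {with_partners:.2f}")
```

Here, adding the partner transcript lifts the cross-validated R² from a poor value to nearly 1. Restricting candidates to biologically plausible partners, rather than searching every transcript, also keeps the feature space small, which matters when samples are few relative to genes.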