The central dogma of molecular biology posits that information flows from DNA to RNA to protein. Proteins carry out the majority of biological functions, but RNAs are considerably easier to measure, and hence RNA sequencing (RNA-seq) is broadly used to assess the gene expression state of cells and tissues under various conditions. But how well does RNA predict protein level? Recent studies have shown that the answer varies from gene to gene. The majority of RNAs tend to predict their protein counterparts quite well, but others less so. The expression level of some RNAs may even be negatively correlated with that of their proteins.
In our recent work, we used a machine learning approach to assess how well different RNAs predict their proteins in a large dataset of five cancer types (breast, ovarian, colorectal, lung, and endometrial) from the CPTAC consortium. For each gene, we trained an elastic net model that, given a sample's standardized transcript level, predicts that sample's standardized protein abundance. In total, we trained models for over 11,000 genes. On average, the correlation coefficient between predicted and empirical protein levels in the test set was 0.37. We showed that these predictions generally hold even when applied to non-cancerous human tissues from GTEx v8, suggesting that the differential protein predictability of transcripts may be bound in part by biophysical and cell biological constraints and may be "transferable" to other samples.
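To make the per-gene modeling step concrete, here is a minimal sketch for a single gene, not the code used in the paper. It assumes one predictor (the gene's own standardized transcript level), simulated data, and arbitrary penalty settings (`alpha`, `l1_ratio`); the function names are hypothetical. With one predictor and no intercept, the elastic net coefficient has a closed form (soft-thresholding followed by ridge shrinkage), so only the Python standard library is needed.

```python
import math
import random


def standardize(values):
    """Return z-scores (mean 0, population standard deviation 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def fit_elastic_net_1d(x, y, alpha=0.1, l1_ratio=0.5):
    """Elastic net coefficient b for one predictor, no intercept.

    Minimizes (1 / 2n) * sum((y - b * x)^2)
              + alpha * l1_ratio * |b|
              + 0.5 * alpha * (1 - l1_ratio) * b^2
    The penalty values are illustrative assumptions.
    """
    n = len(x)
    rho = sum(xi * yi for xi, yi in zip(x, y)) / n
    z = sum(xi * xi for xi in x) / n
    l1 = alpha * l1_ratio
    l2 = alpha * (1.0 - l1_ratio)
    # Soft-threshold by the L1 penalty, then shrink by the L2 penalty.
    shrunk = math.copysign(max(abs(rho) - l1, 0.0), rho)
    return shrunk / (z + l2)


def pearson(a, b):
    """Pearson correlation (undefined if either input is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


# Simulated data for one hypothetical gene across 200 samples.
rng = random.Random(0)
rna_raw = [rng.gauss(0.0, 1.0) for _ in range(200)]
protein_raw = [0.6 * r + rng.gauss(0.0, 0.8) for r in rna_raw]

# For brevity, standardize across all samples before splitting.
rna = standardize(rna_raw)
protein = standardize(protein_raw)

# Hold out the last 40 samples as a test set.
train_x, test_x = rna[:160], rna[160:]
train_y, test_y = protein[:160], protein[160:]

coef = fit_elastic_net_1d(train_x, train_y)
predicted = [coef * x for x in test_x]
test_r = pearson(predicted, test_y)
print(f"coefficient = {coef:.2f}, test-set correlation = {test_r:.2f}")
```

Repeating this for every gene and recording each test-set correlation gives a per-gene measure of how well RNA predicts protein. In practice, a library implementation such as scikit-learn's `ElasticNet` with cross-validated penalties would be the more usual choice.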
Understanding the molecular mechanisms of aging hearts and skeletal muscles is an important component in the quest to mitigate and prevent prevalent morbidities in an aging world. We devised a computational workflow that uses how well an RNA predicts its protein as a criterion for filtering candidate results. Our rationale is that the results from an RNA-seq experiment are not equally useful: a target RNA is more informative if it is predictive of its protein level. We found that only about half of the RNA targets in an aging comparison study might reasonably predict protein level changes. The results therefore help highlight a subset of potential targets for follow-up studies (Figure) that may be more informative of downstream mechanisms.
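The filtering idea can be sketched as follows. This is an illustration under stated assumptions rather than the published workflow: the gene names and values are made-up placeholders, the record layout (`DiffExprHit`) and function name are hypothetical, and the cutoffs (`min_r`, `max_adjusted_p`) are arbitrary choices, since the cutoff used in the study is not given here.

```python
from dataclasses import dataclass


@dataclass
class DiffExprHit:
    """One differentially expressed RNA from a hypothetical comparison."""
    gene: str
    log2_fold_change: float
    adjusted_p: float


def filter_by_predictability(hits, predictability, min_r=0.3, max_adjusted_p=0.05):
    """Keep significant hits whose RNA is predictive of its protein.

    `predictability` maps gene -> test-set correlation from the per-gene
    models. Genes without a trained model are dropped. Both cutoffs are
    illustrative assumptions.
    """
    kept = []
    for hit in hits:
        r = predictability.get(hit.gene)
        if r is not None and r >= min_r and hit.adjusted_p <= max_adjusted_p:
            kept.append(hit)
    return kept


# Placeholder inputs: not real genes or results.
hits = [
    DiffExprHit("GENE_A", 1.2, 0.001),
    DiffExprHit("GENE_B", -0.8, 0.010),
    DiffExprHit("GENE_C", 0.5, 0.200),
    DiffExprHit("GENE_D", 2.0, 0.004),
]
predictability = {"GENE_A": 0.62, "GENE_B": 0.05, "GENE_C": 0.70}

prioritized = filter_by_predictability(hits, predictability)
print([hit.gene for hit in prioritized])
```

Here GENE_B is dropped because its RNA predicts protein poorly, GENE_C because its change is not significant, and GENE_D because no protein model exists for it. Whether to drop or merely flag genes without a model is a design choice; dropping is the conservative option.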
For more, read our paper in Molecular Omics here. A preprint version is also available for free on bioRxiv.