Proteins. Everybody knows what a protein is, right? A string of subunits called amino acids, linked together through peptide bonds and folded into a specific 3D structure. I guess this is biology 101. And everyone knows that the human body needs proteins. Without them, we're dead. Without them, there isn't even life as we know it. And before you say something about the RNA world, where you argue that biochemical reactions can occur with just RNA and no protein: yes, I'm aware of this. But it's the diversity of these molecular machines, not of nucleic acids, that really makes complex life possible. Yet in the age of the genomic revolution, I feel like we are paying too much attention to the 4 letters and forgetting about the other 20 elephants in the room.
Before you bash me about this, let me elaborate. The past 50 years or so have seen an explosion of genetic information. DNA is really what is considered to separate biology from chemistry. I mean, search for any biology icon for your presentation and I guarantee you the double helix of DNA is one of the first things that pops up on Google (the other thing may be some random green leaves). And absolutely, genetic information tells us so much about ourselves and about our ancestors; it's the recipe for life itself, as many have said. It is the origin of many discoveries that have won Nobel Prizes over the years, and of the most advanced genome editing techniques that exist so far. I mean, look at the literature: there were times, and I guess it's still true now, when most if not every Nature, Science and Cell paper was about single-cell sequencing. Big fancy papers with heatmaps and abstract lines that I will never understand.
But...
As much as people think genetics is everything, as a biochemist by training, I think otherwise! Search through the Protein Data Bank (PDB) and you will see just over 150,000 protein structures. Yet for almost half of these proteins, we don't even know what they are or what they do! The human genome has about 20,000 to 25,000 protein-coding genes, and each gene can encode about 5 to 6 proteins due to post-transcriptional modification. Add in the effect of single amino acid polymorphisms and post-translational modifications, where parts of the original amino acid chain are chemically modified, and the number of protein species in the human proteome alone is estimated at 1,000,000 to 6,000,000. So we really only know the function of 50% of what is, at most, about 4% of the proteins in the human proteome. And that doesn't even count the proteins of the countless other living organisms out there.
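For anyone who wants to play with these numbers, here is a rough back-of-the-envelope sketch in Python. It only uses the figures quoted above, and it makes one big simplifying assumption that is mine, not the literature's: that every PDB structure counts as one distinct human protein species. That is generous, since the PDB covers many organisms and contains redundant entries, so the real coverage is likely lower. The function and variable names are made up for illustration.

```python
# Rough figures quoted in the post; these are estimates, not measurements.
GENES = (20_000, 25_000)                  # protein-coding genes (low, high)
ISOFORMS_PER_GENE = (5, 6)                # proteins per gene (low, high)
PROTEOME_SPECIES = (1_000_000, 6_000_000) # estimated protein species (low, high)
PDB_STRUCTURES = 150_000                  # structures in the PDB
KNOWN_FUNCTION_FRACTION = 0.5             # roughly half have a known function


def characterized_fraction(structures, known_fraction, proteome_size):
    """Share of the proteome with both a structure and a known function.

    ASSUMPTION: each structure is one distinct human protein species.
    """
    return structures * known_fraction / proteome_size


# Gene products before polymorphisms and post-translational modifications.
low_products = GENES[0] * ISOFORMS_PER_GENE[0]
high_products = GENES[1] * ISOFORMS_PER_GENE[1]
print(f"Gene products: {low_products:,} to {high_products:,}")

for size in PROTEOME_SPECIES:
    with_structure = PDB_STRUCTURES / size
    with_function = characterized_fraction(
        PDB_STRUCTURES, KNOWN_FUNCTION_FRACTION, size
    )
    print(
        f"Proteome of {size:,} species: "
        f"{with_structure:.1%} with a structure, "
        f"{with_function:.2%} with a known function"
    )
```

The exact percentage swings a lot depending on which proteome estimate you plug in, but under these assumptions the share with a known function stays in the single digits either way.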
Take CRISPR-Cas9, the revolutionary genome-editing technique that can precisely edit a single nucleotide in our genome. One of the core components it needs to work is a protein called Cas9. And really, since the discovery of Cas9, this technique has been getting more and more precise and less error-prone not because we make better guide RNAs, but because we keep discovering different types of Cas proteins that do a better job than Cas9!
And DNA, how does it replicate? How is it maintained? It is also the proteins that do the job: DNA polymerase, DNA helicase, DNA ligase. Then what transcribes DNA into RNA? RNA polymerase, followed by the RNA splicing complex, or spliceosome. Then what transports the RNA out of the nucleus? The nuclear pore, which is yet another protein complex. Then what translates it? Ribosomes, which are part RNA but still rely on dozens of proteins. Then how does small interfering RNA work? Well, it also relies on proteins inside the cell to do its job: Dicer and the RNA-induced silencing complex, or RISC. These proteins were once Proteins of Unknown Function.
Cancer, as geneticists always say, is a genetic disease. Yes, the core of the disease lies in the mutations that occur in the DNA of the cells. But what really drives the cancer, what directly controls the aggressiveness of a cancer cell, is actually its proteins. It's the proteins that make the cancer proliferate indefinitely. It's also the proteins that prevent the cancer from dying. It's also the proteins that allow the cancer cells to invade and metastasise. Without the proteins, you could get 10,000 mutations in the DNA and I bet you no cancer would form.
People discover that these specific genes are upregulated in disease, or that this set of genes is lost during cancer. But stopping there isn't going to cure the disease. There are scientists who actually have to do the work of taking a much closer look at each individual protein, testing what its function is and how it does what it does. You know, things like: how would you know that mutations in BRCA1 increase your risk of breast cancer if it wasn't for the discovery that it is a protein involved in genome protection? Or what does HER2 do to contribute to breast and ovarian cancer? These studies are vital to some of the most successful cancer therapies out there. I bet they used to be Proteins of Unknown Function too!
If we just stop at all the fancy hits coming out of a genomic screen, we will never truly know why a hit matters, or whether it holds up at all. Imagine we still have the other 98% of proteins to understand, those Proteins of Unknown Function, then imagine how many possible ways we could find to stop cancer in its tracks. If evolution decided to keep them, there has to be a reason. Protein scientists and their discoveries don't get the praise they deserve.
We have sequenced almost 85,000 genomes, and yet the number of proteomes is still basically at the starting point. It's lucky that all life shares the same genetic code, 4 letters: A, T, C, G (maybe uracil too, but like Pluto, it doesn't seem to be included in the gang as much). But with proteins, we have 20 building blocks or more, and each protein has its unique sequence, its unique 3D structure and its own modifications. We have barely scratched the surface of what there is to know about proteins.
The purpose of this is not to dethrone genetics. I absolutely believe genetics is advancing our understanding of biology so much, but I think it's time for people to start paying more attention to the details, to the workhorses of the cells, to those that actually do the job! There are just so many proteins whose functions we don't know, yet research on them is nowhere near considered to be as exciting as genomics and such. Journals always want something of big impact, and where else would they go when the universal genetic code is the most general thing you can think of, something that will impact every piece of research downstream? Those who study the proteins don't get as much attention as they should! This hampers riskier projects that are curiosity-driven and exploratory. Where would you find the next CRISPR-Cas9 if you don't pursue the unknown, literally?
Genetics and Biochemistry should be two sides of the same coin, and there should be equal interest in and support for both of these areas. If we can strike a better balance between the two, with each one directing and supporting the other rather than skewed, follow-the-trend research, then that would put a smile on the faces of, and strongly encourage, so many scientists, myself included, who are working day and night to try to understand 1 "weird" protein that would otherwise be considered uninteresting.
Until next time :)
REF:
The Size of the Human Proteome: The Width and Depth
A cell holds 42 million protein molecules, scientists reveal
Proteins of Unknown Function in the Protein Data Bank (PDB): An Inventory of True Uncharacterized Proteins and Computational Tools for Their Analysis
Proteins of Unknown Biochemical Function: A Persistent Problem and a Roadmap to Help Overcome It
The function of many proteins remains uncertain: Blind spots on protein maps quantified