I’m just beginning to look at these results, a fantastic new source of information and approach to understanding the human genome. Here’s a popular press article and a science article (the source of my embedded quotes) on the same subject.
Like the “god particle”, “junk DNA” is an unfortunate term. The creationists, misunderstanding it, of course, have made much of it. But many lay people have heard the term as well and unless you dig into genomics it’s hard to understand.
After sequencing the entire human genome the hunt was on to find “genes” (aka coding sequences). Previously genes were found in reverse, that is, find a protein in a cell, then work backwards to the gene that codes for it. From that earlier research it was possible to build data mining programs that knew what a gene “looked like” (certain patterns in the DNA) and thus find all, or at least most, of the genes in the entire genome, even some that had never been detected before and for which no function (or gene product) is known. This effort resulted in a surprisingly small number of genes, hardly enough, it seemed, to code for something as complex as a human being.
These coding regions only comprised about 5% (some measure it as 1.5%, perhaps introns explain some of that discrepancy) of the total DNA in the genome, leaving the rest to be referred to as “junk”. Even as this term became common some functions had already been found for some of the junk. Meanwhile much of the junk is in the form of repeated sequences, literally a virus in the DNA that can spread and insert itself many thousands of times. Some of the “junk” also played important structural roles in the chromosomes. But much of the rest still just looked like junk.
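To give a flavor of what “looking like a gene” can mean, here is a toy sketch in Python of the simplest such pattern: a start codon (ATG) followed, in the same reading frame, by a stop codon. This is only my illustration, not what the real gene-finding programs did; those also have to deal with introns, splice sites, both DNA strands and statistical models. The function name `find_orfs` and the `min_codons` cutoff are made up for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=3):
    """Return (start, end, sequence) for each ATG...stop stretch on the forward strand.

    A toy scan: it ignores introns, the reverse strand and everything else
    a real gene finder has to handle.
    """
    orfs = []
    for start in range(len(dna) - 2):
        if dna[start:start + 3] != "ATG":
            continue
        # Walk forward three letters at a time, staying in the same frame.
        for pos in range(start + 3, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                n_codons = (pos - start) // 3  # codons before the stop
                if n_codons >= min_codons:
                    orfs.append((start, pos + 3, dna[start:pos + 3]))
                break
    return orfs

# A made-up sequence with one candidate hidden inside it.
print(find_orfs("CCATGAAACCCGGGTAGTT"))
```

Note that the TGA sitting right after the A of the start codon is ignored, because it is not in the same reading frame as the ATG.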
The rest of the functional elements in the ENCODE analysis cover other classes of sequence that were thought to be essentially functionless, including introns. “The idea that introns are definitely deadweight isn’t true,” said Birney. Even some repetitive sequences—small chunks of DNA that have the ability to copy themselves and are typically viewed as parasites—are likely to be functional, often containing sequences where proteins can bind to influence the activity of nearby genes. Perhaps their spread across the genome represents not the invasion of a parasite, but a way of spreading control. “These parasites can be subverted sometimes,” Birney said.
Now there is the publication of ENCODE, a project to look more closely at the junk. Fortunately Nature chose to publish the results openly, instead of hiding them behind the usual paywall that only institutional scientists can get past. So there is a vast amount of new information available for amateurs like me to go digest, including one of my favorite interests: the role of histones (not DNA per se, but part of the total structure of chromatin, the stuff of chromosomes).
Writing the genome out as a string of letters invites a common fallacy: that it’s a two-dimensional, linear entity. In reality, DNA is wrapped around proteins called histones like beads on a string. These are then twisted, folded and looped in an intricate three-dimensional way. In this way, distant parts of the genome can actually be physical neighbors, and can affect each other’s activity.
Genes, or coding regions, are specific sequences of DNA composed of ‘codons’, each one of which (well, most; a few are stop signals) selects a specific amino acid. The amino acids are chained together to form a polypeptide, which then folds into a critical 3D shape and possibly combines with other gene products to form a specific protein. For instance, IIRC, hemoglobin is composed of two each of two kinds of subunit, each carrying a heme group. These finished proteins basically make life work, performing a wide variety of functions.
Every cell contains the same genes (some replication errors accumulate over time, so it isn’t precisely true that every cell has identical DNA). But most genes are not needed in every cell type (the approximately 200 differentiated cell types in a person) so they are not “expressed” (a liver cell doesn’t make the same proteins as a neuron). Furthermore some genes are only needed during certain stages of our development; many function only during embryogenesis and then are inactive for the rest of our life. Finally, genes are only expressed when they are needed, based on the internal state of the organism, i.e. no digestive enzymes when there is no food to digest, no repair proteins unless cells are damaged.
So if every cell has every gene, why don’t all cells create every gene product all the time? The answer is a very complex system of gene regulation. There are many ways this regulation occurs but often the non-coding (i.e. what was known as junk) sections of the DNA play a critical role. This new publication assembles a huge amount of information about all that regulation and in fact finds regulatory regions of the genome to be far more numerous than the coding regions.
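For the programmers among my readers, the codon idea is easy to sketch in a few lines of Python. This is just an illustration: the table below holds only the handful of codons the example needs (the real genetic code has 64), and the names `CODON_TABLE` and `translate` are mine. It also skips the RNA step and reads the DNA letters directly.

```python
# Toy codon table: only the codons used below (the full table has 64 entries).
CODON_TABLE = {
    "ATG": "Met",  # also the usual start signal
    "GTG": "Val",
    "CAC": "His",
    "CTG": "Leu",
    "ACT": "Thr",
    "TAA": None, "TAG": None, "TGA": None,  # stop codons select no amino acid
}

def translate(dna):
    """Read DNA three letters at a time until a stop codon or the end."""
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if codon not in CODON_TABLE:
            raise ValueError(f"codon {codon!r} is not in this toy table")
        amino_acid = CODON_TABLE[codon]
        if amino_acid is None:  # stop codon: the chain ends here
            break
        peptide.append(amino_acid)
    return peptide

print(translate("ATGGTGCACCTGACTTAA"))
# prints ['Met', 'Val', 'His', 'Leu', 'Thr']
```

The real cell, of course, then has to fold that chain into its 3D shape, which is a far harder problem than the lookup.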
So this will make for some good reading. I’ll have to go back to all the notes I accumulated while in intense study of genomics back when the first sequencing was being completed because there is a huge body of terminology and concepts one must understand before being able to dig, at least deeply, into these new findings. So hopefully I’ll learn something and have some stories to relate, not that anyone needs anything from me, given the experts are the source of the real information, but sometimes having an amateur point out something interesting may be of value to my readers.
Now a brief comment on some creationist “debate”, most of which I can’t recall in detail and much of it was silly so I’m not going back to look again, by the creationists, esp. the IDiots, aka intelligent design creationists (yes, they don’t add the ‘creationists’ to their use of the term, but a rose by any other name is, well – it’s still creationism no matter how they try to conceal that from the courts and ID is still vacuous). Junk DNA is easy to explain through evolution; there are all sorts of ways junk DNA can be added to the genome of any creature and not very many ways it can be removed (prokaryotes are the exception, they have very “efficient” and compact genomes with little junk). But if you’re going to use the term ‘intelligent’ in your creationist myths, junk DNA certainly implies a really poor “designer”, or in short, why would a perfect god put all that junk in our DNA? IDiots have a hard time explaining this, so of course, like all good wingnuts, they simply denied it. Now scientists had fallen into a bit of a trap by labeling non-coding DNA as ‘junk’, since that implies it has no function and in fact it was always known some of it did have functions, it just didn’t get expressed as proteins. So undoubtedly the IDiots will be jumping with glee at this new Nature study: “see, we were right, god has infinite wisdom and you heathen scientists just didn’t know it.” Of course, given the staggering lack of scientific knowledge, or misstatement of knowledge, by the creationists, I suspect they’ll just quote-mine a few bits out of context to apply to their nonsense arguments.
If science were politics some might try to cover up these new findings as inconvenient truths. But in fact, since scientists always knew exactly what ‘junk DNA’ meant, it was a mystery to be unraveled and of course the findings would be published, as has now happened. You might wonder why some of this is only coming out ten years after the initial human genome sequencing. Well, IMHO, several reasons: a) a lot of focus on genes before the sequencing plus the development of many bioinformatic tools made that part of the problem easier and so results came out sooner, and, b) since much of the junk DNA actually is junk and junk DNA is 21 times more common it’s a lot of work to try to understand it and so a more directed approach (i.e. hypotheses about what junk DNA might do) had to guide the research. I’ll be able to comment more on this after some study.
So, Dr. Reader, perhaps you too will want to take a look at this site (included in my sidebar). Don’t let terminology get in your way; there are many sources on the net that provide simple and short definitions, sufficient to then get the substance of the article. But this kind of science isn’t for casual reading or quick skimming to get the key points. It does take “study”, but I think you’ll find, as I did, it can be quite fascinating.
The new ENCODE results are vast, reported in 30 central papers in Nature, Genome Biology, and Genome Research, as well as a slew of secondary articles in Science, Cell, and others. And all of the data are freely available to the public.
The pages of printed journals are a poor repository for such a vast trove of data, so the ENCODE team have devised a new publishing model. On the ENCODE portal site, readers can pick one of 13 topics of interest, such as enhancer sequences, and follow them in special “threads” that pull out all the relevant paragraphs from the 30 main papers. “Rather than people having to skim read all 30 papers, and working out which ones they want to read, we pull out that thread for you,” Birney said.
btw: Why does any of this matter? This new view of the genome is more sophisticated and thus more likely to lead to understanding of illness and treatments, which the first sequencing promised as well, but mostly disappointed. In short, it’s more complicated than first thought and now new light is being shed and thus more likely we’re getting close to useful results from studying the genome. For example:
The researchers found that just 12 percent of known SNPs [Single Nucleotide Polymorphisms] lie within protein-coding areas. They also showed that compared to random SNPs, the disease-associated ones are 60 percent more likely to lie within the non-coding but functional regions that ENCODE identified, especially in promoters and enhancers. This suggests that many of these variants are controlling the activity of different genes, and provides many fresh leads for understanding how they affect our risk of disease.