What is genetic function? The ENCODE non-questions

The human genome is 3.1 billion nucleotides long.  If only 1-2% of it codes for proteins, what does the rest do?  And how do we figure that out?  One way is to commit multi-millions of dollars to a project dedicated to doing just that, and that's exactly what's been done by the ENCODE project, which recently published, with much fanfare, the results of years of work by a consortium of over 400 people and numerous labs. The comment by the ENCODE PR spokespeople that got the most attention during all the hoopla when the papers were published was the idea that while  98-99% of the genome was once called 'junk DNA', now it looks like 80% of the genome is in fact functional.

There is a heated, and indeed vitriolic, debate about how misrepresentative, and even how wasteful, the ENCODE Megaproject was.  ENCODE is a cute acronym (we need that in science, after all, since much of what we're about is marketing, so we need a brand or trade-mark) for ENCyclopedia Of DNA Elements.  In a nutshell, the project was a large consortium whose objective was to identify as many of the functional elements in genomes as possible.

[Figure: The interaction of tRNA and mRNA in protein synthesis. Source: Wikipedia]
We know that DNA codes for protein, but that is only about 1-3% of the genome.  Another small fraction is transcribed into a variety of RNA molecules that do things on their own (that is, they don't just get translated into protein).  Examples are transfer RNA, ribosomal RNA, and various others.  Some of this, called microRNA, is used to affect gene usage, by interfering with messenger RNA and hence protein production.  Protein coding and these other RNA processes and interactions are what in our Mermaid's Tale book we called 'correspondence' codes in the genome, because DNA contains the code for -- corresponds to -- the RNA which is then used elsewhere in the cell.

Then there are bits of DNA that contain what we called 'recognition' codes.  The DNA sequence is directly recognized by other molecules, such as proteins called transcription factors, that physically bind to the DNA sequence elements and, among other things, cause nearby protein-coding genes to be transcribed into messenger RNA.  These are codes, but they act locally on the DNA itself.

The middle and ends of chromosomes (centromeres and telomeres) contain DNA sequences used for protecting the integrity of the DNA molecule in the potentially hostile chemical environment of the cell, or in the process by which chromosomes are copied when the cell divides.  There are other codes of various kinds in DNA that affect how it is wrapped around proteins so it can fit into the nucleus and so on.

But various studies had shown that much or even most DNA is actually transcribed into RNA molecules of unknown (if any) function.  It is replicable--so not just chance or experimental trash.  Since the function isn't known, it is debatable whether this is truly 'functional' or not.

Meanwhile, 40% or even more of our DNA consists of repeat elements, short sequences that are found scattered all over the genome and that (among other ways) are copied from one location and inserted more or less randomly in some other location.  They arise through various processes, including errors in DNA replication (e.g., microsatellites) and virus-related copying mechanisms that operate only on rare occasions, but often enough over evolutionary time for the elements to proliferate into the hundreds of thousands of copies.

By some accounts, especially since it seems to be transcribed into RNA, much or even most of the genome is 'functional'.  Such claims challenge well-established ideas that most of the genome has very little function--what was called 'junk' DNA--and that therefore only a small fraction really matters.

But what is function?
A lively, funny, but quite sharp--some would say vicious--attack on the excited reports of ENCODE by Dan Graur and colleagues was published recently.  First, even though the ENCODE authors, being good scientists, put lots of caveats in the original papers, they were not averse to the super-hyping given the report by the media.  Instead of saying that ENCODE had provided a very useful and accessible  data resource and some thought-provoking data, the usual hype about transformative new findings, mysteries uncovered, etc. was all over the media last year.  Graur et al. blasted such reportage as culpable, or even scientifically naive hype (or, perhaps, bovine droppings).  Indeed, the aspects of genome structure and use that were reported by the project were all to some extent or other already well-known, even if ENCODE provides a more systematic data resource and coverage of them than had been available before.

The controversy involves many different issues, some of them quite technical and methodological, but the core centered around ideas of 'function'.  The project investigators used various methods to find biochemical activity of different kinds to identify aspects of the genome that were functional by that standard.  Thus, for example, if a transcription factor protein stuck to a particular bit of DNA, that was activity and classified as function; it didn't have to be shown to affect a protein-coding gene's expression level.

From an evolutionary point of view, function only matters if it affects reproductive success--or 'fitness' in the Darwinian sense related to natural selection.  Why is this?  It's because if it doesn't affect fitness, then mutations will eventually disrupt the activity but with no loss to the organism's reproduction.  The bit of DNA will, over time, accumulate variation among individuals and between species.  By contrast, a bit of DNA that does have a fitness effect will have much less variation in the population, because mutational disruption will harm the individual, who won't reproduce, taking the variation out with it.  We say that relatively limited variation, or sequence conservation among or within species, indicates evolutionarily important function.  Indeed, even if the bit of DNA did have some function that affected a trait, say body shape, but not in a way that would be screened by natural selection--that is, not in a way that affected fitness--that function would sooner or later be erased by mutation.
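The conservation argument can be made concrete with a toy simulation.  What follows is only a rough sketch under loudly simplified assumptions, none of which come from ENCODE or from Graur et al.: two lineages descend from one random ancestral sequence, every proposed point mutation hits a random site, and 'selection' is reduced to a single made-up number, `survive_prob`, the chance that a mutation persists rather than being removed along with its carrier.  All names and parameter values are hypothetical.

```python
import random

BASES = "ACGT"

def evolve(seq, n_mutations, survive_prob, rng):
    """Propose n_mutations random point mutations along one lineage.

    Each proposed mutation persists with probability survive_prob, a crude
    stand-in for purifying selection (1.0 means no fitness effect at all).
    """
    seq = list(seq)
    for _ in range(n_mutations):
        if rng.random() < survive_prob:
            pos = rng.randrange(len(seq))
            # Replace the base with one of the three other bases.
            seq[pos] = rng.choice([b for b in BASES if b != seq[pos]])
    return "".join(seq)

def divergence(a, b):
    """Fraction of aligned sites at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def simulate(survive_prob, length=2000, n_mutations=600, seed=1):
    """Divergence between two lineages that share a random ancestor."""
    rng = random.Random(seed)
    ancestor = "".join(rng.choice(BASES) for _ in range(length))
    species_a = evolve(ancestor, n_mutations, survive_prob, rng)
    species_b = evolve(ancestor, n_mutations, survive_prob, rng)
    return divergence(species_a, species_b)

neutral = simulate(survive_prob=1.0)      # mutations have no fitness effect
constrained = simulate(survive_prob=0.1)  # most mutations are purged

print(f"divergence with no fitness effect:    {neutral:.3f}")
print(f"divergence under purifying selection: {constrained:.3f}")
```

The region whose mutations are mostly purged ends up far more similar between the two lineages, which is the pattern read as 'conservation'.  Real population genetics is of course much messier than one survival probability.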

In that sense the function might be real but evolutionarily unimportant or irrelevant.  The idea that one could have function but not be affected by mutation in this way is tantamount, the critics argued, to saying that organized structures could arise just by chance, without being molded by natural selection.  That is hard to justify (actually, there may be such reasons, but they're too much to go into here).  But it's worth noting that Graur et al. do point out that such function could, under some circumstances, become relevant to natural selection, so that even highly variable bits of DNA may not be unrelated to evolutionary potential.  But looking at it at any given time can't tell you that, and doesn't warrant assigning function in the evolutionary sense to it.

Wasted electrons--debates over angels on pin-heads
This is a debate about many things, but in part centers around orthodoxy.   The discussion is over the question  "What fraction of the human genome is actually 'functional' in these latter senses?"  10%? 80%?  17.654382234887%? 

This is a thoroughly electron-wasting debate (using up electrons via the internet and airwaves), because we know very well, beyond any serious doubt, that the usefully used parts of the genome vary from person to person and, indeed, from cell to cell within each of us!  And if we take a broader evolutionary view, different parts and regions and fractions of genomes will be used over time. 

Among individuals in a species at any given time there are hundreds of dead or partly dead genes, regulatory regions with variable strength transcription factor binding, and so on, all across the genome.  These vary from person to person, as we have clear evidence to prove.  And, have you forgotten the hoopla over copy number variation, the hot recently-new finding that our numbers of genes and other parts of our genomes vary by the thousands among us and between the two copies of the genome each of us carries?

Since everyone differs, it is almost impossible in principle to ask this question of any single individual, because there just isn't enough information, and unique observations can't be tested with the statistical approaches needed to document function (needed for reasons that are not controversial).  Alternatively, we might come to some sort of average functional fraction for a species, but that is rather vague and perhaps misleading--misleading about how DNA functions.  For example, it's been estimated that a high fraction of our 'real' genes (protein-coding or regulatory regions) are individually dispensable if other well-working genes cover the same function.  In that sense, a high fraction even of those 'real' genes are dispensable.

Whether something has a fitness effect is also a statistical question, since there is always a probabilistic aspect to reproductive success.  The amount of conservation in a DNA region is in principle an indicator of past history of natural selection, but it also involves other factors (population size, mutation rate, and so on), and it is inherently a relative measure.  Assessing what varies enough to be judged not to have a fitness-related function is also a statistical issue, not one with precise criteria.  If the relatively limited variation in protein-coding regions reflects real function, how much more variation reflects no, or 'less', function?  One might say that the evolutionary definition of function, based on relative variation, is not entirely free of the subjectivity issues that plague the ENCODE definition of function based on having some biochemical activity.
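To see how population size enters, here is a hedged sketch using the standard diffusion approximation for the fixation probability of a new mutation (Kimura's formula).  It is a textbook model brought in for illustration, not anything taken from the ENCODE papers or from Graur et al.  The assumptions are a diploid population of size `n` with effective size equal to census size, and additive selection with coefficient `s` (negative means deleterious).  The function name and the parameter values are made up for this example.

```python
import math

def relative_substitution_rate(s, n):
    """Substitution rate at a site relative to a strictly neutral site.

    Uses the diffusion approximation for a new mutation at frequency
    1/(2n):  p_fix = (1 - exp(-2s)) / (1 - exp(-4ns)).
    A neutral mutation fixes with probability 1/(2n), so the ratio
    to neutrality is 2n * p_fix.
    Note: for very large values of -4*n*s, math.expm1 would overflow;
    the illustrative values below stay well inside the safe range.
    """
    if s == 0:
        return 1.0
    p_fix = math.expm1(-2 * s) / math.expm1(-4 * n * s)
    return 2 * n * p_fix

for n in (100, 1_000, 10_000):
    for s in (0.0, -1e-5, -1e-4, -1e-3):
        rate = relative_substitution_rate(s, n)
        print(f"N={n:>6}  s={s:+.0e}  relative rate={rate:.3f}")
```

Under these assumptions the same mutation can behave as nearly neutral in a small population and as strongly constrained in a large one, so 'conserved' versus 'not conserved' is a matter of degree along a continuum rather than a sharp threshold.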

This wastes trillions of electrons because it feeds media streams of hot air that capitalize on the flap, even though the issues themselves are, as we have tried to suggest here, non-questions or subjective matters that are not clear-cut.  And of course the investigators have a strong vested interest in vigorously defending the over-selling of yet another over-priced mega-project, so that its funding won't be cut.  As usual, this electron stream misses much of the interesting and actually scientific aspect of the findings and their ambiguities.

For example, the issues rest on the tacit idea that genome functions can be enumerated at the nucleotide sequence level by essentially assuming that each function is independent of other functions, which is purely a fiction.  But interdependence makes these kinds of issues, which are based on differing criteria of 'function' and relative variation, very tricky.  And the evolutionary argument essentially assumes that non-conservation means no function, which is also a misperception of the dynamic control and complexity of genetic mechanisms and evolutionary adaptation.  This is not the place for me to outline my view on that, but I do think that a proper understanding of genomes and their evolution can answer the perceived differences of point of view in the current food fight.

Of course, thinking seriously about evolution is harder, won't please Big Story seeking journalists, and makes less dramatic material for grant applications.  So the food fight is not at all surprising.
