BioEssays Editor: ‘Junk’ DNA Full of Information! Including Genome-Sized Genomic Code – Discovery Institute

Posted: November 20, 2019 at 5:49 am

How many times have we heard it claimed that the vast majority of the human genome is junk and therefore could not have been designed? Even in the face of overwhelming evidence from the ENCODE project and numerous other studies showing that most of our genome has biochemical function, most evolutionists still maintain that our genomes are largely junk. But a few brave scientists, including some rare evolutionists, have been willing to buck that trend.

In a new article at Advanced Science News, "That Junk DNA Is Full of Information!", Andrew Moore, the Editor-in-Chief of the respected biology journal BioEssays, comments on a new BioEssays paper. The paper finds that our DNA contains overlapping, layered, dual-function pieces of information, including a genomic code that spans virtually the entire genome in order to "defin[e] the shape and compaction of DNA into the highly-condensed form known as chromatin." More about that paper in just a moment. It was written by leading Italian biologist Giorgio Bernardi, who played a major role in the discovery of isochores. Isochores are important in this story. But for now, let's look at Moore's essay. It has something worth mentioning in almost every paragraph.

Moore starts by saying that it should not be surprising that there is more function in the genome than we initially expected:

It should not surprise us that even in parts of the genome where we don't obviously see a functional code (i.e., one that's been evolutionarily fixed as a result of some selective advantage), there is a type of code, but not like anything we've previously considered as such.

From an intelligent design (ID) perspective, Moore is absolutely correct: finding more function in the genome should not surprise us. But Moore is not an ID proponent; he's clearly writing from an evolutionary perspective. Even as he describes extensive function in our genome, he frequently adds evolutionary narrative gloss just to remind you what side he's on. But within the evolutionary perspective, his support for mass genomic functionality does not represent the majority view. There is a long history of evolutionary biologists predicting that non-protein-coding DNA is largely junk. (See "Post-ENCODE Posturing: Rewriting History Won't Erase Bad Evolutionary Predictions.") As one example, in 1980 Francis Crick and Leslie Orgel wrote that "Much DNA in higher organisms is little better than junk," and it would be folly in such cases to hunt obsessively for its function. Numerous similar claims have been made over the years.

Though clearly evolution-based, Moore's perspective stands out in an important way: it is open to seeing coordinated function across the entire genome. Moore thus proposes an idea with which ID proponents would heartily agree:

And what if it [this other code] were doing something in three dimensions as well as the two dimensions of the ATGC code? A paper just published in BioEssays explores this tantalizing possibility

So there are multiple layers of information in DNA controlling cellular processes that operate in multiple dimensions. Not only that, but as Moore explains, these codes are frequently overlapping within our DNA sequence:

One of the intriguing things about DNA sequences is that a single sequence can encode more than one piece of information depending on what is reading it and in which direction: viral genomes are classic examples in which genes read in one direction to produce a given protein overlap with one or more genes read in the opposite direction (i.e., from the complementary strand of DNA) to produce different proteins. It's a bit like making simple messages with reverse-pair words (a so-called emordnilap). For example: REEDSTOPSFLOW, which, by an imaginary reading device, could be divided into REED STOPS FLOW. Read backwards, it would give WOLF SPOTS DEER.
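Moore's word game can be checked mechanically. The following is a minimal sketch, not anything taken from the essay or the paper: the function names and the six-base toy sequence are illustrative assumptions. It reverses the example message, and then shows the DNA analogue of "reading in the opposite direction," which is the reverse complement of a strand.

```python
# Watson-Crick base pairing: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_reading(text: str) -> str:
    """Return the text read backwards, letter by letter."""
    return text[::-1]


def reverse_complement(dna: str) -> str:
    """Return the sequence found on the opposite strand, read in its own direction."""
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


# Moore's example message from the essay.
message = "REEDSTOPSFLOW"
print(reverse_reading(message))  # WOLFSPOTSDEER

# Hypothetical toy sequence, chosen only for illustration.
toy_strand = "ATGACC"
print(reverse_complement(toy_strand))  # GGTCAT
```

The analogy is loose: English letters are simply reversed, whereas the opposite DNA strand is both reversed and complemented, so the "backwards" message uses different letters as well as a different order.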

Though highly specified and difficult to produce by chance, overlapping codes are demonstrably present in our DNA. Proponents of intelligent design have long identified overlapping genes as a signature of design. For example, one chapter in the volume Biological Information: New Perspectives argues that "Multiple Overlapping Genetic Codes Profoundly Reduce the Probability of Beneficial Mutation." The chapter observes that DNA sequences are typically poly-functional with overlapping protein-coding sequences which can contribute to multiple overlapping codes simultaneously. But the likelihood of producing such information-rich, tightly constrained sequences by chance is exceedingly low: it is difficult to understand how poly-functional DNA could arise through random isolated mutations.

How do overlapping codes relate to the current situation? Moore explains that these dual-function pieces of information are found throughout our genome where DNA can both encode proteins and simultaneously define a genomic code:

For two distinct pieces of information to be encoded in the same piece of genetic sequence we would, similarly, expect the constraints to be manifest in biases of word and letter usage: the analogies, respectively, for amino acid sequences constituting proteins, and their three-letter code. Hence a sequence of DNA can code for a protein and, in addition, for something else. This something else, according to Giorgio Bernardi, is information that directs the packaging of the enormous length of DNA in a cell into the relatively tiny nucleus. Primarily it is the code that guides the binding of the DNA-packaging proteins known as histones. Bernardi refers to this as the genomic code, a structural code that defines the shape and compaction of DNA into the highly-condensed form known as chromatin.

This genomic code is thus a genome-wide feature, woven throughout our DNA, including portions of the genome that evolutionists have typically assumed had no function. This code is defined by the GC content of a stretch of DNA, that is, the proportion of base pairs that are guanine-cytosine (hence GC) rather than adenine-thymine. In protein-coding DNA, the third base pair in a codon can often vary from AT/TA to CG/GC without affecting the amino acid being specified. Evolutionists have presumed that the precise nucleotide in this third position was irrelevant, so long as the codon was synonymous, and that variation in the third nucleotide represented an unimportant non-functional feature. But Moore explains that the third nucleotide in a codon can have great functional importance apart from merely specifying the amino acid, and could actually help define this genomic code, which overlaps with the protein code:

Protein-coding sequences are also packed and condensed in the nucleus, particularly when they're not in use (i.e., being transcribed, and then translated into protein), but they also contain relatively constant information on precise amino acid identities, otherwise they would fail to encode proteins correctly: evolution would act on such mutations in a highly negative manner, making them extremely unlikely to persist and be visible to us. But the amino acid code in DNA has a little catch that evolved in the most simple of unicellular organisms (bacteria and archaea) billions of years ago: the code is partly redundant. For example, the amino acid Threonine can be coded in eukaryotic DNA in no fewer than four ways: ACT, ACC, ACA or ACG. The third letter is variable and hence available for the coding of extra information. This is exactly what happens to produce the genomic code, in this case creating a bias for the ACC and ACG forms in warm-blooded organisms. Hence, the high constraint on this additional code, which is also seen in parts of the genome that are not under such constraint as protein-coding sequences, is imposed by the packaging of protein-coding sequences that embody two sets of information simultaneously.
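The threonine example can be made concrete with a small sketch. Everything below other than the four codons named in the quote is an illustrative assumption: the two three-codon toy sequences and the helper names are hypothetical. The point is only that two sequences can specify the same amino acids while differing completely in the GC content of their third codon positions.

```python
# The four threonine codons listed in Moore's essay.
THREONINE_CODONS = ("ACT", "ACC", "ACA", "ACG")


def encodes_only_threonine(codons) -> bool:
    """True if every codon in the list is one of the four threonine codons."""
    return all(codon in THREONINE_CODONS for codon in codons)


def gc3(codons) -> float:
    """Fraction of codons whose third letter is G or C."""
    return sum(codon[2] in "GC" for codon in codons) / len(codons)


def gc_count(codons) -> int:
    """Total number of G and C letters across all codons."""
    return sum(base in "GC" for codon in codons for base in codon)


# Two hypothetical ways of writing Thr-Thr-Thr.
at_biased = ["ACT", "ACA", "ACT"]  # third letters are A or T
gc_biased = ["ACC", "ACG", "ACC"]  # third letters are G or C

print(encodes_only_threonine(at_biased), encodes_only_threonine(gc_biased))  # True True
print(gc3(at_biased), gc3(gc_biased))  # 0.0 1.0
print(gc_count(at_biased), gc_count(gc_biased))  # 3 6
```

Both lists translate to the same three threonines, yet the second carries twice as many G and C letters as the first. That spare third position is the room the quoted passage says is available for a second layer of information.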

Moore's evolutionary bias is evident here as he repeatedly adds narrative gloss, ascribing functional aspects of our genome to evolution, rather than simply describing the functional nature of DNA and leaving evolution out of it. But the substance of what he's saying identifies function in an aspect of the genome that evolutionists have frequently ignored as junk.

He goes on to explain that this genomic code is not limited to protein-coding sequences, where it overlaps with the code that specifies protein sequences. The code also persists throughout giant portions of our genome, characterized by repetitive sequences that evolutionary scientists have, again, frequently ignored as junk. Read the following carefully, and try to filter out the gloss. It basically admits that these massive segments of our genome are functional:

But didn't we start with an explanation for non-coding DNA, not protein-coding sequences? Yes, and in the long stretches of non-coding DNA we see information in excess of mere repeats, tandem repeats and remnants of ancient retroviruses: there is a type of code at the level of preference for the GC pair of chemical DNA bases compared with AT. As Bernardi reviews, synthesizing his and others' groundbreaking work, in the core sequences of the eukaryotic genome, the GC content in structural organizational units of the genome termed isochores increased during the evolutionary transition between so-called cold-blooded and warm-blooded organisms. And, fascinatingly, this sequence bias overlaps with sequences that are much more constrained in function: these are the very protein-coding sequences mentioned earlier, and they, more than the intervening non-coding sequences, are the clue to the genomic code. In eukaryotic genomes, the GC sequence bias proposed to be responsible for structural condensation extends into non-coding sequences, some of which have identified activities, though less constrained in sequence than protein-coding DNA. There it directs their condensation via histone-containing nucleosomes to form chromatin.

What we see here is that major portions of our genome, traditionally viewed as junk, are actually full of "information in excess of mere repeats, tandem repeats and remnants of ancient retroviruses" because "there is a type of code at the level of preference for the GC pair of chemical DNA bases compared with AT." The purpose of the code, in short, is to direct DNA-packing in the nucleus.

The genomic code is largely defined by huge GC-biased portions of the genome called isochores. When you hear the word isochore, think of humongous portions of our genome characterized by repetitive sequences of DNA that most evolutionists have typically ignored as junk, but that ID proponents have predicted as probably having function.
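GC content, the quantity that distinguishes GC-rich from GC-poor isochores, is simply the fraction of bases in a stretch of DNA that are G or C. The sketch below is a toy illustration under stated assumptions: the 20-base sequence, the 10-base window and the function names are all hypothetical, and real isochore analyses use far longer windows and real genome data.

```python
def gc_content(dna: str) -> float:
    """Fraction of bases in the sequence that are G or C."""
    if not dna:
        raise ValueError("sequence must not be empty")
    return sum(base in "GC" for base in dna.upper()) / len(dna)


def windowed_gc(dna: str, window: int) -> list:
    """GC content of consecutive, non-overlapping windows of the given size."""
    return [
        gc_content(dna[start:start + window])
        for start in range(0, len(dna) - window + 1, window)
    ]


# Hypothetical toy sequence: an AT-only stretch followed by a GC-only stretch.
toy_genome = "ATATATTAAT" + "GCGCCGGCGC"

print(gc_content(toy_genome))       # 0.5
print(windowed_gc(toy_genome, 10))  # [0.0, 1.0]
```

The whole-sequence average of 0.5 hides the fact that the two halves are compositionally opposite. Scanning by window is what exposes the regional structure, which is the kind of large-scale compositional pattern the term isochore refers to.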

Giorgio Bernardi's paper in BioEssays provides an extensive discussion of the literature. It shows that isochores have functional importance and that the GC level of isochores defines a vital genomic code. Bernardi explains:

[T]he genomic code, which is responsible for the pervasive encoding and molding of primary chromatin domains (LADs and primary TADs, namely the gene spaces/spatial compartments) resolves the longstanding problems of non-coding DNA, junk DNA, and selfish DNA leading to a new vision of the genome as shaped by DNA sequences.

Bernardi's view is that most of the genome is functional, contradicting the typical junk DNA perspective:

By the end of the 1980s, our knowledge of the isochore organization of the human genome had not only rejected what had been called the bean-bag view of the genome, that is, a collection of genes randomly scattered over vast expanses of junk DNA; but it had also indicated that the genome is an integrated structural, functional, and evolutionary system. This view arose from a comparative study of vertebrate genomes, centered on the analysis of their compositional patterns, namely of the compositional distributions of large DNA segments, coding sequences, and introns.

Thus, the presence of GC-rich isochores leads us to reject the junk DNA view. It indicates that "the genome is an integrated structural, functional, and evolutionary system." Ignoring Bernardi's evolutionary gloss, which wrongly assumes that integrated structural and functional systems can arise by blind evolutionary mechanisms, his statement is exactly what ID theory would expect. Bernardi continues explaining how we know that isochores are functional and carry the genomic code, which overlaps with the genetic code:

The functional importance of isochores was already evident in the 1980s because of the correlations of their GC levels with all the genome properties tested. It was later confirmed by investigations carried out in the 1990s. The first indications that the base composition of isochores was under constraint came from the strong correlations between the composition of interspersed repeats, such as the GC-poor LINES and GC-rich SINES, and the composition of the GC-poor and GC-rich isochores, respectively, in which those sequences were located. The next step was the extension of the compositional correlations to genes (exons, introns, codon positions) located in GC-poor and GC-rich isochores, correlations that affect codon usage and amino acid composition of the encoded proteins. These points were subsequently reinforced, leading to the proposal that a genomic code was responsible for the compositional correlations just mentioned. As shown in Table S3, Supporting Information, the genomic code was further extended in the following years to include the sequence distributions, the functional properties associated with GC-poor and GC-rich isochores, and the structure and nuclear location of interphase chromatin.

Only recent investigations showed, however, that the genomic code: 1) is a structural code in that it directly encodes and molds chromatin structures and defines nucleosome binding; 2) is pervasive because it applies to the totality of the genome; 3) overlaps the genetic code and constrains it, by affecting the composition (but not the function) of coding sequences (and contiguous non-coding sequences), codon usage, and amino acid composition of the encoded proteins, as already mentioned.

Moore's article, describing Bernardi's findings, concludes strikingly:

These regions of DNA may then be regarded as structurally important elements in forming the correct shape and separation of condensed coding sequences in the genome, regardless of any other possible function that those non-coding sequences have: in essence, this would be an explanation for the persistence in genomes of sequences to which no function (in terms of evolutionarily-selected activity), can be ascribed (or, at least, no substantial function).

We may marvel at such complicated structures and ask "but do they need to be quite so complicated for their function?" Well, maybe they do in order to condense and position parts of the protein in the exact orientation and place that generates the three-dimensional structure that has been successfully selected by evolution. But with a knowledge that the genomic code overlaps protein coding sequences, we might even start to become suspicious that there is another selective pressure at work as well

Moore doesn't specify what the other "selective pressure" is, but clearly he sees the functionally important genomic code as pervasive throughout the genome. So here's what we have: evolutionary scientists proposing that most of our genome's sequence has functional importance because it carries a genomic code, controlling the three-dimensional packing in the nucleus. This code even overlaps with the genetic code in protein-coding DNA. Such a perspective directly contradicts the evolutionary paradigm of a genome flooded with junk.

Why would evolutionary scientists like Moore and Bernardi step outside that paradigm? The answer is simple: Their views are driven by the data. Moore, or rather, more power to them!

Photo by Ann Kathrin Bopp via Unsplash.
