This paper was submitted for presentation at the Genetic Programming 1997 Conference (GP-97) held 13-16 July at Stanford University, where Susumu Ohno was the scheduled keynote speaker. Only two of six reviewers recommended accepting it. Maybe that's not bad for a paper that questions the possibility of achieving one of Genetic Programming's highest goals.
Abstract
Darwinism holds that new genes can evolve blindly out of old genes by gene duplication, mutation and recombination under the pressure of natural selection. The strong version of panspermia holds that they cannot arise this way or any other way in a closed system. If a computer model could mimic the creation of new genes by the Darwinian method, it would establish that the process works in principle and strengthen the case for Darwinism in biology. Here we briefly discuss some evidence and arguments for the Darwinian mechanism and some for panspermia. Then we consider three well-known computer programs that undergo evolution, and one other proposal. None of them appears to create new genes. The question remains unanswered.
Evolutionary progress requires new genes
In the four-billion-year history of Earth's biosphere, life has evolved from prokaryotic cells to eukaryotic cells, to multicelled plants and animals, to creatures with specialized tissues and organs, etc. This increase in organs, systems, and features has been accompanied by an increase in the size of the genome. Admittedly, overall genome size is not a reliable measure of a species' place on the phylogenetic tree of life. Even similar species can have genomes of quite different sizes. However, if we eliminate from consideration all silent DNA and all redundant copies of genes, the size of the necessary genome has increased as evolution has produced new systems, functions and features. Specifically, the number of necessary genes has increased. It is clear that the evolution of new organs, systems, or features requires new genes.
Two scientific theories give differing accounts of the appearance of new genes in biological evolution on Earth. The more accepted one, by far, is the Darwinian theory, as elaborated by Susumu Ohno. According to Ohno, new genes arise when existing genes are duplicated, undergo "forbidden mutations" as silent DNA, and reemerge as new genes with new functions. The other theory is the strong version of panspermia ("Cosmic Ancestry"). It holds that new genes are introduced into life on Earth by prokaryotes or viruses from outside Earth's biosphere. This introduction may occur immediately before or long before the time when the genes are first expressed on Earth.
Darwinism
While Ohno's version of the Darwinian account of new genes is not proven by any comprehensive step-by-step account of the evolution of a new gene, it gains support from the existence of genes with similar nucleotide sequences that fall into three categories.
1) There are genes like the ones for cytochrome C that are common across a broad range of eukaryotic life. A large body of evidence indicates that these genes have a low but constant mutation rate and descended from a common original gene for cytochrome C. The many variants of genes for cytochrome C today are almost certainly descendants of a smaller set or even a single ancestral gene. But they do not have new functions, so they are not new genes.
2) There are families of genes with similar nucleotide sequences that have related but different functions, such as the globin genes. Hemoglobin transports oxygen in the blood, and myoglobin stores it in muscle. That these genes evolved on Earth from a common source is widely accepted.
3) There are genes with similar sequence but totally different function. For example, some lens crystallins are similar to metabolic enzymes. Another example is the apparent relationship between the genes for a pancreatic enzyme and an antifreeze protein in Antarctic cod. That the genes for such functionally different proteins have a common evolutionary source on Earth is possible.
Cosmic Ancestry
The hard evidence for Cosmic Ancestry as the source for new genes is thin indeed. But it would seem to account for two recent discoveries better than Darwinism does.
1) In August, 1996, Carol J. Bult et al. reported sequencing the complete genome of the archaebacterium M. jannaschii. Among the genes it contains are, surprisingly, five histone genes (Morell). Histones are known to be used by eukaryotes as scaffolding for their complex chromosomal structure. Prokaryotes' chromosomes do not have this structure and do not use histones. The presence of silent eukaryotic genes in a prokaryote is consistent with Cosmic Ancestry, according to which new genes are delivered to the biosphere as silent DNA within prokaryotes or viruses. It is especially interesting that two of the histone genes in the archaebacterium are located on an extra-chromosomal element. This is the logical placement for genes waiting to be exported from a cell. The Darwinian response to this surprising finding is to guess that M. jannaschii's chromosome must be wound around histones after all, like a eukaryotic one (Morell).
2) In October, 1996, Gregory Wray, et al. reported evidence that seven metazoan genes, including cytochrome C, are about twice as old as the first appearance in the fossil record of the metazoa that express the genes. This evidence is consistent with Cosmic Ancestry, according to which new genes for evolutionary advances would necessarily be present before they were expressed, possibly even long before. The Darwinian explanation for metazoan genes twice as old as metazoa requires a major shift of thought which is not supported by any other evidence: metazoa must be twice as old as their oldest fossils (Wray, et al.).
Duplicated Genes That Undergo Many Mutations
For a new gene to evolve as Ohno says, an existing gene must be duplicated, become silent for a time, and undergo "forbidden mutations". While silent, a gene cannot be improved or even maintained by natural selection. To arrive at a substantially different nucleotide sequence, we would expect many mutations to be required. These mutations will randomize the original sequence of the gene and it will lose its original meaning. As everyone knows, a random nucleotide strand as long as an average gene has an absurdly high number of possible sequences. If an average gene is 1,000 nucleotides, the number of possible sequences it can have is 4^1000, or about 10^600. The chance of finding any gene currently expressed anywhere in biology in that sequence space, in even 10^50 trials, is less than 10^-500.
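The estimate above can be checked with a few lines of Python. This is only a sketch of the arithmetic, not a model of mutation. It assumes that the targets are the approximately 10^12 distinct expressed genes cited from Olomucki in the next paragraph, and it uses a simple union bound (trials x targets / sequences) as the upper limit on the chance of a hit. The variable names are ours.

```python
import math

GENE_LENGTH = 1000        # nucleotides in the "average gene" used in the text
ALPHABET_SIZE = 4         # A, C, G, T
TARGETS_LOG10 = 12        # assumption: ~10^12 distinct expressed genes (Olomucki)
TRIALS_LOG10 = 50         # the 10^50 random trials mentioned in the text

# log10 of the number of possible sequences, 4^1000
log10_sequences = GENE_LENGTH * math.log10(ALPHABET_SIZE)

# Union bound, in logarithms: P(hit) <= trials * targets / sequences
log10_hit_bound = TRIALS_LOG10 + TARGETS_LOG10 - log10_sequences

print(f"possible sequences: about 10^{log10_sequences:.0f}")
print(f"chance of a hit:    at most 10^{log10_hit_bound:.0f}")
```

The exact exponent for 4^1000 is close to 602, which the text rounds to 600. Under these assumptions the bound is comfortably below the 10^-500 quoted above.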
To get around this apparent problem, one could maintain that the number of genes which would be functional in biology is very much higher than the approximately 10^12 different genes expressed today (Olomucki), and is high enough that finding one in random sequence space in a realistic number of trials (10^5? 10^10?) becomes likely. A hint of this possibility comes from a 1995 study of random RNA sequences by Ekland et al. They calculated that the chance of finding a class I ribozyme in a sample of 1.4x10^15 random strands, each 220 nucleotides long, is 5x10^-4 or less. Yet they did find one such strand in a sample of that size. From similar results for other ribozymes they conclude that there are "a large number of distinct RNA structures" with the capability to act as ligases.
Genes apparently can tolerate a higher percentage of errors than computer code or written text. Sixty-one codons code for only twenty amino acids, so many of the codons are synonymous. Also, proteins can tolerate amino acid substitutions at many positions without losing their function. If the proportion of functional gene-length strands of DNA to nonfunctional ones is not as low as we actually find in nature — (10^12 different genes) x (10^25 estimated average number of functional alleles each) / (10^600 possible sequences) = 10^-563 conservatively — but is as high as 10^-10, say, this situation would lead to something like a "many worlds" theory of evolution. It would make stasis in evolution difficult to explain.
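The two proportions contrasted in this paragraph are easiest to compare as base-10 exponents. The sketch below simply repeats the paragraph's own arithmetic. The function names are ours, and the "expected trials" figure assumes independent random draws, so that the mean number of draws before the first functional sequence is the reciprocal of the functional fraction.

```python
# Base-10 exponents of the estimates quoted in the paragraph above.
GENES_LOG10 = 12         # distinct genes expressed today
ALLELES_LOG10 = 25       # estimated functional alleles per gene
SEQUENCES_LOG10 = 600    # possible 1,000-nucleotide sequences (rounded)
HYPOTHETICAL_LOG10 = -10 # the "as high as 10^-10, say" alternative

def functional_fraction_log10(genes, alleles, sequences):
    """log10 of (genes * alleles / sequences)."""
    return genes + alleles - sequences

def expected_trials_log10(fraction_log10):
    """log10 of the mean number of independent random draws per hit (1/p)."""
    return -fraction_log10

observed = functional_fraction_log10(GENES_LOG10, ALLELES_LOG10, SEQUENCES_LOG10)

for label, frac in (("observed", observed), ("hypothetical", HYPOTHETICAL_LOG10)):
    print(f"{label}: fraction 10^{frac}, expected trials 10^{expected_trials_log10(frac)}")
```

With the hypothetical fraction, a functional sequence would be expected within about 10^10 random trials, which is the "realistic number of trials" range discussed earlier.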
This situation would make the genetic code different from other codes such as computer code or written text. In these codes, the ratio of meaningful to meaningless sequences (of information content similar to that of an average gene — 1,000 nucleotides = 2,000 bits = 250 bytes) in random sequence space is so low that the chance of finding a meaningful one in only 10^5 or 10^10 or even 10^25 trials is effectively zero. Perhaps the genetic code is so different, but this difference has not been demonstrated.
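The unit conversion in this paragraph (1,000 nucleotides = 2,000 bits = 250 bytes) follows from the four-letter alphabet carrying two bits per position. A minimal check, assuming uniform and independent positions, which is the same simplification the text makes:

```python
import math

def information_content_bits(length, alphabet_size):
    """Bits needed to specify one sequence of the given length."""
    return length * math.log2(alphabet_size)

bits = information_content_bits(1000, 4)   # a 1,000-nucleotide gene
size_in_bytes = bits / 8

# The sequence space of 250 arbitrary bytes equals that of 1,000 nucleotides.
same_space = (256 ** 250 == 4 ** 1000)

print(bits, size_in_bytes, same_space)
```

So a random search through 250-byte strings of computer code or text faces exactly as many candidates as a random search through gene-length DNA. What may differ is the fraction of candidates that are meaningful.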
Duplicated Genes That Undergo Few Mutations
If a duplicated gene suffers only a few mutations, it may retain its original function and soon again become expressed. In fact it could evolve this way without necessarily ever becoming silent. This is a possible Darwinian account for the many variants of cytochrome C, for example. Furthermore, if the right few mutations occur, according to this line of thought, the gene may acquire a new function. This would be a possible Darwinian account for the above-mentioned lens crystallins, for example.
With few mutations, the sequence space to be explored by a strand of 1,000 nucleotides has far fewer members than the 10^600 we calculated above for all possible sequences. The number of different possible sequences resulting from a single nucleotide substitution is only 3,000 (1,000 positions, each with 3 alternative nucleotides). The number of different possible sequences resulting from a single recombination event within the gene (the removal and reinsertion elsewhere of a portion of the strand of nucleotides) is also low. For a moved portion of length n, that number is about (1,000-n)^2. For portions of any length the number is less than 10^9. It is not unreasonable to suggest that this kind of evolution occurs. But if this is proposed to be the basis for the evolution of all new genes, then Darwinism must maintain that there are series of expressed genes whose sequences are closely related, one to the next, leading from a small set of original prokaryotic genes to every gene subsequently expressed in biology. This method of gene creation would imply ultra-gradualism in evolution. This line of thought was discussed by Manfred Eigen in 1987, but its development has not been fruitful.
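The counts in this paragraph can be reproduced directly. The sketch below takes n to be the length of the portion that is removed and reinserted, which is our reading of the formula, and it uses the text's estimate of (1,000-n)^2 without correcting for reinsertions that restore the original sequence. The function names are ours.

```python
GENE_LENGTH = 1000   # nucleotides, as in the text
ALPHABET_SIZE = 4    # A, C, G, T

def substitution_neighbors(length, alphabet_size=ALPHABET_SIZE):
    """Sequences reachable by changing exactly one nucleotide."""
    return length * (alphabet_size - 1)

def recombination_neighbors(length, portion):
    """The text's estimate for moving one portion of the given size."""
    return (length - portion) ** 2

# Add up the estimate over every possible portion size, 1 through 999.
total_recombinations = sum(
    recombination_neighbors(GENE_LENGTH, n) for n in range(1, GENE_LENGTH)
)

print(substitution_neighbors(GENE_LENGTH))
print(total_recombinations, total_recombinations < 10 ** 9)
```

Even when every portion size is counted together, the total stays under 10^9, so the one-step neighborhood of a gene is tiny compared with the full sequence space of about 10^600.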
Chandra Wickramasinghe has compared the Darwinian account of evolution to saying that all of world literature came from the book of Genesis by occasional typos and paragraph swapping. The mechanism discussed here is analogous to stipulating that every text along the way was viable as literature. Such gradualistic series have not been shown to be possible in written text or computer programs. Nor have they been shown to exist in biology. If this is how new genes are supposed to evolve, the mechanism remains to be demonstrated.
Computer Programs
1) One well-known computer program that purports to mimic evolution is the one by Richard Dawkins that creates "biomorphs". The program generates stick figures that resemble insects, trees, bats, spiders, etc. The figures show a certain amount of variety as they evolve. But the evolution is by artificial selection, and nothing like gene duplication occurs. Instead, only nine or sixteen variables (in different versions) are allowed to wander within narrow ranges. These few variables occupy a tiny fraction of the "genome" that generates the biomorphs, which includes Dawkins's application program and the necessary parts of the computer's operating system. The sequence space explored by Dawkins's program is tightly confined and every member of it is functional. Saying that this process represents evolution is like saying that the song "Happy Birthday" evolves whenever it is sung for a different person. Certainly, nothing analogous to a new gene is created by Dawkins's biomorphs.
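The kind of process described here can be illustrated with a toy model. The following is not Dawkins's program. It is a minimal sketch of the same idea, assuming nine integer variables, a hypothetical allowed range of -9 to 9, and a stand-in `choose` function in place of the human who picks a favorite figure each generation. All names and numbers other than the count of nine variables are our own assumptions.

```python
import random

NUM_GENES = 9                 # the text mentions nine variables in one version
GENE_MIN, GENE_MAX = -9, 9    # hypothetical "narrow range" for each variable

def mutate(genome, rng):
    """Copy the genome and nudge one variable by +1 or -1, staying in range."""
    child = list(genome)
    i = rng.randrange(len(child))
    child[i] = max(GENE_MIN, min(GENE_MAX, child[i] + rng.choice((-1, 1))))
    return child

def breed(parent, rng, litter_size=8):
    """Produce a litter of one-step mutants of the parent."""
    return [mutate(parent, rng) for _ in range(litter_size)]

def evolve(choose, generations=50, seed=0):
    """Artificial selection: `choose` picks the next parent from each litter."""
    rng = random.Random(seed)
    genome = [0] * NUM_GENES
    for _ in range(generations):
        genome = choose(breed(genome, rng))
    return genome

# Stand-in for the human selector: prefer the litter member with the largest sum.
final = evolve(lambda litter: max(litter, key=sum))
print(len(final), final)
```

However long the loop runs, the genome never gains a tenth variable and no variable leaves its range. The values wander, but the set of things that can vary is fixed by the program that surrounds them.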
2) The program by Tom Ray called Tierra is also well-known. It starts with a species that originally has 80 instructions. The creatures multiply and evolve until the computer's storage capacity is full. From then on the population is controlled by killing off creatures ranking lower on a fitness scale. One common outcome is the evolution of parasitism. Parasitism is known to be important in biological evolution. But the evolution of parasitism does not necessarily require any new genes — the genes of the parasites and hosts already exist beforehand. True, biological genomes that become related in this way may in fact require new genes to make them compatible with each other. But in Tierra, nothing suggests that anything analogous to a new gene is ever created.
4) The fourth computer program is different from the other three because it was intended not to model evolution but to automate the updating of software — and it was never implemented. At the Second Artificial Life Conference held February 5 - 9, 1990, in Santa Fe, New Mexico, Harold Thimbleby of Scotland proposed that updates to existing computer programs could be distributed and installed automatically by computer viruses (Kelly). Such a virus would contain code that would a) recognize a host needing the upgrade, b) provide and install the upgrade, and c) from there, infect other computers also needing the upgrade. Here the computer analog of new genes for evolutionary improvements would be inserted from elsewhere by viruses, as in Cosmic Ancestry. That this process could work in computers was not disputed at the Santa Fe conference. And in biology, that viruses can spread by infection and insert their own genes into their hosts' genome, including the germline, without harm, is already known. Ohno himself mentions that viruses are a possible mechanism behind biological evolution and he says viruses would be the only way to transform whole populations at once (Ohno, p 55).
Discussion
We have believed since before Darwin that biology does not have a different set of rules from the rest of science. If Darwinian evolution works, it should be possible to mimic the process in software. By whatever mechanism, Ohno's or other, computers should be able to mimic what biological evolution has done. In the discussion above, we have focused on the creation of new genes that code for new functions.
More broadly, if a software model of evolution is possible, ordinary personal computers should be able to evolve wholly new, unexpected features that are somehow advantageous to them or their software. For example, computers might acquire the ability to activate other helpful programs, network with other computers, use the telephone, identify and disarm harmful viruses, automatically back themselves up, survive crashes, etc. All of these improvements would require new computer code. Since computer programs are transferred constantly, and duplicated, and mistakes are inserted occasionally, just as in biology, the opportunity for existing computer programs to evolve by the Darwinian method is already in place.
Of course, in the marketplace, computers have acquired these and many other new abilities, but not in a closed system. To mimic Darwinian evolution, they would have to evolve improvements without input from programmers, starting with only programs already available. To suggest that computers ever might evolve significant improvements this way seems farfetched. Why? Can computers, without the input of new code, write for themselves any programs with fundamentally new meaning? Is there any example of an improvement to personal computers that was written by the unguided random duplication, mutation and recombination of existing code? Or, is the Darwinian account of the evolution of biological improvements equally farfetched?
Returning to the narrower original question, can any computer model of Darwinian evolution produce the analog of new genes? If not, perhaps we should wonder if the Darwinian mechanism is sufficient to produce new genes on Earth, or whether another source for them is necessary.
8 Aug 2008: Something is missing in our understanding of how evolution produced complex creatures.
3 Apr 2008: What is the origin of eukaryotic RNA polymerases?
28 Aug 2007: Varying environments can speed up evolution.
16 Sep 2005: ...All the genes for building those complex animals existed long before [the Cambrian] explosion — Lewis Wolpert
14 Nov 2004: The birth of a new gene unique to apes and humans....
2003, Oct 21: George Dyson's Darwin Among the Machines is the subject of a reply from Doug Early.
2003, May 11: Computer model evolves complex functions?
2003, March 3: What Evolution Is, by Ernst Mayr.
2003, January 26: "Evolving Inventions" in Scientific American.
2003, January 17: Duplicated genes serve backup functions.
2002, December 31: No evolutionary progress in a closed system!
Do Tierran Programs Dream of Darwinian Dynamics?, by N.A. Johnson, HMSBeagle, 31 August 2001.
Long and Thornton; Zhang et al.; Lynch and Conery, "Gene Duplication and Evolution"[text], p 1551 v 293 Science, 31 August 2001.
Dennis Overbye, "Time of Growing Pains for Information Age"[text], The New York Times, 7 August 2001. "[Seven] scientists ...could not even agree on useful definitions of their field's most common terms, like 'information' and 'complexity'...."
Claus O. Wilke et al., "Evolution of digital organisms at high mutation rates leads to survival of the flattest"[abstract, info], p 331-333 v 412 Nature 19 July 2001. "...Competition ...favour[ed] the genotype with the lower replication rate. These genotypes, although they occupied lower fitness peaks, were located in flatter regions of the fitness surface and were therefore more robust with respect to mutations."
Survival Of The Flattest, SpaceDaily.com, 23 July 2001.
Rodney Brooks, "The relationship between matter and life," p 409-411 v 409 Nature Insight (Nature Publishing Group Supplement), 18 January 2001: "What's going wrong?"
Play It As It Learns by Steven Johnson, FEED, 2000.
2000, November 12: Evolution by Gene Duplication? A database analysis.
Ordinary miracles re: Michael Brooks, New Scientist, 6 May 2000. (Another model with cellular automata.)
Letting Nature Contribute To Computer Programming re: James A. Foster, UniSci, 11 June 1999.
1999, August 12: New computer model of evolution
1998, December 11: The genome of a tiny worm contains many genes with no similarity to previously known genes.
Bibliography
Bult, Carol J. et al. "Complete Genome Sequence of the Methanogenic Archaeon, Methanococcus jannaschii" p 1058 v 273 Science. 23 August 1996.
Dawkins, Richard. The Blind Watchmaker. W.W. Norton and Company, Inc. 1987.
Ekland, Eric H.; Jack W. Szostak and David P. Bartel. "Structurally Complex and Highly Active RNA Ligases Derived from Random RNA Sequences" p 364 v 269 Science. 21 July 1995.
Eigen, Manfred. "New Concepts for Dealing with the Evolution of Nucleic Acids" p 307-320, Cold Spring Harbor Symposia on Quantitative Biology, Volume LII: Evolution of Catalytic Function. Cold Spring Harbor Laboratory. 1987.
Kelly, Kevin. "Designing Perpetual Novelty: Selected Notes from the Second Artificial Life Conference" pp 1-44 Doing Science, John Brockman, ed. Prentice Hall Press 1991.
Koza, John R. "Genetic Evolution and Coevolution of Computer Programs" pp 603-630 Artificial Life II, Christopher G. Langton et al. eds. Addison-Wesley Publishing Company 1992.
Morell, Virginia. "Life's Last Domain" p 1043 v 273 Science. 23 August 1996.
Ohno, Susumu. Evolution by Gene Duplication. Springer-Verlag 1970.
Olomucki, Martin. The Chemistry of Life. McGraw-Hill, Inc. 1993.
Ray, Thomas S. "An Approach to the Synthesis of Life" pp 371-408 Artificial Life II, Christopher G. Langton et al. eds. Addison-Wesley Publishing Company 1992.
Wickramasinghe, Chandra. "Evidence in the Trial at Arkansas, December, 1981," http://www.panspermia.org/chandra.htm.
Wray, Gregory; Jeffrey S. Levinton and Leo H. Shapiro. "Molecular Evidence for Deep Precambrian Divergences Among Metazoan Phyla" p 568 v 274 Science. 25 October 1996.