John Koza's Publications: Year Index:
Koza, John R. 1994a. Genetic Programming II: Automatic Discovery of
It is often argued that the process of solving complex problems can be automated by first decomposing the problem into subproblems, then solving the presumably simpler subproblems, and then assembling the solutions to the subproblems into an overall solution to the original problem. The overall effort required to solve a problem can potentially be reduced to the extent that the decomposition process uncovers subproblems that are disproportionately easy to solve and to the extent that regularities in the problem environment permit multiple use of the solutions to the subproblems. Sadly, conventional techniques of machine learning and artificial intelligence provide no effective means for automatically executing this alluring three-step problem-solving process on a computer.
describes a way to automatically implement this three-step problem-solving process by means the recently developed technique of automatically defined functions in the context of genetic programming. Automatically defined functions enable genetic programming to define useful and reusable subroutines dynamically during a run. This new technique is illustrated by solving, or approximately solving, example problems from the fields of Boolean function learning, symbolic regression, control, pattern recognition, robotics, classification, and molecular biology. In each example, the problem is automatically decomposed into subproblems; the subproblems are automatically solved; and the solutions to the subproblems are automatically assembled into a solution to the original problem. Leverage accrues because genetic programming with automatically defined functions repeatedly uses the solutions to the subproblems in the assembly of the solution to the overall problem. Moreover, genetic programming with automatically defined functions produces solutions that are simpler and smaller than the solutions obtained without automatically defined functions.
Koza, John R. 1994b. Genetic Programming II
Videotape: The Next Generation.
The book Genetic Programming II: Automatic Discovery of Reusable Programs extends the results of John Koza's groundbreaking work on programming computers by means of natural selection, described in this first book, Genetic Programming. This videotape provides an explanation of automatically defined functions, the hierarchical approach to problem solving by means of genetic programming with automatically defined functions, and a visualization of computer runs for many of the problems discussed in Genetic Programming II. These problems include symbolic regression, the parity problem, the lawnmower problem, the bumblebee problem, the artificial ant, the impulse response problem, the minesweeper problem. the letter recognition problem, the transmembrane problem, and the omega loop problem.
Koza, John R. 1994c. Spontaneous emergence of self-replicating and
evolutionarily self-improving computer programs. In Langton, Christopher G.
(editor). Artificial Life III, SFI Studies in the Sciences of Complexity. Volume
This ALIFE-92 paper reports on the spontaneous emergence, from a primordial ooze of primitive computational elements that are only capable of agglomerating themselves to one another, of computer programs exhibiting the ability to asexually reproduce, to reproduce by combining parts from two parents, and to improve their performance through evolution.
Click here for a PDF file of this ALIFE-1992 conference paper (published in 1994)
Koza, John R. 1994d. Evolution of a computer program for classifying protein
segments as transmembrane domains using genetic programming. In Altman, Russ,
Brutlag, Douglas, Karp, Peter, Lathrop, Richard, and Searls, David (editors). Proceedings
of the Second International Conference on Intelligent Systems for Molecular
The recently-developed genetic programming paradigm is used to evolve a computer program to classify a given protein segment as being a transmembrane domain or non-transmembrane area of the protein. Genetic programming starts with a primordial ooze of randomly generated computer programs composed of available programmatic ingredients and then genetically breeds the population of programs using the Darwinian principle of survival of the fittest and an analog of the naturally occurring genetic operation of crossover (sexual recombination). Automatic function definition enables genetic programming to dynamically create subroutines dynamically during the run. Genetic programming is given a training set of differently-sized protein segments and their correct classification (but no biochemical knowledge, such as hydrophobicity values). Correlation is used as the fitness measure to drive the evolutionary process. The best genetically-evolved program achieves an out-of-sample correlation of 0.968 and an out-of-sample error rate of 1.6%. This error rate is better than that reported for four other algorithms reported at the First International Conference on Intelligent Systems for Molecular Biology. Our genetically evolved program is an instance of an algorithm discovered by an automated learning paradigm that is superior to that written by human investigators.
Click here for a PDF file of this ISMB-1994 conference paper
Koza, J. R. 1994e. Evolution of a subsumption architecture that performs a
wall following task for an autonomous mobile robot via genetic programming. In
Hanson, Stephen Jose, Petsche, Thomas,
The goal in automatic programming is to get a computer to perform a task by telling it what needs to be done, rather than by explicitly programming it. This paper considers the task of automatically generating a computer program to enable an autonomous mobile robot to perform the task of following the wall of an irregular shaped room. A human programmer has written such a program in the style of the subsumption architecture. The solution produced by genetic programming emerges as a result of Darwinian natural selection and genetic crossover (sexual recombination) in a population of computer programs. This evolutionary process is driven by a fitness measure which communicates the nature of the task to the computer.
Click here for a PDF file of this chapter in Hanson, Petsche, Kearns, and Rivest book
Koza, J. R. 1994f. Recognizing patterns in protein sequences using
iteration-performing calculations in genetic programming. Proceedings of
the First IEEE Conference on Evolutionary Computation. IEEE Press.
This paper uses genetic programming with automatically defined functions (ADFs) for the dynamic creation of a pattern-recognizing computer program consisting of initially-unknown detectors, an initially-unknown iterative calculation incorporating the as-yet-undiscovered detectors, and an initially-unspecified final calculation incorporating the results of the as-yet-unspecified iteration. The program's goal is to recognize a given protein segment as being a transmembrane domain or non-transmembrane area of the protein. Genetic programming with automatic function definition is given a training set of differently-sized mouse protein segments and their correct classification. Correlation is used as the fitness measure. Automatic function definition enables genetic programming to dynamically create subroutines (detectors). A restricted form of iteration is introduced to enable genetic programming to perform calculations on the values returned by the detectors. When cross-validated, the best genetically-evolved recognizer for transmembrane domains achieves an out-of-sample correlation of 0.968 and an out-of-sample error rate of 1.6%. This error rate is better than that recently reported for five other methods.
Click here on this ICEC-1994 conference paper
Koza, John R. 1994g. Automated discovery of detectors and iteration-performing calculations to recognize patterns in protein sequences using genetic programming. Proceedings of the Conference on Computer Vision and Pattern Recognition. IEEE Computer Society Press. Pages 684-689.
This paper describes an automated process for the dynamic creation of a pattern-recognizing computer program consisting of initially-unknown detectors, an initially-unknown iterative calculation incorporating the as-yet-uncreated detectors, and an initially-unspecified final calculation incorporating the results of the as-yet-uncreated iteration. The program's goal is to recognize a given protein segment as being a transmembrane domain or non-transmembrane area. The recognizing program to solve this problem will be evolved using the recently-developed genetic programming paradigm. Genetic programming starts with a primordial ooze of randomly generated computer programs composed of available programmatic ingredients and then genetically breeds the population using the Darwinian principle of survival of the fittest and the genetic crossover (sexual recombination) operation. Automatic function definition enables genetic programming to dynamically create subroutines (detectors). When cross-validated, the best genetically-evolved recognizer achieves an out-of-sample correlation of 0.968 and an out-of-sample error rate of 1.6%. This error rate is better than that recently reported for five other methods.
Click here for a PDF file of this CVPR-1994 conference paper
Koza, John R. 1994h. Genetic programming as a means for programming computers by natural selection. Statistics and Computing. Volume 4. Pages 87-112.
Many seemingly different problems in machine learning, artificial intelligence, and symbolic processing can be viewed as requiring the discovery of a computer program that produces some desired output for particular inputs. When viewed in this way, the process of solving these problems becomes equivalent to searching a space of possible computer programs for a highly fit individual computer program. The recently developed genetic programming paradigm described herein provides a way to search the space of possible computer programs for a highly fit individual computer program to solve (or approximately solve) a surprising variety of different problems from different fields. In genetic programming, populations of computer programs are genetically bred using the Darwinian principle of survival of the fittest and using a genetic crossover (sexual recombination) operator appropriate for genetically mating computer programs. Genetic programming is illustrated via an example of machine learning of the Boolean 11-multiplexer function, symbolic regression of the econometric exchange equation from noisy empirical data, the control problem of backing up a tractor-trailer truck, the classification problem of distinguishing between two intertwined spirals., and the robotics problem of controlling an autonomous mobile robot to find a box in the middle of an irregular room and move the box to the wall.
Hierarchical automatic function definition enables genetic programming to define potentially useful functions automatically and dynamically during a run - much as a human programmer writing a complex computer program creates subroutines (procedures, functions) to perform groups of steps which must be performed with different instantiations of the dummy variables (formal parameters) in more than one place in the main program. Hierarchical automatic function definition is illustrated via the machine learning of the Boolean 11-parity function.
Click here for a PDF file of long version of this Statistics and Computing journal paper
Click here for a PDF file of short version of this Statistics and Computing journal paper
Koza, John R. 1994i. Architecture-altering operations for evolving the architecture of a multi-part program in genetic programming. Stanford University Computer Science Department technical report STAN-TR-CS-94-1528. October 21, 1994.
Previous work described a way to evolutionarily select the architecture of a multi-part computer program from among preexisting alternatives in the population while concurrently solving a problem during a run of genetic programming. This report describes six new architecture-altering operations that provide a way to evolve the architecture of a multi-part program in the sense of actually changing the architecture of programs dynamically during the run.
The new architecture-altering operations are motivated by the naturally occurring operation of gene duplication as described in Susumu Ohno's provocative 1970 book Evolution by Means of Gene Duplication as well as the naturally occurring operation of gene deletion.
The six new architecture-altering operations are branch duplication, argument duplication, branch creation, argument creation, branch deletion and argument deletion.
A connection is made between genetic programming and other techniques of automated problem solving by interpreting the architecture-altering operations as providing an automated way to specialize and generalize programs.
The report demonstrates that a hierarchical architecture can be evolved to solve an illustrative symbolic regression problem using the architecture-altering operations.
Future work will study the amount of additional computational effort required to employ the architecture-altering operations.
Click here for a PDF file of this Stanford University technical report
Click here for Post Script (PS) file of this Stanford University technical report
Koza, John R. 1994j. Scalable learning in genetic programming using automatic
function definition. In Kinnear, Kenneth E. Jr. (editor). Advances in
This chapter uses three differently sized versions of an illustrative problem that has considerable regularity, symmetry, and homogeneity in its problem environment to compare genetic programming with and without the newly developed mechanism of automatic function definition. Genetic programming with automatic function definition can automatically decompose a problem into simpler subproblems, solve the subproblems, and assemble the solutions to the subproblems into a solution to the original overall problem. The solutions to the problem produced by genetic programming with automatic function definition are more parsimonious than those produced without it. Genetic programming requires fewer fitness evaluations to yield a solution to the problem with 99% probability with automatic function definition than without it.
When we consider the three differently sized versions of the problem we find that the size of the solutions produced without automatic function definition can be expressed as a direct multiple of problem size. In contrast, the average size of solutions with automatic function definition is expressed as a certain minimum size representing the overhead associated with automatic function definition; however, there is only a very slight increase in the average size of the solutions with problem size. Moreover, the number of fitness evaluations required to yield a solution to the problem with a 99% probability grows very rapidly with problem size without automatic function definition, but this same measure grows only linearly with problem size with automatic function definition.
Click here for PDF file of chapter 5(Automatically Defined Functions) of the AiGP-1 book
Koza, John R. 1994k. Introduction to genetic programming. In Kinnear,
Kenneth E. Jr. (editor). Advances in Genetic Programming.
This chapter provides an introduction to genetic algorithms, the LISP programming language, genetic programming, and automatic function definition. This chapter also outlines additional sources of information about genetic algorithms and genetic programming.
Click here for PDF file of chapter 1 (Introduction) of the AiGP-1 book
Koza, John R. 1994-L. Evolution of emergent cooperative behavior using
genetic programming. In Paton, Ray (editor). Computing with Biological
The recently developed genetic programming paradigm provides a way to genetically breed a computer program to solve a wide variety of problems. Genetic programming employs Darwinian survival and reproduction of the fittest and the genetic crossover (sexual recominbation) operation to evolve progressively better solutions to problems. In this chapter, a computer program that governs the behavior of a group of independently acting agents is genetically evolved to solve the "painted desert" problem. The evolved program exhibits emergent and cooperative behavior.
Click here for a PDF file of this chapter in the Paton book
· Koza, John R., Andre,
David, and Tackett, Walter Alden. Simultaneous
Evolution of the Architecture of a Multi-Part Program to Solve a Problem Using
Architecture Altering Operations.
· The home page of Genetic Programming Inc. at www.genetic-programming.com.
· For information about the field of genetic programming and the field of genetic and evolutionary computation, visit www.genetic-programming.org
· For information about John Koza’s course on genetic algorithms and genetic programming at Stanford University
· Information about the 1992 book Genetic Programming: On the Programming of Computers by Means of Natural Selection, the 1994 book Genetic Programming II: Automatic Discovery of Reusable Programs, the 1999 book Genetic Programming III: Darwinian Invention and Problem Solving, and the 2003 book Genetic Programming IV: Routine Human-Competitive Machine Intelligence. Click here to read chapter 1 of Genetic Programming IV book in PDF format.
· 3,440 published papers on genetic programming (as of November 28, 2003) in a searchable bibliography (with many on-line versions of papers) by over 880 authors maintained by William Langdon’s and Steven M. Gustafson.
· For information on the Genetic Programming and Evolvable Machines journal published by Kluwer Academic Publishers
· For information on the Genetic Programming book series from Kluwer Academic Publishers, see the Call For Book Proposals
· For information about the
annual 2005 Genetic and
Evolutionary Computation (GECCO) conference (which includes the annual
GP conference) to be held on June 25–29, 2005 (Saturday – Wednesday) in
Last updated on August 21, 2004