Code generation by genetic algorithms

11,957

Solution 1

If you are sure you want to do this, you want genetic programming, rather than a genetic algorithm. GP allows you to evolve tree-structured programs. What you would do would be to give it a bunch of primitive operations (while($register), read($register), increment($register), decrement($register), divide($result $numerator $denominator), print, progn2 (this is GP speak for "execute two commands sequentially")).

You could produce something like this:

progn2(
  progn2(
    read($1)
    while($1
      progn2(
        while($1
          progn2( #add the input to the total
            increment($2)
            decrement($1)
          )
        )
        progn2( #increment number of values entered, read again
          increment($3)
          read($1)
        )
      )
    )
  )
  progn2( #calculate result
    divide($1 $2 $3)
    print($1)
  )
)  

You would use, as your fitness function, how close it is to the real solution. And therein lies the catch, that you have to calculate that traditionally anyway*. And then have something that translates that into code in (your language of choice). Note that, as you've got a potential infinite loop in there, you'll have to cut off execution after a while (there's no way around the halting problem), and it probably won't work. Shucks. Note also, that my provided code will attempt to divide by zero.

*There are ways around this, but generally not terribly far around it.

Solution 2

It can be done, but works very badly for most kinds of applications.

Genetic algorithms only work when the fitness function is continuous, i.e. you can determine which candidates in your current population are closer to the solution than others, because only then you'll get improvements from one generation to the next. I learned this the hard way when I had a genetic algorithm with one strongly-weighted non-continuous component in my fitness function. It dominated all others and because it was non-continuous, there was no gradual advancement towards greater fitness because candidates that were almost correct in that aspect were not considered more fit than ones that were completely incorrect.

Unfortunately, program correctness is utterly non-continuous. Is a program that stops with error X on line A better than one that stops with error Y on line B? Your program could be one character away from being correct, and still abort with an error, while one that returns a constant hardcoded result can at least pass one test.

And that's not even touching on the matter of the code itself being non-continuous under modifications...

Solution 3

Well this is very possible and @Jivlain correctly points out in his (nice) answer that genetic Programming is what you are looking for (and not simple Genetic Algorithms).

Genetic Programming is a field that has not reached a broad audience yet, partially because of some of the complications @MichaelBorgwardt indicates in his answer. But those are mere complications, it is far from true that this is impossible to do. Research on the topic has been going on for more than 20 years.

Andre Koza is one of the leading researchers on this (have a look at his 1992 work) and he demonstrated as early as 1996 how genetic programming can in some cases outperform naive GAs on some classic computational problems (such as evolving programs for Cellular Automata synchronization).

Here's a good Genetic Programming tutorial from Koza and Poli dated 2003.

For a recent reference you might wanna have a look at A field guide to genetic programming (2008).

Solution 4

Since this question was asked, the field of genetic programming has advanced a bit, and there have been some additional attempts to evolve code in configurations other than the tree structures of traditional genetic programming. Here are just a few of them:

  • PushGP - designed with the goal of evolving modular functions like human coders use, programs in this system store all variables and code on different stacks (one for each variable type). Programs are written by pushing and popping commands and data off of the stacks.
  • FINCH - a system that evolves Java byte-code. This has been used to great effect to evolve game-playing agents.
  • Various algorithms have started evolving C++ code, often with a step in which compiler errors are corrected. This has had mixed, but not altogether unpromising results. Here's an example.
  • Avida - a system in which agents evolve programs (mostly boolean logic tasks) using a very simple assembly code. Based off of the older (and less versatile) Tierra.

Solution 5

The language isn't an issue. Regardless of the language, you have to define some higher-level of mutation, otherwise it will take forever to learn.

For example, since any Ruby language can be defined in terms of a text string, you could just randomly generate text strings and optimize that. Better would be to generate only legal Ruby programs. However, it would also take forever.

If you were trying to build a sorting program and you had high level operations like "swap", "move", etc. then you would have a much higher chance of success.

In theory, a bunch of monkeys banging on a typewriter for an infinite amount of time will output all the works of Shakespeare. In practice, it isn't a practical way to write literature. Just because genetic algorithms can solve optimization problems doesn't mean that it's easy or even necessarily a good way to do it.

Share:
11,957
dfens
Author by

dfens

on photo me getting euthanized

Updated on June 03, 2022

Comments

  • dfens
    dfens almost 2 years

    Evolutionary programming seems to be a great way to solve many optimization problems. The idea is very easy and the implementation does not make problems.

    I was wondering if there is any way to evolutionarily create a program in ruby/python script (or any other language)?

    The idea is simple:

    1. Create a population of programs
    2. Perform genetic operations (roulette-wheel selection or any other selection), create new programs with inheritance from best programs, etc.
    3. Loop point 2 until program that will satisfy our condition is found

    But there are still few problems:

    1. How will chromosomes be represented? For example, should one cell of chromosome be one line of code?
    2. How will chromosomes be generated? If they will be lines of code, how do we generate them to ensure that they are syntactically correct, etc.?

    Example of a program that could be generated:

    Create script that takes N numbers as input and returns their mean as output.

    If there were any attempts to create such algorithms I'll be glad to see any links/sources.