Genetic Algorithms - Part I
Genetic algorithms (or GAs) are an Artificial Intelligence technique based upon the standard model of genetics and evolution. They are part of the growing field of Evolutionary Algorithms that have been found to be very effective at 'general adaptation' - that is, they are easily adaptable to a wide range of problems.
GAs can be a very useful tool for searching through a 'problem space' - a theoretical space containing all possible combinations of input data. Take, for example, the problem of designing a colour scheme for a website.
If the site in question is to have 3 colours (background, foreground and text colour) and each of those colours is defined, as HTML colours usually are, by a 6 digit hex code (e.g. #123ABC) then that means that each colour can take on one of 16777215 possible values (16777215 is the decimal equivalent to the largest 6 digit hex number #FFFFFF).
This in turn determines that the total number of possible combinations of colours is
16777215 * 16777215 * 16777215 = 4722365638444765413375
or 4.722 * 10^21 colour combinations.
I think my maths is correct there - even if it's not you get the idea - it's a vast number of colour combinations.
The question now arises - How to choose which one of these possible combinations is the one to use for the website.
As it happens this is exactly the type of problem that Genetic Algorithms are useful for. We know exactly what our range of inputs is going to be (three 6 digit hex numbers - we can even join them together to create just one 18 digit hex number) - we just need to be able 'home in' on the one set of values that's right for the website.
How a GA Works
There are a number of elements that are found in all Genetic Algorithms
Genetic Representation - To make a particular problem suitable for use with a GA we first need to decide how that problem is to be represented by a 'genome'. In many cases a genome consists of a string of binary bits (e.g. '1001101') where each bit specifies if some aspect of a solution is present in that instance - this is known as 'direct encoding'. We can use a similar approach to our colour scheme example and use genomes that are made up of 18 hexadecimal digits (3 colours * 6 hex digits per colour). So some examples of possible solutions to the problem would be:
Selection - Every GA needs a way of determining which genome is 'the best'. Many GAs use an automated system called a 'fitness function' that determines how good each and every solution is. Fitness functions are useful in cases where possible solutions all have certain properties in common. In our colour scheme example we cannot be so specific about what constitutes an acceptable solution so instead we can use our own personal preference to select which solution we consider to be 'the best colour scheme' from the range of possibilities.
Variation - Once the best solution from the population has been selected it is then used as the basis of the next generation. Since we do not want exact copies of our best solution to be used we must choose a way of varying the information stored in the genome with an aim to producing slightly better solutions. There are many options available to us to do this - for example:
In the next instalment I'll use the colour scheme selection example above to illustrate how to program a simple Genetic Algorithm using PHP.
GAs can be a very useful tool for searching through a 'problem space' - a theoretical space containing all possible combinations of input data. Take, for example, the problem of designing a colour scheme for a website.
If the site in question is to have 3 colours (background, foreground and text colour) and each of those colours is defined, as HTML colours usually are, by a 6 digit hex code (e.g. #123ABC) then that means that each colour can take on one of 16777215 possible values (16777215 is the decimal equivalent to the largest 6 digit hex number #FFFFFF).
This in turn determines that the total number of possible combinations of colours is
16777215 * 16777215 * 16777215 = 4722365638444765413375
or 4.722 * 10^21 colour combinations.
I think my maths is correct there - even if it's not you get the idea - it's a vast number of colour combinations.
The question now arises - How to choose which one of these possible combinations is the one to use for the website.
As it happens this is exactly the type of problem that Genetic Algorithms are useful for. We know exactly what our range of inputs is going to be (three 6 digit hex numbers - we can even join them together to create just one 18 digit hex number) - we just need to be able 'home in' on the one set of values that's right for the website.
How a GA Works
There are a number of elements that are found in all Genetic Algorithms
Genetic Representation - To make a particular problem suitable for use with a GA we first need to decide how that problem is to be represented by a 'genome'. In many cases a genome consists of a string of binary bits (e.g. '1001101') where each bit specifies if some aspect of a solution is present in that instance - this is known as 'direct encoding'. We can use a similar approach to our colour scheme example and use genomes that are made up of 18 hexadecimal digits (3 colours * 6 hex digits per colour). So some examples of possible solutions to the problem would be:
368CD00358B24FCC02In each cycle of the genetic algorithm a set of genomes is created (known as a 'population' or 'generation') and one or more of these is selected to be the basis for the next generation.
FA1184E6D7A17D317A
027EC381AC6341ADF6
Selection - Every GA needs a way of determining which genome is 'the best'. Many GAs use an automated system called a 'fitness function' that determines how good each and every solution is. Fitness functions are useful in cases where possible solutions all have certain properties in common. In our colour scheme example we cannot be so specific about what constitutes an acceptable solution so instead we can use our own personal preference to select which solution we consider to be 'the best colour scheme' from the range of possibilities.
Variation - Once the best solution from the population has been selected it is then used as the basis of the next generation. Since we do not want exact copies of our best solution to be used we must choose a way of varying the information stored in the genome with an aim to producing slightly better solutions. There are many options available to us to do this - for example:
Mutation - involves changing one element (sometimes more) of the genome by a very slight amount. This will lead to very small variations of the original genome and is good for 'fine tuning' a solution.
Crossover - involves swapping large 'chunks' of the genome with one another. This leads to much greater variation and makes broad changes to the original genome, however, the resulting variation may bear little or no resemblance to the original genome.Mixing both of these techniques allows us to provide a range of variations. It is this process of continual variation that makes the Genetic Algorithm so powerful when searching for alternative solutions to a problem.
In the next instalment I'll use the colour scheme selection example above to illustrate how to program a simple Genetic Algorithm using PHP.