Though this project uses an RC helicopter, it’s merely a vessel to demonstrate a fascinating machine learning algorithm developed by two Cornell students, [Akshay] and [Sergio]. The learning environment is set up with the helicopter at its center, attached to a boom. The boom restricts the helicopter to a single degree of freedom, so that it can only move up and down from the ground (not side to side or front to back).
The goal is for the helicopter to teach itself how to reach a specific height in the shortest possible time. A handful of IR sensors tell the ATmega644 how high the helicopter is. The genius of this, though, is in the firmware. [Akshay] and [Sergio] are using an evolutionary algorithm adapted from Floreano et al.; Floreano is a noted author on biologically inspired artificial intelligence. The idea is for the helicopter to perform random “runs” and then check the data. The runs that come closer to the goal get refined while the others are eliminated, thus mimicking evolution’s natural selection.
We’ve seen neural networks before, but nothing like this. Stay with us after the break, as we take this awesome project and break it down so that you too can implement this type of algorithm in your next project.
Consider the image above. The goal is for the helicopter to start at Point A, go to Point C and hover. Allotted time is 10 seconds per run. It has to teach itself how to do this and do it as quickly as possible. Remember, it knows where these points are via IR sensors. [Akshay] and [Sergio] developed an equation using a piecewise function to determine which runs were closest to Point C for the longest amount of time.
Each of the points in the above equation is known via a voltage from the IR sensors, with Point A being 0.1 volts and Point D being 3.7 volts. The equation is designed to give the greatest value for the longest time spent at Point C. This value is known as a Fitness Value.
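To make the idea of a Fitness Value concrete, here is a minimal sketch of a piecewise scoring function. It is not the equation [Akshay] and [Sergio] used. The voltages for Point A and Point D come from the write-up, but the Point C voltage, the linear ramps, and the names `sample_score` and `fitness` are our own assumptions for illustration.

```python
# Sensor voltages for the reference points.
POINT_A_V = 0.1  # ground level, from the write-up
POINT_D_V = 3.7  # top of travel, from the write-up
POINT_C_V = 2.5  # ASSUMED target height; the write-up does not give this value


def sample_score(voltage):
    """Hypothetical piecewise score for one sensor reading.

    Returns 0.0 at or beyond Point A and Point D, 1.0 exactly at Point C,
    and ramps linearly in between.
    """
    if voltage <= POINT_A_V or voltage >= POINT_D_V:
        return 0.0
    if voltage <= POINT_C_V:
        return (voltage - POINT_A_V) / (POINT_C_V - POINT_A_V)
    return (POINT_D_V - voltage) / (POINT_D_V - POINT_C_V)


def fitness(readings):
    """Sum the per-sample scores over one run.

    A run that reaches Point C sooner and stays there longer collects
    more high-scoring samples, so it ends up with a larger total.
    """
    return sum(sample_score(v) for v in readings)


# Two made-up runs of ten samples each: one climbs quickly and hovers,
# the other climbs slowly and never quite gets there.
quick_run = [0.1, 1.3, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5]
slow_run = [0.1, 0.3, 0.5, 0.7, 0.9, 1.1, 1.3, 1.5, 1.7, 1.9]
print(fitness(quick_run) > fitness(slow_run))
```

Summing per-sample scores is only one way to reward both speed and steadiness; the real firmware may weight time and distance differently.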
A neural network determines the throttle level needed to achieve the highest Fitness Value. This network is a part of the Evolutionary Algorithm that runs in the firmware. Basically, it starts off with random values that generate random levels of throttle. The values that achieve the highest Fitness Value get ‘mutated’, while the others are discarded.
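For a feel of what “random values that generate random levels of throttle” can look like, here is a small sketch of a network whose weights are stored as a flat list. The write-up does not describe the actual network topology, so the single input, the three hidden units, the activation functions, and the names `random_genome` and `throttle` are all assumptions.

```python
import math
import random

N_HIDDEN = 3  # ASSUMED hidden layer size
# Per hidden unit: input weight, bias, output weight. Plus one output bias.
GENOME_LEN = 3 * N_HIDDEN + 1


def random_genome(rng):
    """Create a random set of network weights (one candidate solution)."""
    return [rng.uniform(-1.0, 1.0) for _ in range(GENOME_LEN)]


def throttle(genome, height_v):
    """Map a height reading (in volts) to a throttle level between 0 and 1."""
    total = genome[-1]  # output bias
    for i in range(N_HIDDEN):
        w_in, bias, w_out = genome[3 * i:3 * i + 3]
        total += w_out * math.tanh(w_in * height_v + bias)
    # A sigmoid squashes the sum into the open interval (0, 1).
    return 1.0 / (1.0 + math.exp(-total))


rng = random.Random(0)
genome = random_genome(rng)
print(throttle(genome, 1.5))
```

Keeping the weights in one flat list makes the next step easy: the evolutionary algorithm can treat the whole network as a single string of numbers to copy and mutate.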
The mutations in the values are done at random, and the process repeats. In the end, the firmware learns the best throttle levels to achieve the goal of being at Point C for the longest time in the allotted 10 seconds.
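The whole select-and-mutate cycle can be sketched in a few lines. This is a generic evolutionary loop, not the authors’ firmware. The population size, the Gaussian mutation, the decision to carry the survivors over unchanged, and the toy `evaluate` function (standing in for an actual 10 second flight) are all assumptions made for illustration.

```python
import random


def mutate(genome, rng, sigma=0.1):
    """Return a copy of the genome with small random noise added to each value."""
    return [g + rng.gauss(0.0, sigma) for g in genome]


def evolve(evaluate, genome_len, rng, pop_size=10, n_keep=3, generations=30):
    """Keep the fittest genomes each generation and refill with mutated copies."""
    population = [
        [rng.uniform(-1.0, 1.0) for _ in range(genome_len)]
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Rank by Fitness Value, highest first, and discard the rest.
        ranked = sorted(population, key=evaluate, reverse=True)
        survivors = ranked[:n_keep]
        # Survivors carry over unchanged; the remaining slots are mutants.
        population = list(survivors)
        while len(population) < pop_size:
            population.append(mutate(rng.choice(survivors), rng))
    return max(population, key=evaluate)


def toy_evaluate(genome):
    """Stand-in fitness: higher when every value is close to 0.5."""
    return -sum((g - 0.5) ** 2 for g in genome)


best = evolve(toy_evaluate, genome_len=4, rng=random.Random(1))
print(toy_evaluate(best))
```

On the real hardware, each call to `evaluate` would mean flying a run and scoring the sensor data, so the readings would be noisy and every generation would take real time. That is why small populations and short runs matter in a setup like this.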
Be sure to check out the linked project for full details on how these mutations are carried out in the source.