Robots That Learn by Themselves to Avoid Obstacles by Mimicking Evolution

Introduction

Many computing systems are modeled on biological behavior found in nature because that behavior is adaptive. For a robot, adaptive behavior makes it possible to operate in a changing environment with minimal or no human supervision. This matters when the robot must operate in an environment that is unsafe for humans, not well known, or served only by a limited communication link. Examples are outer space and the deep sea.

Two popular algorithms that model the biological brain and the theory of evolution are the artificial neural network (ANN) [1] and the genetic algorithm (GA) [2], respectively. The artificial neural network models the connections between neurons in the brain, which shape the behavior of the individual. The genetic algorithm models Darwin’s theory of evolution, in which the organisms with the best genetic traits survive natural selection from one generation to the next. ANNs are generally trained under a supervised learning scheme, in which the network is given a set of inputs and the expected outputs. An algorithm such as backpropagation adjusts the strength of the connections between neurons whenever the actual outputs of the network differ from the expected outputs.

Supervised learning requires a comprehensive training set to train the ANN. Such a training set may be difficult or even impossible to obtain in full, and the more complex the behavior and the environment, the harder it becomes. Moreover, backpropagation is a gradient-based optimization algorithm, which makes it prone to becoming trapped in local minima of the search space.

The evolutionary neurocontroller, summarized in [3-4], offers an alternative way to train a neurocontroller, that is, an artificial neural network used for control. Training consists of evolving a population of neurocontrollers with the genetic algorithm. The chromosome of each neurocontroller is the string of all its connection weights. Instead of training a single neurocontroller, many neurocontrollers compete for survival over several generations based on their fitness values. Instead of specifying the reaction to every situation, as in supervised learning, the fitness function measures how well each neurocontroller performs the desired behavior. At each generation the neurocontrollers are subjected to elimination of the lowest-fitness individuals, crossover between individuals, and random mutation. The evolution continues until a neurocontroller whose fitness value exceeds a threshold is found.

This approach is categorized as reinforcement learning, in which the neurocontroller is given only a set of general rules to satisfy. The actual strategy for achieving the desired behavior is learned by the neurocontroller through interaction with the environment. Furthermore, this learning algorithm is less prone to becoming trapped in local minima.

This article discusses a variation of the evolutionary neurocontroller in [5] in which the proposed neurocontroller consists solely of binary weights, so the chromosome is binary as well. The proposed neurocontrollers are evolved in a computer simulation, and the best neurocontroller is implemented in an actual mobile robot. The adaptive behavior of the neurocontroller is analyzed both in the simulation and on the actual mobile robot.

The Mobile Robot

The mobile robot to be controlled is a wheeled robot with two driven wheels and a caster for balance. Each wheel has a diameter of 6.5 cm and is driven by a DC motor that can rotate both forward and backward. The robot is 15 cm long and 15 cm wide, and the two wheels are placed toward the back of the robot so that its turning radius is 13.5 cm from the center of the wheel axis. The arrangement of the components and a photograph of the actual mobile robot are shown in Fig.1. The components are placed so that the same load is imposed on both wheels.

Fig.1. The arrangement of components in the mobile robot (a) under the base layer, (b) above the base layer, (c) above the sensor and processing layer, and (d) the photograph of the actual mobile robot.

The robot is equipped with eight ultrasonic sensors distributed around its body so that the headings of adjacent sensors differ by 45 degrees. One sensor points straight ahead in the robot’s forward direction, so there is also exactly one sensor pointing to the back, one to the left, and one to the right. Each sensor can measure the distance to an object between 2 cm and 3 m away.

An ATmega8 microcontroller from the Atmel AVR family collects information from all eight ultrasonic sensors and controls the two DC motors. All the sensors are connected to the microcontroller through an I2C bus. The DC motors are controlled through appropriate current drivers using pulse-width modulation signals from the microcontroller. The best neurocontroller found in the simulated evolution is implemented in the microcontroller as a feedforward network without learning capability.

The Arena

The arena is simply a 2×2 m wooden floor bordered by 25 cm high wooden walls on its four edges. The obstacles are paper cubes with 15 cm sides. These cubes can be moved to any location inside the arena during hardware evaluation of the best neurocontroller.

Fig.2. Photograph of the actual arena with five obstacles

Each neurocontroller is a single-layer perceptron consisting of eight input nodes and two output nodes. Each of the eight input nodes receives information from one ultrasonic sensor, and the two output nodes control the two DC motors.

Fig.3. Construction of each neurocontroller

The neurocontroller’s inputs, weights, and outputs are all 1-bit binary values. The distance value from each sensor is reduced to a binary 0 or 1 using a threshold distance of 16 cm. The weights are 1-bit binary values that model whether a synapse connects an input node to an output node. The activation function is a threshold function that instructs the wheel to rotate forward when the internal activity of the neuron is zero and to rotate backward when it is larger than zero. Combining the outputs of the two nodes in the output layer produces four possible movements of the mobile robot, listed in Table 1.

Table 1. Possible Movements of the Mobile Robot

Left Wheel	Right Wheel	Movement
0	0	Forward
0	1	Rotate right
1	0	Rotate left
1	1	Backward
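
The decision rule described above and the mapping in Table 1 can be sketched in a few lines of Python. This is only an illustration, not the authors' firmware: the function names are hypothetical, and the sketch assumes that a reading at or below the 16 cm threshold becomes a binary 1, which the article implies but does not state outright.

```python
# Movement lookup from Table 1: (left wheel, right wheel) -> movement.
MOVEMENTS = {
    (0, 0): "Forward",
    (0, 1): "Rotate right",
    (1, 0): "Rotate left",
    (1, 1): "Backward",
}

THRESHOLD_CM = 16  # threshold distance given in the text


def binarize(distances_cm):
    """Assumption: a reading at or below the threshold maps to 1 (obstructed)."""
    return [1 if d <= THRESHOLD_CM else 0 for d in distances_cm]


def wheel_output(inputs, weights):
    """Threshold activation: 0 (forward) if the internal activity is zero,
    1 (backward) if it is larger than zero."""
    activity = sum(x & w for x, w in zip(inputs, weights))
    return 1 if activity > 0 else 0


def decide(distances_cm, left_weights, right_weights):
    """Map eight distance readings to one of the four movements in Table 1."""
    inputs = binarize(distances_cm)
    left = wheel_output(inputs, left_weights)
    right = wheel_output(inputs, right_weights)
    return MOVEMENTS[(left, right)]
```

With no sensor obstructed, every weight setting yields forward movement, because the internal activity of both output neurons is zero. The weights only matter once at least one input is 1.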

The Evolution of Neurocontrollers

The evolution is simulated on a computer. The kinematics of the mobile robot are modeled according to the dimensions of the actual robot, while the dynamics are assumed to be ideal, with zero mass and zero friction. Each simulated sensor is assumed to have a zero angle of view, so it only measures the distance to an obstacle straight in front of the sensor. The actual sensors, however, have a field of view of a few tens of degrees. The arena is modeled as a two-dimensional plane with line walls and square obstacles. The velocity vectors of the mobile robot can only be parallel to the plane. The arena’s walls and the obstacles are modeled as rigid static objects during evolution.
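
A zero-angle-of-view sensor amounts to casting a single ray from the sensor and measuring where it first hits something. The following Python sketch shows this for the arena walls only. It is a simplified illustration under several assumptions that are not in the article: the arena spans the square from (0, 0) to (200, 200) cm, all eight sensors sit at the robot's center, the square obstacles and the 2 cm to 3 m sensor range are ignored, and the function names are hypothetical.

```python
import math

ARENA_CM = 200.0  # assumption: the simulated arena matches the 2 m x 2 m floor


def wall_distance(x, y, heading_rad, size=ARENA_CM):
    """Distance from (x, y) along a heading to the nearest arena wall.

    Assumes the point lies inside the square [0, size] x [0, size] and that
    the sensor is an ideal ray with zero angle of view.
    """
    dx, dy = math.cos(heading_rad), math.sin(heading_rad)
    candidates = []
    for pos, direction in ((x, dx), (y, dy)):
        if direction > 1e-12:        # ray travels toward the far wall
            candidates.append((size - pos) / direction)
        elif direction < -1e-12:     # ray travels toward the near wall
            candidates.append(-pos / direction)
    return min(candidates)


def sensor_distances(x, y, robot_heading_rad):
    """Readings of eight ideal sensors spaced 45 degrees apart.

    Index 0 points along the robot heading; the ordering of the remaining
    sensors is an assumption made for this sketch.
    """
    return [
        wall_distance(x, y, robot_heading_rad + k * math.pi / 4)
        for k in range(8)
    ]
```

A fuller simulator would also intersect each ray with the edges of the square obstacles and keep the smallest distance found.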

The Chromosomes

Each generation contains one hundred neurocontrollers, which correspond to one hundred chromosomes. Each chromosome consists of 16 genes, one for every weight in a neurocontroller (eight inputs times two outputs), and, as explained, these genes have binary values. The arrangement of weights in the chromosome is shown in Fig.4a, while the corresponding placement of sensors is shown in Fig.4b, where the arrow shows the direction of forward movement of the mobile robot.

Fig.4. (a) The arrangement of weights in chromosome and (b) the corresponding sensors in mobile robot

The Fitness Function

The fitness function is designed to reward neurocontrollers that stay away from obstacles (including walls) while moving consistently forward or consistently backward. Two sub-behaviors are therefore valued in the fitness function: the obstacle avoidance sub-behavior and the straight movement sub-behavior. In the obstacle avoidance sub-behavior, whenever any sensor reports a distance of 16 cm or less from an object, the fitness value is reduced. In the straight movement sub-behavior, whenever the two outputs are both ones (straight backward) or both zeros (straight forward), the fitness value grows in magnitude, with opposite signs for the two directions. A robot that moves more consistently forward or more consistently backward during a generation therefore accumulates a better fitness value.

The fitness of each individual neurocontroller at each generation is calculated by simulating the mobile robot for 1000 steps of movement. In the simulation each step lasts 0.2 second and the simulated forward/backward speed of the mobile robot is 3.75 cm per second, so at each step the robot can move forward or backward by at most 0.75 cm. In theory the robot can therefore travel 7.5 m along a straight trajectory in one generation, and if that trajectory is nowhere near any obstacle the fitness value for that generation is one. A fitness value of one is impossible in practice, however, given the dimensions of the simulated arena.
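
The arithmetic above, together with one possible reading of the fitness rule, can be sketched as follows. The article gives no explicit formula, so the `fitness` function here is a hypothetical scoring rule chosen only to match the stated properties: forward and backward steps carry opposite signs, steps spent near an obstacle reduce the score, and a straight, obstacle-free run scores one.

```python
STEPS = 1000          # simulated steps per evaluation (from the text)
STEP_SECONDS = 0.2    # duration of one step (from the text)
SPEED_CM_S = 3.75     # simulated straight-line speed (from the text)

step_cm = SPEED_CM_S * STEP_SECONDS   # about 0.75 cm per step
max_path_cm = STEPS * step_cm         # about 750 cm, i.e. 7.5 m


def fitness(history):
    """Hypothetical scoring rule; the article gives no explicit formula.

    `history` holds one (left, right, near_obstacle) tuple per step, where
    left/right are the wheel outputs and near_obstacle is True when any
    sensor reports 16 cm or less.
    """
    straight = 0   # signed count of straight steps: +1 forward, -1 backward
    penalty = 0    # number of steps spent too close to an obstacle
    for left, right, near_obstacle in history:
        if (left, right) == (0, 0):
            straight += 1
        elif (left, right) == (1, 1):
            straight -= 1
        if near_obstacle:
            penalty += 1
    # Normalize so that a straight, obstacle-free run scores exactly one.
    return (abs(straight) - penalty) / len(history)
```

Under this rule a controller that alternates between forward and backward steps scores close to zero, which is the cancellation effect the signed contributions are meant to produce.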

The Genetic Algorithm

The simulated evolution lasts for 100 generations, starting with 100 neurocontrollers with random chromosomes. The square obstacles are sparsely placed at the beginning of the evolution, and their positions remain fixed throughout. The starting position of the mobile robot alternates between four designated positions from generation to generation to encourage adaptive behavior of the neurocontroller. Within each generation the neurocontrollers run in the simulated arena one after another, and at the end of the generation, after every neurocontroller has had its turn, the fitness values of all neurocontrollers are calculated. Between one generation and the next, the neurocontrollers undergo elimination based on their fitness values, crossover, and random mutation.

Elimination uses the truncation method, in which the neurocontrollers with the lowest fitness values are eliminated and replaced by the same number of neurocontrollers with random chromosomes. The elimination rate in the simulation is 20%, which means that the 20% of neurocontrollers with the lowest fitness are replaced during the transition to a new generation.
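
One transition between generations might look like the Python sketch below. The population size, chromosome length, and 20% elimination rate come from the text. The crossover and mutation details are not given in the article, so single-point crossover between random pairs of survivors and a 1% per-gene mutation rate are assumptions made purely for illustration, as are the function names.

```python
import random

POPULATION = 100         # from the text
GENES = 16               # from the text
ELIMINATION_RATE = 0.2   # from the text
MUTATION_RATE = 0.01     # assumption: not stated in the article


def random_chromosome(rng):
    """A chromosome is a list of 16 binary genes."""
    return [rng.randint(0, 1) for _ in range(GENES)]


def next_generation(population, fitnesses, rng):
    """Truncation elimination followed by crossover and mutation."""
    # Rank chromosomes from highest to lowest fitness.
    ranked = [c for _, c in sorted(zip(fitnesses, population),
                                   key=lambda pair: pair[0], reverse=True)]
    keep = len(ranked) - int(len(ranked) * ELIMINATION_RATE)
    survivors = ranked[:keep]

    children = []
    while len(children) < keep:
        a, b = rng.sample(survivors, 2)
        cut = rng.randint(1, GENES - 1)      # single-point crossover (assumption)
        child = a[:cut] + b[cut:]
        # Flip each gene with a small probability (assumption).
        child = [g ^ 1 if rng.random() < MUTATION_RATE else g for g in child]
        children.append(child)

    # Eliminated individuals are replaced by random chromosomes, as in the text.
    newcomers = [random_chromosome(rng) for _ in range(len(ranked) - keep)]
    return children + newcomers
```

Replacing the eliminated individuals with fresh random chromosomes, rather than with offspring, keeps injecting new genetic material into the population at every generation.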

Results and Discussion

After 100 generations of simulated evolution, the single best neurocontroller across all generations is chosen for behavior evaluation.

Fig.5. Trajectory of the best neurocontroller after evolution when (a) bounded by walls only and (b) by additional five square obstacles inside the arena

Fig. 5 shows the simulator GUI with the simulated arena on the left side. The GUI supports the initialization, evolution, and testing of the neurocontrollers. In the simulated arena the final position of the mobile robot is indicated by the drawing of the robot. The line behind the robot traces its past trajectory, and the far end of the line marks the starting position. Square obstacles can be placed at any position inside the simulated arena.

In the first evaluation, Fig. 5a, there are no obstacles other than the four walls of the arena. Regardless of its initial position, the mobile robot exhibits the obstacle avoidance behavior. In the second evaluation, Fig. 5b, the mobile robot also shows obstacle avoidance behavior when there are square obstacles inside the arena. Both evaluations show that the mobile robot tends to move straight forward in order to increase its roaming distance. The mobile robot rotates away from obstacles only when it is close to them, which is consistent with the thresholding function at the input layer and with the fitness function. It has also developed a forward-moving behavior that differentiates the roles of the front and rear sensors, so that it does not rotate when only the rear sensors are obstructed. In both evaluations the mobile robot never moves backward, because backward movement reduces the fitness obtained by forward movement.

To analyze the cause of these behaviors in more detail, we can look at the chromosome of the best neurocontroller in Fig. 6.

Fig. 6. Chromosome of the best neurocontroller

It can be seen that the weights for sensors 4, 5, and 6 are all zero. These are the three rear sensors, as shown in Fig. 4b. The evolution has discovered that these three sensors are not important for the obstacle avoidance behavior, and they could be removed from the actual mobile robot without disrupting that behavior. The explanation is that the mobile robot does not have to move backward to avoid an obstacle; it can always rotate away from it.

Further inspection of the connections of the other five sensors shows that all connections to the right motor are zero and all connections to the left motor are one. These weights force the mobile robot to rotate left whenever any of the five sensors is obstructed. This completes the previous analysis: the robot always rotates left until no obstacle obstructs any of the five sensors at the front and sides, and then continues with forward movement away from the obstacle.
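
Because the inputs are binary, the behavior of this chromosome can be checked exhaustively over all 256 possible sensor patterns. The sketch below does so in Python. It assumes the sensors are numbered 1 to 8 and that an input of 1 means "obstructed"; the weights are taken from the description above rather than from Fig. 6 itself, and the variable names are hypothetical.

```python
from itertools import product

SENSORS = range(1, 9)   # assumption: sensors are numbered 1 to 8
REAR = {4, 5, 6}        # rear sensors named in the text

# Weights as described in the text: rear sensors disconnected, every other
# sensor connected to the left motor only.
left_w = {s: 0 if s in REAR else 1 for s in SENSORS}
right_w = {s: 0 for s in SENSORS}

MOVEMENTS = {(0, 0): "Forward", (0, 1): "Rotate right",
             (1, 0): "Rotate left", (1, 1): "Backward"}


def movement(obstructed):
    """`obstructed` maps sensor number -> 1 if an obstacle is within the threshold."""
    left = 1 if sum(obstructed[s] * left_w[s] for s in SENSORS) > 0 else 0
    right = 1 if sum(obstructed[s] * right_w[s] for s in SENSORS) > 0 else 0
    return MOVEMENTS[(left, right)]


# Enumerate every combination of obstructed sensors and collect the outcomes.
seen = set()
for bits in product((0, 1), repeat=8):
    pattern = dict(zip(SENSORS, bits))
    seen.add(movement(pattern))

print(sorted(seen))  # ['Forward', 'Rotate left']
```

Only two of the four movements in Table 1 are ever produced, which matches the observation that the robot never moves backward and never rotates right.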

The behavior of the best neurocontroller in the actual mobile robot and arena agrees with the behavior in the simulation, after some adjustments. The rotation speeds of the two wheels must be calibrated so that the robot moves reasonably straight during forward or backward movement. The robot must move slowly enough that it does not travel too far during one distance-reading period; in the evaluation the actual straight-line speed is 5 cm per second. The sequence of distance readings by the ultrasonic sensors must be controlled to minimize interference between the sensors while also minimizing the delay between readings. Furthermore, the thresholds of the input neurons must be set differently for different sensor directions: the front and rear sensors have larger threshold values than the side sensors in order to compensate for the larger displacement of straight movements.

Conclusions

An evolvable neurocontroller with binary weights has been presented. The neurocontroller is a simple single-layer perceptron with eight input neurons corresponding to eight distance sensors and two output neurons corresponding to two DC motors. The fitness function of the genetic algorithm is a simple rule that encourages the obstacle avoidance sub-behavior and the consistent straight movement sub-behavior.

Although the chromosome is simple and encodes only the connection weights, the simulated evolution has produced a neurocontroller that possesses the obstacle avoidance behavior. The best neurocontroller simply makes the mobile robot rotate left whenever the front or side sensors detect an obstacle. Only once the obstacle is behind the mobile robot does it move straight forward, away from the obstacle. The evolution also found that the three rear distance sensors are not important for the obstacle avoidance behavior. The best neurocontroller has been implemented on an actual mobile robot and, after some adjustments to the robot and the neurocontroller, produces equivalent obstacle avoidance behavior.

References

[1] McClelland, J., Rumelhart, D., and the PDP Research Group. Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Vol. 1: Foundations. Cambridge, MA: MIT Press, 1986.
[2] Holland, J. H. Adaptation in Natural and Artificial Systems. Ann Arbor: The University of Michigan Press, 1975.
[3] Parisi, D., Cecconi, F., and Nolfi, S. Econets: Neural networks that learn in an environment. Network, 1, 149-168, 1990.
[4] Yao, X. Evolving artificial neural networks. Proceedings of the IEEE, 87(9), 1423-1447, 1999.
[5] Floreano, D., and Mondada, F. Evolutionary neurocontrollers for autonomous mobile robots. Neural Networks, Special issue on neural control and robotics: biology and technology, 11(7-8), 1461-1478, 1998.