Self-Driving Cars Project Part 2
Welcome back to the second blog post in this three-part series. In this post, we will cover behavioral cloning and Q-learning.
Behavioral Cloning
To understand behavioral cloning, we must first define what neural networks are. Neural networks are groups of algorithms used to categorize and analyze data sets. They were built to “mimic” the human brain and are composed of multiple steps, defined below (with a small code sketch after the list):
1. Inputs: the data that the network is fed, such as images or sensor readings
2. Weights: values that “prioritize” some inputs over others
3. Weighted sum: the sum of each input multiplied by its weight
4. Bias: a value added to the weighted sum to shift the result and adjust the output
5. Activation function: there are multiple types of activation functions (e.g., ReLU) that moderate the output. These functions are nonlinear, meaning their slope changes, which is what lets the network learn complex patterns
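As a rough illustration of how these pieces fit together, here is a minimal sketch of a single neuron in Python. The numbers and names are made up for illustration; this is not our project's code.

```python
import numpy as np

def relu(x):
    # ReLU activation: passes positive values through, zeroes out negatives
    return np.maximum(0, x)

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs, shifted by the bias,
    # then passed through the nonlinear activation function
    weighted_sum = np.dot(inputs, weights) + bias
    return relu(weighted_sum)

# Example with three made-up input features and weights
print(neuron(np.array([0.5, -1.2, 3.0]),
             np.array([0.8, 0.1, -0.4]),
             bias=0.2))
```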
Now that we have a good idea of what a neural network is, the definition of behavioral cloning is very simple. Behavioral cloning trains a neural network to imitate (or “clone”) behavior from recorded examples, so that it learns to choose similar actions in similar situations.
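Concretely, behavioral cloning boils down to supervised learning on recorded observation/action pairs. Here is a minimal sketch of what that training loop can look like, assuming a PyTorch setup with placeholder layer sizes and random data rather than our actual pipeline:

```python
import torch
import torch.nn as nn

# Placeholder data: a few thousand recorded observation/action pairs
observations = torch.randn(4000, 10)   # what the environment looked like
expert_actions = torch.randn(4000, 2)  # e.g., steering and throttle

model = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(),  # weights, bias, and ReLU activation
    nn.Linear(64, 2),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(20):
    predicted = model(observations)            # network's chosen actions
    loss = loss_fn(predicted, expert_actions)  # distance from the recorded actions
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```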
In this project, we used behavioral cloning by feeding roughly 4,000 different actions into a neural network, each accompanied by data describing what the environment looked like at that moment. From these pairs, the network learns to output an action of its own when shown a new environment. Our results looked like this:
While the car stayed on the road for most of the simulation, to maximize the safety and effectiveness of our neural network we created three different Q-learning policies, which we will now explain.
Q-Learning Policies
Q-learning policies are rules for choosing actions based on a Q-function, which estimates the quality (the expected future “reward”) of taking a given action in a given state. For this simulation, we made three distinct Q-learning policies and tested them.
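Underneath all three policies is the Q-learning update itself: each Q-value is nudged toward the immediate reward plus the discounted value of the best action available next. A minimal tabular sketch (the states, actions, and constants below are illustrative assumptions, not our simulator's code):

```python
alpha = 0.1   # learning rate
gamma = 0.99  # how much future rewards count

def update_q(Q, state, action, reward, next_state, actions):
    # Move Q[state, action] toward: reward + discounted best next Q-value
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)

# Hypothetical usage
Q = {}
actions = ["accelerate", "brake", "steer_left", "steer_right"]
update_q(Q, "on_road", "accelerate", reward=1.0,
         next_state="on_road", actions=actions)
```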
Epsilon-Greedy Policy — This first Q-learning policy chose a random action with probability ε (epsilon) at each step, and otherwise chose the action with the highest estimated reward. Although this policy let the simulation explore more possible actions and rewards, it also led the car to drive off the road at various times and was not steady. Here’s a video of what happened:
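For reference, epsilon-greedy action selection can be sketched in a few lines, using the same hypothetical Q-table format as above (again, not our simulator code):

```python
import random

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    # With probability epsilon, explore: pick a random action
    if random.random() < epsilon:
        return random.choice(actions)
    # Otherwise, exploit: pick the action with the highest Q-value so far
    return max(actions, key=lambda a: Q.get((state, a), 0.0))
```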
Road Safety Policy — The sole objective of this second Q-learning policy was to maximize the safety of the people in the simulated vehicle by severely penalizing the neural network whenever the car drove off the road. With this policy in place the car did not drive off the road, but it was very slow:
More-Movement Policy — This third and final Q-learning policy penalized the model whenever it did not accelerate. With this policy in place we got our best results: a faster, more realistic driving simulation in which the vehicle stayed on the road in all but one situation:
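Both the Road Safety Policy and the More-Movement Policy work by reshaping the reward the simulator hands back. Here is a rough sketch of the idea; the penalty values are illustrative placeholders, not our tuned constants:

```python
def shaped_reward(base_reward, off_road, acceleration):
    # Road Safety Policy: severely penalize leaving the road
    if off_road:
        base_reward -= 100.0
    # More-Movement Policy: small penalty whenever the car is not accelerating,
    # nudging it toward faster, more realistic driving
    if acceleration <= 0:
        base_reward -= 1.0
    return base_reward
```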
In conclusion, while each of these policies had its pros and cons, we determined that the More-Movement Policy had the most benefits overall. We created a chart to show this:
In the next, and last, post, we hope to go into further detail on the nitty-gritty aspects of a policy and what Q-learning is, as well as the real-world applications and next steps of this project.
Colin Chu is a Student Ambassador in the Inspirit AI Student Ambassadors Program. Inspirit AI is a pre-collegiate enrichment program that exposes students globally to AI through live online classes. Learn more at https://www.inspiritai.com/