Colin J. Chu
3 min read · Dec 14, 2020


Welcome back to the final post of this three-part Self-Driving Cars Project. We will discuss the code that goes into a simple policy and Q-learning, as well as the real-world applications.

First, we chose to create a simple policy so that the simulated self-driving car could choose its own actions. This policy takes as input the observation (obs), the environment (env), and the timestamp (ts). The action it returns is an array of three values: direction (steering), gas, and brake. Using these inputs, the simulated car is able to distinguish between road (gray) and not-road (brown), determine whether there is a vehicle in front of it, and check where the road is.

We were prompted to create a policy that would alternate between going forward and turning left. Below are the code, the line-by-line explanations, and a video of what the simulation looked like.
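Since the original code embed does not render here, the snippet below is a minimal sketch of such a policy, assuming the OpenAI Gym CarRacing action format of [steering, gas, brake]; the function name and exact values are illustrative rather than the project's actual code.

```python
def policy(obs, env, ts):
    # Alternate between driving straight and turning left based on the timestamp
    if ts % 2 == 0:
        return [0, 1, 0]   # direction straight, gas on, brake off
    else:
        return [-1, 0, 0]  # direction left, gas off, brake off
```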

Line 1: The observation, environment, and timestamp are taken as inputs.

Lines 3 and 4: If the timestamp is even, then the direction is 0 (straight), the gas is 1 (on), and the brake is 0 (off).

Lines 5 and 6: If the timestamp is odd, then the direction is -1 (left), the gas is 0 (off), and the brake is 0 (off).

Second, the Q-function is defined as a function that gives the quality, meaning the best total reward achievable, after taking a certain action in a specific state. Its inputs are a state s and an action a, and its output Q(s,a) is the quality of taking action a at state s.

In most cases, we want to choose the action a for which Q(s,a) has the highest score. The overarching purpose, regardless of the scenario, is that at state s we choose the optimal action a.
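As an illustration only (this is not the project's code), a minimal tabular Q-learning sketch looks like the following; the table sizes, learning rate, and discount factor are placeholder values.

```python
import numpy as np

# Hypothetical tabular Q-learning setup for a small, discretized problem
n_states, n_actions = 10, 3
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.99   # learning rate and discount factor

def choose_action(s):
    # Pick the action a for which Q(s, a) has the highest score
    return int(np.argmax(Q[s]))

def q_update(s, a, reward, s_next):
    # Move Q(s, a) toward the reward plus the best value achievable from the next state
    best_next = np.max(Q[s_next])
    Q[s, a] += alpha * (reward + gamma * best_next - Q[s, a])
```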

Throughout this project, we ran into an issue with plain Q-learning: it cannot handle unique situations that the car/simulation has never experienced. Because of this, we used Deep Q-learning. We built a "Q-approximator" (in this case a neural network) that attempted to predict the Q-values for each action. We were able to execute this idea because the simulation trained itself on the data it generated. While the simulation began with random choices, as the timestamp increased, the accuracy of its decisions increased significantly. Check out Post #2 for more details and the results!
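The project's actual network is not shown in this post, but a Q-approximator in this spirit might look like the sketch below; the framework choice (PyTorch), layer sizes, and state dimension are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical Q-approximator: a small network that maps a state vector
# to one predicted Q-value per action (sizes are illustrative).
class QApproximator(nn.Module):
    def __init__(self, state_dim=8, n_actions=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 64),
            nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, state):
        return self.net(state)

# Acting greedily: pick the action whose predicted Q-value is highest
model = QApproximator()
state = torch.zeros(1, 8)            # placeholder state vector
action = model(state).argmax(dim=1)  # index of the best action
```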

Lastly, before ending any ML (machine learning) and AI (artificial intelligence) project, it is important to discuss real-life applications and next steps. While it may seem obvious, the real-life applications of this project are not limited to self-driving cars. In fact, automated robots use very similar ML techniques, such as the reinforcement learning loop, to process unique situations. Some AI traffic-light control systems also use similar processes to gather data on common driving paths and predict when to change light signals.

The next steps of this project include perfecting all three Deep-Q-learning policies and allowing these policies to adapt to unique vehicles, improve accuracy for unique situations, and make ethical and moral decisions in real time.


Thanks for reading this three-part series where we discussed the reinforcement learning loop, the OpenAI simulations, behavioral cloning, and different types of ML policies for this Self-Driving Cars Project!

Colin Chu is a Student Ambassador in the Inspirit AI Student Ambassadors Program. Inspirit AI is a pre-collegiate enrichment program that exposes students globally to AI through live online classes. Learn more at https://www.inspiritai.com/.
