Researchers from MIT and Microsoft have developed a model that identifies instances where autonomous systems have “learned” from training examples that don’t match what’s actually happening in the real world.
The researchers say that this model could be used by engineers to improve the safety of artificial intelligence (AI) systems, such as driverless vehicles and autonomous robots.
According to the researchers, the AI systems that power driverless cars are trained extensively in virtual simulations to prepare the vehicle for nearly every event on the road. Sometimes, though, the car makes an unexpected error in the real world because an event occurs that should change the car’s behavior but doesn’t.
The researchers give the example of a driverless car that wasn’t trained, and doesn’t have the sensors, to differentiate between distinctly different scenarios, such as a large white car and an ambulance with its red lights flashing. So if the car is traveling down the highway and an ambulance flicks on its sirens, the car may not know to slow down and pull over, because it does not recognize the ambulance as different from a big white car.
In a pair of papers, the researchers describe a model that uses human input to uncover these training “blind spots.”
As in traditional approaches, the researchers put an AI system through simulation training. But then, as the system acts in the real world, a human closely monitors its actions and provides feedback whenever it makes, or is about to make, a mistake. The researchers then combine the training data with the human feedback data and use machine-learning techniques to produce a model that pinpoints the situations where the system most likely needs more information about how to act correctly.
The researchers validated their method using video games, with a simulated human correcting the learned path of an on-screen character. The next step is to incorporate the model, along with human feedback, into traditional training and testing approaches for autonomous cars and robots.
“The model helps autonomous systems better know what they don’t know,” explains first author Ramya Ramakrishnan, a graduate student in the Computer Science and Artificial Intelligence Laboratory.
“Many times, when these systems are deployed, their trained simulations don’t match the real-world setting [and] they could make mistakes, such as getting into accidents. The idea is to use humans to bridge that gap between simulation and the real world, in a safe way, so we can reduce some of those errors.”
Researchers note that while some traditional training methods provide human feedback during real-world test runs, this is only to update the system’s actions. These training methods don’t identify blind spots, which could be useful for safer execution in the real world.
To start, the researchers’ approach puts an AI system through simulation training, where it produces a “policy” that maps every situation to the best action it can take in the simulation. The system is then deployed in the real world, where humans provide error signals in the regions where its actions are unacceptable.
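As a rough illustration, such a policy can be thought of as a lookup from each situation the agent recognizes to the action it believes is best. This is a minimal sketch, not the authors’ code, and the situation and action names are hypothetical:

```python
# A learned simulation policy viewed as a lookup table from perceived
# situations to actions. All names here are hypothetical, for illustration.
from typing import Dict

Policy = Dict[str, str]  # perceived situation -> chosen action

sim_policy: Policy = {
    "clear_highway":        "maintain_speed",
    "large_vehicle_beside": "maintain_speed",  # an ambulance may be perceived this way too
    "obstacle_ahead":       "brake",
}

def act(situation: str, policy: Policy) -> str:
    """Return the action the trained policy maps this situation to."""
    return policy.get(situation, "brake")  # cautious default for unseen situations
```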
Humans can provide data in a variety of ways, including “demonstrations” and “corrections.” In demonstrations, the human acts in the real world, while the system observes and compares the human’s actions to what it would have done in that situation. So for driverless cars, a human would manually control the car while the system produces a signal if its planned behavior deviates from the human’s behavior. Matches and mismatches with the human’s actions provide noisy indications of where the system might be acting acceptably or unacceptably.
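A demonstration could be turned into labels roughly as follows, assuming the hypothetical `act` and `Policy` sketched above (again illustrative, not the authors’ implementation):

```python
def labels_from_demonstration(policy: Policy, demo):
    """demo: (situation, human_action) pairs observed while the human drives.
    Returns (situation, label) pairs: 0 if the policy agreed with the human,
    1 if it would have acted differently (a noisy 'unacceptable' signal)."""
    feedback = []
    for situation, human_action in demo:
        planned = act(situation, policy)
        feedback.append((situation, 0 if planned == human_action else 1))
    return feedback
```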
With the other option, corrections, a human monitors the system as it acts in the real world. A human could sit in the driver’s seat while the autonomous car drives itself along its planned route. If the car’s actions are correct, the human does nothing; but if the car’s actions are incorrect, the human can take the wheel, which sends a signal that the system was acting unacceptably in that specific situation.
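Corrections can be logged the same way. In this hypothetical sketch, a human override is simply recorded as an “unacceptable” label for the situation in which it happened:

```python
def labels_from_corrections(monitored_run):
    """monitored_run: (situation, human_took_over) pairs from a monitored drive.
    Returns (situation, label) pairs, with label 1 whenever the human intervened."""
    return [(situation, 1 if took_over else 0)
            for situation, took_over in monitored_run]
```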
Once the feedback data from the human is collected, the system has, in essence, a list of situations and, for each situation, multiple labels saying its actions were acceptable or unacceptable. A single situation can receive many different signals, because the system perceives many distinct real-world situations as identical.
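In code, that amounts to grouping the noisy labels by perceived situation, as in this hypothetical sketch:

```python
from collections import defaultdict

def group_feedback(feedback_pairs):
    """feedback_pairs: (situation, label) tuples from demonstrations and corrections.
    Returns a dict mapping each perceived situation to its list of 0/1 labels,
    e.g. {"large_vehicle_beside": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1], ...}."""
    by_situation = defaultdict(list)
    for situation, label in feedback_pairs:
        by_situation[situation].append(label)
    return dict(by_situation)
```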
For example, an autonomous car may have cruised alongside a large car many times without slowing down and pulling over. But in just one instance an ambulance, which appears exactly the same to the system, cruises by; the autonomous car doesn’t pull over and receives a feedback signal that it took an unacceptable action.
“At that point, the system has been given multiple contradictory signals from a human: some with a large car beside it, and it was doing fine, and one where there was an ambulance in the same exact location, but that wasn’t fine. The system makes a little note that it did something wrong, but it doesn’t know why,” Ramakrishnan explains.
“Because the agent is getting all these contradictory signals, the next step is compiling the information to ask, ‘How likely am I to make a mistake in this situation where I received these mixed signals?’”
The researchers say the goal is to have these ambiguous situations labeled as blind spots, but they note that doing so requires more than simply tallying the acceptable and unacceptable actions for each situation. If the system performed correct actions nine times out of 10 in the ambulance situation, for instance, a simple majority vote would label that situation as safe.
“But because unacceptable actions are far rarer than acceptable actions, the system will eventually learn to predict all situations as safe, which can be extremely dangerous,” Ramakrishnan says.
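A toy illustration of that failure mode, with made-up numbers: nine “acceptable” labels outvote the single “unacceptable” one, so a plain majority vote declares the ambulance-like situation safe.

```python
# 0 = acceptable, 1 = unacceptable; the counts are hypothetical.
votes = {"large_vehicle_beside": [0] * 9 + [1]}

majority_safe = {s: sum(v) <= len(v) / 2 for s, v in votes.items()}
print(majority_safe)  # {'large_vehicle_beside': True} -- the rare failure is outvoted
```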
With this in mind, the researchers used a machine-learning method called the Dawid-Skene algorithm, commonly used in crowdsourcing to handle label noise. The algorithm takes as input a list of situations, each with a set of noisy “acceptable” and “unacceptable” labels, aggregates all the data, and uses probability calculations to identify patterns in the labels that distinguish predicted blind spots from predicted safe situations.
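The sketch below gives the flavor of that aggregation: a stripped-down, single-source, Dawid-Skene-style EM over binary labels that estimates a class prior and a confusion matrix for the feedback, then computes a posterior blind-spot probability for each situation. The papers’ actual formulation is richer, so treat this only as an indication of how noisy labels can be aggregated probabilistically, not as the authors’ algorithm.

```python
import numpy as np

def aggregate_blind_spots(label_sets, n_iter=50):
    """label_sets: one list of observed labels per situation,
    where 0 = 'acceptable' feedback and 1 = 'unacceptable' feedback.
    Returns the estimated probability that each situation is a blind spot."""
    # counts[i] = (# acceptable labels, # unacceptable labels) for situation i
    counts = np.array([[labels.count(0), labels.count(1)] for labels in label_sets],
                      dtype=float)

    # Initialise the posterior with the raw fraction of "unacceptable" labels.
    post = counts[:, 1] / counts.sum(axis=1)

    for _ in range(n_iter):
        # M-step: class prior and the feedback source's confusion matrix,
        # conf[c, l] = P(observed label l | true class c), with light smoothing.
        prior = np.clip(post.mean(), 1e-6, 1 - 1e-6)
        conf = np.vstack([((1 - post)[:, None] * counts).sum(axis=0),
                          (post[:, None] * counts).sum(axis=0)]) + 1e-6
        conf /= conf.sum(axis=1, keepdims=True)

        # E-step: posterior probability that each situation is a blind spot.
        log_safe = np.log(1 - prior) + counts @ np.log(conf[0])
        log_blind = np.log(prior) + counts @ np.log(conf[1])
        post = 1.0 / (1.0 + np.exp(log_safe - log_blind))

    return post
```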
This information allows the algorithm to output a single aggregated “safe” or “blind spot” label for each situation, along with its confidence level in that label.
The researchers note that the algorithm can learn that a situation in which the system acted acceptably, say, 90 percent of the time may still be ambiguous enough to merit a “blind spot” label.
In the end, the algorithm produces a type of “heat map,” in which each situation from the system’s original training is assigned a low-to-high probability of being a blind spot for the system.
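Continuing the hypothetical sketch, the heat map is simply the training situations ranked by that estimated probability:

```python
def blind_spot_heat_map(situations, probabilities):
    """Pair each training situation with its estimated blind-spot probability,
    sorted from highest to lowest risk."""
    return sorted(zip(situations, probabilities), key=lambda p: p[1], reverse=True)
```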
“This research puts a nice twist on discovering when there is a mismatch between a simulator and the real world, driving the discovery directly from expert feedback on the agent’s behavior,” says Eric Eaton, a professor of computer and information science, with a focus on robotics, at the University of Pennsylvania.
Eaton adds that the research “has nice potential to permit a robot to predict when it might take an incorrect action in novel situations, deferring instead to an expert (human) operator. The next challenge will be taking these discovered blind spots and using them to improve the robot’s internal representation to better match the real world.”