A new technique compares the reasoning of a machine learning model with that of a human, so that the user can see patterns in the model's behavior.
In machine learning, understanding why a model makes certain decisions is often as important as knowing if those decisions are correct. For example, a machine learning model could correctly predict that a skin lesion is cancerous, but it could have done so using an unrelated signal in a clinical photo.
While there are tools to help experts make sense of a model's reasoning, often these methods only provide information about one decision at a time, and each one needs to be evaluated manually. Models are commonly trained using millions of data inputs, making it nearly impossible for a human to evaluate enough decisions to identify patterns.
Now, researchers from MIT and IBM Research have created a method that allows a user to aggregate, sort and classify these individual explanations in order to quickly analyze the behavior of a machine learning model. Their technique, called Shared Interest, incorporates quantifiable metrics that compare how well a model's reasoning matches that of a human.
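The article does not spell out the metrics, but the core idea can be sketched as overlap scores between the region a saliency method highlights and a region a human has annotated as the relevant evidence. The following is a minimal, illustrative Python sketch under that assumption; the function name, the use of binary pixel masks, and the particular overlap scores (IoU, coverage, precision) are my choices for illustration, not the authors' exact formulation.

```python
import numpy as np

def alignment_scores(saliency_mask, human_mask):
    """Compare the pixels a model relied on with a human-annotated region.

    Both inputs are boolean arrays of the same shape: True marks pixels the
    saliency method flagged as important, or pixels a human labeled as the
    relevant object.
    """
    saliency = np.asarray(saliency_mask, dtype=bool)
    human = np.asarray(human_mask, dtype=bool)
    intersection = np.logical_and(saliency, human).sum()
    union = np.logical_or(saliency, human).sum()
    return {
        # Overall agreement between the model's evidence and the human annotation.
        "iou": intersection / union if union else 0.0,
        # How much of the human-annotated region the model actually relied on.
        "human_coverage": intersection / human.sum() if human.sum() else 0.0,
        # How much of the model's evidence falls inside the human annotation.
        "saliency_precision": intersection / saliency.sum() if saliency.sum() else 0.0,
    }

# Toy example: the model's evidence sits entirely inside the human annotation,
# but covers only part of it.
saliency = np.zeros((4, 4), dtype=bool)
saliency[1:3, 1:3] = True
human = np.zeros((4, 4), dtype=bool)
human[1:3, 1:4] = True
print(alignment_scores(saliency, human))
# iou ~= 0.67, human_coverage ~= 0.67, saliency_precision = 1.0
```

Computed over thousands of examples, scores like these can be aggregated and sorted to surface the kinds of trends the article describes, rather than inspecting one explanation at a time.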
Shared Interest could help a user easily discover concerning trends in a model's decision-making; for example, perhaps the model is often confused by distracting, irrelevant features, such as background objects in photos. These insights could help the user quickly and quantitatively determine whether a model is reliable and ready to be deployed in a real-world situation.
Human-AI alignment
Shared Interest leverages popular techniques that show how a machine learning model made a specific decision, known as saliency methods. If the model is classifying images, saliency methods highlight the areas of an image that were important to the model when it made its decision. These areas are visualized as a type of heat map, called a saliency map, that is often overlaid on the original image. If the model classified the image as a dog and the dog's head is highlighted, that means those pixels were important to the model when it decided the image contained a dog.
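The article does not name a specific saliency method. One of the simplest is vanilla gradient saliency, sketched below for an image classifier; PyTorch, a pretrained ResNet-18, and the random input tensor (a stand-in for a real preprocessed photo) are all assumptions made for illustration.

```python
import torch
import torchvision.models as models
import matplotlib.pyplot as plt

# Stand-in input; in practice this would be a preprocessed photo (1 x 3 x H x W).
image = torch.rand(1, 3, 224, 224)

# Any image classifier works; a pretrained ResNet-18 is used here for illustration.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

def gradient_saliency(model, image):
    """Return an H x W map of how strongly each pixel influenced the top class."""
    image = image.clone().requires_grad_(True)
    scores = model(image)
    top_class = scores.argmax(dim=1).item()
    # Backpropagate the top class score down to the input pixels.
    scores[0, top_class].backward()
    # Collapse the color channels into a single importance value per pixel.
    return image.grad.abs().max(dim=1).values.squeeze(0)

# Overlay the saliency map on the original image as a heat map.
saliency = gradient_saliency(model, image)
plt.imshow(image.squeeze(0).permute(1, 2, 0).numpy())
plt.imshow(saliency.numpy(), cmap="hot", alpha=0.5)
plt.axis("off")
plt.show()
```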
At one end of the spectrum, the model made its decision for exactly the same reasons a human would, indicating that the model's reasoning is fully aligned with human reasoning.