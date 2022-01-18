



Posted by Oran Lang and Inbar Mosseri, Software Engineers, Google Research

Neural networks can perform certain tasks remarkably well, but understanding how they reach their decisions (for example, identifying which signals in an image cause a model to assign it to one class and not another) is often a mystery. Explaining a neural model's decision process may have high social impact in areas where human oversight is critical, such as medical image analysis and autonomous driving. These insights can also help guide healthcare providers, reveal model biases, support downstream decision makers, and even aid scientific discovery.

Previous approaches to visual explanation of classifiers, such as attention maps (e.g., Grad-CAM), highlight which regions in an image affect the classification, but they do not explain which attributes within those regions determine the classification outcome. Is it their color? Their shape? Another family of methods provides an explanation by smoothly transforming the image between one class and another (e.g., GANalyze). However, these methods tend to change all attributes at once, making it difficult to isolate the individual attributes that affect the classification.

In “Explaining in Style: Training a GAN to explain a classifier in StyleSpace”, presented at ICCV 2021, we propose a new approach to visual explanation of classifiers. Our approach, StylEx, automatically discovers and visualizes disentangled attributes that affect a classifier. It allows the effect of individual attributes to be explored by manipulating them separately (changing one attribute does not affect the others). StylEx is applicable to a wide range of domains, including animals, leaves, faces, and retinal images. Our results show that StylEx finds attributes that align well with semantic ones, generates meaningful image-specific explanations, and yields attributes that people can interpret, as measured in user studies.

Explaining a cat vs. dog classifier: StylEx provides the top-K discovered disentangled attributes that explain the classification. Moving each knob manipulates only the corresponding attribute in the image, keeping the other attributes of the subject fixed.

For instance, to understand a cat vs. dog classifier on a given image, StylEx can automatically detect disentangled attributes and visualize how manipulating each attribute affects the classifier probability. The user can then view these attributes and make semantic interpretations of what they represent. For example, in the figure above, one can draw conclusions such as “dogs are more likely to have their mouth open than cats” (attribute #4 in the GIF above), “cats’ pupils are more slit-like” (attribute #5), and “cats’ ears do not tend to be folded” (attribute #1).

The video below provides a short explanation of the method:

How StylEx works: Training StyleGAN to explain a classifier

Given a classifier and an input image, we want to find and visualize the individual attributes that affect its classification. To do this, we use the StyleGAN2 architecture, which is known to generate high-quality images. Our method consists of two phases.

Phase 1: StylEx training

A recent study showed that StyleGAN2 contains a disentangled latent space called “StyleSpace”, which holds individual, semantically meaningful attributes of the images in the training dataset. However, because StyleGAN training does not depend on the classifier, StyleSpace may not represent the attributes that matter for the decisions of the specific classifier we want to explain. Therefore, we train a StyleGAN-like generator to satisfy the classifier, encouraging its StyleSpace to accommodate classifier-specific attributes.

This is achieved by training the StyleGAN generator with two additional components. The first is an encoder, trained together with the GAN using a reconstruction loss, which forces the generated output image to be visually similar to the input. This allows us to apply the generator to any given input image. However, visual similarity alone is not enough, because it does not necessarily capture the subtle visual details that are important to a particular classifier (such as medical pathologies). To capture them, we add a classification loss to the StyleGAN training, which forces the classifier probability of the generated image to match the classifier probability of the input image. This guarantees that the subtle visual details important to the classifier are included in the generated image.

Training StylEx: We jointly train the generator and the encoder. A reconstruction loss is applied between the generated image and the original image to preserve visual similarity. A classification loss is applied between the classifier output of the generated image and the classifier output of the original image, ensuring that the generator captures the subtle visual details that are important to the classification.
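To make the two additional training terms concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the training code from the paper: the reconstruction term is simplified to a mean absolute pixel difference, the classification term is assumed to be a KL divergence between the two classifier outputs, and the function names and the `rec_weight` and `cls_weight` parameters are hypothetical.

```python
import math


def reconstruction_loss(original, generated):
    """Mean absolute difference between two flattened images (lists of floats).

    Assumption: a simple pixel-wise stand-in for the post's reconstruction loss.
    """
    if len(original) != len(generated):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(o - g) for o, g in zip(original, generated)) / len(original)


def classification_loss(probs_original, probs_generated, eps=1e-12):
    """KL(probs_generated || probs_original); zero when the two outputs match.

    Assumption: KL divergence is one way to compare classifier outputs.
    """
    return sum(
        q * math.log((q + eps) / (p + eps))
        for p, q in zip(probs_original, probs_generated)
    )


def extra_generator_loss(original, generated, probs_original, probs_generated,
                         rec_weight=1.0, cls_weight=1.0):
    """Weighted sum of the two terms added on top of the usual GAN loss."""
    rec = reconstruction_loss(original, generated)
    cls = classification_loss(probs_original, probs_generated)
    return rec_weight * rec + cls_weight * cls


# Toy example: the reconstruction is visually close, but the classifier disagrees.
x = [0.0, 0.5, 1.0, 0.5]
x_rec = [0.0, 0.5, 1.0, 0.25]
loss = extra_generator_loss(x, x_rec, [0.9, 0.1], [0.6, 0.4])
print(f"extra loss: {loss:.4f}")
```

In the toy example the pixel difference is small while the classifier outputs differ noticeably, so the classification term dominates. This is the situation the classification loss is meant to penalize: an image that looks right but has lost the details the classifier relies on.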

Phase 2: Extracting disentangled attributes

Once training is complete, we search the StyleSpace of the trained generator for attributes that significantly affect the classifier. To do so, we manipulate each StyleSpace coordinate and measure its effect on the classification probability. We look for the top attributes that maximize the change in classification probability for the given image. This yields the top-K image-specific attributes. By repeating this process for a large number of images per class, we can further discover the top-K class-specific attributes, which tell us what the classifier has learned about that class. We call our end-to-end system “StylEx”.

A visual illustration of image-specific attribute extraction: once training is complete, we search for the StyleSpace coordinates that have the largest effect on the classification probability of a given image.
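The search described above can be sketched as follows. This is a simplified illustration rather than the paper's exact procedure: `class_prob` is a hypothetical callable standing in for "generate an image from the style vector, then run the classifier", the fixed perturbation size `step` is an assumption, and the class-level ranking simply counts how often a coordinate appears in the per-image top K.

```python
import math
from collections import Counter


def top_k_attributes(style, class_prob, k, step=1.0):
    """Rank style coordinates by how much perturbing each one changes class_prob.

    Returns up to k tuples (coordinate index, direction, probability change).
    """
    base = class_prob(style)
    effects = []
    for i in range(len(style)):
        best = (i, 0, 0.0)
        for direction in (1, -1):
            perturbed = list(style)
            perturbed[i] += direction * step  # change one coordinate only
            delta = class_prob(perturbed) - base
            if abs(delta) > abs(best[2]):
                best = (i, direction, delta)
        effects.append(best)
    effects.sort(key=lambda e: abs(e[2]), reverse=True)
    return effects[:k]


def class_level_attributes(styles, class_prob, k, step=1.0):
    """Coordinates that appear most often in the per-image top k."""
    counts = Counter()
    for style in styles:
        counts.update(i for i, _, _ in top_k_attributes(style, class_prob, k, step))
    return [i for i, _ in counts.most_common(k)]


# Toy stand-in for classifier(generator(style)): a logistic model on the style.
WEIGHTS = [0.1, -2.0, 0.5, 3.0]


def toy_class_prob(style):
    z = sum(w * s for w, s in zip(WEIGHTS, style))
    return 1.0 / (1.0 + math.exp(-z))


top = top_k_attributes([0.0, 0.0, 0.0, 0.0], toy_class_prob, k=2)
print([i for i, _, _ in top])
```

With the toy model, the coordinates with the largest weights in magnitude (indices 3 and 1) are ranked first, because moving them changes the predicted probability the most. In a real setting each call to `class_prob` would involve a generator and classifier forward pass, so the per-coordinate loop would typically be batched.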

StylEx is applicable to a wide range of domains and classifiers

Our method works on a wide variety of domains and classifiers (binary and multi-class). Below are some examples of class-specific explanations. In all domains tested, the top attributes detected by our method correspond to coherent semantic notions when interpreted by humans, as verified by human evaluation.

For perceived gender and age classifiers, the top four detected attributes per classifier are shown below. Our method illustrates each attribute on multiple images that are automatically selected to best demonstrate that attribute. For each attribute, we flicker between the source image and the attribute-manipulated image. The degree to which manipulating the attribute affects the classifier probability is shown in the top-left corner of each image.

Top four automatically detected attributes for a perceived-gender classifier.

Top four automatically detected attributes for a perceived-age classifier.

Note that our method explains a classifier, not reality. That is, the method is designed to reveal the image attributes that a given classifier has learned to use from the data. These attributes do not necessarily characterize actual physical differences between class labels (e.g., a younger or older age). In particular, the detected attributes may reveal biases in the classifier training or the dataset, which is another key benefit of our method. It can also be used to improve the fairness of neural networks, for example, by augmenting the training dataset with examples that compensate for the biases our method reveals.

Adding the classifier loss to the StyleGAN training turns out to be crucial in domains where the classification depends on fine details. For example, a GAN trained on retinal images without a classifier loss will not necessarily generate the fine pathological details that correspond to a particular disease. Adding the classification loss causes the GAN to generate these subtle pathologies as an explanation of the classifier. This is illustrated below for a retinal image classifier (DME disease) and a sick/healthy leaf classifier. StylEx is able to discover attributes that align with disease indicators, such as “hard exudates”, a well-known marker of retinal DME, and rot for leaf diseases.

Top four automatically detected attributes for a DME classifier of retinal images.

Top four automatically detected attributes for a classifier of sick/healthy leaf images.

Finally, this method is also applicable to multi-class problems, as demonstrated on a 200-way bird species classifier.

Top four automatically detected attributes in a 200-way classifier trained on CUB-2011 for (a) the class “Brewer’s blackbird” and (b) the class “yellow-bellied flycatcher”. Indeed, StylEx detects attributes that correspond to attributes in the CUB taxonomy.

Broader impact and next steps

Overall, we have introduced a new technique that can generate meaningful explanations for a given classifier on a given image or class. We believe our technique is a promising step towards detecting and mitigating previously unknown biases in classifiers and datasets, in line with Google’s AI Principles. In addition, our focus on multiple-attribute explanations is key to providing new insights into previously opaque classification processes and to aiding scientific discovery. Finally, our GitHub repository includes a Colab and the model weights for the GANs used in our paper.

Acknowledgments

The research described in this post was conducted by Oran Lang, Yossi Gandelsman, Michal Yarom, Yoav Wald (as an intern), Gal Elidan, Avinatan Hassidim, William T. Freeman, Phillip Isola, Amir Globerson, Michal Irani, and Inbar Mosseri. We would like to thank Jenny Huang and Marilyn Zhang for leading the writing process for this blog post, and Reena Jana, Paul Nicholas, and Johnny Soraker for their ethics reviews of our research paper and this post.

