{"id":11970,"date":"2023-05-08T08:49:41","date_gmt":"2023-05-08T12:49:41","guid":{"rendered":"https:\/\/blogs.mathworks.com\/deep-learning\/?p=11970"},"modified":"2024-05-29T12:09:12","modified_gmt":"2024-05-29T16:09:12","slug":"explainable-ai-xai-are-we-there-yet","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/deep-learning\/2023\/05\/08\/explainable-ai-xai-are-we-there-yet\/","title":{"rendered":"Explainable AI (XAI): Are we there yet?"},"content":{"rendered":"<h6><\/h6>\r\n<em>This post is from <a href=\"https:\/\/www.linkedin.com\/in\/ogemarques\/\">Oge Marques<\/a>, PhD and Professor of Engineering and Computer Science at FAU.<\/em>\r\n<h6><\/h6>\r\n&nbsp;\r\n<h6><\/h6>\r\n<blockquote>This is the second post in a 3-post series on <strong>Explainable AI<\/strong> (XAI). The <a href=\"https:\/\/blogs.mathworks.com\/deep-learning\/2022\/12\/30\/what-is-explainable-ai\/\">first post<\/a> highlighted examples and offered practical advice on how and when to use XAI techniques for computer vision tasks. In this post, we will expand the discussion by offering advice on some of the XAI limitations and challenges that you might find along the path of XAI adoption.<\/blockquote>\r\n<h6><\/h6>\r\nAt this point, you might have heard about Explainable AI and want to explore XAI techniques within your work. But along the way you might also have wondered : Just how well do XAI techniques work? And what should you do if you find the results of XAI aren\u2019t as compelling as you\u2019d like? 
Below, we explore some of the questions surrounding XAI with a healthy dose of skepticism.\r\n<h6><\/h6>\r\n&nbsp;\r\n<p style=\"font-size: 18px;\"><strong>Are expectations too high?<\/strong><\/p>\r\nXAI \u2013 like many subfields of AI \u2013 can be subject to some of the hype, fallacies, and misplaced hopes associated with AI at large, including the potential for unrealistic expectations.\r\n<h6><\/h6>\r\nTerms such as <em>explainable<\/em> and <em>explanation<\/em> risk falling under the fallacy of matching AI developments with human abilities (the 2021 paper <a href=\"https:\/\/arxiv.org\/abs\/2104.12871\">Why AI is Harder Than We Think<\/a>\u00a0brilliantly describes four similar fallacies in AI assumptions). This fallacy leads to the unrealistic expectation that we are approaching a point in time where AI solutions will not only achieve great feats and eventually surpass human intelligence, but \u2013 on top of that \u2013 they will be able to explain <em>how<\/em> and <em>why<\/em> AI did what it did, thereby increasing the level of trust in AI decisions. Once we become aware of this fallacy, it is legitimate to ask ourselves: <em>How much can we <strong>realistically<\/strong> expect from XAI?<\/em>\r\n<h6><\/h6>\r\nIn this post, we will discuss whether our expectations for XAI (and the value XAI techniques might add to our work) are too high and focus on answering three main questions:\r\n<h6><\/h6>\r\n<ol>\r\n \t<li><a href=\"#section1\">Can XAI become a proxy for trust(worthiness)?<\/a><\/li>\r\n \t<li><a href=\"#section2\">What limitations of XAI techniques should we be aware of?<\/a><\/li>\r\n \t<li><a href=\"#section3\">Can consistency across XAI techniques improve user experience?<\/a><\/li>\r\n<\/ol>\r\n<h6><\/h6>\r\nLet\u2019s look at these three specific aspects of XAI-related expectations in more detail.\r\n<h6><\/h6>\r\n&nbsp;\r\n<h6><\/h6>\r\n<p style=\"font-size: 18px;\"><a name=\"section1\"><\/a><strong>1. 
Can XAI become a proxy for trust(worthiness)?<\/strong><\/p>\r\nIt is natural to associate an AI solution\u2019s \u201cability to explain itself\u201d with the degree of trust humans place in that solution. However, should explainability translate to trustworthiness?\r\n<h6><\/h6>\r\nXAI, despite its usefulness, falls short of ensuring trust in AI for at least these three reasons:\r\n<h6><\/h6>\r\n<span style=\"color: #c04c0b;\"><strong>A.\u00a0 <\/strong> <\/span>The premise that AI algorithms must be able to explain their decisions to humans can be problematic. Humans cannot always explain their decision-making process, and even when they do, their explanations might not be reliable or consistent.\r\n<h6><\/h6>\r\n<span style=\"color: #c04c0b;\"><strong>B.\u00a0 <\/strong> <\/span> Having complete trust in AI involves trusting not only the model, but also the training data, the team that created the AI model, and the entire software ecosystem. To ensure parity between models and outcomes, best practices and conventions for data standards, workflow standards, and technical subject matter expertise must be followed.\r\n<h6><\/h6>\r\n<span style=\"color: #c04c0b;\"><strong>C.\u00a0 <\/strong> <\/span> Trust is something that develops over time, but in an age where early adopters want models as fast as they can be produced, models are sometimes introduced to the public early with the expectation that they will be continuously updated and improved upon in later releases \u2013 which might not happen. In many applications, adopting AI models that stand the test of time by showing consistently reliable performance and correct results might be good enough, even without any XAI capabilities.\r\n<h6><\/h6>\r\n&nbsp;\r\n<h6><\/h6>\r\n<p style=\"font-size: 18px;\"><a name=\"section2\"><\/a><strong>2. 
What limitations of XAI techniques should we be aware of?<\/strong><\/p>\r\nIn the fields of image analysis and computer vision, a common interface for showing the results of XAI techniques consists of overlaying the \u201cexplanation\u201d (usually in the form of a\u00a0heatmap\u00a0or\u00a0saliency map) on top of the image. This can be helpful in determining which areas of the image the model deemed most relevant in its decision-making process. It can also assist in diagnosing potential blunders in which the deep learning model produces results that seem correct even though the model was looking in the wrong place. A classic example is the <a href=\"https:\/\/arxiv.org\/abs\/1602.04938\"><em>husky vs. wolf<\/em>\u00a0image classification algorithm<\/a>, which was in fact a snow detector (Figure 1).\r\n\r\n&nbsp;\r\n<h6><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-11973 size-full\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2023\/04\/husky_vs_wolf.png\" alt=\"Husky vs wolf classifier is a snow detector\" width=\"507\" height=\"277\" \/><\/h6>\r\n<strong>Figure 1<\/strong>: Example of misclassification in a \u201chusky vs. wolf\u201d image classifier due to a spurious correlation between images of wolves and the presence of snow. \u00a0The image on the right, which shows the result of the LIME post-hoc XAI technique, captures the classifier blunder. 
Source: <a href=\"https:\/\/arxiv.org\/pdf\/1602.04938.pdf\">https:\/\/arxiv.org\/pdf\/1602.04938.pdf<\/a>\r\n<h6><\/h6>\r\n&nbsp;\r\n<h6><\/h6>\r\nThe potential usefulness of post-hoc XAI results has led to a growing adoption of techniques such as <a href=\"https:\/\/www.mathworks.com\/help\/deeplearning\/ref\/imagelime.html\">imageLIME<\/a>, <a href=\"https:\/\/www.mathworks.com\/help\/deeplearning\/ref\/occlusionsensitivity.html\">occlusion sensitivity<\/a>, and <a href=\"https:\/\/www.mathworks.com\/help\/deeplearning\/ref\/gradcam.html\">gradCAM<\/a> for computer vision tasks. However, these XAI techniques sometimes fall short of delivering the desired explanation due to some well-known limitations. Below are three examples:\r\n<h6><\/h6>\r\n<p id=\"p0\">\u2003\u2003<span style=\"color: #c04c0b;\"><strong>A.\u00a0 <\/strong> <\/span> XAI techniques can sometimes use similar explanations for correct and incorrect decisions.<\/p>\r\nFor example, in the <a href=\"https:\/\/blogs.mathworks.com\/deep-learning\/2022\/12\/30\/what-is-explainable-ai\/\">previous post in this series<\/a>, we showed that the heatmaps produced by the gradCAM function provided similarly convincing explanations (in this case, focus on the head area of the dog) both when the model\u2019s prediction was correct (Figure 2, top) and when it was incorrect (Figure 2, bottom: a Labrador retriever was mistakenly identified as a beagle).\r\n<h6><\/h6>\r\n&nbsp;\r\n<h6><\/h6>\r\n<h6><img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-12006 size-full\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2023\/04\/beagle_gradcam.png\" alt=\"Classify dog breed and visualize heatmap of classification\" width=\"684\" height=\"579\" \/><\/h6>\r\n<h6><\/h6>\r\n<strong>Figure 2<\/strong>: Example of results using gradCAM for the dog breed image classification task.\r\n<h6><\/h6>\r\n&nbsp;\r\n<h6><\/h6>\r\n<p id=\"p0\">\u2003\u2003<span style=\"color: #c04c0b;\"><strong>B.\u00a0 <\/strong> <\/span> Even in 
cases where XAI techniques show that a model is not looking in the right place, that doesn\u2019t necessarily mean that it is easy to know how to fix the underlying problem.<\/p>\r\nIn some easier cases, such as the husky vs. wolf classification mentioned earlier, a quick visual inspection of model errors could have helped identify the spurious correlation between \u201cpresence of snow\u201d and \u201cimages of wolves.\u201d However, there is no guarantee that the same process would work for other (larger or more complex) tasks.\r\n<h6><\/h6>\r\n<p id=\"p0\">\u2003\u2003<span style=\"color: #c04c0b;\"><strong>C.\u00a0 <\/strong> <\/span> Results are model- and task-dependent.<\/p>\r\nIn our previous post in this series, we also showed that the heatmaps produced by the gradCAM function in MATLAB provided different visual explanations for the same image (Figure 3) depending on the pretrained model and task. Figure 4 shows those two examples and adds a <a href=\"https:\/\/www.mathworks.com\/help\/deeplearning\/ug\/gradcam-explains-why.html\">third example<\/a>, in which the same network (GoogLeNet) was used without modification. 
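To make concrete what a heatmap-based technique of this kind computes, here is a minimal occlusion-sensitivity sketch in Python with NumPy. This is a simplified illustration of the general idea, not MATLAB's occlusionsensitivity function; `toy_score` is a hypothetical stand-in for a real classifier's class score:

```python
import numpy as np

def occlusion_sensitivity(score_fn, image, patch=8, baseline=0.0):
    """Slide an occluding patch over the image; the heatmap entry for each
    position is how much the class score drops when that region is masked
    (a larger drop means the region mattered more to the decision)."""
    h, w = image.shape
    base_score = score_fn(image)
    heatmap = np.zeros((h // patch, w // patch))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = baseline  # mask one region
            heatmap[i // patch, j // patch] = base_score - score_fn(occluded)
    return heatmap

# Hypothetical stand-in "classifier": its score is simply the mean brightness
# of the top-left quadrant, so only that region should register in the map.
def toy_score(img):
    return float(img[:8, :8].mean())

heatmap = occlusion_sensitivity(toy_score, np.ones((16, 16)), patch=8)
# heatmap[0, 0] carries the full score drop; the other cells are zero.
```

With a real network, the per-patch score drops form the kind of overlay heatmap shown in the figures, and the choices of patch size and baseline value themselves influence the resulting "explanation."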
A quick visual inspection of Figure 4 is enough to spot significant differences among the three heatmaps.\r\n<h6><\/h6>\r\n<img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-11979 size-full\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2023\/04\/labrador.png\" alt=\"Labrador dog\" width=\"174\" height=\"174\" \/>\r\n<h6><\/h6>\r\n<strong>Figure 3<\/strong>: Test image for different image classification tasks and models (shown in Figure 4).\r\n<h6><\/h6>\r\n&nbsp;\r\n<h6><\/h6>\r\n<img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-12015 size-full\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2023\/04\/labrador_gradcam.png\" alt=\"Classification with GoogleNet and visualization of result with heatmap\" width=\"981\" height=\"741\" \/>\r\n<h6><\/h6>\r\n<strong>Figure 4<\/strong>: Using gradCAM on the same test image (Figure 3), but for different image classification tasks and models.\r\n<h6><\/h6>\r\n&nbsp;\r\n<h6><\/h6>\r\n<p style=\"font-size: 18px;\"><a name=\"section3\"><\/a><strong>3. \u00a0Can consistency across XAI techniques improve user experience?<\/strong><\/p>\r\nIn order to understand the\u00a0<em>why<\/em>\u00a0and\u00a0<em>how<\/em>\u00a0behind an AI model\u2019s decisions and get a better insight into its successes and failures, an explainable model should be capable of explaining itself to a human user through some type of\u00a0explanation interface\u00a0(Figure 5). 
Ideally, this interface should be rich, interactive, intuitive, and appropriate for the user and task.\r\n<h6><\/h6>\r\n<img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-11985 size-full\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2023\/04\/XAI_UI.jpg\" alt=\"User interface for explainable AI\" width=\"624\" height=\"184\" \/>\r\n<h6><\/h6>\r\n<strong>Figure 5<\/strong>: Usability (UI\/UX) aspects of XAI: an explainable model requires a suitable interface.\r\n<h6><\/h6>\r\n&nbsp;\r\n<h6><\/h6>\r\nIn visual AI tasks, comparing the results produced by two or more methods can be problematic, since the most commonly used post-hoc XAI methods use significantly different explanation interfaces (that is, visualization schemes) in their implementation (Figure 6):\r\n<h6><\/h6>\r\n<ul>\r\n \t<li>CAM (Class Activation Maps), including Grad-CAM, and occlusion sensitivity use a heatmap to correlate hot colors with salient\/relevant portions of the image.<\/li>\r\n \t<li>LIME (Local Interpretable Model-Agnostic Explanations) generates superpixels, which are typically shown as highlighted pixels outlined in different pseudo-colors.<\/li>\r\n \t<li>SHAP (SHapley Additive exPlanations) values divide pixels into those that increase and those that decrease the probability of a class being predicted.<\/li>\r\n<\/ul>\r\n<h6><\/h6>\r\nContinuing to improve the consistency of XAI solutions and interfaces will help increase XAI adoption and usefulness.\r\n<h6><\/h6>\r\n<img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-11988 size-full\" src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2023\/04\/XAI_methods.jpg\" alt=\"XAI methods, including Grad-CAM, LIME, and SHAP\" width=\"957\" height=\"1090\" \/>\r\n<h6><\/h6>\r\n<strong>Figure 6<\/strong>: Three different XAI methods (Grad-CAM, LIME, and SHAP) for four different images. 
Notice how the structure (including border definition), choice of colors (and their meaning), ranges of values, and, consequently, the meaning of highlighted areas vary significantly between XAI methods. [Source: <a href=\"https:\/\/arxiv.org\/abs\/2006.11371\">https:\/\/arxiv.org\/abs\/2006.11371<\/a>]\r\n<h6><\/h6>\r\n&nbsp;\r\n<h6><\/h6>\r\n<p style=\"font-size: 18px;\"><strong>Takeaway<\/strong><\/p>\r\nIn this blog post, we offered words of caution and discussed some limitations of existing XAI methods. Despite the downsides, there are still many reasons to be optimistic about the potential of XAI, as we will share in the next (and final) post in this series.","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img src=\"https:\/\/blogs.mathworks.com\/deep-learning\/files\/2023\/04\/beagle_gradcam.png\" class=\"img-responsive attachment-post-thumbnail size-post-thumbnail wp-post-image\" alt=\"\" decoding=\"async\" loading=\"lazy\" \/><\/div><p>\r\nThis post is from Oge Marques, PhD and Professor of Engineering and Computer Science at FAU.\r\n\r\n&nbsp;\r\n\r\nThis is the second post in a 3-post series on Explainable AI (XAI). The first post... 
<a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/deep-learning\/2023\/05\/08\/explainable-ai-xai-are-we-there-yet\/\">read more >><\/a><\/p>","protected":false},"author":194,"featured_media":12006,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[54,9,66],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/11970"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/users\/194"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/comments?post=11970"}],"version-history":[{"count":73,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/11970\/revisions"}],"predecessor-version":[{"id":12226,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/posts\/11970\/revisions\/12226"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/media\/12006"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/media?parent=11970"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/categories?post=11970"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/deep-learning\/wp-json\/wp\/v2\/tags?post=11970"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}