Explanatory Capabilities of Large Language Models in Prescriptive Process Monitoring

Marlon Dumas · 193 views · 27 slides · Sep 03, 2024

About This Presentation

Research paper presentation at the 2024 International Conference on Business Process Management (BPM).
Prescriptive process monitoring (PrPM) systems analyze ongoing business process instances to recommend real-time interventions that optimize performance. The usefulness of these systems hinges on users understanding why a given intervention is recommended.


Slide Content

Slide 1: Explanatory Capabilities of Large Language Models in Prescriptive Process Monitoring. Kateryna Kubrak, Lana Botchorishvili, Fredrik Milani, Alexander Nolte, Marlon Dumas. University of Tartu, Narva mnt 18, 51009 Tartu, Estonia. 22nd Business Process Management Conference (BPM 2024).

Slide 2 (Introduction): Process mining helps to get more insight into the process and identify improvements. Descriptive process monitoring tells you why something has happened. [Example process map: Register application → Analyze application → Approve application / Cancel application → Notify customer, annotated with case frequencies.]

Slide 3 (Introduction): Prescriptive process monitoring (PrPM) tells you what to do to avoid or mitigate an undesired outcome. [Same process map, now with a warning: "Analysis will take too long, the customer will be notified later than promised." Recommendation: "Assign Specialist 1 to task Analyze application." User: "But why would I do that?" → Explanation.]

Slide 4 (Introduction): Previous work has highlighted the challenges of providing understandable and compelling explanations to business users in the context of PrPM systems [1]. Fixed-form explanations are not user-centric, since they consist of information suited to one specific type of user [2]. The understandability of explanations can be enhanced through dialogue-based systems that allow users to ask questions from different angles [2]. Large Language Models (LLMs) are an emerging technology that could facilitate such a dialogue between a system and a user [3].
[1] Padella, A., de Leoni, M., Dogan, O., Galanti, R.: Explainable process prescriptive analytics. In: ICPM, pp. 16–23. IEEE (2022)
[2] Cambria, E., Malandri, L., Mercorio, F., Mezzanzanica, M., Nobani, N.: A survey on XAI and natural language explanations. Inf. Process. Manag. 60(1), 103111 (2023)
[3] Feldhus, N., Ravichandran, A.M., Möller, S.: Mediators: Conversational agents explaining NLP model behavior. CoRR abs/2206.06029 (2022)

Slide 5 (Introduction): Problem statement: enhancing the understandability of explanations for PrPM recommendations. Research objective: to design and evaluate an approach for LLM-based explanations of recommendations generated by PrPM techniques.

Slide 6: Method [method overview figure].

Slide 7: Method [method overview figure, continued].

Slide 8 (Objectives Definition): Explainable AI Question Bank (XAIQB)*. The full XAIQB covers the following question categories:
Input: What kind of data does the system learn from?
Output: What kind of output does the system give?
Performance: How accurate/precise/reliable are the predictions?
How: What is the system's overall logic?
Why: Why/how is this instance given this prediction?
Why not: Why/how is this instance not predicted?
What if: What would the system predict for a different instance?
How to be that: How should this instance change to get a different prediction?
How to still be this: What is the scope of change permitted to still get the same prediction?
Others
* Liao, Q.V., Gruen, D.M., Miller, S.: Questioning the AI: informing design practices for explainable AI user experiences. In: CHI, pp. 1–15. ACM (2020)

Slide 9 (Objectives Definition): Mapping of question categories to ways of explaining*.
Category: Data. Question: What is the size of the event log? Ways to explain: number of cases in the event log; number of cases in the training and testing datasets. Prototypical output: "The event log consists of {number} of cases."
Category: Performance. Question: Why should I believe that the predictions are correct? Ways to explain: provide performance metrics for the models (accuracy, precision, recall). Prototypical output: "The accuracy of recommendations is on average {number}."
Category: How. Question: How does the system make predictions? Ways to explain: describe the differences between the techniques. Prototypical output: "The tool provides three different recommendation types: next best activity, alarm, and intervention. [...] The intervention is produced using the uplift modeling package CausalLift to get the CATE and probability of outcome..."
Category: Output. Question: What do the different recommendation types mean? Ways to explain: describe the differences between the algorithms (the techniques they use and how the recommendations differ). Prototypical output: "An alarm is a type of recommendation that does not specify an exact action to perform at the given moment, but rather notifies you that you should pay attention to the case."
* The full mapping is available in the supplementary material.
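Slide 9 mentions that interventions are produced with uplift modeling. As a rough illustration of the underlying idea, below is a minimal sketch of the two-model (T-learner) approach that packages such as CausalLift implement: the CATE is estimated as the difference between treated and untreated outcome probabilities. The column names (`treated`, `outcome`) and the sklearn models are illustrative assumptions, not the tool's actual pipeline.

```python
# Minimal two-model (T-learner) uplift sketch, the idea behind packages such
# as CausalLift. Column names ("treated", "outcome") are illustrative, and a
# binary outcome is assumed.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

def estimate_cate(train: pd.DataFrame, test: pd.DataFrame, features: list[str]) -> pd.Series:
    """Estimate the Conditional Average Treatment Effect per case."""
    treated = train[train["treated"] == 1]
    control = train[train["treated"] == 0]

    # One outcome model per group: P(outcome | X, T=1) and P(outcome | X, T=0).
    m1 = GradientBoostingClassifier().fit(treated[features], treated["outcome"])
    m0 = GradientBoostingClassifier().fit(control[features], control["outcome"])

    p1 = m1.predict_proba(test[features])[:, 1]  # outcome probability if treated
    p0 = m0.predict_proba(test[features])[:, 1]  # outcome probability if not treated
    return pd.Series(p1 - p0, index=test.index, name="cate")

# Cases with a high positive CATE are those where the intervention
# (e.g. "Assign Specialist 1") is predicted to improve the outcome most.
```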

Slide 10: Method [method overview figure, revisited].

Slide 11 (Design & Development: Prompting Method): The prompt combines general contextual information (context, data description, task), general conversational rules (simple language for analysts with little ML experience; specific answers, no elaboration; no more than two paragraphs of text), and examples. Additional components were elicited by asking ChatGPT: what else?
References informing the prompting method:
Seo, W., Yang, C., Kim, Y.H.: ChaCha: Leveraging large language models to prompt children to share their emotions about personal events. arXiv preprint arXiv:2309.12244 (2023)
Bellan, P., Dragoni, M., Ghidini, C.: Extracting business process entities and relations from text using pre-trained language models and in-context learning. In: International Conference on Enterprise Design, Operations, and Computing, pp. 182–199. Springer (2022)
Jessen, U., Sroka, M., Fahland, D.: Chit-chat or deep talk: Prompt engineering for process mining. arXiv preprint arXiv:2307.09909 (2023)

Slide 12 (Design & Development: Prompting Method)*:
Component: Context. Text (excerpt): "[PrPM tool] uses three algorithms to generate prescriptions for business processes... The [PrPM tool] workflow involves: uploading an event log; defining column types; setting parameters... The key parameters are: Case Completion: an activity that marks the end of a case, e.g., 'Application completed'..."
Component: Data description. Text (excerpt): description of the MongoDB files collection; description of the cases collection.
Component: General conversational rules. Text (excerpt): "When answering, use simple language for the explanations. Do not mention the database or show raw data in your responses. ..."
Component: Examples. Text (excerpt): "QUESTION: What is the size of the event log? ANSWER: The event log consists of <nr_of_cases> of cases. QUERY: [query example] STEPS: Run the query with function query_db to find the number of cases in this event log."
Component: Task. Text (excerpt): "Your role is to answer questions about [PrPM tool] recommendations and query the database for specific case or event log information."
* The full prompting method is available in the supplementary material.
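The components on slide 12 suggest how the system prompt could be assembled. Below is a minimal sketch of that assembly, assuming the component texts live in plain Python strings; the texts paraphrase the slide's excerpts and the component order is an assumption, not the authors' verbatim prompt.

```python
# Illustrative assembly of the system prompt from the slide's components.
# All texts are paraphrased excerpts; the authors' full prompt is in the
# supplementary material.
CONTEXT = (
    "[PrPM tool] uses three algorithms to generate prescriptions for business "
    "processes. The workflow involves uploading an event log, defining column "
    "types, and setting parameters such as Case Completion."
)
DATA_DESCRIPTION = (
    "The MongoDB database has a files collection (event log metadata) "
    "and a cases collection (per-case data)."
)
RULES = (
    "When answering, use simple language for the explanations. Do not mention "
    "the database or show raw data in your responses."
)
EXAMPLES = (
    "QUESTION: What is the size of the event log?\n"
    "ANSWER: The event log consists of <nr_of_cases> of cases.\n"
    "STEPS: Run the query with function query_db to find the number of cases."
)
TASK = (
    "Your role is to answer questions about [PrPM tool] recommendations and "
    "query the database for specific case or event log information."
)

# One system message carrying every component, ready to seed the chat.
SYSTEM_PROMPT = "\n\n".join([CONTEXT, DATA_DESCRIPTION, RULES, EXAMPLES, TASK])
messages = [{"role": "system", "content": SYSTEM_PROMPT}]
```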

Slide 13 (Design & Development): LLM-based chat [prototype figure].

Slide 14 (Design & Development): LLM-based chat [prototype figure, continued].
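The prompt's examples reference a query_db function that the chat uses to fetch case data. As a sketch of how such a tool loop might be wired up, assuming an OpenAI-style function-calling API, a pymongo backend with the cases collection from the data description, and an illustrative model name (the paper does not state which LLM, API, or connection details were used):

```python
# Sketch of a query_db tool loop. The database URI, collection name, model,
# and tool schema are illustrative assumptions, not the prototype's actual setup.
import json
from openai import OpenAI
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["prpm"]
client = OpenAI()

TOOLS = [{
    "type": "function",
    "function": {
        "name": "query_db",
        "description": "Run a MongoDB find() filter against the cases collection.",
        "parameters": {
            "type": "object",
            "properties": {"filter": {"type": "string", "description": "JSON filter"}},
            "required": ["filter"],
        },
    },
}]

def query_db(filter_json: str) -> str:
    """Execute the LLM-requested filter and return matching cases as JSON."""
    docs = list(db["cases"].find(json.loads(filter_json)).limit(20))
    return json.dumps(docs, default=str)

def answer(messages: list) -> str:
    """Loop until the model stops requesting tool calls and returns prose."""
    while True:
        resp = client.chat.completions.create(
            model="gpt-4o", messages=messages, tools=TOOLS)
        msg = resp.choices[0].message
        if not msg.tool_calls:              # final natural-language answer
            return msg.content
        messages.append(msg)                # keep the assistant turn in history
        for call in msg.tool_calls:         # execute each requested query
            args = json.loads(call.function.arguments)
            messages.append({"role": "tool", "tool_call_id": call.id,
                             "content": query_db(args["filter"])})
```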

Slide 15: Method [method overview figure, revisited].

Slide 16 (Evaluation: Setting): 12 process analysts; claims management event log; participants tasked to review a case and its recommendations and to interact with the chat; post-interview survey; participant interviews. Goals: (1) assess users' perception of the presented explanations; (2) assess users' interaction with the chat.

Slide 17 (Evaluation: Results — Questions asked): Participants' questions were coded with a scheme based on the XAIQB:
55% in category "Output": What do specific terms mean? (e.g., "CATE score", "intervention")
18% in category "Why": Why should I prefer one recommendation over another?
12% in category "Others": What documents were supplied in the claim?
Every other category: 7% or less.
The coding scheme is available in the supplementary material.

Slide 18 (Evaluation: Results — Chat's answers): Characteristics used to code the answers (explanations):
Coherency: whether the explanation is internally coherent (how well its parts fit together)
Relevancy to the question: whether the explanation answers the question
Completeness: whether there are gaps in the explanation
Correctness: whether the data in the explanation is correct
Compactness: whether the explanation is repetitive or redundant
References:
Hoffman, R.R., Mueller, S.T., Klein, G., Litman, J.: Measures for explainable AI: explanation goodness, user satisfaction, mental models, curiosity, trust, and human-AI performance. Frontiers Comput. Sci. 5 (2023)
Nauta, M., Trienes, J., Pathak, S., Nguyen, E., Peters, M., Schmitt, Y., Schlötterer, J., van Keulen, M., Seifert, C.: From anecdotal evidence to quantitative evaluation methods: A systematic review on evaluating explainable AI. ACM Comput. Surv. 55(13s), 295:1–295:42 (2023)
Zemla, J.C., Sloman, S., Bechlivanidis, C., Lagnado, D.A.: Evaluating everyday explanations. Psychonomic Bulletin & Review 24, 1488–1500 (2017)

Slide 19 (Evaluation: Results — Chat's answers): Coded characteristics results [chart].

Slide 20 (Evaluation: Results — Chat's answers): Correctness: the chat sometimes did not query the database yet gave a confident answer (i.e., it hallucinated). Sometimes the chat retrieved correct data from the database but matched it to the wrong question (e.g., "90.14%" was the accuracy but was reported as a probability). Compactness: in some interviews most explanations were compact, while in others most were not.

Slide 21 (Evaluation: Results — Interaction): Participants took different approaches to starting the conversation:
Study the case and recommendations, then formulate a clarifying question
Ask a general question about case performance
Ask what the issue with the case is
Ask the chat how it can help them

Slide 22 (Evaluation: Results — Survey): Survey results [chart].

Slide 23 (Implications for Research): Focusing on bringing more causal aspects into PrPM techniques. For the recommendation to amend the claim settlement, participants wanted to know what exactly to amend. Several participants asked about the potential impact of a recommendation on the case outcome in terms of temporal or monetary value, and several asked why an action was recommended. Such questions could be addressed by incorporating a causal component into the setup*.
* Fahland, D., Fournier, F., Limonad, L., Skarbovsky, I., Swevels, A.J.E.: How well can large language models explain business processes? CoRR abs/2401.12846 (2024)

Slide 24 (Implications for Research): Further research on explanations in the PrPM context and ways to provide them. An experiment could be designed to compare the understandability of LLM explanations with established methods, and future researchers can use the questions participants asked as guidance for catering to end-user needs. Expanding the prototype to encompass process-level insights: functionalities for viewing aggregated data (e.g., total active recommendations, number of recommended cases) would provide a broader process perspective for analysts.

Slide 25 (Implications for Research): Improving the correctness of explanations, e.g., by adding a verification layer; an experiment could compare different LLMs on the correctness of their responses. A sketch of one possible verification layer follows.
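As one possible shape for such a verification layer, the following is a minimal sketch that accepts an answer only if every number it contains was actually returned by a database query. The regex, the ground-truth dict, and the overall design are assumptions for illustration, not the authors' proposal.

```python
# Minimal numeric-verification layer: before an answer is shown, every number
# it contains must match a value freshly queried from the database. The
# ground-truth dict stands in for re-running the relevant queries.
import re

def extract_numbers(text: str) -> set[str]:
    """Pull numeric tokens such as '90.14' or '1,250' out of an answer."""
    return {m.replace(",", "") for m in re.findall(r"\d[\d,]*\.?\d*", text)}

def verify_answer(answer: str, ground_truth: dict) -> bool:
    """Accept the answer only if each number matches some queried value."""
    known = {f"{v:g}" for v in ground_truth.values()}
    return extract_numbers(answer) <= known

truth = {"accuracy": 90.14}
# Catches fabricated numbers (slide 20's "confident answer without querying"):
print(verify_answer("The probability of success is 73.2%.", truth))   # False
# Does NOT catch mislabelled numbers (90.14 was accuracy, not probability);
# that failure mode would need field-level matching, not just value matching.
print(verify_answer("The probability of success is 90.14%.", truth))  # True
```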

Slide 26 (Implications for Practice): Add template questions to the chat, thus providing guidance. Adjust the prompt to respond better to questions from the most-asked categories; another approach is to fine-tune the model. Add the capability for the chat to answer general case-performance questions: this requires either ensuring that the LLM can calculate, e.g., cycle time based on instructions, or ensuring that the underlying tool provides this information in addition to the recommendations (see the sketch below).
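For the cycle-time example, the calculation itself is straightforward if the tool computes it from the event log rather than relying on the LLM. A minimal pandas sketch, assuming standard event-log columns (`case_id`, `timestamp`), which are illustrative:

```python
# Cycle time per case from a raw event log. Exposing this pre-computed value
# to the chat avoids relying on the LLM to derive it from instructions.
import pandas as pd

def cycle_times(log: pd.DataFrame) -> pd.Series:
    """Elapsed time from the first to the last event of each case."""
    ts = pd.to_datetime(log["timestamp"])
    grouped = log.assign(timestamp=ts).groupby("case_id")["timestamp"]
    return grouped.max() - grouped.min()

log = pd.DataFrame({
    "case_id": ["A", "A", "B", "B"],
    "activity": ["Register application", "Notify customer"] * 2,
    "timestamp": ["2024-01-01 09:00", "2024-01-03 17:00",
                  "2024-01-02 08:00", "2024-01-02 12:00"],
})
print(cycle_times(log))  # A: 2 days 08:00:00, B: 0 days 04:00:00
```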

Slide 27: Thank you! Kateryna Kubrak, [email protected], PhD Student, Institute of Computer Science, University of Tartu. Icons by Freepik on flaticon.com.