Publications by category in reverse chronological order.
- Objective Metrics for Human-Subjects Evaluation in Explainable Reinforcement Learning. Balint Gyevnar* and Mark Towers*. 2025
Explanation is a fundamentally human process. Understanding the goal and audience of the explanation is vital, yet existing work on explainable reinforcement learning (XRL) routinely does not consult humans in their evaluations. Even when they do, they routinely resort to subjective metrics, such as confidence or understanding, that can only inform researchers of users’ opinions, not their practical effectiveness for a given problem. This paper calls on researchers to use objective human metrics for explanation evaluations based on observable and actionable behaviour to build more reproducible, comparable, and epistemically grounded research. To this end, we curate, describe, and compare several objective evaluation methodologies for applying explanations to debugging agent behaviour and supporting human-agent teaming, illustrating our proposed methods using a novel grid-based environment. We discuss how subjective and objective metrics complement each other to provide holistic validation and how future work needs to utilise standardised benchmarks for testing to enable greater comparisons between research.
@misc{gyevnar2025objective, title = {Objective Metrics for Human-Subjects Evaluation in Explainable Reinforcement Learning}, author = {Gyevnar, Balint and Towers, Mark}, year = {2025}, eprint = {2501.19256}, archiveprefix = {arXiv}, primaryclass = {cs.AI}, url = {}, }
- People Attribute Purpose to Autonomous Vehicles When Explaining Their Behavior: Insights from Cognitive Science for Explainable AI. In CHI ’25: Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, Yokohama, Japan, 2025
It is often argued that effective human-centered explainable artificial intelligence (XAI) should resemble human reasoning. However, empirical investigations of how concepts from cognitive science can aid the design of XAI are lacking. Based on insights from cognitive science, we propose a framework of explanatory modes to analyze how people frame explanations, whether mechanistic, teleological, or counterfactual. Using autonomous driving, a complex safety-critical domain, we conduct an experiment consisting of two studies on (i) how people explain the behavior of a vehicle in 14 unique scenarios (N1=54), and (ii) how they perceive these explanations (N2=382). Our main finding is that participants deem teleological explanations significantly better quality than counterfactual ones, with perceived teleology being the best predictor of perceived quality. Based on our results, we argue that explanatory modes are an important axis of analysis when designing and evaluating XAI and highlight the need for a principled and empirically grounded understanding of the cognitive mechanisms of explanation.
@inproceedings{gyevnar2024attribute, title = {People Attribute Purpose to Autonomous Vehicles When Explaining Their Behavior: Insights from Cognitive Science for Explainable AI}, author = {Gyevnar, Balint and Droop, Stephanie and Quillien, Tadeg and Cohen, Shay B. and Bramley, Neil R. and Lucas, Christopher G. and Albrecht, Stefano V.}, year = {2025}, publisher = {Association for Computing Machinery}, address = {New York, NY, United States}, booktitle = {CHI '25: Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems}, url = {}, location = {Yokohama, Japan}, doi = {10.1145/3706598.3713509} }
- Towards Trustworthy Autonomous Systems via Conversations and Explanations. Balint Gyevnar. Proceedings of the AAAI Conference on Artificial Intelligence, Mar 2024
Autonomous systems fulfil an increasingly important role in our societies; however, AI-powered systems have seen less success over the years, as they are expected to tackle a range of social, legal, or technological challenges, and modern neural network-based AI systems cannot yet provide guarantees for many of these challenges. Particularly important is that these systems are black box decision makers, eroding human oversight, contestation, and agency. To address this particular concern, my thesis focuses on integrating social explainable AI with cognitive methods and natural language processing to shed light on the internal processes of autonomous systems in a way accessible to lay users. I propose a causal explanation generation model for decision-making called CEMA based on counterfactual simulations in multi-agent systems. I also plan to integrate CEMA with a broader natural language processing pipeline to support targeted and personalised explanations that address people’s cognitive biases. I hope that my research will have a positive impact on the public acceptance of autonomous agents by building towards more trustworthy AI.
@article{gyevnar2024towardstrustworthy, title = {Towards Trustworthy Autonomous Systems via Conversations and Explanations}, volume = {38}, url = {}, doi = {10.1609/aaai.v38i21.30395}, number = {21}, journal = {Proceedings of the AAAI Conference on Artificial Intelligence}, author = {Gyevnar, Balint}, year = {2024}, month = mar, pages = {23389-23390}, }
- Explainable AI for Safe and Trustworthy Autonomous Driving: A Systematic Review. IEEE Transactions on Intelligent Transportation Systems, Dec 2024
Artificial Intelligence (AI) shows promising applications for the perception and planning tasks in autonomous driving (AD) due to its superior performance compared to conventional methods. However, inscrutable AI systems exacerbate the existing challenge of safety assurance of AD. One way to mitigate this challenge is to utilize explainable AI (XAI) techniques. To this end, we present the first comprehensive systematic literature review of explainable methods for safe and trustworthy AD. We begin by analyzing the requirements for AI in the context of AD, focusing on three key aspects: data, model, and agency. We find that XAI is fundamental to meeting these requirements. Based on this, we explain the sources of explanations in AI and describe a taxonomy of XAI. We then identify five key contributions of XAI for safe and trustworthy AI in AD, which are interpretable design, interpretable surrogate models, interpretable monitoring, auxiliary explanations, and interpretable validation. Finally, we propose a modular framework called SafeX to integrate these contributions, enabling explanation delivery to users while simultaneously ensuring the safety of AI models.
@article{kuznietsov2024avreview, title = {Explainable AI for Safe and Trustworthy Autonomous Driving: A Systematic Review}, author = {Kuznietsov, Anton and Gyevnar, Balint and Wang, Cheng and Peters, Steven and Albrecht, Stefano V.}, year = {2024}, month = dec, journal = {IEEE Transactions on Intelligent Transportation Systems}, volume = {25}, number = {12}, pages = {19342-19364}, publisher = {IEEE}, doi = {10.1109/TITS.2024.3474469}, url = {}, }
- Causal Explanations for Sequential Decision-Making in Multi-Agent Systems. In Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, Auckland, New Zealand, Dec 2024
We present CEMA: Causal Explanations in Multi-Agent systems; a framework for creating causal natural language explanations of an agent’s decisions in dynamic sequential multi-agent systems to build more trustworthy autonomous agents. Unlike prior work that assumes a fixed causal structure, CEMA only requires a probabilistic model for forward-simulating the state of the system. Using such a model, CEMA simulates counterfactual worlds that identify the salient causes behind the agent’s decisions. We evaluate CEMA on the task of motion planning for autonomous driving and test it in diverse simulated scenarios. We show that CEMA correctly and robustly identifies the causes behind the agent’s decisions, even when a large number of other agents is present, and show via a user study that CEMA’s explanations have a positive effect on participants’ trust in autonomous vehicles and are rated as high as high-quality baseline explanations elicited from other participants. We release the collected explanations with annotations as the HEADD dataset.
@inproceedings{gyevnar2024causal, author = {Gyevnar, Balint and Wang, Cheng and Lucas, Christopher G. and Cohen, Shay B. and Albrecht, Stefano V.}, title = {Causal Explanations for Sequential Decision-Making in Multi-Agent Systems}, year = {2024}, publisher = {International Foundation for Autonomous Agents and Multiagent Systems}, address = {Richland, SC}, booktitle = {Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems}, pages = {771-779}, numpages = {9}, keywords = {autonomous vehicles, causal explanations, dataset, explainable ai, human-centric xai, multi-agent systems}, location = {Auckland, New Zealand}, series = {AAMAS '24}, }
- Bridging the Transparency Gap: What Can Explainable AI Learn From the AI Act? Balint Gyevnar, Nick Ferguson, and Burkhard Schafer. In 26th European Conference on Artificial Intelligence, Dec 2023
The European Union has proposed the Artificial Intelligence Act which introduces detailed requirements of transparency for AI systems. Many of these requirements can be addressed by the field of explainable AI (XAI); however, there is a fundamental difference between XAI and the Act regarding what transparency is. The Act views transparency as a means that supports wider values, such as accountability, human rights, and sustainable innovation. In contrast, XAI views transparency narrowly as an end in itself, focusing on explaining complex algorithmic properties without considering the socio-technical context. We call this difference the “transparency gap”. Failing to address the transparency gap, XAI risks leaving a range of transparency issues unaddressed. To begin to bridge this gap, we overview and clarify the terminology of how XAI and European regulation – the Act and the related General Data Protection Regulation (GDPR) – view basic definitions of transparency. By comparing the disparate views of XAI and regulation, we arrive at four axes where practical work could bridge the transparency gap: defining the scope of transparency, clarifying the legal status of XAI, addressing issues with conformity assessment, and building explainability for datasets.
@inproceedings{gyevnar2023transparencyGap, title = {Bridging the Transparency Gap: What Can Explainable AI Learn From the AI Act?}, author = {Gyevnar, Balint and Ferguson, Nick and Schafer, Burkhard}, booktitle = {26th European Conference on Artificial Intelligence}, pages = {964--971}, year = {2023}, organization = {IOS Press}, url = {}, doi = {10.3233/FAIA230367} }
- Love, Sex, and AI. Balint Gyevnar. In AI100 Early Career Essay Competition, Sep 2023
Balint was one of five selected top submissions to the AI100 Early Career Essay Competition by the Stanford Institute for Human-Centered Artificial Intelligence.
The artificial lover has captivated people’s imagination since ancient times. Today, technologies such as affective chatbots, AI-generated imagery, and human-like robots capture the minds, and indeed the bodies, of the amorous. Research interest in the topic has increased in recent years, yet the AI100 study panel remains silent to date on the genuinely promising applications, major ethical issues, and technological roadblocks of AI in love and sex. Now that real Pygmalions and Coppelias are being born into our world, we must look past sensationalised media coverages and sci-fi to ask in earnest about the social, legal, and ethical challenges our society must face if we really are to love artificial intelligence; and whether it should love us back.
@incollection{gyevnar2023loveSexAI, title = {Love, Sex, and AI}, author = {Gyevnar, Balint}, booktitle = {AI100 Early Career Essay Competition}, year = {2023}, month = sep, publisher = {Stanford}, }
- Trustworthy Autonomous Systems Early Career Research Award, Knowledge Transfer Track. Balint Gyevnar. Sep 2023
Balint was awarded £4000 by the UKRI TAS Hub to achieve his vision for more trustworthy autonomous systems (TAS) through explainability and conversations.
- A Human-Centric Method for Generating Causal Explanations in Natural Language for Autonomous Vehicle Motion Planning. In IJCAI 2022 Workshop on Artificial Intelligence for Autonomous Driving, Sep 2022
Inscrutable AI systems are difficult to trust, especially if they operate in safety-critical settings like autonomous driving. Therefore, there is a need to build transparent and queryable systems to increase trust levels. We propose a transparent, human-centric explanation generation method for autonomous vehicle motion planning and prediction based on an existing white-box system called IGP2. Our method integrates Bayesian networks with context-free generative rules and can give causal natural language explanations for the high-level driving behaviour of autonomous vehicles. Preliminary testing on simulated scenarios shows that our method captures the causes behind the actions of autonomous vehicles and generates intelligible explanations with varying complexity.
@inproceedings{gyevnar2022humanCentric, abbrev = {IJCAI}, title = {A Human-Centric Method for Generating Causal Explanations in Natural Language for Autonomous Vehicle Motion Planning}, author = {Gyevnar, Balint and Tamborski, Massimiliano and Wang, Cheng and Lucas, Christopher G. and Cohen, Shay B. and Albrecht, Stefano V.}, booktitle = {IJCAI 2022 Workshop on Artificial Intelligence for Autonomous Driving}, year = {2022}, url = {} }
- Communicative Efficiency or Iconic Learning: Do acquisition and communicative pressures interact to shape colour-naming systems? Balint Gyevnar*, Gautier Dagan*, Coleman Haley*, Shangmin Guo*, and Frank Mollica. Entropy, Sep 2022
Language evolution is driven by pressures for simplicity and informativity; however, the timescale on which these pressures operate is debated. Over several generations, learners’ biases for simple and informative systems can guide language evolution. Over repeated instances of dyadic communication, the principle of least effort dictates that speakers should bias systems towards simplicity and listeners towards informativity, similarly guiding language evolution. At the same time, it has been argued that learners only provide a bias for simplicity and, thus, language users must provide a bias for informativity. To what extent do languages evolve during acquisition versus use? We address this question by formally defining and investigating the communicative efficiency of acquisition trajectories. We illustrate our approach using colour-naming systems, replicating a communicative efficiency model based on the information bottleneck problem, and an acquisition model based on self-organising maps. We find that to the extent that language is iconic, learning alone is sufficient to shape language evolution. Regarding colour-naming systems specifically, we find that incorporating learning biases into communicative efficiency accounts might explain how speakers and listeners trade off communicative effort.
@article{gyevnar2022colour, title = {Communicative Efficiency or Iconic Learning: Do acquisition and communicative pressures interact to shape colour-naming systems?}, author = {Gyevnar, Balint and Dagan, Gautier and Haley, Coleman and Guo, Shangmin and Mollica, Frank}, journal = {Entropy}, volume = {24}, number = {11}, pages = {1542}, year = {2022}, publisher = {MDPI}, doi = {10.3390/e24111542}, keywords = {colour-naming systems; communicative efficiency; language evolution; information bottleneck}, }
- Cars that Explain: Building Trust in Autonomous Vehicles through Explanations and Conversations. Balint Gyevnar. In “Shape the Future of ITS” Competition, Sep 2022
Balint received the 3rd prize in the “Shape the Future of ITS” Competition by the IEEE Intelligent Transportation Systems Society.
@incollection{gyevnar2022carsExplain, title = {Cars that Explain: Building Trust in Autonomous Vehicles through Explanations and Conversations}, author = {Gyevnar, Balint}, booktitle = {``Shape the Future of ITS'' Competition}, publisher = {IEEE Intelligent Transportation Systems Society (ITSS)}, year = {2022}, }
- GRIT: Fast, Interpretable, and Verifiable Goal Recognition with Learned Decision Trees for Autonomous Driving. Cillian Brewitt, Balint Gyevnar, Samuel Garcin, and Stefano V. Albrecht. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Sep 2021
It is important for autonomous vehicles to have the ability to infer the goals of other vehicles (goal recognition), in order to safely interact with other vehicles and predict their future trajectories. This is a difficult problem, especially in urban environments with interactions between many vehicles. Goal recognition methods must be fast to run in real time and make accurate inferences. As autonomous driving is safety-critical, it is important to have methods which are human interpretable and for which safety can be formally verified. Existing goal recognition methods for autonomous vehicles fail to satisfy all four objectives of being fast, accurate, interpretable and verifiable. We propose Goal Recognition with Interpretable Trees (GRIT), a goal recognition system which achieves these objectives. GRIT makes use of decision trees trained on vehicle trajectory data. We evaluate GRIT on two datasets, showing that GRIT achieved fast inference speed and comparable accuracy to two deep learning baselines, a planning-based goal recognition method, and an ablation of GRIT. We show that the learned trees are human interpretable and demonstrate how properties of GRIT can be formally verified using a satisfiability modulo theories (SMT) solver.
@inproceedings{brewitt2021grit, title = {{GRIT:} Fast, Interpretable, and Verifiable Goal Recognition with Learned Decision Trees for Autonomous Driving}, author = {Brewitt, Cillian and Gyevnar, Balint and Garcin, Samuel and Albrecht, Stefano V.}, booktitle = {IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)}, year = {2021}, pages = {1023-1030}, doi = {10.1109/IROS51168.2021.9636279} }
- Interpretable Goal-based Prediction and Planning for Autonomous Driving. In IEEE International Conference on Robotics and Automation (ICRA), Sep 2021
We propose an integrated prediction and planning system for autonomous driving which uses rational inverse planning to recognise the goals of other vehicles. Goal recognition informs a Monte Carlo Tree Search (MCTS) algorithm to plan optimal maneuvers for the ego vehicle. Inverse planning and MCTS utilise a shared set of defined maneuvers and macro actions to construct plans which are explainable by means of rationality principles. Evaluation in simulations of urban driving scenarios demonstrates the system’s ability to robustly recognise the goals of other vehicles, enabling our vehicle to exploit non-trivial opportunities to significantly reduce driving times. In each scenario, we extract intuitive explanations for the predictions which justify the system’s decisions.
@inproceedings{albrecht2020igp2, title = {Interpretable Goal-based Prediction and Planning for Autonomous Driving}, author = {Albrecht, Stefano V. and Brewitt, Cillian and Wilhelm, John and Gyevnar, Balint and Eiras, Francisco and Dobre, Mihai and Ramamoorthy, Subramanian}, booktitle = {IEEE International Conference on Robotics and Automation (ICRA)}, year = {2021}, doi = {10.1109/ICRA48506.2021.9560849}, pages = {1043-1049} }