Bálint Gyevnár

Safeguarding scientific integrity in the age of AI co-scientists

Postdoctoral Research Associate at Carnegie Mellon University

My primary research area is the science of AI co-scientists: how they amplify existing biases in research production, create epistemic blind spots, and homogenise approaches across disciplines. I work with Atoosa Kasirzadeh and Nihar Shah as a member of the Institute for Complex Social Dynamics led by Kevin Zollman.

During my PhD, I worked on explainable multi-agent reinforcement learning, which I like to describe as giving interacting AI agents the ability to explain themselves. I researched how we can explain complex emergent behaviour in multi-agent systems (MAS) using counterfactual reasoning. I was supervised by Stefano Albrecht, Shay Cohen, and Chris Lucas.

If you are curious about any of the above topics, then don’t hesitate to reach out through the various channels at the bottom of this page! I am currently based in Pittsburgh, PA.

(My name is pronounced BAH-lint [baːlint])


news

Sep 15, 2025 I have started as a postdoc at Carnegie Mellon University at the Institute for Complex Social Dynamics, working with Atoosa Kasirzadeh and Nihar Shah.
Aug 10, 2025 I spent a week visiting the Center for Humans and Machines, led by Iyad Rahwan, at the Max Planck Institute for Human Development in Berlin.
Jul 30, 2025 I attended the 2025 Human-aligned AI Summer School in Prague, organised by the Alignment of Complex Systems Research and the Center for Theoretical Study at Charles University.
Jun 18, 2025 I attended the 2025 Bridging Responsible AI Divides (BRAID) Gathering in Manchester.
Jun 11, 2025 I attended RLDM 2025, the Multi-disciplinary Conference on Reinforcement Learning and Decision Making, in Dublin, where I presented a poster on our Objective Metrics for Explainable RL paper.
Jun 07, 2025 I gave a talk and presented a poster at the 9th Center for Human-Compatible AI Workshop on “AI Safety for Everyone”.
May 26, 2025 New preprint: Integrating Counterfactual Simulations with Language Models for Explaining Multi-Agent Behaviour.

selected publications

  1. AI Safety for Everyone
    Balint Gyevnar* and Atoosa Kasirzadeh*
    Nature Machine Intelligence, Apr 2025
  2. People Attribute Purpose to Autonomous Vehicles When Explaining Their Behavior: Insights from Cognitive Science for Explainable AI
    In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, Yokohama, Japan, Apr 2025
  3. Causal Explanations for Sequential Decision-Making in Multi-Agent Systems
    In Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, Auckland, New Zealand, May 2024
  4. Bridging the Transparency Gap: What Can Explainable AI Learn From the AI Act?
    Balint Gyevnar, Nick Ferguson, and Burkhard Schafer
    In Proceedings of the 26th European Conference on Artificial Intelligence, Sep 2023