Do LLMs Reason?

There is a lot of excitement about Large Reasoning Models (LRMs) like OpenAI's o1 and DeepSeek's R1, but that excitement has also prompted detailed analysis and critique that seem to cast doubt on the extent to which language models can reason, and indeed on whether any actual reasoning is going on at all.

When reasoning is expressed in language – which everyone agrees is what language models might be doing – the structures that result are arguments. As we start to explore the 'R' in LRMs, we can turn to argumentation theory, an area of philosophy, for some starting points. When those philosophical and linguistic starting points lead to AI engineering, the result is argument technology.

The Centre for Argument Technology, founded over two decades ago, has been developing AI argumentation systems in domains ranging from broadcast media to geopolitics and from intelligence analysis to healthcare. These systems have gone on to support hundreds of thousands of users around the world.

At ACL 2025, the Centre is leveraging its unique datasets and software stacks to explore issues of reasoning in language models, with seven papers focusing on different aspects.

Reasoning in modern AI models is a lynchpin challenge for 2025, and we’re looking forward to chatting about the directions and burning questions in the area at ACL in Vienna.

The Papers

Monday 28 July • 11:00-12:30

CU-MAM: Coherence-Driven Unified Macro-Structures for Argument Mining

📍 Hall 4/5 • Poster Session

Presenter: Debela Gemechu

Uses large- and very-large-scale argumentation structures to model complex reasoning processes in natural language text.

Monday 28 July • 14:00-15:30

Lexical Recall or Logical Reasoning: Probing the Limits of Reasoning Abilities in Large Language Models

📍 Room 1.15-16 • Paper Presentation

Presenter: Henrike Beyer

Uses the domain of logic puzzles to quantify the limits of reasoning in language models and distinguish between memorization and reasoning.

Monday 28 July • 18:00-19:30

Natural Language Reasoning In Language Models: Analysis and Evaluation

📍 Hall 4/5 • Poster Session

Presenter: Debela Gemechu

Harnesses large natural language datasets of argumentation with masking to assess LLM performance on natural reasoning tasks.

Tuesday 29 July • 14:00-15:30

Mining Patterns of Complex Argumentative Reasoning in Natural Language Dialogue

📍 Room 1.62 • Paper Presentation

Presenter: Ramon Ruiz Dolz

Explores complex argumentative reasoning patterns in natural dialogue and their implications for conversational AI systems.

Wednesday 30 July • 11:00-12:30

The Open Argument Mining Framework

📍 Hall 5X • Demo Session

Presenter: Whole ARG-tech Team

Provides an open, extensible set of tools and libraries for processing argument in natural language. Live system demonstration.

Thursday 31 July • 16:00-17:15

Toward Reasonable Parrots: Why Large Language Models Should Argue with Us by Design

📍 Hall B • Argument Mining Workshop

Workshop presentation on re-framing LLMs as tools to exercise our critical thinking skills rather than replacing them.

Thursday 31 July • 16:00-17:15

Practical Solutions for Practical Problems in Deploying Argument Mining Systems

📍 Hall B • Argument Mining Workshop

Presenter: Debela Gemechu

Workshop presentation on practical challenges and solutions in deploying argument mining systems in real-world applications.

The Team