What Kind of Reasoning (if any) is an LLM actually doing? On the Stochastic Nature and Abductive Appearance of Large Language Models

Reading time: 5 minutes

📝 Original Info

  • Title: What Kind of Reasoning (if any) is an LLM actually doing? On the Stochastic Nature and Abductive Appearance of Large Language Models
  • ArXiv ID: 2512.10080
  • Date: 2025-12-10
  • Authors: Luciano Floridi, Jessica Morley, Claudio Novelli, David Watson

📝 Abstract

This article looks at how reasoning works in current Large Language Models (LLMs) that function using the token-completion method. It examines their stochastic nature and their similarity to human abductive reasoning. The argument is that these LLMs create text based on learned patterns rather than performing actual abductive reasoning. When their output seems abductive, this is largely because they are trained on human-generated texts that include reasoning structures. Examples are used to show how LLMs can produce plausible ideas, mimic commonsense reasoning, and give explanatory answers without being grounded in truth, semantics, verification, or understanding, and without performing any real abductive reasoning. This dual nature, where the models have a stochastic base but appear abductive in use, has important consequences for how LLMs are evaluated and applied. They can assist with generating ideas and supporting human thinking, but their outputs must be critically assessed because they cannot identify truth or verify their explanations. The article concludes by addressing five objections to these points, noting some limitations in the analysis, and offering an overall evaluation.

📄 Full Content

1 We add this clarification to avoid any confusion. At the time of writing, alternative approaches to language modelling are emerging, such as Byte-Level Models (see the Byte Latent Transformer (BLT) developed by researchers at Meta AI), Large Concept Models (LCMs), Diffusion Models, Neurosymbolic AI systems that integrate formal reasoning with neural networks, or Selective Language Modeling (SLM). Some of them are still based on next-token prediction during the generation phase. The Rho-1 model, for example, uses SLM, which improves data efficiency and performance on specific tasks like complex math problems, though the core inference mechanism remains a form of completion.
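
To make the "token-completion" mechanism referred to here concrete, the following is a minimal, hypothetical sketch of stochastic next-token sampling, not the implementation of any actual model: the toy vocabulary, probabilities, and temperature value are assumptions chosen purely for exposition.

```python
import random

# Toy next-token distribution: in a real LLM these probabilities would come from
# a softmax over the model's logits, conditioned on the entire preceding context.
toy_distribution = {
    "rained": 0.55,
    "sprinkler": 0.25,
    "flooded": 0.15,
    "sang": 0.05,
}

def sample_next_token(distribution, temperature=0.8):
    """Sample one continuation token; lower temperature sharpens the distribution."""
    tokens = list(distribution)
    # Raising probabilities to 1/temperature and renormalising is equivalent to
    # temperature-scaling the logits before the softmax.
    weights = [p ** (1.0 / temperature) for p in distribution.values()]
    total = sum(weights)
    return random.choices(tokens, weights=[w / total for w in weights], k=1)[0]

# Generation is just repeated sampling: no hypothesis is formed or verified;
# the model simply extends the text with a statistically plausible token.
prompt = "The lawn is wet because it"
print(prompt, sample_next_token(toy_distribution))
```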

2 In this article, we follow the common convention of using the GPT series to refer to the underlying core models behind ChatGPT, which represents OpenAI’s consumer-facing conversational AI products. Even if some of our points apply to the commercial products, we trust that the difference is clear enough to avoid confusion.

3 We analyse text-only LLMs. Multimodal models (text-image/audio/video) may share mechanisms but introduce additional factors (e.g., cross-modal alignment), which we leave for future work.

The appearance of abductive reasoning largely reflects reasoning structures present in the data used to train the models. The result is a compelling illusion of genuine and structured inferential reasoning.

We need to understand how stochastic processes can create outputs that resemble abductive inference, and what this implies for the broader relationship between statistical AI and human cognition and reasoning. In what follows, we explore this relationship. Section 2 defines abduction and inference to the best explanation (IBE), contrasting both with deduction and induction. Section 3 outlines statistical inference and stochastic processes and their connections to abduction. Section 4 explains LLMs’ operational logic as generative models of token distributions rather than symbolic reasoning. Section 5 examines why outputs appear similar to IBE, with examples and failures, e.g., hallucinations. Section 6 considers five objections, which come from two opposing perspectives: either the resemblance is superficial, or LLMs exhibit weak latent reasoning. The concluding section argues that LLMs have a stochastic core and an abductive appearance, with implications for safety and for formalising abduction. A final suggestion before we start: Sections 2 and 3 are meant to make the paper self-sufficient. Readers already familiar with abductive reasoning, inference to the best explanation, and their relationship to probabilistic reasoning may wish to skip directly to Section 4, where we turn to the specific analysis of LLMs.

Peirce coined the term “abduction” to describe inference from effect to hypothesised cause (Peirce 1934). In a classic example, coming home to find the lawn wet, you might abduce that it rained earlier. This is not certain (someone might have run a sprinkler), but it provides a plausible explanation. Abduction thus contrasts with deduction (which reasons forward, in this case from cause to effect with certainty) and with induction (which generalises from many wet-lawn observations to a potentially probabilistic rule). Harman (1965) later popularised the closely related idea of Inference to the Best Explanation (IBE): not only do we form explanatory hypotheses, but we also often choose the hypothesis that, if true, would best explain the evidence. 4 IBE can be understood as a form of abduction that adds a comparative evaluation step: multiple candidates are generated, then weighed by criteria such as simplicity, coherence with background knowledge, scope of explanation, and so on. The “best” explanation is then inferred as the most likely to be true. For example, if one finds footprints by the window and the laptop is missing, possible explanations might include “a burglary occurred” or “a friend borrowed it and left through the window.” A reasoner employing IBE would consider which explanation makes better sense of all facts (burglary might also better explain a broken lock, etc.) and tentatively accept that one.
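
Since IBE is described here as a comparative procedure, generating candidate explanations and then weighing them by explanatory virtues, a small sketch may help fix ideas. The candidate hypotheses, criteria, scores, and weights below are illustrative assumptions, not a claim about how such an evaluation is, or should be, formalised.

```python
# Illustrative sketch of the comparative step of IBE: candidate explanations are
# scored against explicit criteria and the best-scoring one is tentatively accepted.
# Hypotheses, criteria, scores, and weights are assumptions for illustration only.

candidates = {
    "a burglary occurred":
        {"simplicity": 0.7, "coherence": 0.8, "scope": 0.9},
    "a friend borrowed the laptop and left through the window":
        {"simplicity": 0.4, "coherence": 0.5, "scope": 0.6},
}

# "Scope of explanation" is weighted higher in this toy example.
weights = {"simplicity": 1.0, "coherence": 1.0, "scope": 1.5}

def ibe_score(criteria_scores):
    """Weighted sum over explanatory virtues (simplicity, coherence, scope)."""
    return sum(weights[c] * s for c, s in criteria_scores.items())

best = max(candidates, key=lambda h: ibe_score(candidates[h]))
print("Tentatively accepted explanation:", best)
# The acceptance remains defeasible: new evidence (e.g., a message from the friend)
# can change the scores and overturn the conclusion.
```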

Both abduction and IBE are defeasible types of inference: their conclusions can be wrong, even if the reasoning appears sensible, because new evidence or information can defeat or invalidate them. Imagine, for example, receiving a message from the friend apologising for borrowing the laptop: the burglary hypothesis is immediately defeated. As a result, abduction and IBE lack the truth-preserving guarantee of deduction, but they play an essential role in everyday and scientific reasoning.

In the analytic tradition, IBE is sometimes viewed as a standalone logical rule: infer H if, among competing hypotheses, H would provide the best explanation for evidence E if true. This can even be schematised: from A → B and observing B, infer A as a plausible hypothesis, though not necessarily certain. The logical form is A (hypothesis) implies B (observation); B is observed; therefore, A (tentatively). Clearly, this form is not truth-preserving (many As could imply B), but it is
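
For readers who prefer the schema written out, here is the same form in standard inference-rule notation (the deductive rule is included only for contrast; nothing beyond what the paragraph above states is assumed):

```latex
% Deduction (modus ponens): truth-preserving
\[
  \frac{A \rightarrow B \qquad A}{B}
\]
% Abduction (affirming the consequent as a hypothesis): plausible but defeasible
\[
  \frac{A \rightarrow B \qquad B}{A \ \text{(tentatively)}}
\]
```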

