Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning
In recent years, general-purpose large language models (LLMs) such as GPT, Gemini, Claude, and DeepSeek have advanced at an unprecedented pace. Despite these achievements, their application to finance remains challenging due to fragmented data sources, opaque reasoning processes, and weak transferability to business applications. In response, we introduce Fin-R1, a reasoning LLM designed for financial scenarios. With a compact size of 7 billion parameters, Fin-R1 reduces deployment costs while addressing the aforementioned challenges. Its development follows a two-stage pipeline. First, we construct Fin-R1-Data, a high-quality financial dataset consisting of 60,091 chain-of-thought (CoT) samples, distilled and filtered from multiple authoritative benchmarks to ensure consistency and reliability. Second, we train Fin-R1 on Fin-R1-Data through supervised fine-tuning (SFT), followed by reinforcement learning (RL). This stage substantially improves the model's ability to solve complex financial reasoning tasks, yielding outputs that are both accurate and interpretable. Despite its relatively small parameter scale, Fin-R1 achieves competitive empirical performance across established financial benchmarks and demonstrates practical utility in compliance checking and robo-advisory. Our code is publicly available at https://github.com/SUFE-AIFLM-Lab/Fin-R1 and has already attracted over 700 stars.
💡 Research Summary
Fin‑R1 is a domain‑specific large language model (LLM) designed for financial reasoning, built with a modest 7 billion‑parameter backbone. The authors identify three core obstacles that impede the adoption of general‑purpose LLMs in finance: (1) fragmented and heterogeneous data sources that make knowledge integration difficult, (2) opaque “black‑box” reasoning that conflicts with regulatory demands for traceability, and (3) weak transferability and generalization across rapidly evolving financial scenarios. To address these challenges, they propose a two‑stage pipeline: data construction followed by model training.
Data Construction (Fin‑R1‑Data). The dataset comprises 60,091 high‑quality chain‑of‑thought (CoT) examples in both Chinese and English, organized into four thematic categories: advanced business knowledge, basic business knowledge, professional knowledge, and financial code. Raw material is drawn from open‑source corpora and proprietary examination problems. DeepSeek‑R1 generates candidate CoT outputs, which Qwen2.5‑72B‑Instruct then filters for logical consistency, coherence, and domain alignment. This two‑step automatic pipeline yields a bilingual, high‑fidelity CoT corpus that mitigates the scarcity of structured financial reasoning data.
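The generate-then-filter pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `generate_cot` and `judge` are hypothetical callables standing in for API calls to DeepSeek‑R1 (trace generation) and Qwen2.5‑72B‑Instruct (quality filtering), respectively.

```python
def build_cot_dataset(questions, generate_cot, judge):
    """Two-step CoT data pipeline sketch.

    A reasoning model produces a candidate chain-of-thought trace for
    each question; a judge model keeps only traces it rates as logically
    consistent and domain-aligned. Both callables are hypothetical
    stand-ins for the actual model APIs.
    """
    dataset = []
    for question in questions:
        trace = generate_cot(question)        # candidate CoT from the reasoning model
        if judge(question, trace):            # keep only traces the judge accepts
            dataset.append({"question": question, "cot": trace})
    return dataset
```

In practice the judge would score multiple quality dimensions (consistency, coherence, domain alignment) rather than return a single boolean, but the accept/reject structure is the same.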
Model Training. The training proceeds in two sequential phases. First, supervised fine‑tuning (SFT) on the Fin‑R1‑Data teaches the model to “think before answering,” explicitly exposing it to human‑readable reasoning steps. Second, the authors introduce Group‑Relative Policy Optimization (GRPO), a variant of reinforcement learning that avoids the costly value‑function learning required by traditional PPO. GRPO generates multiple candidate responses for each input, computes relative advantages within the group, and updates the policy accordingly. A format‑reward term is also incorporated to enforce that the final output retains a clear, step‑by‑step structure.
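The core of GRPO is that advantages are computed relative to a group of sampled responses rather than from a learned value function. A minimal sketch of that normalization, plus a hypothetical format-reward check (the exact output tags and reward values are assumptions, not taken from the paper):

```python
import re
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantage sketch: normalize each candidate's reward by
    the group's mean and standard deviation, so no separate value
    function is needed (unlike PPO)."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All candidates scored identically: no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]

def format_reward(response):
    """Hypothetical format-reward term: give credit only when the
    response keeps its reasoning inside <think>...</think> and its
    final answer inside <answer>...</answer>."""
    pattern = r"<think>.+?</think>\s*<answer>.+?</answer>"
    return 1.0 if re.fullmatch(pattern, response.strip(), re.DOTALL) else 0.0
```

With, say, four sampled responses per input, candidates scoring above the group mean receive positive advantages and push the policy toward their behavior, while below-average candidates are penalized; the format reward is simply added to the task reward before normalization.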
Empirical Results. On a suite of established financial reasoning benchmarks—including FinQA, among others—Fin‑R1 achieves an average score of 75.2, ranking second overall and outperforming other 7‑billion‑parameter models by more than 17 points. The model demonstrates strong performance in practical tasks such as compliance checking, robo‑advisory, and financial code generation, indicating that the reasoning capabilities translate into real‑world utility.
Strengths and Limitations. The paper’s primary contributions are (i) a systematic solution to data fragmentation through an automated, high‑quality CoT data pipeline, (ii) the integration of transparent reasoning paths directly into the training data, thereby reducing black‑box concerns, and (iii) the use of GRPO to achieve efficient reinforcement learning for reasoning‑intensive tasks. However, the modest model size may limit performance on highly complex quantitative problems (e.g., exotic derivative pricing). The sensitivity of GRPO to group size and candidate selection is not fully explored, and the work lacks a detailed analysis of latency, security, and compliance testing required for production deployment in regulated financial environments.
Conclusion and Future Work. Fin‑R1 showcases that a relatively small LLM, when equipped with a curated CoT dataset and a two‑stage SFT‑plus‑GRPO training regime, can attain competitive reasoning performance while maintaining interpretability. The public release of code and data promotes reproducibility and opens avenues for extensions such as multimodal financial data integration, domain‑specific prompt engineering, and hybrid training with larger models. Given its demonstrated capabilities in compliance verification, risk assessment, and advisory automation, Fin‑R1 represents a promising step toward practical, trustworthy AI assistants for the financial industry.