FinCoT: Grounding Chain-of-Thought in Expert Financial Reasoning
This paper presents FinCoT, a structured chain-of-thought (CoT) prompting framework that embeds domain-specific expert financial reasoning blueprints to guide large language models’ behaviors. We identify three main prompting styles in financial NLP (FinNLP): (1) standard prompting (zero-shot), (2) unstructured CoT (free-form reasoning), and (3) structured CoT (with explicitly structured reasoning steps). Prior work has mainly focused on the first two, while structured CoT remains underexplored and lacks domain expertise incorporation. Therefore, we evaluate all three prompting approaches across ten CFA-style financial domains and introduce FinCoT as the first structured finance-specific prompting approach incorporating blueprints from domain experts. FinCoT improves the accuracy of a general-purpose model, Qwen3-8B-Base, from 63.2% to 80.5%, and boosts Fin-R1 (7B), a finance-specific model, from 65.7% to 75.7%, while reducing output length by up to 8.9x and 1.16x compared to structured CoT methods, respectively. We find that FinCoT proves most effective for models lacking financial post-training. Our findings show that FinCoT not only improves performance and reduces inference costs but also yields more interpretable and expert-aligned reasoning traces.
💡 Research Summary
The paper introduces FinCoT, a novel prompting framework that grounds chain‑of‑thought (CoT) reasoning in expert‑crafted financial workflows. The authors first categorize existing prompting strategies in financial NLP into three groups: (1) Standard Prompting (SP), which simply presents the question to a language model in a zero‑shot fashion; (2) Unstructured CoT (UST‑CoT), which adds a generic “think step‑by‑step” cue but leaves the reasoning completely free‑form; and (3) Structured CoT (ST‑CoT), which imposes explicit tags that delimit the model’s reasoning steps and final answer.
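The three styles can be contrasted as prompt templates. This is an illustrative sketch only: the exact wording and tag names are assumptions, not the authors’ templates.

```python
# Hypothetical prompt templates for the three styles surveyed in the paper.
# The question text and tag names are illustrative assumptions.

QUESTION = "A bond's modified duration is 5. If yields rise 1%, the price change is closest to?"

# (1) Standard Prompting (SP): the bare question, zero-shot.
sp_prompt = QUESTION

# (2) Unstructured CoT (UST-CoT): a generic step-by-step cue, free-form reasoning.
ust_cot_prompt = f"{QUESTION}\nLet's think step by step."

# (3) Structured CoT (ST-CoT): explicit tags delimiting reasoning and answer.
st_cot_prompt = (
    f"{QUESTION}\n"
    "Respond using this structure:\n"
    "<think>...your reasoning...</think>\n"
    "<answer>...final choice...</answer>"
)

for name, prompt in [("SP", sp_prompt), ("UST-CoT", ust_cot_prompt), ("ST-CoT", st_cot_prompt)]:
    print(f"--- {name} ---\n{prompt}\n")
```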
FinCoT extends ST‑CoT by embedding domain‑specific expert blueprints directly into the prompt. These blueprints are expressed as Mermaid diagrams—a plain‑text graph description language that LLMs can read without additional parsing. Each blueprint encodes a canonical workflow for a particular CFA‑style domain (e.g., portfolio management, derivative pricing, macro‑economic analysis). The blueprint creation pipeline consists of five stages: (1) domain scoping and knowledge aggregation from textbooks, CFA curricula, and industry reports; (2) AI‑assisted retrieval and synthesis of raw material; (3) human‑expert validation to ensure conceptual correctness; (4) iterative refinement of the knowledge into step‑by‑step logical flows; and (5) translation of these flows into Mermaid syntax, which is then inserted as a “Hint” in the FinCoT prompt template. This human‑AI collaboration ensures that the blueprints are both comprehensive and accurate.
The FinCoT prompt template has several components, including a system message that frames the model as a CFA candidate and the domain‑specific blueprint inserted as a Mermaid‑syntax “Hint” alongside the question itself.
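A minimal sketch of how such a prompt might be assembled: the blueprint is a Mermaid flowchart embedded as plain text. The blueprint content, function name, and template wording below are illustrative assumptions, not the paper’s actual template.

```python
# Illustrative FinCoT-style prompt assembly. The Mermaid workflow below is a
# hypothetical fixed-income blueprint, not one of the authors' blueprints.

FIXED_INCOME_BLUEPRINT = """\
flowchart TD
    A[Identify the instrument and its cash flows] --> B[Determine the discount rate]
    B --> C[Compute present value of cash flows]
    C --> D[Check duration and convexity effects]
    D --> E[Select the closest answer choice]
"""

def build_fincot_prompt(question: str, blueprint: str) -> str:
    """Combine the system framing, Mermaid hint, and question into one prompt."""
    return (
        "System: You are a CFA candidate answering an exam question.\n\n"
        f"Hint (expert workflow, Mermaid syntax):\n{blueprint}\n"
        f"Question: {question}\n"
        "Follow the workflow above and state your final answer choice."
    )

prompt = build_fincot_prompt(
    "A 3-year bond with modified duration 2.7 ... the price change is closest to?",
    FIXED_INCOME_BLUEPRINT,
)
print(prompt)
```

Because Mermaid is plain text, no extra parsing machinery is needed on the model side; the workflow rides along inside the prompt string.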
Experiments are conducted on two families of models. General‑purpose models include Qwen3‑8B‑Base and Qwen2.5‑7B (both pretrained) as well as their instruction‑tuned variants. Finance‑specific models include Fin‑R1 (7B), DianJin‑R1 (7B), and Fin‑o1‑8B. All models are evaluated in a zero‑shot setting across 1,032 multiple‑choice questions drawn from the CFA‑Easy subset of the Flare‑CFA benchmark, covering ten financial domains. Four prompting strategies (SP, UST‑CoT, ST‑CoT, FinCoT) are compared. Accuracy and average output length (tokens) are reported; statistical significance is assessed via a paired bootstrap with 10,000 resamples.
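The paired-bootstrap test can be sketched as follows: resample questions with replacement, recompute the accuracy difference between two systems on each resample, and count how often the observed gain vanishes or reverses. The per-question correctness vectors here are synthetic; the real ones would come from scoring the 1,032 benchmark questions.

```python
# Sketch of a paired bootstrap significance test with 10,000 resamples.
# Correctness vectors are synthetic stand-ins, not the paper's data.
import random

def paired_bootstrap_p(correct_a, correct_b, n_resamples=10_000, seed=0):
    """One-sided p-value that system B's accuracy gain over A arose by chance."""
    rng = random.Random(seed)
    n = len(correct_a)
    null_count = 0
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # resample questions with replacement
        diff = sum(correct_b[i] - correct_a[i] for i in idx) / n
        if diff <= 0:  # resampled gain vanishes or reverses
            null_count += 1
    return null_count / n_resamples

# Synthetic example: system B is correct on a strict superset of A's questions.
a = [1] * 600 + [0] * 432
b = [1] * 780 + [0] * 252
p = paired_bootstrap_p(a, b)
print(f"observed gain: {(sum(b) - sum(a)) / len(a):.3f}, p = {p}")
```

Pairing matters here: because both systems are scored on the same questions, resampling question indices (rather than each system independently) preserves the per-question correlation between the two result vectors.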
Results show that FinCoT yields the largest gains for models that have not undergone finance‑specific post‑training. Qwen3‑8B‑Base improves from 63.2 % (SP) to 80.5 % (FinCoT), a +17.3 percentage‑point increase, while reducing token count by up to 8.9× compared with ST‑CoT. For the finance‑tuned Fin‑R1, accuracy rises from 65.7 % to 75.7 % (+10 pp) with a modest 1.16× token reduction. Gains are especially pronounced in quantitative domains such as derivative valuation and portfolio optimization. All improvements are statistically significant (p < 0.001). The authors also attribute part of these results to the semi‑reflection behavior that emerges inside FinCoT’s structured reasoning traces.
The paper discusses limitations: (1) blueprint creation requires expert time, which may be costly for new domains; (2) certain niche domains (e.g., ethics, regulatory compliance) lack ready blueprints, limiting FinCoT’s applicability there; (3) not all LLMs may fully understand Mermaid syntax, potentially introducing noise. Future work is suggested on automated blueprint generation, extending FinCoT to multi‑turn dialogues, and scaling the approach to multilingual or cross‑cultural financial contexts.
In summary, FinCoT demonstrates that injecting expert‑level, structured domain knowledge into prompts can dramatically improve both performance and efficiency of LLMs on financial reasoning tasks. By marrying the interpretability benefits of structured CoT with the precision of human‑crafted financial workflows, FinCoT offers a practical pathway to more reliable, cost‑effective AI assistants for finance professionals.