From Natural Language to Executable Option Strategies via Large Language Models
Authors: Haochen Luo, Zhengzhao Lai, Junjie Xu
FROM NATURAL LANGUAGE TO EXECUTABLE OPTION STRATEGIES VIA LARGE LANGUAGE MODELS

Haochen Luo (1), Zhengzhao Lai (2), Junjie Xu (4), Yifan Li (1), Tang Pok Hin (1), Yuan Zhang (3), Chen Liu (1)*

(1) City University of Hong Kong  (2) The Chinese University of Hong Kong (Shenzhen)  (3) Shanghai University of Finance and Economics  (4) University of Science and Technology of China

* Corresponding author: chen.liu@cityu.edu.hk

ABSTRACT

Large Language Models (LLMs) excel at general code generation, yet translating natural-language trading intents into correct option strategies remains challenging. Real-world option design requires reasoning over massive, multi-dimensional option chain data with strict constraints, which often overwhelms direct generation methods. We introduce the Option Query Language (OQL), a domain-specific intermediate representation that abstracts option markets into high-level primitives under grammatical rules, enabling LLMs to function as reliable semantic parsers rather than free-form programmers. OQL queries are then validated and executed deterministically by an engine to instantiate executable strategies. We also present a new dataset for this task and demonstrate that our neuro-symbolic pipeline significantly improves execution accuracy and logical consistency over direct baselines.

1 INTRODUCTION

In recent years, Large Language Models (LLMs) have demonstrated remarkable capabilities in the financial domain. While extensive research has focused on equity markets, including stock price prediction (Chen & Kawashima, 2024; Koa et al., 2024), alpha mining (Tang et al., 2025; Wang et al., 2025; Shi et al., 2025b; Luo et al., 2026), and financial news sentiment analysis and trading (Araci, 2019; Liu et al., 2025; Xiao et al., 2025), the application of LLMs to financial derivatives, particularly options, remains largely unexplored.
Options are fundamental financial instruments, essential for both speculative leverage and sophisticated risk management (Vine, 2011). Existing machine learning studies in this field mainly focus on option pricing (Culkin & Das, 2017; De Spiegeleer et al., 2018; Ivașcu, 2021; Zalani, 2025) and hedging (Bańka & Chudziak, 2025). By contrast, designing and executing option trading strategies has been considered the exclusive domain of quantitative analysts using rigid, programmatic trading systems. In this setting, customers provide investment goals in Natural Language (NL) (e.g., "Find a delta-neutral Iron Condor on SPY with low implied volatility rank") and quantitative analysts use their expertise to devise trading strategies. The rapid development of intelligent trading systems, especially with the adoption of LLMs, enables the translation of vague natural-language intents into rigorous and structured trading logic, a key step toward automated option trading. However, this translation remains challenging. First, traditional Natural Language Understanding models are too limited to interpret complex financial concepts such as "delta-neutral" or "volatility skew." Second, although modern LLMs possess strong domain knowledge, directly applying them to raw option chain data is impractical: option chains contain thousands of contracts across strikes and expiries, resulting in high-dimensional inputs that exceed LLM context limits or incur high computational costs. Moreover, direct text-to-code generation often leads to hallucinations (Agarwal et al., 2024), such as producing invalid tickers or violating strict constraints, which is unacceptable in high-stakes financial settings. To bridge this gap, we propose the Option Query Language (OQL), a domain-specific intermediate representation for reliable interaction between LLMs and option markets.
Instead of requiring LLMs to process massive raw data or generate error-prone Python or SQL code, our framework treats the LLM as a semantic parser that converts natural-language queries into concise, syntactically constrained OQL instructions. These instructions are then deterministically executed by a dedicated compiler over the option chain. This neuro-symbolic design reduces context explosion by abstracting market data into high-level primitives and ensures logical validity through grammatical constraints. To the best of our knowledge, this work is the first systematic effort to enable Natural Language → Option Strategy translation using LLMs. Our main contributions are as follows: (1) We introduce OQL, a domain-specific query language that encodes complex derivatives logic (e.g., Greeks, multi-leg structures, and expiry/strike relations) into a token-efficient yet execution-rigorous format, allowing LLMs to generate reliable option strategies. (2) We present the first benchmark for this task, including a dataset of 200 diverse option trading instructions and an evaluation suite that enables fair comparison across LLMs and baseline methods. (3) We extend the Text-to-SQL paradigm to a financial setting by designing a customized, SQL-like language for option strategy search, demonstrating that structured querying substantially improves reliability and end-to-end performance.

2 RELATED WORK

2.1 LLMS IN FINANCE

The application of Large Language Models (LLMs) in finance has evolved rapidly from textual analysis to more agentic decision-making. Early works primarily relied on Pre-trained Language Models, such as FinBERT, for sentiment analysis on financial texts (Liu et al., 2021).
With the advent of generative models, research attention shifted toward finance-specific foundation models, including BloombergGPT (Wu et al., 2023) and FinGPT (Liu et al., 2023), enabling more complicated reasoning over financial data. Building on these models, a growing body of work explores LLMs for market prediction and various trading tasks. One major application is price and return prediction incorporating textual information such as financial news (Chen & Kawashima, 2024; Guo & Hauptmann, 2024; Wang et al., 2024; Koa et al., 2024). Related studies further investigate the capability of LLMs in searching for and discovering trading signals, particularly alpha factors (Li et al., 2024; Shi et al., 2025b; Luo et al., 2025). More recently, several works have proposed autonomous trading agents that use LLMs as "traders" to rebalance portfolios based on market sentiment and reasoning (Yu et al., 2024; Yang et al., 2025; Yu et al., 2025; Li et al., 2025b). To evaluate these capabilities, a number of benchmarks and datasets have been introduced, covering general financial knowledge (Xie et al., 2023; 2024; Nie et al., 2025), numerical reasoning (Chen et al., 2021), trading ability (Li et al., 2025b), signal mining (Anonymous, 2025), and more complex financial tasks (Zhang et al., 2025). However, most existing methods and benchmarks focus on equities. In contrast, LLMs' capability in strategy design and evaluation for trading financial derivatives, particularly options, remains largely underexplored.

2.2 LLMS FOR TEXT-TO-QUERY GENERATION

Translating natural-language user intent into executable query logic is commonly formulated as a semantic parsing problem, with Text-to-SQL as a representative task.
Benefiting from strong code-generation capabilities, LLMs have achieved strong performance in this setting, as shown by recent benchmarks across diverse domains (Hong et al., 2025; Lee et al., 2022; Gao et al., 2023; Zhang et al., 2024). Beyond standard SQL, recent studies further extend LLM-based query generation to specialized tasks, including query optimization (Tan et al., 2025) and edited or augmented SQL-like grammars for domain-specific logic such as multi-model query (Shi et al., 2025a), motivating our OQL framework for option strategy querying.

3 PRELIMINARIES

3.1 PROPERTIES OF OPTION PRODUCTS

An option is a derivative contract that grants the holder the right, but not the obligation, to buy or sell an underlying asset at a predetermined price (the strike price) on or before a specific date (the expiration date). [Footnote 1: We consider American options in this work. European options may only be exercised at expiry; our proposed method can handle them in a similar way.] These contracts are categorized as calls (the right to buy) or puts (the right to sell). The market price of an option reflects the market's expectation of future volatility and the risk-and-reward profile associated with the underlying asset.

Trading Action Space. In option trading, the fundamental action space consists of buy or sell decisions for both call and put options with different strikes and maturities. A single trading strategy can be expressed as a combination of such elementary actions:

    a_t = \{ (d_i, s_i, K_i, T_i, q_i, p_i) \}_{i=1}^{N},    (1)

where d_i ∈ {+1, −1} denotes the trade direction (+1 for buying and −1 for selling), s_i ∈ {call, put} the option type, K_i the strike price, T_i the expiry time, q_i the trading quantity, and p_i the option premium (i.e., the price paid or received when opening the position). The reward or payoff at maturity depends on the realized price S_T of the underlying asset.
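The action tuple in (1), and the strategy-level profit it induces at maturity, can be sketched directly. This is a minimal illustration: the class and field names are ours (not the paper's implementation), it anticipates the standard call/put terminal payoff defined in Section 3.2, and the sample premiums are made up.

```python
from dataclasses import dataclass

# One elementary action (d, s, K, T, q, p) from Eq. (1); field names
# are illustrative, not taken from the paper's implementation.
@dataclass
class OptionLeg:
    d: int        # +1 buy, -1 sell
    s: str        # "call" or "put"
    K: float      # strike price
    T: float      # expiry (here, years to maturity)
    q: int        # quantity
    p: float      # premium paid/received at open

def terminal_pnl(legs, S_T):
    """Strategy PnL at maturity: sum_i q_i * d_i * (payoff_i - p_i),
    with the standard payoffs max(S_T - K, 0) / max(K - S_T, 0)."""
    total = 0.0
    for leg in legs:
        payoff = max(S_T - leg.K, 0.0) if leg.s == "call" else max(leg.K - S_T, 0.0)
        total += leg.q * leg.d * (payoff - leg.p)
    return total

# Long straddle at K = 100 (sample premiums): one long call, one long put.
straddle = [OptionLeg(+1, "call", 100.0, 30 / 365, 1, 2.5),
            OptionLeg(+1, "put",  100.0, 30 / 365, 1, 2.3)]
```

Evaluating `terminal_pnl(straddle, S_T)` across a range of `S_T` values traces out the familiar V-shaped straddle payoff: profitable for large moves in either direction, losing the combined premium when the underlying stays near the strike.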
Option Pricing Model. The theoretical value of an option is commonly derived from stochastic models of the underlying asset price. The classical Black–Scholes–Merton (BSM) model (Black & Scholes, 1973) assumes that the price S_t of a non-dividend-paying underlying asset at time t follows a geometric Brownian motion

    dS_t = \mu S_t \, dt + \sigma S_t \, dW_t,

where μ is the drift, σ is the volatility, and W_t denotes a Wiener process. Under risk-neutral valuation, the prices of European call (C) and put (P) options are

    C = S_t N(d_+) - K e^{-r(T-t)} N(d_-)  and  P = K e^{-r(T-t)} N(-d_-) - S_t N(-d_+),

respectively, where V ∈ {C, P} denotes a generic option price and

    d_+ = \frac{\ln(S_t/K) + (r + \tfrac{1}{2}\sigma^2)(T-t)}{\sigma \sqrt{T-t}}, \qquad d_- = d_+ - \sigma \sqrt{T-t}.    (2)

Here K is the strike, T is the expiry date, r is the risk-free rate, and N(·) is the standard Gaussian cumulative distribution function (CDF). The sensitivities of V to these underlying factors are captured by the Greeks in Table 1. Because future realized volatility is unobservable, true Vega cannot be measured directly. In practice, option pricing relies on implied volatility (IV), inferred from market prices using pricing models (Hull & Basu, 2016). This creates a gap between theory and real-world execution, complicating robust option strategy design.

Table 1: Key option Greeks and their definitions

    Greek | Definition
    ------|-----------------------------------------------------------------------
    Delta | Sensitivity of the option price to changes in the underlying price, Δ = ∂V/∂S
    Gamma | Rate of change of Delta, Γ = ∂²V/∂S²
    Vega  | Sensitivity to volatility, ν = ∂V/∂σ
    Theta | Sensitivity to time decay, Θ = ∂V/∂t
    Rho   | Sensitivity to the risk-free interest rate, ρ = ∂V/∂r

3.2 OPTION STRATEGY AND PAYOFF

The terminal payoff of a single option (d_i, s_i, K_i, T_i, q_i, p_i) as defined in (1) is given by the equation below, where S_T is the underlying price at maturity T.
    P_i(S_T) = \begin{cases} \max(S_T - K_i, 0), & s_i = \text{call}, \\ \max(K_i - S_T, 0), & s_i = \text{put}. \end{cases}

Therefore, the total return of the strategy defined in (1) is the sum over legs net of the option premiums:

    \Pi(S_T) = \sum_{i=1}^{N} q_i \, d_i \left( P_i(S_T) - p_i \right).

An option strategy is a multi-leg position formed by combining several call/put contracts on the same (or closely related) underlying assets to achieve a target trading intent such as hedging, directional exposure, volatility trading, or income generation. Figure 1 summarizes our strategy universe as a hierarchical taxonomy grouped by three high-level intents: Directional, Volatility, and Income & Hedging (Hull & Basu, 2016). It also reflects structural complexity via the number of legs, ranging from single-leg positions to common spreads and multi-leg structures (e.g., butterflies and condors). The full list is provided in Appendix A.

3.3 REPRESENTATION OF OPTION CHAIN

In practical trading and modeling, option information is represented in a tabular option chain containing discrete contract-level attributes. At time t, the option chain C_t represents the collection of all tradable option contracts for a given underlying asset and is defined over strikes, maturities, and contract types:

    \mathcal{C}_t = \{ c_t(K, T, s) \mid K \in \mathcal{K},\ T \in \mathcal{T},\ s \in \{\text{call}, \text{put}\} \}.

Here, K denotes the strike price, T the time to expiry, and s the option type.
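A toy version of this chain can be sketched as a mapping keyed by (K, T, s), with the premium and Delta attributes filled in by a direct implementation of the BSM formulas above. The spot, strikes, expiries, rate, and volatility below are made-up sample values, not market data, and the schema is illustrative rather than the paper's.

```python
from math import log, sqrt, exp, erf

def _N(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bsm_price_delta(S, K, T, r, sigma, kind):
    """BSM price and Delta of a European option; T in years, no dividends."""
    d_plus = (log(S / K) + (r + 0.5 * sigma * sigma) * T) / (sigma * sqrt(T))
    d_minus = d_plus - sigma * sqrt(T)
    if kind == "call":
        return S * _N(d_plus) - K * exp(-r * T) * _N(d_minus), _N(d_plus)
    return K * exp(-r * T) * _N(-d_minus) - S * _N(-d_plus), _N(d_plus) - 1.0

# Toy chain C_t keyed by (strike K, years-to-expiry T, type s).
S0, r, sigma = 100.0, 0.02, 0.2
chain = {}
for K in (90.0, 100.0, 110.0):
    for T in (30 / 365, 60 / 365):
        for s in ("call", "put"):
            price, delta = bsm_price_delta(S0, K, T, r, sigma, s)
            chain[(K, T, s)] = {"premium": price, "delta": delta}
```

Even this tiny grid yields 12 contracts; real chains span far more strikes and expiries per underlying, which is exactly the combinatorial blow-up the text describes.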
Each option contract c_t(K, T, s) is associated with a set of market-observed and model-derived attributes:

    c_t(K, T, s) = \big( p_t(K, T, s),\ v_t(K, T, s),\ \Delta_t(K, T, s),\ \Gamma_t(K, T, s),\ \nu_t(K, T, s),\ \Theta_t(K, T, s) \big),    (3)

where p_t and v_t denote the option premium and trading volume, respectively, and (Δ_t, Γ_t, ν_t, Θ_t) are the option Greeks defined in Table 1, computed using the Black–Scholes–Merton (BSM) model. This representation provides a structured snapshot of the market state across strike, maturity, and contract type. Due to the combinatorial expansion of the triplet (K, T, s), the option chain at each time step is inherently high-dimensional, posing a significant challenge for direct processing by LLMs.

[Figure 1: Taxonomy of option strategies in trading. Directional strategies: Bullish (Long Call, Bull Spread) and Bearish (Long Put, Bear Spread). Volatility strategies: Long Volatility / Breakout (Straddle, Strangle) and Short Volatility / Range-bound (Iron Condor, Butterfly). Income & Hedging: Yield Generation (Covered Call, Cash-Secured Put) and Protection (Collar, Protective Put).]

4 METHODOLOGY

We propose the Option Query Language (OQL) to bridge the gap between complex, flexible natural-language intent and financial execution, which requires precision. As illustrated in Figure 2, our system operates via a two-stage process: (1) Semantic Parsing, where an LLM translates a user's natural-language intent x into a structured OQL query z; and (2) Deterministic Execution, where a specialized engine validates and executes z against massive market data to produce the final trading strategy y. This explicit separation allows linguistic reasoning and financial constraints to be handled independently, improving robustness, interpretability, and execution reliability for complex option strategies.
4.1 PROBLEM FORMULATION

Let D represent the state space of the option market, containing real-time data for underlying assets, option chains, and derived risk metrics (the "Greeks"). The goal of option strategy search is to map a natural-language instruction x (e.g., "Find a delta-neutral iron condor on NVDA...") to an executable subset of contracts y ⊂ D satisfying specific logical constraints. Directly modeling the probability P(y | x, D) with a monolithic LLM is intractable due to the high dimensionality of D and the necessity for logical precision. Consequently, we introduce OQL as a latent intermediate representation z and decompose the problem into:

    P(y \mid x, D) = \sum_{z} P_\theta(z \mid x) \cdot P_\phi(y \mid z, D)    (4)

Here, P_θ(z | x) represents the semantic parser based on an LLM parameterized by θ, and P_φ(y | z, D) is the deterministic compiler parameterized by φ. This decoupling confines the LLM's probabilistic nature to intent parsing, while ensuring the financial execution remains verifiable and logical.

4.2 OPTION QUERY LANGUAGE (OQL)

OQL is a declarative domain-specific language designed to represent option strategies as structured symbolic queries. Rather than enumerating procedural steps, OQL specifies structural and quantitative constraints over strategy components.

4.2.1 PRINCIPLE 1: ROLE-BASED ABSTRACTION

Each feasible option strategy satisfying the user's intent s ∈ S is associated with a fixed role schema R(s) = {r_1, r_2, ..., r_k}, where each role r_i corresponds to a semantically distinct leg in the strategy (e.g., Short Call and Long Put for a Risk Reversal). A valid strategy instance y must satisfy a one-to-one assignment between roles and option contracts: y = {(r_i, c_i) | r_i ∈ R(s), c_i ∈ D}.
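A role schema and the one-to-one assignment check can be sketched in a few lines. The schema table, role names (LC/LP/SC/SP for long/short call/put), and contract identifiers below are hypothetical illustrations, not OQL's actual definitions.

```python
# Hypothetical role schemas R(s): each strategy family s maps to a fixed
# set of semantically distinct legs. Contents are illustrative only.
ROLE_SCHEMAS = {
    "STRADDLE":    ("LC", "LP"),
    "IRON_CONDOR": ("SC", "LC", "SP", "LP"),
}

def is_valid_instance(strategy, assignment):
    """A valid instance assigns exactly one contract to each role of R(s)."""
    roles = ROLE_SCHEMAS.get(strategy)
    return roles is not None and set(assignment) == set(roles)

# A call plus a put fills a straddle; two calls cannot (role "LP" unfilled).
ok  = is_valid_instance("STRADDLE", {"LC": "C100", "LP": "P100"})
bad = is_valid_instance("STRADDLE", {"LC": "C100", "SC": "C105"})
```

Rejecting the two-call assignment mirrors the straddle example discussed in the text.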
This design enforces structural validity by construction and prevents semantically invalid combinations, such as assigning two calls to a straddle strategy. Moreover, role-level abstraction enables fine-grained constraint specification on individual legs. The details are shown in Table 5.

4.2.2 PRINCIPLE 2: SCOPED FILTERING

OQL distinguishes between constraints applied at different semantic scopes. Leg-level constraints are expressed in the WHERE clause and operate on individual option contracts prior to strategy assembly. Formally, for each role r, a candidate set is defined as C_r = {c ∈ D | ψ_r(c) = true}, where ψ_r denotes role-specific predicates (e.g., moneyness, delta, time-to-expiry). Strategy-level constraints are expressed in the HAVING clause and are applied after assembling candidate strategies. These constraints operate on aggregated properties: Ψ(y) = true, where Ψ may involve net Greeks, maximum loss, or reward-to-risk ratios.

4.2.3 PRINCIPLE 3: SEMANTIC SOFT-MATCHING

Natural-language intents often specify approximate numerical conditions. To bridge linguistic ambiguity and strict database filtering, OQL introduces an approximate matching operator ~. Given a numerical attribute a(c) and a target value τ, the condition a(c) ~ τ is interpreted as |a(c) − τ| ≤ ε · τ, where ε is a predefined tolerance. This operator relaxes hard constraints and improves robustness by reducing empty-result failures during search.
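The soft-matching operator reduces to a one-line predicate. A sketch, assuming a 10% tolerance for ε (the paper says ε is predefined but does not fix its value here):

```python
def approx_match(value, target, eps=0.1):
    """OQL's '~' operator: |a(c) - tau| <= eps * tau.
    eps = 0.1 (10% of the target) is an assumed tolerance."""
    return abs(value - target) <= eps * abs(target)

# 'delta ~ 0.15' accepts contracts whose delta lies within 10% of 0.15;
# 'dte ~ 45' accepts expiries within 4.5 days of the 45-day target.
```

Because the tolerance scales with the target, a condition like `dte ~ 45` is far more forgiving in absolute terms than `delta ~ 0.15`, which matches how traders think about these quantities.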
[Figure 2, top panel: the OQL pipeline. Stage 1 prepares the trading intent, which may come directly from a human trader or from another intelligent agent, together with market information. Example intent: "Construct a delta-neutral income strategy on SPY for the next monthly cycle (~45 days). I believe the current high volatility (IV Rank > 50) is overstated and price will stay range-bound. I want to collect premium by selling the 15 Delta wings on both sides, while buying further OTM legs to define my maximum risk. Optimize the structure for Return on Risk." Stage 2 generates the OQL query via the OQL agent (equipped with knowledge of the OQL definition, design, and grammar), reasoning in steps: (1) identify "SPY" as the target asset and apply a high-volatility filter based on the "IV Rank > 50" condition; (2) infer the "Iron Condor" strategy type by combining the intent to "collect premium" with the requirement for "defined risk"; (3) map the "next monthly cycle" temporal constraint to a 45-day expiry target and "selling 15 Delta wings" to the short-leg selection criteria; (4) translate the "delta-neutral" requirement into a strict NetDelta validation rule and set the optimization objective to "Return on Risk". The generated OQL:

    SELECT IRON_CONDOR FROM SPY
    WHERE SC.delta ~ 0.15          -- Sell Call Wing at ~15 Delta
      AND SP.delta ~ -0.15         -- Sell Put Wing at ~-15 Delta
      AND SC.dte ~ 45              -- Target ~45 Days to Expiry
      AND SP.dte ~ 45
      AND LC.Moneyness = OTM       -- Buy Protection (Call side)
      AND LP.Moneyness = OTM       -- Buy Protection (Put side)
      AND abs(NET_DELTA) < 0.1     -- Risk: Delta Neutral
    ORDER BY return_on_risk DESC   -- Goal: Optimize Efficiency
    LIMIT 1

Stage 3 executes the query with the OQL engine: the input query z is run over option chain data, assembling legs (e.g., call legs with Delta 0.52 and 0.35, put legs with Delta -0.25 and -0.55) into candidate strategies.]
[Figure 2, bottom panel: leg assembly and strategy filtering, e.g. assembled straddle candidates with rr_ratio 0.5 / 1.5 / 2.5 filtered down to a ranked set of strategy candidates.]

Figure 2: This figure illustrates the complete workflow of the Options Query Language (OQL) system, from intent to executable option strategies. Top: the OQL pipeline collects trading intent from human users or intelligent agents, translates high-level intent into formal OQL queries, and executes them to retrieve candidate strategies. Bottom: the deterministic OQL compiler P_φ processes each query through parsing and semantic validation, vectorized filtering over option-chain data, and combinatorial leg assembly with aggregate constraints, producing a ranked set of valid option strategies. The full OQL grammar, formal definitions, and backend are provided in Appendix B.

4.3 NEURO-SYMBOLIC EXECUTION FLOW

As shown in Figure 2, given an OQL query z, the execution engine evaluates P_φ(y | z, D) through a fully deterministic pipeline. Conceptually, this process can be viewed as constraint parsing and resolution, followed by backend-executable query construction, where the high-level OQL specification is translated into concrete operations over option-chain data. Specifically: (1) the query is first parsed into an abstract syntax tree (AST), and structural constraints induced by the role schema R(s) are verified to ensure semantic consistency; (2) leg-level predicates are applied to D to obtain role-specific candidate sets {C_r}, implemented via vectorized filtering over the option chain; (3) candidate strategies are constructed through Cartesian products over {C_r}, followed by strategy-level constraint evaluation, where only strategies satisfying all HAVING predicates are retained; and (4) the parsed constraints are translated into executable backend queries and executed to produce the final strategy set.
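Steps (2) and (3) can be sketched as follows, assuming a toy four-contract chain and illustrative role predicates; the real engine uses vectorized filtering over the full chain rather than Python loops, and the net-delta threshold here is made up.

```python
from itertools import product

# Toy chain rows and WHERE-scope role predicates (all illustrative).
chain = [
    {"id": "C100", "kind": "call", "delta": 0.52},
    {"id": "C105", "kind": "call", "delta": 0.35},
    {"id": "P100", "kind": "put",  "delta": -0.55},
    {"id": "P95",  "kind": "put",  "delta": -0.20},
]
predicates = {
    "LC": lambda c: c["kind"] == "call",
    "LP": lambda c: c["kind"] == "put",
}

def assemble(chain, predicates, having):
    # Step (2): leg-level filtering into role-specific candidate sets C_r.
    candidates = {r: [c for c in chain if pred(c)]
                  for r, pred in predicates.items()}
    roles = list(candidates)
    # Step (3): Cartesian product over C_r, then HAVING-scope evaluation.
    out = []
    for combo in product(*(candidates[r] for r in roles)):
        y = dict(zip(roles, combo))
        if having(y):
            out.append(y)
    return out

# Retain only near-delta-neutral call/put pairs (HAVING-style net Greek).
neutral = assemble(chain, predicates,
                   having=lambda y: abs(y["LC"]["delta"] + y["LP"]["delta"]) < 0.1)
```

Of the four possible call/put pairings, only the 0.52-delta call against the -0.55-delta put survives the net-delta check, illustrating how strategy-level constraints prune the Cartesian product.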
Implementation details are provided in Appendix B.

5 EXPERIMENTS

In this section, we conduct extensive experiments to address the following three research questions. RQ1: Can LLMs, under the OQL-based interaction paradigm, successfully search for executable option strategies (i.e., generate valid OQL queries that return non-empty and correct strategies)? RQ2: How capable are different Large Language Models (LLMs) at generating OQL queries, and how do they compare in terms of validity, accuracy, and generation quality? RQ3: Do option strategies derived from OQL queries provide practical value, including outperforming baselines that give LLMs raw option data directly, and how do these searched strategies perform in backtests?

5.1 DATASET CONSTRUCTION

To the best of our knowledge, natural-language-to-option-strategy retrieval is a relatively new task and lacks a standardized benchmark. We therefore introduce a new dataset to support this problem setting. To mitigate look-ahead bias (i.e., the "time-travel" effect where an LLM may exploit latent knowledge from pre-training) (Golchin & Surdeanu, 2024; Li et al., 2025a), we strictly restrict all market observations to 2025. We select a diverse set of underlying assets (SPY, NVDA, AAPL, GOOG, and TSLA) which exhibit distinct trend patterns and volatility regimes over the year, enabling coverage of heterogeneous market conditions. We further partition each underlying's 2025 price trajectory into labeled regions characterized by different movement styles (e.g., trending, reversing, range-bound, high-volatility). Both region labeling and strategy-type annotation are curated by human domain experts, who assign the most suitable strategy family for each region to ensure domain correctness.
At the start of each labeled region, we write a natural-language trading intent that describes the market context and trading objective, while simulating different trader proficiency levels to reflect realistic query styles. We introduce the details and provide examples in Appendix C.

5.2 EVALUATION METRICS

We evaluate OQL from two complementary perspectives: query-level performance and strategy-level performance. Query-level metrics measure an LLM's ability to generate valid, accurate, and semantically faithful OQL queries from natural-language intent, while strategy-level metrics assess the quality of the retrieved option strategies and whether their backtesting outcomes align with the intended trading objectives. Detailed definitions and computation protocols are provided in Appendix D.1.

Query Quality: To address RQ1 and RQ2, we adopt a hierarchical evaluation of query quality that captures progressively stronger notions of correctness. At the most basic level, Validity Rate (VR) measures whether generated OQL queries are syntactically well-formed and executable, i.e., they can be successfully parsed and return at least one candidate strategy. Beyond syntactic validity, Strategy Match (SM) evaluates whether the query selects the correct option strategy family (e.g., spreads, condors) consistent with the user's stated intent. Finally, Semantic Accuracy (SA) assesses whether the query constraints faithfully encode the key conditions expressed in the intent (such as strikes, days-to-expiration, or Greek exposure) without omitting critical requirements or introducing unintended ones.

Strategy Quality: For strategy-level evaluation, we backtest the option strategies returned by each executable query and assess their performance from profitability and risk perspectives.
Strategy effectiveness is measured by the Win Rate (WR), defined as the proportion of strategies that achieve positive end-of-period profit and loss (PnL). Risk exposure is captured by whether a strategy triggers a margin call during the backtest, reflecting its vulnerability to extreme downside scenarios. Profitability is further quantified using both the Average Profit, defined as the mean terminal PnL across strategies, and the Return on Cost (ROC), computed as ROC = PnL_end / |cost_0|, which normalizes returns by the initial capital commitment.

5.3 EXPERIMENT SETTINGS

Baseline settings: We design three baselines: Free-Form Leg Generation (FFLG), Partial-Chain Grounded (PCG), and Text-to-SQL. FFLG directly prompts the LLM to generate option legs (e.g., expiry, strike, call/put, long/short, and position size) purely from natural-language intent, without access to option-chain evidence. PCG instead provides a partial option chain as structured context, grounding the generated legs in observed market data. The Text-to-SQL baseline translates user intent into SQL queries over predefined option-chain tables or views, relying on fixed schemas and handcrafted aggregations rather than compositional strategy reasoning. Detailed designs and implementation details of all baselines are provided in Appendix D.2.

Model settings: We evaluate a range of large language models, including commercial models such as Gemini-2.5-Flash, DeepSeek-V3 (DeepSeek-AI et al., 2025), GPT-4.1, and GPT-4.1-Mini (OpenAI et al., 2024). We also include smaller open-weight models, such as LLaMA-3.1-8B and Qwen-3 (8B and 4B) (Yang et al., 2025), as well as coder-focused variants from DeepSeek and Qwen. All models use default temperature settings, and we additionally test our method under Chain-of-Thought prompting (Wei et al., 2022).
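Before turning to the results, the two headline strategy-level metrics from Section 5.2 reduce to a few lines. The sample PnL and cost values below are made up for illustration; only the metric definitions come from the text.

```python
# Win Rate (WR): proportion of strategies with positive end-of-period PnL.
def win_rate(pnls):
    return sum(1 for p in pnls if p > 0) / len(pnls)

# Return on Cost: ROC = PnL_end / |cost_0|; the absolute value means the
# normalization also works for credit positions (negative initial cost).
def return_on_cost(pnl_end, cost_0):
    return pnl_end / abs(cost_0)

pnls = [120.0, -40.0, 35.0, -10.0]      # fabricated sample outcomes
wr = win_rate(pnls)                     # 2 of 4 strategies are profitable
roc = return_on_cost(120.0, -480.0)     # a credit position earning $120
```

Taking the absolute value of the initial cost is the important detail: premium-selling strategies open with a net credit, so dividing by the signed cost would flip the sign of their returns.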
5.4 RESULTS AND ANALYSIS

For OQL capability and model specialization, Tables 2 and 3 demonstrate that OQL effectively bridges natural-language intent with executable financial logic. All large models achieved Validity Rates (VR) exceeding 0.870, confirming the framework's robustness. A key finding is the efficiency of specialized coding models: notably, the smaller DeepSeek-Coder-6.7B outperforms the larger GPT-4.1-Mini in both profitability and win rate, suggesting that domain-specific syntax reasoning is more critical than pure parameter size for this task. Furthermore, we observe a trade-off between aggression and stability: while Gemini-2.5-Flash maximizes total Profit and ROC, DeepSeek-Chat offers the most risk-averse profile with the highest Win Rate and lowest tail risk.

Table 2: Comparison of models on query quality. Arrows indicate optimization direction.

    Size   | Model               | VR ↑  | SM ↑  | SA ↑
    -------|---------------------|-------|-------|------
    Large  | DeepSeek-V3         | 0.870 | 0.822 | 0.664
           | Gemini-2.5-Flash    | 0.875 | 0.743 | 0.606
           | GPT-4.1             | 0.935 | 0.770 | 0.698
           | GPT-4.1-Mini        | 0.950 | 0.721 | 0.605
    Small  | LLaMA-3.1-8B        | 0.920 | 0.582 | 0.432
           | Qwen2.5-Coder-7B    | 0.760 | 0.763 | 0.553
           | DeepSeek-Coder-6.7B | 0.660 | 0.659 | 0.545
           | Qwen3-4B            | 0.715 | 0.671 | 0.476
           | Qwen3-8B            | 0.780 | 0.808 | 0.593

Table 4 highlights that OQL consistently outperforms unstructured baselines (FFLG, PCG) and standard Text-to-SQL approaches. The primary advantage of OQL lies in risk management and reliability. By enforcing a structured intermediate representation, OQL significantly reduces dangerous hallucinations common in raw SQL generation. For instance, DeepSeek-Chat using OQL reduces the buyer-side Risk@90 to 18.6% (compared to 46.1% with SQL) while achieving the highest overall Win Rate (60.9%). This confirms that OQL's constrained search space allows models to reason more effectively about financial constraints, producing consistent alpha rather than the high-variance, high-risk outliers observed in PCG methods.
Efficiency analysis in Table 9 reveals that OQL strikes an optimal balance between token consumption and retrieval validity. Unlike the PCG approach, which incurs prohibitive token costs for low retrieval yields, OQL maintains moderate token usage while achieving a dominant cache hit rate of 88.5%. This makes it the most cost-effective framework for high-fidelity strategy retrieval. The asset-level analysis in Appendix E reveals that OQL enables models to adapt their behavior to different market conditions rather than producing uniform or rigid strategies. Some models exhibit stronger performance on volatile, trend-driven assets, while others show greater stability on index-like or mature underlyings. Importantly, OQL exposes these differences without amplifying failure modes, indicating that the framework effectively translates each model's latent financial reasoning into executable strategies instead of constraining them to superficial syntactic patterns.

Case studies in Appendix F illustrate OQL's proficiency in mapping high-level user intent to structurally appropriate option strategies. In hedging scenarios, OQL consistently generates accurate inverse exposures, providing robust protection during market stress. For income-oriented objectives, the generated spread strategies exhibit stable tracking and enhanced yields. OQL effectively aligns semantic intent with market structure and option-chain constraints.

Table 3: Downstream strategy performance rearranged by metrics. We compare the average over all executed strategies (All) and the best strategy per case (Top) for each metric.
    Model (All / Top)   | WR ↑          | RE@50 ↓       | RE@90 ↓       | Profit ↑          | ROC ↑
    --------------------|---------------|---------------|---------------|-------------------|----------------
    Large:
    DeepSeek-V3         | 0.580 / 0.609 | 0.314 / 0.316 | 0.180 / 0.195 | 368.143 / 359.776 | 0.358 / 0.418
    Gemini-2.5-Flash    | 0.558 / 0.552 | 0.361 / 0.374 | 0.191 / 0.224 | 418.192 / 331.914 | 0.270 / 0.729
    GPT-4.1             | 0.486 / 0.475 | 0.469 / 0.475 | 0.303 / 0.339 | 272.727 / 261.203 | 0.051 / 0.124
    GPT-4.1-Mini        | 0.476 / 0.489 | 0.457 / 0.452 | 0.306 / 0.306 | 211.039 / 172.547 | 0.264 / 0.554
    Small:
    LLaMA-3.1-8B        | 0.420 / 0.416 | 0.359 / 0.371 | 0.190 / 0.208 | 12.430 / 2.734    | -0.018 / 0.118
    Qwen2.5-Coder-7B    | 0.476 / 0.464 | 0.361 / 0.371 | 0.208 / 0.219 | 136.469 / 151.447 | 0.178 / 0.122
    DeepSeek-Coder-6.7B | 0.503 / 0.504 | 0.401 / 0.405 | 0.239 / 0.275 | 305.907 / 301.386 | 0.217 / 0.257
    Qwen3-4B            | 0.410 / 0.406 | 0.400 / 0.420 | 0.179 / 0.217 | 118.627 / 140.538 | -0.004 / -0.000
    Qwen3-8B            | 0.483 / 0.546 | 0.468 / 0.421 | 0.284 / 0.270 | 146.626 / 174.551 | 0.197 / 0.403

Table 4: Performance comparison of different strategy generation methods across base LLMs. Metrics are reported in percentage (%) where applicable. Best results per base model are bolded.

    Base Model       | Method         | Win Rate (%)            | Risk@90 (%)           | Profitability
                     |                | Overall  Buyer  Seller  | Buyer  Seller  Wgt.   | RoC     Profit
    -----------------|----------------|-------------------------|-----------------------|----------------
    DeepSeek-V3      | FFLG           | 54.4     45.0   77.6    | 24.2   24.5    24.3   | 0.279   76.9
                     | PCG            | 44.4     42.3   52.5    | 26.8   25.0    26.5   | 0.064   92.2
                     | PCG-Full       | 48.9     43.5   67.4    | 24.5   23.3    24.2   | 0.258   117.8
                     | SQL            | 52.8     43.4   61.2    | 46.1   5.9     24.8   | 1.364   195.2
                     | OQL (Ours)     | 60.9     58.5   66.1    | 18.6   21.4    19.5   | 0.418   359.8
                     | OQL-CoT (Ours) | 60.2     51.3   76.2    | 26.5   20.6    24.4   | 0.282   343.5
    Gemini-2.5-Flash | FFLG           | 57.8     45.5   83.1    | 25.6   22.0    24.4   | 0.308   109.5
                     | PCG            | 61.5     49.6   79.7    | 28.1   25.3    27.0   | 0.474   189.5
                     | PCG-Full       | 59.8     43.8   88.7    | 24.2   23.9    24.1   | 1.752   161.9
                     | SQL            | 52.0     43.6   67.7    | 36.8   3.2     25.1   | -0.024  276.2
                     | OQL (Ours)     | 55.2     51.3   81.8    | 25.7   0.0     22.4   | 0.729   331.9
                     | OQL-CoT (Ours) | 62.6     57.6   78.6    | 21.2   9.5     18.4   | 0.822   449.2

6 CONCLUSION

In conclusion, this work introduces a neuro-symbolic pipeline for option strategy search that translates natural-language trading intents into executable and verifiable strategies through an intermediate language, OQL.
By decoupling semantic parsing (LLM → OQL) from deterministic execution (OQL engine → strategy set), our approach improves reliability when handling complex derivatives logic and enables large-scale evaluation across both query quality and strategy quality. We further present, to our knowledge, the first study adapting the Text-to-SQL paradigm to a customized financial domain, where the "database" corresponds to a massive option-chain space and the "query result" is a structured strategy set. Our empirical results show that semantic accuracy is more predictive of downstream strategy performance than generation success alone, highlighting the importance of faithful constraint grounding for practical option strategy search.

Future work will extend OQL beyond predefined strategy templates toward free-form leg design, enabling more flexible and expressive option constructions. We also plan to support strategy queries conditioned on existing portfolio holdings, allowing the system to adaptively generate strategies based on a user's current positions. In addition, we will expand the strategy taxonomy and OQL operators, and incorporate more realistic trading frictions and risk controls to further improve practical applicability.

REFERENCES

Vibhor Agarwal, Yulong Pei, Salwa Alamir, and Xiaomo Liu. CodeMirage: Hallucinations in code generated by large language models. arXiv preprint arXiv:2408.08333, 2024.

Anonymous. AlphaBench: Benchmarking large language models in formulaic alpha factor mining. In Submitted to The Fourteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=d97Q8r7ZKZ. Under review.

Dogu Araci. FinBERT: Financial sentiment analysis with pre-trained language models. arXiv preprint arXiv:1908.10063, 2019.

Feliks Bańka and Jarosław A. Chudziak. DeltaHedge: A multi-agent framework for portfolio options optimization. arXiv preprint, 2025.

Fischer Black and Myron Scholes.
The pricing of options and corporate liabilities. Journal of Political Economy, 81(3):637–654, 1973.

Qizhao Chen and Hiroaki Kawashima. Stock price prediction using LLM-based sentiment analysis. In 2024 IEEE International Conference on Big Data (BigData), pp. 4846–4853. IEEE, 2024.

Zhiyu Chen, Wenhu Chen, Charese Smiley, Sameena Shah, Iana Borova, Dylan Langdon, Reema Moussa, Matt Beane, Ting-Hao Huang, Bryan R. Routledge, et al. FinQA: A dataset of numerical reasoning over financial data. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 3697–3711, 2021.

Robert Culkin and Sanjiv R. Das. Machine learning in finance: the case of deep learning for option pricing. Journal of Investment Management, 15(4):92–100, 2017.

Jan De Spiegeleer, Dilip B. Madan, Sofie Reyners, and Wim Schoutens. Machine learning for quantitative finance: fast derivative pricing, hedging and fitting. Quantitative Finance, 18(10):1635–1643, 2018.

DeepSeek-AI, Aixin Liu, Bei Feng, Bing Xue, Bingxuan Wang, Bochao Wu, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fucong Dai, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Han Bao, Hanwei Xu, Haocheng Wang, Haowei Zhang, Honghui Ding, Huajian Xin, Huazuo Gao, Hui Li, Hui Qu, J. L. Cai, Jian Liang, Jianzhong Guo, Jiaqi Ni, Jiashi Li, Jiawei Wang, Jin Chen, Jingchang Chen, Jingyang Yuan, Junjie Qiu, Junlong Li, Junxiao Song, Kai Dong, Kai Hu, Kaige Gao, Kang Guan, Kexin Huang, Kuai Yu, Lean Wang, Lecong Zhang, Lei Xu, Leyi Xia, Liang Zhao, Litong Wang, Liyue Zhang, Meng Li, Miaojun Wang, Mingchuan Zhang, Minghua Zhang, Minghui Tang, Mingming Li, Ning Tian, Panpan Huang, Peiyi Wang, Peng Zhang, Qiancheng Wang, Qihao Zhu, Qinyu Chen, Qiushi Du, R. J. Chen, R. L. Jin, Ruiqi Ge, Ruisong Zhang, Ruizhe Pan, Runji Wang, Runxin Xu, Ruoyu Zhang, Ruyi Chen, S. S.
Li, Shanghao Lu, Shangyan Zhou, Shanhuang Chen, Shaoqing Wu, Shengfeng Ye, Shengfeng Ye, Shirong Ma, Shiyu Wang, Shuang Zhou, Shuiping Yu, Shunfeng Zhou, Shuting Pan, T. Wang, Tao Yun, Tian Pei, Tianyu Sun, W. L. Xiao, Wangding Zeng, Wanjia Zhao, Wei An, Wen Liu, Wenfeng Liang, Wenjun Gao, Wenqin Yu, Wentao Zhang, X. Q. Li, Xiangyue Jin, Xianzu Wang, Xiao Bi, Xiaodong Liu, Xiaohan Wang, Xiaojin Shen, Xiaokang Chen, Xiaokang Zhang, Xiaosha Chen, Xiaotao Nie, Xiaowen Sun, Xiaoxiang Wang, Xin Cheng, Xin Liu, Xin Xie, Xingchao Liu, Xingkai Yu, Xinnan Song, Xinxia Shan, Xinyi Zhou, Xinyu Yang, Xinyuan Li, Xuecheng Su, Xuheng Lin, Y. K. Li, Y. Q. Wang, Y. X. Wei, Y. X. Zhu, Yang Zhang, Yanhong Xu, Yanhong Xu, Yanping Huang, Yao Li, Yao Zhao, Yaofeng Sun, Yaohui Li, Yaohui Wang, Yi Yu, Yi Zheng, Yichao Zhang, Yifan Shi, Yiliang Xiong, Ying He, Ying Tang, Yishi Piao, Yisong Wang, Yixuan Tan, Yiyang Ma, Yiyuan Liu, Yongqiang Guo, Yu Wu, Yuan Ou, Yuchen Zhu, Yuduan Wang, Yue Gong, Yuheng Zou, Yujia He, Yukun Zha, Yunfan Xiong, Yunxian Ma, Yuting Yan, Yuxiang Luo, Yuxiang You, Yuxuan Liu, Yuyang Zhou, Z. F. Wu, Z. Z. Ren, Zehui Ren, Zhangli Sha, Zhe Fu, Zhean Xu, Zhen Huang, Zhen Zhang, Zhenda Xie, Zhengyan Zhang, Zhewen Hao, Zhibin Gou, Zhicheng Ma, Zhigang Yan, Zhihong Shao, Zhipeng Xu, Zhiyu Wu, Zhongyu Zhang, Zhuoshu Li, Zihui Gu, Zijia Zhu, Zijun Liu, Zilin Li, Ziwei Xie, Ziyang Song, Ziyi Gao, and Zizheng Pan. DeepSeek-V3 technical report, 2025.

Dawei Gao, Haibin Wang, Yaliang Li, Xiuyu Sun, Yichen Qian, Bolin Ding, and Jingren Zhou. Text-to-SQL empowered by large language models: A benchmark evaluation. arXiv preprint arXiv:2308.15363, 2023.

Shahriar Golchin and Mihai Surdeanu. Time travel in LLMs: Tracing data contamination in large language models. In The Twelfth International Conference on Learning Representations, 2024.
URL https://openreview.net/forum?id=2Rwq6c3tvr.

Tian Guo and Emmanuel Hauptmann. Fine-tuning large language models for stock return prediction using newsflow. In Franck Dernoncourt, Daniel Preoțiuc-Pietro, and Anastasia Shimorina (eds.), Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track, pp. 1028–1045, Miami, Florida, US, November 2024. Association for Computational Linguistics. doi: 10.18653/v1/2024.emnlp-industry.77. URL https://aclanthology.org/2024.emnlp-industry.77/.

Zijin Hong, Zheng Yuan, Qinggang Zhang, Hao Chen, Junnan Dong, Feiran Huang, and Xiao Huang. Next-generation database interfaces: A survey of LLM-based text-to-SQL. IEEE Transactions on Knowledge and Data Engineering, 2025.

John C. Hull and Sankarshan Basu. Options, Futures, and Other Derivatives. Pearson Education India, 2016.

Codruț-Florin Ivașcu. Option pricing using machine learning. Expert Systems with Applications, 163:113799, 2021.

Kelvin J.L. Koa, Yunshan Ma, Ritchie Ng, and Tat-Seng Chua. Learning to generate explainable stock predictions using self-reflective large language models. In Proceedings of the ACM Web Conference 2024, pp. 4304–4315, 2024.

Gyubok Lee, Hyeonji Hwang, Seongsu Bae, Yeonsu Kwon, Woncheol Shin, Seongjun Yang, Minjoon Seo, Jong-Yeup Kim, and Edward Choi. EHRSQL: A practical text-to-SQL benchmark for electronic health records. Advances in Neural Information Processing Systems, 35:15589–15601, 2022.

Changlun Li, Yao Shi, Chen Wang, Qiqi Duan, Runke Ruan, Weijie Huang, Haonan Long, Lijun Huang, Nan Tang, and Yuyu Luo. Time travel is cheating: Going live with DeepFund for real-time fund investment benchmarking. In The Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2025a. URL https://openreview.net/forum?id=SXADEhZ0sl.
Haohang Li, Yupeng Cao, Yangyang Yu, Shashidhar Reddy Javaji, Zhiyang Deng, Yueru He, Yuechen Jiang, Zining Zhu, Kp Subbalakshmi, Jimin Huang, et al. InvestorBench: A benchmark for financial decision-making tasks with LLM-based agent. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 2509–2525, 2025b.

Zhiwei Li, Ran Song, Caihong Sun, Wei Xu, Zhengtao Yu, and Ji-Rong Wen. Can large language models mine interpretable financial factors more effectively? A neural-symbolic factor mining agent model. In Lun-Wei Ku, Andre Martins, and Vivek Srikumar (eds.), Findings of the Association for Computational Linguistics: ACL 2024, pp. 3891–3902, Bangkok, Thailand, August 2024. Association for Computational Linguistics. doi: 10.18653/v1/2024.findings-acl.233. URL https://aclanthology.org/2024.findings-acl.233/.

Xiao-Yang Liu, Guoxuan Wang, Hongyang Yang, and Daochen Zha. FinGPT: Democratizing internet-scale data for financial large language models. arXiv preprint arXiv:2307.10485, 2023.

Yiwei Liu, Junbo Wang, Lei Long, Xin Li, Ruiting Ma, Yuankai Wu, and Xuebin Chen. A multi-level sentiment analysis framework for financial texts. arXiv preprint, 2025.

Zhuang Liu, Degen Huang, Kaiyu Huang, Zhuang Li, and Jun Zhao. FinBERT: A pre-trained financial language representation model for financial text mining. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, pp. 4513–4519, 2021.

Haochen Luo, Yuan Zhang, and Chen Liu. EFS: Evolutionary factor searching for sparse portfolio optimization using large language models. arXiv preprint arXiv:2507.17211, 2025.

Haochen Luo, Ho Tin Ko, David Sun, Yuan Zhang, and Chen Liu. EvoAlpha: Evolutionary alpha factor discovery with large language models. In NeurIPS 2025 Workshop: Generative AI in Finance, 2026. URL https://openreview.net/forum?id=ALpLmURYWy.
Ying Nie, Binwei Yan, Tianyu Guo, Hao Liu, Haoyu Wang, Wei He, Binfan Zheng, Weihao Wang, Qiang Li, Weijian Sun, et al. CFinBench: A comprehensive Chinese financial benchmark for large language models. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pp. 876–891, 2025.

OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, Red Avila, Igor Babuschkin, Suchir Balaji, Valerie Balcom, Paul Baltescu, Haiming Bao, Mohammad Bavarian, Jeff Belgum, Irwan Bello, Jake Berdine, Gabriel Bernadett-Shapiro, Christopher Berner, Lenny Bogdonoff, Oleg Boiko, Madelaine Boyd, Anna-Luisa Brakman, Greg Brockman, Tim Brooks, Miles Brundage, Kevin Button, Trevor Cai, Rosie Campbell, Andrew Cann, Brittany Carey, Chelsea Carlson, Rory Carmichael, Brooke Chan, Che Chang, Fotis Chantzis, Derek Chen, Sully Chen, Ruby Chen, Jason Chen, Mark Chen, Ben Chess, Chester Cho, Casey Chu, Hyung Won Chung, Dave Cummings, Jeremiah Currier, Yunxing Dai, Cory Decareaux, Thomas Degry, Noah Deutsch, Damien Deville, Arka Dhar, David Dohan, Steve Dowling, Sheila Dunning, Adrien Ecoffet, Atty Eleti, Tyna Eloundou, David Farhi, Liam Fedus, Niko Felix, Simón Posada Fishman, Juston Forte, Isabella Fulford, Leo Gao, Elie Georges, Christian Gibson, Vik Goel, Tarun Gogineni, Gabriel Goh, Rapha Gontijo-Lopes, Jonathan Gordon, Morgan Grafstein, Scott Gray, Ryan Greene, Joshua Gross, Shixiang Shane Gu, Yufei Guo, Chris Hallacy, Jesse Han, Jeff Harris, Yuchen He, Mike Heaton, Johannes Heidecke, Chris Hesse, Alan Hickey, Wade Hickey, Peter Hoeschele, Brandon Houghton, Kenny Hsu, Shengli Hu, Xin Hu, Joost Huizinga, Shantanu Jain, Shawn Jain, Joanne Jang, Angela Jiang, Roger Jiang, Haozhun Jin, Denny Jin, Shino Jomoto, Billie Jonn,
Heewoo Jun, Tomer Kaftan, Łukasz Kaiser, Ali Kamali, Ingmar Kanitscheider, Nitish Shirish Keskar, Tabarak Khan, Logan Kilpatrick, Jong Wook Kim, Christina Kim, Yongjik Kim, Jan Hendrik Kirchner, Jamie Kiros, Matt Knight, Daniel Kokotajlo, Łukasz Kondraciuk, Andrew Kondrich, Aris Konstantinidis, Kyle Kosic, Gretchen Krueger, Vishal Kuo, Michael Lampe, Ikai Lan, Teddy Lee, Jan Leike, Jade Leung, Daniel Levy, Chak Ming Li, Rachel Lim, Molly Lin, Stephanie Lin, Mateusz Litwin, Theresa Lopez, Ryan Lowe, Patricia Lue, Anna Makanju, Kim Malfacini, Sam Manning, Todor Markov, Yaniv Markovski, Bianca Martin, Katie Mayer, Andrew Mayne, Bob McGrew, Scott Mayer McKinney, Christine McLeavey, Paul McMillan, Jake McNeil, David Medina, Aalok Mehta, Jacob Menick, Luke Metz, Andrey Mishchenko, Pamela Mishkin, Vinnie Monaco, Evan Morikawa, Daniel Mossing, Tong Mu, Mira Murati, Oleg Murk, David Mély, Ashvin Nair, Reiichiro Nakano, Rajeev Nayak, Arvind Neelakantan, Richard Ngo, Hyeonwoo Noh, Long Ouyang, Cullen O'Keefe, Jakub Pachocki, Alex Paino, Joe Palermo, Ashley Pantuliano, Giambattista Parascandolo, Joel Parish, Emy Parparita, Alex Passos, Mikhail Pavlov, Andrew Peng, Adam Perelman, Filipe de Avila Belbute Peres, Michael Petrov, Henrique Ponde de Oliveira Pinto, Michael Pokorny, Michelle Pokrass, Vitchyr H.
Pong, Tolly Powell, Alethea Power, Boris Power, Elizabeth Proehl, Raul Puri, Alec Radford, Jack Rae, Aditya Ramesh, Cameron Raymond, Francis Real, Kendra Rimbach, Carl Ross, Bob Rotsted, Henri Roussez, Nick Ryder, Mario Saltarelli, Ted Sanders, Shibani Santurkar, Girish Sastry, Heather Schmidt, David Schnurr, John Schulman, Daniel Selsam, Kyla Sheppard, Toki Sherbakov, Jessica Shieh, Sarah Shoker, Pranav Shyam, Szymon Sidor, Eric Sigler, Maddie Simens, Jordan Sitkin, Katarina Slama, Ian Sohl, Benjamin Sokolowsky, Yang Song, Natalie Staudacher, Felipe Petroski Such, Natalie Summers, Ilya Sutskever, Jie Tang, Nikolas Tezak, Madeleine B. Thompson, Phil Tillet, Amin Tootoonchian, Elizabeth Tseng, Preston Tuggle, Nick Turley, Jerry Tworek, Juan Felipe Cerón Uribe, Andrea Vallone, Arun Vijayvergiya, Chelsea Voss, Carroll Wainwright, Justin Jay Wang, Alvin Wang, Ben Wang, Jonathan Ward, Jason Wei, CJ Weinmann, Akila Welihinda, Peter Welinder, Jiayi Weng, Lilian Weng, Matt Wiethoff, Dave Willner, Clemens Winter, Samuel Wolrich, Hannah Wong, Lauren Workman, Sherwin Wu, Jeff Wu, Michael Wu, Kai Xiao, Tao Xu, Sarah Yoo, Kevin Yu, Qiming Yuan, Wojciech Zaremba, Rowan Zellers, Chong Zhang, Marvin Zhang, Shengjia Zhao, Tianhao Zheng, Juntang Zhuang, William Zhuk, and Barret Zoph. GPT-4 technical report, 2024. URL https://arxiv.org/abs/2303.08774.

Gengyuan Shi, Chaokun Wang, Liu Yabin, and Jiawei Ren. Adaptive and robust translation from natural language to multi-model query languages. In Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar (eds.), Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 15950–15965, Vienna, Austria, July 2025a. Association for Computational Linguistics. ISBN 979-8-89176-251-0. doi: 10.18653/v1/2025.acl-long.776. URL https://aclanthology.org/2025.acl-long.776/.
Yu Shi, Yitong Duan, and Jian Li. Navigating the alpha jungle: An LLM-powered MCTS framework for formulaic factor mining. arXiv preprint, 2025b.

Jie Tan, Kangfei Zhao, Rui Li, Jeffrey Xu Yu, Chengzhi Piao, Hong Cheng, Helen Meng, Deli Zhao, and Yu Rong. Can large language models be query optimizer for relational databases? Proceedings of the ACM on Management of Data, 3(6):1–28, 2025.

Ziyi Tang, Zechuan Chen, Jiarui Yang, Jiayao Mai, Yongsen Zheng, Keze Wang, Jinrui Chen, and Liang Lin. AlphaAgent: LLM-driven alpha mining with regularized exploration to counteract alpha decay. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2, pp. 2813–2822, 2025.

Simon Vine. Options: Trading Strategy and Risk Management, volume 288. John Wiley & Sons, 2011.

Meiyun Wang, Kiyoshi Izumi, and Hiroki Sakaji. LLMFactor: Extracting profitable factors through prompts for explainable stock movement prediction. In Lun-Wei Ku, Andre Martins, and Vivek Srikumar (eds.), Findings of the Association for Computational Linguistics: ACL 2024, pp. 3120–3131, Bangkok, Thailand, August 2024. Association for Computational Linguistics. doi: 10.18653/v1/2024.findings-acl.185. URL https://aclanthology.org/2024.findings-acl.185/.

Saizhuo Wang, Hang Yuan, Leon Zhou, Lionel Ni, Heung Yeung Shum, and Jian Guo. Alpha-GPT: Human-AI interactive alpha mining for quantitative investment. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pp. 196–206, 2025.

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V. Le, Denny Zhou, et al. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35:24824–24837, 2022.

Shijie Wu, Ozan Irsoy, Steven Lu, Vadim Dabravolski, Mark Dredze, Sebastian Gehrmann, Prabhanjan Kambadur, David Rosenberg, and Gideon Mann.
BloombergGPT: A large language model for finance. arXiv preprint, 2023.

Yijia Xiao, Edward Sun, Di Luo, and Wei Wang. TradingAgents: Multi-agents LLM financial trading framework. In The First MARW: Multi-Agent AI in the Real World Workshop at AAAI 2025, 2025. URL https://openreview.net/forum?id=4QPrXwMQt1.

Qianqian Xie, Weiguang Han, Xiao Zhang, Yanzhao Lai, Min Peng, Alejandro Lopez-Lira, and Jimin Huang. PIXIU: A comprehensive benchmark, instruction dataset and large language model for finance. Advances in Neural Information Processing Systems, 36:33469–33484, 2023.

Qianqian Xie, Weiguang Han, Zhengyu Chen, Ruoyu Xiang, Xiao Zhang, Yueru He, Mengxi Xiao, Dong Li, Yongfu Dai, Duanyu Feng, et al. FinBen: A holistic financial benchmark for large language models. Advances in Neural Information Processing Systems, 37:95716–95743, 2024.

An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, Chujie Zheng, Dayiheng Liu, Fan Zhou, Fei Huang, Feng Hu, Hao Ge, Haoran Wei, Huan Lin, Jialong Tang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jing Zhou, Jingren Zhou, Junyang Lin, Kai Dang, Keqin Bao, Kexin Yang, Le Yu, Lianghao Deng, Mei Li, Mingfeng Xue, Mingze Li, Pei Zhang, Peng Wang, Qin Zhu, Rui Men, Ruize Gao, Shixuan Liu, Shuang Luo, Tianhao Li, Tianyi Tang, Wenbiao Yin, Xingzhang Ren, Xinyu Wang, Xinyu Zhang, Xuancheng Ren, Yang Fan, Yang Su, Yichang Zhang, Yinger Zhang, Yu Wan, Yuqiong Liu, Zekun Wang, Zeyu Cui, Zhenru Zhang, Zhipeng Zhou, and Zihan Qiu. Qwen3 technical report, 2025.

Yuzhe Yang, Yifei Zhang, Minghao Wu, Kaidi Zhang, Yunmiao Zhang, Honghai Yu, Yan Hu, and Benyou Wang. TwinMarket: A scalable behavioral and social simulation for financial markets. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. URL https://openreview.net/forum?id=h60y6zlPyl.
Yangyang Yu, Zhiyuan Yao, Haohang Li, Zhiyang Deng, Yuechen Jiang, Yupeng Cao, Zhi Chen, Jordan W. Suchow, Zhenyu Cui, Rong Liu, Zhaozhuo Xu, Denghui Zhang, Koduvayur Subbalakshmi, Guojun Xiong, Yueru He, Jimin Huang, Dong Li, and Qianqian Xie. FinCon: A synthesized LLM multi-agent system with conceptual verbal reinforcement for enhanced financial decision making. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL https://openreview.net/forum?id=dG1HwKMYbC.

Yangyang Yu, Haohang Li, Zhi Chen, Yuechen Jiang, Yang Li, Jordan W. Suchow, Denghui Zhang, and Khaldoun Khashanah. FinMem: A performance-enhanced LLM trading agent with layered memory and character design. IEEE Transactions on Big Data, 2025.

Aniruddha Zalani. Low-latency machine learning for options pricing: High-speed models and trading performance. Journal of Computer Science and Technology Studies, 7(5):65–72, 2025.

Bin Zhang, Yuxiao Ye, Guoqing Du, Xiaoru Hu, Zhishuai Li, Sun Yang, Chi Harold Liu, Rui Zhao, Ziyue Li, and Hangyu Mao. Benchmarking the text-to-SQL capability of large language models: A comprehensive evaluation. arXiv preprint arXiv:2403.02951, 2024.

Zhihan Zhang, Yixin Cao, and Lizi Liao. XFinBench: Benchmarking LLMs in complex financial problem solving and reasoning. In Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar (eds.), Findings of the Association for Computational Linguistics: ACL 2025, pp. 8715–8758, Vienna, Austria, July 2025. Association for Computational Linguistics. ISBN 979-8-89176-256-5. doi: 10.18653/v1/2025.findings-acl.457. URL https://aclanthology.org/2025.findings-acl.457/.

A BACKGROUND OF OPTION STRATEGY

We summarize the option strategies considered in this work under two common trading scenarios: directional views (bullish/bearish exposure) and volatility views (breakout or range-bound).
Throughout this appendix, we only consider pure multi-leg option combinations (calls/puts) on the same underlying, and we do not include stock-option mixed positions (e.g., covered calls, protective puts, collars). All payoffs are defined at expiration as functions of S_T, omitting transaction costs and premium shifts.

A.1 DIRECTIONAL STRATEGIES

Long Call (Bullish). Buy one call with strike K:

    Π_LC(S_T) = (S_T − K)^+.    (5)

Long Put (Bearish). Buy one put with strike K:

    Π_LP(S_T) = (K − S_T)^+.    (6)

Bull Call Spread (Bull Spread). Buy a call at K_1 and sell a call at K_2 with K_1 < K_2:

    Π_BCS(S_T) = (S_T − K_1)^+ − (S_T − K_2)^+.    (7)

Bear Put Spread (Bear Spread). Buy a put at K_2 and sell a put at K_1 with K_1 < K_2:

    Π_BPS(S_T) = (K_2 − S_T)^+ − (K_1 − S_T)^+.    (8)

A.2 VOLATILITY STRATEGIES

Long Straddle (Long Volatility). Buy a call and a put at the same strike K:

    Π_Straddle(S_T) = (S_T − K)^+ + (K − S_T)^+ = |S_T − K|.    (9)

Long Strangle (Long Volatility). Buy a put at K_1 and a call at K_2 with K_1 < K_2:

    Π_Strangle(S_T) = (K_1 − S_T)^+ + (S_T − K_2)^+.    (10)

Butterfly (Short Volatility / Range-Bound). A standard call butterfly uses three strikes K_1 < K_2 < K_3:

    Π_Butterfly(S_T) = (S_T − K_1)^+ − 2(S_T − K_2)^+ + (S_T − K_3)^+.    (11)

Iron Condor (Short Volatility / Range-Bound). Constructed by combining a put spread and a call spread with K_1 < K_2 < K_3 < K_4:

    Π_IC(S_T) = [(K_2 − S_T)^+ − (K_1 − S_T)^+] + [(S_T − K_3)^+ − (S_T − K_4)^+],    (12)

where the first bracketed term is the put spread and the second is the call spread.

B OQL SPECIFICATION AND EXAMPLES

This appendix provides the formal definition of the Option Query Language (OQL), including its grammatical structure, role-based schema, and practical usage examples.
B.1 FORMAL SYNTAX (EBNF)

To facilitate Schema-Augmented Generation, OQL is defined by a strict Extended Backus-Naur Form (EBNF) grammar. This grammar is injected into the LLM context to constrain token generation and ensure syntactic validity.

    Query          ::= SelectClause FromClause WhereClause? HavingClause? OrderClause? LimitClause?
    SelectClause   ::= "SELECT" StrategyName
    StrategyName   ::= "BULL_CALL_SPREAD" | "IRON_CONDOR" | ...
    FromClause     ::= "FROM" Underlying
    Underlying     ::= [A-Z]+                              -- e.g., SPY
    WhereClause    ::= "WHERE" LegCondition { "AND" LegCondition }
    LegCondition   ::= Role "." Field Op Value
    HavingClause   ::= "HAVING" StratCondition { "AND" StratCondition }
    StratCondition ::= Field Op Value | Field "BETWEEN" Val "AND" Val
    OrderClause    ::= "ORDER" "BY" OrderItem { "," OrderItem }
    OrderItem      ::= Field [ "ASC" | "DESC" ]
    LimitClause    ::= "LIMIT" INTEGER
    Role           ::= "L" | "S" | "F" | "B" | ...
    Field          ::= "Dte" | "Delta" | "Iv" | ...
    Op             ::= "=" | "!=" | "<" | ">" | "~"

B.2 STRATEGY DEFINITIONS AND ROLE SCHEMAS

OQL enforces structural integrity through Role Schemas. Each strategy type implies a specific set of leg identifiers (Roles) that are valid within the WHERE clause. The execution engine rejects queries referencing undefined roles (e.g., referencing a "Short Put" role in a "Call Spread" strategy).

Table 5: Supported Strategies and Role Definitions

    BULL_CALL_SPREAD   Roles: L, S           Lower-strike Long Call + Higher-strike Short Call
    BEAR_PUT_SPREAD    Roles: L, S           Higher-strike Long Put + Lower-strike Short Put
    CALENDAR_CALL      Roles: F, B           Near-term Short Call + Far-term Long Call
    STRADDLE           Roles: C, P           ATM Call + ATM Put
    IRON_CONDOR        Roles: SC, LC,        Short/Long Call Wings;
                              SP, LP         Short/Long Put Wings
    BUTTERFLY_CALL     Roles: L1, S, L2      Long wings + Short body

B.3 QUERY ANALYSIS AND EXAMPLES

The following examples demonstrate how OQL handles complex financial logic that would otherwise require verbose Python scripts.
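To make the rejection behavior of the role schemas concrete, the check can be sketched as a small validator. This is a minimal illustration only: the role table mirrors Table 5, but the function name, regexes, and error strings are hypothetical rather than the engine's actual API.

```python
import re

# Role schema mirroring Table 5: strategy name -> set of valid leg roles.
ROLE_SCHEMA = {
    "BULL_CALL_SPREAD": {"L", "S"},
    "BEAR_PUT_SPREAD": {"L", "S"},
    "CALENDAR_CALL": {"F", "B"},
    "STRADDLE": {"C", "P"},
    "IRON_CONDOR": {"SC", "LC", "SP", "LP"},
    "BUTTERFLY_CALL": {"L1", "S", "L2"},
}

def validate_roles(query: str) -> list[str]:
    """Return schema violations for the roles referenced in the WHERE clause."""
    strategy = re.search(r"SELECT\s+(\w+)", query)
    if not strategy or strategy.group(1) not in ROLE_SCHEMA:
        return ["unknown or missing strategy name"]
    allowed = ROLE_SCHEMA[strategy.group(1)]
    # Leg conditions look like `Role.Field Op Value`, e.g. `SC.Delta < 0.20`.
    used = set(re.findall(r"\b([A-Z][A-Z0-9]*)\.\w+", query))
    return [f"role {r} not valid for {strategy.group(1)}" for r in sorted(used - allowed)]

ok = "SELECT IRON_CONDOR FROM QQQ WHERE SC.Dte ~ 30 AND SP.Delta > -0.20"
print(validate_roles(ok))   # → []
bad = "SELECT BULL_CALL_SPREAD FROM SPY WHERE SP.Delta > -0.2"
print(validate_roles(bad))  # → ['role SP not valid for BULL_CALL_SPREAD']
```

A validator of this shape lets the engine reject an ill-formed query deterministically before any option-chain data is touched, which is the property the role schemas are designed to guarantee.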
Example 1: The "Approximate" Operator. Natural language often implies fuzzy constraints. The tilde operator (~) allows the LLM to express "around 30 days" without hallucinating exact dates.

Intent: "Find me an Iron Condor on QQQ expiring in about a month, where I collect at least $100 credit."

    SELECT IRON_CONDOR FROM QQQ
    WHERE SC.Dte ~ 30 AND LC.Dte ~ 30 AND SP.Dte ~ 30 AND LP.Dte ~ 30
    HAVING net_credit >= 100
    ORDER BY rr_ratio DESC

Example 2: Iron Condor (Range-Bound Income). OQL supports four-leg strategies and allows users to express range-bound views using leg-level Greek constraints and maturity conditions.

Intent: "TSLA is likely to stay range-bound. Build an iron condor with about 30 days to expiration. I want positive theta and limited downside risk."

    SELECT IRON_CONDOR FROM TSLA
    WHERE Dte ~ 30 AND SC.Delta < 0.20 AND LC.Delta < 0.05
          AND SP.Delta > -0.20 AND LP.Delta > -0.05
    HAVING net_theta > 0 AND max_loss < 500
    LIMIT 10

B.4 BACKEND DESIGN

OQL is designed as an abstract domain-specific language that serves as a middle layer between large language models (LLMs) and the concrete strategy search process. The role of the backend engine is to translate OQL queries into executable search programs over option-chain data.

In principle, OQL is backend-agnostic: once the core components of option strategy search are abstracted, such as leg selection, structural constraints, and aggregate risk computation, the execution can be implemented using different backends. For example, the search process can be realized with Python-based data processing frameworks (e.g., NumPy or Pandas) for flexible prototyping, or compiled into SQL query templates or stored procedures for efficient and scalable execution in relational databases. The choice of backend depends on system requirements such as performance, scalability, and deployment constraints.
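As a rough, dependency-free sketch of how a Python-based backend might lower a single leg condition onto option-chain rows (a real backend would use NumPy or Pandas): the column names, the helper, and in particular the fixed ±7-day tolerance for the `~` operator are assumptions for illustration, not OQL's specified semantics.

```python
# Hypothetical option-chain rows; a real backend would hold these in a DataFrame.
chain = [
    {"type": "call", "strike": 450, "dte": 28, "delta": 0.18, "bid": 1.20},
    {"type": "call", "strike": 460, "dte": 28, "delta": 0.04, "bid": 0.35},
    {"type": "call", "strike": 450, "dte": 65, "delta": 0.25, "bid": 3.10},
]

def filter_leg(chain, dte_approx=None, delta_lt=None, tol=7):
    """Lower leg-level OQL conditions onto chain rows.

    `Dte ~ 30` is interpreted here as |dte - 30| <= tol; the tolerance
    value is an assumption made for this sketch."""
    out = []
    for row in chain:
        if dte_approx is not None and abs(row["dte"] - dte_approx) > tol:
            continue  # fails the approximate-DTE constraint
        if delta_lt is not None and not row["delta"] < delta_lt:
            continue  # fails the strict delta upper bound
        out.append(row)
    return out

# Short-call leg of Example 2: `Dte ~ 30 AND SC.Delta < 0.20`.
short_call_candidates = filter_leg(chain, dte_approx=30, delta_lt=0.20)
print([r["strike"] for r in short_call_candidates])  # → [450, 460]
```

The engine would run one such filter per role, then join the per-leg candidate sets under the strategy's structural constraints (e.g., strike ordering) and evaluate HAVING aggregates such as net_credit over each combination.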
In this work, we adopt a Python-based backend to facilitate rapid prototyping and experimental evaluation. Nevertheless, the design of OQL naturally supports compilation into SQL-based execution pipelines. In future work, we plan to migrate the backend to a fully SQL-driven implementation, leveraging relational query optimization and deterministic execution to enable large-scale, reproducible strategy search.

C DATASET CONSTRUCTION

To the best of our knowledge, translating natural-language trading intents into executable multi-leg option strategies is a relatively unexplored task, and no existing dataset explicitly aligns linguistic descriptions with structured option strategies under realistic market conditions. To address this gap, we construct a new dataset by grounding expert-written intents in carefully segmented market regimes observed in 2025. In this section, we introduce the construction pipeline of our dataset.

As shown in Figure 3, rather than relying on uniform calendar-based splits, we first identify distinct market regimes based on price action within a held-out test period that is not fully observed by large language models during training. Human experts manually classify market styles (e.g., crashes, recoveries, consolidations, or momentum-driven phases) and determine the appropriate option strategy archetypes commonly adopted under each regime. For dataset annotation purposes, we assume a hypothetical trader who enters the market at the beginning of each regime and, as an oracle, has access to the realized price trajectory over the subsequent evaluation window. This oracle assumption is used solely to assign regime-consistent intent labels and strategy preferences, and is never exposed to the model during training or inference. The resulting regimes and strategy mappings are employed exclusively for intent grounding and evaluation, ensuring that no look-ahead information is exploited.
[Figure 3 depicts the three-stage pipeline: (1) Market Regime Segmentation — human experts annotate historical price data (e.g., SPY 2025) into regimes such as high-level consolidation, crash, V-shape recovery, mid-term consolidation and ascent, and high-level wide oscillation, and determine strategy archetypes; (2) Ground-Truth Intent Alignment — regime-consistent pairing of natural-language intents with preferred strategy labels (e.g., IRON_CONDOR, BEAR_PUT_SPREAD) under the annotation-only oracle assumption; (3) Dataset Finalization and Backtesting Windows — JSON samples with fields case_id, intent, strategy, backtest_start, and backtest_end, with windows aligned to regime boundaries (entry near the beginning, exit at transition or expiration).]

Figure 3: Dataset construction workflow. We segment historical price data into distinct market regimes, align natural-language trading intents with regime-consistent option strategy labels, and finalize JSON samples with backtesting windows aligned to regime boundaries, producing a grounded dataset of intents, strategies, and market regimes.
C.1 MARKET REGIME SEGMENTATION

For each underlying asset, we manually divide the yearly price trajectory into consecutive regimes based on dominant trend direction, volatility characteristics, drawdown magnitude, and structural price patterns (e.g., V-shaped reversals, range-bound consolidation, or breakout acceleration). Each regime represents a distinct market condition that naturally corresponds to different option strategy preferences. Below, we illustrate this process using representative examples from SPY in Table 7, which serves as a market-wide benchmark and exhibits a pronounced deep V-shaped reversal in 2025, followed by a sustained bull market and high-level consolidation. Based on price action, we identify four major regimes.

Table 6: Two example intent cases in our benchmark. We categorize queries by user expertise (e.g., Junior vs. Senior) and extract key parameters such as the underlying asset, preferred strategy, and backtesting range from the unstructured intent description.

Example 1 (Junior) — Underlying: NVDA; Strategy: Bull Call Spread; Backtest: 2025-04-15 to 2025-05-15.
    Intent: "Okay, we are bouncing off 95. Jensen is speaking next week. I think the bottom is in. I want to catch the run back to 120, but I don't have much cash. Get me a cheap Call Spread."

Example 2 (Senior) — Underlying: GOOG; Strategy: Bear Call Spread; Backtest: 2025-01-17 to 2025-02-10.
    Intent: "The stock is stuck below the 210 resistance and looks heavy. I want to sell the 215/220 call spread to collect premium. I don't think it has the energy to break out..."

Table 7: Market regime segmentation of SPY in 2025 and corresponding option strategy preferences.

Regime I: High-level consolidation & drawdown (Jan – mid Apr). Price action: Range-bound (570–600) followed by a rapid February sell-off; breaks support to the year-low (~500) with high volume, indicating panic liquidation.
Bearish hedging structures (long puts, bear spreads) to protect against tail risk and capture downside con ve xity . II: V -shaped r ebound & up- trend (mid Apr – J ul) Double-bottom in Apr , then steep rebound. Rallies from ∼ 500 to > 640 in 3 months with strong momentum and minimal pullbacks. Risk-controlled bullish strate gies (bull spreads, call diagonals) to lev erage upside while managing capital and volatility . III: Consolidation & stair- step advance (Aug – Oct) Momentum moderates. Intermittent pullbacks ( ∼ 620) and renewed advances form a zigzag structure with higher highs and lows. Moderately bullish/v olatility-selling (call spreads, calendars, directional strangles) for growth with reduced trend strength. IV : V olatility & stabiliza- tion (late Oct – Dec) Sharp correction (688 to 650) then rapid recov ery . Stabilizes near highs ( ∼ 683) in wide consolidation with elev ated volatility . Non-directional income strategies (iron condors, short straddles) to capture time decay amidst high volatility and uncertainty . C . 2 G R O U N D - T RU T H I N T E N T A L I G N M E N T A N D B AC K T E S T I N G W I N D O W S For each market regime, we construct ground-truth annotations that link observed price-action patterns to natural-language trading intents and corresponding option strategy preferences, we pro vide examples in T able 6. Intents are written by human experts based solely on information a vailable up to the regime endpoint and describe high-level objecti ves such as directional bias (bullish, bearish, or neutral) and risk preference (e.g., downside protection, upside participation, or volatility harvesting), without referencing any future price realization. Each regime is associated with typical strategy archetypes commonly used under similar conditions (e.g., downside-protection strategies during drawdo wns or non-directional income strategies during high-le vel consolidation), as summarized in T able 7. 
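A finalized annotation can be sanity-checked programmatically. The sketch below follows the JSON schema shown in Figure 3 (`case_id`, `intent`, `strategy`, `backtest_start`, `backtest_end`); the validation helper is illustrative and not part of the released pipeline, and it only checks field presence and a well-ordered backtest window:

```python
import json
from datetime import date

# One annotated sample, following the JSON schema shown in Figure 3.
sample = json.loads("""
{
  "case_id": "spy_2025_001",
  "intent": "...chopping sideways...",
  "strategy": "IRON_CONDOR",
  "backtest_start": "2025-01-15",
  "backtest_end": "2025-03-01"
}
""")

def validate_sample(s: dict) -> bool:
    """Check required fields and that the entry date precedes the exit date."""
    required = {"case_id", "intent", "strategy", "backtest_start", "backtest_end"}
    if not required <= s.keys():
        return False
    start = date.fromisoformat(s["backtest_start"])
    end = date.fromisoformat(s["backtest_end"])
    return start < end  # entry near the regime start must precede the exit

print(validate_sample(sample))  # -> True
```

A fuller validator could additionally check that the window lies inside the annotated regime boundaries, but those boundaries are stored separately from the per-case JSON.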
For quantitative evaluation, backtesting windows are manually aligned with regime boundaries. The entry date is set near the beginning of the corresponding regime, while the exit date is chosen at the regime transition or at option expiration, whichever occurs first. This design ensures a consistent and transparent mapping between market observations, natural-language intents, and backtested strategy outcomes.

D DETAILS IN EXPERIMENTS

D.1 EVALUATION METRICS

We evaluate the system from two complementary perspectives: Query Quality (validity, efficiency, and semantic correctness of OQL generation) and Strategy Quality (financial performance of the generated strategies).

D.1.1 QUERY QUALITY

To address RQ1 and RQ2, we employ a hierarchical set of metrics to evaluate the LLMs' proficiency in OQL query generation. These metrics range from syntactic validity to semantic faithfulness and execution efficiency. As we introduce three metrics to measure correctness and alignment in Section 5.2, in this section we provide more details on Semantic Accuracy (SA). We use an LLM-based evaluator (GPT-4o) to judge whether the query's constraints correctly reflect the intent's key conditions (e.g., strikes, DTE, Greeks) without missing hard constraints or hallucinating requirements. The specific prompt used for this evaluation is detailed in Figure 4. We additionally design two metrics to further measure query efficiency and selectivity. Let $D = \{1, \dots, N\}$ be the set of test cases. For each case $j$, the model is allowed up to $K$ attempts. Let $k_j$ denote the index of the first successful attempt (parsable and non-empty). If no success occurs within $K$ tries, we set $k_j = \infty$.

Prompt for Semantic Accuracy (SA) Evaluation

System Instruction: You are an evaluator for semantic match between a natural-language intent and an OQL query.
Goal:
• Judge whether the OQL query is a reasonable and faithful translation of the intent.
• This is NOT a formal proof task. Allow reasonable interpretations for vague wording.

Key Ideas:
• HARD constraints: explicit numeric/categorical requirements (e.g., DTE=30, ATM/OTM, net_credit ≥ X, max_loss ≤ Y).
• SOFT constraints: approximate phrases ("around", "near", "low risk").

Evaluation Rules:
1. Core Strategy Match: FAIL if the SELECT strategy family differs from the intent's main trading idea.
2. HARD Constraints: FAIL if violated. PARTLY_CORRECT if details are missing but not contradictory.
3. SOFT Constraints: Treat flexibly. Missing some does not automatically fail.
4. Extra Constraints: Do NOT penalize structural validity constraints.
5. Approximation: Treat "∼" or "around" with tolerance (e.g., DTE ± 5).

Grading Scale:
• completely_correct: Core match + HARD constraints satisfied.
• partly_correct: Core match + minor details missing or extra constraints narrowing intent.
• fail: Wrong strategy or contradiction of HARD constraints.

Output Format: JSON with keys grade and comment.

Figure 4: The LLM-as-a-Judge prompt used to evaluate Semantic Accuracy (SA). It distinguishes between hard and soft constraints to ensure fair evaluation of the generated OQL code.

Efficiency (Eff). This metric penalizes multiple retries. It is defined as the average remaining-budget ratio for successful cases:

$$\mathrm{Eff} = \frac{1}{N} \sum_{j=1}^{N} \mathbb{I}(k_j \le K) \cdot \left(1 - \frac{k_j}{K}\right), \qquad (13)$$

where $\mathbb{I}(\cdot)$ is the indicator function.

Selectivity (AvgRows). To measure query specificity, we report the average number of rows ($r_{j,k_j}$) returned by successful queries. Given the constraint of LIMIT 10, this is calculated as:

$$\mathrm{AvgRows} = \frac{1}{|\mathcal{S}|} \sum_{j \in \mathcal{S}} r_{j,k_j}, \qquad (14)$$

where $\mathcal{S} = \{ j \mid k_j \le K \}$ is the set of solved cases. Lower values indicate tighter, more specific constraints.

D.1.2 STRATEGY QUALITY

We conduct backtests for $M$ generated strategies.
Let $c_i$ be the initial net cash flow for strategy $i$. We classify strategies as Buyer (net debit, $c_i > 0$) or Seller (net credit, $c_i < 0$). Let $P_i(t)$ denote the cumulative PnL at time $t \in [0, T]$.

Win Rate (WR). The proportion of strategies with a positive final PnL:

$$\mathrm{WR} = \frac{1}{M} \sum_{i=1}^{M} \mathbb{I}\big(P_i(T) > 0\big). \qquad (15)$$

We report this for the overall set, as well as for the distinct Buyer and Seller subsets.

Risk Exposure (RE). To capture tail risk, we measure whether the interim drawdown exceeds a fraction $\tau \in \{0.5, 0.9\}$ of the initial premium magnitude $|c_i|$. The metric $\mathrm{RE}_\tau$ is defined as:

$$\mathrm{RE}_\tau = \frac{1}{M} \sum_{i=1}^{M} \mathbb{I}\Big( \min_{t \in [0,T]} P_i(t) \le -\tau |c_i| \Big). \qquad (16)$$

RE50 and RE90 correspond to $\tau = 0.5$ and $0.9$, respectively. High RE values indicate significant downside risk, which is particularly critical for Seller strategies that may otherwise show high win rates.

D.2 DESIGN OF BASELINES

To validate the necessity of a domain-specific intermediate representation (OQL), we compare our framework against three representative paradigms: direct generation, context-grounded generation, and general-purpose query generation.

1. Free-Form Leg Generation (FFLG). This baseline represents the capability of an LLM to generate option strategies relying solely on its parametric knowledge, without access to external market data. The model receives the natural-language intent and is instructed to output a structured JSON containing the option legs (e.g., expiry, strike, type). This baseline evaluates whether an LLM can "hallucinate" a correct strategy structure, serving as a lower bound for performance.

2. Partial-Chain Grounded (PCG). This baseline incorporates a Retrieval-Augmented Generation (RAG) approach. Alongside the user's intent, the model is provided with a snapshot of the option chain (e.g., the top-N most liquid contracts around the at-the-money price). The model must select specific contracts from this context to construct the strategy.
This evaluates the model's ability to ground its reasoning in observed data, though it is limited by the context window and cannot perform complex filtering across the entire database.

Table 8: Comparison of baseline prompting strategies and output formats. FFLG relies entirely on the model's internal priors to infer strikes. PCG grounds generation using a partial option chain, enabling direct contract selection. Text-to-SQL translates intent into an executable database query that retrieves valid multi-leg strategies.

Method: FFLG
Input Prompt (Abbreviated):
System: You are an options trading strategy expert. Given a trading intent, output a JSON strategy blueprint with no additional text. Available strategies include Iron Condor, Spreads, Straddles, and Butterflies.
User: The stock is consolidating nicely around 240. IV is dropping. Sell an Iron Condor to capture theta decay. I don't expect a breakout this week. Underlying: TSLA.
Model Output (Snippet):
{"strategy_type":"IRON_CONDOR", "legs":{"SC":{"strike":255}, "LC":{"strike":260}, "SP":{"strike":225}, "LP":{"strike":220}}}

Method: PCG
Input Prompt (Abbreviated):
System: You are an options trading strategy expert. Given a trading intent and a grounded option chain table for reference, output a JSON strategy blueprint. The generated legs must correspond to contracts in the provided chain.
Option Chain (Sample):
| ticker | strike | type | close | delta | ... |
| O:TSLA251219P00300000 | 300 | P | 19.95 | -0.29 | ... |
| O:TSLA251219C00600000 | 600 | C | 2.75 | 0.06 | ... |
User: I want to hold TSLA long-term but volatility is too high. Zero-cost collar expiring next year. Cap upside at 600 and protect downside at 300.
Model Output (Snippet):
{"strategy_type":"COLLAR", "strategy_details":{"contract_ticker_LP":"O:TSLA251219P00300000", "price_LP":19.95, "contract_ticker_SC":"O:TSLA251219C00600000", "price_SC":2.75}}
(Contracts selected directly from the provided chain.)

Method: Text-to-SQL
Input Prompt (Abbreviated):
System / Instruction: You are an expert in options trading and SQL. Your task is to generate one single, syntactically valid SQL query that searches for option-strategy candidates from an options-chain database based on a natural-language trading intent. You MUST output exactly one JSON object that strictly follows the predefined schema (no markdown, no explanation, no extra text).
User: TSLA may pause around 450 before the next leg up. Construct a call butterfly centered at 450 for late November. Prefer low net debit.
Model Output (Snippet):
{"strategy_type":"BUTTERFLY_CALL", "sql":"SELECT l1.ticker AS contract_ticker_L1, s.ticker AS contract_ticker_S, l2.ticker AS contract_ticker_L2 FROM tsla_2025_11_10_C l1 JOIN tsla_2025_11_10_C s ON s.strike=450 JOIN tsla_2025_11_10_C l2 ON l2.strike=460 ORDER BY net_debit ASC LIMIT 10"}

3. Text-to-SQL (SQL). This baseline represents the standard industry approach for database interaction. The model is provided with the full database schema (table definitions and column descriptions) and is tasked with translating the natural-language intent into a standard SQL query. While SQL is expressive, it lacks high-level abstractions for financial logic (e.g., calculating spread costs or Greeks often requires complex joins and nested queries), making it a strong but challenging baseline for complex logical reasoning.

E ADDITIONAL RESULTS

Token Usage. We evaluate computational cost by measuring token usage with DeepSeek-Chat, averaged across all cases in the dataset. The results are summarized in Table 9.
In this appendix, we provide a granular analysis of model performance, highlighting key behaviors across different asset classes and validating the effectiveness of our proposed generation frameworks compared to the baselines.

High-Volatility Adaptation vs. General Stability. We observe a distinct divergence in model behavior relative to asset volatility. DeepSeek-Chat demonstrates exceptional adaptation to high-momentum assets, achieving dominant profitability on volatile tickers such as NVDA (Profit: 481.3, WR: 0.697) and TSLA (Profit: 984.0, WR: 0.800). This suggests an "aggressive" internal bias suitable for capturing large price swings. In contrast, Gemini 2.5 Flash exhibits superior stability and risk management on broader market indices and mature assets. It outperforms all peers on SPY (Profit: 552.4, ROC: 0.860) and maintains the highest win rate on AAPL (0.657). This dichotomy indicates that while DeepSeek-Chat excels at maximizing alpha in trending markets, Gemini 2.5 Flash offers a more robust baseline for general-purpose strategy construction.

Table 9: Comparison of token usage and cache hit rates across different methods. Our method achieves the highest cache hit rate.

| Method | Prompt Tokens | Completion Tokens | Total Tokens | Hit Rate |
|---|---|---|---|---|
| FFLG | 405.34 | 40.60 | 445.94 | 62.1% |
| Text-to-SQL | 1395.02 | 260.20 | 1655.22 | 81.9% |
| OQL (Ours) | 2396.60 | 87.56 | 2484.16 | 88.5% |
| PCG | 3079.33 | 127.62 | 3206.95 | 22.4% |
| PCG + Full Table | 7677.21 | 127.97 | 7805.18 | 8.0% |

The Gap Between Syntax and Alpha. A critical finding from Tables 12 and 13 is that high syntactic validity does not guarantee financial viability. Specialized coding models like DeepSeek-Coder and Qwen-Coder frequently achieve competitive Efficiency scores (e.g., DeepSeek-Coder Eff: 0.773 on AAPL) but fail to translate this into consistent returns. For instance, on GOOG, Llama-3.1-8B and Qwen3-4B incurred significant losses (-144.8 and -48.8, respectively) despite acceptable validity rates.
This highlights that financial reasoning, i.e., the ability to identify causal market factors, is a distinct capability from merely adhering to the OQL grammar. Smaller models often "overfit" to the syntax without encoding meaningful trading logic.

Table 10: Performance metrics across five underlyings: Win Rate (WR), Risk90, Profit, and ROC. (All)

| Method | Model | AAPL (WR/Risk90/Profit/ROC) | GOOG | NVDA | SPY | TSLA |
|---|---|---|---|---|---|---|
| FFLG | DeepSeek-Chat | 0.389 / 0.333 / 24.73 / -0.157 | 0.667 / 0.212 / 130.50 / 0.443 | 0.571 / 0.171 / 122.85 / 0.146 | 0.606 / 0.333 / 112.68 / 0.218 | 0.500 / 0.156 / -6.28 / 0.176 |
| FFLG | Gemini-2.5-Flash | 0.432 / 0.324 / 16.70 / -0.123 | 0.611 / 0.194 / 152.70 / 0.175 | 0.611 / 0.111 / 118.00 / 0.558 | 0.735 / 0.294 / 195.43 / 0.476 | 0.514 / 0.297 / 64.90 / 0.176 |
| PCG | DeepSeek-Chat | 0.425 / 0.350 / 160.03 / -0.245 | 0.395 / 0.237 / -60.03 / 0.023 | 0.462 / 0.154 / 356.88 / 0.149 | 0.564 / 0.231 / 397.64 / 0.622 | 0.364 / 0.364 / -458.26 / -0.392 |
| PCG | Gemini-2.5-Flash | 0.500 / 0.375 / 158.83 / 0.012 | 0.600 / 0.200 / 138.23 / 0.273 | 0.675 / 0.175 / 330.28 / 0.327 | 0.675 / 0.350 / 488.13 / 0.533 | 0.625 / 0.250 / -167.88 / 0.459 |
| PCG Full Table | DeepSeek-Chat | 0.400 / 0.325 / 41.03 / -0.033 | 0.450 / 0.250 / 11.90 / 0.044 | 0.462 / 0.256 / 50.05 / 0.358 | 0.641 / 0.179 / 408.33 / 0.502 | 0.500 / 0.188 / 76.06 / -0.001 |
| PCG Full Table | Gemini-2.5-Flash | 0.500 / 0.225 / 43.35 / 0.021 | 0.538 / 0.282 / 0.65 / 0.096 | 0.700 / 0.050 / 321.43 / 0.930 | 0.750 / 0.325 / 391.70 / 7.087 | 0.500 / 0.325 / 52.55 / 0.315 |
| Text2SQL | DeepSeek-Chat | 0.435 / 0.348 / 118.67 / 0.030 | 0.487 / 0.123 / 83.26 / 0.047 | 0.543 / 0.215 / 392.87 / 0.974 | 0.632 / 0.301 / 571.22 / 4.882 | 0.419 / 0.248 / 116.57 / 0.186 |
| Text2SQL | Gemini-2.5-Flash | 0.487 / 0.227 / 199.59 / -0.001 | 0.451 / 0.180 / 24.41 / 0.130 | 0.593 / 0.332 / 214.24 / -0.049 | 0.550 / 0.231 / 472.20 / 0.360 | 0.561 / 0.204 / 543.71 / 0.024 |

Table 11: Performance metrics across five underlyings: Win Rate (WR), Risk90, Profit, and ROC.
(All)

| Model | AAPL (WR/Risk90/Profit/ROC) | GOOG | NVDA | SPY | TSLA |
|---|---|---|---|---|---|
| DeepSeek-Chat | 0.584 / 0.111 / 214.1 / 0.223 | 0.419 / 0.151 / 7.6 / 0.204 | 0.642 / 0.163 / 486.8 / 0.646 | 0.456 / 0.338 / 83.1 / 0.242 | 0.765 / 0.161 / 1006.4 / 0.491 |
| DeepSeek-Coder | 0.529 / 0.278 / 165.9 / 0.082 | 0.413 / 0.218 / 48.3 / 0.343 | 0.508 / 0.204 / 290.3 / 0.177 | 0.496 / 0.292 / 314.6 / 0.119 | 0.557 / 0.184 / 713.1 / 0.419 |
| Gemini-2.5-Flash | 0.645 / 0.114 / 188.8 / 0.314 | 0.455 / 0.123 / -19.7 / 0.026 | 0.583 / 0.196 / 425.1 / 0.219 | 0.554 / 0.346 / 602.3 / 0.319 | 0.526 / 0.164 / 778.3 / 0.334 |
| GPT-4.1 | 0.528 / 0.202 / 105.1 / -0.101 | 0.473 / 0.182 / 3.1 / -0.166 | 0.402 / 0.390 / 325.5 / 4.661 | 0.417 / 0.446 / 227.5 / 0.001 | 0.613 / 0.294 / 704.7 / 0.332 |
| GPT-4.1-Mini | 0.489 / 0.238 / 98.5 / 0.152 | 0.417 / 0.241 / -9.9 / -0.022 | 0.426 / 0.399 / 139.1 / 0.203 | 0.477 / 0.367 / 140.2 / 0.394 | 0.556 / 0.296 / 656.2 / 0.513 |
| Llama-3.1-8B | 0.460 / 0.254 / 68.6 / 0.026 | 0.362 / 0.096 / -126.4 / -0.134 | 0.374 / 0.114 / -39.7 / 0.047 | 0.393 / 0.263 / 20.3 / -0.079 | 0.506 / 0.215 / 134.0 / 0.052 |
| Qwen-Coder | 0.433 / 0.205 / 161.4 / 0.123 | 0.447 / 0.116 / -30.9 / -0.030 | 0.453 / 0.177 / -123.6 / 0.067 | 0.479 / 0.311 / 158.5 / 0.621 | 0.573 / 0.236 / 505.9 / 0.445 |
| Qwen3-4B | 0.566 / 0.090 / 290.9 / 0.153 | 0.235 / 0.173 / -117.1 / -0.240 | 0.390 / 0.248 / -25.6 / 0.015 | 0.417 / 0.221 / 73.7 / -0.120 | 0.378 / 0.203 / 302.0 / 0.097 |
| Qwen3-8B | 0.528 / 0.208 / 182.1 / 0.125 | 0.393 / 0.167 / -156.0 / -0.157 | 0.498 / 0.236 / 292.4 / 0.115 | 0.473 / 0.473 / 89.0 / 0.375 | 0.516 / 0.361 / 335.7 / 0.581 |

Impact of Structured Generation (Baseline Comparison). Table 10 provides compelling evidence for the necessity of our structured Text2SQL framework over the unstructured FFLG baseline. The FFLG approach consistently resulted in poor risk-adjusted returns, with DeepSeek-Chat posting a negative ROC (-0.157) on AAPL and negligible profits on TSLA. By constraining the output space with OQL (Text2SQL), we unlock the models' latent reasoning capabilities. This is most visible on SPY, where DeepSeek-Chat's performance surged from a baseline ROC of 0.218 (FFLG) to
This massi ve deltas confirms that the primary bottleneck for LLM-based quant research is not the lack of financial knowledge, but the inability to articulate it ex ecutably without a formal grammar . Prompting P aradigms: PCG vs. T ext2SQL. While T ext2SQL generally yields the highest peak alpha (e.g., DeepSeek-Chat on TSLA), the PCG method offers specific advantages for ensuring consistency . For Gemini-2.5-Flash, PCG significantly improved outcomes on SPY , achieving the highest recorded R OC in the appendix (7.087). Howe ver , PCG proved less stable for DeepSeek-Chat, which regressed to ne gativ e profits on TSLA (-458.26) under this paradigm. This suggests a model- specific preference: reasoning-heavy models like Gemini benefit from the step-by-step decomposition of PCG, whereas models with strong ra w instruction-follo wing capabilities lik e DeepSeek-Chat thrive under the direct constraints of T ext2SQL. T able 12: Query execution v alidity (VR), strategy-OK rate (SR), ef ficiency (Ef f.), and a verage ro w count (Rows) across fi ve underlyings. Model AAPL GOOG NVDA SPY TSLA VR SR Eff. Rows VR SR Eff. Rows VR SR Eff. Rows VR SR Eff. Rows VR SR Eff. 
| Model | AAPL (VR/SR/Eff./Rows) | GOOG | NVDA | SPY | TSLA |
|---|---|---|---|---|---|
| DeepSeek-Chat | 0.975 / 0.923 / 0.718 / 9.59 | 0.900 / 0.833 / 0.733 / 7.64 | 0.825 / 0.727 / 0.697 / 9.67 | 0.750 / 0.667 / 0.660 / 8.67 | 0.850 / 0.794 / 0.706 / 8.85 |
| DeepSeek-Coder | 0.750 / 0.833 / 0.773 / 9.70 | 0.650 / 0.769 / 0.754 / 8.65 | 0.575 / 0.478 / 0.748 / 7.87 | 0.675 / 0.481 / 0.763 / 9.33 | 0.650 / 0.692 / 0.738 / 9.39 |
| Gemini 2.5 Flash | 0.900 / 1.000 / 0.717 / 9.53 | 0.875 / 0.771 / 0.663 / 8.03 | 0.950 / 0.632 / 0.658 / 9.18 | 0.875 / 0.600 / 0.669 / 10.0 | 0.900 / 0.694 / 0.672 / 9.94 |
| GPT-4.1 | 1.000 / 0.950 / 0.670 / 9.50 | 1.000 / 0.850 / 0.670 / 8.38 | 0.875 / 0.657 / 0.680 / 9.20 | 0.900 / 0.611 / 0.639 / 10.0 | 0.875 / 0.771 / 0.703 / 9.63 |
| GPT-4.1 Mini | 0.975 / 0.949 / 0.769 / 9.74 | 0.975 / 0.795 / 0.703 / 8.05 | 0.925 / 0.486 / 0.697 / 8.78 | 0.900 / 0.556 / 0.722 / 9.92 | 0.950 / 0.658 / 0.726 / 9.71 |
| Llama 3.1 8B | 0.950 / 0.895 / 0.758 / 9.05 | 0.925 / 0.730 / 0.686 / 8.14 | 0.900 / 0.306 / 0.722 / 8.83 | 0.850 / 0.441 / 0.718 / 9.79 | 0.900 / 0.472 / 0.622 / 9.42 |
| Qwen-Coder | 0.850 / 0.971 / 0.741 / 9.18 | 0.850 / 0.735 / 0.641 / 8.00 | 0.675 / 0.593 / 0.674 / 8.67 | 0.725 / 0.621 / 0.703 / 9.24 | 0.625 / 0.680 / 0.616 / 9.76 |
| Qwen3-4B | 0.875 / 0.943 / 0.777 / 9.86 | 0.800 / 0.719 / 0.731 / 8.41 | 0.650 / 0.423 / 0.731 / 9.42 | 0.650 / 0.423 / 0.692 / 10.0 | 0.625 / 0.680 / 0.744 / 9.64 |
| Qwen3-8B | 0.850 / 0.971 / 0.671 / 9.38 | 0.850 / 0.824 / 0.659 / 8.50 | 0.775 / 0.677 / 0.600 / 8.45 | 0.775 / 0.677 / 0.677 / 8.77 | 0.725 / 0.793 / 0.600 / 9.10 |

Table 13: Performance metrics across five underlyings: Win Rate (WR), Risk90, Profit, and ROC.
(Top)

| Model | AAPL (WR/Risk90/Profit/ROC) | GOOG | NVDA | SPY | TSLA |
|---|---|---|---|---|---|
| DeepSeek-Chat | 0.615 / 0.103 / 209.1 / 0.207 | 0.400 / 0.229 / 22.1 / 0.000 | 0.697 / 0.152 / 481.3 / 0.686 | 0.531 / 0.406 / 104.8 / 0.358 | 0.800 / 0.114 / 984.0 / 0.529 |
| DeepSeek-Coder | 0.467 / 0.300 / 165.6 / 0.016 | 0.423 / 0.269 / 32.7 / 0.074 | 0.478 / 0.261 / 216.7 / 0.011 | 0.577 / 0.346 / 309.0 / 0.189 | 0.577 / 0.192 / 793.7 / 0.402 |
| Gemini-2.5-Flash | 0.657 / 0.114 / 191.4 / 0.185 | 0.438 / 0.188 / -23.5 / 0.519 | 0.579 / 0.237 / 456.6 / 0.183 | 0.559 / 0.382 / 552.4 / 0.860 | 0.514 / 0.200 / 444.2 / 0.317 |
| GPT-4.1 | 0.487 / 0.308 / 109.4 / -0.072 | 0.474 / 0.263 / 47.0 / -0.216 | 0.417 / 0.389 / 223.7 / 0.065 | 0.429 / 0.486 / 272.5 / 0.105 | 0.571 / 0.257 / 684.9 / 0.277 |
| GPT-4.1-Mini | 0.436 / 0.308 / 75.5 / -0.229 | 0.421 / 0.237 / 14.6 / 0.888 | 0.500 / 0.421 / 93.1 / 0.280 | 0.545 / 0.333 / 304.2 / 0.461 | 0.553 / 0.237 / 381.4 / 0.587 |
| Llama-3.1-8B | 0.472 / 0.250 / 50.6 / 0.022 | 0.368 / 0.184 / -144.8 / -0.157 | 0.417 / 0.083 / -9.0 / 0.014 | 0.353 / 0.324 / 4.8 / 0.237 | 0.471 / 0.206 / 117.7 / 0.305 |
| Qwen-Coder | 0.485 / 0.212 / 161.9 / 0.205 | 0.394 / 0.152 / 23.8 / -0.074 | 0.429 / 0.179 / -129.5 / 0.182 | 0.464 / 0.321 / 186.0 / -0.709 | 0.552 / 0.241 / 522.4 / 0.321 |
| Qwen3-4B | 0.541 / 0.108 / 302.2 / 0.112 | 0.300 / 0.267 / -48.8 / -0.240 | 0.346 / 0.231 / -36.7 / -0.005 | 0.417 / 0.250 / 164.2 / -0.121 | 0.385 / 0.269 / 284.3 / 0.054 |
| Qwen3-8B | 0.583 / 0.222 / 217.8 / 0.139 | 0.484 / 0.129 / -120.1 / -0.052 | 0.556 / 0.222 / 264.7 / 0.274 | 0.483 / 0.517 / 34.8 / 0.456 | 0.621 / 0.276 / 479.9 / 1.296 |

F CASE STUDY

We present several case studies in Table 14 to illustrate the strategic advantages of our method over the baselines. In particular, we focus on two representative objectives: income enhancement and portfolio delta hedging. The results show that our approach consistently achieves lower tracking error, better-controlled risk exposure, and higher profit potential. Compared with baseline methods, our approach demonstrates a clear advantage in accurately translating hedging intent into effective option strategies.
As shown in Cases 2, 4, and 6, our method exhibits more precise inverse tracking of the underlying assets for hedging purposes. This capability is especially valuable for portfolio protection during periods of market stress, such as the market panic observed in April 2025. The improved hedging performance mainly stems from a better understanding of delta exposure and the ability of OQL to identify the most suitable hedging structures. In addition, Cases 1 and 3 highlight the effectiveness of our method in income-enhancement scenarios using spread strategies. We observe that strategies generated by OQL maintain stable tracking behavior relative to the underlying assets while delivering enhanced income. This stability is largely due to OQL's ability to directly leverage real-time option chain information, enabling the selection of well-balanced and market-consistent option combinations.

Table 14: Qualitative case studies of intent-to-strategy translation.

Case 1: Moderately Bullish Call Ratio Spread
User Intent: “I'm bullish but cautious at these levels (350). Buy one ATM call and sell two OTM calls at 380. I want to profit from a slow drift higher, not a spike.”
[Chart: cumulative portfolio PnL of OQL, SQL, FFLG, and PCG vs. the underlying price, Aug 2025]
Comment: The model correctly interpreted the intent as a bullish call ratio spread, capturing upside from a gradual price increase while limiting gains under sharp rallies. The OQL strategy maintained positive returns during the underlying asset's volatile periods without experiencing significant drawdowns.

Case 2: Delta-Neutral Iron Condor
User Intent: “NVDA is range bound 160-180 (Sep). Construct a Delta Neutral Iron Condor. Harvest theta.”
[Chart: cumulative portfolio PnL of OQL, SQL, FFLG, and PCG vs. the underlying price, Sep 10 – Oct 10, 2025]
Comment: The model successfully mapped the range-bound and income-seeking intent to a delta-neutral iron condor, prioritizing theta decay while maintaining balanced directional exposure. The OQL strategy demonstrates strong inverse-profit potential during sharp declines in the underlying asset.

Case 3: Support-Driven Put Spreads
User Intent: “The trendline support at 200 is holding beautifully. I'm willing to bet my house it doesn't drop below 195. Sell aggressive put spreads to finance a long position.”
[Chart: cumulative portfolio PnL of OQL, SQL, FFLG, and PCG vs. the underlying price, Aug 1–25, 2025]
Comment: The model correctly interpreted the intent as a support-driven bullish strategy and mapped it to a call ratio spread (or equivalent bullish structure) that benefits from a gradual upside move while controlling downside risk. The OQL strategy synchronizes closely with the underlying asset's upward momentum.

Case 4: Theta Harvest (Consolidation)
User Intent: “Consolidation phase between 200 and 220. IV is still rich. Sell an Iron Condor to harvest theta as we chop sideways.”
[Chart: cumulative portfolio PnL of OQL, SQL, FFLG, and PCG vs. the underlying price, May 15 – Jun 30, 2025]
Comment: The model successfully translated the range-bound, volatility-rich intent into a delta-neutral iron condor, explicitly optimizing for theta decay. The OQL strategy achieves rapid portfolio-value recovery and maintains a consistent upward trajectory.

Case 5: Trendline Support (Variant)
User Intent: “It's accelerating downside! 180 is gone. Target is 160. Get me a vertical put spread for the next two weeks to maximize ROI on this crash.”
[Chart: cumulative portfolio PnL of OQL, SQL, FFLG, and PCG vs. the underlying price, Mar 18–31, 2025]
Comment: The model interpreted the intent as a bearish momentum strategy. The OQL and PCG strategies exhibit similar behavior, achieving rapid growth in returns during sharp declines in stock prices. In contrast, FFLG suffers extremely large drawdowns.

Case 6: Volatility-Rich Condor
User Intent: “Panic selling at 100! Volume is insane. I don't know where the bottom is. Buy a Straddle. 30 days.”
[Chart: cumulative portfolio PnL of OQL, SQL, FFLG, and PCG vs. the underlying price, Mar 25 – Apr 25, 2025]
Comment: The model translated the intent into a high-volatility Straddle strategy. The OQL strategy demonstrates strong inverse-profit characteristics. In contrast, the FFLG and PCG strategies suffer from prolonged negative returns, while SQL remains stuck at zero returns over the long term.
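The "inverse tracking" behavior highlighted in the hedging cases can be quantified, for example, by correlating daily strategy PnL changes with underlying returns: an effective hedge should show a clearly negative correlation. This is one possible diagnostic, not the paper's metric, and the series below are invented for illustration rather than taken from Table 14:

```python
import statistics

# Hypothetical daily series: underlying closes and strategy portfolio PnL.
underlying = [100.0, 97.0, 93.0, 95.0, 90.0, 88.0]
pnl = [0.0, 60.0, 140.0, 100.0, 210.0, 250.0]

# Daily underlying returns and daily PnL changes.
returns = [b / a - 1 for a, b in zip(underlying, underlying[1:])]
pnl_delta = [b - a for a, b in zip(pnl, pnl[1:])]

def pearson(x, y):
    """Pearson correlation of two equal-length series."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return cov / var

corr = pearson(returns, pnl_delta)
print(f"hedge correlation: {corr:.2f}")  # strongly negative -> effective hedge
```

In this toy example the PnL rises as the underlying falls, so the correlation comes out strongly negative, which is the signature of the inverse tracking discussed above.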