LLM-Enabled Low-Altitude UAV Natural Language Navigation via Signal Temporal Logic Specification Translation and Repair
Natural language (NL) navigation for low-altitude unmanned aerial vehicles (UAVs) offers an intelligent and convenient solution for low-altitude aerial services by enabling an intuitive interface for non-expert operators. However, deploying this capa…
Authors: Yuqi Ping, Huahao Ding, Tianhao Liang
Yuqi Ping, Huahao Ding, Tianhao Liang, Longyu Zhou, Guangyu Lei, Xinglin Chen, Junwei Wu, Jieyu Zhou, Tingting Zhang

Yuqi Ping, Huahao Ding, Tianhao Liang, Guangyu Lei, Xinglin Chen, Junwei Wu, and T. Zhang are with Guangdong Provincial Key Laboratory of Space-Aerial Networking and Intelligent Sensing, Harbin Institute of Technology, Shenzhen, China (e-mail: pingyq@stu.hit.edu.cn; hitszdhh@163.com; liangth@hit.edu.cn; GuangyuLei@stu.hit.edu.cn; chenxinglin@stu.hit.edu.cn; 220210419@stu.hit.edu.cn; zhangtt@hit.edu.cn); T. Zhang is also with Peng Cheng Laboratory (PCL), Shenzhen, China. Longyu Zhou is with the Information Systems Technology and Design, Singapore University of Technology and Design, Singapore 487372 (e-mail: zhoulyfuture@outlook.com). Jieyu Zhou is with School of Computer Science and Engineering, Central South University, Changsha, China (e-mail: zhoujieyu@csu.edu.cn).

Abstract: Natural language (NL) navigation for low-altitude unmanned aerial vehicles (UAVs) offers an intelligent and convenient solution for low-altitude aerial services by enabling an intuitive interface for non-expert operators. However, deploying this capability in urban environments necessitates the precise grounding of underspecified instructions into safety-critical, dynamically feasible motion plans subject to spatiotemporal constraints. To address this challenge, we propose a unified framework that translates NL instructions into Signal Temporal Logic (STL) specifications and subsequently synthesizes trajectories via mixed-integer linear programming (MILP). Specifically, to generate executable STL formulas from free-form NL, we develop a reasoning-enhanced large language model (LLM) leveraging chain-of-thought (CoT) supervision and group-relative policy optimization (GRPO), which ensures high syntactic validity and semantic consistency. Furthermore, to resolve infeasibilities induced by stringent logical or spatial requirements, we introduce a specification repair mechanism. This module combines MILP-based diagnosis with LLM-guided semantic reasoning to selectively relax task constraints while strictly enforcing safety guarantees. Extensive simulations and real-world flight experiments demonstrate that the proposed closed-loop framework significantly improves NL-to-STL translation robustness, enabling safe, interpretable, and adaptable UAV navigation in complex scenarios.

Index Terms: Natural language navigation, low-altitude UAV, signal temporal logic, specification repair

I. INTRODUCTION

A. Background and Motivation

Low-altitude unmanned aerial vehicles (UAVs) have been increasingly deployed in mission-critical scenarios such as safety monitoring [1], forest firefighting [2], logistics [3], emergency communications [4], and low-altitude networking [5]. Compared with high-altitude operations, low-altitude flight requires UAVs to operate in close proximity to complex urban structures, dynamic obstacles, and human activities, which imposes stringent safety and regulatory constraints. At the same time, low-altitude missions often involve high-level objectives with complex temporal, spatial, and logical requirements that must be specified clearly and executed reliably [6].

Natural-language (NL) instructions provide an intuitive interface for expressing such high-level task intent, especially for non-expert operators. However, the inherent ambiguity and underspecification of NL stand in fundamental tension with the strict safety, temporal, and spatial requirements that low-altitude UAV navigation must satisfy.
Recent advances in large language models (LLMs) have demonstrated strong capabilities in NL understanding and high-level reasoning, which has led to growing interest in language-driven robotic autonomy [7]. Despite this progress, directly mapping NL instructions to low-level control commands remains unsuitable for safety-critical UAV operation, as such mappings lack formal guarantees, interpretability, and verifiability [8]. Conversely, classical model-based planning and control frameworks are capable of generating dynamically feasible trajectories under complex constraints, but they typically do not provide a systematic mechanism to interpret and enforce high-level task semantics expressed in NL [9]. This disconnect highlights the need for an intermediate representation that can faithfully capture language-level intent while remaining amenable to formal analysis and execution.

Formal methods offer a principled bridge between high-level semantic intent and low-level control. In particular, Signal Temporal Logic (STL) provides precise semantics for specifying complex temporal and spatial behaviors [10], enabling safety constraints and mission objectives to be expressed in a form suitable for verification and optimization-based planning. However, enabling NL UAV navigation through STL introduces two coupled challenges. First, NL instructions must be translated into STL specifications that accurately preserve the intended semantics. Second, even semantically correct STL specifications may render the underlying planning problem infeasible when temporal or spatial requirements are overly restrictive in low-altitude environments.
These challenges motivate a unified framework that jointly addresses specification translation and feasibility repair, thereby ensuring both semantic fidelity and physical executability for safe and adaptable low-altitude UAV navigation.

B. Related Works

In recent years, NL-guided navigation for UAVs has attracted increasing attention. Compared with ground or indoor navigation, low-altitude UAV scenarios involve 3D motion, varying flight altitudes, and more complex spatial relations, which substantially increases the difficulty of both language understanding and navigation execution [11]. Early studies mainly adopted hand-crafted grammatical rules or keyword-based mappings to convert NL commands into predefined flight actions or waypoint sequences [12], [13]. Although these methods offer interpretability and engineering controllability, their expressive capacity is limited, making them inadequate for open-vocabulary or compositional instructions and leading to poor generalization under environmental variations [14].

Subsequently, deep learning techniques were widely introduced, leveraging imitation learning or reinforcement learning to jointly model NL, perception, and action spaces. This line of research gave rise to NL-guided navigation and visual language navigation (VLN) frameworks [11], [15]–[18]. Nevertheless, their core paradigm remains task-specific policy learning, with limited capacity for the explicit instruction decomposition and plan-level reasoning needed for open-ended, compositional commands.

More recently, LLM-enabled methods have pushed NL navigation toward stronger semantic understanding and flexible task execution. By supporting high-level reasoning, task decomposition, and open-vocabulary goal grounding, LLMs enable UAVs to follow longer, freer-form instructions and improve semantic adaptability [19]–[22].
Nevertheless, many existing studies are evaluated in open simulation settings without explicitly modeling low-altitude airspace regulations, making it hard to formally verify whether a generated flight plan violates rule constraints in safety-critical 3D airspaces [23].

To address these limitations, several recent studies have begun incorporating formal methods into NL-guided navigation frameworks [24]–[26]. Temporal logics such as linear temporal logic (LTL) and STL have been used to explicitly encode task objectives, safety constraints, and temporal requirements, which are subsequently enforced through planning algorithms. A key challenge in these frameworks is translating NL instructions into well-formed temporal-logic specifications. Early research often assumed structured or controlled language to simplify NL-to-TL mapping [27], and some navigation-oriented approaches relied on manual or semi-automatic translation procedures to construct formulas specifying visitation order, obstacle avoidance behaviors, or timing constraints [24]. Learning-based semantic parsing methods have subsequently been explored in several domains and have been shown to be effective in mapping NL to formal specifications [28]–[30], but they typically require substantial annotated data and may generalize poorly to complex instructions that involve implicit reasoning. More recently, LLM-based methods have been explored to directly generate LTL or STL specifications from free-form NL [25], [26], [31]–[33], offering a potential way to improve task generalization beyond what is achievable with traditional supervised parsers.
Despite this progress, existing studies remain constrained by limited specification expressiveness, unstable language-to-logic mappings sensitive to prompts and context, and the fact that current LLM-based generation can also generalize poorly on complex instructions with implicit reasoning [34], [35].

Moreover, the aforementioned methods often presume that human-provided NL instructions are correct and can be faithfully captured by a corresponding formal specification. In practice, the stated intent may conflict with UAV dynamical feasibility limits or low-altitude airspace constraints, rendering the synthesized plan infeasible. Once an STL specification is obtained, feasibility restoration has been extensively studied in the formal methods literature, where most approaches keep the STL structure fixed and focus on minimal parameter-level repairs. These methods typically analyze mixed-integer linear programming (MILP) encodings to identify irreducibly infeasible subsystems (IIS) or unsatisfiable cores, and restore feasibility by relaxing temporal bounds or predicate thresholds through slack variables, weighted objectives, or least-violating formulations under restricted fragments [36]–[40]. While these optimization-driven techniques provide clear objectives and formal guarantees, they are largely language-agnostic: the decision of what to relax is typically determined by predefined costs or priorities, implicitly assuming that the original specification precisely reflects user intent. In language-grounded settings, infeasibility may instead stem from semantic ambiguity, underspecification, or NL-to-STL misinterpretation, in which case formally minimal relaxations can be semantically misaligned, such as weakening task-critical predicates when timing constraints are actually negotiable.
This motivates incorporating language-level reasoning into the repair loop to guide the selection among alternative repair directions, such as relaxing temporal constraints versus predicate conditions, while still rigorously enforcing non-negotiable safety constraints through formal planning and optimization.

C. Main Contributions

This paper aims to enable safe, reliable, and interpretable low-altitude UAV navigation from natural-language (NL) instructions by jointly addressing semantic grounding, formal specification, and motion-planning feasibility. The main contributions are summarized as follows.

• We develop an integrated navigation framework that converts NL instructions into STL specifications, detects planning infeasibility, and repairs the specifications based on solver feedback to restore feasibility. The framework tightly couples LLM-based semantic reasoning, STL specification translation and repair, and MILP-based motion planning within a closed-loop architecture, enabling safe and executable low-altitude UAV navigation.

• We propose an LLM-based translation method that maps NL instructions into STL specifications. The method integrates supervised chain-of-thought (CoT) alignment with group-relative policy optimization (GRPO) reinforcement learning. This pipeline improves the generation of syntactically well-formed STL specifications and increases exact-match NL-to-STL translation accuracy under strict canonical normalization.

• We introduce an LLM-assisted, systematic STL repair mechanism to handle infeasible planning instances. Diagnostic information from the MILP solver is mapped back to specific STL subformulas, and the LLM is leveraged to reason about semantic intent and prioritize repair directions, enabling selective relaxation of non-safety-critical requirements while rigorously preserving hard safety constraints.
• We validate the proposed approach through extensive simulations and real-world UAV experiments. Experimental results show that, compared with traditional NL-to-STL translation models, the proposed approach achieves higher translation accuracy while using a smaller model. In addition, the results demonstrate that the proposed closed-loop framework can generate dynamically feasible trajectories and recover from infeasible specifications.

The remainder of this paper is organized as follows. Section II formulates the problem and reviews STL preliminaries. Section III presents the overall framework. Sections IV–V describe the NL-to-STL translation and the MILP-based planning and repair modules. Section VI reports experimental results, and Section VII concludes.

II. PRELIMINARIES AND PROBLEM FORMULATION

In this section, we first describe the UAV dynamics and the environment representation. We then recall STL, which serves as a formal specification layer bridging NL instructions and trajectory planning. Finally, we state the UAV NL navigation problem considered.

A. UAV Dynamics and Environment Model

We consider a UAV operating in a bounded workspace $\mathcal{W} \subset \mathbb{R}^3$ with obstacles and no-fly zones. The UAV state at discrete time step $k$ is defined as $x_k = \begin{bmatrix} p_k^\top & v_k^\top \end{bmatrix}^\top$, where $p_k \in \mathbb{R}^3$ and $v_k \in \mathbb{R}^3$ denote the position and velocity of the UAV, respectively. The control input $u_k \in \mathbb{R}^3$ corresponds to an acceleration command.

We adopt a discrete-time linear dynamics model with sampling time $\Delta t$:

$$x_{k+1} = A x_k + B u_k, \tag{1}$$

where

$$A = \begin{bmatrix} I & \Delta t\, I \\ 0 & I \end{bmatrix}, \qquad B = \begin{bmatrix} \tfrac{1}{2}\Delta t^2\, I \\ \Delta t\, I \end{bmatrix}. \tag{2}$$

Here, $I$ denotes the identity matrix of appropriate dimension and $0$ denotes a zero matrix.
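The block structure of $A$ and $B$ in (2) means each axis evolves as an independent double integrator. The following is a minimal sketch of one step and a rollout of (1) in plain Python; the function names `step` and `rollout` and the tuple-based state layout are illustrative assumptions, not part of the paper.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def step(p: Vec3, v: Vec3, u: Vec3, dt: float) -> Tuple[Vec3, Vec3]:
    """One step of x_{k+1} = A x_k + B u_k, written out per axis.

    Position row of (2): p + dt * v + 0.5 * dt^2 * u
    Velocity row of (2): v + dt * u
    """
    p_next = tuple(pi + dt * vi + 0.5 * dt * dt * ui for pi, vi, ui in zip(p, v, u))
    v_next = tuple(vi + dt * ui for vi, ui in zip(v, u))
    return p_next, v_next


def rollout(p0: Vec3, v0: Vec3, controls: Sequence[Vec3], dt: float) -> List[Tuple[Vec3, Vec3]]:
    """Apply a control sequence u_0..u_{H-1} and return the states x_0..x_H."""
    traj = [(p0, v0)]
    for u in controls:
        p, v = traj[-1]
        traj.append(step(p, v, u, dt))
    return traj
```

For instance, with $\Delta t = 0.5$, starting at the origin with velocity $(1, 0, 0)$ and applying acceleration $(0, 0, 2)$ moves the UAV to $(0.5, 0, 0.25)$ with velocity $(1, 0, 1)$.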
State and control constraints are imposed as:

$$x_k \in \mathcal{X}, \qquad u_k \in \mathcal{U}, \tag{3}$$

where $\mathcal{X}$ and $\mathcal{U}$ are convex polyhedral sets encoding workspace bounds, velocity limits, and actuation constraints.

The environment contains a set of labeled regions $\{\mathcal{R}_i\}_{i=1}^{M}$ that represent task-relevant areas in the workspace. Obstacles and no-fly zones are modeled as forbidden regions that the UAV must avoid at every time step $k$.

B. Signal Temporal Logic Specifications

STL has been extensively used to specify and verify temporal properties of dynamical systems. In this work, STL formulas are interpreted over discrete-time UAV state trajectories $x_{0:H} = \{x_0, x_1, \ldots, x_H\}$, where $H \in \mathbb{N}$ denotes the planning horizon. An STL formula $\varphi$ is defined recursively as:

$$\varphi ::= \top \mid \mu \mid \neg\varphi \mid \varphi_1 \wedge \varphi_2 \mid \varphi_1 \vee \varphi_2 \mid \mathbf{G}_{[a,b]}\varphi \mid \mathbf{F}_{[a,b]}\varphi \mid \varphi_1 \mathbf{U}_{[a,b]} \varphi_2, \tag{4}$$

where $a, b \in \mathbb{N}$ with $a \le b$ denote discrete-time bounds. Here, the temporal operators $\mathbf{F}$, $\mathbf{G}$, and $\mathbf{U}$ correspond to the eventually, always, and until operators, respectively, and $\varphi_1$ and $\varphi_2$ denote arbitrary STL subformulas. The atomic predicate $\mu$ is defined as an inequality over the system state:

$$\mu(x_k) = g(x_k) \ge 0, \tag{5}$$

where $g(\cdot)$ is an affine function of the system state. Such predicates can encode region membership, obstacle clearance, no-fly zones, kinematic limits, and other safety envelopes.
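As an illustration of how region membership reduces to affine predicates of the form (5), the sketch below assumes an axis-aligned box region and a six-dimensional state ordered as $(p_x, p_y, p_z, v_x, v_y, v_z)$. The helper names and the box shape are assumptions made for this example; the paper only requires $g(\cdot)$ to be affine.

```python
def affine_predicate(a, b):
    """Return g(x) = a . x + b, so that the predicate holds when g(x) >= 0."""
    def g(x):
        return sum(ai * xi for ai, xi in zip(a, x)) + b
    return g


def box_predicates(lo, hi):
    """Six affine predicates whose conjunction means lo <= p <= hi on each axis."""
    preds = []
    for axis in range(3):
        a_lower = [0.0] * 6
        a_lower[axis] = 1.0
        preds.append(affine_predicate(a_lower, -lo[axis]))  # p_axis - lo >= 0
        a_upper = [0.0] * 6
        a_upper[axis] = -1.0
        preds.append(affine_predicate(a_upper, hi[axis]))   # hi - p_axis >= 0
    return preds


def in_region(x, preds):
    """Boolean membership: every affine predicate is non-negative at state x."""
    return all(g(x) >= 0 for g in preds)
```

Under this construction, "inside region A" is a conjunction of six predicates, while "outside a no-fly box" is the negation of such a conjunction, that is, a disjunction of negated predicates.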
The STL semantics is recursively defined as follows [10]:

$$(x_{0:H}, k) \models \mu \iff g(x_k) \ge 0, \tag{6}$$

$$(x_{0:H}, k) \models \neg\varphi \iff (x_{0:H}, k) \not\models \varphi, \tag{7}$$

$$(x_{0:H}, k) \models \varphi_1 \wedge \varphi_2 \iff (x_{0:H}, k) \models \varphi_1 \ \wedge\ (x_{0:H}, k) \models \varphi_2, \tag{8}$$

$$(x_{0:H}, k) \models \varphi_1 \vee \varphi_2 \iff (x_{0:H}, k) \models \varphi_1 \ \vee\ (x_{0:H}, k) \models \varphi_2, \tag{9}$$

$$(x_{0:H}, k) \models \mathbf{F}_{[a,b]}\varphi \iff \exists k' \in [k+a, k+b] : (x_{0:H}, k') \models \varphi, \tag{10}$$

$$(x_{0:H}, k) \models \mathbf{G}_{[a,b]}\varphi \iff \forall k' \in [k+a, k+b] : (x_{0:H}, k') \models \varphi, \tag{11}$$

$$(x_{0:H}, k) \models \varphi_1 \mathbf{U}_{[a,b]} \varphi_2 \iff \exists k' \in [k+a, k+b] \ \text{s.t.}\ (x_{0:H}, k') \models \varphi_2 \ \wedge\ \forall k'' \in [k, k'] : (x_{0:H}, k'') \models \varphi_1. \tag{12}$$

Fig. 1. Overview of the proposed language-guided planning framework. A natural-language instruction is translated into an STL specification, which is enforced by a constrained planner. When infeasibility is detected, solver feedback is used to guide specification refinement and re-planning in a closed loop.

STL also defines a robust semantics by associating each formula $\varphi$ with a real-valued function $\rho^{\varphi}(x_{0:H}, k)$ such that $(x_{0:H}, k) \models \varphi$ if and only if $\rho^{\varphi}(x_{0:H}, k) \ge 0$. The magnitude of $\rho^{\varphi}$ can be interpreted as the margin by which $\varphi$ is satisfied or violated.
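To make the link between Boolean satisfaction and the robustness margin concrete, the sketch below evaluates $\rho$ on a finite trajectory with the usual min/max rules of quantitative STL semantics: negation flips the sign, conjunction and always take a minimum, disjunction and eventually take a maximum. The nested-tuple formula encoding and the function name `rho` are assumptions made for this example, and the evaluator assumes every time window stays inside the trajectory ($k + b \le H$).

```python
def rho(phi, traj, k=0):
    """Robustness of formula phi on trajectory traj at time step k.

    Assumed encoding: ("pred", g), ("not", f), ("and", f1, f2), ("or", f1, f2),
    ("G", a, b, f), ("F", a, b, f), ("U", a, b, f1, f2).
    """
    op = phi[0]
    if op == "pred":
        return phi[1](traj[k])
    if op == "not":
        return -rho(phi[1], traj, k)
    if op == "and":
        return min(rho(phi[1], traj, k), rho(phi[2], traj, k))
    if op == "or":
        return max(rho(phi[1], traj, k), rho(phi[2], traj, k))
    if op in ("G", "F"):
        _, a, b, sub = phi
        values = [rho(sub, traj, kp) for kp in range(k + a, k + b + 1)]
        return min(values) if op == "G" else max(values)
    if op == "U":
        _, a, b, left, right = phi
        return max(
            min(
                rho(right, traj, kp),
                min(rho(left, traj, kpp) for kpp in range(k, kp + 1)),
            )
            for kp in range(k + a, k + b + 1)
        )
    raise ValueError(f"unknown operator: {op}")


def satisfies(phi, traj, k=0):
    """Boolean satisfaction recovered from the sign of the robustness value."""
    return rho(phi, traj, k) >= 0
```

On the scalar trajectory `[0, 1, 2, 3, 4]` with the predicate $x - 3 \ge 0$, "eventually within $[0, 4]$" has robustness $1$, whereas "always within $[0, 4]$" has robustness $-3$ and is therefore violated.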
Following robust STL semantics in [41], the quantitative semantics is defined recursively as:

$$\rho^{\mu}(x_{0:H}, k) = g(x_k), \tag{13}$$

$$\rho^{\neg\varphi}(x_{0:H}, k) = -\rho^{\varphi}(x_{0:H}, k), \tag{14}$$

$$\rho^{\varphi_1 \wedge \varphi_2}(x_{0:H}, k) = \min\big(\rho^{\varphi_1}(x_{0:H}, k),\ \rho^{\varphi_2}(x_{0:H}, k)\big), \tag{15}$$

$$\rho^{\varphi_1 \vee \varphi_2}(x_{0:H}, k) = \max\big(\rho^{\varphi_1}(x_{0:H}, k),\ \rho^{\varphi_2}(x_{0:H}, k)\big), \tag{16}$$

$$\rho^{\mathbf{G}_{[a,b]}\varphi}(x_{0:H}, k) = \min_{k' \in [k+a,\, k+b]} \rho^{\varphi}(x_{0:H}, k'), \tag{17}$$

$$\rho^{\mathbf{F}_{[a,b]}\varphi}(x_{0:H}, k) = \max_{k' \in [k+a,\, k+b]} \rho^{\varphi}(x_{0:H}, k'), \tag{18}$$

$$\rho^{\varphi_1 \mathbf{U}_{[a,b]} \varphi_2}(x_{0:H}, k) = \max_{k' \in [k+a,\, k+b]} \min\Big(\rho^{\varphi_2}(x_{0:H}, k'),\ \min_{k'' \in [k,\, k']} \rho^{\varphi_1}(x_{0:H}, k'')\Big). \tag{19}$$

In this paper, STL provides a verifiable intermediate specification between NL instructions and optimization-based trajectory planning. Safety-critical requirements are enforced as hard STL constraints. Task-related objectives are expressed via temporal operators and may be selectively relaxed when the resulting STL-constrained planning problem is infeasible.

C. Problem Definition

Consider a UAV operating in a known environment with discrete-time dynamics and admissible state and control sets. Let $\mathcal{L}$ denote an NL instruction describing a navigation task. The instruction is interpreted as an STL specification:

$$\varphi = \mathcal{T}(\mathcal{L}), \tag{20}$$

where $\mathcal{T}(\cdot)$ denotes a translation model. Given the induced STL specification $\varphi$ and a finite planning horizon $H$, the objective is to compute a dynamically feasible state trajectory $x_{0:H}$ and control sequence $u_{0:H-1}$ such that:

$$x_{0:H} \models \varphi. \tag{21}$$

The resulting trajectory must satisfy the system dynamics, state and control constraints, and all safety-critical requirements encoded in $\varphi$. The overall NL navigation problem addressed in this paper can be summarized by the following relation:

$$(\mathcal{L}, H) \mapsto (\varphi,\ x_{0:H},\ u_{0:H-1}). \tag{22}$$

Due to ambiguity in natural language or conflicts among temporal and spatial task requirements, the STL-constrained planning problem induced by $\varphi$ may be infeasible. In such cases, the goal is to restore feasibility by minimally modifying task-related components of the specification while strictly preserving all safety-critical constraints.

III. SYSTEM OVERVIEW AND FRAMEWORK

Fig. 1 shows the proposed framework for low-altitude UAV navigation from an NL instruction $\mathcal{L}$. The system translates $\mathcal{L}$ into a verifiable STL specification, plans a dynamically feasible trajectory under STL and physical constraints, and triggers specification repair when the planning problem is infeasible.

Reasoning-Enhanced NL-STL Translator. Given $\mathcal{L}$, the translator uses a structured prompt and a reasoning-enhanced LLM to generate an STL specification $\varphi$. The model is trained on a synthetic NL-STL dataset augmented with CoT reasoning traces, and is further fine-tuned using parameter-efficient LoRA and GRPO to improve STL syntactic validity and semantic consistency.

STL-Constrained Trajectory Planner. The planner converts $\varphi$ into a MILP by introducing binary satisfaction variables for STL subformulas and a robustness margin variable. The MILP also incorporates the environment model and UAV dynamics with state/control limits. If the MILP is feasible, the solver returns a trajectory for execution; otherwise, it outputs an infeasibility status that activates repair.

Specification Repair. Upon infeasibility, the system extracts an IIS and maps conflicting MILP constraints back to STL subformulas and time indices to form a conflict tuple. Using the original NL instruction, the unrepaired STL, and the conflict tuple, an LLM selects a repair mode for each conflict, either temporal relaxation or predicate relaxation, while safety constraints are never relaxed.
The selected mode is implemented via temporal or predicate slack variables with penalties, and the repaired STL is reconstructed and sent back to the planner for re-solving.

Overall, the framework realizes $\mathcal{L} \rightarrow \varphi \rightarrow (x_{0:H}, u_{0:H-1})$ with automatic diagnosis-and-repair in the loop, enabling safe and executable UAV navigation while keeping repairs interpretable and minimally task-degrading.

IV. REASONING-ENHANCED NL TO STL TRANSLATION

This section presents a reasoning-enhanced pipeline for translating NL instructions into STL specifications. We construct training data by augmenting an existing NL-to-STL dataset with explicit reasoning traces. A CoT data generation pipeline generates intermediate structured reasoning to connect each instruction to its target STL formula, forming an NL-CoT-STL corpus. We then use this corpus for cold-start supervised fine-tuning and apply GRPO reinforcement learning to further improve STL syntactic correctness and semantic consistency. The overall training pipeline, including cold-start SFT and GRPO-based reinforcement learning, is illustrated in Fig. 2.

Fig. 2. GRPO-based RL framework for reasoning-enhanced NL-to-STL generation.

A. Synthetic Dataset and CoT Data Generation

To improve NL-to-STL translation, we build a synthetic training corpus based on NL2TL [39]. We adopt NL2TL as the base dataset and collect 20K NL-STL pairs from it.
While NL2TL provides aligned NL instructions and STL formulas, the intermediate semantic decomposition from NL to STL is missing, making it difficult for the model to learn how linguistic cues are mapped to STL operators and temporal constraints.

To address this issue, each NL-STL pair is augmented with an intermediate CoT, which serves as a structured bridge between the NL instruction and the corresponding STL specification. The CoT captures the essential semantic decomposition steps required for temporal logic construction, transforming each original NL-STL pair into an NL-CoT-STL triplet. The overall data generation pipeline is illustrated in Fig. 3.

The CoT annotations are generated using DeepSeek-V3.1 [42]. Given an NL instruction and its ground-truth STL formula, the model reconstructs the key reasoning steps required for temporal logic construction, including predicate identification, temporal bound extraction, operator selection, and formula composition. These CoT traces serve as an explicit intermediate representation that exposes the semantic structure underlying the NL-to-STL mapping.

Applying this pipeline to the selected NL2TL samples yields a synthetic dataset with explicit supervision over both reasoning and final STL outputs. This dataset forms the basis for the subsequent cold-start supervised fine-tuning stage and improves the structural robustness and semantic consistency of downstream STL generation.

Fig. 3. CoT Data Engine for augmenting NL2TL with intermediate reasoning traces. Example shown in the figure: NL: "Within the first 62 to 88 time units, signal_1_n shall be consistently equal to 29.3." STL: G[62:88](signal_1_n==29.3). Generated CoT: "The requirement specifies a time window from 62 to 88 time units, which corresponds to the bounded always operator G[62:88]. The condition that signal_1_n must be consistently equal to 29.3 throughout this interval is expressed as signal_1_n==29.3." Prompt shown in the figure (verbatim): "Now, I give you the natual language and STL for it, can you give me several sentences to conclude why the nl transfor to this STL."

B. Cold Start Stage

Recent advances in reinforcement learning for reasoning, such as DeepSeek-R1 [43], indicate that policy optimization can improve long-horizon reasoning. However, NL-to-STL translation requires both explicit CoT reasoning and strict adherence to STL syntax. Acquiring these capabilities purely through end-to-end reinforcement learning from scratch is often unstable, due to rigid syntactic constraints and sparse task-level rewards. Therefore, we introduce a cold-start supervised fine-tuning (SFT) stage to initialize the model before GRPO training.

The cold-start stage targets format alignment and reasoning induction. Given the rigidity of STL syntax and the scarcity of temporal-logic structures in general pre-training corpora, prompt-based approaches alone are insufficient to guarantee consistently well-formed outputs. We fine-tune the model on the synthetic NL-CoT-STL dataset constructed in the previous section, where each sample follows a structured output pattern: the intermediate reasoning is enclosed within <think> tags, and the final STL specification is enclosed within <answer> tags. We optimize the model using a standard maximum-likelihood objective, and implement SFT via parameter-efficient fine-tuning with Low-Rank Adaptation (LoRA) [44] to reduce memory and computational overhead while preserving the base model's general language capability.

This cold-start initialization encourages the model to produce coherent reasoning traces and syntactically valid STL formulas in a stable and controllable manner.
By reducing the search space and variance encountered in subsequent reinforcement learning, it enables the GRPO optimization stage to focus on improving global structural validity and semantic fidelity, rather than correcting low-level formatting errors.

C. GRPO Reinforcement Learning

Although SFT enables the model to imitate correct outputs, it optimizes token-level likelihood rather than global structural validity. For structured outputs such as STL formulas, minor token errors can invalidate the entire formula.

Reinforcement learning addresses this issue by directly optimizing task-level rewards. While Proximal Policy Optimization (PPO) [45] and Group-based Proximal Policy Optimization (GPPO) [46] are widely adopted RL methods for LLMs, they require an additional value network and often suffer from high computational cost and training instability. To overcome these limitations, we adopt GRPO [42]. GRPO eliminates the need for a value network by estimating the baseline using group-level reward statistics.

We explicitly structure the model output into a reasoning process enclosed by <think> tags and a final STL formula enclosed by <answer> tags. The overall RL pipeline follows three stages: policy sampling, reward computation, and policy update.

Given an input instruction, we sample a group of $G$ candidate outputs $\{O_i\}_{i=1}^{G}$ from the current policy $\pi_{\theta_{\text{old}}}$ with a relatively high temperature coefficient $\tau$ to encourage exploration and output diversity. For each sampled output $O_i$, we compute a composite reward by summing four components:

$$R_i = R_{\text{CoT format}} + R_{\text{CoT length}} + R_{\text{STL syntax}} + R_{\text{STL correct}}. \tag{23}$$

Each reward term is computed from the sampled output $O_i$.
To encourage the structural integrity of the CoT format, we define the CoT format reward with three cases:

$$R_{\text{CoT format}} = \begin{cases} +k_1, & \text{if both } \texttt{<think>} \text{ and } \texttt{</think>} \text{ exist}, \\ +k_2, & \text{if exactly one exists}, \\ +k_3, & \text{if neither exists}. \end{cases} \tag{24}$$

To encourage sufficient reasoning while penalizing verbosity, we define the CoT length reward:

$$R_{\text{CoT length}} = k_4 \cdot \min\big(L_{\text{CoT}},\ L_{\max}\big), \tag{25}$$

where $L_{\text{CoT}}$ is the number of tokens inside the <think> span and $L_{\max}$ is the maximum allowed CoT length.

To enforce syntactic validity of the generated STL, we define the STL syntax reward:

$$R_{\text{STL syntax}} = \begin{cases} +k_5, & \text{if STL syntax is valid}, \\ -k_5, & \text{otherwise}, \end{cases} \tag{26}$$

where validity requires that all variables belong to the predefined set and that all temporal operators are structurally complete.

To provide a dense similarity-based signal, we define a BLEU-based STL correctness reward:

$$R_{\text{STL correct}} = k_6 \cdot \text{BLEU}, \tag{27}$$

where BLEU [47] is computed by combining $n$-gram precision and a brevity penalty:

$$\text{BLEU} = \text{BP} \cdot \exp\left(\sum_{n=1}^{N} w_n \log p_n\right). \tag{28}$$

Here, $p_n$ is the modified $n$-gram precision of order $n$ between the generated and reference STL sequences, $w_n$ is the corresponding weight with $\sum_{n=1}^{N} w_n = 1$, and BP is the brevity penalty determined by the lengths of the generated and reference sequences. We use $\{k_j\}_{j=1}^{6}$ as scalar reward coefficients to balance the contributions of the reward terms.

After reward computation, we perform the policy update using GRPO. For each input, the rewards $R = \{R_i\}_{i=1}^{G}$ of the sampled group are normalized relative to the group mean and standard deviation, yielding the group-relative advantage:

$$\hat{A}_{i,t} = \frac{R_i - \text{mean}(R)}{\text{std}(R)}, \tag{29}$$

where $t \in \{1, \ldots, |O_i|\}$ indexes tokens in the generated trajectory $O_i$. The same group-relative advantage $\hat{A}_{i,t}$ is assigned to all tokens in the trajectory $O_i$.
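The reward composition (23)–(27) and the group normalization (29) can be sketched as follows. The coefficient values in `K`, the cap `L_MAX`, the whitespace token count, the use of the population standard deviation, and the small `eps` in the denominator are all assumptions made for this illustration. The syntax check and BLEU score are taken as inputs rather than implemented, since the paper's parser and BLEU configuration are not given in this section.

```python
import re
from statistics import mean, pstdev

# Hypothetical coefficients k1..k6 and length cap; not values from the paper.
K = {"k1": 1.0, "k2": 0.0, "k3": -1.0, "k4": 0.001, "k5": 1.0, "k6": 1.0}
L_MAX = 256


def cot_format_reward(text):
    """Eq. (24): depends on how many of the two CoT tags are present."""
    present = ("<think>" in text) + ("</think>" in text)
    return {2: K["k1"], 1: K["k2"], 0: K["k3"]}[present]


def cot_length_reward(text):
    """Eq. (25): k4 * min(L_CoT, L_max), with tokens approximated by whitespace split."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.S)
    n_tokens = len(match.group(1).split()) if match else 0
    return K["k4"] * min(n_tokens, L_MAX)


def composite_reward(text, syntax_ok, bleu):
    """Eq. (23): sum of the four terms; syntax_ok and bleu come from the caller."""
    r_syntax = K["k5"] if syntax_ok else -K["k5"]
    return cot_format_reward(text) + cot_length_reward(text) + r_syntax + K["k6"] * bleu


def group_advantages(rewards, eps=1e-8):
    """Eq. (29): normalize rewards within one sampled group (eps avoids division by zero)."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Because the advantage is computed relative to the group, an output is reinforced only when it scores above its siblings sampled for the same instruction; a group with identical rewards yields zero advantage for every member.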
The GRPO objective combines a clipped policy update with KL regularization against a reference policy:

$$\mathcal{L}_{\mathrm{GRPO}}(\theta) = \mathbb{E}\left[\frac{1}{G}\sum_{i=1}^{G}\frac{1}{|O_i|}\sum_{t=1}^{|O_i|} \mathcal{L}^{(i,t)}_{\mathrm{CLIP}} - \beta\, D_{\mathrm{KL}}\left[\pi_\theta \,\|\, \pi_{\mathrm{ref}}\right]\right]. \tag{30}$$

For each token position $t$ in the trajectory $O_i$, we compute the importance ratio:

$$r_{i,t}(\theta) = \frac{\pi_\theta(a_{i,t} \mid s_{i,t})}{\pi_{\theta_{\mathrm{old}}}(a_{i,t} \mid s_{i,t})}, \tag{31}$$

where $a_{i,t}$ denotes the $t$-th generated token in trajectory $O_i$, and $s_{i,t}$ denotes the corresponding generation context, consisting of the input and the previously generated tokens. We then define the clipped surrogate objective:

$$\mathcal{L}^{(i,t)}_{\mathrm{CLIP}} = \min\left(r_{i,t}(\theta)\,\hat{A}_{i,t},\; \mathrm{clip}\left(r_{i,t}(\theta), 1-\epsilon, 1+\epsilon\right)\hat{A}_{i,t}\right). \tag{32}$$

The policy parameters are then updated by gradient ascent on $\mathcal{L}_{\mathrm{GRPO}}(\theta)$.

The complete GRPO training procedure, including the cold-start initialization and main reinforcement learning phases, is summarized in Algorithm 1.

Algorithm 1 GRPO Training for NL-to-STL Translator with Chain-of-Thought
1: Phase 0: Cold Start
2: Build $\mathcal{D}$ of NL-CoT-STL triplets with reasoning and STL tags; SFT to obtain $\pi^{(0)}_\theta$;
3: Initialize $\pi_\theta \leftarrow \pi^{(0)}_\theta$ and reference policy $\pi_{\mathrm{ref}} \leftarrow \pi^{(0)}_\theta$;
4: Phase 1: GRPO Main Training
5: for $k = 1$ to $K$ do
6:   Set $\pi_{\theta_{\mathrm{old}}} \leftarrow \pi_\theta$;
7:   Sample a group $\{O_i\}_{i=1}^{G}$ from $\pi_{\theta_{\mathrm{old}}}(\cdot \mid x)$ with temperature $\tau$;
8:   Compute rewards $\{R_i\}_{i=1}^{G}$, where $R_i = R^{\mathrm{CoT}}_{\mathrm{format}} + R^{\mathrm{CoT}}_{\mathrm{length}} + R^{\mathrm{STL}}_{\mathrm{syntax}} + R^{\mathrm{STL}}_{\mathrm{correct}}$;
9:   Compute $\mu = \mathrm{mean}(\{R_i\})$, $\sigma = \mathrm{std}(\{R_i\})$, and set $\hat{A}_{i,t} = (R_i - \mu)/(\sigma + \varepsilon)$ for all $t = 1, \ldots, |O_i|$;
10:   Compute $\mathcal{L}_{\mathrm{GRPO}}(\theta) = \frac{1}{G}\sum_{i=1}^{G}\frac{1}{|O_i|}\sum_{t=1}^{|O_i|} \mathcal{L}^{(i,t)}_{\mathrm{CLIP}} - \beta\, D_{\mathrm{KL}}[\pi_\theta \,\|\, \pi_{\mathrm{ref}}]$, where
11:   $r_{i,t}(\theta) = \pi_\theta(a_{i,t} \mid s_{i,t}) / \pi_{\theta_{\mathrm{old}}}(a_{i,t} \mid s_{i,t})$ and
12:   $\mathcal{L}^{(i,t)}_{\mathrm{CLIP}} = \min\left(r_{i,t}(\theta)\,\hat{A}_{i,t},\; \mathrm{clip}(r_{i,t}(\theta), 1-\epsilon, 1+\epsilon)\,\hat{A}_{i,t}\right)$;
13:   Update by gradient ascent: $\theta \leftarrow \theta + \eta \nabla_\theta \mathcal{L}_{\mathrm{GRPO}}(\theta)$;
14: end for

V. STL-CONSTRAINED TRAJECTORY PLANNING AND SPECIFICATION REPAIR

Given the discrete-time UAV dynamics and the STL specification generated from NL instructions, we formulate the trajectory generation problem as a constrained planning problem over a finite horizon. This section presents a unified formulation that integrates STL-constrained trajectory planning with feasibility diagnosis and specification repair.

A. MILP Encoding of STL Satisfaction

Given the discrete-time UAV dynamics and the STL specification $\varphi$ translated from the NL instruction by the LLM, we encode satisfaction of $\varphi$ over a finite planning horizon $H$ using a MILP formulation. For each STL subformula $\psi$ and discrete time step $k$, we introduce a binary auxiliary variable $z_{\psi,k} \in \{0, 1\}$ indicating whether $\psi$ is satisfied at time $k$.

The atomic predicate $\mu$ is defined as an inequality over the system state $x_k$ of the form $\mu(x_k) \geq 0$. The implication between the binary variable and predicate satisfaction is encoded using Big-$M$ constraints as

$$\mu(x_k) + (1 - z_{\mu,k})M \geq \gamma, \qquad \mu(x_k) - z_{\mu,k}M \leq \gamma, \tag{33}$$

where $M > 0$ is a sufficiently large constant and $\gamma \geq 0$ is a global robustness margin variable. Unlike the quantitative STL robustness $\rho^{\varphi}(x_{0:H}, k)$ defined in Section II-B, which evaluates the satisfaction margin of a formula on a given trajectory, $\gamma$ is an optimization variable that enforces a uniform lower bound on predicate satisfaction across the entire trajectory.

Boolean compositions of STL formulas are encoded recursively. For a conjunction $\psi = \bigwedge_i \psi_i$, we impose:

$$z_{\psi,k} \leq z_{\psi_i,\,k+\Delta_i}, \quad \forall i, \tag{34}$$

and for a disjunction $\psi = \bigvee_i \psi_i$:

$$z_{\psi,k} \leq \sum_i z_{\psi_i,\,k+\Delta_i}, \tag{35}$$

where $\Delta_i$ denotes the relative time offset induced by the syntax tree of $\psi$.
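The logic of constraints (33)–(35) can be checked by brute force on small cases. The sketch below is only an illustration under assumed numerical values, not part of the planner: it enumerates binary assignments and confirms that the parent variable can be set to 1 only when the Boolean semantics allow it. The encodings are one-sided (they never force a parent to 1), which suffices because the root variable is later fixed to 1.

```python
from itertools import product


def predicate_ok(mu_value, z, gamma, big_m):
    """Eq. (33): both Big-M inequalities for one predicate at one time step."""
    return (mu_value + (1 - z) * big_m >= gamma) and (mu_value - z * big_m <= gamma)


def conj_ok(z, children):
    """Eq. (34): z <= z_i for every child i."""
    return all(z <= c for c in children)


def disj_ok(z, children):
    """Eq. (35): z <= sum of the child variables."""
    return z <= sum(children)


def max_parent(check, children):
    """Largest z in {0, 1} that the constraint admits for fixed child values."""
    return max(z for z in (0, 1) if check(z, children))


# Exhaustive check over three children: the admissible maximum equals AND / OR.
for children in product((0, 1), repeat=3):
    assert max_parent(conj_ok, children) == int(all(children))
    assert max_parent(disj_ok, children) == int(any(children))
```

For the predicate constraint, a margin above $\gamma$ admits only $z_{\mu,k} = 1$ and a margin below $\gamma$ admits only $z_{\mu,k} = 0$, provided $M$ exceeds the range of $\mu(x_k) - \gamma$.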
Temporal operators are unfolded over their discrete-time intervals according to the STL semantics defined in Section II-B. Specifically, $G_{[a,b]}$ operators are expanded as conjunctions over $\{k+a, \ldots, k+b\}$, while $F_{[a,b]}$ operators are expanded as disjunctions over the same interval. The until operator $U_{[a,b]}$ is handled analogously using its standard Boolean expansion. All resulting Boolean constraints are encoded using Equations (34)–(35).

Satisfaction of the overall STL specification is enforced by introducing a root variable $z_{\varphi,0}$ and requiring:

$$z_{\varphi,0} = 1, \qquad \gamma \geq 0. \tag{36}$$

Any feasible solution to the resulting MILP therefore corresponds to a trajectory $x_{0:H}$ that satisfies $\varphi$.

B. STL-Constrained Optimization Formulation

Combining the system dynamics, state and control constraints, and the MILP encoding of STL satisfaction, we obtain the following mixed-integer optimization problem over the horizon $H$:

$$\begin{aligned} \min_{x_{0:H},\, u_{0:H-1},\, z,\, \gamma} \quad & -\gamma + \sum_{k=0}^{H-1}\left(x_k^{\top} Q x_k + u_k^{\top} R u_k\right) && \text{(P1)} \\ \text{s.t.} \quad & x_0 = x_{\mathrm{fixed}}, && \text{(37a)} \\ & x_{k+1} = A x_k + B u_k, \quad \forall k = 0, \ldots, H-1, && \text{(37b)} \\ & x_k \in \mathcal{X},\; u_k \in \mathcal{U}, \quad \forall k = 0, \ldots, H-1, && \text{(37c)} \\ & \text{(33)–(36)}. \end{aligned}$$

The objective in Problem (P1) seeks a trajectory that maximizes the global robustness margin $\gamma$ while penalizing state deviation and control effort through quadratic regularization. Feasibility with $\gamma \geq 0$ guarantees satisfaction of the STL specification, and larger values of $\gamma$ correspond to increased robustness against perturbations.

C. Feasibility Certification and IIS Extraction

If the solver declares Problem (P1) infeasible, we perform feasibility diagnosis using an IIS, which identifies a minimal set of constraints and variable bounds that cannot be satisfied simultaneously. We adopt the IIS-based diagnosis procedure in [36] to localize infeasibility at the specification level.
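To make the notion of an IIS concrete, the following minimal sketch applies a deletion filter, a generic textbook technique, to a toy set of interval constraints on a single scalar (for instance, an arrival time). It is not the procedure of [36] nor any solver's implementation, and the constraint names and numbers are hypothetical.

```python
def feasible(constraints):
    """Each constraint is (name, lo, hi) on one scalar; feasible iff the intervals intersect."""
    lo = max((c[1] for c in constraints), default=float("-inf"))
    hi = min((c[2] for c in constraints), default=float("inf"))
    return lo <= hi


def deletion_filter(constraints):
    """Return an irreducible infeasible subset of an infeasible constraint list."""
    if feasible(constraints):
        raise ValueError("the input constraint set must be infeasible")
    kept = list(constraints)
    for c in constraints:
        trial = [k for k in kept if k[0] != c[0]]
        if not feasible(trial):  # still infeasible without c, so c is not needed
            kept = trial
    return kept


# Hypothetical constraints on the arrival time t (seconds).
toy = [
    ("deadline", float("-inf"), 20.0),        # t <= 20
    ("min_travel_time", 22.5, float("inf")),  # t >= 22.5, implied by the dynamics
    ("horizon", 0.0, 60.0),                   # 0 <= t <= 60
]
iis_names = [c[0] for c in deletion_filter(toy)]
```

In this toy case the filter discards the horizon bound and keeps the deadline together with the dynamics-implied lower bound; removing either of the two remaining constraints restores feasibility, which is the defining property of an IIS.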
To enable semantic interpretation of solver feedback, each STL-induced constraint in the MILP is associated with a traceability record that links it to the originating STL subformula and time index. Using this mapping, the constraints contained in the IIS are projected back to the STL syntax tree and summarized as a set of infeasible atomic events, each characterized by an atomic predicate and its associated temporal context. This diagnosis result localizes the source of infeasibility in a form suitable for high-level reasoning and serves as structured input to the subsequent specification repair stage.

D. LLM-Guided Repair via Predicate-Temporal Choice

When the STL-constrained optimization problem is infeasible, our objective is to restore feasibility by minimally modifying task-related requirements while strictly preserving all safety-critical constraints. Unlike prior approaches that require the designer to manually specify which predicates or temporal parameters are eligible for repair, we delegate the choice of repair dimension to an LLM, which operates at the semantic level and does not directly manipulate numerical optimization variables.

For each infeasible atomic event identified by the IIS-based diagnosis in Section V-C, the LLM is provided with a structured input that combines symbolic, temporal, and semantic information. Specifically, the input consists of: (i) the original NL instruction from which the task specification was generated, (ii) the unrepaired STL specification $\varphi$, and (iii) a structured description of each diagnosed atomic event represented as a tuple $(\mu, \sigma, O, R)$, where $\mu$ denotes the atomic predicate inequality, $\sigma$ is its discrete-time support interval induced by the STL semantics, $O$ is the parent temporal operator in the STL syntax tree, and $R$ characterizes the semantic role of the predicate.
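One possible way to represent the tuple $(\mu, \sigma, O, R)$ and assemble the three-part input is sketched below. The field names, the role labels `"task"` and `"safety"`, and the JSON layout are hypothetical choices for illustration; they are not a description of our prompt format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AtomicEvent:
    predicate: str  # mu: the atomic predicate inequality, as text
    support: tuple  # sigma: discrete-time support interval (start, end)
    operator: str   # O: parent temporal operator in the STL syntax tree
    role: str       # R: semantic role of the predicate (hypothetical labels)


def build_repair_input(nl_instruction, stl_formula, events):
    """Combine the instruction, the unrepaired STL, and the negotiable events."""
    # Safety-critical predicates are never exposed to the repair interface.
    negotiable = [e for e in events if e.role != "safety"]
    payload = {
        "instruction": nl_instruction,
        "stl": stl_formula,
        "events": [asdict(e) for e in negotiable],
    }
    return json.dumps(payload, indent=2)


events = [
    AtomicEvent("inside(R_g)", (0, 20), "F[0,20]", "task"),
    AtomicEvent("not inside(R_o)", (0, 30), "G[0,T]", "safety"),
]
prompt_input = build_repair_input(
    "Always avoid the obstacle and reach the goal within 20 s.",
    "G[0,T] !R_o & F[0,20] R_g",
    events,
)
```

Filtering by role before serialization keeps the exclusion of safety-critical predicates a structural property of the interface rather than something the LLM is asked to respect.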
This representation allows the LLM to reason jointly over the original task intent, the formal specification structure, and the localized infeasibility information, without exposing solver-level variables or numerical relaxation parameters. Safety-critical predicates, such as obstacle avoidance and no-fly constraints, are explicitly excluded from this interface and are never considered for relaxation.

Given this input, the LLM outputs a binary decision selecting exactly one admissible repair mode for each infeasible atomic event: predicate relaxation or temporal relaxation. To preserve interpretability and a clear separation between spatial and temporal modifications, mixed repairs for a single atomic event are intentionally disallowed. The LLM does not propose relaxation magnitudes, threshold values, or modified temporal bounds; it only determines which repair dimension is permitted, while the quantitative extent of repair is determined entirely by the optimization layer.

The LLM's decisions are then realized at the numerical level through selective relaxation in the MILP. If predicate relaxation is selected for an atomic event, a nonnegative slack variable $s_{\mu,k} \geq 0$ is introduced to relax the corresponding Big-$M$ constraint:

$$\mu(x_k) + (1 - z_{\mu,k})M + s_{\mu,k} \geq \gamma. \tag{38}$$

If temporal relaxation is selected, nonnegative temporal slack variables $\tau_{\psi,k}$ are introduced to relax the Boolean constraints arising from temporal operator expansion. For atomic events deemed non-negotiable, no relaxation variables are introduced and the corresponding constraints remain unchanged.

After introducing the selected relaxations, we re-solve the optimization problem with an augmented objective that penalizes the total relaxation magnitude:

$$\min \quad -\gamma + \sum_{k}\left(x_k^{\top} Q x_k + u_k^{\top} R u_k\right) + \lambda_p \sum s_{\mu,k} + \lambda_t \sum \tau_{\psi,k}, \tag{39}$$

where $\lambda_p, \lambda_t > 0$ weight the relative cost of predicate and temporal relaxations. The optimized relaxation values provide quantitative guidance for constructing an explicit repaired STL specification $\tilde{\varphi}$: predicate relaxations are translated into adjusted numerical thresholds or geometric parameters, while temporal relaxations are mapped to modified temporal bounds. Finally, $\tilde{\varphi}$ is reconstructed without slack variables and re-encoded into a standard STL-constrained MILP; solving this problem yields a dynamically feasible trajectory that satisfies $\tilde{\varphi}$ exactly.

The introduction of the LLM does not modify the underlying IIS-based diagnosis or the optimization-based repair mechanism. For any fixed set of LLM repair-mode selections, feasibility and quantitative minimality are enforced by the MILP formulation, while the LLM influences only the admissible repair dimension for each diagnosed atomic event.

VI. EXPERIMENTAL EVALUATION

This section presents a comprehensive experimental evaluation of the proposed NL navigation framework. The experiments are designed to assess the effectiveness of the system at multiple levels, ranging from NL understanding and formal specification generation to closed-loop trajectory planning and execution. We first evaluate the performance of the proposed reasoning-enhanced NL-to-STL translation model in isolation, focusing on its syntactic correctness with respect to ground-truth temporal logic specifications. We then assess the full planning framework in simulation, where the translated STL specifications are integrated with the MILP-based trajectory planner.
These experiments evaluate the system's ability to generate dynamically feasible trajectories, enforce safety-critical constraints, and recover from infeasible specifications through selective repair. Finally, we demonstrate the applicability of the proposed approach in real-world experiments using a quadrotor UAV platform in a representative search-and-rescue task.

A. NL-to-STL Translation Experiments

This subsection evaluates the proposed reasoning-enhanced NL-to-STL translation model, with an emphasis on its ability to generate syntactically valid and semantically accurate STL specifications from NL instructions. The translator is built upon Qwen2.5-0.5B-Instruct and trained using the reasoning-enhanced pipeline described in Section IV.

All experiments are conducted on the NL2TL dataset [39], where 20K NL-STL pairs are used for training and an additional 4K samples are held out for testing. We compare against several representative baselines, including a Llama2-finetuned model [40], a T5-finetuned model [39], the original Qwen2.5-0.5B-Instruct model without task-specific adaptation, and a Qwen2.5-0.5B-Instruct model fine-tuned on NL-STL pairs without CoT supervision. All fine-tuned baselines are trained on the same data split and evaluated under identical decoding constraints.

Translation performance is measured using accuracy, defined as the exact-match rate between the generated STL specification and the ground-truth formula after canonical normalization of syntax and operators. This metric is intentionally strict, as any syntactic error, incorrect temporal bound, or logical mismatch renders the output incorrect, reflecting the requirement that generated specifications must be directly executable by downstream formal planners.

Fig. 4 shows the convergence of the proposed NL-to-STL translation model across training stages.
SFT denotes supervised fine-tuning on data without CoT. The accuracy improves steadily as the model learns the rigid STL grammar and its alignment with natural-language expressions, but the gains gradually saturate. The cold-start stage corresponds to supervised fine-tuning on data with CoT, which yields a clear jump in accuracy, suggesting that CoT provides useful intermediate reasoning signals that improve structural correctness. After introducing GRPO, the model achieves further improvements with a smoother, more stable trajectory, indicating that reward-based optimization can better correct sequence-level and global structural errors that are difficult to eliminate using token-level likelihood training alone.

Fig. 4. NL-to-STL translation accuracy during SFT and GRPO training. (Horizontal axis: step; vertical axis: accuracy; curves: SFT, Cold Start, GRPO.)

TABLE I
NL-TO-STL TRANSLATION PERFORMANCE COMPARISON.

Method                                       Model size   Accuracy (%)
Llama2-finetuned                             13B          94.8
T5-finetuned                                 220M         93.1
Qwen2.5-0.5B (prompt-only)                   0.5B         45.2
Qwen2.5-0.5B-finetuned w/o CoT               0.5B         93.3
Qwen2.5-0.5B-finetuned w/ CoT (Proposed)     0.5B         98.0

Quantitative results are summarized in Table I. The prompt-only Qwen2.5-0.5B-Instruct model achieves an accuracy of only 45.2%, highlighting the difficulty of reliably generating well-formed STL specifications without task-specific adaptation. Supervised fine-tuning without CoT already yields a substantial improvement, raising accuracy to 93.3%, which is comparable to or slightly better than the T5-finetuned baseline. The Llama2-finetuned model achieves 94.8% accuracy but relies on a significantly larger parameter count.

By contrast, the proposed Qwen2.5-0.5B-Instruct model fine-tuned with CoT supervision and GRPO achieves an accuracy of 98.0%, outperforming all baselines while using substantially fewer parameters than the Llama2 baseline. Compared with the same backbone trained without CoT, this corresponds to an absolute improvement of 4.7 percentage points (98.0% versus 93.3%), demonstrating that explicit reasoning supervision plays a critical role in reducing temporal-logic composition errors. These results indicate that for NL-to-STL translation, semantic alignment and structural reasoning are more decisive than raw model scale, and that the proposed reasoning-enhanced training pipeline is highly effective for producing executable formal specifications.

B. Simulation Navigation Experiments

In this subsection, we evaluate the proposed NL navigation framework in simulation. All simulations are conducted in a 100 m × 100 m planar workspace at a fixed altitude, with static obstacles and no-fly zones. The UAV dynamics follow the discrete-time linear model described in Section II with sampling time $\Delta t = 1\,\mathrm{s}$, and the control limits enforce a maximum speed of $v_{\max} = 5\,\mathrm{m/s}$ and a maximum acceleration of $a_{\max} = 1\,\mathrm{m/s^2}$ throughout the planning horizon.

1) Feasible STL Task Examples: We first present representative task instances whose NL instructions can be translated into feasible STL specifications without any repair. These examples are used to illustrate the correspondence between language-level task descriptions, the translated STL formulas, and the resulting dynamically feasible trajectories.

Fig. 5. Simulated trajectory for Task 1. The UAV reaches the goal region within the specified time window while avoiding obstacles.

Task 1 (Disjunctive intermediate goals with obstacle avoidance). The NL instruction is:

“The UAV must avoid the obstacle ($\mathcal{R}_{o1}$) and ($\mathcal{R}_{o2}$) before reaching a goal ($\mathcal{R}_{g}$). Along the way, the UAV must reach one of two intermediate targets ($\mathcal{R}_{t1}$) or ($\mathcal{R}_{t2}$).”

This instruction is translated by the proposed translation module into the following STL specification:

$$\varphi_1 = \left(F_{[0,T]}\mathcal{R}_{t1} \vee F_{[0,T]}\mathcal{R}_{t2}\right) \wedge F_{[0,T]}\mathcal{R}_{g} \wedge G_{[0,T]}\neg\mathcal{R}_{o1} \wedge G_{[0,T]}\neg\mathcal{R}_{o2}. \tag{40}$$

The resulting STL specification is directly enforced by the MILP-based planner, which successfully finds a dynamically feasible trajectory satisfying all constraints. Fig. 5 visualizes the planned trajectory, where the UAV reaches one of the intermediate targets and subsequently enters the goal region while maintaining safe clearance from the obstacle regions.

Task 2 (Sequential authorization and constrained passage). The NL instruction is:

“The UAV must enter designated authorization regions $\mathcal{R}_{k}$ before it is allowed to pass through the corresponding restricted air corridors or gates $\mathcal{R}_{d}$ and finally reach the target location $\mathcal{R}_{g}$. Throughout the mission, the UAV is required to avoid obstacle ($\mathcal{R}_{o1}$), ($\mathcal{R}_{o2}$) and ($\mathcal{R}_{o3}$).”

Fig. 6. Simulated trajectory for Task 2. The UAV sequentially reaches the specified target regions while avoiding the no-fly zone at all times.

This instruction is translated by the proposed translation module into the following STL specification:

$$\varphi_2 = \left(\neg\mathcal{R}_{d}\; U_{[0,T]}\; \mathcal{R}_{k}\right) \wedge F_{[0,T]}\mathcal{R}_{g} \wedge G_{[0,T]}\neg\mathcal{R}_{o1} \wedge G_{[0,T]}\neg\mathcal{R}_{o2} \wedge G_{[0,T]}\neg\mathcal{R}_{o3}. \tag{41}$$

Here, the until operator captures the temporal dependency that the UAV is prohibited from entering the restricted corridor or gate region $\mathcal{R}_{d}$ until it has visited the corresponding authorization region $\mathcal{R}_{k}$. Continuous avoidance of no-fly zones and obstacles is encoded as global safety constraints. The planner successfully synthesizes a trajectory that satisfies the sequential access requirement while respecting all safety constraints, as illustrated in Fig. 6.
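The structure of a formula such as $\varphi_2$ can be checked on a sampled trajectory with Boolean STL semantics. The following minimal sketch is illustrative only: the box coordinates and trajectories are hypothetical, the trajectory is assumed to cover the whole interval $[0, T]$, and the until convention used here (the left operand must hold strictly before the step at which the right operand becomes true) is an assumption that may differ from the definition in Section II-B.

```python
def inside(point, box):
    """Axis-aligned box membership; box = (xmin, xmax, ymin, ymax)."""
    x, y = point
    xmin, xmax, ymin, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax


def always(sig, a, b, k=0):
    """G_[a,b] at step k over a Boolean signal."""
    return all(sig[k + a : k + b + 1])


def eventually(sig, a, b, k=0):
    """F_[a,b] at step k over a Boolean signal."""
    return any(sig[k + a : k + b + 1])


def until(left, right, a, b, k=0):
    """left U_[a,b] right: right holds at some j in [k+a, k+b], left holds on [k, j)."""
    last = min(k + b, len(right) - 1)
    return any(right[j] and all(left[k:j]) for j in range(k + a, last + 1))


def satisfies_phi2(traj, key, gate, goal, obstacles, horizon):
    """Boolean check of (not gate U key) and F goal and G not obstacle."""
    not_gate = [not inside(p, gate) for p in traj]
    in_key = [inside(p, key) for p in traj]
    in_goal = [inside(p, goal) for p in traj]
    safe = [not any(inside(p, o) for o in obstacles) for p in traj]
    return (
        until(not_gate, in_key, 0, horizon)
        and eventually(in_goal, 0, horizon)
        and always(safe, 0, horizon)
    )


# Hypothetical regions and trajectories (one point per time step).
KEY, GATE, GOAL = (1, 2, -1, 1), (4, 4, -1, 1), (6, 6, -1, 1)
OBSTACLES = [(2, 4, 2, 3)]
good = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (6, 0)]
bad = [(0, 0), (4, 0), (1, 0), (6, 0)]  # passes the gate before authorization
```

The first trajectory visits the authorization box before the gate and satisfies the formula, whereas the second reaches the goal safely but violates the until clause.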
2) Specification Repair Example: We next consider a representative UAV task in which an NL instruction induces an STL specification that is infeasible under the given dynamics and environmental constraints. This example demonstrates how the proposed language-guided specification repair mechanism restores feasibility by selectively relaxing temporal requirements, while preserving all safety-critical constraints and geometric task definitions.

Fig. 7. Simulated trajectory for the repaired task. The UAV reaches the goal region after temporal repair while continuously avoiding the obstacle.

Fig. 8. Executed UAV trajectory for the real-world search-and-rescue experiment.

Task 3 (Time-constrained goal reaching with obstacle avoidance). The NL instruction provided by the human operator is:

“The UAV must always avoid the obstacle region ($\mathcal{R}_{o}$) and reach the goal region ($\mathcal{R}_{g}$) within 20 s.”

This instruction is translated by the proposed translation module into the following STL specification:

$$\varphi_3 = G_{[0,T]}\neg\mathcal{R}_{o} \wedge F_{[0,20]}\mathcal{R}_{g}, \tag{42}$$

where $\mathcal{R}_{o}$ denotes the obstacle region and $\mathcal{R}_{g}$ denotes the goal region. The global operator $G$ enforces continuous obstacle avoidance over the entire planning horizon, while the eventually operator $F_{[0,20]}$ requires the UAV to reach the goal within a strict deadline of 20 s.

Under the system dynamics and control limits described earlier in this section, the resulting MILP problem is infeasible. The obstacle lies directly between the initial position and the goal, forcing the UAV to execute a detour. Given the bounded velocity and acceleration, the solver determines that the UAV cannot reach $\mathcal{R}_{g}$ within the specified 20 s without violating either the dynamic constraints or the obstacle-avoidance requirement.

Upon detecting infeasibility, the specification repair procedure is automatically triggered.
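A back-of-the-envelope bound shows why such a deadline can conflict with the control limits. The sketch below assumes a continuous-time point mass that starts from rest, accelerates at $a_{\max}$ up to $v_{\max}$, and only needs to reach (not stop at) the goal; the 100 m path length is a hypothetical value, since the detour length is not stated here.

```python
def min_travel_time(distance, v_max, a_max):
    """Lower bound on the time to cover `distance` from rest along a given path."""
    d_accel = v_max ** 2 / (2 * a_max)  # distance needed to reach v_max
    if distance <= d_accel:
        return (2 * distance / a_max) ** 0.5
    return v_max / a_max + (distance - d_accel) / v_max


def max_reachable_distance(deadline, v_max, a_max):
    """Longest path coverable from rest within `deadline` seconds."""
    t_accel = v_max / a_max
    if deadline <= t_accel:
        return 0.5 * a_max * deadline ** 2
    return 0.5 * a_max * t_accel ** 2 + v_max * (deadline - t_accel)


t_needed = min_travel_time(100.0, v_max=5.0, a_max=1.0)        # hypothetical 100 m path
d_limit = max_reachable_distance(20.0, v_max=5.0, a_max=1.0)   # 20 s deadline
```

Under these assumptions, any path longer than 87.5 m cannot be completed within 20 s, and a 100 m path needs at least 22.5 s, so a temporal relaxation is the natural repair when the goal geometry and obstacle avoidance must stay fixed.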
The MILP solver computes an IIS, and the conflicting constraints are traced back to the STL subformulas using the constraint-to-specification mapping. In this case, the conflict is localized to the temporal bound of the reachability requirement $F_{[0,20]}\mathcal{R}_{g}$.

Guided by the LLM, the repair module is instructed to relax only the temporal component of the task, while keeping all geometric predicates unchanged. In particular, the size and location of the goal region $\mathcal{R}_{g}$ and the obstacle-avoidance constraint $G_{[0,T]}\neg\mathcal{R}_{o}$ are treated as safety-critical and are not modified. A slack variable is introduced on the upper bound of the eventually operator, and a weighted $\ell_1$ penalty is used to minimize the amount of temporal relaxation. As a result, the repaired specification becomes:

$$\varphi_3^{\mathrm{repair}} = G_{[0,T]}\neg\mathcal{R}_{o} \wedge F_{[0,30]}\mathcal{R}_{g}, \tag{43}$$

corresponding to an extension of the reachability deadline from 20 s to 30 s.

The repaired STL specification is then enforced by the MILP-based planner, which successfully synthesizes a dynamically feasible trajectory. As shown in Fig. 7, the UAV detours around $\mathcal{R}_{o}$, maintains continuous safety, and reaches the original goal region $\mathcal{R}_{g}$ within the relaxed temporal window. This example illustrates that the proposed framework can reconcile high-level time-critical language instructions with low-level dynamical feasibility through targeted, interpretable specification repair, without compromising safety constraints or task semantics.

C. Real-World Experiments

To validate the proposed framework under real-world conditions, we conduct a real-world experiment involving a UAV performing a search-and-rescue task guided by NL instructions.

1) Experimental Setup: The experiments are carried out using a DJI Matrice 300 RTK quadrotor equipped with an onboard 4G/5G wireless signal acquisition and localization device. The UAV operates in an outdoor environment with a predefined bounded workspace that includes designated search-and-rescue regions and safety-restricted no-fly zones. State estimation is provided by the onboard RTK positioning system and fused inertial measurements, while LLM translation and STL planning are performed offboard and transmitted to the UAV via a wireless communication link. Fig. 9 illustrates the real-world experimental setup, including the UAV platform and the annotated test site with rescue areas and no-fly zones.

Fig. 9. Real-world experimental setup for the search-and-rescue task. (a) UAV platform: DJI Matrice 300 RTK equipped with an onboard 4G/5G wireless signal acquisition and localization device. (b) Annotated test site map showing designated rescue areas and no-fly zones.

2) Search-and-Rescue Task Description: We consider a representative search-and-rescue scenario in which the UAV is instructed to search multiple designated regions within a fixed time window while continuously avoiding all no-fly zones. The task is specified through the following NL instruction:

• “Search the three rescue areas $\mathcal{R}_{s1}$, $\mathcal{R}_{s2}$, and $\mathcal{R}_{s3}$ within 60 seconds, while avoiding all no-fly zones.”

This instruction is translated by the proposed translation module into the following STL specification:

$$\varphi_{\mathrm{SAR}} = F_{[0,60]}\mathcal{R}_{s1} \wedge F_{[0,60]}\mathcal{R}_{s2} \wedge F_{[0,60]}\mathcal{R}_{s3} \wedge G_{[0,60]}\neg\mathcal{R}_{nf1} \wedge G_{[0,60]}\neg\mathcal{R}_{nf2} \wedge G_{[0,60]}\neg\mathcal{R}_{nf3} \wedge G_{[0,60]}\neg\mathcal{R}_{nf4} \wedge G_{[0,60]}\neg\mathcal{R}_{nf5}. \tag{44}$$

Here, $\mathcal{R}_{s1}$, $\mathcal{R}_{s2}$, and $\mathcal{R}_{s3}$ denote the three designated search regions, while $\mathcal{R}_{nf1}$–$\mathcal{R}_{nf5}$ represent the individual no-fly zones in the environment. The global operator $G_{[0,60]}$ enforces continuous avoidance of all no-fly regions over the entire mission horizon, encoding safety-critical constraints that must never be violated.
The eventually operators $F_{[0,60]}$ require that each search region be visited at least once within the 60-second time window, without imposing a strict visitation order among them. The resulting STL specification captures the essential requirements of the search-and-rescue task, namely time-bounded coverage of all designated search areas and persistent avoidance of restricted airspace, and is subsequently enforced by the MILP-based planner to generate a dynamically feasible patrol trajectory.

3) Experimental Results: Fig. 8 shows the executed patrol trajectory during the search-and-rescue task. The UAV successfully completes the patrol within the prescribed time window while respecting all safety constraints specified by the STL formula. During the mission, the UAV detected a victim waiting for rescue inside the region $\mathcal{R}_{s1}$ and reported the finding to the ground operator for subsequent response. The experiment demonstrates that the proposed language-guided planning framework can be deployed on a real UAV system and can reliably execute complex, temporally constrained tasks derived from NL instructions.

VII. CONCLUSION

This paper presented a unified framework for NL low-altitude UAV navigation by translating free-form instructions into STL specifications and synthesizing dynamically feasible trajectories under formal constraints. By integrating a reasoning-enhanced LLM with MILP-based STL-constrained planning, the proposed approach enables robust NL-to-STL translation while rigorously enforcing safety-critical requirements. A solver-in-the-loop specification repair mechanism was further introduced, in which an LLM provides semantic guidance to selectively relax non-safety-critical task constraints while strictly preserving safety guarantees.
Extensive simulation results and real-world flight experiments demonstrate that the proposed framework achieves safe, interpretable, and adaptable UAV navigation in complex low-altitude environments. Future work will focus on extending the framework to partially observable environments, multi-UAV coordination, and online adaptation under dynamic task revisions.

REFERENCES

[1] S. K. K. Hari, S. Rathinam, S. Darbha, K. Kalyanam, S. G. Manyam, and D. Casbeer, “Optimal UAV route planning for persistent monitoring missions,” IEEE Transactions on Robotics, vol. 37, no. 2, pp. 550–566, 2020.
[2] Y. Ping, T. Liang, H. Ding, G. Lei, J. Wu, X. Zou, K. Shi, R. Shao, C. Zhang, W. Zhang, W. Yuan, and T. Zhang, “Multimodal large language models-enabled UAV swarm: Towards efficient and intelligent autonomous aerial systems,” IEEE Wireless Communications, pp. 1–9, 2025.
[3] M. Chen, L. Yang, J. Cao, G. Zhu, W. Yuan, H. Jiang, and D. Niyato, “Cargo UAVs pick-up systems for low-altitude economy with communication quality, battery energy, and time window constraints,” IEEE Transactions on Mobile Computing, pp. 1–18, 2025.
[4] T. Liang, H. Ding, Y. Ping, T. Zhang, L. Zhou, Q. Zhang, and T. Q. Quek, “Satellite-assisted UAV control: Sensing and communication scheduling for energy efficient data collection,” IEEE Internet of Things Journal, 2025.
[5] R. Zhang, G. Liu, Y. Liu, C. Zhao, J. Wang, Y. Xu, D. Niyato, J. Kang, Y. Li, S. Mao et al., “Toward edge general intelligence with agentic AI and agentification: Concepts, technologies, and future directions,” arXiv preprint arXiv:2508.18725, 2025.
[6] H. Liu, G. Wu, L. Zhou, W. Pedrycz, and P. N. Suganthan, “Tangent-based path planning for UAV in a 3-D low altitude urban environment,” IEEE Transactions on Intelligent Transportation Systems, vol. 24, no. 11, pp. 12062–12077, 2023.
[7] R. Shao, W. Li, L. Zhang, R. Zhang, Z. Liu, R. Chen, and L. Nie, “Large VLM-based vision-language-action models for robotic manipulation: A survey,” arXiv preprint arXiv:2508.13073, 2025.
[8] S. Tellex, N. Gopalan, H. Kress-Gazit, and C. Matuszek, “Robots that use language,” Annual Review of Control, Robotics, and Autonomous Systems, vol. 3, no. 1, pp. 25–55, 2020.
[9] B. Quartey, E. Rosen, S. Tellex, and G. Konidaris, “Verifiably following complex robot instructions with foundation models,” in 2025 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2025, pp. 1–8.
[10] C. Belta and S. Sadraddini, “Formal methods for control synthesis: An optimization perspective,” Annual Review of Control, Robotics, and Autonomous Systems, vol. 2, no. 1, pp. 115–140, 2019.
[11] S. Liu, H. Zhang, Y. Qi, P. Wang, Y. Zhang, and Q. Wu, “AerialVLN: Vision-and-language navigation for UAVs,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 15384–15394.
[12] M. Chandarana, E. L. Meszaros, A. Trujillo, and B. D. Allen, “‘Fly Like This’: Natural language interface for uav mission planning,” in International Conference on Advances in Computer-Human Interactions, no. NF1676L-26108, 2017.
[13] M. Chandarana, E. L. Meszaros, A. Trujillo, and B. Danette Allen, “Natural language based multimodal interface for UAV mission planning,” in Proceedings of the Human Factors and Ergonomics Society Annual Meeting, vol. 61, no. 1. SAGE Publications Sage CA: Los Angeles, CA, 2017, pp. 68–72.
[14] F. Yao, Y. Liu, W. Zhang, Z. Zhu, C. Li, N. Liu, P. Hu, Y. Yue, K. Wei, X. He et al., “AeroVerse-Review: Comprehensive survey on aerial embodied vision-and-language navigation,” The Innovation Informatics, vol. 1, no. 1, pp. 100015–1, 2025.
[15] P. Anderson, Q. Wu, D. Teney, J. Bruce, M. Johnson, N. Sünderhauf, I. Reid, S. Gould, and A. Van Den Hengel, “Vision-and-language navigation: Interpreting visually-grounded navigation instructions in real environments,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 3674–3683.
[16] K. Narasimhan, T. Kulkarni, and R. Barzilay, “Language understanding for text-based games using deep reinforcement learning,” in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, L. Màrquez, C. Callison-Burch, and J. Su, Eds. Lisbon, Portugal: Association for Computational Linguistics, Sep. 2015, pp. 1–11. [Online]. Available: https://aclanthology.org/D15-1001/
[17] X. Wang, Q. Huang, A. Celikyilmaz, J. Gao, D. Shen, Y.-F. Wang, W. Y. Wang, and L. Zhang, “Reinforced cross-modal matching and self-supervised imitation learning for vision-language navigation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 6629–6638.
[18] Y. Hong, Q. Wu, Y. Qi, C. Rodriguez-Opazo, and S. Gould, “VLN BERT: A recurrent vision-and-language BERT for navigation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 1643–1653.
[19] W. Zhang, C. Gao, S. Yu, R. Peng, B. Zhao, Q. Zhang, J. Cui, X. Chen, and Y. Li, “CityNavAgent: Aerial vision-and-language navigation with hierarchical semantic planning and global memory,” arXiv preprint arXiv:2505.05622, 2025.
[20] P. Saxena, N. Raghuvanshi, and N. Goveas, “UAV-VLN: End-to-End vision language guided navigation for uavs,” arXiv preprint arXiv:2504.21432, 2025.
[21] J. Lee, T. Miyanishi, S. Kurita, K. Sakamoto, D. Azuma, Y. Matsuo, and N. Inoue, “Citynav: Language-goal aerial navigation dataset with geographic information,” arXiv preprint arXiv:2406.14240, 2024.
[22] R. Zhang, H. Du, Y. Liu, D. Niyato, J. Kang, S. Sun, X. Shen, and H. V. Poor, “Interactive AI with retrieval-augmented generation for next generation networking,” IEEE Network, vol. 38, no. 6, pp. 414–424, 2024.
[23] S. Sanyal and K. Roy, “Asma: An adaptive safety margin algorithm for vision-language drone navigation via scene-aware control barrier functions,” IEEE Robotics and Automation Letters, 2025.
[24] Y. Zhang, V. N. Fernandez-Ayala, and D. V. Dimarogonas, “Multi-robot human-in-the-loop control under spatiotemporal specifications,” in 2024 IEEE International Conference on Robotics and Automation (ICRA), 2024, pp. 4841–4847.
[25] S. Xu, X. Luo, Y. Huang, L. Leng, R. Liu, and C. Liu, “Nl2hltl2plan: Scaling up natural language understanding for multi-robots through hierarchical temporal logic task specifications,” IEEE Robotics and Automation Letters, 2025.
[26] Y. Wu, Z. Xiong, Y. Hu, S. S. Iyengar, N. Jiang, A. Bera, L. Tan, and S. Jagannathan, “SELP: Generating safe and efficient task plans for robot agents with large language models,” in 2025 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2025, pp. 2599–2605.
[27] H. Kress-Gazit, G. E. Fainekos, and G. J. Pappas, “From structured english to robot motion,” in 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2007, pp. 2717–2722.
[28] N. Gopalan, D. Arumugam, L. L. Wong, and S. Tellex, “Sequence-to-Sequence language grounding of Non-Markovian task specifications,” in Robotics: Science and Systems, vol. 2018, 2018.
[29] C. Wang, C. Ross, Y.-L. Kuo, B. Katz, and A. Barbu, “Learning a natural-language to LTL executable semantic parser for grounded robotics,” in Conference on Robot Learning. PMLR, 2021, pp. 1706–1718.
[30] R. Patel, E. Pavlick, and S. Tellex, “Grounding language to Non-Markovian tasks with no supervision of task specifications,” in Robotics: Science and Systems, vol. 2020, 2020.
[31] J. Pan, G. Chou, and D. Berenson, “Data-efficient learning of natural language to linear temporal logic translators for robot task specification,” arXiv preprint arXiv:2303.08006, 2023.
[32] F. Fuggitti and T. Chakraborti, “Nl2ltl–a python package for converting natural language instructions to linear temporal logic formulas,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 13, 2023, pp. 16428–16430.
[33] J. X. Liu, Z. Yang, I. Idrees, S. Liang, B. Schornstein, S. Tellex, and A. Shah, “Grounding complex natural language commands for temporal tasks in unseen environments,” in Conference on Robot Learning. PMLR, 2023, pp. 1084–1110.
[34] R. Zhang, S. Tang, Y. Liu, D. Niyato, Z. Xiong, S. Sun, S. Mao, and Z. Han, “Toward agentic AI: Generative information retrieval inspired intelligent communications and networking,” IEEE Communications Magazine, vol. 64, no. 1, pp. 197–204, 2026.
[35] R. Zhang, H. Du, Y. Liu, D. Niyato, J. Kang, Z. Xiong, A. Jamalipour, and D. I. Kim, “Generative AI agents with large language model for satellite networks via a mixture of experts transmission,” IEEE Journal on Selected Areas in Communications, 2024.
[36] S. Ghosh, D. Sadigh, P. Nuzzo, V. Raman, A. Donzé, A. L. Sangiovanni-Vincentelli, S. S. Sastry, and S. A. Seshia, “Diagnosis and repair for synthesis from signal temporal logic specifications,” in Proceedings of the 19th International Conference on Hybrid Systems: Computation and Control, 2016, pp. 31–40.
[37] A. T. Buyukkocak and D. Aksaray, “Temporal relaxation of signal temporal logic specifications for resilient control synthesis,” in 2022 IEEE 61st Conference on Decision and Control (CDC). IEEE, 2022, pp. 2890–2896.
[38] A. T. Buyukkocak and D. Aksaray, “Resilient online planning for mobile robots with minimal relaxation of signal temporal logic specifications,” IEEE Robotics and Automation Letters, 2025.
[39] Y. Chen, R. Gandhi, Y.
Zhang, and C. Fan, “Nl2tl: Transforming natural langua ges to temporal logi cs using large language models, ” arXiv pre print arXiv:2305.07766 , 2023. [40] Y . Mao, T . Zhang, X. Cao, Z. Chen, X. L iang, B. Xu, and H. Fang, “Nl2stl: Transformat ion from logic natural language to signal temporal logics using llama2, ” in 2024 IEE E Internati onal Confer ence on Cyber- netic s and Intellig ent Systems (CIS) and IEE E Internati onal Confere nce on R obotics, Automation and Mechatr onics (RAM) . IEEE, 2024, pp. 469–474. [41] A. Donz ´ e and O. Maler , “Robust satisfacti on of temporal logic over real-v alued signals, ” in Internati onal confe rence on formal modeling and analysis of timed systems . Springer , 2010, pp. 92–106. [42] DeepSeek-AI, A. L iu, B . Feng , B. Xue, B. W ang, B. W u, and et al., “Deepseek-v3 technic al report. ” [Online]. A v ailab le: https:/ /arxi v .org/abs/24 12.19437 [43] DeepSeek-AI, D. Guo, D. Y ang, H. Z hang, J. Song, R. Zhang, , and et al ., “Dee pSeek-R1: Incenti vizing reasoning ca pabili ty in LLMs via reinforce m ent learnin g, ” 2025. [Online]. A v ailable: https:/ /arxi v .org/abs/25 01.12948 [44] E. J. Hu, Y . Shen, P . W allis, Z. Allen-Zhu, Y . Li, S. W ang, L. W ang, W . Chen et al. , “LORA: Low-ra nk adaptat ion of lar ge language m odels. ” ICLR , vol. 1, no. 2, p. 3, 2022. [45] J. Schulman, F . W olski, P . Dhariwal, A. Radford, and O. Klimov , “Proximal polic y optimizati on algori thms, ” 2017. [Online]. A va ilable: https:/ /arxi v .org/abs/17 07.06347 [46] R. Z hang, Y . Liu, S. T ang, J. W ang, D. Niyato, G. Sun, Y . L i, and S. Sun, “Cov ert prompt transmission for secure lar ge language model service s , ” IEEE Jou rnal on Select ed Areas in Communicat ions , pp. 1–1, 2025. [47] K. Papine ni, S. Roukos, T . W ard, and W . -J. Zhu, “BLEU: a method for automati c ev aluatio n of machine translat ion, ” in Pr oceedings of the 40th annual meet ing of the A ssociatio n for Comput ational Lingui stics , 2002, pp. 311–318.