LLM-Enabled Low-Altitude UAV Natural Language Navigation via Signal Temporal Logic Specification Translation and Repair

Yuqi Ping, Huahao Ding, Tianhao Liang, Longyu Zhou, Guangyu Lei, Xinglin Chen, Junwei Wu, Jieyu Zhou, Tingting Zhang

Abstract: Natural language (NL) navigation for low-altitude unmanned aerial vehicles (UAVs) offers an intelligent and convenient solution for low-altitude aerial services by enabling an intuitive interface for non-expert operators. However, deploying this capability in urban environments necessitates the precise grounding of underspecified instructions into safety-critical, dynamically feasible motion plans subject to spatiotemporal constraints. To address this challenge, we propose a unified framework that translates NL instructions into Signal Temporal Logic (STL) specifications and subsequently synthesizes trajectories via mixed-integer linear programming (MILP). Specifically, to generate executable STL formulas from free-form NL, we develop a reasoning-enhanced large language model (LLM) leveraging chain-of-thought (CoT) supervision and group-relative policy optimization (GRPO), which ensures high syntactic validity and semantic consistency. Furthermore, to resolve infeasibilities induced by stringent logical or spatial requirements, we introduce a specification repair mechanism. This module combines MILP-based diagnosis with LLM-guided semantic reasoning to selectively relax task constraints while strictly enforcing safety guarantees. Extensive simulations and real-world flight experiments demonstrate that the proposed closed-loop framework significantly improves NL-to-STL translation robustness, enabling safe, interpretable, and adaptable UAV navigation in complex scenarios.

Index Terms: Natural language navigation, low-altitude UAV, signal temporal logic, specification repair

Yuqi Ping, Huahao Ding, Tianhao Liang, Guangyu Lei, Xinglin Chen, Junwei Wu, and T. Zhang are with Guangdong Provincial Key Laboratory of Space-Aerial Networking and Intelligent Sensing, Harbin Institute of Technology, Shenzhen, China (e-mail: pingyq@stu.hit.edu.cn; hitszdhh@163.com; liangth@hit.edu.cn; GuangyuLei@stu.hit.edu.cn; chenxinglin@stu.hit.edu.cn; 220210419@stu.hit.edu.cn; zhangtt@hit.edu.cn); T. Zhang is also with Peng Cheng Laboratory (PCL), Shenzhen, China. Longyu Zhou is with the Information Systems Technology and Design, Singapore University of Technology and Design, Singapore 487372 (e-mail: zhoulyfuture@outlook.com). Jieyu Zhou is with School of Computer Science and Engineering, Central South University, Changsha, China (e-mail: zhoujieyu@csu.edu.cn).

I. INTRODUCTION

A. Background and Motivation

Low-altitude unmanned aerial vehicles (UAVs) have been increasingly deployed in mission-critical scenarios such as safety monitoring [1], forest firefighting [2], logistics [3], emergency communications [4], and low-altitude networking [5]. Compared with high-altitude operations, low-altitude flight requires UAVs to operate in close proximity to complex urban structures, dynamic obstacles, and human activities, which imposes stringent safety and regulatory constraints. At the same time, low-altitude missions often involve high-level objectives with complex temporal, spatial, and logical requirements that must be specified clearly and executed reliably [6]. Natural-language (NL) instructions provide an intuitive interface for expressing such high-level task intent, especially for non-expert operators. However, the inherent ambiguity and underspecification of NL stand in fundamental tension with the strict safety, temporal, and spatial requirements that low-altitude UAV navigation must satisfy.
Recent advances in large language models (LLMs) have demonstrated strong capabilities in NL understanding and high-level reasoning, which has led to growing interest in language-driven robotic autonomy [7]. Despite this progress, directly mapping NL instructions to low-level control commands remains unsuitable for safety-critical UAV operation, as such mappings lack formal guarantees, interpretability, and verifiability [8]. Conversely, classical model-based planning and control frameworks are capable of generating dynamically feasible trajectories under complex constraints, but they typically do not provide a systematic mechanism to interpret and enforce high-level task semantics expressed in NL [9]. This disconnect highlights the need for an intermediate representation that can faithfully capture language-level intent while remaining amenable to formal analysis and execution.

Formal methods offer a principled bridge between high-level semantic intent and low-level control. In particular, Signal Temporal Logic (STL) provides precise semantics for specifying complex temporal and spatial behaviors [10], enabling safety constraints and mission objectives to be expressed in a form suitable for verification and optimization-based planning. However, enabling NL UAV navigation through STL introduces two coupled challenges. First, NL instructions must be translated into STL specifications that accurately preserve the intended semantics. Second, even semantically correct STL specifications may render the underlying planning problem infeasible when temporal or spatial requirements are overly restrictive in low-altitude environments. These challenges motivate a unified framework that jointly addresses specification translation and feasibility repair, thereby ensuring both semantic fidelity and physical executability for safe and adaptable low-altitude UAV navigation.

B. Related Works

In recent years, NL-guided navigation for UAVs has attracted increasing attention. Compared with ground or indoor navigation, low-altitude UAV scenarios involve 3D motion, varying flight altitudes, and more complex spatial relations, which substantially increases the difficulty of both language understanding and navigation execution [11]. Early studies mainly adopted hand-crafted grammatical rules or keyword-based mappings to convert NL commands into predefined flight actions or waypoint sequences [12], [13]. Although these methods offer interpretability and engineering controllability, their expressive capacity is limited, making them inadequate for open-vocabulary or compositional instructions and leading to poor generalization under environmental variations [14].

Subsequently, deep learning techniques were widely introduced, leveraging imitation learning or reinforcement learning to jointly model NL, perception, and action spaces. This line of research gave rise to NL-guided navigation and visual language navigation (VLN) frameworks [11], [15]–[18]. Nevertheless, their core paradigm remains task-specific policy learning, with limited capacity for the explicit instruction decomposition and plan-level reasoning needed for open-ended, compositional commands.

More recently, LLM-enabled methods have pushed NL navigation toward stronger semantic understanding and flexible task execution. By supporting high-level reasoning, task decomposition, and open-vocabulary goal grounding, LLMs enable UAVs to follow longer, freer-form instructions and improve semantic adaptability [19]–[22].
Nevertheless, many existing studies are evaluated in open simulation settings without explicitly modeling low-altitude airspace regulations, making it hard to formally verify whether a generated flight plan violates rule constraints in safety-critical 3D airspaces [23].

To address these limitations, several recent studies have begun incorporating formal methods into NL-guided navigation frameworks [24]–[26]. Temporal logics such as linear temporal logic (LTL) and STL have been used to explicitly encode task objectives, safety constraints, and temporal requirements, which are subsequently enforced through planning algorithms. A key challenge in these frameworks is translating NL instructions into well-formed temporal-logic specifications. Early research often assumed structured or controlled language to simplify NL-to-TL mapping [27], and some navigation-oriented approaches relied on manual or semi-automatic translation procedures to construct formulas specifying visitation order, obstacle avoidance behaviors, or timing constraints [24]. Learning-based semantic parsing methods have subsequently been explored in several domains and have been shown to be effective in mapping NL to formal specifications [28]–[30], but they typically require substantial annotated data and may generalize poorly to complex instructions that involve implicit reasoning. More recently, LLM-based methods have been explored to directly generate LTL or STL specifications from free-form NL [25], [26], [31]–[33], offering a potential way to improve task generalization beyond what is achievable with traditional supervised parsers.
Despite this progress, existing studies remain constrained by limited specification expressiveness, by unstable language-to-logic mappings that are sensitive to prompts and context, and by the poor generalization of current LLM-based generation on complex instructions with implicit reasoning [34], [35].

Moreover, the aforementioned methods often presume that human-provided NL instructions are correct and can be faithfully captured by a corresponding formal specification. In practice, the stated intent may conflict with UAV dynamical feasibility limits or low-altitude airspace constraints, rendering the synthesized plan infeasible. Once an STL specification is obtained, feasibility restoration has been extensively studied in the formal methods literature, where most approaches keep the STL structure fixed and focus on minimal parameter-level repairs. These methods typically analyze mixed-integer linear programming (MILP) encodings to identify irreducibly infeasible subsystems (IIS) or unsatisfiable cores, and restore feasibility by relaxing temporal bounds or predicate thresholds through slack variables, weighted objectives, or least-violating formulations under restricted fragments [36]–[40]. While these optimization-driven techniques provide clear objectives and formal guarantees, they are largely language-agnostic: the decision of what to relax is typically determined by predefined costs or priorities, implicitly assuming that the original specification precisely reflects user intent. In language-grounded settings, infeasibility may instead stem from semantic ambiguity, underspecification, or NL-to-STL misinterpretation, in which case formally minimal relaxations can be semantically misaligned, such as weakening task-critical predicates when timing constraints are actually negotiable.
This motivates incorporating language-level reasoning into the repair loop to guide the selection among alternative repair directions, such as relaxing temporal constraints versus predicate conditions, while still rigorously enforcing non-negotiable safety constraints through formal planning and optimization.

C. Main Contributions

This paper aims to enable safe, reliable, and interpretable low-altitude UAV navigation from natural-language (NL) instructions by jointly addressing semantic grounding, formal specification, and motion-planning feasibility. The main contributions are summarized as follows.

• We develop an integrated navigation framework that converts NL instructions into STL specifications, detects planning infeasibility, and repairs the specifications based on solver feedback to restore feasibility. The framework tightly couples LLM-based semantic reasoning, STL specification translation and repair, and MILP-based motion planning within a closed-loop architecture, enabling safe and executable low-altitude UAV navigation.

• We propose an LLM-based translation method that maps NL instructions into STL specifications. The method integrates supervised chain-of-thought (CoT) alignment with group-relative policy optimization (GRPO) reinforcement learning. This pipeline improves the generation of syntactically well-formed STL specifications and increases exact-match NL-to-STL translation accuracy under strict canonical normalization.

• We introduce an LLM-assisted, systematic STL repair mechanism to handle infeasible planning instances. Diagnostic information from the MILP solver is mapped back to specific STL subformulas, and the LLM is leveraged to reason about semantic intent and prioritize repair directions, enabling selective relaxation of non-safety-critical requirements while rigorously preserving hard safety constraints.
• We validate the proposed approach through extensive simulations and real-world UAV experiments. Experimental results show that, compared with traditional NL-to-STL translation models, the proposed approach achieves higher translation accuracy while using a smaller model. In addition, the results demonstrate that the proposed closed-loop framework can generate dynamically feasible trajectories and recover from infeasible specifications.

The remainder of this paper is organized as follows. Section II formulates the problem and reviews STL preliminaries. Section III presents the overall framework. Sections IV–V describe the NL-to-STL translation and the MILP-based planning and repair modules. Section VI reports experimental results, and Section VII concludes.

II. PRELIMINARIES AND PROBLEM FORMULATION

In this section, we first describe the UAV dynamics and the environment representation. We then recall STL, which serves as a formal specification layer bridging NL instructions and trajectory planning. Finally, we state the UAV NL navigation problem considered.

A. UAV Dynamics and Environment Model

We consider a UAV operating in a bounded workspace $\mathcal{W} \subset \mathbb{R}^3$ with obstacles and no-fly zones. The UAV state at discrete time step $k$ is defined as $x_k = [\, p_k^\top \;\; v_k^\top \,]^\top$, where $p_k \in \mathbb{R}^3$ and $v_k \in \mathbb{R}^3$ denote the position and velocity of the UAV, respectively. The control input $u_k \in \mathbb{R}^3$ corresponds to an acceleration command.

We adopt a discrete-time linear dynamics model with sampling time $\Delta t$:

$$ x_{k+1} = A x_k + B u_k, \tag{1} $$

where

$$ A = \begin{bmatrix} I & \Delta t\, I \\ 0 & I \end{bmatrix}, \qquad B = \begin{bmatrix} \tfrac{1}{2} \Delta t^2 I \\ \Delta t\, I \end{bmatrix}. \tag{2} $$

Here, $I$ denotes the identity matrix of appropriate dimension and $0$ denotes a zero matrix.
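As an illustration of (1)–(2), the following minimal sketch rolls the double-integrator model forward in plain Python. The function names `step` and `rollout`, the tuple layout of the state, and the numerical values of the sampling time and inputs are assumptions made for this example only; they are not taken from the paper's implementation.

```python
def step(state, accel, dt):
    """One step of x_{k+1} = A x_k + B u_k for the model in (1)-(2).

    state: (px, py, pz, vx, vy, vz), accel: (ax, ay, az).
    Written out per axis: p' = p + dt*v + 0.5*dt^2*a and v' = v + dt*a.
    """
    p, v = state[:3], state[3:]
    p_next = tuple(pi + dt * vi + 0.5 * dt ** 2 * ai
                   for pi, vi, ai in zip(p, v, accel))
    v_next = tuple(vi + dt * ai for vi, ai in zip(v, accel))
    return p_next + v_next


def rollout(x0, controls, dt):
    """Apply a control sequence u_0, ..., u_{H-1} and return x_0, ..., x_H."""
    traj = [tuple(x0)]
    for u in controls:
        traj.append(step(traj[-1], u, dt))
    return traj


if __name__ == "__main__":
    # Hypothetical values: hover start at 1 m altitude, constant x-acceleration.
    for k, x in enumerate(rollout((0.0, 0.0, 1.0, 0.0, 0.0, 0.0),
                                  [(1.0, 0.0, 0.0)] * 4, dt=0.5)):
        print(k, x)
```

Because the model is linear, the same recursion appears in the planner as equality constraints between consecutive decision variables rather than as a forward simulation.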
State and control constraints are imposed as:

$$ x_k \in \mathcal{X}, \qquad u_k \in \mathcal{U}, \tag{3} $$

where $\mathcal{X}$ and $\mathcal{U}$ are convex polyhedral sets encoding workspace bounds, velocity limits, and actuation constraints.

The environment contains a set of labeled regions $\{\mathcal{R}_i\}_{i=1}^{M}$ that represent task-relevant areas in the workspace. Obstacles and no-fly zones are modeled as forbidden regions that the UAV must avoid at every time step $k$.

B. Signal Temporal Logic Specifications

STL has been extensively used to specify and verify temporal properties of dynamical systems. In this work, STL formulas are interpreted over discrete-time UAV state trajectories $x_{0:H} = \{x_0, x_1, \ldots, x_H\}$, where $H \in \mathbb{N}$ denotes the planning horizon. An STL formula $\varphi$ is defined recursively as:

$$ \varphi ::= \top \mid \mu \mid \neg\varphi \mid \varphi_1 \wedge \varphi_2 \mid \varphi_1 \vee \varphi_2 \mid G_{[a,b]}\varphi \mid F_{[a,b]}\varphi \mid \varphi_1\, U_{[a,b]}\, \varphi_2, \tag{4} $$

where $a, b \in \mathbb{N}$ with $a \le b$ denote discrete-time bounds. Here, the temporal operators $F$, $G$, and $U$ correspond to the eventually, always, and until operators, respectively, and $\varphi_1$ and $\varphi_2$ denote arbitrary STL subformulas. The atomic predicate $\mu$ is defined as an inequality over the system state:

$$ \mu(x_k) = g(x_k) \ge 0, \tag{5} $$

where $g(\cdot)$ is an affine function of the system state. Such predicates can encode region membership, obstacle clearance, no-fly zones, kinematic limits, and other safety envelopes.
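To make the role of affine predicates in (5) concrete, the sketch below expresses membership in an axis-aligned box region as six affine inequalities $g_j(x_k) \ge 0$. The box shape, the helper names `box_predicates` and `in_region`, and the numerical bounds are illustrative assumptions; the paper does not specify how regions are parameterized.

```python
def box_predicates(lo, hi):
    """Return affine functions g_j with: position in box  <=>  all g_j(x) >= 0.

    lo, hi: lower/upper corners of the box (3-tuples).
    x: state tuple whose first three entries are the position.
    """
    preds = []
    for j in range(3):
        preds.append(lambda x, j=j: x[j] - lo[j])  # p_j >= lo_j
        preds.append(lambda x, j=j: hi[j] - x[j])  # p_j <= hi_j
    return preds


def in_region(x, preds):
    """Region membership is the conjunction of the affine predicates."""
    return all(g(x) >= 0 for g in preds)


if __name__ == "__main__":
    region_a = box_predicates((0.0, 0.0, 0.0), (2.0, 2.0, 2.0))  # hypothetical region
    print(in_region((1.0, 1.0, 1.0, 0.0, 0.0, 0.0), region_a))
    print(in_region((3.0, 1.0, 1.0, 0.0, 0.0, 0.0), region_a))
```

Avoiding a forbidden box is the negation of this conjunction, that is, a disjunction over the six faces, which is why obstacle avoidance introduces binary variables in a MILP encoding.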
The STL semantics is recursively defined as follows [10]:

$$ (x_{0:H}, k) \models \mu \iff g(x_k) \ge 0, \tag{6} $$
$$ (x_{0:H}, k) \models \neg\varphi \iff (x_{0:H}, k) \not\models \varphi, \tag{7} $$
$$ (x_{0:H}, k) \models \varphi_1 \wedge \varphi_2 \iff (x_{0:H}, k) \models \varphi_1 \,\wedge\, (x_{0:H}, k) \models \varphi_2, \tag{8} $$
$$ (x_{0:H}, k) \models \varphi_1 \vee \varphi_2 \iff (x_{0:H}, k) \models \varphi_1 \,\vee\, (x_{0:H}, k) \models \varphi_2, \tag{9} $$
$$ (x_{0:H}, k) \models F_{[a,b]}\varphi \iff \exists k' \in [k+a, k+b] : (x_{0:H}, k') \models \varphi, \tag{10} $$
$$ (x_{0:H}, k) \models G_{[a,b]}\varphi \iff \forall k' \in [k+a, k+b] : (x_{0:H}, k') \models \varphi, \tag{11} $$
$$ (x_{0:H}, k) \models \varphi_1\, U_{[a,b]}\, \varphi_2 \iff \exists k' \in [k+a, k+b] \text{ s.t. } (x_{0:H}, k') \models \varphi_2 \,\wedge\, \forall k'' \in [k, k'] : (x_{0:H}, k'') \models \varphi_1. \tag{12} $$

[Fig. 1. Overview of the proposed language-guided planning framework. A natural-language instruction is translated into an STL specification, which is enforced by a constrained planner. When infeasibility is detected, solver feedback is used to guide specification refinement and re-planning in a closed loop. Blocks shown: Natural Language Instruction (example: "Please arrive at Area A within 10 seconds..."); Prompt (spatial goals, temporal constraints, safety rules, preferences, system prompt, few-shot examples); Reasoning-Enhanced NL-STL Translator (synthetic CoT data, LoRA, GRPO, LLM, CoT reasoning in <think>...</think>, STL generation in <answer>...</answer>); STL-Constrained Trajectory Planner (STL-to-MILP encoding, binary auxiliary variables, robustness margin, environment model, UAV dynamics, solver status); Specification Repair (diagnosis: IIS extraction, conflict mapping; LLM repair decision from original NL, unrepaired STL, and conflict tuple; relaxation: temporal slack, predicate slack); STL Trajectory Execution.]

STL also defines a robust semantics by associating each formula $\varphi$ with a real-valued function $\rho^{\varphi}(x_{0:H}, k)$ such that $(x_{0:H}, k) \models \varphi$ if and only if $\rho^{\varphi}(x_{0:H}, k) \ge 0$. The magnitude of $\rho^{\varphi}$ can be interpreted as the margin by which $\varphi$ is satisfied or violated.
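As a concrete reading of the Boolean semantics in (6)–(12), the following sketch evaluates a formula on a finite discrete-time trajectory. The nested-tuple representation of formulas, the function name `sat`, and the choice to raise an error when a time window extends past the horizon are assumptions of this example; the paper does not prescribe a data structure or a convention for windows that exceed $H$.

```python
def sat(phi, traj, k=0):
    """Return True iff (traj, k) satisfies phi under the semantics (6)-(12).

    Formulas are nested tuples (hypothetical encoding):
      ("pred", g)            g maps a state to a real number, true iff g(x_k) >= 0
      ("not", f)  ("and", f1, f2)  ("or", f1, f2)
      ("G", a, b, f)  ("F", a, b, f)  ("U", a, b, f1, f2)
    """
    op = phi[0]
    if op == "pred":
        return phi[1](traj[k]) >= 0
    if op == "not":
        return not sat(phi[1], traj, k)
    if op == "and":
        return sat(phi[1], traj, k) and sat(phi[2], traj, k)
    if op == "or":
        return sat(phi[1], traj, k) or sat(phi[2], traj, k)
    if op not in ("G", "F", "U"):
        raise ValueError(f"unknown operator: {op!r}")

    a, b = phi[1], phi[2]
    if k + b > len(traj) - 1:
        # Assumption: formulas must fit inside the planning horizon.
        raise ValueError("time window exceeds the planning horizon")
    window = range(k + a, k + b + 1)

    if op == "G":
        return all(sat(phi[3], traj, j) for j in window)
    if op == "F":
        return any(sat(phi[3], traj, j) for j in window)
    # Until: phi2 holds at some k' in the window and phi1 holds on [k, k'].
    return any(
        sat(phi[4], traj, j)
        and all(sat(phi[3], traj, i) for i in range(k, j + 1))
        for j in window
    )


if __name__ == "__main__":
    traj = [0.0, 1.0, 2.0, 3.0, 4.0]           # scalar states for brevity
    reach = ("pred", lambda x: x - 3.0)        # x >= 3
    print(sat(("F", 0, 4, reach), traj))       # reach within 4 steps
    print(sat(("F", 0, 2, reach), traj))       # reach within 2 steps
```

Replacing `all`/`any` with `min`/`max` and the Boolean connectives with their numeric counterparts turns the same recursion into an evaluator for the robust semantics.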
Following the robust STL semantics in [41], the quantitative semantics is defined recursively as:

$$ \rho^{\mu}(x_{0:H}, k) = g(x_k), \tag{13} $$
$$ \rho^{\neg\varphi}(x_{0:H}, k) = -\rho^{\varphi}(x_{0:H}, k), \tag{14} $$
$$ \rho^{\varphi_1 \wedge \varphi_2}(x_{0:H}, k) = \min\big( \rho^{\varphi_1}(x_{0:H}, k),\, \rho^{\varphi_2}(x_{0:H}, k) \big), \tag{15} $$
$$ \rho^{\varphi_1 \vee \varphi_2}(x_{0:H}, k) = \max\big( \rho^{\varphi_1}(x_{0:H}, k),\, \rho^{\varphi_2}(x_{0:H}, k) \big), \tag{16} $$
$$ \rho^{G_{[a,b]}\varphi}(x_{0:H}, k) = \min_{k' \in [k+a, k+b]} \rho^{\varphi}(x_{0:H}, k'), \tag{17} $$
$$ \rho^{F_{[a,b]}\varphi}(x_{0:H}, k) = \max_{k' \in [k+a, k+b]} \rho^{\varphi}(x_{0:H}, k'), \tag{18} $$
$$ \rho^{\varphi_1 U_{[a,b]} \varphi_2}(x_{0:H}, k) = \max_{k' \in [k+a, k+b]} \min\Big( \rho^{\varphi_2}(x_{0:H}, k'),\, \min_{k'' \in [k, k']} \rho^{\varphi_1}(x_{0:H}, k'') \Big). \tag{19} $$

In this paper, STL provides a verifiable intermediate specification between NL instructions and optimization-based trajectory planning. Safety-critical requirements are enforced as hard STL constraints. Task-related objectives are expressed via temporal operators and may be selectively relaxed when the resulting STL-constrained planning problem is infeasible.

C. Problem Definition

Consider a UAV operating in a known environment with discrete-time dynamics and admissible state and control sets. Let $\mathcal{L}$ denote an NL instruction describing a navigation task. The instruction is interpreted as an STL specification:

$$ \varphi = \mathcal{T}(\mathcal{L}), \tag{20} $$

where $\mathcal{T}(\cdot)$ denotes a translation model. Given the induced STL specification $\varphi$ and a finite planning horizon $H$, the objective is to compute a dynamically feasible state trajectory $x_{0:H}$ and control sequence $u_{0:H-1}$ such that:

$$ x_{0:H} \models \varphi. \tag{21} $$

The resulting trajectory must satisfy the system dynamics, the state and control constraints, and all safety-critical requirements encoded in $\varphi$. The overall NL navigation problem addressed in this paper can be summarized by the following relation:

$$ (\mathcal{L}, H) \mapsto (\varphi, x_{0:H}, u_{0:H-1}). \tag{22} $$

Due to ambiguity in natural language or conflicts among temporal and spatial task requirements, the STL-constrained planning problem induced by $\varphi$ may be infeasible. In such cases, the goal is to restore feasibility by minimally modifying task-related components of the specification while strictly preserving all safety-critical constraints.

III. SYSTEM OVERVIEW AND FRAMEWORK

Fig. 1 shows the proposed framework for low-altitude UAV navigation from an NL instruction $\mathcal{L}$. The system translates $\mathcal{L}$ into a verifiable STL specification, plans a dynamically feasible trajectory under STL and physical constraints, and triggers specification repair when the planning problem is infeasible.

Reasoning-Enhanced NL-STL Translator. Given $\mathcal{L}$, the translator uses a structured prompt and a reasoning-enhanced LLM to generate an STL specification $\varphi$. The model is trained on a synthetic NL-STL dataset augmented with CoT reasoning traces, and is further fine-tuned using parameter-efficient LoRA and GRPO to improve STL syntactic validity and semantic consistency.

STL-Constrained Trajectory Planner. The planner converts $\varphi$ into a MILP by introducing binary satisfaction variables for STL subformulas and a robustness margin variable. The MILP also incorporates the environment model and the UAV dynamics with state/control limits. If the MILP is feasible, the solver returns a trajectory for execution; otherwise, it outputs an infeasibility status that activates repair.

Specification Repair. Upon infeasibility, the system extracts an IIS and maps the conflicting MILP constraints back to STL subformulas and time indices to form a conflict tuple. Using the original NL instruction, the unrepaired STL, and the conflict tuple, an LLM selects a repair mode for each conflict, either temporal relaxation or predicate relaxation, while safety constraints are never relaxed.
The selected mode is implemented via temporal or predicate slack variables with penalties, and the repaired STL is reconstructed and sent back to the planner for re-solving.

Overall, the framework realizes $\mathcal{L} \to \varphi \to (x_{0:H}, u_{0:H-1})$ with automatic diagnosis-and-repair in the loop, enabling safe and executable UAV navigation while keeping repairs interpretable and minimally task-degrading.

IV. REASONING-ENHANCED NL TO STL TRANSLATION

This section presents a reasoning-enhanced pipeline for translating NL instructions into STL specifications. We construct training data by augmenting an existing NL-to-STL dataset with explicit reasoning traces. A CoT data generation pipeline generates intermediate structured reasoning to connect each instruction to its target STL formula, forming an NL-CoT-STL corpus. We then use this corpus for cold-start supervised fine-tuning and apply GRPO reinforcement learning to further improve STL syntactic correctness and semantic consistency. The overall training pipeline, including cold-start SFT and GRPO-based reinforcement learning, is illustrated in Fig. 2.

[Fig. 2. GRPO-based RL framework for reasoning-enhanced NL-to-STL generation. Blocks shown: Input Data (example: "The UAV reaches region A within 49 seconds, reaches region B within 75 seconds, and avoids collisions with region C throughout the entire operation.", with Region A = signal_1_n, Region B = signal_2_n, Region C = signal_3_n); Policy Model producing outputs O_1, ..., O_N; Reward Function (CoT format reward, CoT length reward, STL syntax reward, STL correct reward) producing R_1, ..., R_N; Group Computation producing advantages A_1, ..., A_N; Reference Model with KL divergence; GRPO Optimization and model update.]

A. Synthetic Dataset and CoT Data Generation

To improve NL-to-STL translation, we build a synthetic training corpus based on NL2TL [39]. We adopt NL2TL as the base dataset and collect 20K NL-STL pairs from it.
While NL2TL provides aligned NL instructions and STL formulas, the intermediate semantic decomposition from NL to STL is missing, making it difficult for the model to learn how linguistic cues are mapped to STL operators and temporal constraints.

To address this issue, each NL-STL pair is augmented with an intermediate CoT, which serves as a structured bridge between the NL instruction and the corresponding STL specification. The CoT captures the essential semantic decomposition steps required for temporal logic construction, transforming each original NL-STL pair into an NL-CoT-STL triplet. The overall data generation pipeline is illustrated in Fig. 3.

The CoT annotations are generated using DeepSeek-V3.1 [42]. Given an NL instruction and its ground-truth STL formula, the model reconstructs the key reasoning steps required for temporal logic construction, including predicate identification, temporal bound extraction, operator selection, and formula composition. These CoT traces serve as an explicit intermediate representation that exposes the semantic structure underlying the NL-to-STL mapping.

Applying this pipeline to the selected NL2TL samples yields a synthetic dataset with explicit supervision over both reasoning and final STL outputs. This dataset forms the basis for the subsequent cold-start supervised fine-tuning stage and improves the structural robustness and semantic consistency of downstream STL generation.

[Fig. 3. CoT Data Engine for augmenting NL2TL with intermediate reasoning traces. Example shown in the figure:
NL-STL data. NL: Within the first 62 to 88 time units, signal_1_n shall be consistently equal to 29.3. STL: G[62:88](signal_1_n==29.3)
Prompt: "Now, I give you the natual language and STL for it, can you give me several sentences to conclude why the nl transfor to this STL."
NL-STL-CoT output. NL and STL as above. CoT: The requirement specifies a time window from 62 to 88 time units, which corresponds to the bounded always operator G[62:88]. The condition that signal_1_n must be consistently equal to 29.3 throughout this interval is expressed as signal_1_n==29.3.]

B. Cold Start Stage

Recent advances in reinforcement learning for reasoning, such as DeepSeek-R1 [43], indicate that policy optimization can improve long-horizon reasoning. However, NL-to-STL translation requires both explicit CoT reasoning and strict adherence to STL syntax. Acquiring these capabilities purely through end-to-end reinforcement learning from scratch is often unstable, due to rigid syntactic constraints and sparse task-level rewards. Therefore, we introduce a cold-start supervised fine-tuning (SFT) stage to initialize the model before GRPO training.

The cold-start stage targets format alignment and reasoning induction. Given the rigidity of STL syntax and the scarcity of temporal-logic structures in general pre-training corpora, prompt-based approaches alone are insufficient to guarantee consistently well-formed outputs. We fine-tune the model on the synthetic NL-CoT-STL dataset constructed in the previous section, where each sample follows a structured output pattern: the intermediate reasoning is enclosed within <think> tags, and the final STL specification is enclosed within <answer> tags (the tag names shown in Fig. 1). We optimize the model using a standard maximum-likelihood objective, and implement SFT via parameter-efficient fine-tuning with Low-Rank Adaptation (LoRA) [44] to reduce memory and computational overhead while preserving the base model's general language capability.

This cold-start initialization encourages the model to produce coherent reasoning traces and syntactically valid STL formulas in a stable and controllable manner.
By reducing the search space and variance encountered in subsequent reinforcement learning, it enables the GRPO optimization stage to focus on improving global structural validity and semantic fidelity, rather than correcting low-level formatting errors.

C. GRPO Reinforcement Learning

Although SFT enables the model to imitate correct outputs, it optimizes token-level likelihood rather than global structural validity. For structured outputs such as STL formulas, a minor token error can invalidate the entire formula.

Reinforcement learning addresses this issue by directly optimizing task-level rewards. While Proximal Policy Optimization (PPO) [45] and Group-based Proximal Policy Optimization (GPPO) [46] are widely adopted RL methods for LLMs, they require an additional value network and often suffer from high computational cost and training instability. To overcome these limitations, we adopt GRPO [42]. GRPO eliminates the need for a value network by estimating the baseline using group-level reward statistics.

We explicitly structure the model output into a reasoning process enclosed by <think> tags and a final STL formula enclosed by <answer> tags (the tag names shown in Fig. 1). The overall RL pipeline follows three stages: policy sampling, reward computation, and policy update.

Given an input instruction, we sample a group of $G$ candidate outputs $\{O_i\}_{i=1}^{G}$ from the current policy $\pi_{\theta_{\mathrm{old}}}$ with a relatively high temperature coefficient $\tau$ to encourage exploration and output diversity. For each sampled output $O_i$, we compute a composite reward by summing four components:

$$ R_i = R_{\text{CoT format}} + R_{\text{CoT length}} + R_{\text{STL syntax}} + R_{\text{STL correct}}. \tag{23} $$

Each reward term is computed from the sampled output $O_i$.
To encourage the structural integrity of the CoT format, we define the CoT format reward with three cases:

$$ R_{\text{CoT format}} = \begin{cases} +k_1, & \text{if both <think> and <answer> exist}, \\ +k_2, & \text{if exactly one exists}, \\ +k_3, & \text{if neither exists}. \end{cases} \tag{24} $$

To encourage sufficient reasoning without rewarding verbosity beyond a cap, we define the CoT length reward:

$$ R_{\text{CoT length}} = k_4 \cdot \min\big( L_{\text{CoT}},\, L_{\max} \big), \tag{25} $$

where $L_{\text{CoT}}$ is the number of tokens inside the <think> span and $L_{\max}$ is the maximum allowed CoT length.

To enforce syntactic validity of the generated STL, we define the STL syntax reward:

$$ R_{\text{STL syntax}} = \begin{cases} +k_5, & \text{if the STL syntax is valid}, \\ -k_5, & \text{otherwise}, \end{cases} \tag{26} $$

where validity requires that all variables belong to the predefined set and that all temporal operators are structurally complete.

To provide a dense similarity-based signal, we define a BLEU-based STL correctness reward:

$$ R_{\text{STL correct}} = k_6 \cdot \mathrm{BLEU}, \tag{27} $$

where BLEU [47] is computed by combining $n$-gram precision and a brevity penalty:

$$ \mathrm{BLEU} = \mathrm{BP} \cdot \exp\!\left( \sum_{n=1}^{N} w_n \log p_n \right). \tag{28} $$

Here, $p_n$ is the modified $n$-gram precision of order $n$ between the generated and reference STL sequences, $w_n$ is the corresponding weight with $\sum_{n=1}^{N} w_n = 1$, and BP is the brevity penalty determined by the lengths of the generated and reference sequences. We use $\{k_j\}_{j=1}^{6}$ as scalar reward coefficients to balance the contributions of the reward terms.

After reward computation, we perform the policy update using GRPO. For each input, the rewards $R = \{R_i\}_{i=1}^{G}$ of the sampled group are normalized relative to the group mean and standard deviation, yielding the group-relative advantage:

$$ \hat{A}_{i,t} = \frac{R_i - \mathrm{mean}(R)}{\mathrm{std}(R)}, \tag{29} $$

where $t \in \{1, \ldots, |O_i|\}$ indexes tokens in the generated trajectory $O_i$. The same group-relative advantage $\hat{A}_{i,t}$ is assigned to all tokens in the trajectory $O_i$.
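The reward terms (23)–(27) and the group normalization (29) can be sketched as follows. Everything numerical here is hypothetical: the coefficient values in `K`, the cap `L_MAX`, the use of whitespace-separated words as a stand-in for tokenizer tokens, the population standard deviation, and the small constant `eps` in the denominator. The tag names <think> and <answer> are also assumed. The syntax check and the BLEU score are passed in as precomputed inputs rather than implemented.

```python
import re
import statistics

# Hypothetical coefficients k_1..k_6 and length cap; the paper gives no values here.
K = {"k1": 1.0, "k2": 0.0, "k3": -1.0, "k4": 0.001, "k5": 1.0, "k6": 1.0}
L_MAX = 256


def has_span(text, tag):
    """True if a complete <tag>...</tag> span is present (assumed tag syntax)."""
    return re.search(rf"<{tag}>.*?</{tag}>", text, flags=re.DOTALL) is not None


def cot_format_reward(text):
    """Eq. (24): k1 if both spans exist, k2 if exactly one, k3 if neither."""
    n_present = int(has_span(text, "think")) + int(has_span(text, "answer"))
    return {2: K["k1"], 1: K["k2"], 0: K["k3"]}[n_present]


def cot_length_reward(text):
    """Eq. (25), counting whitespace-separated words instead of model tokens."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    n_tokens = len(match.group(1).split()) if match else 0
    return K["k4"] * min(n_tokens, L_MAX)


def composite_reward(text, syntax_ok, bleu):
    """Eq. (23): sum of the four terms; syntax_ok and bleu come from elsewhere."""
    r_syntax = K["k5"] if syntax_ok else -K["k5"]       # Eq. (26)
    r_correct = K["k6"] * bleu                           # Eq. (27)
    return cot_format_reward(text) + cot_length_reward(text) + r_syntax + r_correct


def group_advantages(rewards, eps=1e-8):
    """Eq. (29) with eps added to the denominator to avoid division by zero."""
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    outputs = [
        "<think>bounded always over 62..88</think><answer>G[62:88](signal_1_n==29.3)</answer>",
        "<answer>G[62:88](signal_1_n==29.3)</answer>",
        "G[62:88(signal_1_n==29.3",
    ]
    rewards = [composite_reward(o, ok, b)
               for o, ok, b in zip(outputs, (True, True, False), (1.0, 1.0, 0.6))]
    print(rewards, group_advantages(rewards))
```

Since every token of an output shares the same advantage, an output is reinforced or suppressed as a whole according to how its reward compares with the rest of its group. If all rewards in a group are equal, the advantages are all zero and that group contributes no gradient through the clipped term.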
The GRPO objective combines a clipped policy update with KL regularization against a reference policy:

$$\mathcal{L}_{\mathrm{GRPO}}(\theta) = \mathbb{E}\left[\frac{1}{G}\sum_{i=1}^{G}\frac{1}{|O_i|}\sum_{t=1}^{|O_i|}\left(\mathcal{L}_{\mathrm{CLIP}}(\theta, \hat{A}_{i,t}) - \beta\, D_{\mathrm{KL}}\left(\pi_\theta \,\|\, \pi_{\mathrm{ref}}\right)\right)\right]. \tag{30}$$

For each token position $t$ in the trajectory $O_i$, we compute the importance ratio:

$$r_{i,t}(\theta) = \frac{\pi_\theta(a_{i,t} \mid s_{i,t})}{\pi_{\theta_{\mathrm{old}}}(a_{i,t} \mid s_{i,t})}, \tag{31}$$

where $a_{i,t}$ denotes the $t$-th generated token in trajectory $O_i$, and $s_{i,t}$ denotes the corresponding generation context, consisting of the input and the previously generated tokens. We then define the clipped surrogate objective:

$$\mathcal{L}^{(i,t)}_{\mathrm{CLIP}} = \min\left(r_{i,t}(\theta)\,\hat{A}_{i,t},\; \mathrm{clip}\left(r_{i,t}(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_{i,t}\right). \tag{32}$$

The policy parameters are then updated by gradient ascent on $\mathcal{L}_{\mathrm{GRPO}}(\theta)$.

The complete GRPO training procedure, including the cold-start initialization and the main reinforcement learning phase, is summarized in Algorithm 1.

Algorithm 1 GRPO Training for NL-to-STL Translator with Chain-of-Thought
1: Phase 0: Cold Start
2: Build $\mathcal{D}$ of NL-CoT-STL triplets with the reasoning and formula tags; SFT to obtain $\pi^{(0)}_\theta$;
3: Initialize $\pi_\theta \leftarrow \pi^{(0)}_\theta$ and reference policy $\pi_{\mathrm{ref}} \leftarrow \pi^{(0)}_\theta$;
4: Phase 1: GRPO Main Training
5: for $k = 1$ to $K$ do
6:   Set $\pi_{\theta_{\mathrm{old}}} \leftarrow \pi_\theta$;
7:   Sample a group $\{O_i\}_{i=1}^{G}$ from $\pi_{\theta_{\mathrm{old}}}(\cdot \mid x)$ with temperature $\tau$;
8:   Compute rewards $\{R_i\}_{i=1}^{G}$ where $R_i = R^{\mathrm{CoT}}_{\mathrm{format}} + R^{\mathrm{CoT}}_{\mathrm{length}} + R^{\mathrm{STL}}_{\mathrm{syntax}} + R^{\mathrm{STL}}_{\mathrm{correct}}$;
9:   Compute $\mu = \mathrm{mean}(\{R_i\})$, $\sigma = \mathrm{std}(\{R_i\})$, and set $\hat{A}_{i,t} = (R_i - \mu)/(\sigma + \varepsilon)$ for all $t = 1, \ldots, |O_i|$;
10:   Compute $\mathcal{L}_{\mathrm{GRPO}}(\theta) = \frac{1}{G}\sum_{i=1}^{G}\frac{1}{|O_i|}\sum_{t=1}^{|O_i|}\left(\mathcal{L}^{(i,t)}_{\mathrm{CLIP}} - \beta\, D_{\mathrm{KL}}[\pi_\theta \,\|\, \pi_{\mathrm{ref}}]\right)$, where
11:   $r_{i,t}(\theta) = \pi_\theta(a_{i,t} \mid s_{i,t}) / \pi_{\theta_{\mathrm{old}}}(a_{i,t} \mid s_{i,t})$ and
12:   $\mathcal{L}^{(i,t)}_{\mathrm{CLIP}} = \min\left(r_{i,t}(\theta)\,\hat{A}_{i,t},\; \mathrm{clip}(r_{i,t}(\theta),\, 1-\epsilon,\, 1+\epsilon)\,\hat{A}_{i,t}\right)$;
13:   Update by gradient ascent: $\theta \leftarrow \theta + \eta \nabla_\theta \mathcal{L}_{\mathrm{GRPO}}(\theta)$;
14: end for

V. STL-CONSTRAINED TRAJECTORY PLANNING AND SPECIFICATION REPAIR

Given the discrete-time UAV dynamics and the STL specification generated from NL instructions, we formulate the trajectory generation problem as a constrained planning problem over a finite horizon. This section presents a unified formulation that integrates STL-constrained trajectory planning with feasibility diagnosis and specification repair.

A. MILP Encoding of STL Satisfaction

Given the discrete-time UAV dynamics and the STL specification $\varphi$ translated from the NL instruction by the LLM, we encode satisfaction of $\varphi$ over a finite planning horizon $H$ using a MILP formulation. For each STL subformula $\psi$ and discrete time step $k$, we introduce a binary auxiliary variable $z_{\psi,k} \in \{0, 1\}$ indicating whether $\psi$ is satisfied at time $k$.

The atomic predicate $\mu$ is defined as an inequality over the system state $x_k$ of the form $\mu(x_k) \geq 0$. The implication between the binary variable and predicate satisfaction is encoded using Big-$M$ constraints as

$$\mu(x_k) + (1 - z_{\mu,k})\,M \geq \gamma, \qquad \mu(x_k) - z_{\mu,k}\,M \leq \gamma, \tag{33}$$

where $M > 0$ is a sufficiently large constant and $\gamma \geq 0$ is a global robustness margin variable. Unlike the quantitative STL robustness $\rho^{\varphi}(x_{0:H}, k)$ defined in Section II-B, which evaluates the satisfaction margin of a formula on a given trajectory, $\gamma$ is an optimization variable that enforces a uniform lower bound on predicate satisfaction across the entire trajectory.

Boolean compositions of STL formulas are encoded recursively. For a conjunction $\psi = \bigwedge_i \psi_i$, we impose:

$$z_{\psi,k} \leq z_{\psi_i,\,k+\Delta_i}, \quad \forall i, \tag{34}$$

and for a disjunction $\psi = \bigvee_i \psi_i$:

$$z_{\psi,k} \leq \sum_i z_{\psi_i,\,k+\Delta_i}, \tag{35}$$

where $\Delta_i$ denotes the relative time offset induced by the syntax tree of $\psi$. Temporal operators are unfolded over their discrete-time intervals according to the STL semantics defined in Section II-B.
Specifically, $\mathbf{G}_{[a,b]}$ operators are expanded as conjunctions over $\{k+a, \ldots, k+b\}$, while $\mathbf{F}_{[a,b]}$ operators are expanded as disjunctions over the same interval. The until operator $\mathbf{U}_{[a,b]}$ is handled analogously using its standard Boolean expansion. All resulting Boolean constraints are encoded using Equations (34)–(35).

Satisfaction of the overall STL specification is enforced by introducing a root variable $z_{\varphi,0}$ and requiring:

$$z_{\varphi,0} = 1, \qquad \gamma \geq 0. \tag{36}$$

Any feasible solution to the resulting MILP therefore corresponds to a trajectory $x_{0:H}$ that satisfies $\varphi$.

B. STL-Constrained Optimization Formulation

Combining the system dynamics, state and control constraints, and the MILP encoding of STL satisfaction, we obtain the following mixed-integer optimization problem over the horizon $H$:

$$\min_{x_{0:H},\, u_{0:H-1},\, z,\, \gamma} \; -\gamma + \sum_{k=0}^{H-1}\left(x_k^\top Q x_k + u_k^\top R u_k\right) \tag{P1}$$

$$\begin{aligned} \text{s.t.}\quad & x_0 = x_{\mathrm{fixed}}, && \text{(37a)} \\ & x_{k+1} = A x_k + B u_k, \quad \forall k = 0, \ldots, H-1, && \text{(37b)} \\ & x_k \in \mathcal{X}, \; u_k \in \mathcal{U}, \quad \forall k = 0, \ldots, H-1, && \text{(37c)} \\ & \text{(33)–(36)}. \end{aligned}$$

The objective in Problem (P1) seeks a trajectory that maximizes the global robustness margin $\gamma$ while penalizing state deviation and control effort through quadratic regularization. Feasibility with $\gamma \geq 0$ guarantees satisfaction of the STL specification, and larger values of $\gamma$ correspond to increased robustness against perturbations.

C. Feasibility Certification and IIS Extraction

If the solver declares Problem (P1) infeasible, we perform feasibility diagnosis using an IIS, which identifies a minimal set of constraints and variable bounds that cannot be satisfied simultaneously. We adopt the IIS-based diagnosis procedure in [36] to localize infeasibility at the specification level.
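To make the notion of an irreducible infeasible set concrete, the sketch below applies the classic deletion filter to a toy system of interval constraints on a single scalar (for example, an arrival time). It is not the diagnosis procedure of [36] and not a MILP solver's IIS routine. The constraint names and numeric bounds are hypothetical and chosen only for illustration.

```python
INF = float("inf")


def feasible(constraints):
    """Interval constraints (name, lo, hi) on one scalar are jointly feasible
    iff the largest lower bound does not exceed the smallest upper bound."""
    if not constraints:
        return True
    lower = max(lo for _, lo, _ in constraints)
    upper = min(hi for _, _, hi in constraints)
    return lower <= upper


def deletion_filter(constraints):
    """Return an irreducible infeasible subset of an infeasible constraint list.

    Each constraint is tentatively dropped; it is removed for good only if
    the remaining set is still infeasible without it.
    """
    if feasible(constraints):
        raise ValueError("constraint set is feasible; no IIS exists")
    core = list(constraints)
    for c in list(constraints):
        trial = [d for d in core if d is not c]
        if not feasible(trial):
            core = trial
    return core


# Hypothetical bounds on the arrival time t (in seconds).
constraints = [
    ("deadline", 0.0, 20.0),          # task requirement: arrive within 20 s
    ("detour_min_time", 26.0, INF),   # assumed minimum time imposed by a detour
    ("horizon", 0.0, 60.0),           # planning horizon
    ("start", 0.0, INF),              # time is nonnegative
]
iis = deletion_filter(constraints)
print([name for name, _, _ in iis])
```

In this toy instance only the deadline and the detour-induced lower bound survive the filter. This mirrors how an IIS isolates the conflicting requirements from constraints that play no part in the infeasibility.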
To enable semantic interpretation of solver feedback, each STL-induced constraint in the MILP is associated with a traceability record that links it to the originating STL subformula and time index. Using this mapping, the constraints contained in the IIS are projected back to the STL syntax tree and summarized as a set of infeasible atomic events, each characterized by an atomic predicate and its associated temporal context. This diagnosis result localizes the source of infeasibility in a form suitable for high-level reasoning and serves as structured input to the subsequent specification repair stage.

D. LLM-Guided Repair via Predicate-Temporal Choice

When the STL-constrained optimization problem is infeasible, our objective is to restore feasibility by minimally modifying task-related requirements while strictly preserving all safety-critical constraints. Unlike prior approaches that require the designer to manually specify which predicates or temporal parameters are eligible for repair, we delegate the choice of repair dimension to an LLM, which operates at the semantic level and does not directly manipulate numerical optimization variables.

For each infeasible atomic event identified by the IIS-based diagnosis in Section V-C, the LLM is provided with a structured input that combines symbolic, temporal, and semantic information. Specifically, the input consists of: (i) the original NL instruction from which the task specification was generated, (ii) the unrepaired STL specification $\varphi$, and (iii) a structured description of each diagnosed atomic event represented as a tuple $(\mu, \sigma, O, R)$, where $\mu$ denotes the atomic predicate inequality, $\sigma$ is its discrete-time support interval induced by the STL semantics, $O$ is the parent temporal operator in the STL syntax tree, and $R$ characterizes the semantic role of the predicate.
This representation allows the LLM to reason jointly over the original task intent, the formal specification structure, and the localized infeasibility information, without exposing solver-level variables or numerical relaxation parameters. Safety-critical predicates, such as obstacle avoidance and no-fly constraints, are explicitly excluded from this interface and are never considered for relaxation.

Given this input, the LLM outputs a binary decision selecting exactly one admissible repair mode for each infeasible atomic event: predicate relaxation or temporal relaxation. To preserve interpretability and a clear separation between spatial and temporal modifications, mixed repairs for a single atomic event are intentionally disallowed. The LLM does not propose relaxation magnitudes, threshold values, or modified temporal bounds; it only determines which repair dimension is permitted, while the quantitative extent of repair is determined entirely by the optimization layer.

The LLM's decisions are then realized at the numerical level through selective relaxation in the MILP. If predicate relaxation is selected for an atomic event, a nonnegative slack variable $s_{\mu,k} \geq 0$ is introduced to relax the corresponding Big-$M$ constraint:

$$\mu(x_k) + (1 - z_{\mu,k})\,M + s_{\mu,k} \geq \gamma. \tag{38}$$

If temporal relaxation is selected, nonnegative temporal slack variables $\tau_{\psi,k}$ are introduced to relax the Boolean constraints arising from temporal operator expansion. For atomic events deemed non-negotiable, no relaxation variables are introduced and the corresponding constraints remain unchanged.

After introducing the selected relaxations, we re-solve the optimization problem with an augmented objective that penalizes the total relaxation magnitude:

$$\min \; -\gamma + \sum_{k}\left(x_k^\top Q x_k + u_k^\top R u_k\right) + \lambda_p \sum s_{\mu,k} + \lambda_t \sum \tau_{\psi,k}, \tag{39}$$

where $\lambda_p, \lambda_t > 0$ weight the relative cost of predicate and temporal relaxations. The optimized relaxation values provide quantitative guidance for constructing an explicit repaired STL specification $\tilde{\varphi}$: predicate relaxations are translated into adjusted numerical thresholds or geometric parameters, while temporal relaxations are mapped to modified temporal bounds. Finally, $\tilde{\varphi}$ is reconstructed without slack variables and re-encoded into a standard STL-constrained MILP; solving this problem yields a dynamically feasible trajectory that satisfies $\tilde{\varphi}$ exactly.

The introduction of the LLM does not modify the underlying IIS-based diagnosis or the optimization-based repair mechanism. For any fixed set of LLM repair-mode selections, feasibility and quantitative minimality are enforced by the MILP formulation, while the LLM influences only the admissible repair dimension for each diagnosed atomic event.

VI. EXPERIMENTAL EVALUATION

This section presents a comprehensive experimental evaluation of the proposed NL navigation framework. The experiments are designed to assess the effectiveness of the system at multiple levels, ranging from NL understanding and formal specification generation to closed-loop trajectory planning and execution.

We first evaluate the performance of the proposed reasoning-enhanced NL-to-STL translation model in isolation, focusing on its syntactic correctness with respect to ground-truth temporal logic specifications. We then assess the full planning framework in simulation, where the translated STL specifications are integrated with the MILP-based trajectory planner.
These experiments evaluate the system's ability to generate dynamically feasible trajectories, enforce safety-critical constraints, and recover from infeasible specifications through selective repair. Finally, we demonstrate the applicability of the proposed approach in real-world experiments using a quadrotor UAV platform in a representative search-and-rescue task.

A. NL-to-STL Translation Experiments

This subsection evaluates the proposed reasoning-enhanced NL-to-STL translation model, with an emphasis on its ability to generate syntactically valid and semantically accurate STL specifications from NL instructions. The translator is built upon Qwen2.5-0.5B-Instruct and trained using the reasoning-enhanced pipeline described in Section IV.

All experiments are conducted on the NL2TL dataset [39], where 20K NL-STL pairs are used for training and an additional 4K samples are held out for testing. We compare against several representative baselines, including a Llama2-finetuned model [40], a T5-finetuned model [39], the original Qwen2.5-0.5B-Instruct model without task-specific adaptation, and a Qwen2.5-0.5B-Instruct model fine-tuned on NL-STL pairs without CoT supervision. All fine-tuned baselines are trained on the same data split and evaluated under identical decoding constraints.

Translation performance is measured using accuracy, defined as the exact-match rate between the generated STL specification and the ground-truth formula after canonical normalization of syntax and operators. This metric is intentionally strict, as any syntactic error, incorrect temporal bound, or logical mismatch renders the output incorrect, reflecting the requirement that generated specifications must be directly executable by downstream formal planners.

Fig. 4 shows the convergence of the proposed NL-to-STL translation model across training stages.
SFT denotes supervised fine-tuning on data without CoT. The accuracy improves steadily as the model learns the rigid STL grammar and its alignment with natural-language expressions, but the gains gradually saturate. The cold-start stage corresponds to supervised fine-tuning on data with CoT, which yields a clear jump in accuracy, suggesting that CoT provides useful intermediate reasoning signals that improve structural correctness. After introducing GRPO, the model achieves further improvements with a smoother, more stable trajectory, indicating that reward-based optimization can better correct sequence-level and global structural errors that are difficult to eliminate using token-level likelihood training alone.

Fig. 4. NL-to-STL translation accuracy during SFT and GRPO training. (Horizontal axis: training step; vertical axis: accuracy; curves: SFT, Cold Start, GRPO.)

TABLE I
NL-TO-STL TRANSLATION PERFORMANCE COMPARISON.

| Method | Model size | Accuracy (%) |
| --- | --- | --- |
| Llama2-finetuned | 13B | 94.8 |
| T5-finetuned | 220M | 93.1 |
| Qwen2.5-0.5B (prompt-only) | 0.5B | 45.2 |
| Qwen2.5-0.5B-finetuned w/o CoT | 0.5B | 93.3 |
| Qwen2.5-0.5B-finetuned w/ CoT (Proposed) | 0.5B | 98.0 |

Quantitative results are summarized in Table I. The prompt-only Qwen2.5-0.5B-Instruct model achieves an accuracy of only 45.2%, highlighting the difficulty of reliably generating well-formed STL specifications without task-specific adaptation. Supervised fine-tuning without CoT already yields a substantial improvement, raising accuracy to 93.3%, which is comparable to or slightly better than the T5-finetuned baseline. The Llama2-finetuned model achieves 94.8% accuracy but relies on a significantly larger parameter count.

By contrast, the proposed Qwen2.5-0.5B-Instruct model fine-tuned with CoT supervision and GRPO achieves an accuracy of 98.0%, outperforming all baselines while using far fewer parameters than the 13B Llama2-finetuned model.
Compared with the same backbone trained without CoT (93.3% in Table I), this corresponds to an absolute improvement of 4.7 percentage points, demonstrating that explicit reasoning supervision plays a critical role in reducing temporal-logic composition errors. These results indicate that for NL-to-STL translation, semantic alignment and structural reasoning are more decisive than raw model scale, and that the proposed reasoning-enhanced training pipeline is highly effective for producing executable formal specifications.

B. Simulation Navigation Experiments

In this subsection, we evaluate the proposed NL navigation framework in simulation. All simulations are conducted in a $100\,\mathrm{m} \times 100\,\mathrm{m}$ planar workspace at a fixed altitude, with static obstacles and no-fly zones. The UAV dynamics follow the discrete-time linear model described in Section II with sampling time $\Delta t = 1\,\mathrm{s}$, and the control limits enforce a maximum speed of $v_{\max} = 5\,\mathrm{m/s}$ and a maximum acceleration of $a_{\max} = 1\,\mathrm{m/s^2}$ throughout the planning horizon.

1) Feasible STL Task Examples: We first present representative task instances whose NL instructions can be translated into feasible STL specifications without any repair. These examples are used to illustrate the correspondence between language-level task descriptions, the translated STL formulas, and the resulting dynamically feasible trajectories.

Fig. 5. Simulated trajectory for Task 1. The UAV reaches the goal region within the specified time window while avoiding obstacles.

Task 1 (Disjunctive intermediate goals with obstacle avoidance). The NL instruction is:

“The UAV must avoid the obstacle ($\mathcal{R}_{o_1}$) and ($\mathcal{R}_{o_2}$) before reaching a goal ($\mathcal{R}_g$). Along the way, the UAV must reach one of two intermediate targets ($\mathcal{R}_{t_1}$) or ($\mathcal{R}_{t_2}$).”

This instruction is translated by the proposed translation module into the following STL specification:

$$\varphi_1 = \left(\mathbf{F}_{[0,T]}\mathcal{R}_{t_1} \vee \mathbf{F}_{[0,T]}\mathcal{R}_{t_2}\right) \wedge \mathbf{F}_{[0,T]}\mathcal{R}_g \wedge \mathbf{G}_{[0,T]}\neg\mathcal{R}_{o_1} \wedge \mathbf{G}_{[0,T]}\neg\mathcal{R}_{o_2}. \tag{40}$$

The resulting STL specification is directly enforced by the MILP-based planner, which successfully finds a dynamically feasible trajectory satisfying all constraints. Fig. 5 visualizes the planned trajectory, where the UAV reaches one of the intermediate targets and subsequently enters the goal region while maintaining safe clearance from the obstacle regions.

Task 2 (Sequential authorization and constrained passage). The NL instruction is:

“The UAV must enter designated authorization regions $\mathcal{R}_k$ before it is allowed to pass through the corresponding restricted air corridors or gates $\mathcal{R}_d$ and finally reach the target location $\mathcal{R}_g$. Throughout the mission, the UAV is required to avoid obstacle ($\mathcal{R}_{o_1}$), ($\mathcal{R}_{o_2}$) and ($\mathcal{R}_{o_3}$).”

Fig. 6. Simulated trajectory for Task 2. The UAV sequentially reaches the specified target regions while avoiding the no-fly zone at all times.

This instruction is translated by the proposed translation module into the following STL specification:

$$\varphi_2 = \left(\neg\mathcal{R}_d \,\mathbf{U}_{[0,T]}\, \mathcal{R}_k\right) \wedge \mathbf{F}_{[0,T]}\mathcal{R}_g \wedge \mathbf{G}_{[0,T]}\neg\mathcal{R}_{o_1} \wedge \mathbf{G}_{[0,T]}\neg\mathcal{R}_{o_2} \wedge \mathbf{G}_{[0,T]}\neg\mathcal{R}_{o_3}. \tag{41}$$

Here, the until operator captures the temporal dependency that the UAV is prohibited from entering the restricted corridor or gate region $\mathcal{R}_d$ until it has visited the corresponding authorization region $\mathcal{R}_k$. Continuous avoidance of no-fly zones and obstacles is encoded as global safety constraints. The planner successfully synthesizes a trajectory that satisfies the sequential access requirement while respecting all safety constraints, as illustrated in Fig. 6.
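The Boolean semantics behind Eq. (41) can be checked on a candidate discrete-time trajectory with a few lines of Python. The sketch below is illustrative only and makes several assumptions: regions are axis-aligned rectangles with hypothetical coordinates, only one obstacle is included for brevity, and the until operator requires its left operand to hold at every step strictly before the right operand first holds (the paper's exact convention is defined in Section II-B). It evaluates satisfaction directly rather than through the MILP encoding.

```python
# Hypothetical rectangles (xmin, ymin, xmax, ymax); not the paper's geometry.
REGIONS = {
    "R_k": (2.0, 0.0, 3.0, 1.0),   # authorization region
    "R_d": (5.0, 0.0, 6.0, 1.0),   # restricted gate
    "R_g": (8.0, 0.0, 9.0, 1.0),   # goal
    "R_o1": (4.0, 3.0, 6.0, 5.0),  # obstacle
}


def inside(name):
    xmin, ymin, xmax, ymax = REGIONS[name]
    return lambda p: xmin <= p[0] <= xmax and ymin <= p[1] <= ymax


def outside(name):
    member = inside(name)
    return lambda p: not member(p)


def eventually(pred, traj, a, b):
    """F_[a,b]: disjunction of pred over steps a..b."""
    return any(pred(p) for p in traj[a:b + 1])


def always(pred, traj, a, b):
    """G_[a,b]: conjunction of pred over steps a..b."""
    return all(pred(p) for p in traj[a:b + 1])


def until(left, right, traj, a, b):
    """left U_[a,b] right at step 0 (assumed strict-before convention)."""
    last = min(b, len(traj) - 1)
    return any(
        right(traj[k]) and all(left(q) for q in traj[:k])
        for k in range(a, last + 1)
    )


def satisfies_phi2(traj, T):
    return (
        until(outside("R_d"), inside("R_k"), traj, 0, T)
        and eventually(inside("R_g"), traj, 0, T)
        and always(outside("R_o1"), traj, 0, T)
    )


# Straight-line trajectory along y = 0.5, one waypoint per time step.
traj = [(float(x), 0.5) for x in range(10)]
print(satisfies_phi2(traj, T=9))        # True: R_k is visited before R_d
print(satisfies_phi2(traj[::-1], T=9))  # False: R_d is crossed before R_k
```

Reversing the trajectory leaves the reachability and avoidance conjuncts satisfied but violates the until clause, which is precisely the ordering constraint that distinguishes Task 2 from a plain reach-avoid task.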
2) Specification Repair Example: We next consider a representative UAV task in which an NL instruction induces an STL specification that is infeasible under the given dynamics and environmental constraints. This example demonstrates how the proposed language-guided specification repair mechanism restores feasibility by selectively relaxing temporal requirements, while preserving all safety-critical constraints and geometric task definitions.

Fig. 7. Simulated trajectory for the repaired task. The UAV reaches the goal region after temporal repair while continuously avoiding the obstacle.

Fig. 8. Executed UAV trajectory for the real-world search-and-rescue experiment.

Task 3 (Time-constrained goal reaching with obstacle avoidance). The NL instruction provided by the human operator is:

“The UAV must always avoid the obstacle region ($\mathcal{R}_o$) and reach the goal region ($\mathcal{R}_g$) within 20 s.”

This instruction is translated by the proposed translation module into the following STL specification:

$$\varphi_3 = \mathbf{G}_{[0,T]}\neg\mathcal{R}_o \wedge \mathbf{F}_{[0,20]}\mathcal{R}_g, \tag{42}$$

where $\mathcal{R}_o$ denotes the obstacle region and $\mathcal{R}_g$ denotes the goal region. The globally operator $\mathbf{G}$ enforces continuous obstacle avoidance over the entire planning horizon, while the eventually operator $\mathbf{F}_{[0,20]}$ requires the UAV to reach the goal within a strict deadline of 20 s.

Under the system dynamics and control limits described earlier in this section, the resulting MILP problem is infeasible. The obstacle lies directly between the initial position and the goal, forcing the UAV to execute a detour. Given the bounded velocity and acceleration, the solver determines that the UAV cannot reach $\mathcal{R}_g$ within the specified 20 s without violating either the dynamic constraints or the obstacle-avoidance requirement.

Upon detecting infeasibility, the specification repair procedure is automatically triggered.
The MILP solver computes an IIS, and the conflicting constraints are traced back to the STL subformulas using the constraint-to-specification mapping. In this case, the conflict is localized to the temporal bound of the reachability requirement $\mathbf{F}_{[0,20]}\mathcal{R}_g$.

Guided by the LLM, the repair module is instructed to relax only the temporal component of the task, while keeping all geometric predicates unchanged. In particular, the size and location of the goal region $\mathcal{R}_g$ and the obstacle-avoidance constraint $\mathbf{G}_{[0,T]}\neg\mathcal{R}_o$ are treated as safety-critical and are not modified. A slack variable is introduced on the upper bound of the eventually operator, and a weighted $\ell_1$ penalty is used to minimize the amount of temporal relaxation. As a result, the repaired specification becomes:

$$\varphi_3^{\mathrm{repair}} = \mathbf{G}_{[0,T]}\neg\mathcal{R}_o \wedge \mathbf{F}_{[0,30]}\mathcal{R}_g, \tag{43}$$

corresponding to an extension of the reachability deadline from 20 s to 30 s.

The repaired STL specification is then enforced by the MILP-based planner, which successfully synthesizes a dynamically feasible trajectory. As shown in Fig. 7, the UAV detours around $\mathcal{R}_o$, maintains continuous safety, and reaches the original goal region $\mathcal{R}_g$ within the relaxed temporal window.

This example illustrates that the proposed framework can reconcile high-level time-critical language instructions with low-level dynamical feasibility through targeted, interpretable specification repair, without compromising safety constraints or task semantics.

C. Real-World Experiments

To validate the proposed framework under real-world conditions, we conduct a real-world experiment involving a UAV performing a search-and-rescue task guided by NL instructions.

1) Experimental Setup: The experiments are carried out using a DJI Matrice 300 RTK quadrotor equipped with an onboard 4G/5G wireless signal acquisition and localization device. The UAV operates in an outdoor environment with a predefined bounded workspace that includes designated search-and-rescue regions and safety-restricted no-fly zones. State estimation is provided by the onboard RTK positioning system and fused inertial measurements, while LLM translation and STL planning are performed offboard and transmitted to the UAV via a wireless communication link. Fig. 9 illustrates the real-world experimental setup, including the UAV platform and the annotated test site with rescue areas and no-fly zones.

Fig. 9. Real-world experimental setup for the search-and-rescue task. (a) DJI Matrice 300 RTK equipped with an onboard 4G/5G wireless signal acquisition and localization device. (b) Annotated test site map showing designated rescue areas and no-fly zones.

2) Search-and-Rescue Task Description: We consider a representative search-and-rescue scenario in which the UAV is instructed to search multiple designated regions within a fixed time window while continuously avoiding all no-fly zones. The task is specified through the following NL instruction:

• “Search the three rescue areas $\mathcal{R}_{s_1}$, $\mathcal{R}_{s_2}$, and $\mathcal{R}_{s_3}$ within 60 seconds, while avoiding all no-fly zones.”

This instruction is translated by the proposed translation module into the following STL specification:

$$\begin{aligned} \varphi_{\mathrm{SAR}} ={} & \mathbf{F}_{[0,60]}\mathcal{R}_{s_1} \wedge \mathbf{F}_{[0,60]}\mathcal{R}_{s_2} \wedge \mathbf{F}_{[0,60]}\mathcal{R}_{s_3} \\ & \wedge \mathbf{G}_{[0,60]}\neg\mathcal{R}_{nf_1} \wedge \mathbf{G}_{[0,60]}\neg\mathcal{R}_{nf_2} \wedge \mathbf{G}_{[0,60]}\neg\mathcal{R}_{nf_3} \wedge \mathbf{G}_{[0,60]}\neg\mathcal{R}_{nf_4} \wedge \mathbf{G}_{[0,60]}\neg\mathcal{R}_{nf_5}. \end{aligned} \tag{44}$$

Here, $\mathcal{R}_{s_1}$, $\mathcal{R}_{s_2}$, and $\mathcal{R}_{s_3}$ denote the three designated search regions, while $\mathcal{R}_{nf_1}$–$\mathcal{R}_{nf_5}$ represent the individual no-fly zones in the environment. The globally operator $\mathbf{G}_{[0,60]}$ enforces continuous avoidance of all no-fly regions over the entire mission horizon, encoding safety-critical constraints that must never be violated.
The eventually operators $\mathbf{F}_{[0,60]}$ require that each search region be visited at least once within the 60-second time window, without imposing a strict visitation order among them.

The resulting STL specification captures the essential requirements of the search-and-rescue task, namely time-bounded coverage of all designated search areas and persistent avoidance of restricted airspace, and is subsequently enforced by the MILP-based planner to generate a dynamically feasible patrol trajectory.

3) Experimental Results: Fig. 8 shows the executed patrol trajectory during the search-and-rescue task. The UAV successfully completes the patrol within the prescribed time window while respecting all safety constraints specified by the STL formula. During the mission, the UAV detected a victim waiting for rescue inside the region $\mathcal{R}_{s_1}$ and reported the finding to the ground operator for subsequent response. The experiment demonstrates that the proposed language-guided planning framework can be deployed on a real UAV system and can reliably execute complex, temporally constrained tasks derived from NL instructions.

VII. CONCLUSION

This paper presented a unified framework for NL low-altitude UAV navigation by translating free-form instructions into STL specifications and synthesizing dynamically feasible trajectories under formal constraints. By integrating a reasoning-enhanced LLM with MILP-based STL-constrained planning, the proposed approach enables robust NL-to-STL translation while rigorously enforcing safety-critical requirements. A solver-in-the-loop specification repair mechanism was further introduced, in which an LLM provides semantic guidance to selectively relax non-safety-critical task constraints while strictly preserving safety guarantees.
Extensive simulation results and real-world flight experiments demonstrate that the proposed framework achieves safe, interpretable, and adaptable UAV navigation in complex low-altitude environments. Future work will focus on extending the framework to partially observable environments, multi-UAV coordination, and online adaptation under dynamic task revisions.

REFERENCES

[1] S. K. K. Hari, S. Rathinam, S. Darbha, K. Kalyanam, S. G. Manyam, and D. Casbeer, “Optimal UAV route planning for persistent monitoring missions,” IEEE Transactions on Robotics, vol. 37, no. 2, pp. 550–566, 2020.
[2] Y. Ping, T. Liang, H. Ding, G. Lei, J. Wu, X. Zou, K. Shi, R. Shao, C. Zhang, W. Zhang, W. Yuan, and T. Zhang, “Multimodal large language models-enabled UAV swarm: Towards efficient and intelligent autonomous aerial systems,” IEEE Wireless Communications, pp. 1–9, 2025.
[3] M. Chen, L. Yang, J. Cao, G. Zhu, W. Yuan, H. Jiang, and D. Niyato, “Cargo UAVs pick-up systems for low-altitude economy with communication quality, battery energy, and time window constraints,” IEEE Transactions on Mobile Computing, pp. 1–18, 2025.
[4] T. Liang, H. Ding, Y. Ping, T. Zhang, L. Zhou, Q. Zhang, and T. Q. Quek, “Satellite-assisted UAV control: Sensing and communication scheduling for energy efficient data collection,” IEEE Internet of Things Journal, 2025.
[5] R. Zhang, G. Liu, Y. Liu, C. Zhao, J. Wang, Y. Xu, D. Niyato, J. Kang, Y. Li, S. Mao et al., “Toward edge general intelligence with agentic AI and agentification: Concepts, technologies, and future directions,” arXiv preprint arXiv:2508.18725, 2025.
[6] H. Liu, G. Wu, L. Zhou, W. Pedrycz, and P. N. Suganthan, “Tangent-based path planning for UAV in a 3-D low altitude urban environment,” IEEE Transactions on Intelligent Transportation Systems, vol. 24, no. 11, pp. 12062–12077, 2023.
[7] R. Shao, W. Li, L. Zhang, R. Zhang, Z. Liu, R. Chen, and L. Nie, “Large VLM-based vision-language-action models for robotic manipulation: A survey,” arXiv preprint arXiv:2508.13073, 2025.
[8] S. Tellex, N. Gopalan, H. Kress-Gazit, and C. Matuszek, “Robots that use language,” Annual Review of Control, Robotics, and Autonomous Systems, vol. 3, no. 1, pp. 25–55, 2020.
[9] B. Quartey, E. Rosen, S. Tellex, and G. Konidaris, “Verifiably following complex robot instructions with foundation models,” in 2025 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2025, pp. 1–8.
[10] C. Belta and S. Sadraddini, “Formal methods for control synthesis: An optimization perspective,” Annual Review of Control, Robotics, and Autonomous Systems, vol. 2, no. 1, pp. 115–140, 2019.
[11] S. Liu, H. Zhang, Y. Qi, P. Wang, Y. Zhang, and Q. Wu, “AerialVLN: Vision-and-language navigation for UAVs,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 15384–15394.
[12] M. Chandarana, E. L. Meszaros, A. Trujillo, and B. D. Allen, “‘Fly Like This’: Natural language interface for uav mission planning,” in International Conference on Advances in Computer-Human Interactions, no. NF1676L-26108, 2017.
[13] M. Chandarana, E. L. Meszaros, A. Trujillo, and B. Danette Allen, “Natural language based multimodal interface for UAV mission planning,” in Proceedings of the Human Factors and Ergonomics Society Annual Meeting, vol. 61, no. 1. SAGE Publications Sage CA: Los Angeles, CA, 2017, pp. 68–72.
[14] F. Yao, Y. Liu, W. Zhang, Z. Zhu, C. Li, N. Liu, P. Hu, Y. Yue, K. Wei, X. He et al., “AeroVerse-Review: Comprehensive survey on aerial embodied vision-and-language navigation,” The Innovation Informatics, vol. 1, no. 1, pp. 100015–1, 2025.
[15] P. Anderson, Q. Wu, D. Teney, J. Bruce, M. Johnson, N. Sünderhauf, I. Reid, S. Gould, and A. Van Den Hengel, “Vision-and-language navigation: Interpreting visually-grounded navigation instructions in real environments,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 3674–3683.
[16] K. Narasimhan, T. Kulkarni, and R. Barzilay, “Language understanding for text-based games using deep reinforcement learning,” in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, L. Màrquez, C. Callison-Burch, and J. Su, Eds. Lisbon, Portugal: Association for Computational Linguistics, Sep. 2015, pp. 1–11. [Online]. Available: https://aclanthology.org/D15-1001/
[17] X. Wang, Q. Huang, A. Celikyilmaz, J. Gao, D. Shen, Y.-F. Wang, W. Y. Wang, and L. Zhang, “Reinforced cross-modal matching and self-supervised imitation learning for vision-language navigation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 6629–6638.
[18] Y. Hong, Q. Wu, Y. Qi, C. Rodriguez-Opazo, and S. Gould, “VLN BERT: A recurrent vision-and-language BERT for navigation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 1643–1653.
[19] W. Zhang, C. Gao, S. Yu, R. Peng, B. Zhao, Q. Zhang, J. Cui, X. Chen, and Y. Li, “CityNavAgent: Aerial vision-and-language navigation with hierarchical semantic planning and global memory,” arXiv preprint arXiv:2505.05622, 2025.
[20] P. Saxena, N. Raghuvanshi, and N. Goveas, “UAV-VLN: End-to-End vision language guided navigation for uavs,” arXiv preprint arXiv:2504.21432, 2025.
[21] J. Lee, T. Miyanishi, S. Kurita, K. Sakamoto, D. Azuma, Y. Matsuo, and N. Inoue, “Citynav: Language-goal aerial navigation dataset with geographic information,” arXiv preprint arXiv:2406.14240, 2024.
[22] R. Zhang, H. Du, Y. Liu, D. Niyato, J. Kang, S. Sun, X. Shen, and H. V. Poor, “Interactive AI with retrieval-augmented generation for next generation networking,” IEEE Network, vol. 38, no. 6, pp. 414–424, 2024.
[23] S. Sanyal and K. Roy, “Asma: An adaptive safety margin algorithm for vision-language drone navigation via scene-aware control barrier functions,” IEEE Robotics and Automation Letters, 2025.
[24] Y. Zhang, V. N. Fernandez-Ayala, and D. V. Dimarogonas, “Multi-robot human-in-the-loop control under spatiotemporal specifications,” in 2024 IEEE International Conference on Robotics and Automation (ICRA), 2024, pp. 4841–4847.
[25] S. Xu, X. Luo, Y. Huang, L. Leng, R. Liu, and C. Liu, “Nl2hltl2plan: Scaling up natural language understanding for multi-robots through hierarchical temporal logic task specifications,” IEEE Robotics and Automation Letters, 2025.
[26] Y. Wu, Z. Xiong, Y. Hu, S. S. Iyengar, N. Jiang, A. Bera, L. Tan, and S. Jagannathan, “SELP: Generating safe and efficient task plans for robot agents with large language models,” in 2025 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2025, pp. 2599–2605.
[27] H. Kress-Gazit, G. E. Fainekos, and G. J. Pappas, “From structured english to robot motion,” in 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2007, pp. 2717–2722.
[28] N. Gopalan, D. Arumugam, L. L. Wong, and S. Tellex, “Sequence-to-Sequence language grounding of Non-Markovian task specifications,” in Robotics: Science and Systems, vol. 2018, 2018.
[29] C. Wang, C. Ross, Y.-L. Kuo, B. Katz, and A. Barbu, “Learning a natural-language to LTL executable semantic parser for grounded robotics,” in Conference on Robot Learning. PMLR, 2021, pp. 1706–1718.
[30] R. Patel, E. Pavlick, and S. Tellex, “Grounding language to Non-Markovian tasks with no supervision of task specifications,” in Robotics: Science and Systems, vol. 2020, 2020.
[31] J. Pan, G. Chou, and D.
Berenson, “Data -efﬁcie nt learni ng of natural languag e to linear temporal logic translat ors for robot task speciﬁcati on, ” arXiv prepri nt arXiv:2303.08006 , 2023. [32] F . Fuggitti and T . Chakraborti, “Nl2ltl–a pyt hon package for conv erting natural la nguage in structi ons to line ar temporal log ic formulas, ” in Pr oceedings of the AA AI Confer ence on Artiﬁcial Intellig ence , vol. 37, no. 13, 2023, pp. 16 428–16 430. [33] J. X. Liu, Z. Y ang, I. Idrees, S. Liang, B. Schornstei n, S. T ellex, and A. Shah, “Grounding complex natural lang uage commands for temporal tasks in unseen en vironments, ” in Confer ence on Robot Learning . PMLR, 2023, pp. 1084–1110. [34] R. Zhang, S. T ang, Y . Liu, D. Niyato, Z . Xiong, S. Sun, S. Mao, and Z. Han, “T ow ard agent ic AI: Generat ive information retrie v al inspired intel ligent communications and net workin g, ” IEEE Communicat ions Magazi ne , vol. 64, no. 1, pp. 197–204, 2026. [35] R. Zhang, H. Du, Y . Liu, D. Niyato, J. Kang, Z . Xiong, A. J amalipour , and D. I. Kim, “Gene rati ve AI agents with large language model for satell ite networks via a mixture of exper ts transmission, ” IE EE J ournal on Selected Areas in Communicati ons , 2024. [36] S. G hosh, D. Sadigh, P . Nuzzo, V . Raman, A. Donz ´ e, A. L. Sangio vanni- V incente lli, S. S. Sastry , and S. A. Seshia, “Diagnosi s and repair for synthesis from s ignal temporal logic speciﬁcation s, ” in Pr oceedings of the 19th International Confere nce on Hybrid Systems: Computation and Contr ol , 2016, pp. 31–40. [37] A. T . Buyukkocak and D. Aksaray , “T emporal relaxation of signa l temporal logic speciﬁcat ions for resilient cont rol synthe sis, ” in 2022 IEEE 61st Confer ence on Decision and Contr ol (CDC) . IEEE, 2022, pp. 2890–2896. [38] ——, “Resilie nt online planning for mobile robots with minimal re- laxat ion of s ignal tempora l logic speci ﬁcation s, ” IEEE Robotics and Automat ion Letters , 2025. [39] Y . Chen, R. Gan dhi, Y . Zhang, and C. 
Fan, “Nl2tl: Transforming natural langua ges to temporal logi cs using large language models, ” arXiv pre print arXiv:2305.07766 , 2023. [40] Y . Mao, T . Zhang, X. Cao, Z. Chen, X. L iang, B. Xu, and H. Fang, “Nl2stl: Transformat ion from logic natural language to signal temporal logics using llama2, ” in 2024 IEE E Internati onal Confer ence on Cyber- netic s and Intellig ent Systems (CIS) and IEE E Internati onal Confere nce on R obotics, Automation and Mechatr onics (RAM) . IEEE, 2024, pp. 469–474. [41] A. Donz ´ e and O. Maler , “Robust satisfacti on of temporal logic over real-v alued signals, ” in Internati onal confe rence on formal modeling and analysis of timed systems . Springer , 2010, pp. 92–106. [42] DeepSeek-AI, A. L iu, B . Feng , B. Xue, B. W ang, B. W u, and et al., “Deepseek-v3 technic al report. ” [Online]. A v ailab le: https:/ /arxi v .org/abs/24 12.19437 [43] DeepSeek-AI, D. Guo, D. Y ang, H. Z hang, J. Song, R. Zhang, , and et al ., “Dee pSeek-R1: Incenti vizing reasoning ca pabili ty in LLMs via reinforce m ent learnin g, ” 2025. [Online]. A v ailable: https:/ /arxi v .org/abs/25 01.12948 [44] E. J. Hu, Y . Shen, P . W allis, Z. Allen-Zhu, Y . Li, S. W ang, L. W ang, W . Chen et al. , “LORA: Low-ra nk adaptat ion of lar ge language m odels. ” ICLR , vol. 1, no. 2, p. 3, 2022. [45] J. Schulman, F . W olski, P . Dhariwal, A. Radford, and O. Klimov , “Proximal polic y optimizati on algori thms, ” 2017. [Online]. A va ilable: https:/ /arxi v .org/abs/17 07.06347 [46] R. Z hang, Y . Liu, S. T ang, J. W ang, D. Niyato, G. Sun, Y . L i, and S. Sun, “Cov ert prompt transmission for secure lar ge language model service s , ” IEEE Jou rnal on Select ed Areas in Communicat ions , pp. 1–1, 2025. [47] K. Papine ni, S. Roukos, T . W ard, and W . -J. Zhu, “BLEU: a method for automati c ev aluatio n of machine translat ion, ” in Pr oceedings of the 40th annual meet ing of the A ssociatio n for Comput ational Lingui stics , 2002, pp. 311–318.
