Can an Actor-Critic Optimization Framework Improve Analog Design Optimization?

Sounak Dutta∗, Fin Amin∗, Sushil Panda, Jonathan Rabe, Yuejiang Wen, Paul Franzon
Department of Electrical and Computer Engineering
North Carolina State University
{sdutta6, samin2, spanda4, jcrabe, wyuejia, paulf}@ncsu.edu
∗Equal contribution. Preprint. Under review.

Abstract: Analog design often slows down because even small changes to device sizes or biases require expensive simulation cycles, and high-quality solutions typically occupy only a narrow part of a very large search space. While existing optimizers reduce some of this burden, they largely operate without the kind of judgment designers use when deciding where to search next. This paper presents an actor-critic optimization framework (ACOF) for analog sizing that brings that form of guidance into the loop. Rather than treating optimization as a purely black-box search problem, ACOF separates the roles of proposal and evaluation: an actor suggests promising regions of the design space, while a critic reviews those choices, enforces design legality, and redirects the search when progress is hampered. This structure preserves compatibility with standard simulator-based flows while making the search process more deliberate, stable, and interpretable. Across our test circuits, ACOF improves the top-10 figure of merit by an average of 38.9% over the strongest competing baseline and reduces regret by an average of 24.7%, with peak gains of 70.5% in FoM and 42.2% lower regret on individual circuits. By combining iterative reasoning with simulation-driven search, the framework offers a more transparent path toward automated analog sizing across challenging design spaces.

Index Terms: analog circuit optimization, Bayesian optimization, large language models, SPICE simulation

I. INTRODUCTION AND MOTIVATION

The demands on modern analog design continue to increase with new technology nodes, tighter specifications, and faster development cycles. Each node changes transistor behavior, invalidates earlier biasing strategies, and requires renewed exploration of the design space. From an optimization standpoint, this is no longer merely schematic tuning, but the search for a feasible region where gain, bandwidth, phase margin, and power can be satisfied together. Because performance varies nonlinearly with sizing, designers need SPICE simulations, and more knobs mean more costly trials. Even after the topology is fixed, analog sizing is rarely straightforward. It typically requires repeated evaluation and careful adjustment to satisfy several coupled specifications while managing trade-offs among performance, power, and area. In practice, analog design is still a slow and labor-intensive process, requiring extensive simulation and considerable designer expertise [1]–[3].

[Fig. 1 panels. Actor-Critic Interaction: Proposal → Audit → Search → Reflection → Next-Round Proposal → Audit]
- Proposal (Actor): Starts broad search. Ranges: R6_Ohm = 20k-35k, L56_um = 0.32-0.90, W21_um = 45-90, C1_pF = 4.0-9.0.
- Audit (Critic): Audit passed. The proposal stays inside the PDK bounds and preserves the actor's intent, so no repair is needed. The round proceeds with the actor's ranges as written. Changed 0 / 21 knobs.
- Search (BO + Simulation): Search only inside the critic-approved region. Evaluate candidate designs with the simulator. Return round data for reflection.
- Reflection (Critic): valid_rate(prev)=1.000, valid_rate(curr)=0.940 (dropped). Improvement stalled, so let's tighten the ranges. R6_Ohm: tighten to 21.5k-27k (center ~23-25k) to align with higher-performing cases and recover UGBW/FOM. L56_um: 0.45-0.60, pair Lout_um to 0.34-0.43 to rebalance bandwidth without PM loss. Balance: Ib_uA to 8.5-9.2 and W21_um to 52-60, with C1_pF at 7.8-8.8 to seek UGBW gains while maintaining high PM and validity.
- Next-Round Proposal (Actor): Next proposal follows the memo. It narrows the search. The key changes are now explicit: R6_Ohm = 21.5k-27k, L56/Lout tightened, Ib_uA = 8.5-9.2, W21_um = 52-60, C1_pF = 7.8-8.8.
- Audit (Critic): Modifies actor's proposal. It narrows/expands the search for better exploitation/exploration. The key changes are now explicit: R6_Ohm = 21.5k-23k, L56 = 0.51-0.57, Ib_uA = 8.8-9.0, W21_um = 54.8-57.6, C1_pF = 8.4-8.8. ... Changed 21 / 21 knobs.

Fig. 1. Example of interactions between our actor and critic components over rounds during analog sizing optimization. Our framework defines a closed-loop optimization process that alternates between proposal, audit, search, and reflection.

[Fig. 2 block diagram]
- Actor: proposes search region, $\tilde{\Omega}_n = \pi(e_{n-1}, \Omega)$. Output: proposed search region $\tilde{\Omega}_n$.
- Critic: validates the proposed search region, $\Omega^+_n = \mathcal{V}(\tilde{\Omega}_n, e_{n-1})$. Output: approved ranges $\Omega^+_n$.
- Optimizer: region-constrained sampling, $\mathcal{X}_n = \mathrm{BO}(\Omega^+_n)$. Output: candidate set $\mathcal{X}_n \subseteq \Omega^+_n$.
- Simulator: evaluates circuit performance and computes the FoM, $\mathcal{D}_n = \{(x, f(x))\}_{x \in \mathcal{X}_n}$. Output: evaluated round data $\mathcal{D}_n$.
- Round summary: $e_n = g(\mathcal{D}_n)$. Output: feedback summary $e_n$, returned to the actor.

Fig. 2. An overview of our actor-critic optimization framework (ACOF). At each time step $n$, the actor proposes a candidate search region $\tilde{\Omega}_n$, the critic adjusts it to a PDK-legal region $\Omega^+_n$, BO selects candidates $x \in \Omega^+_n$ for simulation, and simulator outcomes are summarized into $e_n$ for the next round. ACOF is agnostic to the specific choice of BO.

For that reason, analog sizing has long been viewed as a constrained optimization problem. From that perspective, the central challenge is not simply to evaluate better candidate points, but to direct the search toward regions of the design space that are more likely to produce feasible and high-performing solutions under competing design constraints [2]–[5].
Human designers address this problem differently. They do not treat the search space as uniformly promising; they use judgment to focus attention, interpret trade-offs, and revise direction when progress stalls. This paper is motivated by the idea that such guidance should become part of the optimization loop itself. Recent LLM-based studies in electronic design suggest that language models can contribute to design automation as reasoning agents rather than as mere text generators [6]–[11]. Building on that perspective, our contribution is to investigate casting analog sizing as an actor-critic process in which one agent proposes promising regions of the design space, another evaluates whether the search direction should be preserved or revised, and a conventional optimizer explores within the resulting region. In this way, automated analog sizing can become not only more sample-efficient, but also more interpretable and more aligned with the way experienced designers reason about difficult trade-offs.

II. PRIOR ART

A. Bayesian Optimization for Analog Circuit Sizing

Bayesian optimization (BO) has advanced analog circuit sizing in several important ways [12]. WEIBO [4] formulates sizing as constrained BO with Gaussian-process (GP) surrogates and weighted expected improvement, and extends the approach to multi-objective design. Local BO [5] scales this idea to larger spaces through trust-region-based local models and batch candidate evaluation. tSS-BO [13] targets the high-dimensional setting by restricting search to an effective truncated subspace while retaining local GP guidance and Bayesian selection.

B. Language-Driven Optimization

Recent LLM-based work in analog design has begun to open several distinct directions rather than one unified path.
LayoutCopilot [6] reimagines analog layout interaction by turning natural-language design requests into executable layout actions through a multi-agent framework, while LLANA [7] brings language models into the BO loop to generate design-dependent layout constraints, particularly net-weighting parameters for layout synthesis. AmpAgent [8] approaches amplifier design as a coordinated reasoning process across literature understanding, formula derivation, and device sizing. In contrast, AnalogCoder [9] treats analog design as a code-generation problem, using a training-free workflow to produce and iteratively correct Python-based circuit implementations. ADO-LLM [10] more directly couples an LLM with Bayesian optimization for circuit sizing, using the model to propose promising design candidates alongside a GP-based search process. LEDRO [14], in turn, shifts the emphasis from asking the LLM for a single sizing answer to asking it to progressively narrow the search region so that optimization can proceed within a more meaningful part of the design space. More recent frameworks extend this picture in complementary ways: EEsizer [15] formulates transistor sizing as a closed-loop LLM-guided process built around simulation, analysis, and iterative parameter updates, while LLM-USO [16] focuses on structured knowledge reuse across related circuits.

C. Takeaways from Prior Art

These prior works point to the same recurring pressures: GP-based BO becomes harder to scale as the sample set and parameter dimension grow. Constrained search remains difficult, and exploration becomes increasingly fragile in high-dimensional spaces. Furthermore, these methods mainly act as black boxes [5], [7], [13].
Recent LLM-based methods have begun to address these limitations in different ways: some strengthen the reasoning and decomposition side of analog design [8], some couple LLMs with BO to improve candidate generation within the optimization loop [10], and some demonstrate the broader value of self-reflection [17] for iterative improvement in workflows using an optimizer [14]. But in these systems, guidance is still driven mainly by simulator feedback or by the model's own reflective process, leaving a natural opening for an independent critic that can examine a proposal from outside the actor's reasoning stream. That distinction matters because prompt-heavy LLM systems already report performance dilution under long, multitask context [6]. In relation to this, long-context retrieval studies show that current LLMs can struggle even on simple fact-retrieval tasks as context grows [18]. Our method is designed around exactly this gap. Considering these takeaways, we introduce ACOF in the following section.

TABLE I
RESULTS ACROSS THE FOUR TEST CIRCUITS, AGGREGATED OVER 3 RUNS PER CIRCUIT. EACH CIRCUIT FORMS ONE MULTIROW BLOCK. VALUES ARE MEAN ± SEM (STANDARD ERROR OF THE MEAN). BOLD INDICATES THE BEST BASELINE METHOD WITHIN EACH CIRCUIT AND METRIC. QWEN WAS USED AS THE LLM.

Column groups: Top-10 Design Quality (Gain, UGBW, PM, Power, FoM); Reliability in % (Sim. Valid, Phys. Feasible); Exploration/Exploitation (Regions, Regret).

Method | Gain (dB) ↑ | UGBW (MHz) ↑ | PM (deg) ↑ | Power (mW) ↓ | FoM ↑ | Sim. Valid ↑ | Phys. Feasible ↑ | Regions ↑ | Regret ↓

130nm, 12 Params
Target Spec. | 85.00 | 900.00 | 100.00 | 0.554 | 0.00 | - | - | - | -
ACOF | 83.65 ± 0.64 | 785.30 ± 39.71 | 101.35 ± 1.13 | 0.658 ± 0.045 | -0.19 ± 0.01 | 97.0 ± 0.1 | 86.2 ± 1.4 | 2.0 ± 0.0 | 0.25 ± 0.01
Single-LLM | 83.15 ± 0.55 | 717.63 ± 45.51 | 111.89 ± 0.74 | 0.657 ± 0.029 | -0.24 ± 0.02 | 95.4 ± 0.9 | 78.9 ± 1.8 | 2.7 ± 0.3 | 0.30 ± 0.01
Pure-BO | 76.24 ± 1.45 | 673.92 ± 22.65 | 128.23 ± 3.27 | 0.587 ± 0.014 | -0.39 ± 0.02 | 81.0 ± 0.3 | 26.9 ± 0.1 | 3.0 ± 0.8 | 0.37 ± 0.02
Human Expert | 72.77 | 851.48 | 173.59 | 0.821 | -0.45 | - | - | - | -

180nm, 12 Params
Target Spec. | 95.00 | 500.00 | 100.00 | 0.194 | 0.00 | - | - | - | -
ACOF | 90.99 ± 0.55 | 195.59 ± 10.25 | 119.77 ± 2.23 | 0.200 ± 0.001 | -0.52 ± 0.02 | 100.0 ± 0.0 | 90.2 ± 1.9 | 5.0 ± 1.2 | 0.59 ± 0.02
Single-LLM | 90.88 ± 0.20 | 240.69 ± 78.42 | 117.72 ± 1.86 | 0.324 ± 0.104 | -0.64 ± 0.01 | 100.0 ± 0.0 | 84.9 ± 1.0 | 2.0 ± 0.0 | 0.64 ± 0.01
Pure-BO | 86.79 ± 0.29 | 253.71 ± 7.97 | 127.97 ± 1.23 | 0.359 ± 0.024 | -0.74 ± 0.02 | 96.1 ± 0.5 | 36.1 ± 2.6 | 2.0 ± 0.0 | 0.69 ± 0.04
Human Expert | 81.62 | 365.69 | 127.42 | 0.361 | -0.68 | - | - | - | -

130nm, 17 Params
Target Spec. | 85.00 | 20.00 | 100.00 | 0.277 | 0.00 | - | - | - | -
ACOF | 88.56 ± 2.77 | 29.45 ± 7.68 | 104.59 ± 22.57 | 0.290 ± 0.008 | -0.13 ± 0.05 | 97.7 ± 0.8 | 81.7 ± 10.3 | 3.7 ± 0.7 | 0.26 ± 0.02
Single-LLM | 87.06 ± 1.53 | 20.62 ± 1.04 | 71.73 ± 2.23 | 0.475 ± 0.141 | -0.44 ± 0.17 | 93.9 ± 1.6 | 82.3 ± 6.5 | 2.7 ± 0.5 | 0.45 ± 0.12
Pure-BO | 85.69 ± 0.40 | 50.49 ± 15.61 | 74.93 ± 5.47 | 0.656 ± 0.100 | -0.73 ± 0.01 | 84.6 ± 0.6 | 68.4 ± 0.6 | 1.3 ± 0.5 | 0.65 ± 0.03
Human Expert | 78.33 | 9.93 | 80.06 | 0.560 | -0.91 | - | - | - | -

180nm, 21 Params
Target Spec. | 85.00 | 1200.00 | 120.00 | 1.320 | 0.00 | - | - | - | -
ACOF | 94.57 ± 2.38 | 1095.36 ± 18.60 | 84.20 ± 3.69 | 1.329 ± 0.014 | -0.24 ± 0.02 | 99.3 ± 0.3 | 85.6 ± 2.4 | 2.7 ± 0.3 | 0.32 ± 0.01
Single-LLM | 102.43 ± 4.70 | 759.88 ± 178.60 | 90.91 ± 10.95 | 1.232 ± 0.090 | -0.44 ± 0.07 | 99.7 ± 0.2 | 74.6 ± 3.3 | 1.0 ± 0.8 | 0.47 ± 0.05
Pure-BO | 98.06 ± 2.13 | 469.33 ± 59.13 | 123.24 ± 7.15 | 1.243 ± 0.068 | -0.61 ± 0.01 | 91.4 ± 0.1 | 19.3 ± 1.0 | 1.7 ± 0.7 | 0.58 ± 0.01
Human Expert | 72.69 | 1138.87 | 124.78 | 2.210 | -0.51 | - | - | - | -

III. OUR APPROACH

We formulate analog circuit design optimization as $x^\star = \arg\max_{x \in \Omega} f(x)$, where $x$ denotes the vector of tunable circuit parameters, $\Omega$ denotes the global feasible design domain induced by technology bounds and hard circuit constraints, which is primarily defined by the PDK, and $f(x)$ denotes the scalar figure of merit (FoM) obtained from circuit simulation.
Our approach, the actor-critic optimization framework (ACOF), is organized as a role-separated reasoning loop with two logical agents: an actor, which proposes where the search should move next, and a critic, which reviews that proposal before optimization proceeds. This structure turns the familiar design-review pattern of analog practice into an iterative computational procedure.

In our pipeline the actor does not output a single design point; instead, it proposes a search region within which the downstream optimizer should explore. At initialization, the actor is seeded by a small set of valid calibration examples, denoted by $\mathcal{C}_0$ [14], and produces the first candidate region $\tilde{\Omega}_1 = \pi(\mathcal{C}_0, \Omega)$. After the first round, the actor is updated by a round summary $e_{n-1}$ that condenses what the previous round revealed, including the best design found so far, representative high-quality samples, and the critic's reflection on that round, so that for later rounds it proposes $\tilde{\Omega}_n = \pi(e_{n-1}, \Omega)$. In the implemented loop, the actor is therefore driven not by the previous corrected region itself, but by the evidence generated from searching within it.

Instead of allowing optimization to unfold as a blind sequence of samples, ACOF inserts an explicit stage of judgment between proposal and evaluation. As the search progresses, the critic also produces short reflection memos [17] that capture recurring patterns, useful corrections, and promising directions, thereby giving the loop continuity across rounds. An overview is given in Fig. 2.

Before any BO step is performed, the critic audits the actor's proposal and converts it into the region actually used for search, written as $\Omega^+_n = \mathcal{V}(\tilde{\Omega}_n, e_{n-1})$, where $\mathcal{V}(\cdot)$ denotes the critic's validation and correction step. Here $\Omega^+_n$ is the post-audit region for round $n$: it is the actor's proposal after being checked, corrected when necessary, and accepted for optimization.
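As a rough illustration of the audit step $\mathcal{V}$, the sketch below swaps the LLM critic for a purely rule-based stand-in that swaps reversed intervals, clips each proposed range to the legal bounds, widens ranges that are too narrow, and counts how many knobs were changed (in the spirit of the "Changed 0 / 21 knobs" line in Fig. 1). The function name `audit_region`, the `min_rel_width` threshold, and the numeric PDK bounds are assumptions made for this example; only the knob names and the first-round ranges are taken from Fig. 1. It does not use the round summary $e_{n-1}$, so it covers only the legality part of the audit.

```python
from typing import Dict, Tuple

Range = Tuple[float, float]  # (low, high) for one tunable knob


def audit_region(
    proposal: Dict[str, Range],
    pdk_bounds: Dict[str, Range],
    min_rel_width: float = 0.01,  # hypothetical threshold, not from the paper
) -> Tuple[Dict[str, Range], int]:
    """Rule-based stand-in for the critic's audit step (assumed behavior).

    Returns the approved region and the number of knobs that were changed.
    Knobs that are not listed in pdk_bounds are ignored.
    """
    approved: Dict[str, Range] = {}
    changed = 0
    for name, (g_lo, g_hi) in pdk_bounds.items():
        # A knob missing from the proposal falls back to the full legal range.
        lo, hi = proposal.get(name, (g_lo, g_hi))
        # Repair a malformed (reversed) interval.
        if lo > hi:
            lo, hi = hi, lo
        # Clip both ends into the legal bounds.
        lo = min(max(lo, g_lo), g_hi)
        hi = min(max(hi, g_lo), g_hi)
        # Widen an overly narrow interval around its center, staying in bounds.
        min_width = min_rel_width * (g_hi - g_lo)
        if hi - lo < min_width:
            mid = 0.5 * (lo + hi)
            lo = max(g_lo, mid - 0.5 * min_width)
            hi = min(g_hi, lo + min_width)
            lo = max(g_lo, hi - min_width)
        approved[name] = (lo, hi)
        if proposal.get(name) != (lo, hi):
            changed += 1
    return approved, changed


# Hypothetical legal bounds; real values would come from the PDK.
PDK_BOUNDS = {
    "R6_Ohm": (1e3, 1e5),
    "L56_um": (0.18, 2.0),
    "W21_um": (1.0, 200.0),
    "C1_pF": (0.5, 20.0),
}
# First-round actor ranges quoted in Fig. 1.
proposal = {
    "R6_Ohm": (20e3, 35e3),
    "L56_um": (0.32, 0.90),
    "W21_um": (45.0, 90.0),
    "C1_pF": (4.0, 9.0),
}
approved, n_changed = audit_region(proposal, PDK_BOUNDS)
print(f"Changed {n_changed} / {len(PDK_BOUNDS)} knobs")
```

With these assumed bounds the first-round proposal passes unchanged, which mirrors the "audit passed" case in Fig. 1. A reversed or out-of-bounds range would instead be repaired and counted as a changed knob.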
Conceptually, this stage both preserves the actor's intent when that intent is coherent and stabilizes the search when the proposal is malformed, overly narrow, overly loose, or otherwise inconsistent with the feasible domain. In this sense, the critic functions less as a second optimizer than as a gatekeeper that keeps exploration disciplined.

Once $\Omega^+_n$ has been established, BO and simulation are carried out within that region under a fixed evaluation budget. Let $\mathcal{X}_n \sim \mathrm{BO}(\Omega^+_n)$ denote the candidate designs selected for evaluation in round $n$; simulation then produces the round data set $\mathcal{D}_n = \{(x, f(x))\}_{x \in \mathcal{X}_n}$, where $f(x)$ is the figure of merit. BO is therefore not used to search the full design space at every step, but to search efficiently within the region approved for that round. After simulation, the framework compresses the round's outcomes into a summary $e_n = g(\mathcal{D}_n)$, where $g(\cdot)$ denotes the round-level reflection process. This summary captures the practical meaning of the round: which parts of the region were productive, whether progress was made, and what kind of adjustment should guide the next proposal. The critic's memo is part of this summary, so the feedback is not purely numerical but also interpretive. The next round then follows the same pattern, with $\tilde{\Omega}_{n+1} = \pi(e_n, \Omega)$ and $\Omega^+_{n+1} = \mathcal{V}(\tilde{\Omega}_{n+1}, e_n)$.

Taken together, these steps define a closed-loop optimization process that alternates between proposal, audit, search, and reflection (see Fig. 1). The actor proposes where to search, the critic determines whether that proposal is suitable for actual optimization, BO explores within the approved region, and the resulting evidence is distilled into guidance for the next round. This separation between proposing a region and approving a region is the core operational idea of ACOF: it allows the search space to be refined progressively while remaining anchored to the feasible structure of the circuit.

[Fig. 3 plot: "Exploration by Round Across Methods -- 180nm 21 Params (GPT 4o Mini)". Axes: UMAP Component 1 vs. UMAP Component 2; color scale: FoM from -4.0 to -0.5. Rows: ACOF, Single-LLM, Pure-BO. Columns: round pairs R1–R2 through R9–R10. Per-panel Avg FoM / Max FoM:]
ACOF: R1–R2 -2.13 / -0.34; R3–R4 -1.51 / -0.28; R5–R6 -1.90 / -0.35; R7–R8 -2.01 / -0.31; R9–R10 -1.36 / -0.32
Single-LLM: R1–R2 -3.34 / -0.55; R3–R4 -3.13 / -0.51; R5–R6 -2.54 / -0.64; R7–R8 -2.66 / -0.65; R9–R10 -2.58 / -0.55
Pure-BO: R1–R2 -3.56 / -0.68; R3–R4 -3.49 / -0.59; R5–R6 -3.34 / -0.62; R7–R8 -3.57 / -0.44; R9–R10 -3.62 / -0.71

Fig. 3. UMAP projections of the sizing parameters illustrate how the design space was explored over optimization rounds. To aid visualization, we outlined regions being explored in the current rounds in red and rendered the cumulative explored designs from past rounds in gray. ACOF judiciously explores the design space by targeting regions corresponding to high performance, as indicated by the higher round-wise Avg and Max FoM. Our technique successfully explored the regions with UMAP Component 2 > 1. In contrast, the Single-LLM and Pure-BO baselines explored less and achieved worse FoMs.

A. Why Use an Actor-Critic Structure?

The distinction between policy-based and value-based methods provides a useful lens for understanding why an actor-critic structure is a natural design choice in optimization. In reinforcement learning (RL), policy methods learn a direct mapping from states to actions, whereas value-based methods learn an estimate of expected return to guide action selection, as in REINFORCE/TRPO versus Q-learning/DQN [19]–[22].
The historical trend was not a simple shift from policy learning to value learning, since value-based deep RL was already central in the mid-2010s through DQN. A more meaningful shift was toward methods in which action proposals are shaped by an explicit evaluator. Actor-critic methods made this especially clear by pairing a policy with a learned critic that scores actions or states and stabilizes learning through lower-variance updates [20], [23]. More broadly, this points to a lesson larger than any one RL algorithm: many successful learning systems benefit when generation and judgment need not share the same mechanism. Methods such as Soft Actor-Critic, TD3, and MuZero push this logic further by placing learned value estimates, critics, and planning modules at the center of decision making [23]–[25]. In parallel, reward-learning and preference-based frameworks, including RLHF, further reinforce the idea that learning how behavior should be evaluated can be as important as learning the behavior itself [26], [27].

Our approach is motivated by the same principle, but applied to analog sizing rather than sequential control. Our framework is not an RL algorithm in the standard sense; instead, it adopts the structural lesson behind actor-critic methods by separating the proposal of a search region from its evaluation. In analog optimization, this separation is useful because a proposed region may be promising in spirit yet still be ill-posed, physically implausible, or wasteful under a fixed simulation budget. An independent critic therefore serves as an explicit evaluative mechanism that helps stabilize and redirect the search before expensive simulation is spent.

IV. EXPERIMENTS

We evaluate three optimization frameworks for their effectiveness in identifying high-quality circuit designs. In all cases, the objective is to maximize a scalar FoM that captures the target performance requirements.
For each circuit, run, and method, we form a run-level summary, and then we report the average and standard error of the run-level summaries across the three runs. Our analysis considers three groups of metrics: design quality, reliability, and exploration/exploitation.

[Fig. 4 schematics: (a) Two-stage amplifier; (b) Folded-cascode single-ended amplifier]

Fig. 4. The circuit schematics of our benchmarks. For an OpAmp, gain, bandwidth, phase margin, and power are coupled design specifications. Higher DC gain can introduce additional poles, which affects phase margin. Also, improving UGBW usually requires a higher bias current and increases power. Since low power is desirable, the circuit must balance gain, UGBW, and power, while phase margin sets the stability constraint and often trades off with frequency response and bandwidth.

A. Benchmarks

We benchmark four circuits of varying complexity across two technology nodes. In SKY130 (130 nm), we consider a two-stage amplifier and a cascode wide-swing amplifier with design vectors $x \in \mathbb{R}^{12}$ and $x \in \mathbb{R}^{17}$, respectively. In GF180MCU (180 nm), we consider a two-stage amplifier and a folded-cascode single-ended amplifier with design vectors $x \in \mathbb{R}^{12}$ and $x \in \mathbb{R}^{21}$, respectively (see Fig. 4). Each design vector includes the tunable device, bias, and compensation parameters.

B. Implementations and Baselines

The first setup employs the proposed ACOF. For comparison, we use a second setup that replaces the actor-critic loop with a single LLM coupled to the same BO backend, and a third setup that uses BO alone with no LLM guidance. All three methods follow the same round schedule and the same per-round simulation budget under identical circuit-simulation conditions. In ACOF, the actor and critic are prompted separately.
The actor is given topology-specific context together with either the calibration set (in the first round) or a packed summary of the strongest valid designs from the previous round, including gain, UGBW, phase margin, power, and operating-region information. It then proposes numeric subranges for all tunable parameters. The critic receives those proposed ranges and returns a structured audit that either accepts the actor ranges or repairs them minimally within the legal bounds before BO is launched. After BO evaluates the candidate designs in the critic-approved region, the round summary is written back into the next actor prompt so that the search region can be revised progressively rather than reinitialized from scratch. See Fig. 1 for an example. For clarity and reproducibility, we will release the full prompting templates upon acceptance.

Inspired by LEDRO [14], we implement a Single-LLM baseline to interface with BO. In each round it performs the actor's region-proposal step but forgoes the correction/refinement by the critic. Lastly, as an LLM-free comparison, we report the performance of Pure-BO, which optimizes the design using BO alone [28].

Before the rounds begin, an initial set of designs is generated using BO to provide $|\mathcal{C}_0| = 200$ starting points, which are used to condition the LLM-based methods. In each round, we perform the optimization procedures and use Ngspice to compute the FoMs [29]. In our experiments, we configure our LLM-based methods using the open-source Qwen2.5-14B-Instruct [30]. To see how performance changes with stronger LLM capabilities, we rerun our experiments on the 21-parameter folded-cascode benchmark using GPT-4o-mini [31] and GPT-5 [32].

C. Performance Metrics

Design Quality. We measure and maximize the FoM computed from gain $G(x)$, unity-gain bandwidth $BW(x)$, phase margin $\phi(x)$, and power $W(x)$. We define circuit-specific targets $(G^*, BW^*, \phi^*, W^*)$, reported as Target Spec. in Table
I, and use them to construct normalized component scores $s_G(x)$, $s_{BW}(x)$, $s_\phi(x)$, and $s_W(x)$. Each score is zero when its corresponding target is met and penalizes deviations otherwise; we then combine them as $f(x) = 3 \cdot s_G(x) + s_{BW}(x) + s_\phi(x) + s_W(x)$. These quantities are not reported from a single best design. Instead, for each run and method, we sort all simulated designs by FoM and average each metric over the top-10 designs. This gives a more stable summary of the quality of the best part of the search than reporting only a single extreme point.

Reliability. We report two run-level rates. Sim. Valid is the fraction of all attempted designs in a run whose simulator output was marked valid by the pipeline. Phys. Feasible is the fraction of attempted designs that satisfy our physical sanity checks. In our implementation, a design is counted as physically meaningful only if it yields positive bandwidth, positive phase margin, positive power, and nonnegative gain. Both reliability metrics are computed over the full set of attempts.

Exploration/Exploitation. To summarize how broadly each method visits distinct parts of the design space, we report Regions. For each circuit, we first normalize the active design parameters to $[0, 1]$ using that circuit's own parameter ranges. We then pool all sampled designs from a given run and method, and apply HDBSCAN [33], an unsupervised clustering method, in the normalized design space. The reported value is the number of discovered regions. A larger value indicates that the method visited more distinct regions of the parameter space during the run. This metric is computed from the full run-level point cloud.

We also report Regret, defined relative to the ideal FoM value of 0. For each run and method, at each step we compute the gap between 0 and the best FoM found up to that point, and report the average of that quantity over the run. Lower regret means that the method reached strong designs earlier and stayed closer to the ideal score throughout the run.

[Fig. 5 plots: "EMA-smoothed FoM -- 180nm 21 Params", FoM (-8 to 1) vs. Optimization Steps (0 to 1200), curves for ACOF, Single-LLM, and Pure-BO]
(a) GPT 4o Mini. Average and SEM of Top-10 FoM: ACOF -0.359 ± 0.018, Single-LLM -0.643 ± 0.015, Pure-BO -0.652 ± 0.026.
(b) GPT 5. Average and SEM of Top-10 FoM: ACOF -0.271 ± 0.024, Single-LLM -0.319 ± 0.035, Pure-BO -0.639 ± 0.019.

Fig. 5. Exponentially smoothed FoM trajectories for the 180nm 21-parameter folded-cascode benchmark using variants of GPT as the LLM. Subcaptions report the across-run mean of the per-run top-10 mean FoM. Optimization steps for the LLM-based methods begin at 200 because both are initialized with $|\mathcal{C}_0| = 200$ seed designs generated by Pure-BO. After initialization, the LLM-based methods update the optimization ranges every 100 steps.

TABLE II
RESULTS ON THE 180NM 21-PARAMETER FOLDED-CASCODE BENCHMARK USING VARIANTS OF GPT AS THE LLM. VALUES ARE MEAN ± SEM.

Model | Method | Top-10 FoM ↑ | Phys. Feasible (%) ↑ | Regret ↓
GPT-4o Mini | ACOF | -0.359 ± 0.018 | 67.1 ± 3.5 | 0.395 ± 0.019
GPT-4o Mini | Single-LLM | -0.643 ± 0.015 | 37.8 ± 2.3 | 0.579 ± 0.021
GPT-5 | ACOF | -0.271 ± 0.024 | 99.4 ± 0.0 | 0.313 ± 0.019
GPT-5 | Single-LLM | -0.319 ± 0.035 | 80.9 ± 8.9 | 0.381 ± 0.024

V. DISCUSSION AND CONCLUSION

Across all four benchmarks in Table I, ACOF is the only method that remains on top in both FoM and regret. That pattern is more revealing than any one raw specification because the table reports averages over the top-10 designs, so the evidence comes from the quality of a region rather than from a single fortunate sample.
The baselines do lead on isolated metrics from time to time (Pure-BO on phase margin in some cases and on bandwidth in two circuits, and Single-LLM on gain for the 180 nm 21-parameter benchmark), but those wins do not carry through to the joint objective. ACOF repeatedly converges to design sets in which gain, bandwidth, stability, and power sit in a better overall balance.

Fig. 5 shows a similar overall trend. After the shared initialization budget, ACOF rises to stronger FoM levels and remains there through most of the search. In Fig. 5a, ACOF is 44.2% closer to the ideal FoM of 0 than Single-LLM and 44.9% closer to 0 than Pure-BO. In Fig. 5b, the LLM-guided portion shifts upward further: ACOF improves again, Single-LLM also benefits from the stronger model, and Pure-BO remains nearly unchanged. This result shows that the framework is not tied to one particular LLM and that a better underlying model can be converted into better optimization behavior without changing the search procedure.

The reliability columns in Table I tell a similar story. ACOF stays at or near perfect simulator validity throughout the study, but the more informative quantity is physical feasibility. On the 180 nm 21-parameter circuit, nearly every ACOF proposal simulates, and a much larger fraction remains physically meaningful than for Single-LLM or Pure-BO. The same trend is visible on the 130 nm 12-parameter and 180 nm 12-parameter benchmarks as well. Viewed alongside the actor-critic exchange in Fig. 1, this suggests that the critic is affecting what gets tried next, not merely repairing malformed outputs. In practice, that means fewer simulations are spent on points that are admissible yet unhelpful from an analog-design standpoint.

The same search trend is visible in both Table I and Fig. 3.
On the 180 nm 21-parameter circuit, the red contours for ACOF migrate across several separated pockets of the embedding over successive round pairs, including pockets at UMAP Component 2 > 1 in Fig. 3, a region that the baselines touch only lightly. What matters is that these relocations remain productive. In every round pair, the ACOF panels report a better average FoM than the corresponding Single-LLM and Pure-BO panels, and the best FoM found within each pair is stronger as well. Table I quantifies this behavior using Qwen as the LLM. On this benchmark ACOF covers the largest number of regions and still records the best FoM and regret. A similar relation between region coverage and solution quality appears on the 180 nm 12-parameter and 130 nm 17-parameter circuits, where ACOF also visits more regions than either baseline. The 130 nm 12-parameter case is instructive in the opposite direction: Pure-BO touches the most regions there, yet its physical-feasibility rate falls to 26.9%. The useful distinction is therefore not movement alone, but movement that lands in workable territory. ACOF shifts when a new region is needed and stays with a region when it is still yielding good designs.

Taken together, Table I, Fig. 3, and Fig. 5 support the same conclusion. ACOF spends more of the evaluation budget in parts of the design space that are both promising and physically meaningful. This effect is clearest on the 180 nm 21-parameter benchmark, where the larger search space and tighter metric coupling make selective exploration especially valuable. This has clear implications for robust and sample-efficient analog design automation.

REFERENCES

[1] M. F. Barros, J. M. Guilherme, and N. C. Horta, Analog circuits and systems optimization based on evolutionary computation techniques. Springer, 2010, vol. 9.
[2] M. Ahmadzadeh and G. G.
Gielen, “Using probabilistic model rollouts to boost the sample efﬁcienc y of reinforcement learning for automated analog circuit sizing, ” in Pr oceedings of the 61st ACM/IEEE Design Automation Conference , 2024, pp. 1–6. [3] M. Ahmadzadeh, J. Lappas, N. W ehn, and G. Gielen, “ Anacraft: Duel-play probabilistic-model-based reinforcement learning for sample-efﬁcient pvt-robust analog circuit sizing optimization, ” IEEE T ransactions on Computer-Aided Design of Integr ated Cir cuits and Systems , 2025. [4] W . L yu, P . Xue, F . Y ang, C. Y an, Z. Hong, X. Zeng, and D. Zhou, “ An efﬁcient bayesian optimization approach for automated optimization of analog circuits, ” IEEE T ransactions on Circuits and Systems I: Re gular P apers , vol. 65, no. 6, pp. 1954–1967, 2017. [5] K. T ouloupas, N. Chouridis, and P . P . Sotiriadis, “Local bayesian optimization for analog circuit sizing, ” in 2021 58th ACM/IEEE design automation confer ence (DA C) . IEEE, 2021, pp. 1237–1242. [6] B. Liu, H. Zhang, X. Gao, Z. Kong, X. T ang, Y . Lin, R. W ang, and R. Huang, “Layoutcopilot: An llm-powered multi-agent collaborative framew ork for interactive analog layout design, ” IEEE T ransactions on Computer-Aided Design of Integr ated Cir cuits and Systems , 2025. [7] G. Chen, K. Zhu, S. Kim, H. Zhu, Y . Lai, B. Y u, and D. Z. Pan, “Llm- enhanced bayesian optimization for efﬁcient analog layout constraint generation, ” arXiv pr eprint arXiv:2406.05250 , 2024. [8] C. Liu, W . Chen, A. Peng, Y . Du, L. Du, and J. Y ang, “ Ampagent: An llm-based multi-agent system for multi-stage ampliﬁer schematic design from literature for process and performance porting, ” arXiv pr eprint arXiv:2409.14739 , 2024. [9] Y . Lai, S. Lee, G. Chen, S. Poddar, M. Hu, D. Z. Pan, and P . Luo, “ Analogcoder: Analog circuit design via training-free code generation, ” in Pr oceedings of the AAAI Conference on Artiﬁcial Intelligence , vol. 39, no. 1, 2025, pp. 379–387. [10] Y . Y in, Y . W ang, B. Xu, and P . 
Li, “ Ado-llm: Analog design bayesian optimization with in-context learning of large language models, ” in Pr o- ceedings of the 43r d IEEE/A CM International Confer ence on Computer- Aided Design , 2024, pp. 1–9. [11] N. Rouf, F . Amin, and P . D. Franzon, “Can low-rank knowledge distillation in llms be useful for microelectronic reasoning?” in 2024 IEEE LLM Aided Design W orkshop (LAD) , 2024, pp. 1–6. [12] Y . W en, J. Dean, B. A. Floyd, and P . D. Franzon, “High dimensional optimization for electronic design, ” in Pr oceedings of the 2022 ACM/IEEE W orkshop on Machine Learning for CAD , 2022, pp. 153–157. [13] T . Gu, J. W ang, Z. Bi, C. Y an, F . Y ang, Y . Qin, T . Cui, and X. Zeng, “tss- bo: Scalable bayesian optimization for analog circuit sizing via truncated subspace sampling, ” in 2024 Design, Automation & T est in Eur ope Confer ence & Exhibition (D ATE) . IEEE, 2024, pp. 1–6. [14] D. V . K ochar, H. W ang, A. P . Chandrakasan, and X. Zhang, “Ledro: Llm- enhanced design space reduction and optimization for analog circuits, ” in 2025 IEEE International Conference on LLM-Aided Design (ICLAD) . IEEE, 2025, pp. 141–148. [15] C. Liu and D. Chitnis, “Eesizer: Llm-based ai agent for sizing of analog and mix ed signal circuit, ” IEEE Tr ansactions on Cir cuits and Systems I: Re gular P apers , 2025. [16] N. K. Somayaji and P . Li, “Llm-uso: Large language model-based univ ersal sizing optimizer, ” IEEE T ransactions on Computer-Aided Design of Inte grated Cir cuits and Systems , 2025. [17] N. Shinn, F . Cassano, A. Gopinath, K. Narasimhan, and S. Y ao, “Reﬂe x- ion: Language agents with verbal reinforcement learning, ” Advances in Neural Information Pr ocessing Systems , vol. 36, pp. 8634–8652, 2023. [18] E. Nelson, G. K ollias, P . Das, S. Chaudhury , and S. Dan, “Needle in the haystack for memory based lar ge language models, ” arXiv preprint arXiv:2407.01437 , 2024. [19] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Intr oduction , 2nd ed. 
Cambridge, MA: MIT Press, 2018. [Online]. A vailable: https://incompleteideas.net/book/the- book- 2nd.html [20] R. S. Sutton, D. A. McAllester, S. P . Singh, and Y . Mansour, “Polic y gradient methods for reinforcement learning with function approximation, ” in Advances in Neural Information Pr ocessing Systems 12 (NeurIPS 1999) , S. A. Solla, T . K. Leen, and K.-R. Müller, Eds. MIT Press, 2000, pp. 1057–1063. [21] V . Mnih, K. Ka vukcuoglu, D. Silver , A. A. Rusu, J. V eness, M. G. Bellemare, A. Graves, M. Riedmiller , A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. K umaran, D. Wierstra, S. Legg, and D. Hassabis, “Human-level control through deep reinforcement learning, ” Nature , vol. 518, no. 7540, pp. 529–533, 2015. [Online]. A v ailable: https://www .nature.com/articles/nature14236 [22] J. Schulman, S. Levine, P . Abbeel, M. Jordan, and P . Moritz, “T rust region polic y optimization, ” in Proceedings of the 32nd International Confer ence on Machine Learning , ser . Proceedings of Machine Learning Research, vol. 37. PMLR, 2015, pp. 1889–1897. [Online]. A vailable: https://proceedings.mlr .press/v37/schulman15.html [23] T . Haarnoja, A. Zhou, P . Abbeel, and S. Levine, “Soft actor- critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor , ” in Pr oceedings of the 35th International Confer ence on Machine Learning , ser . Proceedings of Machine Learning Research, vol. 80. PMLR, 2018, pp. 1861–1870. [Online]. A vailable: https://proceedings.mlr .press/v80/haarnoja18b .html [24] S. Fujimoto, H. van Hoof, and D. Meger , “ Addressing function ap- proximation error in actor-critic methods, ” in Proceedings of the 35th International Conference on Machine Learning (ICML) , 2018, pp. 1587– 1596. [25] J. Schrittwieser, I. Antonoglou, T . Hubert, K. Simonyan, L. Sifre, S. Schmitt, A. Guez, E. Lockhart, D. Hassabis, T . Graepel, T . Lillicrap, and D. 
Silver , “Mastering atari, go, chess and shogi by planning with a learned model, ” Nature , vol. 588, no. 7839, pp. 604–609, 2020. [Online]. A v ailable: https://www .nature.com/articles/s41586- 020- 03051- 4 [26] P . F . Christiano, J. Leike, T . B. Brown, M. Martic, S. Legg, and D. Amodei, “Deep reinforcement learning from human preferences, ” in Advances in Neural Information Pr ocessing Systems 30 (NeurIPS 2017) , I. Guyon, U. V . Luxbur g, S. Bengio, H. W allach, R. Fergus, S. Vishw anathan, and R. Garnett, Eds. Curran Associates, Inc., 2017, pp. 4299–4307. [Online]. A v ailable: https://papers.neurips.cc/paper/ 7017- deep- reinforcement- learning- from- human- preferences.pdf [27] L. Ouyang, J. W u, X. Jiang, D. Almeida, C. L. W ainwright, P . Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray , J. Schulman, J. Hilton, F . Kelton, L. Miller , M. Simens, A. Askell, P . W elinder , P . F . Christiano, J. Leike, and R. Lowe, “Training language models to follow instructions with human feedback, ” in Advances in Neural Information Pr ocessing Systems 35 (NeurIPS 2022) , S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, Eds. Curran Associates, Inc., 2022, pp. 27 730–27 744. [Online]. A vailable: https://proceedings.neurips.cc/paper_ﬁles/paper/ 2022/ﬁle/b1efde53be364a73914f58805a001731- Paper - Conference.pdf [28] H. W ang, “Bayesian-optimization, ” https://github .com/wangronin/ Bayesian- Optimization, 2023, gitHub repository . [29] H. V ogt, G. Atkinson, and P . Nenzi, Ngspice User’ s Manual , Sep. 2025, version 45 (ngspice release version). [Online]. A v ailable: https://ngspice.sourceforge.io/docs/ngspice- 45- manual.pdf [30] Qwen T eam, “Qwen2.5-14b-instruct, ” https://huggingface.co/Qwen/ Qwen2.5- 14B- Instruct, 2024, model card, accessed: 2026-03-15. [31] A. Hurst, A. Lerer , A. P . Goucher, A. Perelman, A. Ramesh, A. Clark, A. Ostro w , A. W elihinda, A. Hayes, A. Radford et al. , “Gpt-4o system card, ” arXiv pr eprint arXiv:2410.21276 , 2024. 
[32] A. Singh, A. Fry , A. Perelman, A. T art, A. Ganesh, A. El-Kishky , A. McLaughlin, A. Low , A. Ostrow , A. Ananthram et al. , “Openai gpt-5 system card, ” arXiv preprint , 2025. [33] L. McInnes, J. Healy , S. Astels et al. , “hdbscan: Hierarchical density based clustering. ” J . Open Source Softw . , v ol. 2, no. 11, p. 205, 2017.
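The distinction drawn above between region coverage and physical feasibility can be illustrated with a small sketch. It is not the paper's evaluation code: the function name `coverage_summary`, the record layout, and the toy data are assumptions made for illustration. Each evaluated design is assumed to carry a region label (for example, a cluster ID assigned in the embedding, with -1 marking unclustered points), a flag for whether the simulation completed, and a flag for whether the result was physically meaningful.

```python
def coverage_summary(records):
    """Summarize one optimizer's run.

    records: iterable of (region_id, simulated_ok, physically_feasible).
    region_id is a hypothetical cluster label; -1 means the point was
    not assigned to any region and is ignored for coverage counts.
    """
    total = valid = feasible = 0
    visited = set()      # regions touched at least once
    productive = set()   # regions with at least one feasible design
    for region_id, simulated_ok, physically_feasible in records:
        total += 1
        valid += bool(simulated_ok)
        # A design only counts as feasible if it also simulated.
        is_feasible = bool(simulated_ok and physically_feasible)
        feasible += is_feasible
        if region_id != -1:
            visited.add(region_id)
            if is_feasible:
                productive.add(region_id)
    if total == 0:
        raise ValueError("no evaluations to summarize")
    return {
        "valid_rate": valid / total,
        "feasible_rate": feasible / total,
        "regions_visited": len(visited),
        "productive_regions": len(productive),
    }


# Toy data (invented): a run that moves widely but rarely lands in
# workable territory, and a run that visits fewer regions productively.
wanderer = [(0, True, False), (1, True, False), (2, True, True),
            (3, True, False), (4, False, False), (4, True, False),
            (-1, True, False), (2, True, False)]
focused = [(0, True, True), (0, True, True), (1, True, True),
           (1, True, False), (2, True, True), (2, True, True),
           (1, True, True), (0, True, False)]

for name, run in (("wanderer", wanderer), ("focused", focused)):
    print(name, coverage_summary(run))
```

In this toy data the first run touches more regions but has a far lower feasible rate and only one productive region, which mirrors the qualitative point that coverage alone does not indicate useful search.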
