MultiGA: Leveraging Multi-Source Seeding in Genetic Algorithms

Reading time: 5 minutes

📝 Original Info

  • Title: MultiGA: Leveraging Multi-Source Seeding in Genetic Algorithms
  • ArXiv ID: 2512.04097
  • Date: 2025-11-21
  • Authors: Isabelle Diana May-Xin Ng, Tharindu Cyril Weerasooriya, Haitao Zhu, Wei Wei

📝 Abstract

Large Language Models (LLMs) are widely used across research domains to tackle complex tasks, but their performance can vary significantly depending on the task at hand. Evolutionary algorithms, inspired by natural selection, can be used to refine solutions iteratively at inference time. To the best of our knowledge, the collective capabilities of multi-source seeding for LLM-guided genetic algorithms have not yet been explored. In this paper, we introduce a novel approach, MultiGA, which applies genetic-algorithm principles to complex natural language tasks and reasoning problems by sampling from a diverse population of LLMs to initialize the population. MultiGA generates a range of outputs from various parent LLMs, both open source and closed source, and uses a neutral fitness function to evaluate them. Through an iterative recombination process, we mix and refine these generations until an optimal solution is achieved. We benchmark our approach on text-to-SQL code generation, trip planning, the GPQA benchmark of graduate-level science questions, and the BBQ bias benchmark. Our results show that MultiGA converges to the accuracy of the LLM best fit for the task, and these insights lay the foundation for future research on integrating multiple LLMs for unexplored tasks where selecting a single pre-trained model is unclear or suboptimal.

💡 Deep Analysis

Figure 1 | Overview of the MultiGA framework: populations seeded from multiple LLMs, with an independent evaluator LLM handling fitness scoring and recombination.

📄 Full Content

MultiGA: Leveraging Multi-Source Seeding in Genetic Algorithms

Isabelle Diana May-Xin Ng¹·², Tharindu Cyril Weerasooriya¹, Haitao Zhu¹, Wei Wei¹
¹Center for Advanced AI, Accenture; ²UC Berkeley

Large Language Models (LLMs) are widely used across various research domains to tackle complex tasks, but their performance can vary significantly depending on the task at hand. Compared to fine-tuning, inference-time optimization methods offer a more cost-effective way to improve LLM output. Evolutionary algorithms can be used to refine solutions iteratively, mimicking natural selection. To the best of our knowledge, the collective capabilities of multi-source seeding for LLM-guided genetic algorithms have not yet been explored. In this paper, we introduce a novel approach, MultiGA, which applies genetic-algorithm principles to complex natural language tasks and reasoning problems by sampling from a diverse population of LLMs to initialize the population. MultiGA generates a range of outputs from various parent LLMs, both open source and closed source, and uses a neutral fitness function to evaluate them. Through an iterative recombination process, we mix and refine these generations until an optimal solution is achieved. Our results show that MultiGA converges to the accuracy of the LLM best fit for the task, and these insights lay the foundation for future research on integrating multiple LLMs for unexplored tasks where selecting a single pre-trained model is unclear or suboptimal. We benchmark our approach on text-to-SQL code generation, trip planning, the GPQA benchmark of graduate-level science questions, and the BBQ benchmark, which measures bias in models. This work contributes to the growing intersection of evolutionary computation and natural language, highlighting the potential of biologically inspired algorithms to improve generative artificial intelligence selectivity and accuracy.

1. Introduction

The development of small language models (SLMs) and pretrained language models (PLMs) marked early progress in natural language processing, opening new possibilities for text understanding and generation. Models such as BERT (Devlin et al., 2019) and RoBERTa (Liu et al., 2019) demonstrated the power of large-scale pretraining, while ULMFiT (Howard & Ruder, 2018) showcased the utility of fine-tuning for downstream tasks. However, these models struggle with unfamiliar prompts, and PLMs often require extensive task-specific engineering, which limits their applicability (Zhu & Zeng, 2022).

The emergence of large language models in 2020 marked a turning point: models such as GPT-3 demonstrated, for the first time, strong generalization across a wide range of tasks (Brown et al., 2020). These early LLMs achieved substantially higher accuracy than smaller pretrained models on closed-book question answering benchmarks like TriviaQA (Joshi et al., 2017), and showed significant gains in arithmetic reasoning and other challenging domains (Brown et al., 2020). Furthermore, LLMs support one-shot and few-shot prompting, reducing the need for fine-tuning and making them attractive for workflows that require flexible reasoning (Zhao et al., 2023). Prompting techniques such as Chain-of-Thought (CoT) (Wei et al., 2023) and Tree-of-Thoughts (ToT) (Yao et al., 2023) further enhance performance by structuring reasoning into sequential or branching steps, effectively breaking complex tasks into manageable components for the LLM to interpret.

This idea of decomposing tasks has driven the rise of multi-agent workflows, particularly in industry applications. Consider text-to-SQL: solving a single query may require preprocessing natural language, linking question terms to the database schema, generating SQL code, and validating outputs. Complex pipelines like this often adopt multiple agents, where each specialized LLM agent handles a subtask. Nevertheless, accuracy remains a persistent challenge, especially for developers who rely on

Figure 1 | Overview of the MultiGA framework. Populations are initialized with multiple LLMs, while an independent LLM E handles fitness evaluation (scoring candidates) and recombination (combining two parent solutions). The process terminates once the target fitness φ is reached or the generation budget T is exhausted; the top candidate solution is then returned.

Corresponding author(s): Isabelle Diana May-Xin Ng (isabelle.ng@berkeley.edu), Tharindu Cyril Weerasooriya (t.weerasooriya@accenture.com). Work done during internship at Accenture. Work to appear at the Workshop on Efficient Reasoning at NeurIPS 2025. © 2025 Accenture. All rights reserved. arXiv:2512.04097v1 [cs.NE] 21 Nov 2025.
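The multi-agent text-to-SQL decomposition described in the introduction can be sketched as a chain of specialized agent calls. This is a hypothetical sketch: the `call_llm` helper and the role names are illustrative assumptions, not the paper's implementation or a real API.

```python
# Hypothetical sketch of a multi-agent text-to-SQL pipeline.
# `call_llm` and the role names below are assumptions for illustration.
def call_llm(role, payload):
    # Stand-in for dispatching a subtask to a specialized LLM agent.
    return f"[{role}] {payload}"

def text_to_sql(question, schema):
    cleaned = call_llm("preprocess", question)                 # normalize the question
    linked = call_llm("schema_link", f"{cleaned} | {schema}")  # map terms to tables/columns
    sql = call_llm("generate_sql", linked)                     # draft the SQL query
    report = call_llm("validate", sql)                         # check the draft against the schema
    return sql, report
```

Each stage would be a separate prompt to a specialized model in a production pipeline; the point here is only the decomposition, where each agent consumes the previous agent's output.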


Reference

This content is AI-processed based on open access ArXiv data.
