AI Agents Evolve to Boost Coding Skills

In April, Microsoft’s CEO revealed that AI now writes close to a third of the company’s code. Last October, Google’s CEO reported that AI generates around a quarter of new Google code. Other tech companies are likely not far behind. Meanwhile, researchers are developing coding agents that can recursively improve themselves using evolutionary techniques.

Researchers have long aimed to create coding agents that can enhance their own capabilities. A recent study, described in a preprint on arXiv, introduces Darwin Gödel Machines (DGMs), systems that combine large language models (LLMs) with evolutionary algorithms to pursue this goal. A DGM starts with a coding agent that can read, write, and execute code, using an LLM for the reading and writing.

The process involves creating variations of the coding agent, testing their performance, and selecting the best performers for the next iteration. Because a DGM maintains a population of agents rather than only the current best, exploration stays open-ended, and a change that looks suboptimal at first can still lead to a breakthrough in a later iteration.
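
The loop described above can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not the researchers' implementation: each "agent" is reduced to a single skill number, `mutate` stands in for an LLM rewriting the agent's own code, `evaluate` stands in for a benchmark run, and the score-weighted parent sampling is just one plausible selection rule. All names here are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Agent:
    """Toy stand-in for a coding agent: one number representing its skill."""
    skill: float
    parent: Optional[int] = None  # archive index of the agent it was derived from


def evaluate(agent: Agent) -> float:
    """Hypothetical benchmark: a score in [0, 1], here just the clamped skill."""
    return max(0.0, min(1.0, agent.skill))


def mutate(parent: Agent, parent_index: int, rng: random.Random) -> Agent:
    """Hypothetical self-modification: a random change that may help or hurt."""
    return Agent(skill=parent.skill + rng.gauss(0.0, 0.05), parent=parent_index)


def evolve(iterations: int, seed: int = 0) -> List[Agent]:
    rng = random.Random(seed)
    archive = [Agent(skill=0.2)]  # start from a single initial agent
    for _ in range(iterations):
        scores = [evaluate(a) for a in archive]
        # Sample a parent from the whole archive, weighted by score.
        # The small offset keeps weak agents selectable (an assumed rule).
        weights = [s + 0.01 for s in scores]
        idx = rng.choices(range(len(archive)), weights=weights, k=1)[0]
        child = mutate(archive[idx], idx, rng)
        archive.append(child)  # keep every variant, not only improvements
    return archive


if __name__ == "__main__":
    archive = evolve(iterations=80)
    best = max(archive, key=evaluate)
    print(len(archive), round(evaluate(best), 3))
```

Keeping every variant in the archive is what lets a temporarily worse agent serve as a stepping stone: it can still be sampled as a parent later, even though higher-scoring agents are favored.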

Jenny Zhang, a computer scientist at the University of British Columbia and lead author of the study, noted that the agents could write complex code by themselves, including editing multiple files, creating new files, and building intricate systems. The researchers ran DGMs for 80 iterations using two coding benchmarks: SWE-bench and Polyglot. The agents' performance improved significantly, with scores rising from 20 percent to 50 percent on SWE-bench and from 14 percent to 31 percent on Polyglot.

DGMs outperformed an alternative method that used a fixed external system for improving agents. The best SWE-bench agent was not as good as the best human-designed agent, but it was generated automatically, and with more time and computational resources the approach could surpass human expertise. Zhengyao Jiang, a cofounder of Weco AI, sees this as a significant proof of concept for recursive self-improvement and suggests that further progress could come from modifying the underlying LLM or even the chip architecture.

The potential of DGMs extends beyond coding benchmarks. The same approach could be aimed at a specific application such as drug design, with agents evolving to become better at designing drugs. However, the safety of self-improving systems is a concern. Zhang and her team added guardrails: they ran the DGMs in sandboxes without internet access and logged all code changes. They also explored ways to reward AI for making itself more interpretable and aligned with human directives.
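
As a rough idea of what logging every code change might involve, the sketch below records a unified diff each time an agent rewrites a file. It is an assumption-laden illustration, not the team's tooling: the function name, the log format, and the agent and file names are all hypothetical, and the sandboxing itself is outside its scope.

```python
import difflib
import time
from typing import Dict, List


def log_code_change(
    log: List[Dict], agent_id: str, path: str, old: str, new: str
) -> Dict:
    """Append a record of one self-modification to an in-memory audit log."""
    diff = "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        )
    )
    # Hypothetical record layout: when, which agent, which file, what changed.
    entry = {"time": time.time(), "agent": agent_id, "path": path, "diff": diff}
    log.append(entry)
    return entry


if __name__ == "__main__":
    audit_log: List[Dict] = []
    entry = log_code_change(
        audit_log, "agent-1", "tool.py", "retries = 1\n", "retries = 3\n"
    )
    print(entry["diff"])
```

A log like this gives human reviewers a line-by-line trail of how each agent diverged from its parent, which is the kind of traceability the guardrails are meant to provide.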

The risks associated with recursive self-improvement, such as the potential for AI to become uninterpretable or misaligned, are a topic of ongoing discussion. In 2017, experts at the Asilomar conference signed the Asilomar AI Principles, calling for restrictions on AI systems designed to recursively self-improve. Despite these concerns, experts like Jürgen Schmidhuber and Zhengyao Jiang remain optimistic about the future of AI, emphasizing the importance of responsible development and human oversight.