Brookhaven Lab Unveils VISION: AI Assistant for Scientists

Scientists at Brookhaven National Laboratory have developed VISION, a voice-controlled AI assistant that simplifies complex scientific tasks and accelerates discovery.

Jun 26, 2025 | Source: Visive.ai

Newswise — UPTON, N.Y. — A team of scientists at the U.S. Department of Energy’s (DOE) Brookhaven National Laboratory has created a groundbreaking voice-controlled artificial intelligence (AI) assistant, known as VISION. This innovative tool, developed by researchers at the Lab’s Center for Functional Nanomaterials (CFN) with support from experts at the National Synchrotron Light Source II (NSLS-II), aims to break down everyday barriers for busy scientists.

VISION, the Virtual Scientific Companion, is designed to bridge knowledge gaps at complex instruments, carry out more efficient experiments, save scientists’ time, and accelerate scientific discovery. Users can simply tell VISION what they want to do at an instrument, and the AI companion will handle the task, whether it’s running an experiment, launching data analysis, or visualizing results.

Esther Tsai, a scientist in the AI-Accelerated Nanoscience group at CFN, is enthusiastic about the potential of AI in science. “What we can’t deny is that brilliant scientists spend a lot of time on routine work. VISION acts as an assistant that scientists and users can talk to for answers to basic questions about the instrument capability and operation,” Tsai explained.

The development of VISION highlights the close partnership between CFN and NSLS-II, two DOE Office of Science user facilities at Brookhaven Lab. Scientists at the two facilities collaborate with facility users on the setup, scientific planning, and analysis of data from experiments at three NSLS-II beamlines, highly specialized measurement tools that enable researchers to explore the structure of materials using beams of X-rays.

Tsai, inspired to alleviate bottlenecks that come with using NSLS-II's in-demand beamlines, received a DOE Early Career Award in 2023 to develop this new concept. She now leads the CFN team behind VISION, which has collaborated with NSLS-II beamline scientists to launch and test the system at the Complex Materials Scattering (CMS) beamline at NSLS-II. This marks the first voice-controlled experiment at an X-ray scattering beamline and represents significant progress towards AI-augmented discovery.

VISION leverages the growing capabilities of large language models (LLMs), the technology at the heart of popular AI assistants such as ChatGPT. An LLM is a large program that generates text modeled on natural human language. VISION builds on this capability, using it not just to generate text that answers questions but also to decide what to do next and to generate the computer code that drives an instrument.

Internally, VISION is organized into multiple “cognitive blocks,” or cogs, each comprising an LLM that handles a specific task. Multiple cogs can be put together to form a capable assistant, with the cogs carrying out work transparently for the scientist. A user can go to the beamline and say, ‘I want to select certain detectors’ or ‘I want to take a measurement every minute for five seconds’ or ‘I want to increase the temperature,’ and VISION will translate that command into code.

Those natural language inputs, whether speech, text, or both, are first fed to VISION’s “classifier” cog, which decides what type of task the user is asking about. The classifier routes the request to the right cog for the task, such as an “operator” cog for instrument control or an “analyst” cog for data analysis. Then, in just a few seconds, the system translates the input into code that is passed back to the beamline workstation, where the user can review it before executing.
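The flow described here, a classifier that routes each request to a task-specific cog, which in turn emits code for the user to review, can be pictured with a short sketch. Everything below is a hypothetical illustration rather than VISION's actual code: the call_llm stub, the prompts, and the function names are assumptions, and only the classifier/operator/analyst structure is taken from the article.

```python
# Hypothetical sketch of the classifier-routing flow described above.
# The classifier/operator/analyst structure comes from the article;
# the prompts and the call_llm() stub are illustrative assumptions,
# not the actual VISION implementation.

def call_llm(prompt: str) -> str:
    """Stand-in for a call to whichever LLM backend is configured."""
    raise NotImplementedError("connect an LLM provider here")

def classify(user_input: str) -> str:
    """Classifier cog: decide what type of task the user is asking about."""
    prompt = (
        "Label this beamline request as 'operator' (instrument control) "
        f"or 'analyst' (data analysis). Request: {user_input}"
    )
    return call_llm(prompt).strip().lower()

def generate_code(task: str, user_input: str) -> str:
    """Operator/analyst cog: translate the request into workstation code."""
    role = "instrument control" if task == "operator" else "data analysis"
    return call_llm(f"Write Python {role} code for: {user_input}")

def handle_request(user_input: str) -> str:
    """Full pipeline: classify, route, and return generated code.

    The generated code is not executed automatically; it is handed
    back to the beamline workstation so the user can review it first.
    """
    task = classify(user_input)
    return generate_code(task, user_input)
```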

VISION’s use of natural language is its key advantage. Since the system is tailored to the instrument the researcher is using, users are liberated from spending time setting up software parameters and can instead focus on the science they are pursuing. “VISION acts as a bridge between users and the instrumentation, where users can just talk to the system and the system takes care of driving experiments,” said Noah van der Vleuten, a co-author who helped develop VISION’s code generation capability and testing framework.

The ability to speak to VISION, rather than only typing prompts, could make workflows even faster, team members noted. In the fast-paced, ever-evolving world of AI, VISION’s creators also set out to build a scientific tool that can keep up with improving technology, incorporate new instrument capabilities, and scale up as needed to handle multiple tasks seamlessly.

“A key guiding principle is that we wanted to be modular and adaptable, so we can quickly swap out or replace with new AI models as they become more powerful,” said Shray Mathur, first author on the paper who worked on VISION’s audio-understanding capabilities and overall architecture. “As underlying models become better, VISION becomes better. It’s very exciting because we get to work on some of the most recent technology, and deploy it immediately. We’re building systems that can really benefit users in their research.”
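That modularity principle might look something like the sketch below, in which each cog depends only on a minimal completion interface, so a stronger model can be swapped in without touching the rest of the system. The Backend protocol, the EchoBackend placeholder, and the Cog class are assumptions made for this illustration, not VISION's published design.

```python
# Illustrative sketch of the swap-in-a-better-model principle quoted
# above. The Backend protocol, EchoBackend, and Cog class are
# assumptions for this example, not part of VISION itself.

from typing import Protocol

class Backend(Protocol):
    """Minimal interface every LLM backend must satisfy."""
    def complete(self, prompt: str) -> str: ...

class EchoBackend:
    """Trivial placeholder backend, useful for testing the plumbing."""
    def complete(self, prompt: str) -> str:
        return f"[echo] {prompt}"

class Cog:
    """A cognitive block: one task-specific prompt bound to a backend."""
    def __init__(self, backend: Backend, system_prompt: str):
        self.backend = backend
        self.system_prompt = system_prompt

    def run(self, user_input: str) -> str:
        return self.backend.complete(f"{self.system_prompt}\n\n{user_input}")

# Upgrading the underlying model is then a one-line change:
operator = Cog(EchoBackend(), "Translate requests into instrument code.")
# operator = Cog(NewerModelBackend(), ...)  # hypothetical future swap
print(operator.run("take a measurement every minute for five seconds"))
```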

This work builds on a history of AI and machine learning (ML) tools developed by CFN and NSLS-II to aid facility scientists, including for autonomous experiments, data analytics, and robotics. Future versions of VISION could act as a natural interface to these advanced AI/ML tools.

Now that VISION’s architecture is developed and actively demonstrated at the CMS beamline, the team aims to test it further with beamline scientists and users and eventually bring the virtual AI companion to additional beamlines. This way, the team can have real discussions with users about what’s truly helpful to them.

“The CFN/NSLS-II collaboration is really unique in the sense that we are working together on this frontier AI development with language models on the experimental floor, at the front-line supporting users,” Tsai said. “We’re getting feedback to better understand what users need and how we can best support them.”

CMS lead beamline scientist Ruipeng Li has been a key supporter of VISION. “We’ve been close collaborators and partners since the beamline was built more than eight years ago. These concepts enable us to build on our beamline’s potential and continue to push the boundaries of AI/ML applications for science. We want to see how we can learn from this process because we are riding the AI wave now.”

In the broader context of AI-augmented scientific research, the development of VISION is a step towards realizing other AI concepts across the DOE complex, including a science exocortex. Kevin Yager, the AI-Accelerated Nanoscience Group leader at CFN and VISION co-author, envisions the exocortex as an extension of the human brain that researchers can interact with through conversation to generate inspiration and imagination for scientific discovery.

“When I imagine the future of science, I see an ecosystem of AI agents working in coordination to help me advance my research,” Yager said. “The VISION system is an early example of this future — an AI assistant that helps you operate an instrument. We want to build more AI assistants, and connect them together into a really powerful network.”

This work was supported by the DOE Early Career Research Program and DOE Office of Science.

Frequently Asked Questions

What is VISION?

VISION is a voice-controlled AI assistant developed by scientists at Brookhaven National Laboratory to simplify complex scientific tasks and accelerate discovery.

How does VISION work?

VISION uses large language models (LLMs) to translate natural language commands into code, allowing scientists to control instruments and perform experiments more efficiently.

What are the benefits of using VISION?

VISION helps scientists save time, bridge knowledge gaps, and focus more on their research by handling routine tasks and providing real-time assistance.

Which facilities are involved in VISION's development?

VISION is a collaborative effort between the Center for Functional Nanomaterials (CFN) and the National Synchrotron Light Source II (NSLS-II) at Brookhaven National Laboratory.

What is the future of VISION?

The team aims to test VISION further with beamline scientists and users, and eventually bring the AI assistant to additional beamlines and integrate it with other advanced AI/ML tools.
