NEW

Lemma pre-release coming Oct 2025

Join waiting list

Publications

Research papers and publications advancing Axiomatic Intelligence

Authors

Marco Del Tredici, Jacob McCarran, Benjamin Breen, Javier Aspuru Mijares, Weichen Winston Yin, Jacob M. Taylor, Frank Koppens, Dirk Englund

Abstract

We present Ax-Prover, a multi-agent system for automated theorem proving in Lean that can solve problems across diverse scientific domains and operate either autonomously or collaboratively with human experts. To achieve this, Ax-Prover approaches scientific problem solving through formal proof generation, a process that demands both creative reasoning and strict syntactic rigor. Ax-Prover meets this challenge by equipping Large Language Models (LLMs), which provide knowledge and reasoning, with Lean tools via the Model Context Protocol (MCP), which ensure formal correctness. To evaluate its performance as an autonomous prover, we benchmark our approach against frontier LLMs and specialized prover models on two public math benchmarks and on two Lean benchmarks we introduce in the fields of abstract algebra and quantum theory. On public datasets, Ax-Prover is competitive with state-of-the-art provers, while it largely outperform them on the new benchmarks. This shows that, unlike specialized systems that struggle to generalize, our tool-based agentic theorem prover approach offers a generalizable methodology for formal verification across diverse scientific domains. Furthermore, we demonstrate Ax-Prover's assistant capabilities in a practical use case, showing how it enabled an expert mathematician to formalize the proof of a complex cryptography theorem.

Artificial Intelligence Multiagent Systems

Authors

Ankita Sharma, YuQi Fu, Vahid Ansari, Rishabh Iyer, Fiona Kuang, Kashish Mistry, Raisa Islam Aishy, Sara Ahmad, Joaquin Matres, Dirk R. Englund, Joyce K.S. Poon

Abstract

The paper presents "Photonics Intelligent Design and Optimization (PhIDO)", a multi-agent framework that converts natural-language photonic integrated circuit (PIC) design requests into layout mask files. The researchers compared 7 reasoning large language models using a testbench of 102 design descriptions, achieving up to 91% success rate for single-device designs and approximately 57% success rate for designs with up to 15 components.

Hardware Architecture Artificial Intelligence Applied Physics Optics

Stay Updated on Our Research

Want to be notified when we publish new research? Reach out to learn more about our work and potential collaboration opportunities.