
Can Contemporary AI Agents Solve Symbolic Regression Challenges?

2025-08-06

AI agents enhance symbolic regression through prior knowledge but do not resolve core challenges like computational complexity or global optimality guarantees, functioning best as complementary tools within a broader methodological framework.

 

This is an LLM-translated blog post based on the original slides titled "时髦智能体能搞定玄学符号回归吗?" ("Can Trendy AI Agents Crack the Dark Art of Symbolic Regression?"). The header image is sourced from the DeepMind blog.

Imagine discovering Kepler's laws from planetary motion data alone, without prior knowledge of gravity. This is the essence of symbolic regression: constructing interpretable and concise mathematical expressions from limited data. This task faces significant challenges due to the vast and discrete search space of possible expressions, where overfitting and local optima are common pitfalls. Recent advances in AI agents have sparked interest in whether these systems can overcome these longstanding difficulties. This post examines the intersection of modern AI agents with symbolic regression, evaluating both their potential and limitations through a critical lens.

AlphaEvolve: Hype and Limitations #

Released by Google DeepMind in May 2025, AlphaEvolve pairs the Gemini 2.0 large language model with evolutionary search to improve various algorithms, including matrix multiplication, geometric optimization, and FlashAttention kernel implementations. For matrix multiplication, the system reduced the number of scalar multiplications from 87 to 85 for the ⟨4,4,7⟩ problem (a 4×4 matrix times a 4×7 matrix), from 98 to 96 for ⟨4,4,8⟩, and from 93 to 90 for ⟨4,5,6⟩.
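To make "multiplication count" concrete: these results generalize the idea behind Strassen's classic algorithm, which multiplies two 2×2 matrices with 7 scalar multiplications instead of the naive 8. A minimal Python illustration:

```python
import numpy as np

def strassen_2x2(A, B):
    """Multiply two 2x2 matrices using Strassen's 7 scalar
    multiplications instead of the naive 8."""
    a, b, c, d = A[0, 0], A[0, 1], A[1, 0], A[1, 1]
    e, f, g, h = B[0, 0], B[0, 1], B[1, 0], B[1, 1]
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    return np.array([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4, m1 - m2 + m3 + m6]])

A, B = np.random.rand(2, 2), np.random.rand(2, 2)
assert np.allclose(strassen_2x2(A, B), A @ B)
```

AlphaEvolve searches for analogous identities for larger shapes, where each saved multiplication compounds when the scheme is applied recursively to block matrices.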

AlphaEvolve workflow

AlphaEvolve employs an iterative evolutionary process where the LLM modifies code based on performance feedback. Its prompt engineering consists of essential components (system instructions and previously sampled programs) and optional components (performance metrics, problem context, meta-prompts for prompt updating, and randomly set prompts).
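In outline, the loop looks roughly like the sketch below. This is a schematic reconstruction, not DeepMind's code; `llm_propose` and `evaluate` are hypothetical stand-ins for the LLM call and the program-execution harness.

```python
import random

def evolve(seed_program, llm_propose, evaluate, generations=100, pool_size=20):
    """Schematic AlphaEvolve-style loop: an LLM rewrites candidate
    programs, and only the strongest candidates survive in the pool."""
    pool = [(evaluate(seed_program), seed_program)]
    for _ in range(generations):
        # Sample a parent and show the LLM its score as feedback.
        score, parent = random.choice(pool)
        child = llm_propose(parent, feedback=score)    # LLM edits the code
        pool.append((evaluate(child), child))          # run and score it
        pool = sorted(pool, reverse=True)[:pool_size]  # keep the best
    return max(pool)
```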

Despite its promising architecture, AlphaEvolve faces significant transparency issues. DeepMind has not disclosed critical technical details, withheld the framework code from public release, and shifted focus toward commercialization through early access programs and UI development.

Independent replication attempts, such as those using OpenEvolve on the circle-packing problem, have only reproduced the previous state-of-the-art value of 2.634 rather than AlphaEvolve's reported improvement to 2.635. This discrepancy suggests caution is warranted before embracing AI agents as complete solutions to symbolic regression challenges.

Established Symbolic Regression Approaches #

The search space of symbolic regression is discrete and grows combinatorially with expression length and the size of the operator set. Because only limited data are available, overfitting is easy, and the optimization can fall into local optima that prevent us from finding simple, interpretable expressions.
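The combinatorial growth is easy to make concrete. The toy recursion below counts distinct expression trees under an assumed small grammar (3 leaf symbols, 2 unary and 4 binary operators); the exact numbers depend on the grammar, but the explosion is generic.

```python
from functools import lru_cache

# Assumed toy grammar: 3 leaf symbols (x, y, const),
# 2 unary operators, 4 binary operators.
LEAVES, UNARY, BINARY = 3, 2, 4

@lru_cache(maxsize=None)
def count(nodes):
    """Number of distinct expression trees with exactly `nodes` nodes."""
    if nodes <= 0:
        return 0
    total = LEAVES if nodes == 1 else 0
    total += UNARY * count(nodes - 1)  # unary op wrapping a smaller tree
    total += BINARY * sum(count(k) * count(nodes - 1 - k)  # two subtrees
                          for k in range(1, nodes - 1))
    return total

print([count(n) for n in (1, 5, 10, 15)])  # grows explosively
```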

Genetic Algorithms (PySR) #

PySR implements evolutionary operations—mutation, crossover, selection, and elimination—to evolve expression trees [M. Cranmer, arXiv.2305.01582]. The approach incorporates several mechanisms to avoid premature convergence: simulated annealing (allowing acceptance of poorer individuals early in the process), age regularization (preventing entrapment in local optima), post-evolution simplification and constant optimization, automatic adjustment of complexity penalties, and island algorithms for robustness and parallelization.
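For reference, a minimal PySR invocation might look like the sketch below (parameter names follow the PySR documentation at the time of writing; treat this as illustrative rather than a tuned configuration).

```python
import numpy as np
from pysr import PySRRegressor

# Toy data following Kepler's third law: P = a^(3/2).
a = np.random.uniform(0.5, 30.0, size=(200, 1))
P = a[:, 0] ** 1.5

model = PySRRegressor(
    niterations=40,                  # outer evolutionary iterations
    populations=16,                  # island populations evolved in parallel
    binary_operators=["+", "-", "*", "/", "^"],
    unary_operators=["sqrt", "log"],
    maxsize=20,                      # complexity cap on expression trees
)
model.fit(a, P)
print(model.sympy())                 # best expression, simplified via SymPy
```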

Inner loop of PySR

When evaluated against the EmpiricalBench suite of physics formulas, PySR recovered the correct expression in five out of five trials for Hubble's law, Kepler's third law, Newton's law of gravitation, Leavitt's law, the Schechter function, Bode's law, and the ideal gas law. However, it failed to recover Planck's radiation law and the Rydberg formula in every trial.

| Name | Formula | PySR |
| --- | --- | --- |
| Hubble | $v = H_0 D$ | 5/5 |
| Kepler | $P^2 \propto a^3$ | 5/5 |
| Newton | $\mathbf{F} = G (m_1 m_2 / r^2)\, \hat{\mathbf{r}}$ | 5/5 |
| Planck | $B = \frac{2 h \nu^3}{c^2} \cdot \frac{1}{\exp(h\nu / k_B T) - 1}$ | 0/5 |
| Leavitt | $M = \alpha \log_{10}(P) + \delta$ | 5/5 |
| Schechter | $n = (\phi_\ast / L_\ast) (L / L_\ast)^{\alpha} \exp(-L / L_\ast)$ | 5/5 |
| Bode | $a = 0.4 + 0.3 \cdot 2^n$ | 5/5 |
| Ideal Gas | $P = n R T / V$ | 5/5 |
| Rydberg | $1/\lambda = R_H \left(1/n_1^2 - 1/n_2^2\right)$ | 0/5 |

PySR's key characteristics include:

- Strengths: efficient parallel computation and no prior assumptions about the form of the expression.
- Weaknesses: weak local optimization, limited interpretability of the evolutionary process, and difficulty with high-dimensional problems.
- Cost: substantial; populations and iteration counts must be large enough that each island may need to run on a separate node.

Brute-Force Search (SISSO) #

The SISSO algorithm assumes target expressions are linear combinations of generated features [R. Ouyang et al., PRM 2, 083802 (2018)]. It repeatedly applies the operators $\{I, +, -, \times, /, \exp, \log, |\cdot|, \sqrt{\cdot}, {}^{-1}, {}^{2}, {}^{3}\}$ to create a vast feature space that can reach $10^{10}$ or more elements even when dimensional analysis is applied. The method employs a two-step approach: Sure Independence Screening (SIS) selects the features most correlated with the target property, followed by a Sparsifying Operator (SO) such as LASSO (an $\ell_1$-norm approximation) or $\ell_0$-norm minimization to combine the selected features linearly.
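A simplified sketch of the two-step selection, assuming a feature matrix has already been generated (the real SISSO implementation is far more elaborate; `sis` and `l0_fit` are illustrative helpers):

```python
import numpy as np
from itertools import combinations

def sis(features, names, y, k=20):
    """Sure Independence Screening: keep the k features most
    correlated (in absolute value) with the target."""
    corr = [abs(np.corrcoef(f, y)[0, 1]) for f in features.T]
    top = np.argsort(corr)[::-1][:k]
    return features[:, top], [names[i] for i in top]

def l0_fit(features, y, dim=2):
    """Brute-force l0 step: best linear model using exactly `dim`
    of the screened features."""
    best = (np.inf, None, None)  # (residual, feature indices, coefficients)
    for idx in combinations(range(features.shape[1]), dim):
        X = features[:, idx]
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = np.sum((X @ coef - y) ** 2)
        if rss < best[0]:
            best = (rss, idx, coef)
    return best
```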

SISSO workflow

SISSO has demonstrated success in predicting binary compound enthalpies with 82 materials and 32 features, outperforming commercial tools like Eureqa.

SISSO's key characteristics include:

- Strengths: systematic exploration of very large feature spaces, with dimensional analysis built in.
- Weaknesses: the linear-combination assumption limits expressiveness.
- Cost: escalates rapidly with the number of variables, so iterative variable selection is needed to keep the overhead manageable.

Reinforcement Learning (Φ-SO) #

Φ-SO employs RNNs trained with reinforcement learning to maximize the reward of valid expressions, with an "in-situ" mechanism that masks token outputs to enforce dimensional correctness. Training uses a risk-seeking policy gradient objective, which maximizes the expected reward of only the best-performing fraction of sampled expressions rather than the batch average. When evaluated against the Feynman benchmark, Φ-SO demonstrated competitive performance.
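A sketch of the risk-seeking update, following the deep-symbolic-regression formulation that Φ-SO builds on; the per-sample `log_prob_grads` would come from backpropagating through the RNN, which is omitted here.

```python
import numpy as np

def risk_seeking_update(log_prob_grads, rewards, epsilon=0.05, lr=1e-3):
    """Risk-seeking policy gradient: only the top-epsilon fraction of
    sampled expressions contributes, pushing the policy toward its
    best-case rather than average-case behavior.

    log_prob_grads: gradients of log pi(expression) w.r.t. parameters,
                    shape (batch, n_params) -- assumed precomputed.
    rewards:        per-sample rewards, shape (batch,).
    """
    threshold = np.quantile(rewards, 1.0 - epsilon)  # (1-eps)-quantile baseline
    elite = rewards >= threshold
    advantage = (rewards[elite] - threshold)[:, None]
    # REINFORCE restricted to the elite samples, quantile as baseline.
    return lr * np.mean(advantage * log_prob_grads[elite], axis=0)
```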

Φ-SO architecture for symbolic regression

Φ-SO's key characteristics include:

- Strengths: no prior assumptions about the form of the expression, and robustness to noisy data.
- Weaknesses: depends on prior assumptions about the dimensions of constants, and token sequences are an imperfect representation of mathematical equivalence (e.g., $x/x$ vs. $1$; see the snippet below).
- Limitations shared with other methods: no guarantee of global optimality, and retraining is required for each new problem.
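The equivalence weakness is easy to demonstrate: syntactically different token sequences can denote the same function, and the policy only ever sees the tokens. With SymPy:

```python
import sympy as sp

x = sp.Symbol('x')
print(x / x)                               # 1: SymPy cancels automatically
print(sp.simplify((x**2 - 1) / (x - 1)))   # x + 1: same function, new tokens
```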

Integrating AI Agents with Symbolic Regression #

Concept-Guided Search (LaSR) #

LaSR employs large language models to extract concepts from high-performing expressions and guide PySR's evolutionary process. The approach involves three steps: the LLM extracts concepts from successful expressions, generates new concepts based on these, and then evolves expressions using either classic genetic algorithms or LLM-guided evolution.
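Schematically, one LaSR iteration might look like the sketch below; `llm` is a hypothetical text-completion helper, and the real implementation interleaves such calls with PySR's genetic operations.

```python
def lasr_step(population, llm, fitness, top_k=5):
    """One schematic LaSR iteration: abstract concepts from the best
    expressions, then use them to bias new proposals."""
    best = sorted(population, key=fitness, reverse=True)[:top_k]
    # 1. Abstract natural-language concepts from strong expressions.
    concepts = llm(f"What shared mathematical concepts explain these? {best}")
    # 2. Evolve the concept library itself.
    concepts = llm(f"Propose related, more general concepts: {concepts}")
    # 3. Concept-conditioned proposals alongside classic GA mutations.
    proposals = llm(f"Suggest expressions consistent with: {concepts}")
    return population + proposals  # assume llm returns a list of expressions
```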

LaSR architecture for concept-guided symbolic regression

When evaluated on the Feynman dataset over 40 iterations, LaSR increased exact matches from 59 (PySR) to 72 (GPT-3.5), with corresponding reductions in almost matches (from 7 to 3) and close matches (from 16 to 10).

| Type | PySR | LaSR (Llama3-8B) | LaSR (GPT-3.5) |
| --- | --- | --- | --- |
| Exact | 59 | 67 | 72 |
| Almost | 7 | 5 | 3 |
| Close | 16 | 9 | 10 |
| Fail | 18 | 19 | 15 |

LaSR's key characteristics include:

- Strengths: learned concepts guide the genetic algorithm more effectively, and the evolutionary process is more interpretable than in traditional approaches.
- Weaknesses: no guarantee that the LLM learns correct concepts, heavy dependence on LLM capability, and raw expression strings are not a representation LLMs handle well, so their potential is hard to exploit fully.
- Cost: because LLM calls are expensive, LLM-guided evolution makes up only a small fraction of the overall process.

LLM-Driven Search (LLM-SR) #

LLM-SR represents a paradigm shift by using large language models as active search agents. The method operates iteratively: the LLM proposes candidate expressions based on the problem description, input variables, and performance feedback, and the framework then evaluates them against the data. Through island-based parallelization, LLM-SR maintains a diverse pool of high-performing solutions.
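A schematic of the loop (`make_prompt` and `fit_constants` are hypothetical helpers; in the actual method the LLM emits equation skeletons as Python functions whose constants are then fitted numerically):

```python
def llm_sr(description, data, llm, n_islands=4, rounds=100, cap=20):
    """Schematic LLM-SR loop: the LLM proposes equation skeletons as
    code, constants are fitted numerically, islands preserve diversity."""
    islands = [[] for _ in range(n_islands)]
    for _ in range(rounds):
        for island in islands:
            # Prompt holds the problem description, variable names,
            # and the island's current best programs as feedback.
            skeleton = llm(make_prompt(description, island))
            params, loss = fit_constants(skeleton, data)  # e.g. BFGS
            island.append((loss, skeleton, params))
            island.sort(key=lambda entry: entry[0])
            del island[cap:]                              # keep the best
    return min((isl[0] for isl in islands if isl), key=lambda e: e[0])
```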

LLM-SR for symbolic regression

LLM-SR has demonstrated strong performance on synthetic benchmarks, such as recovering stress-strain relationships in aluminum across temperatures, and it often reaches high-accuracy solutions in fewer iterations than traditional methods. On the oscillation benchmark, LLM-SR achieved a mean-squared error of 4.65e-7 on in-distribution data and 0.0005 on out-of-distribution data, significantly outperforming the alternatives.

| Model | In-distribution MSE | Out-of-distribution MSE |
| --- | --- | --- |
| DSR | 0.0087 | 0.2454 |
| uDSR | 0.0003 | 0.0007 |
| PySR | 0.0009 | 0.3106 |
| FunSearch | 0.4840 | 8.059 |
| LLM-SR | 4.65e-7 | 0.0005 |
| no optim. | 0.1371 | 0.6764 |
| no prior | 0.0001 | 0.0029 |
| no program | 0.0001 | 0.0035 |

(The last three rows are ablations of LLM-SR.)

LLM-SR's key characteristics include:

- Strengths: effectively leverages the LLM's prior knowledge to propose contextually appropriate expressions.
- Weaknesses: heavy dependence on the LLM's capabilities, and no direct way to simplify expressions or measure population diversity.
- Caveat: benchmarking relies on synthetic datasets, raising concerns about real-world applicability.

FunSearch: The AlphaEvolve Precursor #

FunSearch, developed by DeepMind as AlphaEvolve's predecessor, performs poorly on symbolic regression tasks. In oscillation benchmarks, it achieved a mean-squared error of 0.4840 on in-distribution data versus LLM-SR's 4.65e-7. This demonstrates that general-purpose LLM search frameworks require domain-specific adaptations to succeed in symbolic regression.

FunSearch workflow

FunSearch's key characteristics include:

- Represents candidates as code and favors short programs, a Kolmogorov-complexity-style form of complexity control.
- Leaves dimensional analysis and constant optimization entirely to the LLM.
- As a general-purpose framework, it lacks the SR-specific machinery (equation skeletons, numerical constant fitting) that makes LLM-SR effective.

Comparative Analysis #

| Method | Open Source | Representation | Complexity Control | Dimensional Analysis | Constant Optimization | High-Dimensional Handling |
| --- | --- | --- | --- | --- | --- | --- |
| PySR | Yes (GitHub) | Expression trees | Fitness penalties | None | Optimization | Gradient boosting trees |
| SISSO | Yes (GitHub) | Linear feature combos | Feature space size | Yes | LASSO / $\ell_0$ | Iterative variable selection |
| Φ-SO | Yes (GitHub) | Token sequences | Token count limit | Yes | Optimization | Not addressed |
| LaSR | Yes (GitHub) | Expression trees | Fitness penalties | None | Optimization | Same as PySR |
| LLM-SR | Yes (GitHub) | Code | Input parameter count | LLM-dependent | Optimization | Not addressed |
| FunSearch | Yes (GitHub) | Code | Kolmogorov complexity | LLM-dependent | LLM-dependent | Not addressed |

Conclusions #

While AI agents enhance symbolic regression capabilities, they function best as complementary tools rather than complete solutions. Progress in this field requires combining domain expertise with methodological innovation, maintaining realistic expectations about the persistent challenges of expression discovery.

Leave your comments and reactions on GitHub