Week 12: Learning Theory (Baby Steps Towards Symbolic Regression)

DSAN 5300: Statistical Learning
Spring 2026, Georgetown University

Jeff Jacobs

jj1088@georgetown.edu

Monday, April 13, 2026

Schedule

Today’s Planned Schedule:

| | Start | End | Topic |
|---|---|---|---|
| Lecture | 6:30pm | 7:00pm | Symbolic Regression: What Can a Machine Learn? |
| | 7:00pm | 7:20pm | Comparing Between Groups |
| | 7:20pm | 8:00pm | Regression (Cox Proportional Hazard Model) |
| Break! | 8:00pm | 8:10pm | |
| | 8:10pm | 9:00pm | Quiz 3 |

Symbolic Regression: What Can a Machine Learn?

Thus far, we have either…

  • Learned an \(X \rightarrow Y\) relationship by setting up a parametric equation like \(Y = \beta_0 + \beta_1X_1 + \cdots + \beta_pX_p\), then figured out how to learn the parameters \(\beta_0, \beta_1, \ldots, \beta_p\),
  • Tried to find a non-parametric way to learn the \(X \rightarrow Y\) relationship (\(K\)-Nearest Neighbors! “Memory-based”), or
  • Asked the computer to find patterns in \(X\) (no \(Y\)): Unsupervised learning
  • There are other (more… “thinking outside the box”) approaches! Let’s look at one (evolutionary computation), then move to thinking more generally about what a machine can learn

Symbolic Regression: “Evolving” Equations

  • Think of the relationship \(X \rightarrow Y\) as a mathematical expression with a “DNA” (bitstring) encoding…
  • The DNA “nucleotides” (substrings) can code for:
    • Operations: \(+\), \(-\), \(\times\), \(\div\), \(\sqrt{}\), \(\exp(\cdot)\), \(\sin(\cdot)\), \(\arcsin(\cdot)\), etc.
    • Variables: \(x_1, x_2, \ldots, x_n\)
    • Constants: \(1\), \(-1\), \(0\), \(\pi\), \(e\), \(1.2\)
  • Examples: \(y = x_1 + 1\), \(y = x_2\), \(y = x_1 + \sin(x_2)\), \(y = \sin(x_1) + x_2 - 1\), \(y = \sqrt{\sin(x_1) + 1}\)
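To make the encoding idea concrete, here is a minimal Python sketch that stores an expression as a nested tuple (an expression tree) rather than a literal bitstring. The tuple layout, the `OPS` table, and the `evaluate` / `to_string` helpers are illustrative assumptions, not the representation used by any particular symbolic regression library.

```python
import math

# Assumed encoding: a tree is a variable name (str), a numeric constant,
# or a tuple (operation, child, ...). Only a few operations are included.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "sin": math.sin,
    "sqrt": math.sqrt,
}

def evaluate(tree, env):
    """Recursively evaluate an expression tree using variable values in env."""
    if isinstance(tree, str):            # variable leaf, e.g. "x1"
        return env[tree]
    if isinstance(tree, (int, float)):   # constant leaf, e.g. 1
        return float(tree)
    op, *children = tree
    return OPS[op](*(evaluate(child, env) for child in children))

def to_string(tree):
    """Render a tree as a readable formula."""
    if not isinstance(tree, tuple):
        return str(tree)
    op, *children = tree
    args = [to_string(child) for child in children]
    if len(args) == 1:                   # unary operation such as sin or sqrt
        return f"{op}({args[0]})"
    return f"({args[0]} {op} {args[1]})"

# Three of the example equations from the list above, written as trees
y_a = ("+", "x1", 1)                             # y = x1 + 1
y_b = ("-", ("+", ("sin", "x1"), "x2"), 1)       # y = sin(x1) + x2 - 1
y_c = ("sqrt", ("+", ("sin", "x1"), 1))          # y = sqrt(sin(x1) + 1)

for tree in (y_a, y_b, y_c):
    print(to_string(tree), "=", evaluate(tree, {"x1": 0.0, "x2": 2.0}))
```

Because every candidate equation is just a small data structure, the search procedure can copy, compare, and edit equations the same way it would edit any other piece of data.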

Evolution of a Computation Graph

From Kronberger et al. (2024)

Ex 1: Learning a Known Equation (Newton)

Figure 1 from de Silva et al. (2020)

Figure 4 from de Silva et al. (2020)

The Learned Equation

SR Result:

\[ \begin{aligned} \text{distance} = &4.2026 t^2 + 0.0177 m^2 + 0.5456 tm \\ &- 0.006564 ct - 4.2614 \times 10^{-4} cm \\ &+ 2.5633 \times 10^{-6} c^2 \end{aligned} \]

From Newton’s Laws:

\[ \text{distance} = \frac{g}{2}t^2 \]
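A quick way to compare the two equations is to look only at the \(t^2\) term, the one term they share. The sketch below assumes the data are in SI units (meters and seconds) and uses standard gravity \(g = 9.80665\) m/s²; neither assumption is stated in the slides, and the remaining SR terms (involving \(m\) and \(c\)) are deliberately left out.

```python
# Assumption: SI units, standard gravity (not stated in the source)
G_STANDARD = 9.80665

sr_t2_coeff = 4.2026                 # t^2 coefficient from the SR result
newton_t2_coeff = G_STANDARD / 2     # t^2 coefficient from distance = (g/2) t^2

# Value of g that the SR coefficient would imply if it were exactly g/2
implied_g = 2 * sr_t2_coeff

# Relative gap between the two coefficients
rel_gap = (newton_t2_coeff - sr_t2_coeff) / newton_t2_coeff

print(f"Newton g/2:      {newton_t2_coeff:.4f}")
print(f"SR coefficient:  {sr_t2_coeff:.4f}")
print(f"Implied g:       {implied_g:.4f}")
print(f"Relative gap:    {rel_gap:.1%}")
```

Under these assumptions the SR coefficient sits roughly 14% below \(g/2\), so the extra terms in the learned equation are absorbing part of what Newton's law attributes to \(t^2\) alone.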

SR-learned equation agreement with (a) Data and (b) Newton’s Laws, from Kronberger et al. (2024)

Example 2: Finding an “Approximately Good” Equation

| Accuracy | Equations in Sequence | Event |
|---|---|---|
| -1.4197 | \(x + x - c_3 - y\) | random |
| -1.41347 | \(x + x + x - c_4 - y\) | mutation |
| -1.41339 | \(x + x + x - \sin(c_3) - y\) | mutation |
| -1.13805 | \(x + x + x - \sin(y) - (x - x)\) | crossover |
| -1.08904 | \((x + x) \cdot x - \sin(y) - (x - x)\) | mutation |
| -1.08574 | \((x + x) \cdot x - \sin(y) - c_1\) | mutation |
| -1.01841 | \((x + x) \cdot x - y - c_1\) | mutation |
| -0.978484 | \((x + x + x) \cdot x - y - c_{13}\) | mutation |
| -0.914336 | \((x + y - c_3) \cdot y + x \cdot x \cdot c_{15}\) | mutation |
| -0.303559 | \((x + y - c_4) \cdot y + x \cdot x \cdot c_{15}\) | mutation |
| -0.0692607 | \((x + y - \sin(x)) \cdot y + x \cdot x \cdot c_{15}\) | crossover |
| -0.0140815 | \((x + y - x) \cdot y + x \cdot x \cdot c_{15}\) | mutation |
| -0.0050732 | \((x + y - x) \cdot y + x \cdot x \cdot c_{16}\) | mutation |
| -0.0050732 | \(y \cdot y + c_3 \cdot x \cdot x\) | mutation |
Table 1: An evolved relationship, from Schmidt and Lipson (2009)
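The “mutation” and “crossover” events in Table 1 can be sketched as edits to an expression tree. The code below is a minimal illustration, assuming trees are nested tuples of the form `(operation, left, right)` with string leaves; the helper names (`subtrees`, `replace_at`, `mutate`, `crossover`) and the leaf set are hypothetical, and this is not the exact procedure from Schmidt and Lipson (2009).

```python
import random

def subtrees(tree, path=()):
    """Yield (path, subtree) for every node; a path is a tuple of child indices."""
    yield path, tree
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))

def replace_at(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], new),) + tree[i + 1:]

def mutate(tree, rng, leaves=("x", "y", "c")):
    """Point mutation: swap one randomly chosen node for a random leaf."""
    path, _ = rng.choice(list(subtrees(tree)))
    return replace_at(tree, path, rng.choice(leaves))

def crossover(parent_a, parent_b, rng):
    """Graft a random subtree of parent_b onto a random position in parent_a."""
    path, _ = rng.choice(list(subtrees(parent_a)))
    _, donor = rng.choice(list(subtrees(parent_b)))
    return replace_at(parent_a, path, donor)

rng = random.Random(5300)                    # fixed seed so runs are repeatable
parent_a = ("-", ("+", "x", "x"), "y")       # x + x - y
parent_b = ("*", ("+", "x", "y"), "y")       # (x + y) * y

print("mutated:  ", mutate(parent_a, rng))
print("crossover:", crossover(parent_a, parent_b, rng))
```

A full symbolic regression loop would score each child against the data (the “Accuracy” column) and keep the better candidates, which is why the accuracy values in the table improve from row to row.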

The Takeaway

Statistical Learning Theory

PAC Learning

Quiz Time!

Quiz 12

References

Kronberger, Gabriel, Bogdan Burlacu, Michael Kommenda, Stephan M. Winkler, and Michael Affenzeller. 2024. Symbolic Regression. 1st ed. Chapman and Hall/CRC. https://doi.org/10.1201/9781315166407.
Schmidt, Michael, and Hod Lipson. 2009. “Distilling Free-Form Natural Laws from Experimental Data.” Science 324 (5923): 81–85. https://doi.org/10.1126/science.1165893.
Silva, Brian M. de, David M. Higdon, Steven L. Brunton, and J. Nathan Kutz. 2020. “Discovery of Physics From Data: Universal Laws and Discrepancies.” Frontiers in Artificial Intelligence 3 (April). https://doi.org/10.3389/frai.2020.00025.