Datasets
Datasets used for model training and testing fall into two main groups:
Synthetic Data
For synthetic problems, the analytical form of underlying models is known and used to generate data points.
This category includes physics equations which are constrained by physical units.
Examples:
- Mathematical equation: \(f(x) = 2x^2 + \cos(x)\)
- Physical equation: \(f(m_1, m_2, r) = Gm_1m_2/r^2\), where \([m_1]=[m_2]=\) kilograms, \([r]=\) meters, and \(G\) is the gravitational constant.
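Because the generating expression is known, a synthetic dataset can be built by sampling inputs and evaluating the expression. The sketch below does this for the two examples above with NumPy. The sampling ranges, the number of points, the seed, and the function names are illustrative assumptions, not part of any benchmark definition.

```python
import numpy as np

# Gravitational constant in m^3 kg^-1 s^-2 (rounded value).
G = 6.674e-11


def math_model(x):
    """Mathematical example: f(x) = 2x^2 + cos(x)."""
    return 2 * x**2 + np.cos(x)


def gravity_model(m1, m2, r):
    """Physical example: f = G * m1 * m2 / r^2 (masses in kg, r in m)."""
    return G * m1 * m2 / r**2


rng = np.random.default_rng(0)  # fixed seed so the data are reproducible

# Assumed input range and sample size for the mathematical example.
x = rng.uniform(-1.0, 1.0, size=20)
y = math_model(x)

# Assumed ranges for the physical example; r stays away from zero.
m1 = rng.uniform(1.0, 10.0, size=20)
m2 = rng.uniform(1.0, 10.0, size=20)
r = rng.uniform(1.0, 5.0, size=20)
force = gravity_model(m1, m2, r)
```

The pairs `(x, y)` and `((m1, m2, r), force)` are then the training data; for the physical case, the units of each input would be passed along to any method that exploits dimensional constraints.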
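Origin | Underlying model | Benchmark name | Reference | Number of problems | Year |
---|---|---|---|---|---|
Physics | Ordinary differential equations | Strogatz | Strogatz repository | 10 | 2011 |
Physics | General physics equations (classical mechanics, electromagnetism, quantum mechanics, gravity, nuclear physics, etc.) | AIFeynman | Feynman Database | 120 | 2019 |
Mathematics | monomials, polynomials, trigonometric, exponential, etc. | Koza | Koza | 3 | 1994 |
Mathematics | monomials, polynomials, trigonometric, exponential, etc. | Keijzer | Keijzer M. | 15 | 2003 |
Mathematics | monomials, polynomials, trigonometric, exponential, etc. | Vladislavleva | Vladislavleva E.J. et al. | 8 | 2009 |
Mathematics | monomials, polynomials, trigonometric, exponential, etc. | Nguyen | Uy N.Q. et al. | 12 | 2011 |
Mathematics | monomials, polynomials, trigonometric, exponential, etc. | Korns | Korns M.F. | 15 | 2011 |
Mathematics | monomials, polynomials, trigonometric, exponential, etc. | R | Krawiec K., Pawlak T. | 3 | 2013 |
Mathematics | monomials, polynomials, trigonometric, exponential, etc. | Jin | Jin Y. et al. | 6 | 2019 |
Mathematics | monomials, polynomials, trigonometric, exponential, etc. | Livermore | Petersen B. K. et al. | 22 | 2021 |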
Real Data
For real-world problems, the underlying model is fully unknown. This category covers any kind of data, such as climate, economic, or medical data.
Datasets for this category can be found in the Penn Machine Learning Benchmarks (PMLB) repository.
Research on SR datasets
- Rethinking Symbolic Regression Datasets and Benchmarks for Scientific Discovery (2022)
- Expression Sampler as a Dynamic Benchmark for Symbolic Regression (2023)
Examples of mathematical equations
Dataset | Expression | Variables | Training data range |
---|---|---|---|
Koza-1 | \(x^4 + x^3 + x^2 + x\) | 1 | U[-1, 1, 20] |
Koza-2 | \(x^5 - 2x^3 + x\) | 1 | U[-1, 1, 20] |
Koza-3 | \(x^6 - 2x^4 + x^2\) | 1 | U[-1, 1, 20] |
Keijzer-1 | \(0.3 x \sin(2\pi x)\) | 1 | E[-1, 1, 0.1] |
Keijzer-2 | \(0.3 x \sin(2\pi x)\) | 1 | E[-2, 2, 0.1] |
Keijzer-3 | \(0.3 x \sin(2\pi x)\) | 1 | E[-3, 3, 0.1] |
Keijzer-4 | \(x^3e^{-x} \cos(x)\sin(x)(\sin^2(x)\cos(x)-1)\) | 1 | E[0, 10, 0.05] |
Keijzer-5 | \(\frac{30xz}{(x-10)y^2}\) | 3 | \(x,z:\) U[-1,1,1000] \(y:\) U[1,2,1000] |
Keijzer-6 | \(\sum_{i=1}^{x}\frac{1}{i}\) | 1 | E[1, 50, 1] |
Keijzer-7 | \(\log x\) | 1 | E[1, 100, 1] |
Keijzer-8 | \(\sqrt{x}\) | 1 | E[0, 100, 1] |
Keijzer-9 | \(\mathrm{arcsinh}(x)=\log(x +\sqrt{x^2 + 1})\) | 1 | E[0, 100, 1] |
Keijzer-10 | \(x^y\) | 2 | U[0, 1, 100] |
Keijzer-11 | \(xy + \sin((x-1)(y-1))\) | 2 | U[-3, 3, 20] |
Keijzer-12 | \(x^4-x^3 +y^2/2 - y\) | 2 | U[-3, 3, 20] |
Keijzer-13 | \(6\sin(x)\cos(y)\) | 2 | U[-3, 3, 20] |
Keijzer-14 | \(8/(2+x^2+y^2)\) | 2 | U[-3, 3, 20] |
Keijzer-15 | \(x^3/5 +y^3/2-y-x\) | 2 | U[-3, 3, 20] |
Vladislavleva-1 | \(\frac{e^{-(x-1)^2}}{1.2+(y-2.5)^2}\) | 2 | U[0.3, 4, 100] |
Vladislavleva-2 | \(e^{-x}x^3(\cos x\sin x)(\cos x \sin^2 x-1)\) | 1 | E[0.5, 10, 0.1] |
Vladislavleva-3 | \(e^{-x}x^3(\cos x\sin x)(\cos x\sin^2 x-1)(y-5)\) | 2 | \(x:\)E[0.05,10,0.1] \(y:\)E[0.05,10.05,2] |
Vladislavleva-4 | \(\frac{10}{5+\sum_{i=1}^{5}(x_i-3)^2}\) | 5 | U[0.05, 6.05, 1024] |
Vladislavleva-5 | \(30(x-1)\frac{(z-1)}{y^2(x-10)}\) | 3 | \(x:\) U[0.05, 2, 300] \(y:\) U[1, 2, 300] \(z:\) U[0.05, 2, 300] |
Vladislavleva-6 | \(6\sin(x)\cos(y)\) | 2 | U[0.1, 5.9, 30] |
Vladislavleva-7 | \((x-3)(y-3) + 2\sin((x-4)(y-4))\) | 2 | U[0.05, 6.05, 300] |
Vladislavleva-8 | \(\frac{(x-3)^4+(y-3)^3-(y-3)}{(y-2)^4+10}\) | 2 | U[0.05, 6.05, 50] |
Nguyen-1 | \(x^3+ x^2 + x\) | 1 | U[-1, 1, 20] |
Nguyen-2 | \(x^4 + x^3+ x^2 + x\) | 1 | U[-1, 1, 20] |
Nguyen-3 | \(x^5 + x^4 + x^3+ x^2 + x\) | 1 | U[-1, 1, 20] |
Nguyen-4 | \(x^6 + x^5 + x^4 + x^3+ x^2 + x\) | 1 | U[-1, 1, 20] |
Nguyen-5 | \(\sin(x^2)\cos(x) -1\) | 1 | U[-1, 1, 20] |
Nguyen-6 | \(\sin(x) + \sin(x+x^2)\) | 1 | U[-1, 1, 20] |
Nguyen-7 | \(\log(x+1) + \log(x^2+1)\) | 1 | U[0, 2, 20] |
Nguyen-8 | \(\sqrt{x}\) | 1 | U[0, 4, 20] |
Nguyen-9 | \(\sin(x) + \sin(y^2)\) | 2 | U[-1, 1, 100] |
Nguyen-10 | \(2\sin(x)\cos(y)\) | 2 | U[-1, 1, 100] |
Nguyen-11 | \(x^{y}\) | 2 | |
Nguyen-12 | \(x^4 - x^3 + \frac{1}{2}y^2 - y\) | 2 | |
Korns-1 | \(1.57 + (24.3 v)\) | 1 | U[-50, 50, 10000] |
Korns-2 | \(0.23 + 14.2\frac{v+y}{3\omega}\) | 3 | U[-50, 50, 10000] |
Korns-3 | \(-5.41 + 4.9\frac{v-x+y/\omega}{3\omega}\) | 4 | U[-50, 50, 10000] |
Korns-4 | \(-2.3 + 0.13\sin(z)\) | 1 | U[-50, 50, 10000] |
Korns-5 | \(3 + 2.13 \ln(\omega)\) | 1 | U[-50, 50, 10000] |
Korns-6 | \(1.3 + 0.13 \sqrt{x}\) | 1 | U[-50, 50, 10000] |
Korns-7 | \(213.80940889(1- e^{-0.54723748542 x})\) | 1 | U[-50, 50, 10000] |
Korns-8 | \(6.87 + 11 \sqrt{7.23~x~v~\omega}\) | 3 | U[-50, 50, 10000] |
Korns-9 | \(\frac{\sqrt{x}}{\ln(y)}\frac{e^z}{v^2}\) | 4 | U[-50, 50, 10000] |
Korns-10 | \(0.81 + 24.3\frac{2 y+3 z^2}{4v^3+5\omega^4}\) | 4 | U[-50, 50, 10000] |
Korns-11 | \(6.87 + 11\cos(7.23 x^3)\) | 1 | U[-50, 50, 10000] |
Korns-12 | \(2-2.1\cos(9.8 x)\sin(1.3\omega)\) | 2 | U[-50, 50, 10000] |
Korns-13 | \(32-3\frac{\tan(x)}{\tan(y)}\frac{\tan(z)}{\tan(v)}\) | 4 | U[-50, 50, 10000] |
Korns-14 | \(22-4.2(\cos(x)-\tan(y))\frac{\tanh(z)}{\sin(v)}\) | 4 | U[-50, 50, 10000] |
Korns-15 | \(12-6\frac{\tan(x)}{e^y}(\ln(z)-\tan(v))\) | 4 | U[-50, 50, 10000] |
R1 | \((x+1)^3/(x^2-x+1)\) | 1 | E[-1,1,20] |
R2 | \((x^5-3x^3+1)/(x^2+1)\) | 1 | E[-1,1,20] |
R3 | \((x^6+x^5)/(x^4+x^3+x^2+x+1)\) | 1 | E[-1,1,20] |
Jin-1 | \(2.5x^4 -1.3x^3 +0.5y^2 -1.7y\) | 2 | U[-3, 3, 100] |
Jin-2 | \(8.0x^2 + 8.0y^3 -15.0\) | 2 | U[-3, 3, 100] |
Jin-3 | \(0.2x^3 +1.5y^3 -1.2y -0.5x\) | 2 | U[-3, 3, 100] |
Jin-4 | \(1.5\exp(x) + 5.0\cos(y)\) | 2 | U[-3, 3, 100] |
Jin-5 | \(6.0\sin(x)\cos(y)\) | 2 | U[-3, 3, 100] |
Jin-6 | \(1.35xy + 5.5\sin((x-1.0)(y-1.0))\) | 2 | U[-3, 3, 100] |
Livermore-1 | \(1/3 + x + \sin(x^2)\) | 1 | U[-10,10,1000] |
Livermore-2 | \(\sin(x^2)\cos(x) - 2\) | 1 | U[-1,1,20] |
Livermore-3 | \(\sin(x^3)\cos(x^2) -1\) | 1 | U[-1,1,20] |
Livermore-4 | \(\log(x+1) + \log(x^2+1)+\log(x)\) | 1 | U[0,2,20] |
Livermore-5 | \(x^4 - x^3 + x^2 -y\) | 2 | U[0,1,20] |
Livermore-6 | \(4x^4 + 3x^3 + 2x^2 + x\) | 1 | U[-1,1,20] |
Livermore-7 | \(\sinh(x)\) | 1 | U[-1,1,20] |
Livermore-8 | \(\cosh(x)\) | 1 | U[-1,1,20] |
Livermore-9 | \(x^9 +x^8+x^7+x^6+x^5+x^4+x^3+x^2+x\) | 1 | U[-1,1,20] |
Livermore-10 | \(6\sin(x)\cos(y)\) | 2 | U[0,1,20] |
Livermore-11 | \(x^2y^2/(x+y)\) | 2 | U[-1,1,50] |
Livermore-12 | \(x^5/y^3\) | 2 | U[-1,1,50] |
Livermore-13 | \(x^{1/3}\) | 1 | U[0,4,20] |
Livermore-14 | \(x^3+x^2+x+\sin(x)+\sin(x^2)\) | 1 | U[-1,1,20] |
Livermore-15 | \(x^{1/5}\) | 1 | U[0,4,20] |
Livermore-16 | \(x^{2/5}\) | 1 | U[0,4,20] |
Livermore-17 | \(4\sin(x)\cos(y)\) | 2 | U[0,1,20] |
Livermore-18 | \(\sin(x^2)\cos(x) - 5\) | 1 | U[-1,1,20] |
Livermore-19 | \(x^5+x^4+x^2+x\) | 1 | U[-1,1,20] |
Livermore-20 | \(\exp(-x^2)\) | 1 | U[-1,1,20] |
Livermore-21 | \(x^8+x^7+x^6+x^5+x^4+x^3+x^2+x\) | 1 | U[-1,1,20] |
Livermore-22 | \(\exp(-0.5x^2)\) | 1 | U[-1,1,20] |
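The training data range column is not defined on this page. A common reading in the symbolic regression literature, assumed here, is that U[a, b, c] means c points drawn uniformly at random from the interval [a, b], and E[a, b, c] means an evenly spaced grid from a to b with step c. The sketch below generates training data for Koza-1 and Keijzer-1 under that assumption; the helper names `sample_uniform` and `sample_grid` and the seed are hypothetical. Rows whose third E value exceeds the interval width (R1 to R3) presumably give a point count instead, which this sketch does not handle.

```python
import numpy as np


def sample_uniform(low, high, n, rng):
    """U[low, high, n]: n points drawn uniformly at random from [low, high)."""
    return rng.uniform(low, high, size=n)


def sample_grid(low, high, step):
    """E[low, high, step]: evenly spaced points from low to high, both included."""
    n = int(round((high - low) / step)) + 1
    return np.linspace(low, high, n)


def koza_1(x):
    """Koza-1: x^4 + x^3 + x^2 + x."""
    return x**4 + x**3 + x**2 + x


def keijzer_1(x):
    """Keijzer-1: 0.3 x sin(2 pi x)."""
    return 0.3 * x * np.sin(2 * np.pi * x)


rng = np.random.default_rng(42)  # fixed seed for reproducibility

# Koza-1 with U[-1, 1, 20]
x_koza = sample_uniform(-1.0, 1.0, 20, rng)
y_koza = koza_1(x_koza)

# Keijzer-1 with E[-1, 1, 0.1]
x_keijzer = sample_grid(-1.0, 1.0, 0.1)
y_keijzer = keijzer_1(x_keijzer)
```

For multi-variable problems, the same samplers would be called once per variable and the columns stacked, for example with `np.column_stack`.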