Problems with Benchmarks
We’ve seen the possible problem of overfitting
- remember machine learning benchmarks?
Two common approaches are used
- benchmark libraries
  - should include hard problems and expand over time
- random problems
  - should include problems believed to be hard
  - allows unlimited test sets to be constructed
  - disallows “cheating” by hardwiring algorithms
- so what’s the problem?