- [ ] Check the list of existing benchmarks and whether and how we can use https://drive.google.com/drive/folders/11xgW5F_z6F3ePARTbNDA9InpBScIyNte?usp=drive_link
- [ ] Implement a pipeline to check models on the benchmarks

**This issue requires a design doc:** https://github.com/orgs/sensein/projects/60/views/6?pane=issue&itemId=155029901