
The biggest problem is actually the available training data.

They have already run experiments with different sub-10B models, both fine-tuned and trained fully from scratch. And last I checked, the models trained fully from scratch captured the language better.


