The largest problem is available training data actually. They have already done ...

gunalx 33 days ago | parent | context | favorite | on: Norway's 2 petabytes of Huawei flash storage and L...

The largest problem is available training data actually.

They have already done experiments with dittrent sub 10b models with both fine-tuning and fully from scratch. And last I check the fully from scratch captured the language in a better way.