
All of this should be tuned together with an AMD CPU expert, with the programmer adjusting the code under their guidance to take advantage of all the CPU's features.

Did AMD engineers or seasoned hardware experts from the server vendor assist with this implementation?

Were the "Nodes Per Socket", "CCX as NUMA", "Last Level Cache as NUMA" settings tested/optimized? I don't see them mentioned in the article. They can make A LOT of difference for different workloads, and there's no single setting/single recommendation that would fit all scenarios.
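For anyone wanting to check what those BIOS settings actually produced: on Linux, the NUMA layout the OS sees is exposed under /sys/devices/system/node, one nodeN directory per node with a cpulist file. A minimal sketch, assuming a Linux host; the function names parse_cpulist and numa_nodes are mine, not from the article:

```python
from pathlib import Path


def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-3,8' into a list of CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # memory-only nodes have an empty cpulist
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_nodes(root="/sys/devices/system/node"):
    """Map NUMA node id -> CPU ids, read from sysfs (empty dict if unavailable)."""
    nodes = {}
    base = Path(root)
    if not base.is_dir():
        return nodes
    for entry in base.glob("node[0-9]*"):
        cpulist = entry / "cpulist"
        if cpulist.is_file():
            nodes[int(entry.name[4:])] = parse_cpulist(cpulist.read_text())
    return nodes


if __name__ == "__main__":
    for node, cpus in sorted(numa_nodes().items()):
        print(f"node{node}: {len(cpus)} CPUs")
```

Running it before and after changing "Nodes Per Socket" or "Last Level Cache as NUMA" should show a different node count, which is a quick sanity check that the setting took effect before benchmarking anything.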

"The locality of cores, memory, and IO hub/devices in a NUMA-based system is an important factor when tuning for performance" - "AMD EPYC 9005 Processor Architecture Overview", page 7

What was the RAM configuration? 12 DIMM modules (optimal) or 24 (suboptimal)?
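To put rough numbers on why the DIMM count matters: assuming those figures are per socket, 12 DIMMs means one per memory channel (EPYC 9005 has 12 DDR5 channels per socket), while 24 means two per channel, which typically forces a lower memory speed. A back-of-the-envelope sketch of theoretical peak bandwidth; the 6000 and 4400 MT/s speeds are illustrative assumptions, so check the platform's actual supported speeds:

```python
def peak_bandwidth_gbs(channels, mega_transfers_per_s, bytes_per_transfer=8):
    """Theoretical peak memory bandwidth in GB/s for 64-bit (8-byte) DDR channels."""
    return channels * mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


if __name__ == "__main__":
    # Assumed speeds, for illustration only; real values depend on the board and DIMMs.
    one_dimm_per_channel = peak_bandwidth_gbs(12, 6000)
    two_dimms_per_channel = peak_bandwidth_gbs(12, 4400)
    print(f"12 DIMMs: {one_dimm_per_channel:.1f} GB/s per socket")
    print(f"24 DIMMs: {two_dimms_per_channel:.1f} GB/s per socket")
```

These are upper bounds, not measurements; the point is only that doubling the DIMM count adds capacity, not channels, and can cost bandwidth.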

Was virtualization involved? If so, how was it configured? How does bare-metal performance compare to a virtualized system for this specific code?

So many opportunities to explore that the text doesn't mention.


