In its recently released revision guide for the EPYC 7002 “Rome” server processors, AMD acknowledged that a bug in a countdown timer causes cores on these second-generation EPYC chips to hang after about 1044 days of continuous operation.
Servers using these EPYC chips therefore need to be restarted at least once every 1044 days, or roughly 2.86 years, and AMD has officially stated that it will not fix the bug.
AMD explains in the guide that the affected core fails to exit the CC6 power-saving state (Core C6), in which its voltage and clock frequency are reduced. AMD added that the exact time of failure may vary with the spread spectrum modulation and the REFCLK (reference clock) frequency.
Reddit user acid_migrain, after a detailed calculation, puts the actual time to the hang at 1042 days and 12 hours rather than 1044 days.
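To put the two figures in perspective, here is a minimal Python sketch that converts them to years and estimates how many days a machine has left before it reaches the threshold. The function names, the 365.25-day year, and the choice of the more conservative Reddit figure as the default are assumptions made for illustration. Reading `/proc/uptime` works only on Linux, and uptime is only an approximation of the time since the last system reset.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365.25  # assumption: average year length

AMD_ESTIMATE_DAYS = 1044.0     # figure quoted from AMD's guide
REDDIT_ESTIMATE_DAYS = 1042.5  # 1042 days and 12 hours


def days_to_years(days: float) -> float:
    """Convert a number of days to years."""
    return days / DAYS_PER_YEAR


def days_until_hang(uptime_seconds: float,
                    threshold_days: float = REDDIT_ESTIMATE_DAYS) -> float:
    """Days left before uptime reaches the threshold (negative if already past it)."""
    return threshold_days - uptime_seconds / SECONDS_PER_DAY


def read_uptime_seconds(path: str = "/proc/uptime") -> float:
    """Read system uptime in seconds (Linux only)."""
    with open(path) as f:
        return float(f.read().split()[0])


if __name__ == "__main__":
    print(f"AMD estimate:    {days_to_years(AMD_ESTIMATE_DAYS):.2f} years")
    print(f"Reddit estimate: {days_to_years(REDDIT_ESTIMATE_DAYS):.2f} years")
    remaining = days_until_hang(read_uptime_seconds())
    print(f"Days until a reboot is advisable: {remaining:.1f}")
```

A script like this could run from a periodic job and warn when the remaining margin drops below a chosen number of days. It does not detect whether a given machine is actually affected.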
Note: AMD launched the EPYC “Rome” series of server chips in 2019, and some customers have since reported to AMD that they have encountered this problem.
AMD said it has no plans to fix the bug. The technology news outlet technewsspace suggests the reason is either that a fix would cost too much or that few users are affected.