The trouble was that it only failed to resume intermittently, which made it a nightmare to debug: you could never be 100% sure whether your latest tweaks had actually fixed the problem.
Well, I'm posting this update to say that I'm almost maybe probably possibly definitely mostly 99% certain that I have a fix. It doesn't really address the root cause of the problem, which I suspect is a buggy ACPI BIOS, but it does prevent the strange failure state that the laptop kept getting stuck in.
C State. C State run. Run, state, run!
Intel C states are the names used to describe how active or sleepy the processor is at any given moment. C0 means the processor is awake and executing instructions, and higher numbers refer to increasing amounts of laziness in the name of power efficiency. A lot of the time while you are using your laptop, it really isn't doing anything special, and it will sit in the lowest power state: the highest-numbered C state. This isn't a problem, because under normal circumstances it takes mere microseconds to snap back to life when needed.
Unless, maybe, it goes into this deep sleep mode just as the laptop is also going into an ACPI sleep mode and maybe it's just too sleepy to wake up and just five more minutes, please.
To look at how the C states are being used, we can use powertop, a lovely little utility made by Intel (but entirely usable on non-Intel machines) to track power usage. It's great for finding ways to increase your battery life, but it's also handy for illustrating what your CPU is doing and when. Here's what it looks like when you start it:-
Pressing the right arrow key takes us to the Idle stats page, which shows the percentage of time spent in each state.
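The kernel also exposes per-state idle statistics in sysfs, so you can peek at them directly. Below is a minimal Python sketch, assuming the usual layout /sys/devices/system/cpu/cpuN/cpuidle/stateM/ with a name file and a time file (cumulative residency in microseconds since boot). The helper names are made up for illustration, and the percentages are shares of total idle residency since boot, not of wall-clock time over a sampling window as PowerTOP shows them, so expect the numbers to differ.

```python
"""Summarise per-state CPU idle residency from the Linux cpuidle sysfs tree.

Assumption: the kernel exposes /sys/devices/system/cpu/cpuN/cpuidle/stateM/
with a `name` file and a `time` file (cumulative microseconds since boot).
"""
from pathlib import Path


def idle_residency(cpu_dir: Path) -> dict[str, int]:
    """Return {state name: total microseconds spent in that state} for one CPU."""
    totals: dict[str, int] = {}
    # Sort numerically so that state10 doesn't land between state1 and state2.
    states = sorted(cpu_dir.glob("cpuidle/state*"),
                    key=lambda p: int(p.name[len("state"):]))
    for state in states:
        name = (state / "name").read_text().strip()
        totals[name] = int((state / "time").read_text().strip())
    return totals


def residency_percentages(totals: dict[str, int]) -> dict[str, float]:
    """Convert raw residency times into percentages of the summed idle time."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {name: 0.0 for name in totals}
    return {name: 100.0 * t / grand_total for name, t in totals.items()}


if __name__ == "__main__":
    cpu0 = Path("/sys/devices/system/cpu/cpu0")
    for name, pct in residency_percentages(idle_residency(cpu0)).items():
        print(f"{name:>8}: {pct:5.1f}% of idle time")
```

If a deep state like C4 dominates this list while resume problems are happening, that's a hint worth chasing.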
Unfortunately, it seems that the deepest state, C4, is interacting badly with whatever ACPI bug I also have. So the fix is simply to tell Linux never to let the processor daydream that hard, and to give it a good poke if it tries to doze off.
The /etc/default/grub file allows us to change which kernel parameters are used at boot. The line to change is the GRUB_CMDLINE_LINUX one, and we want to add the intel_idle.max_cstate=3 parameter to prevent the processor from ever going into the C4 state. My /etc/default/grub now looks like this:-
# If you change this file, run 'update-grub' afterwards to update
# /boot/grub/grub.cfg.
# For full documentation of the options in this file, see:
#   info -f grub -n 'Simple configuration'

GRUB_DEFAULT=0
GRUB_HIDDEN_TIMEOUT=0
GRUB_HIDDEN_TIMEOUT_QUIET=true
GRUB_TIMEOUT=10
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
GRUB_CMDLINE_LINUX="acpi_osi=Linux acpi_backlight=vendor intel_idle.max_cstate=3"

# Uncomment to enable BadRAM filtering, modify to suit your needs
# This works with Linux (no patch required) and with any kernel that obtains
# the memory map information from GRUB (GNU Mach, kernel of FreeBSD ...)
#GRUB_BADRAM="0x01234567,0xfefefefe,0x89abcdef,0xefefefef"

# Uncomment to disable graphical terminal (grub-pc only)
#GRUB_TERMINAL=console

# The resolution used on graphical terminal
# note that you can use only modes which your graphic card supports via VBE
# you can see them in real GRUB with the command `vbeinfo'
#GRUB_GFXMODE=640x480

# Uncomment if you don't want GRUB to pass "root=UUID=xxx" parameter to Linux
#GRUB_DISABLE_LINUX_UUID=true

# Uncomment to disable generation of recovery mode menu entries
#GRUB_DISABLE_RECOVERY="true"

# Uncomment to get a beep at grub start
#GRUB_INIT_TUNE="480 440 1"
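If you'd rather make this change from a script than in an editor, here's a minimal Python sketch. It assumes GRUB_CMDLINE_LINUX holds a single double-quoted value on one line, as in the file above. add_kernel_param is a hypothetical helper: it replaces any existing setting of the same parameter and only returns the edited text, so writing the result back (as root) and running update-grub are still up to you.

```python
"""Add a kernel parameter to the GRUB_CMDLINE_LINUX line of /etc/default/grub text.

Sketch only: assumes the value is one double-quoted string on a single line.
"""
import re

# Matches GRUB_CMDLINE_LINUX="..." but not GRUB_CMDLINE_LINUX_DEFAULT="...".
CMDLINE_RE = re.compile(r'^(GRUB_CMDLINE_LINUX=")([^"]*)(")', re.MULTILINE)


def add_kernel_param(grub_text: str, param: str) -> str:
    """Return grub_text with `param` set on the GRUB_CMDLINE_LINUX line."""
    match = CMDLINE_RE.search(grub_text)
    if match is None:
        raise ValueError("no double-quoted GRUB_CMDLINE_LINUX line found")
    key = param.split("=", 1)[0]
    # Drop any existing setting of the same key, then append the new one.
    kept = [p for p in match.group(2).split() if p.split("=", 1)[0] != key]
    new_value = " ".join(kept + [param])
    start, end = match.span(2)
    return grub_text[:start] + new_value + grub_text[end:]


if __name__ == "__main__":
    sample = ('GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'
              'GRUB_CMDLINE_LINUX="acpi_osi=Linux acpi_backlight=vendor"\n')
    print(add_kernel_param(sample, "intel_idle.max_cstate=3"))
```

Running it twice with the same parameter leaves the text unchanged, so it's safe to re-run.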
Save the file, run sudo update-grub afterwards as directed, and reboot. There won't be any noticeable difference, but checking PowerTOP will show:-
We're no longer spending any time in the C4 state. And months and months of cautious use have shown that it's actually working: it hasn't failed to resume once!
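Besides checking PowerTOP, you can confirm after a reboot that the parameter made it onto the booted kernel command line and that the intel_idle driver picked it up. A minimal Python sketch follows; it assumes /proc/cmdline and the intel_idle module parameter file /sys/module/intel_idle/parameters/max_cstate, which may be missing if intel_idle isn't in use (for example on a machine where acpi_idle handles the C states). The helper names are hypothetical.

```python
"""Check after a reboot that the intel_idle C-state cap took effect.

Assumes /proc/cmdline holds the booted kernel command line and that the
intel_idle driver reports its limit in
/sys/module/intel_idle/parameters/max_cstate (absent if intel_idle isn't loaded).
"""
from pathlib import Path
from typing import Optional


def cmdline_param(cmdline: str, key: str) -> Optional[str]:
    """Return the value of the first `key=value` token on a command line, or None."""
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        if name == key and sep:
            return value
    return None


def read_max_cstate(param_file: Path) -> Optional[int]:
    """Read intel_idle's active max_cstate, or None if the file doesn't exist."""
    try:
        return int(param_file.read_text().strip())
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    cmdline_file = Path("/proc/cmdline")
    cmdline = cmdline_file.read_text() if cmdline_file.exists() else ""
    booted = cmdline_param(cmdline, "intel_idle.max_cstate")
    active = read_max_cstate(Path("/sys/module/intel_idle/parameters/max_cstate"))
    print(f"intel_idle.max_cstate on command line: {booted}; active value: {active}")
```

If both values read 3, the cap is in place; if the command line has it but the sysfs file is missing, a different idle driver is probably in charge.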
Obviously, not entering the deepest power-saving state means a slightly higher average power draw and slightly reduced battery life. I haven't noticed a huge difference, personally; power use tends to hover around 10 W ± 1 W, and a full charge still lasts longer than a typical outing.
I hope someone out there with similar problems finds this information useful. It's always so hard to find a good solution to these kinds of problems online, because they tend to be very specific to individual hardware and software configurations. What worked fine for one person may be completely useless for another; I tried blacklisting modules for various bits of possibly-buggy hardware, and tried different ACPI-related kernel parameters. Just as I decided that I had finally found the solution, it would lock up again. But I've finally got it sorted out, and there's just one more problem to fix: the 3G modem!