With EXA support enabled in the xorg.conf for radeonhd, I started to get X.org lock-ups when resuming from s2disk on the 22.214.171.124 kernel at random times; sometimes after 2 days of uptime, sometimes after just 15 minutes. Since I'm using a laptop with that funny Fn key layout, pressing alt-sysrq-k (SAK), alt-sysrq-u (remount read-only) and alt-sysrq-i (kill all tasks but init, IIRC) makes the kernel believe I have pressed alt-sysrq-<insert numpad key here>, which doesn't do anything but set the kernel loglevel. Not being able to see anything on the pitch-black screen, all I have left then is alt-sysrq-s (sync system call) and alt-sysrq-b (reboot), letting fsck replay the ext3 journals while restarting.
Hearing that the drm kernel modules at the freedesktop.org repositories are not updated with the mainline kernel changes, I decided to try out the kernel 2.6.30-rc2. Fear. Despair. It locked up forever while suspending to disk.
Either it apparently did something more than merely locking up, or the freedesktop.org drm and radeon kernel modules are more broken than I expected, but some hours after rebooting to 126.96.36.199 and doing my homework on the laptop, most of the running GUI programs crashed with random SIGSEGVs and SIGABRTs. Yes, you read that right: even the KDE window manager got busted. Interestingly, I could bring it up again with some effort without restarting X, but then I noticed that one of the apps I was working with, Eclipse (3.4.x), didn't want to start again. When launched from the console, it would exit around 2 seconds after being invoked, with no explanation. After fiddling around with my ~/.eclipse dir and restarting X various times just to see knotify and artsd crashing like mad on KDE startup, I started a session as another (dummy) user in X and tried to launch Eclipse 3.2 instead. Its explanation was a bit more satisfactory: missing libraries.
"Of course", I thought, "the ldconfig cache may have become corrupted".
I kind of hit the nail:
bluecore:~# ldconfig
ldconfig: Cannot mmap file /usr/lib/libartsmidi.so.0.0.0.
ldconfig: Cannot lstat /usr/lib/libartsgui.so.0.0.0: Input/output error
ldconfig: Cannot lstat /usr/lib/libpangox-1.0.so.0.2002.3: Input/output error
ldconfig: Cannot mmap file /usr/lib/libwine.so.1.
ldconfig: Cannot mmap file /usr/lib/libpangox-1.0.so.
ldconfig: Cannot lstat /usr/lib/libpangoft2-1.0.so.0.2002.3: Input/output error
ldconfig: Cannot lstat /usr/lib/libakregatorprivate.so: Input/output error
ldconfig: Cannot mmap file /usr/lib/libartsmidi.so.0.
ldconfig: Cannot mmap file /usr/lib/libpangoxft-1.0.so.0.2002.3.
ldconfig: Cannot lstat /usr/lib/libartsgui_idl.so.0.0.0: Stale NFS file handle
ldconfig: Cannot lstat /usr/lib/libartsbuilder.so.0.0.0: Input/output error
ldconfig: Cannot mmap file /usr/lib/libpangoft2-1.0.so.0.
ldconfig: /usr/lib/libwine.so.1 is not a symbolic link

The 188.8.131.52 kernel was not amused by that attempt:
Apr 21 00:21:56 bluecore kernel: init_special_inode: bogus i_mode (3000)
Apr 21 00:22:44 bluecore kernel: init_special_inode: bogus i_mode (473)
Apr 21 00:26:40 bluecore kernel: init_special_inode: bogus i_mode (55000)
Apr 21 00:26:40 bluecore kernel: init_special_inode: bogus i_mode (0)
Apr 21 00:26:40 bluecore kernel: init_special_inode: bogus i_mode (71165)
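For the curious, here is a small Python sketch of why the kernel calls those modes "bogus". It assumes the number in parentheses is the inode mode printed in octal, which the digits in the log are at least consistent with. The function name describe_mode is made up for this illustration.

```python
import stat

# File-type bit patterns that a Unix inode mode may legitimately carry.
KNOWN_TYPES = {
    stat.S_IFREG: "regular file",
    stat.S_IFDIR: "directory",
    stat.S_IFLNK: "symlink",
    stat.S_IFCHR: "character device",
    stat.S_IFBLK: "block device",
    stat.S_IFIFO: "FIFO",
    stat.S_IFSOCK: "socket",
}

def describe_mode(octal_text):
    """Return the file type encoded in an octal mode string, or None if invalid.

    Assumption: the logged value is octal, as in "bogus i_mode (55000)".
    """
    mode = int(octal_text, 8)
    # S_IFMT() masks out everything except the file-type bits.
    return KNOWN_TYPES.get(stat.S_IFMT(mode))

# The five values from the log above.
for logged in ["3000", "473", "55000", "0", "71165"]:
    print(logged, "->", describe_mode(logged) or "no valid file type")
```

Under that assumption, none of the five logged values carries a valid file-type pattern, which fits inodes that were overwritten with garbage.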
My thoughts at that moment: WHAT THE F***** HELL HAS HAPPENED!!!?!?! Considering that I hadn't run Wine for months, and it wasn't running at that moment either... I ran some ls's to see what had become of those libraries... it wasn't pretty. They had become character devices, block devices and FIFOs. I rebooted, and fsck still skipped the "clean" /usr (/dev/sda5) filesystem until I forced it to check all filesystems again by setting their mount counts to some funny values using tune2fs.
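What I did by hand with ls can be sketched in Python as below. This is only an illustration, not what I actually ran: the function name find_suspicious_libs is hypothetical, and it only looks at the top level of the directory it is given.

```python
import os
import stat

def find_suspicious_libs(directory):
    """Return (path, reason) pairs for entries that look damaged.

    An entry is flagged if lstat() fails on it, or if it is anything other
    than a regular file, a symlink or a directory (e.g. a device or a FIFO).
    """
    suspicious = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        try:
            mode = os.lstat(path).st_mode
        except OSError as err:
            # Matches the "Cannot lstat ...: Input/output error" cases.
            suspicious.append((path, "lstat failed: " + str(err)))
            continue
        if not (stat.S_ISREG(mode) or stat.S_ISLNK(mode) or stat.S_ISDIR(mode)):
            # filemode() renders the type like ls -l does: c, b, p, s...
            suspicious.append((path, stat.filemode(mode)))
    return suspicious

if __name__ == "__main__" and os.path.isdir("/usr/lib"):
    for path, reason in find_suspicious_libs("/usr/lib"):
        print(path, reason)
```

On a healthy system this should print nothing for /usr/lib; a shared library showing up with a leading c, b or p is the kind of thing I was staring at.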
Indeed, /usr was corrupted.
After init dropped me to an emergency console to run fsck manually on sda5, e2fsck threw out some interesting stuff, which I forgot to save to a file, but it was something like this:
inode ####### has 'compressed' flag set on an unsupported filesystem
inode ####### has invalid size
inode ####### (fifo) has non-zero size
inode ######## has 'compressed' flag set on an unsupported filesystem
(ad infinitum, ad nauseam)
After it finished, I noticed that the /usr/lost+found directory got populated with files and directories resembling parts of the installed Python 2.4 and 2.5 foundations. Of course, missing entire directories is a really bad sign. But I could recover the affected packages I noticed with the help of apt-get install --reinstall. Those were anything aRts, Python, Wine, Pango and Akregator (which I actually remembered to reinstall right now, while writing this), and I also reinstalled the Samba server I had installed just some hours before the crash, just in case. Apparently I didn't lose anything else. None of the other filesystems were affected by whatever smashed parts of /usr.
The obvious course of action then was blacklisting drm and radeon, removing the
Option "AccelMethod" "exa"
line in xorg.conf, and nuking the 2.6.30-rc2 kernel image just in case. Eclipse and artsd worked again, and nothing else has shown any symptoms of being affected by fsck's decisions since then. It's been just 5 days though, and I have many apps lying around that I only use when asked/forced to, so I don't even remember their names.
I didn't say goodbye to kompmgr, however; it has pretty decent performance and marginal CPU usage even while using a shadow framebuffer, with just one CPU core enabled at half of its maximum frequency, and even if the system is under load, so I figured that I really didn't need DRI or EXA all the time (radeonhd still doesn't have an OpenGL implementation, so having DRI enabled made no difference for clients). Still, it was nice to be able to scroll a huge konsole or Iceweasel/Firefox window without experiencing flickering.