XSan crashing randomly, can’t cvfsck
Got 5 XServe RAIDs connected via a QLogic 5200 to 6 XServes: four G4s, one G5 (PMDC), and one Intel. The setup mostly works, but every couple of days one of the servers crashes and reboots, and then all is normal again.
After the reboot I get the dialog to send a bug report to Apple. It seems to implicate the kernel module for acfs. It reads:
[code]
panic(cpu 1 caller 0x001A4A55): Unresolved kernel trap (CPU 1, Type 14=page fault), registers:
CR0: 0x8001003b, CR2: 0x071d5000, CR3: 0x00d78000, CR4: 0x000006e0
EAX: 0x47a6b580, EBX: 0x00000000, ECX: 0x00000006, EDX: 0x00000048
CR2: 0x071d5000, EBP: 0x47a6b6c8, ESI: 0x071d5000, EDI: 0x47a6b5b0
EFL: 0x00010206, EIP: 0x00196316, CS: 0x00000008, DS: 0x8ded0010
Backtrace, Format - Frame : Return Address (4 potential args on stack)
0x47a6b328 : 0x128d08 (0x3cc0a4 0x47a6b34c 0x131de5 0x0)
0x47a6b368 : 0x1a4a55 (0x3d24b8 0x1 0xe 0x3d1cdc)
0x47a6b478 : 0x19aeb4 (0x47a6b490 0x47a6b498 0x47a6b4d8 0x6)
0x47a6b6c8 : 0x870ce2 (0x71c80cc 0x71c80cc 0x1c 0x33e68c)
0x47a6b718 : 0x871afc (0x1 0xdda4a242 0x6c 0x71d4f80)
0x47a6b7d8 : 0x873576 (0x6a12d00 0x2 0x1 0x47a6b840)
0x47a6b868 : 0x1e4c09 (0x47a6b88c 0x71c2084 0x47a6bdcc 0x682c000)
0x47a6b8b8 : 0x1e7901 (0x71c2000 0x47a6ba2c 0x47a6bf08 0x0)
0x47a6b918 : 0x1d367d (0x71c2000 0x47a6ba2c 0x47a6bf08 0x6a6)
0x47a6bb68 : 0x32543d (0x5f94604 0x0 0x880 0x47a6bf08)
0x47a6bba8 : 0x1d0b6d (0x5fcf204 0x5f94604 0x880 0x47a6bf08)
0x47a6bbf8 : 0x1e0133 (0x71c2000 0x0 0x880 0x47a6bf08)
0x47a6bc38 : 0x1d921b (0x71c2000 0x47a6bcec 0x0 0x47a6bf08)
0x47a6bd68 : 0x1d9636 (0x409480 0x0 0x0 0x0)
0x47a6bf28 : 0x1d96cb (0x409480 0x0 0x0 0x0)
0x47a6bf58 : 0x37ad83 (0x61ab1f4 0x667c530 0x667c574 0x0) Backtrace continues...
Kernel loadable modules in backtrace (with dependencies):
com.apple.filesystems.acfs(2.7.50)@0x863000
dependency: com.apple.iokit.IOStorageFamily(1.5.1)@0x519000
Kernel version:
Darwin Kernel Version 8.10.1: Wed May 23 16:33:00 PDT 2007; root:xnu-792.22.5~1/RELEASE_I386
Model: Xserve1,1, BootROM XS11.0080.B01, 4 processors, Dual-Core Intel Xeon, 2 GHz, 4 GB
Graphics: ATI Radeon X1300, ATY,RadeonX1300, PCIe, 64 MB
Memory Module: BRANCH 0 CHANNEL 0/DIMM 1, 1 GB, DDR2 FB-DIMM, 667 MHz
Memory Module: BRANCH 0 CHANNEL 1/DIMM 2, 1 GB, DDR2 FB-DIMM, 667 MHz
Memory Module: BRANCH 1 CHANNEL 0/DIMM 3, 1 GB, DDR2 FB-DIMM, 667 MHz
Memory Module: BRANCH 1 CHANNEL 1/DIMM 4, 1 GB, DDR2 FB-DIMM, 667 MHz
Network Service: Built-in Ethernet 1, Ethernet, en0
Network Service: Built-in Ethernet 2, Ethernet, en1
PCI Card: ATY,RadeonX1300, Display, Mezzanine
PCI Card: pci1000,646, sppci_fibrechannel, Slot-2
PCI Card: pci1000,646, sppci_fibrechannel, Slot-2
Parallel ATA Device: MATSHITACD-RW CW-8124
Fibre Channel Device: SCSI Target Device @ 1
Fibre Channel Device: SCSI Target Device @ 3
Fibre Channel Device: SCSI Target Device @ 5
Fibre Channel Device: SCSI Target Device @ 6
Fibre Channel Device: SCSI Target Device @ 8
Fibre Channel Device: SCSI Target Device @ 1
Fibre Channel Device: SCSI Target Device @ 3
Fibre Channel Device: SCSI Target Device @ 5
Fibre Channel Device: SCSI Target Device @ 6
Fibre Channel Device: SCSI Target Device @ 8
USB Device: Frontpanel Controller, Apple Computer, Up to 12 Mb/sec, 500 mA
FireWire Device: built-in_hub, unknown_value, Unknown
[/code]
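For what it's worth, here is a rough sketch of why I read the log as pointing at acfs: three of the return addresses in the backtrace sit just above the address where com.apple.filesystems.acfs is loaded (0x863000). The panic log does not give the kext's size, so the upper bound below (ASSUMED_MAX_SIZE) is purely my assumption, and the helper name frames_in_module is made up for illustration. Only a few frames are copied in to keep it short.

```python
import re

# Load address of com.apple.filesystems.acfs, taken from the
# "Kernel loadable modules in backtrace" section of the panic log.
ACFS_BASE = 0x863000
# ASSUMPTION: the log does not state the kext's size; this bound is a guess.
ASSUMED_MAX_SIZE = 0x100000

# A few frames copied verbatim from the backtrace above.
BACKTRACE = """\
0x47a6b478 : 0x19aeb4 (0x47a6b490 0x47a6b498 0x47a6b4d8 0x6)
0x47a6b6c8 : 0x870ce2 (0x71c80cc 0x71c80cc 0x1c 0x33e68c)
0x47a6b718 : 0x871afc (0x1 0xdda4a242 0x6c 0x71d4f80)
0x47a6b7d8 : 0x873576 (0x6a12d00 0x2 0x1 0x47a6b840)
0x47a6b868 : 0x1e4c09 (0x47a6b88c 0x71c2084 0x47a6bdcc 0x682c000)
"""

# Matches "<frame pointer> : <return address> (" at the start of a line.
FRAME_RE = re.compile(r"^(0x[0-9a-fA-F]+) : (0x[0-9a-fA-F]+) ", re.MULTILINE)


def frames_in_module(text, base, max_size):
    """Return (frame, return_address, offset_from_base) for frames whose
    return address falls inside [base, base + max_size)."""
    hits = []
    for frame, ret in FRAME_RE.findall(text):
        addr = int(ret, 16)
        if base <= addr < base + max_size:
            hits.append((frame, ret, hex(addr - base)))
    return hits


ACFS_FRAMES = frames_in_module(BACKTRACE, ACFS_BASE, ASSUMED_MAX_SIZE)
for frame, ret, offset in ACFS_FRAMES:
    print(f"frame {frame}: return address {ret} = acfs + {offset}")
```

The offsets only show that the call path went through the acfs module before the page fault. They do not say which acfs function was involved, since I have no symbols for the kext.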
Nothing in syslog after a crash.
So I suspected metadata corruption and ran cvfsck -n on the volume. It sat there for several days and produced no output at all. I tried it on a different XSan volume and it worked fine.
So what to try next?