XSan crashing randomly, can’t cvfsck

This topic contains 2 replies, has 2 voices, and was last updated by  garges 10 years, 5 months ago.

Viewing 3 posts - 1 through 3 (of 3 total)
  • #371997

    garges
    Participant

    We have five Xserve RAIDs connected via a QLogic 5200 to six Xserves: four G4s, one G5 (the PMDC), and one Intel. The setup mostly works, but every couple of days one of the servers crashes and reboots, after which everything is normal again.

    After the reboot I get the dialog offering to send a bug report to Apple. The report seems to implicate the acfs kernel module. It reads:
    [code]
    panic(cpu 1 caller 0x001A4A55): Unresolved kernel trap (CPU 1, Type 14=page fault), registers:
    CR0: 0x8001003b, CR2: 0x071d5000, CR3: 0x00d78000, CR4: 0x000006e0
    EAX: 0x47a6b580, EBX: 0x00000000, ECX: 0x00000006, EDX: 0x00000048
    CR2: 0x071d5000, EBP: 0x47a6b6c8, ESI: 0x071d5000, EDI: 0x47a6b5b0
    EFL: 0x00010206, EIP: 0x00196316, CS: 0x00000008, DS: 0x8ded0010

    Backtrace, Format - Frame : Return Address (4 potential args on stack)
    0x47a6b328 : 0x128d08 (0x3cc0a4 0x47a6b34c 0x131de5 0x0)
    0x47a6b368 : 0x1a4a55 (0x3d24b8 0x1 0xe 0x3d1cdc)
    0x47a6b478 : 0x19aeb4 (0x47a6b490 0x47a6b498 0x47a6b4d8 0x6)
    0x47a6b6c8 : 0x870ce2 (0x71c80cc 0x71c80cc 0x1c 0x33e68c)
    0x47a6b718 : 0x871afc (0x1 0xdda4a242 0x6c 0x71d4f80)
    0x47a6b7d8 : 0x873576 (0x6a12d00 0x2 0x1 0x47a6b840)
    0x47a6b868 : 0x1e4c09 (0x47a6b88c 0x71c2084 0x47a6bdcc 0x682c000)
    0x47a6b8b8 : 0x1e7901 (0x71c2000 0x47a6ba2c 0x47a6bf08 0x0)
    0x47a6b918 : 0x1d367d (0x71c2000 0x47a6ba2c 0x47a6bf08 0x6a6)
    0x47a6bb68 : 0x32543d (0x5f94604 0x0 0x880 0x47a6bf08)
    0x47a6bba8 : 0x1d0b6d (0x5fcf204 0x5f94604 0x880 0x47a6bf08)
    0x47a6bbf8 : 0x1e0133 (0x71c2000 0x0 0x880 0x47a6bf08)
    0x47a6bc38 : 0x1d921b (0x71c2000 0x47a6bcec 0x0 0x47a6bf08)
    0x47a6bd68 : 0x1d9636 (0x409480 0x0 0x0 0x0)
    0x47a6bf28 : 0x1d96cb (0x409480 0x0 0x0 0x0)
    0x47a6bf58 : 0x37ad83 (0x61ab1f4 0x667c530 0x667c574 0x0)
    Backtrace continues...
    Kernel loadable modules in backtrace (with dependencies):
    com.apple.filesystems.acfs(2.7.50)@0x863000
    dependency: com.apple.iokit.IOStorageFamily(1.5.1)@0x519000

    Kernel version:
    Darwin Kernel Version 8.10.1: Wed May 23 16:33:00 PDT 2007; root:xnu-792.22.5~1/RELEASE_I386

    Model: Xserve1,1, BootROM XS11.0080.B01, 4 processors, Dual-Core Intel Xeon, 2 GHz, 4 GB
    Graphics: ATI Radeon X1300, ATY,RadeonX1300, PCIe, 64 MB
    Memory Module: BRANCH 0 CHANNEL 0/DIMM 1, 1 GB, DDR2 FB-DIMM, 667 MHz
    Memory Module: BRANCH 0 CHANNEL 1/DIMM 2, 1 GB, DDR2 FB-DIMM, 667 MHz
    Memory Module: BRANCH 1 CHANNEL 0/DIMM 3, 1 GB, DDR2 FB-DIMM, 667 MHz
    Memory Module: BRANCH 1 CHANNEL 1/DIMM 4, 1 GB, DDR2 FB-DIMM, 667 MHz
    Network Service: Built-in Ethernet 1, Ethernet, en0
    Network Service: Built-in Ethernet 2, Ethernet, en1
    PCI Card: ATY,RadeonX1300, Display, Mezzanine
    PCI Card: pci1000,646, sppci_fibrechannel, Slot-2
    PCI Card: pci1000,646, sppci_fibrechannel, Slot-2
    Parallel ATA Device: MATSHITACD-RW CW-8124
    Fibre Channel Device: SCSI Target Device @ 1
    Fibre Channel Device: SCSI Target Device @ 3
    Fibre Channel Device: SCSI Target Device @ 5
    Fibre Channel Device: SCSI Target Device @ 6
    Fibre Channel Device: SCSI Target Device @ 8
    Fibre Channel Device: SCSI Target Device @ 1
    Fibre Channel Device: SCSI Target Device @ 3
    Fibre Channel Device: SCSI Target Device @ 5
    Fibre Channel Device: SCSI Target Device @ 6
    Fibre Channel Device: SCSI Target Device @ 8
    USB Device: Frontpanel Controller, Apple Computer, Up to 12 Mb/sec, 500 mA
    FireWire Device: built-in_hub, unknown_value, Unknown
    [/code]
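
As a rough aid for reading the backtrace: the only return addresses at or above the acfs load address (0x863000) are 0x870ce2, 0x871afc, and 0x873576. The sketch below converts them to offsets from that load address, which is the form usually needed when matching frames against a kext's symbols. The helper name `acfs_offset` is made up for this illustration. The log does not give the kext's size, so treating these frames as inside acfs relies on it being the only module listed in the backtrace.

```shell
#!/bin/sh
# Load address of com.apple.filesystems.acfs, copied from the panic log.
ACFS_BASE=0x863000

# acfs_offset is a hypothetical helper: it prints a return address as an
# offset from the acfs load address.
acfs_offset() {
    printf '0x%x\n' $(( $1 - $ACFS_BASE ))
}

# The three backtrace frames whose return addresses are above ACFS_BASE.
for ret in 0x870ce2 0x871afc 0x873576; do
    printf '%s = acfs + %s\n' "$ret" "$(acfs_offset "$ret")"
done
```

This prints offsets 0xdce2, 0xeafc, and 0x10576 for the three frames.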

    Nothing in syslog after a crash.

    So I suspected metadata corruption and ran cvfsck -n on the volume. It sat there for several days and produced no output at all. Running it against a different Xsan volume worked fine.
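
Since the check ran for days without output, one option is to put a time limit on it so that a hang is noticed early. The following is a minimal sketch in plain sh, not an Xsan feature: `run_with_timeout` is a hypothetical helper, and the limits shown are arbitrary. It starts the command in the background, starts a watchdog that kills it once the limit expires, and returns the command's exit status.

```shell
#!/bin/sh
# run_with_timeout SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND and sends it SIGTERM if it is still
# running after SECONDS. Returns COMMAND's exit status (non-zero if killed).
run_with_timeout() {
    limit=$1
    shift

    # Start the real command in the background and remember its PID.
    "$@" &
    cmd_pid=$!

    # Watchdog: after the limit, kill the command if it is still there.
    # Its output is discarded so it never holds the caller's pipes open.
    ( sleep "$limit"; kill "$cmd_pid" 2>/dev/null ) >/dev/null 2>&1 &
    watchdog_pid=$!

    # Wait for the command; keep its status even if it was killed.
    status=0
    wait "$cmd_pid" || status=$?

    # The command is done, so the watchdog is no longer needed.
    kill "$watchdog_pid" 2>/dev/null || true
    return "$status"
}

# Quick demonstration with a harmless command.
run_with_timeout 2 echo "finished within the limit"
```

From a root shell, the check could then be started as `run_with_timeout 3600 ./cvfsck -n VOLUME > /tmp/cvfsck.log 2>&1`, where VOLUME stands in for the volume name. Going through sudo from a normal account would likely leave the watchdog unable to signal the root-owned process.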

    So what to try next?

    #376191

    torona318
    Participant

    Do you have a backup Metadata controller?
    When you ran the command to check the volume, did you run it like this?
    cd /Library/Filesystems/Xsan/bin
    sudo ./cvfsck -nv

    -Thomas

    #376422

    garges
    Participant

    Yes, we do have a backup metadata controller.

    Since this was posted we’ve narrowed things down a bit. We got cvfsck to run; an Apple engineer checked the output and it looks fine. We upgraded all systems to 10.5.7. The crashes happen during backups, and only when the Intel box is backing up one of the Xsan volumes. The G4s can back up the Xsan volumes just fine.

    Now, about 15 minutes after backups start, the primary MDC usually crashes and the backup MDC takes over. The next night the backup MDC crashes and the primary takes over again, so we ping-pong back and forth every night.

    Some nights the crash is worse and takes down the entire SAN, forcing a manual reboot.

    We turned on a logging option in the backup program (Bacula). That slowed backups down a lot, from 45 minutes to 3 hours, and crashes are now less frequent: a couple of times a week instead of every night.
