xserve spontaneously restarts

Viewing 15 posts - 16 through 30 (of 45 total)
  • #361897
    embee
    Participant

    D’oh! In that huge post, I neglected to mention two other significant troubleshooting steps I took before going to the Apple Store: I zapped the PRAM and reset the PMU.

    #361995
    Anonymous
    Guest

    Hi, we have the same problem: an Xserve G4 sometimes reboots itself and sometimes locks up solid. It is on a UPS, which I will look into (see the previous post). I’m really just adding weight to this thread. All this machine does is file sharing, albeit of a rather non-standard sort: one generic user, which everyone (up to 50 people) logs in as, shares a whole 20 GB partition. I thought that might be the problem. It would be nice to find a simple answer rather than configure a Windows box to do the job.

    #361996
    chiefgeek
    Participant

    FWIW, we have both our Xserve G4s on an APC 3000VA UPS. While I don’t want to say out loud what the machines HAVEN’T done for a while (restarted themselves), they were doing so while connected to the UPS.

    #363367
    Anonymous
    Guest

    Have you guys filed bug reports with Apple (bugreport.apple.com)? I believe the -122 error is a power supply issue, and -93 is a watchdog reset, meaning the system wedged for some reason.

    #363374
    Anonymous
    Guest

    I’ve been seeing this recently (I have had panics in my logs four times over the past month, coincidentally right around when the Fall term started here at UMich).

    I’ll submit a bug report, but I wonder if it’s some kind of SMB overload (I’m running 10.4.2 Server here, with FM 7 Server on it). It had been rock solid until the past month.

    Server Monitor doesn’t say anything about the problem…

    #363802
    vudutu
    Participant

    It looks like my server is developing a chronic crash condition and slipping into never-never land. My usually docile OS X rack server (dual 2 GHz G5 running 10.4.2) is puking all over itself: it restarts itself, this error shows up in the var/system.log, and there are no hints in crashguard. This has happened three times this week, with no recent changes in configuration.

    This is a pretty simple setup: AFP and Open Directory are the only services running. There are a couple of hundred users in total, with only about 20 on each time it crashed.

    Oct 24 12:30:24 localhost kernel[0]: ApplePMU::PMU FORCED SHUTDOWN, CAUSE = -93

    A Google search brings up a number of incidents, none with conclusive solutions.
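    Several posters in this thread compare PMU cause codes (-92, -93, -122). A minimal sketch for pulling every forced-shutdown line out of a system log and tallying the codes is below. It assumes the line format quoted above; the meaning of each code is not documented in this thread, so the script only counts them rather than interpreting them. The script name in the usage comment is hypothetical.

```python
import re
import sys
from collections import Counter

# Matches lines like the one quoted in this post:
#   Oct 24 12:30:24 localhost kernel[0]: ApplePMU::PMU FORCED SHUTDOWN, CAUSE = -93
# Assumption: the pattern is inferred only from the log lines posted in this thread.
PMU_RE = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) .*PMU FORCED SHUTDOWN, CAUSE = (-?\d+)")


def pmu_shutdowns(lines):
    """Return (timestamp, cause code) for every PMU forced-shutdown line."""
    events = []
    for line in lines:
        match = PMU_RE.search(line)
        if match:
            events.append((match.group(1), int(match.group(2))))
    return events


def tally_causes(lines):
    """Count how often each PMU cause code appears."""
    return Counter(cause for _, cause in pmu_shutdowns(lines))


if __name__ == "__main__":
    # Hypothetical usage: python pmu_causes.py /path/to/system.log [more logs...]
    for path in sys.argv[1:]:
        with open(path, errors="replace") as log:
            for cause, count in tally_causes(log).most_common():
                print(f"{path}: CAUSE = {cause} seen {count} time(s)")
```

    Counting the codes over a few weeks of logs at least shows whether a machine is failing in one way (for example, always -93) or several.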

    From the looks of the logs and the timing, the Apple File Server crashed and then took down the OS: one minute I was watching the AFP server go into the ozone, and the next minute the screen went black and the white-on-black “you must restart” error came up. Note the last line of the system log: “Server crashed and exited with status 11”.

    The Panic.log is the most interesting: all four crashes are almost identical except for the exception state.

    I have Googled the daylights out of this problem, and I am posting here because this thread looks very similar to my problem.
    Any ideas?

    Note! Panic, system and crash logs are below
    Thanks
    Craig

    ************Here is the panic log************

    Mon Sep 26 16:07:56 2005
    panic(cpu 0 caller 0x000E6D70): vnode_put(3ac7738): iocount < 1
    Latest stack backtrace for cpu 0:
    Backtrace:
    0x00095544 0x00095A5C 0x0002683C 0x000E6D70 0x000E6D1C 0x000D3798
    0x002A7A94 0x000ABCB0
    0x093C7374
    Proceeding back via exception chain:
    Exception state (sv=0x448AB500)
    PC=0x9001BF20; MSR=0x0000F030; DAR=0x4614FFBA; DSISR=0x40000000;
    LR=0x90B0BA6C; R1=0xF12A28C0; XCP=0x00000030 (0xC00 – System call)

    Kernel version:
    Darwin Kernel Version 8.2.0: Fri Jun 24 17:46:54 PDT 2005;
    root:xnu-792.2.4.obj~3/RELEASE_PPC
    *********

    Thu Oct 20 17:21:42 2005
    panic(cpu 0 caller 0x000E6D70): vnode_put(3ac7738): iocount < 1
    Latest stack backtrace for cpu 0:
    Backtrace:
    0x00095544 0x00095A5C 0x0002683C 0x000E6D70 0x000E6D1C 0x000D3798
    0x002A7A94 0x000ABCB0
    0x093C7374
    Proceeding back via exception chain:
    Exception state (sv=0x448AB500)
    PC=0x9001BF20; MSR=0x0000F030; DAR=0x4614FFBA; DSISR=0x40000000;
    LR=0x90B0BA6C; R1=0xF12A28C0; XCP=0x00000030 (0xC00 – System call)

    Kernel version:
    Darwin Kernel Version 8.2.0: Fri Jun 24 17:46:54 PDT 2005;
    root:xnu-792.2.4.obj~3/RELEASE_PPC
    *********

    Mon Oct 24 12:30:28 2005
    panic(cpu 0 caller 0x000E6D70): vnode_put(339ce70): iocount < 1
    Latest stack backtrace for cpu 0:
    Backtrace:
    0x00095544 0x00095A5C 0x0002683C 0x000E6D70 0x000E6D1C 0x000D3798
    0x002A7A94 0x000ABCB0
    0xFFFFFFFF
    Proceeding back via exception chain:
    Exception state (sv=0x4A59CC80)
    PC=0x9001BF20; MSR=0x0000F030; DAR=0xE077B000; DSISR=0x42000000;
    LR=0x90B0BA6C; R1=0xF068A8C0; XCP=0x00000030 (0xC00 – System call)

    Kernel version:
    Darwin Kernel Version 8.2.0: Fri Jun 24 17:46:54 PDT 2005;
    root:xnu-792.2.4.obj~3/RELEASE_PPC
    *********

    Tue Oct 25 11:39:33 2005
    panic(cpu 1 caller 0x000E6D70): vnode_put(42b5840): iocount < 1
    Latest stack backtrace for cpu 1:
    Backtrace:
    0x00095544 0x00095A5C 0x0002683C 0x000E6D70 0x000E6D1C 0x000D3798
    0x002A7A94 0x000ABCB0
    0x7BD04360
    Proceeding back via exception chain:
    Exception state (sv=0x449E0C80)
    PC=0x9001BF20; MSR=0x0000D030; DAR=0x00787000; DSISR=0x42000000;
    LR=0x90B0BA6C; R1=0xF0B948C0; XCP=0x00000030 (0xC00 – System call)

    Kernel version:
    Darwin Kernel Version 8.2.0: Fri Jun 24 17:46:54 PDT 2005;
    root:xnu-792.2.4.obj~3/RELEASE_PPC
    *********
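    With four entries it is easy to compare backtraces by eye, but a small script helps once the log grows. This is a hypothetical sketch: it assumes entries are separated by lines of asterisks as in the paste above, and it compares only the first `depth` backtrace frames. That cut-off is a heuristic choice, made because the trailing frame differs between otherwise identical panics here.

```python
import re
from collections import defaultdict

HEX_RE = re.compile(r"0x[0-9A-Fa-f]+")


def split_panics(text):
    """Split a panic.log dump into entries on lines of asterisks, dropping blank lines."""
    entries, current = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("*****"):
            if current:
                entries.append(current)
            current = []
        elif line:
            current.append(line)
    if current:
        entries.append(current)
    return entries


def leading_frames(entry, depth=8):
    """Return the first `depth` addresses listed after the first 'Backtrace:' line."""
    frames, in_trace = [], False
    for line in entry:
        if line.startswith("Backtrace:"):
            if in_trace:
                break  # only the first backtrace in an entry is used
            in_trace = True
        elif in_trace:
            if line.startswith("Proceeding back"):
                break
            frames.extend(HEX_RE.findall(line))
    return tuple(frames[:depth])


def group_by_backtrace(text, depth=8):
    """Map each leading-frame signature to the first line (the date) of its entries."""
    groups = defaultdict(list)
    for entry in split_panics(text):
        groups[leading_frames(entry, depth)].append(entry[0])
    return dict(groups)
```

    With the default depth of 8, the four Tiger entries above should fall into a single group, since their first eight frames match; a second group appearing later would suggest a different crash path.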

    ************NOTE: THIS TEXT IS FROM THE Apple File Server crash log************

    Date/Time: 2005-10-25 11:40:05.755 -0400
    OS Version: 10.4.2 (Build 8C47)
    Report Version: 3
    Command: AppleFileServer
    Path: /usr/sbin/AppleFileServer
    Parent: AppleFileServer [252]
    Version: ??? (???)
    PID: 253
    Thread: 23
    Exception: EXC_BAD_ACCESS (0x0001)
    Codes: KERN_INVALID_ADDRESS (0x0001) at 0x007dd000

    ************NOTE: THIS TEXT IS FROM system.log************

    Oct 25 11:39:26 localhost kernel[0]: standard timeslicing quantum is 10000 us
    Oct 25 11:39:25 localhost mDNSResponder-107 (Mar 20 2005 20:31:47)[46]: starting
    Oct 25 11:39:26 localhost kernel[0]: vm_page_bootstrap: 509506 free pages
    Oct 25 11:39:26 localhost lookupd[42]: lookupd (version 365) starting – Tue Oct 25 11:39:26 2005
    Oct 25 11:39:26 localhost kernel[0]: mig_table_max_displ = 70
    Oct 25 11:39:26 localhost kernel[0]: 98 prelinked modules
    Oct 25 11:39:26 localhost kernel[0]: Copyright (c) 1982, 1986, 1989, 1991, 1993
    Oct 25 11:39:26 localhost kernel[0]: The Regents of the University of California. All rights reserved.
    Oct 25 11:39:26 localhost watchdogtimerd: Automatic reboot timer enabled.
    Oct 25 11:39:26 localhost kernel[0]: using 5242 buffer headers and 4096 cluster IO buffer headers
    Oct 25 11:39:26 localhost kernel[0]: DART enabled
    Oct 25 11:39:26 localhost kernel[0]: MacRISC4CPU: publishing BootCPU
    Oct 25 11:39:26 localhost kernel[0]: Enabling ECC Error Notifications
    Oct 25 11:39:26 localhost kernel[0]: FireWire (OHCI) Apple ID 42 built-in now active, GUID 000d93ff feb125e2; max speed s800.
    Oct 25 11:39:26 localhost kernel[0]: Security auditing service present
    Oct 25 11:39:26 localhost kernel[0]: BSM auditing present
    Oct 25 11:39:26 localhost kernel[0]: disabled
    Oct 25 11:39:26 localhost kernel[0]: rooting via boot-uuid from /chosen: 21CC91DC-BFE5-34E9-896D-BD5111EAA88C
    Oct 25 11:39:26 localhost kernel[0]: Waiting on <dict ID="0"><key>IOProviderClass</key><string ID="1">IOResources</string><key>IOResourceMatch</key><string ID="2">boot-uuid-media</string></dict>
    Oct 25 11:39:26 localhost kernel[0]: Got boot device = IOService:/MacRISC4PE/ht@0,f2000000/AppleMacRiscHT/pci@7/IOPCI2PCIBridge/k2-sata-root@C/AppleK2SATARoot/k2-sata@0/AppleK2SATA/ATADeviceNub@0/IOATABlockStorageDriver/IOATABlockStorageDevice/IOBlockStorageDriver/Hitachi HDS722525VLSA80 Media/IOApplePartitionScheme/Apple_HFS_Untitled_1@3
    Oct 25 11:39:26 localhost kernel[0]: BSD root: disk0s3, major 14, minor 4
    Oct 25 11:39:26 localhost kernel[0]: jnl: replay_journal: from: 7272448 to: 20358144 (joffset 0x750000)
    Oct 25 11:39:26 localhost kernel[0]: hfs mount: enabling extended security on SystemX
    Oct 25 11:39:26 localhost kernel[0]: HFS: Removed 3 orphaned unlinked files
    Oct 25 11:39:26 localhost kernel[0]: Jettisoning kernel linker.
    Oct 25 11:39:26 localhost kernel[0]: Resetting IOCatalogue.
    Oct 25 11:39:26 localhost kernel[0]: Matching service count = 0
    Oct 25 11:39:26 localhost kernel[0]: Matching service count = 11
    Oct 25 11:39:26 localhost kernel[0]: Matching service count = 11
    Oct 25 11:39:26 localhost kernel[0]: Matching service count = 11
    Oct 25 11:39:26 localhost kernel[0]: Matching service count = 11
    Oct 25 11:39:26 localhost kernel[0]: AppleRS232Serial: 44247020 80013020 chip base, virtual, physical
    Oct 25 11:39:26 localhost kernel[0]: IOPlatformControl::registerDriver Control Driver AppleSlewClock did not supply target-value, using default
    Oct 25 11:39:26 localhost kernel[0]: BCM5701Enet: Ethernet address 00:0d:93:9c:1e:d4
    Oct 25 11:39:26 localhost kernel[0]: BCM5701Enet: Ethernet address 00:0d:93:9c:1e:d5
    Oct 25 11:39:26 localhost lookupd[61]: lookupd (version 365) starting – Tue Oct 25 11:39:26 2005
    Oct 25 11:39:26 localhost kernel[0]: jnl: replay_journal: from: 34000896 to: 34100224 (joffset 0xe90000)
    Oct 25 11:39:26 localhost diskarbitrationd[36]: disk3s3 hfs F36002D8-C85F-3F28-9EFD-F1E2DE126898 BigDisk465 /Volumes/BigDisk465
    Oct 25 11:39:26 localhost kernel[0]: jnl: replay_journal: from: 14544896 to: 5208576 (joffset 0x750000)
    Oct 25 11:39:26 localhost kernel[0]: jnl: replay_journal: from: 4548096 to: 22191104 (joffset 0x750000)
    Oct 25 11:39:27 localhost diskarbitrationd[36]: disk0s3 hfs 21CC91DC-BFE5-34E9-896D-BD5111EAA88C SystemX /
    Oct 25 11:39:28 localhost kernel[0]: AppleBCM5701Ethernet – en0 link active, 100-Mbit, full duplex, no flow control
    Oct 25 11:39:28 localhost configd[34]: AppleTalk startup
    Oct 25 11:39:28 myserversname configd[34]: setting hostname to “aacxserve.local”
    Oct 25 11:39:28 myserversname kernel[0]: VTXP: vram [94000000:02000000]
    Oct 25 11:39:28 myserversname servermgrd: cupsd’s bootstrap server port not found
    Oct 25 11:39:28 myserversname servermgrd: cupsd’s bootstrap server port not found
    Oct 25 11:39:28 myserversname servermgrd: cupsd’s bootstrap server port not found
    Oct 25 11:39:28 myserversname servermgrd: cupsd’s bootstrap server port not found
    Oct 25 11:39:29 myserversname /System/Library/CoreServices/loginwindow.app/Contents/MacOS/loginwindow: Login Window Application Started
    Oct 25 11:39:29 myserversname mDNSResponder: Adding browse domain local.
    Oct 25 11:39:29 myserversname loginwindow[97]: Login Window Started Security Agent
    Oct 25 11:39:29 myserversname myserversname /System/Library/CoreServices/mcxd.app/Contents/MacOS/mcxd: DSGetLocallyHostedNodeNames(): dsFindDirNode() == -14008
    Oct 25 11:39:29 myserversname /System/Library/CoreServices/mcxd.app/Contents/MacOS/mcxd: DSGetSearchPath(): DSGetLocallyHostedNodeNames() == -14956
    Oct 25 11:39:29 myserversname /System/Library/CoreServices/mcxd.app/Contents/MacOS/mcxd: DSGetCurrentConfigInfo(): DSGetSearchPath() == -14956
    Oct 25 11:39:29 myserversname /System/Library/CoreServices/mcxd.app/Contents/MacOS/mcxd: DSGetCacheInfo(): DSGetCurrentConfigInfo() == -14956
    Oct 25 11:39:29 myserversname /System/Library/CoreServices/mcxd.app/Contents/MacOS/mcxd: *** MCXD.getComputerInfo: Couldn’t get cache info -14956
    Oct 25 11:39:29 myserversname kernel[0]: AppleBCM5701Ethernet – en0 link active, 100-Mbit, full duplex, flow control enabled
    Oct 25 11:39:31 myserversname kernel[0]: hfs mount: enabling extended security on Faculty
    Oct 25 11:39:31 myserversname diskarbitrationd[36]: disk1s3 hfs 220748BD-0F19-32A4-9F23-89C4B108BE16 Faculty /Volumes/Faculty
    Oct 25 11:39:31 myserversname launchd: Server 0 in bootstrap 1103 uid 0: “/usr/sbin/lookupd”[61]: exited abnormally: Hangup
    Oct 25 11:39:31 myserversname configd[34]: executing /System/Library/SystemConfiguration/Kicker.bundle/Contents/Resources/enable-network
    Oct 25 11:39:31 myserversname configd[34]: posting notification com.apple.system.config.network_change
    Oct 25 11:39:31 myserversname lookupd[109]: lookupd (version 365) starting – Tue Oct 25 11:39:31 2005
    Oct 25 11:39:31 myserversname configd[34]: setting hostname to “aacxserve.artacademy.edu”
    Oct 25 11:39:36 myserversname kernel[0]: hfs mount: enabling extended security on Students
    Oct 25 11:39:36 myserversname diskarbitrationd[36]: disk2s3 hfs 1B4B1971-43C2-3C99-B037-18B30AE34BD9 Students /Volumes/Students
    Oct 25 11:39:37 myserversname mDNSResponder: ERROR: Only name server claiming responsibility for “_kerberos.aacxserve.” is “.”!
    Oct 25 11:39:38 myserversname configd[34]: target=enable-network: disabled
    Oct 25 11:39:40 myserversname /usr/sbin/serialnumberd[201]: serialnumberd: Firewall rule #1 added to allow port 626.
    Oct 25 11:39:43 myserversname /usr/sbin/serveradmin: servermgr_ipfilter:ipfw config:Notice:Disabled firewall
    Oct 25 11:39:50 myserversname configd[34]: AppleTalk startup failed, status = 92 (retrying)
    Oct 25 11:39:51 myserversname configd[34]: AppleTalk startup
    Oct 25 11:39:53 myserversname DirectoryService[41]: Failed Authentication return is being delayed due to over five recent auth failures for username: myserveradmin.
    Oct 25 11:39:57 myserversname configd[34]: AppleTalk startup complete
    Oct 25 11:40:06 myserversname crashdump[283]: AppleFileServer crashed
    Oct 25 11:40:07 myserversname crashdump[283]: crash report written to: /Library/Logs/CrashReporter/AppleFileServer.crash.log
    Oct 25 11:40:07 myserversname /usr/sbin/AppleFileServer: Server crashed and exited with status 11.

    #364057
    Anonymous
    Guest

    If you are getting error -93, I think the LSRecentTool process is hanging the computer. This process seems to steal all processor time, and when that happens the watchdog restarts the server.

    [QUOTE BY= Dof] Hi,

    I was wondering whether anyone on AFP548 has already found a cause for the spontaneous reboots of their Xserves.

    We experience the same thing, and everything points to the PMU in combination with the watchdog. Some users recommended disabling the “reboot when computer boots” option, but for me this is not an option because the server is at another location and the downtime this would bring is unacceptable.

    I noticed a syslog line after a reboot:

    ApplePMU::PMU FORCED SHUTDOWN, CAUSE = -92

    Does anyone have information on the ApplePMU cause codes? I’ve tried every dark corner of the WWW but found no information on this.

    The problem occurred one month after installation, with no changes in that period (except for relocating the server to another location).

    The server is connected to a UPS (APC) without the USB cable connected; other servers connected to this UPS don’t have this problem (several ProLiant servers and another Xserve G5).

    Could the spontaneous reboots have something to do with a defective power supply (although I can’t see any strange power behavior in Server Monitor)?

    The server doesn’t have a SCSI adapter, but it does have the Apple FC card installed, which is used to connect to an FC switch.

    Greetings Dof
    [/QUOTE]

    #364709
    Anonymous
    Guest

    Well, we have a dual G5 Xserve which *had* been working fine until a few days ago. We installed Tiger over the Christmas break (break? :) ). Since then we have had unexpected restarts. Yes, we’re behind a UPS, but we were last year too.

    (On another subject, I have a Dual 2.GHz G5 Power Mac at home that won’t even boot. In fact, I tested another unit of the same model and it won’t boot here either. I have a 20-amp circuit, and the electrician tells me everything is fine. Anyway, APC told me that they have often seen this problem and have been able to solve it for their customers with a 1500.)

    #364710
    staze
    Participant

    I have seen this issue with several Xserves. One was due to a faulty NIC (or so I thought, until I found out it was the riser board), and the other was bad RAM. Even though I had tested the RAM several times before, I finally ran out of options and pulled it. No problems since.

    Try a RAM swap, and remove all the PCI cards except video. Then start replacing other things. I had to go through a logic board replacement before I finally tried the RAM that I swore was good.

    P.S. A drop in the 3.3 V or 5 V line can also cause these issues.

    Good luck!

    #364855
    peterthorn
    Participant

    I have also had a lot of -93 errors within the last couple of months. It peaked a few days ago with more than 40 restarts in two days. The users (fortunately only four) were, of course, on a tight deadline, so I couldn’t take it down for long.
    I installed a clean 10.4 on a FireWire drive and made the server boot from that (it only runs the AFP service), obviously only as a temporary solution.
    It has, however, rebooted once since then with the -93 error 🙁

    It’s a dual G5 Xserve; it had an ATTO SCSI card, but that has been removed.
    I think a RAM swap will be the next move (even though it has reported no errors).

    Peter

    #365016
    Anonymous
    Guest

    "Oct 25 11:39:28 myserversname servermgrd: cupsd’s bootstrap server port",
    this is the heart of the issue. ‘myservername’ isn’t the name of the server, and now that all the fields are filled in with the real server name, not ‘myservername’, it seems nearly impossible to change a setting that somehow, somewhere got missed. Now how do you change it?
    You probably can’t start up the Server Admin utility either?
    I broke my server too...
    I’m really wondering whether this sounds familiar.
    The last time I ran the Server Admin utility, I got a big red serial number error!!!!
    Should I note the truth about the non-validity of my OS X serial number?
    I’ve never been able to confirm what I try not to think, but this has happened to me twice, both times the same way, and both times it was repaired only by a new install, given my limited skill and patience...
    My conclusion thus far has been not to take part in the "Active Directory" fun, at least not until I’m driving a legal rocket ship.
    I learned this: I can build my own domain and name it. If I don’t go out, nobody will see me. My directory is WAY more active anyway...

    #365450
    MacDave
    Participant

    I’ve been having very consistent and inexplicable reboots of my Xserve G5, which seem to happen at least a few times per week. Retrospect 6.1.x is running on the machine, backing up hundreds of gigs to a FireWire 800 Exabyte VXA2 Packetloader 10-slot library (Josh: you mentioned these are often the cause?).

    I turned off the ‘Restart automatically [after power failure | if the computer freezes]’ options in Energy Saver, and instead of rebooting, the machine now just powers off.

    The Panic.log shows the following after every crash, which seems to implicate the Packetloader:

    Thu Feb 16 07:19:04 2006

    Unresolved kernel trap(cpu 0): 0x400 – Inst access DAR=0x00000000E00FF000 PC=0x0000000000000000
    Latest crash info for cpu 0:
    Exception state (sv=0x326A1A00)
    PC=0x00000000; MSR=0x40009030; DAR=0xE00FF000; DSISR=0x40000000; LR=0x0076713C; R1=0x1C27BB70; XCP=0x00000010 (0x400 – Inst access)
    Backtrace:
    0x00767108 0x003BECC0 0x003BEEE0 0x32ABCF08 0x0045FC84 0x00665408 0x00658844 0x00267FCC
    0x00266EAC 0x00266E10
    Kernel loadable modules in backtrace (with dependencies):
    com.apple.iokit.IOFireWireSerialBusProtocolTransport(1.3.1)@0x32abb000
    dependency: com.apple.iokit.IOFireWireSBP2(1.6.2)@0x32aa1000
    dependency: com.apple.iokit.IOSCSIArchitectureModelFamily(1.3.9)@0x3b7000
    dependency: com.apple.iokit.IOFireWireFamily(1.8.8)@0x434000
    com.apple.iokit.SCSITaskUserClient(1.3.9)@0x762000
    dependency: com.apple.iokit.IOStorageFamily(1.3.4)@0x62b000
    dependency: com.apple.iokit.IOSCSIArchitectureModelFamily(1.3.9)@0x3b7000
    com.apple.driver.AppleFWOHCI(2.2.10)@0x653000
    dependency: com.apple.iokit.IOPCIFamily(1.4)@0x398000
    dependency: com.apple.iokit.IOFireWireFamily(1.8.8)@0x434000
    com.apple.iokit.IOFireWireFamily(1.8.8)@0x434000
    com.apple.iokit.IOSCSIArchitectureModelFamily(1.3.9)@0x3b7000
    Proceeding back via exception chain:
    Exception state (sv=0x326A1A00)
    previously dumped as "Latest" state. skipping…
    Exception state (sv=0x3218FA00)
    PC=0x00000000; MSR=0x0000D030; DAR=0x00000000; DSISR=0x00000000; LR=0x00000000; R1=0x00000000; XCP=0x00000000 (Unknown)

    Kernel version:
    Darwin Kernel Version 7.9.0:
    Wed Mar 30 20:11:17 PST 2005; root:xnu/xnu-517.12.7.obj~1/RELEASE_PPC

    panic(cpu 0): 0x400 – Inst access
    Latest stack backtrace for cpu 0:
    Backtrace:
    0x00083498 0x0008397C 0x0001EDA4 0x00090C38 0x0009402C
    Proceeding back via exception chain:
    Exception state (sv=0x326A1A00)
    PC=0x00000000; MSR=0x40009030; DAR=0xE00FF000; DSISR=0x40000000; LR=0x0076713C; R1=0x1C27BB70; XCP=0x00000010 (0x400 – Inst access)
    Backtrace:
    0x00767108 0x003BECC0 0x003BEEE0 0x32ABCF08 0x0045FC84 0x00665408 0x00658844 0x00267FCC
    0x00266EAC 0x00266E10
    Kernel loadable modules in backtrace (with depen`
    *********
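    The “Kernel loadable modules in backtrace” section is what points at the FireWire path here. A hypothetical helper for listing the top-level modules from that section, skipping the “dependency:” lines, is sketched below. Its format assumptions come only from the Darwin 7.9.0 panic above and may not match panics from other OS X versions.

```python
import re

# Entries look like:
#   com.apple.driver.AppleFWOHCI(2.2.10)@0x653000
#   dependency: com.apple.iokit.IOPCIFamily(1.4)@0x398000
KEXT_RE = re.compile(r"^(dependency:\s*)?([A-Za-z0-9_.\-]+)\(([^)]*)\)@(0x[0-9A-Fa-f]+)")


def kexts_in_backtrace(text):
    """Return (bundle_id, version, load_address) for modules listed directly in the
    first 'Kernel loadable modules in backtrace' section, skipping dependency lines."""
    found, in_section = [], False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Kernel loadable modules in backtrace"):
            in_section = True
            continue
        if not in_section:
            continue
        match = KEXT_RE.match(line)
        if not match:
            break  # the section ends at the first line that is not a module entry
        if match.group(1) is None:
            found.append((match.group(2), match.group(3), match.group(4)))
    return found
```

    Run over the first section of the panic above, this would return five modules: IOFireWireSerialBusProtocolTransport, SCSITaskUserClient, AppleFWOHCI, IOFireWireFamily and IOSCSIArchitectureModelFamily.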

    #365681
    vulcan
    Participant

    This problem sucks! Now that I’ve got that off my chest...
    I read this thread and see many other people having the same issues with the PMU. I’m not actually having the issue on an Xserve; it’s on a 1 GHz G4 PowerBook that was recently sent back to Apple for a motherboard replacement because of this issue. It came back about two weeks later with the same issue: forced shutdowns by the PMU after sleep. I’m not sure where to go with this, since the machine had a clean install of the OS (10.3.9) and now, with new hardware, it has the same problem. No FireWire drives, peripherals, or anything else are attached to this machine, except sometimes the power supply when the problems occur. It happens when the user closes the lid to put the machine to sleep: when he then attempts to wake it up, it has shut itself down. The battery is almost always plenty full.

    I thought I’d post this here to maybe help the discussion, since some people are pointing toward external devices and SCSI cards as potential causes. I’m not saying they’re wrong, just explaining that I have none of those external devices to cause the same issue on my end.

    Brian

    #365711
    chiefgeek
    Participant

    Our server that was doing this was diagnosed (on the second trip to the Apple Store two years later) with a bad optical drive. As I reflect on it, the drive always was a bit pokey. The problem was that the system installed. Every time. No errors, no nothing. And then things would begin going horribly wrong...

    After having the optical drive replaced, the server is currently installed and working fine as an OD replica with a relatively light workload.

    #366045
    apr400
    Participant

    I have a server crashing with symptoms similar to the above, but at a much greater rate.

    I thought I’d add my experiences, as there doesn’t seem to be all that much about this on the web, plus a few questions for other people who’ve had this problem.

    I have an Xserve that I am configuring to replace a Linux server (machine environment and configuration are at the bottom of this post). Everything was going well, and I had started moving users’ mail across from the old server, when after about 15 successful migrations the Xserve started to crash. The machine simply stopped and then restarted itself. After this it went into a cycle where every five minutes or so it would switch off and then restart. During the auto reboot it seemed to struggle a bit, powering the blowers up and down and flashing the System Warning light for a minute, before stopping completely for a minute and then starting up as normal.

    (Is it supposed to do that? I mean the struggling-to-start business rather than the crashing! This is my first Xserve.)

    After rebooting I looked in the logs: nothing useful before the crash (except that just prior to the SECOND crash the last thing written was a few lines of binary, which is odd). The only helpful log item is a PMU FORCED SHUTDOWN, CAUSE = -122 after the restart. The power is supplied from a surge-protected socket (not a UPS), and several other machines run off the same circuit without trouble. I checked the power cable and also tried a different circuit, without joy: the server continued to reset after five minutes of uptime.
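    To measure a restart cycle like this from the logs rather than by eye, one option is to timestamp a line that is written once per boot and take the differences. This is a minimal sketch under stated assumptions: the marker string below is borrowed from the 10.4.2 boot log quoted earlier in the thread and may differ on 10.3.9, and because syslog timestamps carry no year, a year is supplied by hand (the default is arbitrary, and logs spanning New Year would need extra handling).

```python
from datetime import datetime


def marker_times(lines, marker, year=2005):
    """Return datetimes for syslog lines containing `marker`.

    Syslog timestamps ("Oct 25 11:39:26") carry no year, so one is supplied.
    %b assumes English month abbreviations (the default C locale).
    """
    times = []
    for line in lines:
        if marker in line:
            stamp = " ".join(line.split()[:3])  # month, day, time of day
            times.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return times


def intervals_seconds(times):
    """Seconds elapsed between each pair of consecutive timestamps."""
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
```

    Any line that appears exactly once per boot would serve as the marker; the PMU FORCED SHUTDOWN line written after each restart is another candidate.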

    I also watched Activity Monitor to keep an eye out for runaway CPU or memory usage: nothing going on there. Server Monitor also showed nothing abnormal.

    I checked crontab and cron: nothing was running at that time.

    So I started disabling services, basing my choices on the last thing to write to the system log:
    Shut down DiskSpaceMonitor: still crashed
    Shut down DSM and watchdog: still crashed
    Shut down DSM, WD and NTPD (checking the Apple time server): still crashed
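    The one-at-a-time elimination in the list above can be made more systematic by bisecting the service list: run with half the services, keep whichever half still reproduces the crash, and repeat. This is a sketch under loud assumptions. `crashes_with` is a hypothetical callback standing in for “enable only these services and wait to see whether the server goes down”. The approach also assumes a single service is responsible and the crash reproduces reliably, which this post goes on to show may not hold when the trigger is a client login.

```python
def find_culprit(services, crashes_with):
    """Bisect a list of services to find one whose presence triggers a crash.

    `crashes_with(subset)` is a hypothetical callback: run the server with only
    `subset` enabled, wait long enough to be sure, and return True if it went down.
    Assumes a single service is responsible and the crash reproduces reliably.
    """
    candidates = list(services)
    if not candidates or not crashes_with(candidates):
        return None  # nothing to bisect: the full set does not reproduce the crash
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        # Under the single-culprit assumption, a clean first half means
        # the culprit is in the second half.
        candidates = half if crashes_with(half) else candidates[len(half):]
    return candidates[0]
```

    With n services this takes about log2(n) test runs plus the initial check, rather than up to n runs for one-at-a-time elimination.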

    At this point I got a bit irritated with the thing, so after startup I shut down everything: all of the above, plus everything in Server Manager, plus slapd.
    And the server stayed up. After an hour and a half, I started bringing things back up. With all of the previously running services started again, the server stayed up for another hour or so without trouble. I then logged in on an iBook, authenticating via LDAP and mounting a default (i.e. empty) home directory from the server via AFP. Based on what I had read in various forums, I was keeping a very close eye on CPU usage. However, the moment login began (i.e. as I hit Enter on the iBook), the server shut down immediately. There was no time for the watchdog to have been prevented from running, and no time for anything to be written to the log other than the first three steps of the Kerberos auth (up to localhost krb5kdc[341]: TGS_REQ…).

    After the auto restart I left the services on but didn’t connect the iBook, and it shut down at five minutes as before. So somehow, having the services off for a while and then bringing them back up allowed the server to run until the iBook login.

    Anyway, at this stage I pulled all of the leads on the Xserve and opened it up. I checked the battery: 3.7 volts. I reset the PMU and then reattached everything bar the FireWire. I booted up again, and it ran normally for an hour. I logged in on the iBook: no problems. I called it an evening at that point (what with the configuration, the migration, and the crashing, I’d been at my desk for 44 hours by then!)

    I came in this morning and ran the server with no problems. At lunch I shut it down and reattached the FireWire drives. I restarted with no problems, so after an hour I authenticated the iBook, and it is still going strong, with uptime now at 3 hours. I am a bit worried that there is not much load on the server yet. It is difficult to replicate exactly the mail transfer that started all this, as I have had to redeploy the old server back to the main network for the moment. I want to retry the migration this coming weekend, but I have worries about the Xserve’s reliability at this point.

    If you have had this problem and managed to sort it out via the PMU voodoo, does the problem recur? My nightmare involves moving umpteen gigabytes of user files and mail to the new server and then having to move it all back in five-minute steps (especially since the format changes mean I can’t easily just pull the Xserve disks and pop them in the Linux server).

    Going through various forums, I have not found much on this situation other than this thread. Lots of people seem to have the crashes and lots of solutions are suggested, but not much is reported back about success. I also have not heard of anyone crashing with quite my frequency. I have seen suggestions that it is due to the FireWire disappearance issue as well, although at the moment my drives seem to be playing nice, and they weren’t mounted during any of the problems (for what that’s worth). I would like to avoid losing my local backup capability (I use the disk to stage nightly backups before transfer to an offsite NAS).

    Also, just to be safe I would love to run Apple Hardware Test, but I only have one Mac OS X server, and I’ll be buggered if I can get Javaist’s tip on the bootable xrdiag CD to work. I tried both methods and just can’t seem to boot off the resulting disks.

    (One thing of note: my system chokes on the suggested

    sudo bless --folder path

    but does accept

    sudo bless -folder "path"

    would that make any difference?)

    Does anyone have any suggestions on that?

    Anyway, thanks for reading this far through a long post. The configuration is below.

    ______System Configuration_______
    Xserve G5, 1 CPU 2GHz, 1GB Memory

    Some History on the server.

    The server is one and a half years old, but for various reasons it was only brought into service about two weeks ago, so it’s basically a new machine (albeit without a warranty, unfortunately). All updates to 10.3.9 were applied before configuration. To configure the server, we have a small private network with NAT to the main LAN. This allows the Xserve to look up DNS on the main company DNS server while having the same name and IP as the old server, and while keeping the old server running on the main LAN. DNS is rock solid, and we have had none of the usual DNS/LDAP issues.

    The server has a VGA card. The monitor, keyboard, and mouse are connected by KVM, with a local and remote option (Adderview OSD with AdderLink X-Silver). There is an additional local Mac-style keyboard attached by USB. There are two FireWire devices attached: a tape backup drive (LaCie d2) and a disk (LaCie d2). Neither is left mounted during normal operation; they are for backups. There are also two Ethernet cables, although only one NIC is enabled at the moment. On the network with the server are a Linux server, an iBook (10.3.9), a Windows (XP) machine, and a printer. Power is through a surge-protected socket.

    The server is running Open Directory (as an OD master), AFP, Firewall, Mail, and Windows Services (as PDC). User home directories are not on the startup volume but on a RAIDed Home volume (Xserve disks 2 and 3). User mailboxes are on a Mailboxes partition (same disk as Startup, different volume). Users have AFP and SMB access (SMB via virtual shares). Until the crashes, everything was running smoothly after configuration, with various Windows and Mac machines able to connect via SSL versions of services, and Kerberos and password auth all working.
