OD Crash…
April 21, 2009 at 8:40 pm #376029
trampoline
Participant
Hi
We have a problem… two machines display the same behaviour.
Both machines have been stripped and rebuilt from scratch…
but the crash still occurs…
Before and after the alterations made below… Xeon quad-core, 10.5.6…
What could it be?
*Environment:*
Open Directory master that has had its LDAP directory imported from a
previous Tiger server instance.
PDC for Windows network.
WINS server.
AFP home folders for Macs.
SMB shares for PCs and Macs.
NFS shares for Linux servers and workstations.
HFS+ file systems on fibre Unity RAID array.
*Factors leading up to a crash:*
Increased load on AFP and SMB shares, usually reaching their peak of
~40+ AFP connections and ~90+ SMB connections.
We have also witnessed this crash during periods of low load.
*Results of the crash:*
DirectoryService daemon running continuously at 100+% CPU.
Slow to ID a user; directory listings are slow as a result.
Slow to log in via SSH.
Samba fails to allow current or new connections to work.
CPU lights on the Xserve show high levels of activity.
Server can become progressively worse until it is almost unresponsive.
*Steps to recover:*
Reload DirectoryServices using these commands:
launchctl unload /System/Library/LaunchDaemons/com.apple.DirectoryServices.plist
launchctl load /System/Library/LaunchDaemons/com.apple.DirectoryServices.plist
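The unload/load pair above could be wrapped in a small watchdog so it only fires when DirectoryService is actually pegging the CPU. This is a rough sketch only, not something tested on 10.5: the 90% threshold, the function names and the single-sample check are all assumptions, and it would have to run as root.

```shell
#!/bin/sh
# Sketch: reload DirectoryService when its CPU usage is above a threshold.
# THRESHOLD and all function names are hypothetical; tune before relying on it.
PLIST=/System/Library/LaunchDaemons/com.apple.DirectoryServices.plist
THRESHOLD=90

# Read "pcpu command" lines on stdin and print the integer CPU percentage
# of the first DirectoryService process, or 0 if none is listed.
cpu_from_ps() {
    awk '$2 ~ /DirectoryService$/ { split($1, a, "."); print a[1]; found = 1; exit }
         END { if (!found) print 0 }'
}

# Succeed when the first argument (CPU %) is greater than the second (limit).
over_threshold() {
    [ "$1" -gt "$2" ]
}

# Same two commands as in the post, run back to back.
reload_ds() {
    launchctl unload "$PLIST" && launchctl load "$PLIST"
}

# Take one sample and reload only if the daemon is over the limit.
check_and_reload() {
    cpu=$(ps -A -o pcpu= -o comm= | cpu_from_ps)
    if over_threshold "$cpu" "$THRESHOLD"; then
        echo "DirectoryService at ${cpu}% CPU, reloading"
        reload_ds
    fi
}
```

Calling `check_and_reload` from a periodic job would automate the first recovery step; a single sample can misfire on a short spike, so averaging a few samples is probably wiser. It does not cover the Samba restart or the reboot cases described below.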
Sometimes Samba needs restarting after running the above commands in
order to get access working properly again.
Sometimes Kerberos authentication fails for specific external servers,
requiring us to reboot OS X Server to fix this.
A hard reboot is sometimes required to get other services working
properly again.
*Steps taken to prevent the crash:*
In an attempt to fix the vm growth error (see syslog) we made the
following changes:
Edited /etc/sysctl.conf and added the following lines
kern.maxproc=2128
kern.maxprocperuid=400
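To confirm that the values added to /etc/sysctl.conf are really in effect after a reboot, something like the following could be used. It is a minimal sketch: the function names are made up for illustration, and it assumes the file holds plain key=value lines like the two above.

```shell
#!/bin/sh
# Sketch: compare the values in a sysctl.conf-style file with the running kernel.
# Function names are hypothetical.

# Arguments: key, expected value, actual value. Prints a one-line verdict.
report_limit() {
    if [ "$3" = "$2" ]; then
        echo "$1 ok ($3)"
    else
        echo "$1 mismatch: expected $2, got $3"
    fi
}

# Read key=value lines on stdin, skipping blanks and comments,
# and check each key against `sysctl -n`.
check_conf() {
    while IFS='=' read -r key expected; do
        case "$key" in ''|'#'*) continue ;; esac
        report_limit "$key" "$expected" "$(sysctl -n "$key" 2>/dev/null)"
    done
}
```

Usage would be `check_conf < /etc/sysctl.conf`; any "mismatch" line suggests the setting was not applied at boot.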
Also this change:
echo "limit maxproc 1500 2500" | sudo tee -a /etc/launchd.conf
*Frequency of occurrence:*
Usually daily, resulting in up to ten minutes' loss of access for 40 PCs
and 100+ proxy users.
sys log…
xserve DirectoryService[28]: Potential VM growth in DirectoryService since client PID: 0, has 800 open references when the warning limit is 500.
Mar 26 12:51:41 xserve DirectoryService[28]: Potential VM growth in DirectoryService since client PID: 0, has 775 open references when the warning limit is 500.
Mar 26 12:51:42: --- last message repeated 1 time ---
Mar 26 12:51:42 xserve DirectoryService[28]: Potential VM growth in DirectoryService since client PID: 0, has 800 open references when the warning limit is 500.
Mar 26 12:51:53 xserve DirectoryService[28]: Potential VM growth in DirectoryService since client PID: 0, has 775 open references when the warning limit is 500.
Mar 26 12:52:06: --- last message repeated 2 times ---
Mar 26 12:52:06 xserve DirectoryService[28]: Potential VM growth in DirectoryService since client PID: 0, has 800 open references when the warning limit is 500.
Crash Log
Exception Type: EXC_BAD_ACCESS (SIGBUS)
Exception Codes: KERN_PROTECTION_FAILURE at 0x0000000000000030
Crashed Thread: 6
Thread 0:
0 libSystem.B.dylib 0x90bc11da write$NOCANCEL$UNIX2003 + 10
1 libSystem.B.dylib 0x90bc105f __sflush + 79
2 libSystem.B.dylib 0x90bca511 fflush + 106
3 …ectoryServiceCore.Framework 0x00162998 CFile::write(void const*, int) + 284
4 …ectoryServiceCore.Framework 0x00164797 CFile::write(char const*, int) + 31
5 …ectoryServiceCore.Framework 0x00163a13 CLog::Append(CString const&) + 299
6 …ectoryServiceCore.Framework 0x00163ba6 SrvrLog + 98
7 DirectoryService 0x000171fd main + 2821
8 DirectoryService 0x000166da start + 54
Thread 1:
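For tracking how often the "Potential VM growth" warnings quoted earlier in this post turn up before a crash, a small filter like this may help. It is only a sketch: the function name is made up, and it assumes each warning sits on a single line in the real log file, as syslog normally writes it.

```shell
#!/bin/sh
# Sketch: summarise "Potential VM growth" warnings read from stdin.
# Prints "<number of warning lines> <highest open-reference count>".
# "last message repeated" lines are ignored, so the count is a lower bound.
vm_growth_summary() {
    awk '
        /Potential VM growth in DirectoryService/ {
            count++
            # The reference count is the field right after the word "has".
            for (i = 1; i < NF; i++)
                if ($i == "has" && $(i + 1) + 0 > max) max = $(i + 1) + 0
        }
        END { print count + 0, max + 0 }
    '
}
```

Something like `vm_growth_summary < /var/log/system.log` would be the intended use, with the log path being an assumption about where these messages end up.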
June 2, 2009 at 12:09 am #376352
firehaus
Participant
I have begun to experience either a related issue or the identical issue since late March. I implemented your suggestions for “Steps taken to prevent the crash” as they seemed reasonable. I’ll post back any update or future solution if I figure out anything new. My hardware is a little different and I have other related systems that are also causing this issue to occur.
Mac OS X 10.5.6
Xserve dual G4 1GHz 2GB RAM
OD Master + AFP shares
HFS+ file system on internal RAID 1
Mac OS X 10.5.6
Xserve Quad Xeon 3GHz 16GB RAM
Mail related services only
HFS+ file system on internal RAID 5
When we reach ~40+ AFP users during biz hours, about half or more are disconnected. There are only 2-3 SMB users at most.
Typically it appears to be our mail server that instigates the issue. Mail delivery is held up because of the error since users can’t be authenticated to access their accounts.
Downtime for the AFP share is usually less than 4-5 minutes, but downtime for the mail server can be as high as 15-20 minutes as we have a large message store, plus a large 1.7TB RAID 5 that has to be initialized.