Forum Replies Created

Viewing 6 posts - 16 through 21 (of 21 total)
  • in reply to: Help us silence slapd with its GSSAPI errors #368555
    mcnaugha
    Participant

    This is what we’re getting on the one I tried to fix:

    Mar 14 14:48:30 g4server slapd[29001]: SASL [conn=4190] Failure: GSSAPI Error: Miscellaneous failure (Decrypt integrity check failed)\n

    I know that means the keys are out of sync between the KDC and keytab. I cannot seem to fix it.

    This is the initial message:

    Mar 14 16:32:00 g4server slapd[46]: SASL [conn=9731] Failure: GSSAPI Error: Miscellaneous failure (No principal in keytab matches desired name)\n

    Another message that appears is:

    Mar 14 16:32:00 g4server slapd[46]: SASL [conn=9731] Failure: GSSAPI Error: Miscellaneous failure (Server not found in Kerberos database)\n

    Now we’re not talking about the odd log entry of this… we’re talking hundreds if not thousands! We’re seeing it on several High School Tiger Servers. They run as ODM and PDC. We’re also seeing major instability with these servers. It’s very random though. As I’m not used to seeing slapd log this much, I can only think it is related to the instability.
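
    To put a number on "hundreds if not thousands", something like the following could tally the failures by reason. It is only a rough sketch: the function name count_gssapi_failures is made up for illustration, and it assumes every entry follows the "Miscellaneous failure (reason)" layout of the lines quoted above.

```shell
#!/bin/sh
# Tally slapd GSSAPI failures by the reason given in brackets.
# Reads syslog-style lines on stdin; prints "count reason", most frequent first.
# count_gssapi_failures is a hypothetical helper name, not part of any tool.
count_gssapi_failures() {
    grep 'GSSAPI Error' |
        sed -n 's/.*Miscellaneous failure (\(.*\)).*/\1/p' |
        sort | uniq -c | sort -rn |
        awk '{ n = $1; $1 = ""; sub(/^ +/, ""); print n, $0 }'
}

# Demo on the three log lines quoted in this post.
printf '%s\n' \
    'Mar 14 14:48:30 g4server slapd[29001]: SASL [conn=4190] Failure: GSSAPI Error: Miscellaneous failure (Decrypt integrity check failed)\n' \
    'Mar 14 16:32:00 g4server slapd[46]: SASL [conn=9731] Failure: GSSAPI Error: Miscellaneous failure (No principal in keytab matches desired name)\n' \
    'Mar 14 16:32:00 g4server slapd[46]: SASL [conn=9731] Failure: GSSAPI Error: Miscellaneous failure (Server not found in Kerberos database)\n' |
    count_gssapi_failures
```

    On a real server you would pipe the slapd log into the function instead of the sample lines. The log location differs between setups, so check where yours is written first.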

    We are ironically unable to update to 10.4.9 because of another issue I have posted here.

    in reply to: 10.4 Clients hang on log on windows #368554
    mcnaugha
    Participant

    Try running the following commands in Terminal on your problem 10.4 clients to see if that resolves the login issues:

    Make sure you are logged in with an administrator account.

    Open Terminal (/Applications/Utilities).

    Create a backup of the current NFS Startup script with this command on a single line:

    sudo cp /System/Library/StartupItems/NFS/NFS /System/Library/StartupItems/NFS/NFS.bak

    Open the NFS Startup Item script for editing with nano by using this command:

    sudo nano /System/Library/StartupItems/NFS/NFS

    To locate the section of the script that starts automount, press Control-W, type “automounter”, and press Return.

    Under the section “Start the automounter,” change the line that reads:

    automount -m /Network -nsl -mnt ${AUTOMOUNTDIR}

    to read:

    automount -1 -m /Network -nsl -mnt ${AUTOMOUNTDIR}

    Change the line that reads:

    automount -m /automount/Servers -fstab -mnt /private/Network/Servers \

    to read:

    automount -1 -m /automount/Servers -fstab -mnt /private/Network/Servers \

    Save the file (Control-O, Return), and exit nano (Control-X).

    Reboot.

    Let me know if that works. If it does, you can use ARD to roll out this file to all of your clients.
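
    If you would rather not hand-edit every client, the two changes above could be scripted along these lines. This is a sketch only: add_automount_flag is a hypothetical name, and it assumes the two lines in your NFS script begin exactly as shown in the steps (apart from leading whitespace).

```shell
#!/bin/sh
# Insert "-1" after "automount" on the two lines named in the steps above.
# Reads a startup script on stdin and writes the edited copy to stdout,
# so the original file is never modified in place.
# add_automount_flag is a hypothetical helper name.
add_automount_flag() {
    sed -e 's|^\([[:space:]]*\)automount -m /Network |\1automount -1 -m /Network |' \
        -e 's|^\([[:space:]]*\)automount -m /automount/Servers |\1automount -1 -m /automount/Servers |'
}

# Demo on the two lines quoted in this post.
printf '%s\n' \
    'automount -m /Network -nsl -mnt ${AUTOMOUNTDIR}' \
    'automount -m /automount/Servers -fstab -mnt /private/Network/Servers \' |
    add_automount_flag
```

    Lines that already carry the -1 flag no longer match, so running it twice does no harm. Write the result to a temporary file and compare it with the original using diff before copying it into place.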

    mcnaugha
    Participant

    I hope I’m not jumping the gun with our situation, but we have made some fairly simple fixes which may have stabilised our servers. I cannot guarantee this information will help you or that it is even safe to carry out. It appears to be working for us at the moment. Follow at your own risk!

    We discovered two issues from examining the logs.

    I believe that the SMB issue was serious and was definitely causing the problems we were seeing.

    The LDAP issue is only serious if my theory is correct. Only time might tell on that one.

    The first thing we found was entries in the log.smbd and log.nmbd logs referring to “.tdb” files. TDB files are Samba’s trivial databases, the format it uses to store operational data. These files are extremely important and need to be in good health. Filter your smbd and nmbd logs for “.tdb”. This led me to recall that there is a special Samba tool which verifies the integrity of these files. It’s called tdbbackup. As you can tell from its name, it also backs up the files if wanted. To use tdbbackup you need to point it at the locations where your tdb files are. I am only aware of two important locations for these on Mac OS X Server. They are:

    /var/samba/ and /var/db/samba

    tdbbackup can be run even when your SMB service is up and running… which is great news for the impatient. I ran tdbbackup with the “-v” option over both of those locations. The command I needed to run was:

    sudo tdbbackup -v /var/samba/*.tdb

    and

    sudo tdbbackup -v /var/db/samba/*.tdb

    Normal output would look similar to this:

    /var/samba/brlock.tdb : 0 records
    /var/samba/connections.tdb : 0 records
    /var/samba/gencache.tdb : 0 records
    /var/samba/locking.tdb : 0 records
    /var/samba/sessionid.tdb : 0 records
    /var/samba/unexpected.tdb : 1 records

    Note there is no mention of “restoring”. tdbbackup will automatically attempt to restore a backup if it finds corruption.

    Problem output contains something similar to this:

    restoring /var/samba/share_info.tdb
    /var/samba/share_info.tdb.bak: No such file or directory

    You need to note all the tdb file names where tdbbackup tried to restore. These files are corrupted and should be thrown away. Don’t throw them away immediately. Move them to another location, e.g. a folder on your desktop, and then restart your server. Verify your SMB service continues to function as you expect. This is just in case your SMB service needed something within the tdb files you have removed. I’m not knowledgeable enough to know if these files are always respawnable or if they are actually created/built-up and needed.

    If your SMB service is working fine, including Domain logons and Domain joining if you’re working with the PDC role, re-run tdbbackup. This time you should get the normal output without the “restoring” entries. From this point you should have a healthy SMB service… if you were suffering from corrupt tdb files. We found that the tdb file corruption was leading to the SMB service going “mad” and generating exponentially-sized log.smbd files. This led to the startup disk being completely filled, halting the server. It could also have been responsible for the server crashes we experienced without the startup disk filling up.
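
    Picking the “restoring” entries out of a long tdbbackup run by eye is tedious, so here is a rough sketch of how they could be collected. list_restored_tdbs is a made-up name, and it assumes the output looks like the “restoring /path/file.tdb” lines shown above.

```shell
#!/bin/sh
# Print the tdb files that tdbbackup tried to restore, one per line.
# Reads tdbbackup output on stdin and keeps only the "restoring <file>" lines.
# list_restored_tdbs is a hypothetical helper name.
list_restored_tdbs() {
    sed -n 's/^[[:space:]]*restoring //p' | sort -u
}

# Demo on the normal and problem output quoted in this post.
printf '%s\n' \
    '/var/samba/brlock.tdb : 0 records' \
    'restoring /var/samba/share_info.tdb' \
    '/var/samba/share_info.tdb.bak: No such file or directory' |
    list_restored_tdbs
```

    In practice you would pipe the tdbbackup command into it, merging stderr into stdout with 2>&1, since I have not checked which stream the “restoring” lines are written to.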

    I also advise adding a “max log size” entry to your smb.conf. The value is in kilobytes. We’ve gone for 50MB at the moment, i.e. max log size = 51200. Probably should be even smaller. Any changes you make to smb.conf won’t take effect until you restart.

    The LDAP slapd.log files were showing the following kind of error:

    Feb 28 07:49:45 g4server slapd[46]: <= bdb_equality_candidates: (sambaSID) index_param failed (18)\n
    Feb 28 07:49:45 g4server slapd[46]: <= bdb_equality_candidates: (rid) index_param failed (18)\n
    Feb 28 10:57:39 g4server slapd[46]: <= bdb_substring_candidates: (apple-mcxflags) index_param failed (18)\n

    Masses and masses of them. The slapd.log files were getting bigger and bigger. We even saw one which was 180MB.

    Credit to Josh for pointing me in the right direction. It is my theory that it may indicate that the LDAP server is struggling to look up these values fast enough and needs them to be indexed. This could lead to slapd having fits… or at least that’s what I think is happening when our heavy-usage servers suddenly start to act up, often leading to a complete hang.

    How do you get them indexed? Well, you need to modify your slapd_macosxserver.conf file. This file contains the parameters which should be indexed for faster retrieval. This is the section to look at:

    # Indices to maintain
    index cn,sn,uid pres,eq,approx,sub
    index uidNumber,gidNumber eq
    index memberUid eq
    index apple-generateduid eq
    index ou eq
    index apple-group-realname eq
    index macAddress eq
    index apple-category eq
    index apple-networkview eq
    index apple-group-memberguid eq
    index apple-group-nestedgroup eq
    index objectClass eq

    The parameters that need indexing are contained within the brackets in the log entries. Also note the type of indexing candidate, e.g. “apple-mcxflags” is a “substring” and “sambaSID” is an “equality”.

    This is what we had to add to the slapd_macosxserver.conf file:

    index sambaSID eq
    index rid eq
    index apple-mcxflags sub

    You can edit the file with nano:

    sudo nano /etc/openldap/slapd_macosxserver.conf

    Do not copy exactly what I’ve done. We found that different servers had different parameters that needed indexing. You need to evaluate your requirements from your slapd.log.

    Next you need to get the LDAP index updated. To do this, LDAP must be taken offline. All your users must be prepared for the server to go offline.

    Here are the Tiger commands:

    sudo launchctl unload /System/Library/LaunchDaemons/org.openldap.slapd.xml
    sudo slapindex
    sudo launchctl load /System/Library/LaunchDaemons/org.openldap.slapd.xml

    Here are the Panther commands:

    sudo SystemStarter stop LDAP
    sudo slapindex
    sudo SystemStarter start LDAP

    You should now find that the slapd.log entries relating to these parameters stop. If my theory is correct, the newly updated index just took a lot of pressure off slapd, and that should bring about greater stability.

    I welcome feedback here. I’m sure there are command optimisations that could be added. Also, if you think this is wrong and dangerous, let us know too. For the brave, please let us know if you think this helped.
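
    Since each server needs its own list, working out the missing indices from slapd.log could be scripted roughly like this. Treat it as a sketch: suggest_index_lines is a hypothetical name, and it only understands the two entry types seen in this post (bdb_equality_candidates as eq, bdb_substring_candidates as sub).

```shell
#!/bin/sh
# Turn "index_param failed" log entries into candidate "index" lines.
# Reads slapd log lines on stdin, one entry per line.
# Only the equality and substring entry types from this post are handled;
# suggest_index_lines is a hypothetical helper name.
suggest_index_lines() {
    sed -n \
        -e 's/.*bdb_equality_candidates: (\([^)]*\)) index_param failed.*/index \1 eq/p' \
        -e 's/.*bdb_substring_candidates: (\([^)]*\)) index_param failed.*/index \1 sub/p' |
        sort -u
}

# Demo on the three entries quoted in this post.
printf '%s\n' \
    'Feb 28 07:49:45 g4server slapd[46]: <= bdb_equality_candidates: (sambaSID) index_param failed (18)\n' \
    'Feb 28 07:49:45 g4server slapd[46]: <= bdb_equality_candidates: (rid) index_param failed (18)\n' \
    'Feb 28 10:57:39 g4server slapd[46]: <= bdb_substring_candidates: (apple-mcxflags) index_param failed (18)\n' |
    suggest_index_lines
```

    Review the output before pasting anything into slapd_macosxserver.conf. If an attribute shows up with both eq and sub, merge the two into a single index line by hand, and skip anything the file already indexes.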

    mcnaugha
    Participant

    We are seeing this across multiple servers. All sorts of crazy stuff seems to be happening and I can’t tell if it’s an OS bug, hardware overload (although not according to Activity Monitor), or perhaps Windows viruses repeatedly hitting their PDC, i.e. the Mac Server.

    in reply to: LDAP Errors #368439
    mcnaugha
    Participant

    I’m seeing this all over the place in a large education district where every server has over 1000 users. Should we be concerned, Joel? Is there anything we can do to rectify it?

    These schools have around 200 Macs and sometimes as many as 200 PCs hanging off the Mac server.

    Multiple servers are falling over several times a week or day. One school seems to have stabilised when we switched from Dual 1.25GHz G4s to Mac Pro hardware. Are these old G4s being tasked too much?

    Thanks!
    A.

    in reply to: Kerberos – Can’t change passwords #366220
    mcnaugha
    Participant

    I had to do a clean build in order to get the Windows XP clients to be able to change their passwords. Very annoying and time consuming.

    This probably isn’t linked to Kerberos, given that Kerberos still will not let me reset my password when it is set as an option from Workgroup Manager.

    All Windows XP clients need to have their local profiles flushed and profiles on the server need to be renamed and data manually migrated. What a nightmare!
