Suddenly LDAP gone

  • #368864
    kd4ttc
    Participant

    I was mucking about on the server installing a new version of MySQL. At one point I restarted the server. This morning all the client accounts are locked out. The server shows the LDAP service as unavailable, with the LDAP server stopped. Lookupd, Password Server, and Kerberos are running, and NetInfo Server is Local only.

    Around that time I had been in the server configuration and deselected the option to disable login after a number of failed login attempts, which I thought was an innocuous change.

    The slapconfig log showed

    2007-04-27 08:59:24 -0500 - slapconfig -setmacosxodpolicy
    2007-04-27 08:59:24 -0500 - command: /usr/bin/ldapadd -c -x -H ldapi://%2Fvar%2Frun%2Fldapi
    2007-04-27 08:59:25 -0500 - slapconfig -setldapconfig
    2007-04-27 08:59:25 -0500 - Stopping LDAP server (slapd)
    2007-04-27 08:59:27 -0500 - Starting LDAP server (slapd)
    2007-04-27 09:04:33 -0500 - slapconfig -setldapconfig
    2007-04-27 09:04:33 -0500 - command: /usr/sbin/mkpassdb -setreplicationinterval 300 SyncDefault
    2007-04-27 09:04:33 -0500 - slapconfig -setldapconfig
    2007-04-27 09:04:33 -0500 - Stopping LDAP server (slapd)
    2007-04-27 09:04:35 -0500 - Moving database from /var/db/openldap/openldap-data to /var/db/openldap/openldap-data
    2007-04-27 09:04:35 -0500 - Removed file at path /var/db/openldap/openldap-data/__db.001.
    2007-04-27 09:04:35 -0500 - Error moving database from /var/db/openldap/openldap-data to /var/db/openldap/openldap-data
    2007-04-27 09:11:05 -0500 - slapconfig -setmacosxodpolicy
    2007-04-27 09:12:15 -0500 - slapconfig -setmacosxodpolicy
    2007-04-27 09:13:50 -0500 - slapconfig -setmacosxodpolicy
    2007-04-27 09:13:50 -0500 - slapconfig -setldapconfig
    2007-04-27 09:26:15 -0500 - slapconfig -backupdb

    Here, 08:59:24 is when I think I changed a policy. At 09:26:15 I figured I had better back up, lest I mess things up even more.

    The LDAP log shows

    Apr 27 08:59:25 ngi slapd[74]: slapd shutdown: waiting for 1 threads to terminate\n
    Apr 27 08:59:25 ngi slapd[74]: bdb(dc=ngi,dc=server): Locker still has locks\n
    Apr 27 08:59:25 ngi slapd[74]: bdb_locker_id_free: 9 err Invalid argument(22)\n
    Apr 27 08:59:26 ngi slapd[74]: slapd stopped.\n
    Apr 27 08:59:27 ngi slapd[1730]: @(#) $OpenLDAP: slapd 2.2.19 $\n
    Apr 27 08:59:27 ngi slapd[1730]: bdb_back_initialize: Sleepycat Software: Berkeley DB 4.2.52: (December 3, 2003)\n
    Apr 27 08:59:28 ngi slapd[1730]: bdb_db_init: Initializing BDB database\n
    Apr 27 08:59:28 ngi slapd[1730]: slapd starting\n
    Apr 27 09:04:33 ngi slapd[1730]: slapd shutdown: waiting for 0 threads to terminate\n
    Apr 27 09:04:33 ngi slapd[1730]: bdb(dc=ngi,dc=server): Locker still has locks\n
    Apr 27 09:04:33 ngi slapd[1730]: bdb_locker_id_free: 9 err Invalid argument(22)\n
    Apr 27 09:04:34 ngi slapd[1730]: slapd stopped.\n

    which I think are the relevant entries.

    Any thoughts on where I need to start on this? The “Locker still has locks” message seems to be a clue, but I cannot find what it implies.
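
    One way to pull the relevant lines out of a long slapd log is to filter for the Berkeley DB messages. This is a minimal sketch, assuming a POSIX shell. The function name `bdb_errors` is made up for illustration, and the log file is passed in by the caller because its location varies by system.

```shell
#!/bin/sh
# bdb_errors LOGFILE
# Print slapd log lines in which the bdb backend reports a problem, such as
# "Locker still has locks" or "... err Invalid argument(22)".
# Hypothetical helper, not part of OpenLDAP or Mac OS X Server.
bdb_errors() {
    grep -E 'slapd\[[0-9]+\]: bdb.*(Locker still has locks| err )' "$1"
}
```

    Run against the excerpt above, this would keep the “Locker still has locks” and `bdb_locker_id_free` lines and drop the normal startup and shutdown messages.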

    Steve

    (BTW, permissions on the openldap data dir are:

    ngi:/var/db/openldap myname$ ls -l
    total 0
    drw------- 23 root wheel 782 Apr 27 18:10 openldap-data
    drwx------ 2 root wheel 68 Mar 25 2005 openldap-slurp
    drwxr-xr-x 2 root wheel 68 Mar 25 2005 run

    )

    #368866
    kd4ttc
    Participant

    All this is most bizarre. I fixed the problem thanks to AFP548 and Google. Solution follows.

    Continuing from the above, the day the server died my secretary had logged in that morning. I had later changed a policy on password lockout in Server Admin and after that no one could log in. I recall now that the microwave in my office showed evidence of a power loss that morning. Another thing that was going on was I was doing a program install and I restarted the server to check all would work from a reboot. A user had been left logged in at the reboot. I logged off that user after initiating the reboot, and the reboot may have hung until that user was logged out, but I’m a bit fuzzy on the details there.

    Anyway, the error that got me to a solution was reported in Workgroup Manager:

    [i]The node /LDAPv3/127.0.0.1 couldn’t be opened because an unexpected error of type -14002 occurred. [/i]

    Google led me to a MacOSXHints post at
    http://forums.macosxhints.com/archive/index.php/t-37805.html
    which pointed to an AFP548 post at https://www.afp548.com/forum/viewtopic.php?forum=39&showtopic=4946&mode=&onlytopic=&show=10&page=2
    Together they suggested that a power outage can cause this problem by corrupting a file in /var/db/openldap/.

    Essentially, I used slapcat to dump the contents of the damaged openldap database to a text (LDIF) file, created a new, empty openldap database, then repopulated it from that dump. All is now fine.

    The details:

    slapd must not be running for this to work. Had it been running but non-functional, I guess I would have needed to kill it first.
    [code]
    mkdir ~/ldap-rescue # create a convenient working directory
    cd ~/ldap-rescue # work there so the dump lands in it
    sudo slapcat -l ldif # dump the slapd database to a text (LDIF) file
    cd /var/db/openldap # move to the openldap directory
    sudo su
    mv openldap-data openldap-data-old # archive the old data
    mkdir openldap-data # new, empty database directory
    chmod go-rx openldap-data # tighten permissions; this turned out to be a mistake, see my follow-up post below
    /usr/libexec/slapd # test to see if slapd will run. This didn’t work before, with slapd exiting.
    cat /var/run/slapd.pid # this printed 18691 on my system, so slapd now runs
    kill -INT `cat /var/run/slapd.pid` # stop slapd before using slapadd
    exit # get out of root. I’m dangerous.
    cd ~/ldap-rescue # back to the rescue directory
    sudo slapadd -l ldif # reload the data. I’m lucky I got away with this.
    sudo slapcat -l ldifnew # dump again; diff reports no differences between ldif and ldifnew
    sudo /usr/libexec/slapd # start up slapd. Now all is well.
    [/code]
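
    The comparison mentioned in the last comments (dumping again with slapcat and checking the result against the first dump) can be wrapped in a small check. This is a sketch, assuming both dumps are plain LDIF files. `same_dump` is a hypothetical name, not an OpenLDAP tool.

```shell
#!/bin/sh
# same_dump OLD NEW
# Succeed (exit status 0) only if the two dump files are byte-for-byte
# identical. cmp -s compares silently and reports through its exit status.
# Hypothetical helper for illustration.
same_dump() {
    cmp -s "$1" "$2"
}

# Example use after reloading with slapadd (not run here):
#   sudo slapcat -l ldifnew
#   same_dump ldif ldifnew && echo "reload matches the original dump"
```

    A byte-for-byte match is a stricter test than is strictly needed, but it is a cheap way to confirm nothing was lost in the round trip.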
    After that, all was well. Comparing the two directories, openldap-data and openldap-data-old in /var/db/openldap, showed a couple of interesting differences: __db.002 was about 20 MB in the old directory, and dn2id.bdb shrank a bit (from 20480 to 16384 bytes). The listings:

    [code]
    ngi:/var/db/openldap root# ls -l openldap-data
    total 3312
    -rw------- 1 root wheel 8192 Apr 29 20:23 __db.001
    -rw------- 1 root wheel 270336 Apr 29 20:23 __db.002
    -rw------- 1 root wheel 98304 Apr 29 20:23 __db.003
    -rw------- 1 root wheel 368640 Apr 29 20:23 __db.004
    -rw------- 1 root wheel 24576 Apr 29 20:23 __db.005
    -rw------- 1 root wheel 8192 Apr 29 20:22 apple-generateduid.bdb
    -rw------- 1 root wheel 8192 Apr 29 20:22 apple-group-memberguid.bdb
    -rw------- 1 root wheel 8192 Apr 29 20:25 apple-group-nestedgroup.bdb
    -rw------- 1 root wheel 8192 Apr 29 20:22 apple-group-realname.bdb
    -rw------- 1 root wheel 24576 Apr 29 20:26 cn.bdb
    -rw------- 1 root wheel 16384 Apr 29 20:22 dn2id.bdb
    -rw------- 1 root wheel 8192 Apr 29 20:22 gidNumber.bdb
    -rw------- 1 root wheel 229376 Apr 29 20:26 id2entry.bdb
    -rw------- 1 root wheel 561997 Apr 29 20:25 log.0000000001
    -rw------- 1 root wheel 8192 Apr 29 20:22 memberUid.bdb
    -rw------- 1 root wheel 8192 Apr 29 20:26 objectClass.bdb
    -rw------- 1 root wheel 8192 Apr 29 20:26 ou.bdb
    -rw------- 1 root wheel 8192 Apr 29 20:22 sn.bdb
    -rw------- 1 root wheel 8192 Apr 29 20:22 uid.bdb
    -rw------- 1 root wheel 8192 Apr 29 20:22 uidNumber.bdb
    ngi:/var/db/openldap root# ls -l openldap-data-old
    total 46536
    -rw------- 1 root wheel 55 Aug 6 2006 DB_CONFIG
    -rw------- 1 root wheel 8192 Apr 27 18:10 __db.001
    -rw------- 1 root wheel 20979712 Apr 27 18:10 __db.002
    -rw------- 1 root wheel 98304 Apr 27 18:10 __db.003
    -rw------- 1 root wheel 368640 Apr 27 18:10 __db.004
    -rw------- 1 root wheel 24576 Apr 27 18:10 __db.005
    -rw------- 1 root wheel 8192 Apr 27 00:17 apple-generateduid.bdb
    -rw------- 1 root wheel 8192 Apr 27 09:04 apple-group-memberguid.bdb
    -rw------- 1 root wheel 8192 Apr 27 09:04 apple-group-nestedgroup.bdb
    -rw------- 1 root wheel 8192 Apr 23 00:24 apple-group-realname.bdb
    -rw------- 1 root wheel 24576 Apr 27 18:13 cn.bdb
    -rw------- 1 root wheel 20480 Apr 27 18:13 dn2id.bdb
    -rw------- 1 root wheel 8192 Apr 27 08:59 gidNumber.bdb
    -rw------- 1 root wheel 229376 Apr 27 18:13 id2entry.bdb
    -rw------- 1 root wheel 1976350 Apr 29 19:52 log.0000000001
    -rw------- 1 root wheel 8192 Apr 27 09:04 memberUid.bdb
    -rw------- 1 root wheel 8192 Apr 27 18:13 objectClass.bdb
    -rw------- 1 root wheel 8192 Apr 27 18:13 ou.bdb
    -rw------- 1 root wheel 8192 Apr 27 00:17 sn.bdb
    -rw------- 1 root wheel 8192 Apr 27 09:04 uid.bdb
    -rw------- 1 root wheel 8192 Apr 27 09:04 uidNumber.bdb
    [/code]

    Thanks very much to the contributors to the two threads referenced above, one on MacOSXHints and the other on AFP548.

    #368888
    kd4ttc
    Participant

    [QUOTE][u]Quote by: MacTroll[/u][p]Glad you were able to fix it. Can’t say this happens often, but if the power does go out the LDAP db could be put away dirty.

    Are you now doing nightly OD backups? 😀 [/p][/QUOTE]

    Well, that is a fine question. I have just gotten started in the server world, migrating from a peer-to-peer topology to a server setup. I was pleased to have everything working. I do keep my clinical data backed up, and the MySQL files are backed up as well. (I own a medical practice.) The crash here shows me I need to get my backup plans in order, pronto, before too much more time goes by.

    I have backed up my configuration data by saving the .plist files you get when you drag the icon out of Server Admin. Reviewing those files shows they do not constitute a full backup of the server software. I saw that there is a backup script for Open Directory available here at AFP548, which archives the slapd directory, Kerberos, and another service. These findings raise the question: just what does one do to back up the server?

    Steve

    #369111
    timh
    Participant

    I always set the following locations to be backed up on Mac servers and sometimes workstations:
    * /Users – optional, but most useful on workstations.
    * /System
    * /Library – the Apache document root is contained within this location
    * /private – this is the important one on Mac servers as it encompasses most of the system configuration (/etc) and working data (/var)

    It’s similar to my Linux server backup policy, which is always at least:
    * /etc
    * /var
    * /home
    * /opt – if I have any custom software installed.
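
    A list like this maps directly onto a tar invocation. This is a minimal sketch, assuming a tar that supports `-z`. `archive_paths` is a hypothetical name and the destination in the example is made up.

```shell
#!/bin/sh
# archive_paths ARCHIVE PATH...
# Create a gzip-compressed tar archive containing each PATH.
# Hypothetical helper; run it as root if the paths are not world-readable.
archive_paths() {
    archive=$1
    shift
    tar -czf "$archive" "$@"
}

# Example (not run here; /Volumes/Backup is a made-up destination):
#   archive_paths /Volumes/Backup/server.tgz /private /Library /System /Users
```

    Plain tar may not capture every kind of Mac-specific metadata, so it is worth test-restoring an archive before relying on it.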

    #369137
    Simple1
    Participant

    Well, for me, I have two external RAID disks that back up my OD Master on alternating days using the application SuperDuper! (the only caveat is that it can’t back up ACLs, a small price to pay for having a startup disk of my current server on an external disk). I also make sure I create an archive of my OD Master every month and copy it to my portable external disk, just in case of a fire or anything crazy like that (nothing a 128 MB flash drive can’t hold).
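
    As a scripted complement to a periodic archive, the LDAP data alone can be dumped to a dated LDIF file with the same slapcat command used in the recovery earlier in this thread. This is a sketch only: the names `dump_name` and `od_dump` and the destination path are assumptions, it does not cover the Password Server or Kerberos data, and whether slapcat is safe to run while slapd is up depends on the OpenLDAP version and backend, so check the slapcat man page first.

```shell
#!/bin/sh
# dump_name DIR
# Print a dated file name such as DIR/od-2007-04-29.ldif (hypothetical scheme).
dump_name() {
    printf '%s/od-%s.ldif\n' "$1" "$(date +%Y-%m-%d)"
}

# od_dump DIR
# Write the LDAP database to a dated LDIF file in DIR. Needs root, and
# assumes slapcat is on the PATH, as in the recovery steps earlier.
od_dump() {
    mkdir -p "$1" || return 1
    slapcat -l "$(dump_name "$1")"
}

# Example (not run here; /var/backups/od is a made-up path):
#   od_dump /var/backups/od
```

    Because the file name carries the date, a second run on the same day overwrites that day's dump rather than piling up copies.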

    #370203
    kd4ttc
    Participant

    In the script above I changed the permissions with the line
    [code]
    chmod go-rx openldap-data
    [/code]
    Well, this wasn’t a good idea. On server restarts the system was unable to read the directory and slapd didn’t start. I had assumed only root would be working in there, so restrictive permissions were not going to be a problem. I dropped the chmod command and now the server starts. Now I can reboot and everything works.

    Could someone check their setup and report the user and group ids for the directory and the permissions? What process is reading the openldap-data directory that I excluded with those permissions?
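
    For anyone answering, something like the following prints the mode string and the numeric owner and group of a directory. This is a sketch, and `dir_perms` is a hypothetical name.

```shell
#!/bin/sh
# dir_perms DIR
# Print "mode uid gid" for DIR itself (not its contents), with numeric IDs.
# Hypothetical helper built on ls -ldn, which both BSD and GNU ls provide.
dir_perms() {
    ls -ldn "$1" | awk '{ print $1, $3, $4 }'
}

# Example (not run here):
#   dir_perms /var/db/openldap/openldap-data
```

    Some systems append a marker such as `+` or `@` to the mode string when ACLs or extended attributes are present, so the first field may be one character longer than expected.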

    Steve
