Incorrect PasswordServer Entries
February 15, 2009 at 7:35 pm #375415
trice
Participant
OK, some background, the short short version. Back around October I noticed the password server not playing nice with passwords and WGM. Users had mixed results changing passwords from the login window, and WGM kicked up all sorts of errors when I made changes in it, such as changing passwords or even simply adding a user. Usually these were 14090 or 14134, not to mention the occasional 14098. However, I noticed that these errors seemed to go away when I changed one of our replicas to Connected to a Directory System, though we still had issues where the slapd process would spike now and again and basically kill directory services.
After doing some digging I noticed that the edu.mit.kerberos file on the master, and thus everywhere else, contained an IP listing for this replica instead of its hostname. After verifying DNS was correct and flushing the DNS cache on the master and the replicas, I removed those errant entries by hand. However, they always wound up returning, so I figured there had to be some sort of defaults somewhere. Finally I discovered that those entries were coming from the XMLPlist and dsAttrTypeNative:apple-xmlplist attributes in the /LDAPv3/127.0.0.1/Config/KerberosClient key. So I got them out of there, re-promoted that server from Connected to a Directory System to a replica, and the Kerberos file was once again correct. I also followed the instructions at http://support.apple.com/kb/TA23516?viewlocale=en_US (modifying them for Leopard) to clear all previous Kerberos info off of this server before I re-promoted it. However, that still didn’t fix the aforementioned errors.
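
A minimal sketch of how the "IP instead of hostname" check described above could be automated. It assumes the edu.mit.kerberos file uses the usual krb5.conf-style `kdc = host:port` and `admin_server = host` lines; the function name, the sample realm, and the sample addresses are hypothetical and not taken from the poster's servers.

```python
import ipaddress

def kdc_entries_using_ips(krb5_text):
    """Return (line_number, host) pairs for kdc/admin_server lines
    whose host is a literal IP address rather than a hostname."""
    flagged = []
    for lineno, raw in enumerate(krb5_text.splitlines(), start=1):
        line = raw.strip()
        if "=" not in line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        if key.strip() not in ("kdc", "admin_server"):
            continue
        host = value.strip()
        # Drop an optional ":port" suffix (only when there is exactly one
        # colon, so bare IPv6 literals are left alone).
        if host.count(":") == 1:
            host = host.rsplit(":", 1)[0]
        try:
            ipaddress.ip_address(host)
        except ValueError:
            continue  # not an IP literal, so presumably a hostname
        flagged.append((lineno, host))
    return flagged

# Hypothetical file contents, for illustration only.
SAMPLE = """\
[realms]
    OD.EXAMPLE.COM = {
        kdc = odm.example.com:88
        kdc = 192.0.2.15:88
        admin_server = odm.example.com
    }
"""

for lineno, host in kdc_entries_using_ips(SAMPLE):
    print(f"line {lineno}: KDC listed by IP ({host})")
```

Running something like this against a copy of the file from the master and from each replica would show whether the errant entries have come back after a re-promotion.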
At this point I noticed that the /LDAPv3/127.0.0.1/Config record contained extra passwordserver entries. There was the regular passwordserver key and a bunch of passwordserver_XXXXXXXXXX entries, where the XXX is a series of numbers and letters that corresponds to the ID string in the XML config files associated with these entries. So I assumed that these entries came from the replica being promoted incorrectly. I demoted the replica, deleted these entries, and re-promoted the replica. No dice: another errant entry was still created.
Now I thought, OK, maybe my authserverreplicas file was messed up. So this morning I demoted all of our replicas to standalone and then deleted all of the authserverreplica files in /var/db/authserver on the master. I then killed PasswordService on the master with kill -9, which recreated the authserverreplicas file. At this time I noticed that the DNS entry in the authserverreplicas file contained the ODM PTR record instead of its DNS name, so I changed that to reflect the correct information. I then started promoting replicas again. As soon as I started doing that, those extra passwordserver keys came back.
Now I’m at the point where I’m assuming that all the problems I’m experiencing with WGM and people changing passwords at the login window have to do with OD seeing these extraneous entries and getting confused as to where it should be looking. I think the slapd issue might be related, but it might be more closely related to having limited admins, since other people have reported this symptom and reported it disappearing after removing limited admin functionality.
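
To keep track of which extra entries exist before and after each promotion, a sketch like the following could filter a list of record names for the suffixed form. It assumes the names have already been collected from the /LDAPv3/127.0.0.1/Config container by some other means (for example a directory listing tool); the function name and the sample names are hypothetical.

```python
import re

# "passwordserver_" followed by a run of letters and digits, as described
# in the post. The exact character set of the ID is an assumption.
SUFFIXED = re.compile(r"^passwordserver_([0-9A-Za-z]+)$")

def extra_passwordserver_records(record_names):
    """Map each suffixed passwordserver record name to its ID suffix.
    The plain 'passwordserver' record is not included."""
    extras = {}
    for name in record_names:
        cleaned = name.strip()
        match = SUFFIXED.match(cleaned)
        if match:
            extras[cleaned] = match.group(1)
    return extras

# Hypothetical listing, for illustration only.
names = ["KerberosClient", "KerberosKDC", "passwordserver",
         "passwordserver_1111222233334444"]
for record, replica_id in extra_passwordserver_records(names).items():
    print(f"{record} -> ID {replica_id}")
```

Saving the output after each demote/promote cycle makes it easier to see exactly when an entry reappears and which ID it carries.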
So now I tried to get a bit more ambitious, so I did the following:
1) Took all existing replicas and made them standalones
2) On those replicas I removed the following folders/files:
/var/db/authserver
/var/db/krb5kdc
/etc/krb5.keytab
3) Stopped the password server on the master
4) Deleted all files beginning with authserverreplica in the /var/db/authserver folder on the master
5) Rebuilt the authserverreplicas file using the instructions at http://support.apple.com/kb/TA24459
This created a new file with a completely new ID string.
6) Started the password service and rebooted the master
Once I rebooted the master and looked in the /var/db/authserver folder, I saw a file with the following name:
authserverreplicas.remote.1111222233334444
That number corresponds to the ID string in the previous authserverreplicas file.
7) Looked at the /LDAPv3/127.0.0.1/Config/passwordserver key. It still contained info from the old authserverreplicas file, including the ID string listed above. If I tried to promote one of the replicas, it would create the extraneous passwordserver key with the old ID, such as passwordserver_1111222233334444
8) At this point I thought, OK, maybe this key for whatever reason is not picking up the correct info from the authserverreplicas file. So I deleted the values in the /LDAPv3/127.0.0.1/Config/passwordserver/PasswordServerList attribute and replaced them with the current contents of the authserverreplicas file. Rebooted the server, and all services were still running normally.
9) Began to repromote the standalone servers.
10) As soon as I did that the extraneous passwordserver keys began to return, but this time with the new id number appended to them.
11) Now (although the same thing happened last week before I made any of these changes) the replicas are not starting their own Kerberos servers. The replication process goes OK except for Kerberos starting up. DNS hasn’t changed and is still resolving. The logs show the Kerberos process completing normally. However, the Configuration log shows at one point “kdc command failed with status 3”. The system log also shows errors about not being able to start up the LKDC, though I would think this wouldn’t matter for the OD realm. So for now I have everything configured as Connected to a Directory System, which seems to be working and not kicking up any errors in WGM. This leads me to believe even more that the initial errors I saw in WGM have something to do with the PasswordServer and/or the replication process.
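
The ID bookkeeping in steps 5 through 10 can be sketched as follows. This is only an illustration under stated assumptions: it assumes the authserverreplicas file is an XML property list with a top-level `ID` string, and that file and record names follow the `authserverreplicas.remote.<ID>` and `passwordserver_<ID>` patterns quoted above. The function names and the `AAAABBBBCCCCDDDD` value are hypothetical; only `1111222233334444` comes from the post.

```python
import plistlib

def current_replica_id(plist_bytes):
    """Read the ID string from authserverreplicas contents.
    Assumes an XML plist with a top-level 'ID' key."""
    return plistlib.loads(plist_bytes)["ID"]

def stale_ids(current_id, authserver_files, config_records):
    """Return, sorted, every ID seen in file or record names
    that differs from the current ID."""
    found = set()
    for name in authserver_files:
        prefix = "authserverreplicas.remote."
        if name.startswith(prefix):
            found.add(name[len(prefix):])
    for name in config_records:
        prefix = "passwordserver_"
        if name.startswith(prefix):
            found.add(name[len(prefix):])
    return sorted(found - {current_id})

# Hypothetical file contents and listings, for illustration only.
SAMPLE_PLIST = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
    <key>ID</key>
    <string>AAAABBBBCCCCDDDD</string>
</dict>
</plist>
"""

current = current_replica_id(SAMPLE_PLIST)
old = stale_ids(
    current,
    ["authserverreplicas", "authserverreplicas.remote.1111222233334444"],
    ["passwordserver", "passwordserver_1111222233334444",
     "passwordserver_AAAABBBBCCCCDDDD"],
)
print("current ID:", current)
print("IDs left over from earlier files:", old)
```

This only separates leftovers from an earlier authserverreplicas file from names carrying the current ID. Per step 10, suffixed records with the current ID are also unexpected, so both groups are worth recording.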
The only other thing, which I don’t know is correct or not, is that the /LDAPv3/127.0.0.1/Config/KerberosKDC key has configuration info for both the OD realm and the LKDC.
So just to recap: there’s a new authserverreplicas file with what appears to be the correct information. There is also that authserverreplicas.remote file with a different ID key attached to it. Extraneous passwordserver keys keep getting created. Kerberos on the replicas isn’t working. All servers are 10.5.6. The OD was upgraded from 10.4.11 and at one point had its IP and hostname changed, but I used changeip and all seemed to go well. Could there be cached info someplace else? Apple Enterprise support had no answers. I did at one point demote the master to standalone, reinstall Leopard Server from scratch, and restore an OD archive, but that didn’t fix anything.
Sorry for the length of this post but I am really out of options right now and the brain is a bit fried.
February 18, 2009 at 8:18 pm #375463
tlarkin
Participant
I have had similar problems in the past. At one point, on 10.5.3 and 10.5.4, WGM would spit out all sorts of errors and give users negative UIDs. After trying a bunch of different things and talking with our enterprise support from Apple, we deemed it LDAP corruption. I exported the LDAP users and groups, demoted all servers, wiped and reloaded, then re-promoted and imported the users and groups back in. I had to reset master passwords for users, and I used Passenger to apply passwords to other users.
I haven’t had those problems since: no more crazy WGM errors, no more negative UIDs (well, I still get negative UIDs once in a while), and things have been a lot better. We also migrated all clients and servers to 10.5.5. I submitted another enterprise ticket on WGM, and Apple replied that there are some more kinks to work out and that it will be fixed in a future major software update. Meaning 10.5.7 or, who knows, 10.6? Couldn’t tell ya.
I also had some DNS issues, but those are now taken care of. I can’t say that our problems are exactly the same, but I was getting tons and tons of errors in WGM, and the PasswordServer service was going off the wall. slapd was also bogging down my ODM server, running at something like 380% usage across the server’s four cores.