AFP548 - Topic: HACKED??? Massive Failure on 10.4.8 intel Auth Related not sure

This topic has 12 replies, 7 voices, and was last updated 16 years, 10 months ago by aoihmc.

November 13, 2006 at 11:49 pm #367577

dragonmac
Participant

First the setup info.
Hardware Overview:
Machine Name: Mac Pro
Machine Model: MacPro1,1
Processor Name: Dual-Core Intel Xeon
Processor Speed: 2.66 GHz
Number Of Processors: 2
Total Number Of Cores: 4
L2 Cache (per processor): 4 MB
Memory: 3 GB
Bus Speed: 1.33 GHz
Boot ROM Version: MP11.005C.B01
SMC Version: 1.7f8

System Software Overview:
System Version: Mac OS X Server 10.4.8 (8L2127)
Kernel Version: Darwin 8.8.1
Boot Volume: Macintosh HD
Computer Name: OSXServer
User Name: System Administrator (root)

OK, last night around 4am (I’ll post parts of the log) a massive system problem occurred. I’m not sure where to even begin fixing this, but the primary problem is that mail is not working. I think there is a lot more going on, as you will see.
Got a phone call at about 9:30 saying mail was down; all other services seemed unaffected. I VNC’d in and was logged in as root. The root directory seemed fine at this time. I launched Server Admin (SA) OK the first time and services “looked” OK, so I just restarted the Mail service. That didn’t work, and something else seemed wrong. I looked in Activity Monitor and noticed smbd and syslogd were hogging the CPU, so in SA I stopped Windows (Samba). Back in Activity Monitor, smbd stopped but syslogd was still going nuts. OK, let’s reboot. After the reboot VNC was down, and they said no one could even log in now, let alone get mail. So to SSH we go, and it appeared all services except OD had not started?? Hmm. So I ran “serveradmin start afp”, “serveradmin start mail”, “serveradmin start web” and “serveradmin start windows”, then kickstarted VNC/ARD. I got back on VNC to look around and logged in as root. OMG, the whole root user appeared to be messed up. No windows opened, the keychain was bad, nothing was right with the root user directory. The Mail service was started but IMAP and POP3 were not?? I reset the settings in SA and restarted Mail, no dice, and SA kept asking me for the password since the keychain was bad. I can’t even get to “Print” in SA; it’s grayed out. Just everything seemed bad, and there was only 2GB available on the server hard drive. I found later that the “/private/var/log/samba/log.smbd.0.gz” file was 85GB in size. I just trashed it.
There are more problems, but some massive failure has happened, and I will post logs to show the weird stuff I saw in them.
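Since the disk filled up because one Samba log had grown to 85GB, a periodic check for oversized files under the log directory would have shown the problem before the server ran out of space. This is a minimal sketch, not something from the original post: `find_big_logs` is a hypothetical name, and it assumes a `find` that accepts the `k` size suffix (GNU find and current BSD find do; I have not checked Tiger’s).

```shell
#!/bin/sh
# find_big_logs DIR MIN_KB  (hypothetical helper name)
# Lists regular files under DIR whose size exceeds MIN_KB kilobytes,
# largest first, as "disk_usage_kb<TAB>path" lines (the format du -k prints).
find_big_logs() {
    dir=$1
    min_kb=$2
    # -size +Nk matches files whose size, rounded up to whole KiB, is above N.
    find "$dir" -type f -size +"${min_kb}"k -exec du -k {} + | sort -rn
}

# Example: everything over 1 GB (1048576 KB) under the system log directory.
# find_big_logs /private/var/log 1048576
```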

WAS I HACKED!!!!!

Here are links to the files you should see. The DAMN spam filter won’t let me post the links the right way. Please look at these logs, they are scary!!!

Use “members” dot “aol” dot “com” as the FQDN:

http://FQDN/dragonmacpc/systemlog0gz.txt

http://FQDN/dragonmacpc/systemlog.txt

There are TWO log files with overlapping time entries. The new log, started around 4:30am, contains HIGHLY sensitive data!!!!
I cut out things I felt were bad to post publicly and tried to shorten the post, but you have to see this!!!!

Shortly after midnight: “osxserver kernel[0]: file: table is full” errors, thousands of them:

Nov 13 00:07:06 osxserver postfix/master[58]: warning: unix_trigger_event: read timeout for service public/flush
Nov 13 00:07:15 osxserver kernel[0]: (126: coreservicesd)tfp: failed on 0:
Nov 13 00:18:25 osxserver ctl_cyrusdb[22546]: checkpointing cyrus databases
Nov 13 00:18:55 osxserver ctl_cyrusdb[22546]: done checkpointing cyrus databases
Nov 13 00:23:46 osxserver postfix/master[58]: warning: unix_trigger_event: read timeout for service public/flush
Nov 13 00:37:15 osxserver kernel[0]: (126: coreservicesd)tfp: failed on 0:
Nov 13 00:40:26 osxserver postfix/master[58]: warning: unix_trigger_event: read timeout for service public/flush
Nov 13 00:48:25 osxserver ctl_cyrusdb[22795]: checkpointing cyrus databases
Nov 13 00:48:25 osxserver ctl_cyrusdb[22795]: done checkpointing cyrus databases
Nov 13 00:57:06 osxserver postfix/master[58]: warning: unix_trigger_event: read timeout for service public/flush
Nov 13 01:00:17 osxserver sshd[22816]: fatal: Timeout before authentication for 62.249.240.14
Nov 13 01:07:15 osxserver kernel[0]: (126: coreservicesd)tfp: failed on 0:
Nov 13 01:13:46 osxserver postfix/master[58]: warning: unix_trigger_event: read timeout for service public/flush
Nov 13 01:18:25 osxserver ctl_cyrusdb[22855]: checkpointing cyrus databases
Nov 13 01:18:55 osxserver ctl_cyrusdb[22855]: done checkpointing cyrus databases
Nov 13 01:30:26 osxserver postfix/master[58]: warning: unix_trigger_event: read timeout for service public/flush
Nov 13 01:30:29 osxserver kernel[0]: file: table is full
Nov 13 01:30:29 osxserver kernel[0]: file: table is full
…
…
Nov 13 01:36:26 osxserver kernel[0]: file: table is full
Nov 13 01:37:15 osxserver kernel[0]: (126: coreservicesd)tfp: failed on 0:
Nov 13 01:37:37 osxserver kernel[0]: file: table is full
…
…
Nov 13 04:01:06 osxserver kernel[0]: file: table is full
Nov 13 04:01:12 osxserver tls_prune[24491]: DBERROR db4: /var/imap/db/log.0000000005: log file open failed: Too many open files in system
Nov 13 04:01:12 osxserver tls_prune[24491]: DBERROR db4: PANIC: Too many open files in system
Nov 13 04:01:12 osxserver tls_prune[24491]: DBERROR: critical database situation
Nov 13 04:07:28 osxserver kernel[0]: (126: coreservicesd)tfp: failed on 0:
Nov 13 04:17:06 osxserver postfix/master[58]: warning: unix_trigger_event: read timeout for service public/flush
Nov 13 04:18:25 osxserver ctl_cyrusdb[24506]: DBERROR db4: PANIC: fatal region error detected; run recovery
Nov 13 04:18:25 osxserver ctl_cyrusdb[24506]: DBERROR: critical database situation
Nov 13 04:25:02 osxserver postfix/trivial-rewrite[22189]: warning: write resolver reply: Broken pipe
Nov 13 04:33:46 osxserver postfix/master[58]: warning: unix_trigger_event: read timeout for service public/flush
Nov 13 09:36:19 osxserver postfix/postqueue[24159]: warning: close: Operation timed out

This is the end of the log file system.log.0.gz. “osxserver kernel[0]: file: table is full” appeared thousands of times.
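“file: table is full” is the kernel reporting that its system-wide table of open files is exhausted, which fits the later “Too many open files in system” errors from tls_prune and the Cyrus database panics. To see when it started and how fast it grew, the messages can be bucketed by hour. A minimal sketch assuming the standard syslog line format shown above; `count_table_full` is a hypothetical helper name.

```shell
#!/bin/sh
# count_table_full [LOGFILE...]  (hypothetical helper name)
# Counts kernel "file: table is full" messages per hour in syslog-format
# lines ("Mon DD HH:MM:SS host ...") read from the files or stdin, and
# prints "Mon DD HH:00 count", one line per hour, sorted.
count_table_full() {
    awk '/kernel\[0\]: file: table is full/ {
            hour = $1 " " $2 " " substr($3, 1, 2) ":00"
            n[hour]++
         }
         END { for (h in n) print h, n[h] }' "$@" | sort
}

# Example on the rotated, compressed log:
#   gunzip -c /var/log/system.log.0.gz | count_table_full
# To compare with the kernel's limit on Mac OS X:
#   sysctl kern.maxfiles
```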

system.log begins here

Nov 13 04:37:07 osxserver cp: error processing extended attributes: Operation not permitted
Nov 13 04:37:08 osxserver cp: error processing extended attributes: Operation not permitted
Nov 13 04:37:08 osxserver cp: error processing extended attributes: Operation not permitted
Nov 13 04:37:28 osxserver kernel[0]: (126: coreservicesd)tfp: failed on 0:
Nov 13 04:39:48 osxserver postfix/smtpd[22514]: warning: 202.10.85.170: hostname 202_10_85_170.g-node.com.au verification failed: Host not found
Nov 13 04:39:49 osxserver postfix/smtpd[22514]: warning: 216.129.243.156: hostname 216-129-243-156.dsl.williston.nemontel.net verification failed: Host not found
Nov 13 04:39:49 osxserver postfix/smtpd[22514]: warning: 216.129.243.156: hostname 216-129-243-156.dsl.williston.nemontel.net verification failed: Host not found
Nov 13 04:39:55 osxserver postfix/smtpd[22514]: warning: 201.160.164.114: hostname 201.160.164.114.cableonline.com.mx verification failed: Host not found
Nov 13 04:40:02 osxserver postfix/smtpd[22514]: warning: 201.160.164.114: hostname 201.160.164.114.cableonline.com.mx verification failed: Host not found
Nov 13 04:40:35 osxserver postfix/smtpd[22514]: warning: 218.85.28.202: hostname pc202.broad.dynamic.fz.fj.cn.cndata.com verification failed: Host not found
Nov 13 04:46:41 osxserver postfix/qmgr[22234]: fatal: watchdog timeout
Nov 13 04:46:42 osxserver postfix/master[58]: warning: process /usr/libexec/postfix/qmgr pid 22234 exit status 1
Nov 13 04:46:42 osxserver postfix/master[58]: warning: /usr/libexec/postfix/qmgr: bad command startup -- throttling
Nov 13 04:48:26 osxserver ctl_cyrusdb[24593]: DBERROR db4: PANIC: fatal region error detected; run recovery
Nov 13 04:48:56 osxserver ctl_cyrusdb[24593]: DBERROR: critical database situation
Nov 13 04:50:26 osxserver postfix/master[58]: warning: unix_trigger_event: read timeout for service public/flush
Nov 13 05:07:06 osxserver postfix/master[58]: warning: unix_trigger_event: read timeout for service public/flush
Nov 13 05:07:28 osxserver kernel[0]: (126: coreservicesd)tfp: failed on 0:
Nov 13 05:18:25 osxserver ctl_cyrusdb[24627]: DBERROR db4: PANIC: fatal region error detected; run recovery
Nov 13 05:18:25 osxserver ctl_cyrusdb[24627]: DBERROR: critical database situation
Nov 13 05:23:46 osxserver postfix/master[58]: warning: unix_trigger_event: read timeout for service public/flush
Nov 13 05:37:28 osxserver kernel[0]: (126: coreservicesd)tfp: failed on 0:
Nov 13 05:40:26 osxserver postfix/master[58]: warning: unix_trigger_event: read timeout for service public/flush
Nov 13 05:48:25 osxserver ctl_cyrusdb[24636]: DBERROR db4: PANIC: fatal region error detected; run recovery
Nov 13 05:48:55 osxserver ctl_cyrusdb[24636]: DBERROR: critical database situation
Nov 13 05:57:06 osxserver postfix/master[58]: warning: unix_trigger_event: read timeout for service public/flush
Nov 13 06:07:28 osxserver kernel[0]: (126: coreservicesd)tfp: failed on 0:
Nov 13 06:13:46 osxserver postfix/master[58]: warning: unix_trigger_event: read timeout for service public/flush
Nov 13 06:18:26 osxserver ctl_cyrusdb[24666]: DBERROR db4: PANIC: fatal region error detected; run recovery
Nov 13 06:18:56 osxserver ctl_cyrusdb[24666]: DBERROR: critical database situation
Nov 13 06:30:26 osxserver postfix/master[58]: warning: unix_trigger_event: read timeout for service public/flush
Nov 13 06:35:44 osxserver sshd[24670]: fatal: Timeout before authentication for 62.149.229.143
Nov 13 06:37:29 osxserver kernel[0]: (126: coreservicesd)tfp: failed on 0:
Nov 13 06:47:06 osxserver postfix/master[58]: warning: unix_trigger_event: read timeout for service public/flush
Nov 13 06:48:25 osxserver ctl_cyrusdb[24677]: DBERROR db4: PANIC: fatal region error detected; run recovery
Nov 13 06:48:25 osxserver ctl_cyrusdb[24677]: DBERROR: critical database situation
Nov 13 07:03:46 osxserver postfix/master[58]: warning: unix_trigger_event: read timeout for service public/flush
Nov 13 07:07:28 osxserver kernel[0]: (126: coreservicesd)tfp: failed on 0:
Nov 13 07:13:04 osxserver postfix/trivial-rewrite[24522]: warning: write resolver reply: Broken pipe
Nov 13 07:18:25 osxserver ctl_cyrusdb[24689]: DBERROR db4: PANIC: fatal region error detected; run recovery
Nov 13 07:18:55 osxserver ctl_cyrusdb[24689]: DBERROR: critical database situation
Nov 13 07:20:26 osxserver postfix/master[58]: warning: unix_trigger_event: read timeout for service public/flush
Nov 13 07:37:06 osxserver postfix/master[58]: warning: unix_trigger_event: read timeout for service public/flush
Nov 13 07:37:29 osxserver kernel[0]: (126: coreservicesd)tfp: failed on 0:
Nov 13 07:48:26 osxserver ctl_cyrusdb[24695]: DBERROR db4: PANIC: fatal region error detected; run recovery
Nov 13 07:48:26 osxserver ctl_cyrusdb[24695]: DBERROR: critical database situation
Nov 13 07:53:46 osxserver postfix/master[58]: warning: unix_trigger_event: read timeout for service public/flush
Nov 13 08:07:29 osxserver kernel[0]: (126: coreservicesd)tfp: failed on 0:
Nov 13 08:10:26 osxserver postfix/master[58]: warning: unix_trigger_event: read timeout for service public/flush
Nov 13 08:18:25 osxserver ctl_cyrusdb[24701]: DBERROR db4: PANIC: fatal region error detected; run recovery
Nov 13 08:18:55 osxserver ctl_cyrusdb[24701]: DBERROR: critical database situation
Nov 13 08:27:06 osxserver postfix/master[58]: warning: unix_trigger_event: read timeout for service public/flush
Nov 13 08:37:29 osxserver kernel[0]: (126: coreservicesd)tfp: failed on 0:
Nov 13 08:43:46 osxserver postfix/master[58]: warning: unix_trigger_event: read timeout for service public/flush
Nov 13 08:44:50 osxserver imap[20853]: login: localhost [::1] coling plaintext User logged in
Nov 13 08:44:50 osxserver imap[20860]: login: localhost [::1] coling plaintext User logged in
Nov 13 08:44:50 osxserver imap[20867]: login: localhost [::1] coling plaintext User logged in
Nov 13 08:44:50 osxserver imap[20869]: login: localhost [::1] coling plaintext User logged in
Nov 13 08:44:50 osxserver imap[20876]: login: localhost [::1] coling plaintext User logged in
Nov 13 08:44:50 osxserver imap[20881]: login: localhost [::1] coling plaintext User logged in
Nov 13 08:44:51 osxserver lmtpunix[20406]: warning: unable to post message for user: kimb, mail is not enabled for this user
Nov 13 08:44:52 osxserver master[53]: process 20860 exited, signaled to death by 10
Nov 13 08:44:53 osxserver master[53]: process 20867 exited, signaled to death by 10
Nov 13 08:44:53 osxserver postfix/smtpd[24572]: warning: 216.60.1.241: hostname 216-60-1-241.ded.swbell.net verification failed: Host not found
Nov 13 08:44:53 osxserver master[53]: process 20881 exited, signaled to death by 10
Nov 13 08:44:53 osxserver master[53]: process 20876 exited, signaled to death by 10
Nov 13 08:44:54 osxserver master[53]: process 20853 exited, signaled to death by 10
Nov 13 08:44:56 osxserver lmtpunix[20412]: warning: unable to post message for user: kimb, mail is not enabled for this user
Nov 13 08:44:56 osxserver master[53]: process 20418 exited, signaled to death by 10
…
…
…

November 14, 2006 at 12:11 am #367578

dragonmac
Participant

Here are links to the files you should see. The DAMN spam filter won’t let me post the links the right way. Please look at these logs, they are scary!!!
Use “members” dot “aol” dot “com” as the FQDN:
http://FQDN/dragonmacpc/systemlog0gz.txt
http://FQDN/dragonmacpc/systemlog.txt

November 22, 2006 at 1:29 am #367684

dragonmac
Participant

Well, nothing in the firewall logs showed anything bad, though I only have a minimal amount of info being tracked.

secure.log seems a bit weird, but it was a Sunday-to-Monday time frame and there was no entry at all in the log for Nov 13th.
Here are the 2 entries from the last secure.log up until I made it on site, rebooted and began my troubleshooting on the 14th at 2:30pm.

Nov 12 22:47:21 osxserver com.apple.SecurityServer: Succeeded authorizing right system.burn by process /Applications/Retrospect 6.1/Retrospect for authorization created by /Applications/Retrospect 6.1/Retrospect.
Nov 14 02:33:53 localhost com.apple.SecurityServer: Entering service

Yeah, I dumped the 85GB file. I was in a rush to get everything up and running, so saving an 85GB log file was not on my list. In the end I had to replace the Cyrus DB from the backup at around 10pm Sunday, and the service had been turned off at 9:30AM, since POP3 and imapd would not let them get their mail. SMTP seemed to be going fine. You can see the DB errors at the end of the log I posted. All my repair tricks failed and I had to go to backup.

As far as SMB goes, I always get entries like these when the service is running.

[2006/11/13 04:03:04, 0] /SourceCache/samba/samba-100.4/samba/source/lib/util_sock.c:write_socket_data(446)
write_socket_data: write failure. Error = Broken pipe
[2006/11/13 04:03:04, 0] /SourceCache/samba/samba-100.4/samba/source/lib/util_sock.c:write_socket(471)
write_socket: Error writing 114 bytes to socket 6: ERRNO = Broken pipe
[2006/11/13 04:03:04, 0] /SourceCache/samba/samba-100.4/samba/source/lib/util_sock.c:send_smb(663)
Error writing 114 bytes to client. -1. (Broken pipe)
[2006/11/13 04:03:04, 0] /SourceCache/samba/samba-100.4/samba/source/lib/util_sock.c:write_socket_data(446)
write_socket_data: write failure. Error = Broken pipe
[2006/11/13 04:03:04, 0] /SourceCache/samba/samba-100.4/samba/source/lib/util_sock.c:write_socket(471)
write_socket: Error writing 53 bytes to socket 6: ERRNO = Broken pipe
[2006/11/13 04:03:04, 0] /SourceCache/samba/samba-100.4/samba/source/lib/util_sock.c:send_smb(663)
Error writing 53 bytes to client. -1. (Broken pipe)
[2006/11/13 04:03:04, 0] /SourceCache/samba/samba-100.4/samba/source/lib/util_sock.c:write_socket_data(446)
write_socket_data: write failure. Error = Broken pipe
…
…
[2006/11/13 04:25:02, 0] pdb_ods.c:odssam_getsampwnam(2329)
odssam_getsampwnam: [0]get_sam_record_attributes dsRecTypeStandard:Users no account for ‘ADMINISTRATOR’!
[2006/11/13 04:25:02, 0] /SourceCache/samba/samba-100.4/samba/source/lib/util_sock.c:write_socket_data(446)
write_socket_data: write failure. Error = Broken pipe
[2006/11/13 04:25:02, 0] /SourceCache/samba/samba-100.4/samba/source/lib/util_sock.c:write_socket(471)
write_socket: Error writing 39 bytes to socket 23: ERRNO = Broken pipe
[2006/11/13 04:25:02, 0] /SourceCache/samba/samba-100.4/samba/source/lib/util_sock.c:send_smb(663)
Error writing 39 bytes to client. -1. (Broken pipe)
[2006/11/13 04:25:02, 0] pdb_ods.c:odssam_getsampwnam(2329)
odssam_getsampwnam: [0]get_sam_record_attributes dsRecTypeStandard:Users no account for ‘ADMINISTRATOR’!
[2006/11/13 04:25:02, 0] /SourceCache/samba/samba-100.4/samba/source/lib/util_sock.c:write_socket_data(446)
…
…
[2006/11/21 13:55:43, 0] pdb_ods.c:odssam_getsampwent(2295)
odssam_getsampwent: entriesAvailable Take 2(3) contextData(0x0)
[2006/11/21 13:55:43, 0] pdb_ods.c:odssam_getsampwent(2283)
odssam_getsampwent: entriesAvailable(3) contextData(0x0)
[2006/11/21 13:55:44, 0] pdb_ods.c:odssam_getsampwnam(2329)
odssam_getsampwnam: [0]get_sam_record_attributes dsRecTypeStandard:Users no account for ‘administrator’!
[2006/11/21 13:55:45, 0] pdb_ods.c:odssam_getsampwnam(2329)
odssam_getsampwnam: [0]get_sam_record_attributes dsRecTypeStandard:Users no account for ‘administrator’!
[2006/11/21 13:55:45, 0] pdb_ods.c:odssam_getsampwnam(2329)
odssam_getsampwnam: [0]get_sam_record_attributes dsRecTypeStandard:Users no account for ‘administrator’!
…
…
[2006/11/21 13:57:02, 0] pdb_ods.c:odssam_getsampwnam(2329)
odssam_getsampwnam: [0]get_sam_record_attributes dsRecTypeStandard:Users no account for ‘admin’!
[2006/11/21 13:57:03, 0] pdb_ods.c:odssam_getsampwnam(2329)
odssam_getsampwnam: [0]get_sam_record_attributes dsRecTypeStandard:Users no account for ‘admin’!
[2006/11/21 13:57:05, 0] pdb_ods.c:odssam_getsampwnam(2329)
odssam_getsampwnam: [0]get_sam_record_attributes dsRecTypeStandard:Users no account for ‘admin’!
[2006/11/21 13:57:06, 0] /SourceCache/samba/samba-100.4/samba/source/lib/util_sock.c:get_peer_addr(1016)
getpeername failed. Error was Socket is not connected
[2006/11/21 13:57:06, 0] /SourceCache/samba/samba-100.4/samba/source/lib/util_sock.c:write_socket_data(446)
write_socket_data: write failure. Error = Broken pipe
[2006/11/21 13:57:06, 0] /SourceCache/samba/samba-100.4/samba/source/lib/util_sock.c:write_socket(471)
write_socket: Error writing 4 bytes to socket 23: ERRNO = Broken pipe
[2006/11/21 13:57:06, 0] /SourceCache/samba/samba-100.4/samba/source/lib/util_sock.c:send_smb(663)
Error writing 4 bytes to client. -1. (Broken pipe)
[2006/11/21 13:57:06, 0] pdb_ods.c:odssam_getsampwnam(2329)
odssam_getsampwnam: [0]get_sam_record_attributes dsRecTypeStandard:Users no account for ‘admin’!
…
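The repeated lookups for accounts named ADMINISTRATOR, administrator and admin, none of which exist in the directory, look like a client guessing at common admin names, though a misconfigured Windows machine can produce similar entries. Counting the names gives a quick picture of what was tried. A sketch with a hypothetical helper name; it assumes the raw log.smbd has straight single quotes around each name (the forum displays them as curly quotes).

```shell
#!/bin/sh
# tally_smb_users [LOGFILE...]  (hypothetical helper name)
# Counts the account names in Samba "no account for 'NAME'" lines and
# prints "count name", most frequent first. Names are case-sensitive, so
# ADMINISTRATOR and administrator are counted separately.
# Assumes straight single quotes around NAME, as in the raw log file.
tally_smb_users() {
    # Splitting each line on ' puts the quoted account name in field 2.
    awk -F"'" '/no account for/ { n[$2]++ }
               END { for (u in n) print n[u], u }' "$@" | sort -rn
}

# Example (log path assumed from the earlier post):
# tally_smb_users /private/var/log/samba/log.smbd
```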

Here are some slapd.log entries too; not sure if it’s anything. I’m no Unix guru but gettin’ better every day 😉

Nov 10 19:31:39 osxserver slapd[57]: <= bdb_substring_candidates: (givenName) index_param failed (18)\n
Nov 10 19:31:39 osxserver slapd[57]: <= bdb_substring_candidates: (mail) index_param failed (18)\n
Nov 12 14:23:57 osxserver slapd[57]: <= bdb_equality_candidates: (apple-computers) index_param failed (18)\n
Nov 12 14:23:57 osxserver slapd[57]: <= bdb_equality_candidates: (apple-computers) index_param failed (18)\n
Nov 12 14:25:27 osxserver slapd[57]: <= bdb_equality_candidates: (apple-computers) index_param failed (18)\n
Nov 12 14:25:53 osxserver slapd[57]: <= bdb_equality_candidates: (apple-computers) index_param failed (18)\n
Nov 12 14:26:37 osxserver slapd[57]: <= bdb_equality_candidates: (apple-computers) index_param failed (18)\n
Nov 12 14:35:27 osxserver slapd[57]: <= bdb_equality_candidates: (apple-computers) index_param failed (18)\n
Nov 12 14:35:29 osxserver slapd[57]: <= bdb_equality_candidates: (apple-computers) index_param failed (18)\n
Nov 12 14:35:53 osxserver slapd[57]: <= bdb_equality_candidates: (apple-computers) index_param failed (18)\n
Nov 12 14:35:54 osxserver slapd[57]: <= bdb_equality_candidates: (apple-computers) index_param failed (18)\n
Nov 12 14:36:37 osxserver slapd[57]: <= bdb_equality_candidates: (apple-computers) index_param failed (18)\n
Nov 12 14:36:38 osxserver slapd[57]: <= bdb_equality_candidates: (apple-computers) index_param failed (18)\n
Nov 14 02:33:53 localhost slapd[57]: @(#) $OpenLDAP: slapd 2.2.19 $\n
Nov 14 02:33:53 localhost slapd[57]: bdb_back_initialize: Sleepycat Software: Berkeley DB 4.2.52: (December 3, 2003)\n
Nov 14 02:33:53 localhost slapd[57]: bdb_db_init: Initializing BDB database\n
Nov 14 02:33:54 localhost slapd[57]: slapd starting\n
Nov 14 09:11:54 osxserver slapd[57]: <= bdb_substring_candidates: (givenName) index_param failed (18)\n
Nov 14 09:11:54 osxserver slapd[57]: <= bdb_substring_candidates: (mail) index_param failed (18)\n

Still no real clue as to what happened, but until I reconfigure my Cisco PIX 506 firewall, SMB is off, and on only when needed, which is only a few times a week.

December 5, 2006 at 10:14 pm #367782

minibrain
Participant

I’ve been having the same problem with our new Mac Pro server running 10.4.8. We had the same problem with Samba creating a huge log file (135GB). After deleting the log file and restarting, the server would eventually slow to a crawl and kick everyone out of email and off AFP. Any help would be appreciated.

December 7, 2006 at 3:32 pm #367798

norsoft
Participant

I have the same problem. This has happened 4 times, at shorter intervals lately (one week).
In /private/var/log/samba a file, log.smbd.old, is created which takes up the remaining disk space. The disk is 92 GByte and normally has 72 GByte free, so this odd file is 72 GByte. I just remove the file and restart the server, and everything is OK.

Computer: Mac mini PowerPC
OS: Mac OS X Server 10.4.8 (Norwegian)
Running AFP server, DHCP, DNS, Print and Windows Services, but no ftp-server, web-server or mail-server.
Running Retrospect too

I guess something is broken in the OS; I would not expect any intruder activity.
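Since the symptom in these reports is a single Samba log (log.smbd.0.gz or log.smbd.old) eating all remaining disk space, a small check run periodically, for example from cron, would flag the disk filling up before the server stalls. A minimal sketch: the function names and the threshold are hypothetical, and it assumes `df -k` prints the capacity percentage in the fifth column, as both the BSD and GNU versions do.

```shell
#!/bin/sh
# capacity_pct  (hypothetical helper name)
# Reads `df -k` output on stdin and prints the capacity (percent used) of
# the first filesystem listed, without the % sign.
capacity_pct() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5; exit }'
}

# warn_if_full MOUNT LIMIT  (hypothetical helper name)
# Prints a warning to stderr and returns 1 when MOUNT is at or above
# LIMIT percent used; returns 0 otherwise.
warn_if_full() {
    pct=$(df -k "$1" | capacity_pct)
    if [ "$pct" -ge "$2" ]; then
        echo "WARNING: $1 is ${pct}% full" >&2
        return 1
    fi
}

# Example: warn once the boot volume passes 90% (threshold is arbitrary).
# warn_if_full / 90
```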

December 22, 2006 at 3:22 am #367886

dragonmac
Participant

Norsoft, are you on a completely private network, so as not to expect any intruder activity?

Since the problem first appeared it has not happened again, but I do not leave SMB running. I only need it a few times a week for some transfers of data from the PCs to the Mac side, so I only enable the service when needed and shut it down afterwards. All was fine after a reboot, removal of the log file and a restore of Cyrus. The LDAP was messed up, but as I said in the previous post I had to go to backup, for which I had a shiny 1-week-old archive made with Server Admin/OD/Archive 😀 (so much nicer than the backup scripts for LDAP in 10.3)
Since norsoft is on PPC, I guess this is not an Intel binary gone bad, but then I’m surprised that I’m not seeing more of this from 10.4.8 admins. 😕
I was planning to write an auto-restart for Samba, or a job of some sort, to just turn it on for the two half-days a week I need it.

December 22, 2006 at 8:11 am #367888

norsoft
Participant

This has happened a few more times since Dec 7. Disk capacity is 93 GBytes and I have 73 GBytes of free space. Suddenly log.smbd.old appears and takes all the remaining disk space, so there is none left.
I guess there is a fault or broken code somewhere. I need to run SMB all the time, so I just remove the odd file. But it is not a good situation. I too am surprised that there are no more reports on this one.

March 2, 2007 at 2:09 am #368437

mearling
Participant

I have no idea if this will be helpful at all, but here goes.

Recently I was getting a huge amount of error messages in both the Samba logs and the DHCP logs; the logs were growing at around 10 MB a second. I found that the problem wasn’t actually on the server. One of the Windows clients connected to the server had been removed from the network for hardware repairs, but its network cable was left connected to the switch. A friendly cleaner thought the cable had been disconnected from the switch and decided that the best way to fix it was to plug both ends into the switch.

This made the server go crazy, thinking that the computer was still connected but unable to get any reply from it. Because all of this traffic was grinding the entire network to a halt, more and more clients were receiving errors, dropping out and unable to send or receive any data.

I have also heard of a similar thing happening when a network card has fried and created a feedback loop on itself.

As I said, I have no idea whether this would help in this situation, but who knows.

March 2, 2007 at 12:14 pm #368440

mcnaugha
Participant

We are seeing this across multiple servers. All sorts of crazy stuff seems to be happening and I can’t tell if it’s an OS bug, hardware overload (although not according to Activity Monitor), or perhaps Windows viruses repeatedly hitting their PDC, i.e. the Mac server.

March 9, 2007 at 12:57 pm #368514

mcnaugha
Participant

I hope I’m not jumping the gun with our situation, but we have made some fairly simple fixes which may have stabilised our servers. I cannot guarantee this information will help you or that it is even safe to carry out. It appears to be working for us at the moment. Follow at your own risk!

We discovered two issues from examining the logs.

I believe that the SMB issue was serious and was definitely causing the problems we were seeing.

The LDAP issue is only serious if my theory is correct. Only time might tell on that one.

The first thing we found was entries in the log.smbd and log.nmbd logs referring to “.tdb” files. TDB files are Samba’s trivial databases, the format it uses to store operational data. These files are extremely important and need to be in good health. Filter your smbd and nmbd logs for “.tdb”. This led me to recall that there is a special Samba tool which verifies the integrity of these files, called tdbbackup. As you can tell from its name, it also backs up the files if wanted. To use tdbbackup you need to point it at the locations where your tdb files are. I am only aware of two important locations for these on Mac OS X Server. They are:

/var/samba/ and /var/db/samba

tdbbackup can be run even when your SMB service is up and running… which is great news for the impatient. I ran tdbbackup with the “-v” option over both of those locations. The command I needed to run was:

sudo tdbbackup -v /var/samba/*.tdb

and

sudo tdbbackup -v /var/db/samba/*.tdb

Normal output would look similar to this:

/var/samba/brlock.tdb : 0 records
/var/samba/connections.tdb : 0 records
/var/samba/gencache.tdb : 0 records
/var/samba/locking.tdb : 0 records
/var/samba/sessionid.tdb : 0 records
/var/samba/unexpected.tdb : 1 records

Note there is no mention of “restoring”. tdbbackup will automatically attempt to restore a backup if it finds corruption.

Problem output contains something similar to this:

restoring /var/samba/share_info.tdb
/var/samba/share_info.tdb.bak: No such file or directory

You need to note all the tdb file names where tdbbackup tried to restore. These files are corrupted and should be thrown away. Don’t throw them away immediately. Move them to another location, e.g. a folder on your desktop, and then restart your server. Verify your SMB service continues to function as you expect, including Domain logons and Domain joining if you’re working with the PDC role. This is just in case your SMB service needed something within the tdb files you have removed. I’m not knowledgeable enough to know if these files are always respawnable or if they are actually created/built up and needed.

Re-run tdbbackup. This time you should get the normal output without the “restoring” entries. From this point you should have a healthy SMB service… if you were suffering from corrupt tdb files. We found the corrupt tdb files were leading to the SMB service going “mad” and generating enormous log.smbd files. This led to the startup disk being completely filled, halting the server. It could also have been responsible for the server crashes we experienced without the startup disk filling up.
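The check-and-quarantine steps above can be scripted. This is a sketch under assumptions: it relies on tdbbackup printing a line beginning with `restoring ` for each damaged file, as in the problem output quoted above, and `corrupt_tdbs` / `quarantine_tdbs` are hypothetical helper names. It moves files rather than deleting them, as the post recommends.

```shell
#!/bin/sh
# corrupt_tdbs  (hypothetical helper name)
# Reads tdbbackup -v output on stdin and prints the path of every .tdb file
# it tried to restore (lines of the form "restoring PATH").
corrupt_tdbs() {
    sed -n 's/^restoring //p'
}

# quarantine_tdbs DEST  (hypothetical helper name)
# Moves each existing path read on stdin into DEST instead of deleting it,
# so a file can be put back if the SMB service turns out to need it.
quarantine_tdbs() {
    dest=$1
    mkdir -p "$dest" || return 1
    while IFS= read -r f; do
        if [ -e "$f" ]; then
            mv "$f" "$dest"/
        fi
    done
}

# Example, run as root (quarantine folder name is arbitrary):
#   tdbbackup -v /var/samba/*.tdb /var/db/samba/*.tdb 2>&1 |
#       corrupt_tdbs > /tmp/bad_tdbs.txt
#   quarantine_tdbs /var/root/tdb-quarantine < /tmp/bad_tdbs.txt
```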

I also advise adding a “max log size” entry to your smb.conf. We’ve gone for 50MB at the moment, i.e. max log size = 51200 (the value is in kilobytes). It probably should be even smaller. Any changes you make to smb.conf won’t take effect until you restart.
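Samba reads `max log size` in kilobytes (so 51200 is 50MB) and, when a log exceeds it, renames the file with a `.old` extension, which is likely where the log.smbd.old file in norsoft’s report comes from. Below is a sketch of a helper that sets the value in an smb.conf-style file. `set_max_log_size` is hypothetical, the /etc/smb.conf path is an assumption for Mac OS X Server, and Server Admin may rewrite the file when Windows settings are changed there.

```shell
#!/bin/sh
# set_max_log_size CONF KB  (hypothetical helper name)
# Sets "max log size = KB" in an smb.conf-style file. Samba reads the value
# in kilobytes, so 51200 means 50MB. An existing "max log size" line is
# rewritten in place; otherwise a new line is added right after [global].
# The original file is kept as CONF.bak.
set_max_log_size() {
    conf=$1
    kb=$2
    cp "$conf" "$conf.bak" || return 1
    if grep -q '^[[:space:]]*max log size[[:space:]]*=' "$conf.bak"; then
        sed "s/^\([[:space:]]*max log size[[:space:]]*=\).*/\1 $kb/" \
            "$conf.bak" > "$conf"
    else
        awk -v kb="$kb" '
            { print }
            /^[ \t]*\[global\][ \t]*$/ { print "\tmax log size = " kb }
        ' "$conf.bak" > "$conf"
    fi
}

# Example (path assumed; restart the Windows service afterwards):
# set_max_log_size /etc/smb.conf 51200
```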

The LDAP slapd.log files were showing the following kind of error:

Feb 28 07:49:45 g4server slapd[46]: <= bdb_equality_candidates: (sambaSID) index_param failed (18)\n
Feb 28 07:49:45 g4server slapd[46]: <= bdb_equality_candidates: (rid) index_param failed (18)\n
Feb 28 10:57:39 g4server slapd[46]: <= bdb_substring_candidates: (apple-mcxflags) index_param failed (18)\n

Masses and masses of them. The slapd.log files were getting bigger and bigger; we even saw one which was 180MB. Credit to Josh for pointing me in the right direction. My theory is that these entries indicate the LDAP server is struggling to look up these values fast enough and needs them to be indexed. This could lead to slapd having fits... or at least that's what I think is happening when our heavy-usage servers suddenly start to act up, often leading to a complete hang.

How do you get them indexed? You need to modify your slapd_macosxserver.conf file. This file contains the parameters which should be indexed for faster retrieval. This is the section to look at:

# Indices to maintain
index cn,sn,uid pres,eq,approx,sub
index uidNumber,gidNumber eq
index memberUid eq
index apple-generateduid eq
index ou eq
index apple-group-realname eq
index macAddress eq
index apple-category eq
index apple-networkview eq
index apple-group-memberguid eq
index apple-group-nestedgroup eq
index objectClass eq

The parameters needing to be indexed are the ones in brackets in the log entries. Also note the type of indexing candidate, e.g. "apple-mcxflags" is a "substring" candidate and "sambaSID" is an "equality" candidate. This is what we had to add to the slapd_macosxserver.conf file:

index sambaSID eq
index rid eq
index apple-mcxflags sub

You can do this with a text editor, e.g.:

sudo nano /etc/openldap/slapd_macosxserver.conf

Do not copy exactly what I've done. We found that different servers had different parameters needing to be indexed. You need to evaluate your requirements from your slapd.log.

Next you need to get the LDAP index updated. To do this, LDAP must be taken offline, so all your users must be prepared for the server to go offline. Here are the Tiger commands:

sudo launchctl unload /System/Library/LaunchDaemons/org.openldap.slapd.xml
sudo slapindex
sudo launchctl load /System/Library/LaunchDaemons/org.openldap.slapd.xml

Here are the Panther commands:

sudo SystemStarter stop LDAP
sudo slapindex
sudo SystemStarter start LDAP

You should now find that the slapd.log entries relating to these parameters stop. If my theory is correct, the newly updated index just took a lot of pressure off slapd, and that should bring about greater stability.

I welcome feedback here. I'm sure there are command optimisations that could be added. Also, if you think this is wrong and dangerous, let us know too. For the brave, please let us know if you think this helped.
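The matching step in this recipe (read the attribute name and candidate type out of each index_param failed entry, then write an index line for it) can be done mechanically. A sketch, assuming the log format quoted above; `suggest_indexes` is a hypothetical helper, and its output is only a list of candidates to review, per the warning not to copy another server's list.

```shell
#!/bin/sh
# suggest_indexes [SLAPD_LOG...]  (hypothetical helper name)
# Turns slapd "index_param failed" entries into candidate index directives:
#   bdb_equality_candidates: (attr)  ->  index attr eq
#   bdb_substring_candidates: (attr) ->  index attr sub
# Duplicates are removed. Review the list by hand before editing
# slapd_macosxserver.conf; it only reflects what the log complained about.
suggest_indexes() {
    sed -n \
        -e 's/.*bdb_equality_candidates: (\([^)]*\)) index_param failed.*/index \1 eq/p' \
        -e 's/.*bdb_substring_candidates: (\([^)]*\)) index_param failed.*/index \1 sub/p' \
        "$@" | sort -u
}

# Example (log path assumed):
# suggest_indexes /var/log/slapd.log
```

Run against dragonmac's slapd.log excerpt earlier in the thread, it would suggest `index apple-computers eq`, `index givenName sub` and `index mail sub`. slapindex still has to be run with slapd stopped after the lines are added.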

June 19, 2008 at 4:31 pm #373191

aoihmc
Participant

I have been seeing similar messages in my slapd.log file:

slapd[28607]: <= bdb_substring_candidates: (authAuthority) index_param failed (18)
slapd[28607]: <= bdb_equality_candidates: (kerio-Mail-Address) index_param failed (18)

I tried updating slapd_macosxserver.conf by adding the following at the end of the index statements, before the timeout:

index kerio-Mail-Address eq
index authAuthority sub

I then stopped LDAP, ran slapindex and started LDAP again, but I'm still getting the same index_param failed errors. Any ideas?