This article, the first in an introduction to HPC, could just as well be called “Centralized Scripting for Workstation Management”, so don’t let the scary HPC title put you off! In reality we can all learn a bit from how an HPC (High Performance Computing) environment is managed to better streamline even the smallest computer setup. After all, more streamlined means more spare time to spend out of the office!
As a Systems Administrator, there are quite a few occasions when you’ll want to make changes to your users’ systems, and being able to do this in the background with the least amount of effort on your part is what we should all be striving for.
While there are a lot of things you can do with Apple Remote Desktop’s “Send UNIX command”, it’s quite nice to keep a centralized collection of frequently used scripts and binaries so that all your users and other administrators can easily access them, and it also makes versioning so much easier! One of the most vital tools for an HPC administrator, or any admin with a large number of machines to think about, is a parallel remote shell utility. For this example I’m going to work with pdsh. Pdsh uses a “sliding window” parallel algorithm to conserve socket resources on the initiating node and to allow progress to continue while timeouts occur on some connections. For us as administrators, that means you can keep one file listing all your machines, and even if you have a downed machine or a laptop that’s offline, your script won’t wait around for that machine to come back or fail altogether.
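To get a feel for what a bounded pool of parallel connections buys you, here is a rough illustration. It is not how pdsh is implemented; it only mimics the behaviour with `xargs -P`, and the host names and the fake timeout on `down01` are made up for the sketch.

```shell
#!/bin/sh
# Illustration only: a bounded pool of workers, NOT pdsh's real code.
# The host names and the fake "probe" step are invented for this sketch.
HOSTS="mac01 mac02 down01 mac03"

RESULT="$(
  printf '%s\n' $HOSTS |
  xargs -n 1 -P 2 sh -c '
    # Pretend down01 is unreachable and only gives up after a timeout.
    if [ "$0" = "down01" ]; then
      sleep 1
      echo "$0: timed out"
    else
      echo "$0: ok"
    fi
  ' | sort
)"

echo "$RESULT"
```

At most two workers run at a time, and the slow host only ties up one of them, so the remaining machines still get their turn and report back.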
So, first things first: plan somewhere for all this centralized information to live. It should be somewhere that your users have mounted automatically, and a dynamically automounted NFS volume is perfect. There are several discussions out there on setting up dynamic NFS mounts on OS X; I personally use NetInfo at the moment, but others use the fstab file with success too. I also like to link this directory into /usr, as that is where most people are going to look for binaries. So, for this example I’m going to refer to /usr/central as the location of our centralized scripts, where /usr/central@ -> /Network/Servers/nfs_server/centralized_scripts
<code> ln -s /Network/Servers/nfs_server/centralized_scripts /usr/central </code>
This makes for an excellent place to keep your facility’s frequently used items, any home-grown scripts, and binaries that your facility uses that may require updating. ImageMagick and qt_tools are just a couple of examples of binaries that would be useful in a media-based environment (remember, HPC isn’t only for universities and bio-research!). It’s certainly a lot easier to update in one location than across every machine in your environment.
Let’s set up the directory structure on your server. While you could just keep everything at the top level of your centralized scripts, this isn’t particularly neat or easily navigable once you’ve collected all your favorite CLI tools here. Within centralized_scripts, create directories called “bin”, “man” and “sysadmin”. Of course you can add more for your own uses, but we’ll stick with these for the moment. The “bin” directory is going to hold your users’ binaries (this is where ImageMagick, qt_tools and pdsh can live), “man” is going to hold the manuals for everything in your centralized scripts directory, and “sysadmin”, perhaps with stricter permissions, is where your administrators can keep their bags of tricks.
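As a minimal sketch of that layout, the following creates the three directories and tightens the permissions on “sysadmin”. The `CENTRAL_ROOT` variable is my own placeholder and defaults to a scratch directory so you can try it safely; the exact modes are only a suggestion.

```shell
#!/bin/sh
# Sketch: build the bin/man/sysadmin layout under a stand-in root.
# CENTRAL_ROOT is a hypothetical variable; point it at your real share.
CENTRAL_ROOT="${CENTRAL_ROOT:-$(mktemp -d)/centralized_scripts}"

mkdir -p "$CENTRAL_ROOT/bin" "$CENTRAL_ROOT/man" "$CENTRAL_ROOT/sysadmin"

# Everyone may run the tools in bin and read the manuals.
chmod 755 "$CENTRAL_ROOT/bin" "$CENTRAL_ROOT/man"

# Stricter permissions for the administrators' directory: owner and group only.
chmod 750 "$CENTRAL_ROOT/sysadmin"

ls -ld "$CENTRAL_ROOT"/*
```

On a real server you would also set the group on “sysadmin” to whatever group your administrators belong to.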
A note here about versioning: binaries frequently keep the same name when they’re versioned up, so here is something worth thinking about. Rename the actual binary to binary_v100 and then link binary to binary_v100. This lets you put binary_v200 in place, test it out, and change the link when you’re ready, all while keeping binary_v100 around to go back to if your testing of binary_v200 wasn’t thorough enough. It also gives your end users less to think about, since they only have to remember the name of one tool and not which version they need to use.
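Here is a small sketch of that versioning trick, run in a throwaway directory. The tool name `mytool` and its two fake versions are invented purely for the demonstration.

```shell
#!/bin/sh
# Sketch of the versioned-symlink idea, using a throwaway directory.
# "mytool" is a hypothetical tool name.
BIN_DIR="$(mktemp -d)"
cd "$BIN_DIR" || exit 1

# Two versions of the same tool, side by side.
printf '#!/bin/sh\necho "mytool 1.0.0"\n' > mytool_v100
printf '#!/bin/sh\necho "mytool 2.0.0"\n' > mytool_v200
chmod 755 mytool_v100 mytool_v200

# Users only ever call "mytool"; the link decides which version they get.
ln -s mytool_v100 mytool
BEFORE="$(./mytool)"

# After testing v200, repoint the link. v100 stays in place for rollback.
ln -sf mytool_v200 mytool
AFTER="$(./mytool)"

echo "before: $BEFORE"
echo "after:  $AFTER"
```

Rolling back is the same one-line `ln -sf`, pointed at the old version again.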
So, here’s where we get into pdsh. Download the most recent version (pdsh-2.5-1.tar.bz2 at the time of writing) from the pdsh ftp server and expand the file. We’re going to have to compile this software, and at the same time we’re going to install it into our network location so that any machine with the network mounts can access it. If you’ve never compiled software before, you will need to install Apple’s Xcode Tools on your machine (from the developer CD that came with your OS, or available for download here).
Compiling software is nowhere near as scary as you might think if you’ve never done it before, so take a deep breath and plow forward! First, we’re going to configure the source. We’re going to do this specifically for a PowerPC-based system, so make sure you’re on a Mac while doing this. There are ways to configure for PPC without being on a Mac, but for simplicity, grab your favourite Mac for now! I downloaded pdsh into my home directory; change the path to wherever you put your source:
<code>
cd ~/pdsh-2.5-1
./configure --without-rsh --with-ssh --with-machines --prefix=/usr/central
</code>
Running ‘configure’ takes a while. While running, it prints some messages telling which features it is checking for. Sit back and watch it all scroll by, or make a cup of tea and come back in a couple of minutes (sooner if you have a flashy new G5). The extra options we’ve put on the end are the modules that we do and don’t want. By default, if we’d just typed ./configure with no options, it would build with rsh and that’s it. While rsh is significantly faster, it is not easily enabled on a Mac (by easy, I mean a check-box in the GUI!). So we’re specifying that we don’t need rsh, we do need ssh, and --with-machines is there so that we can feed the binary a flat file of machine names when we run it, rather than having to type in our machine names every time. If you’re interested in other modules, take a look at the README files included in the pdsh source. We’ve also specified an installation prefix other than the standard ‘/usr/local’ by giving ‘configure’ the option ‘--prefix=/usr/central’. If you’ve chosen a location other than /usr/central, you’ll need to adjust this.
Once the configure has finished, we’re onto compiling the package:
<code> make </code>
Yes, that’s it… just make… I would say, there is no step three here, but there is… compiling software came before the age of the iMac, so step three, installing the newly compiled software into its final destination:
<code> make install </code>
Now this is all very well and good, you can certainly run pdsh on your machines now, but every time you want to run it you’re going to have to type out the following:
<code> /usr/central/bin/pdsh -w machines 'command' </code>
Pretty long-winded… time to add our new centralized scripts to our users’ paths. As 10.3 and up use bash as the default shell, I’m only going to show setting up the bash environment here. You can do this per user by creating or modifying ~/.bash_profile, or system-wide by modifying /etc/profile. By default only /etc/profile exists, and looks like this:
<code>
# System-wide .profile for sh(1)
PATH="/bin:/sbin:/usr/bin:/usr/sbin"
export PATH
[ -r /etc/bashrc ] && source /etc/bashrc
</code>
We’re concerned with the PATH= line. This shows the path that the bash shell will search for binaries in. It’s up to you if you do this system-wide, or per user – consider giving your sysadmin folder group permissions for your sys admins only. As always, keep a backup copy of the original file before editing. For simplicity, I’m going to show putting both /usr/central/bin and /usr/central/sysadmin in /etc/profile:
<code>
# System-wide .profile for sh(1)
PATH="/bin:/sbin:/usr/bin:/usr/sbin:/usr/central/bin:/usr/central/sysadmin"
export PATH
[ -r /etc/bashrc ] && source /etc/bashrc
</code>
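If you would rather not hard-code the whole PATH line, one alternative is to append the central directories only when they are missing. This is a sketch of that approach, not what OS X ships; `add_to_path` is a helper name I made up.

```shell
#!/bin/sh
# Sketch: append directories to PATH only if they are not already there.
# add_to_path is a hypothetical helper, not something OS X provides.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, do nothing
        *) PATH="$PATH:$1" ;;
    esac
}

add_to_path /usr/central/bin
add_to_path /usr/central/sysadmin
add_to_path /usr/central/bin         # second call is a no-op
export PATH

echo "$PATH"
```

Because the helper checks first, sourcing the profile more than once will not keep growing the path.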
Now that you’ve made this change, you can check and see if you’re getting your new path by opening a new shell and typing in the following:
<code> echo $PATH </code>
it should return the following:
<code> /bin:/sbin:/usr/bin:/usr/sbin:/usr/central/bin:/usr/central/sysadmin </code>
So great, now your central scripts are in your default path, which lets you just type pdsh without preceding it with the location of the script. But what if you want to know something about these tools? It’s off to /etc/manpath.config to add our centralized man pages. Again, make a backup of this file before you change it. We’re going to add the following line to this file:
<code> OPTIONAL_MANPATH /usr/central/man </code>
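A cautious way to make that edit is to back the file up and add the line only if it is not already present. The sketch below works on a scratch file by default; `CONF` is a placeholder variable of mine, and you would point it at /etc/manpath.config (as root) once you are happy with it.

```shell
#!/bin/sh
# Sketch: add the man path line once, keeping a backup first.
# CONF defaults to a scratch file so this is safe to try; the real file
# named in the article is /etc/manpath.config.
CONF="${CONF:-$(mktemp)}"
LINE="OPTIONAL_MANPATH /usr/central/man"

cp -p "$CONF" "$CONF.orig"           # backup before editing

# -x matches the whole line, -F treats the pattern as a fixed string.
grep -qxF "$LINE" "$CONF" || printf '%s\n' "$LINE" >> "$CONF"

# Running the same step again changes nothing.
grep -qxF "$LINE" "$CONF" || printf '%s\n' "$LINE" >> "$CONF"

grep -n "OPTIONAL_MANPATH" "$CONF"
```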
Great – open a new shell and try it out:
<code> man pdsh </code>
You can also do an
<code>echo $MANPATH</code>
but viewing a man page is so much more satisfying!
That’s it! You’ve set up a centralized area for your script library, compiled some software, and got it working on your machine. The fabulous part about doing this example with pdsh is that you can now write your own script that uses pdsh to set up the NFS mounts on all your machines, and then copy over their profile and manpath files so they point at your centralized locations, making it easy to run your own scripts across all your machines in the future.
Hey, there has to be some content left for part 2! In the meantime, check out
the "README" file and the "README.modules" in the pdsh source, and play with
it yourself. You can always use a non-destructive command, like hostname for
instance, to check out how pdsh is working for you. There are also some good
tidbits in the man page of pdsh to help you forward.
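For a first experiment along those lines, this sketch builds a pdsh command line from a flat file of machine names and prints it rather than running it. The host names are placeholders, and while `-w` (host list) and `-R` (remote command module) come from the pdsh man page, it is worth checking `man pdsh` for your version before relying on them.

```shell
#!/bin/sh
# Dry run: build a pdsh command line from a flat file of machine names.
# The host names are placeholders; the file is a scratch copy.
MACHINES_FILE="$(mktemp)"
cat > "$MACHINES_FILE" <<'EOF'
mac01.example.com
mac02.example.com
laptop01.example.com
EOF

# pdsh -w takes a comma-separated host list, so join the lines with commas.
HOSTS="$(paste -s -d, "$MACHINES_FILE")"

# Print the command instead of running it, so it can be reviewed first.
CMD="pdsh -R ssh -w $HOSTS hostname"
echo "$CMD"
```

Once the printed command looks right, paste it into a shell on a machine that has pdsh in its path and working ssh keys.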
What this error is essentially telling you is that you haven’t got your password-
less ssh keys set up. See Josh’s article on the Keys to the Door
of the SSH Tunnel and make sure you use a blank pass phrase when
setting this up.
Checking the modules README, have you got the requirements, and avoided
the conflicts?
Conflicts: misc/nodeattr, misc/machines
Requires: libgenders
I believe it’s saying your path /usr/local/lib/pdsh is insecure – check your
permissions, and see if you’ve relaxed any permissions in that path that it
may be complaining about.
Read the posts and it’s explained. It typically means that it doesn’t like the
mode or ownership of the directory.
—
Breaking my server to save yours.
Josh Wisenbaker
http://www.afp548.com