AFP548 – Stop Remediating While you Audit

Let’s talk about orchestration. The term means something different from just applying the normal set of configuration profiles you want near-permanently enforced on the workstations under your management. Too much theory isn’t necessarily helpful, but sometimes I come across something that feels right, and then experience validates it as a real, valuable practice to follow. My thesis: you should not effect a change through orchestration on the same pass as you check whether the change is necessary. Taking a step back, orchestration as a concept, in my own definition, means you’re leading the customers under your care through a change. Better practices of sysadminery include not having too rigid or monolithic a structure when piecing your workflow together. These concepts combine when I see folks talk about (potentially continuously) effecting changes without first, separately, discovering/evaluating/assessing what needs to change, or the impact that making the change will have.

[Image: Some men just want to watch the world burn]

Configuration management is a related concept. I don’t want to go off on a tangent spouting derivative fluff, but the separate moving parts of ‘checking-adding-removing’ in configuration management tools add to the debug-ability of a management process. Testing what the current state is, and having the actions performed separately, lets you benchmark each part individually as well.
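To make the split concrete, here is a minimal sketch in Python of an audit pass and a remediation pass kept apart. Everything in it is hypothetical: the `machine` dictionary stands in for real workstation state, and the check and fix names are invented for illustration. It is not how any particular management tool does it, only the shape of the idea: the audit records state (and how long each check took) without changing anything, and a later pass acts only on what was recorded.

```python
import json
import time
from typing import Callable, Dict, List

# Hypothetical in-memory stand-in for a workstation's settings.
# A real check would query the system instead of this dictionary.
machine = {"remote_management": False, "ntp_patched": True}


def audit(checks: Dict[str, Callable[[], bool]]) -> Dict[str, dict]:
    """Pass 1: record compliance and timing for each check. Changes nothing."""
    report = {}
    for name, check in checks.items():
        start = time.perf_counter()
        compliant = check()
        report[name] = {
            "compliant": compliant,
            "seconds": time.perf_counter() - start,
        }
    return report


def remediate(report: Dict[str, dict], fixes: Dict[str, Callable[[], None]]) -> List[str]:
    """Pass 2, triggered separately: act only on what the audit recorded."""
    applied = []
    for name, result in report.items():
        if not result["compliant"] and name in fixes:
            fixes[name]()
            applied.append(name)
    return applied


# Hypothetical checks and fixes, keyed by the same names.
checks = {
    "remote_management": lambda: machine["remote_management"],
    "ntp_patched": lambda: machine["ntp_patched"],
}
fixes = {
    "remote_management": lambda: machine.update(remote_management=True),
}

report = audit(checks)        # read-only, so it is safe to run and to time
saved = json.dumps(report)    # in practice this would go to your reporting system
applied = remediate(json.loads(saved), fixes)  # a later, separate pass
```

Because the report is plain data, it can be inspected, logged, or held for approval before anything is changed, and a second audit afterwards shows whether the fix actually took.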

So to bring this back to the thesis – if you’re running a process to enable ARD for your admin user(s) every 15 minutes, you’re wasting resources, and potentially causing instability if that happens to be the moment someone’s trying to connect over screensharing. (Same for blindly running Apple updates on a continuous basis without a QA and approval process, like the concepts of branches in Reposado and catalogs in Munki, but that’s just common sense.) Even for other recent security audit-type tasks, if you’re checking for the exact ntp or bash version to patch, or, a step further, simulating the benign examples of exploits available for the software, you shouldn’t go ahead and fix it in the same session.

(Although cool, doing exploit checks by exercising code paths on anything but a small cross-section of your customers would be processor-intensive and could possibly widen an attack surface, so discretion is advised there as well. It’s certainly an admirable practice when a fix is still not officially pushed from the vendor, and/or if the number of different exploits may still be unknown, to help gauge the effectiveness of a patch as it rolls out.)

If you’re using Casper, cascading between smart groups makes it easy to have a modular process, which comes in handy in case either the detection or the remediation/orchestration steps require tweaking. ‘Real’ configuration management tools have the modular parts built into their processes (in Puppet, the mechanisms are called ‘providers’ and the things they operate on are classified as ‘types’). There are folks who think it’s fine to enforce a state on every boot, and that’s more understandable, especially when it’s tied to the management system so it can be logged and reported on. That’s commonly for the more permanent (and usually well-tested and robust) parts of ‘enforcing’ a setup, not my definition of orchestration, so it’s easier to accept that remediation will occur on every boot.

But at that critical point when the security department calls or you need to lead your fleet through a change in services they rely on, please think about adding reliability and robustness to the process by building it modularly. Sometimes even your checks may have a cost in resources and will need to run on a randomized trigger to avoid the ‘stampeding herd’ effect on your services – if an orchestration task kills your network before it can apply a fix… well, now you have two problems. And seriously, please, where applicable, stop changing stuff without first, separately, checking its state. It’s good and it’s good for you. Agreed?
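As a footnote to the ‘stampeding herd’ point, one way to spread checks out is to give each machine its own offset inside a time window. The sketch below is only an illustration: `splay_seconds`, the host identifier, and the 900-second window are all assumptions, and it swaps a truly random delay for a stable one derived from a hash, so the same machine always lands on the same slot.

```python
import hashlib


def splay_seconds(host_id: str, window_seconds: int = 900) -> int:
    """Return a stable per-host delay in the range [0, window_seconds).

    host_id is a hypothetical identifier such as a serial number or hostname.
    Hashing it spreads hosts roughly evenly across the window.
    """
    digest = hashlib.sha256(host_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % window_seconds


# Hypothetical usage: wait this long after the trigger fires before checking.
delay = splay_seconds("mac-001")
```

A stable offset makes it easier to tell when a given machine should have checked in; if you would rather have the offset differ on every run, `random.uniform(0, window_seconds)` is the simpler alternative.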

Allister Banks

Allister lives in Japan, has not read the Slack scroll back, and therefore has no idea what is going on.

