AFP548 Site News April 30, 2020 at 8:51 am

Code Wranglers Speak In Tongues (Generally)

It is commonly said that Python tends to be at least the second-best tool to reach for when approaching a task, whereas some of us get by just fine with shell. The lack of a compilation step for scripting languages tends to help us iterate and ship good-enough solutions quickly, and I’ll be touching on specific style points for both later. Given how sysadmins commonly use either language, for glue code or for automating tasks, the guidelines I tend to focus on apply equally to both.

To even be able to look at the style used (instead of extracting the code from whatever prod system it’s running on…) we need to get it in a more share-able format at least. Although we’ve already touched on git, you don’t actually need to know proper care and feeding of a repo to let others see your code at whatever state it’s in; sites like Pastebin have been around forever, GitLab has a snippets concept, and GitHub gists can even accept multiple files in a single page. GitHub then tracks versions for you mimicking a simplified repo. Some of these sites accept anonymous posts, and some links can be shared ‘privately’ (for anyone with the link, not even limited to logged-in users with an account) without being discoverable/indexed by search engines – handy when wanting to share (ahem hopefully sanitized) log files or debug output, as a more general-purpose usage example.

It’s all downhill from here

Now before I go further, don’t get me wrong – my pet peeves are not important, and neither is my opinion. (Until it is, intuition is a real thing dedicated practitioners might develop over time. If at all, ever.) Don’t take it personally if you blissfully do the things I’m warning against, or ignore the things I’m advocating for in your code, even though I obviously consider these important enough to write (at length!) about. Maybe I’m just ‘owning the brand’ of being a snob/elitist, but as with anything in life, you either complete tasks where you only care about the goal/’it appears to do the thing’, or you show care about the process and let it influence how you operate (on code). All of this is for as much as you both find relevant, personally, and like, remember/take the time/effort to do. Real scientists publish ship.

But, except for XML, use spaces, not tabs. Or be wrong, up to you. :sarcmark:

Trust no one

Taking it from the top, she bangs! #!'s! Pay attention to how you’re finding an interpreter in whatever execution environment the code is intended to run in, especially if you’re pointing to a path that’s actually a symlink. (Like /bin/sh, which on many systems is really a link to bash or dash…) With python 3, using an env shebang is definitely a risky click if not exactly YOLOMODE. (Unless you’re in a venv, natch.) It’s more reliable to be as explicit/exact as you can; at the very least, explain what e.g. version of python and what 3rd party modules you’re expecting (if any) in comments or associated docs. Keep in mind when contributing to other projects that they have probably considered this fundamental piece carefully; sometimes you may actually not want to specify an interpreter, for example in ‘library’-like modules loaded by other code. Reasonable people can disagree, but it’s house/maintainer rules. Or you can go fork yourself. (That might sound aggressive, but I mean that you’re more than welcome to show the world how it’s done in your own repo – I’m happy as long as others can see how you’re solving a problem at all, if you’re kind enough to share.)
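To make the ‘be explicit about your interpreter’ point concrete, here is a minimal sketch of a script that checks at startup which Python it is running under, rather than only saying so in a comment. The shebang path, the `MIN_PYTHON` value, and the `require_python` helper are all assumptions for illustration; point them at whatever interpreter you actually ship or manage.

```python
#!/usr/bin/python3
# Assumption: an explicit interpreter path; adjust it to wherever the Python
# you actually ship or manage lives on your fleet.
import sys

# Hypothetical minimum version, for illustration only.
MIN_PYTHON = (3, 7)


def require_python(minimum=MIN_PYTHON, current=None):
    """Return the running version tuple, or raise if it is older than `minimum`."""
    current = tuple(current or sys.version_info[:3])
    if current < tuple(minimum):
        raise RuntimeError(
            "need Python %s+, but %s is running %s"
            % (
                ".".join(map(str, minimum)),
                sys.executable,
                ".".join(map(str, current)),
            )
        )
    return current


if __name__ == "__main__":
    # Fail early and loudly instead of halfway through the real work.
    print("running", sys.executable, require_python())
```

Failing at the top of the script is a design choice: a clear ‘wrong interpreter’ error is easier to debug than a syntax error or missing module further down.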

Continuing with not trusting PATHs, anytime you’re essentially shelling out or leveraging a binary, when I’m the reviewer, I’m going to expect you to use the full path. (Again, watch for symlinks: if you weren’t already aware, /var, /tmp, and /etc are symlinks into /private on macOS.) This is definitely a style point, but instead of assuming the contents of a defined path are what you expect (or worse, assuming that e.g. launchd or the agent running this code knows which paths/lookup order has the version you need), doing this is (again) potentially better/more defensive by being explicit/exact. With Apple SIP-protecting some defined paths you may find it less important, and that’s ok, but they’re also requiring us to ship our own interpreters more and more. (Targeting a SIP-protected path in your shebang is probably good practice for as long as those paths can be relied on :sweat_smile:)
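As a sketch of what ‘use the full path’ can look like when shelling out from Python, the hypothetical `run_absolute` wrapper below refuses to run anything that would need a PATH lookup. The helper name and its behaviour are my assumptions, not an established convention.

```python
import os
import subprocess
import sys


def run_absolute(argv, **kwargs):
    """Run argv only if argv[0] is an absolute path to an executable file."""
    binary = argv[0]
    if not os.path.isabs(binary):
        raise ValueError("refusing to search PATH for %r; use a full path" % binary)
    if not os.access(binary, os.X_OK):
        raise FileNotFoundError("%s is missing or not executable" % binary)
    # check=True turns a non-zero exit into an exception instead of a silent failure.
    return subprocess.run(argv, check=True, capture_output=True, text=True, **kwargs)


if __name__ == "__main__":
    # sys.executable is the path of the interpreter running this script,
    # used here only so the sketch runs anywhere.
    result = run_absolute([sys.executable, "-c", "print('hello')"])
    print(result.stdout.strip())
```

Passing the command as a list (and never `shell=True`) also means no shell gets a chance to reinterpret the arguments.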

I usually check for the complete path with which <binaryname>, but also consider the output of type <binaryname> – if it’s not a shell builtin, it may not be portable to *nix, on the off chance that’s important for you to know/handle. And don’t take it for granted that Apple won’t move stuff around or delete it on a whim/in a point release. (Yes it was a major OS release, but we all recall how telnet got pulled.)
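The same lookup can be sketched from Python: `shutil.which` searches PATH much like `which` does, and `os.path.realpath` shows where any symlink actually lands. The `resolve_binary` name is hypothetical, and note that this only searches PATH, so it will not tell you about shell builtins the way `type` does.

```python
import os
import shutil
import sys


def resolve_binary(name):
    """Return (path_found, real_path) for `name`, or None if it is not found."""
    found = shutil.which(name)
    if found is None:
        return None
    # realpath follows symlinks so you can see what would actually run.
    return found, os.path.realpath(found)


if __name__ == "__main__":
    # Pass a binary name on the command line; defaults to "sh" for illustration.
    target = sys.argv[1] if len(sys.argv) > 1 else "sh"
    print(target, "->", resolve_binary(target))
```

Whatever path this reports is worth writing down in the script itself, since it is exactly the sort of thing that moves between OS releases.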

If you’re talking to an API or expecting a specific schema for a data structure your code parses, consider putting your assumptions about it in comments/docstrings; a sanitized example is even more straightforward. That leaves a paper trail/gives context for why you operated on it in a specific way (and what was valid input to your program at some point in time), with the bonus feature that people can take that mockup and replicate your experience/confirm your assumptions (even without access to the same system). The same goes for links to API docs or other references that were used to understand the system being interacted with. It’s ideal if someone other than just the creator can follow the references and see the ‘shadow’ inputs operated on by the design; this really shortens the ramp-up time needed to reason about whether the decisions made were the best way to approach the problem.
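A minimal sketch of documenting expected input right next to the code that parses it might look like the following. The record shape, field names, and `summarize_device` function are entirely made up for illustration and do not describe any real API.

```python
def summarize_device(record):
    """Return 'serial: os_version' for one device record.

    Assumed input shape (sanitized, hypothetical example, not a real API):

        {
            "serial": "C02EXAMPLE00",
            "os_version": "10.15.4",
            "groups": ["staff", "laptops"]
        }

    Only `serial` and `os_version` are required; everything else is ignored.
    """
    missing = [key for key in ("serial", "os_version") if key not in record]
    if missing:
        # Name the broken assumption instead of failing with a bare KeyError later.
        raise KeyError("record is missing expected keys: %s" % ", ".join(missing))
    return "%s: %s" % (record["serial"], record["os_version"])
```

The mock record in the docstring doubles as a test fixture: anyone can paste it in and confirm the function still agrees with what the author believed the input looked like.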

Easiest computer to throw out a window: yours (ideally into the ocean)

You don’t run random code from the internet without at least attempting to confirm it does what you need, I hope/presume? The same goes with boilerplate/scaffolding/auto-generated stuff that bootstrapping tools may ‘crap up’ your repo with. Some of it can certainly be helpful when you’re getting started, but it’s all code you’ll end up owning, and you want to strike a good balance of ‘I think I have to include at least this amount of bloat to ship’ with ‘now that I understand enough of the moving parts, let’s strip this back to focus on what we actually need/use’. Convention is a good thing to follow when collaborating with a larger community that imposes boilerplate on us or wants us to grapple with what they consider best practices, but otherwise let’s leave it as close to brass tacks as possible so we don’t have trouble remembering where we actually started making the code do things. Unused functions that are mostly stubbed out or exercise nothing should be purged until activated. Ideally, everything we commit to git should have a reason to exist that we can explain.
To mix all the metaphors, code both rots and is flammable, don’t have too much of it in one place. Teach a person to code, they’re on fire for a lifetime, amirite?

As part of these style nitpicks, this may not be as valuable/important to you/your team, but consider removing extra/extraneous comments, verbose manual logging, or debug echo/print() statements (unless gated by arguments/a flag and/or logging level) – ideally debug would stick around in something like integration tests that help you validate assumptions/confirm serviceability, but elsewhere in your code it could leak secrets or otherwise add unnecessary overhead/bloat to sift through when put into service/deployed to ‘prod’.
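One way to keep debug output around without shipping it switched on is to gate it behind a flag that sets the log level. This is a sketch using the standard library’s argparse and logging; the `--debug` flag name and the "example" logger name are assumptions.

```python
import argparse
import logging

log = logging.getLogger("example")  # hypothetical logger name


def configure_logging(argv=None):
    """Parse a --debug flag and set the log level accordingly."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--debug", action="store_true", help="emit debug output")
    args = parser.parse_args(argv)
    level = logging.DEBUG if args.debug else logging.INFO
    logging.basicConfig(level=level)
    log.setLevel(level)
    return level


def do_work():
    """Stand-in for real work; the debug line is silent unless --debug was given."""
    log.debug("intermediate state that would be noise in prod")
    log.info("finished work")
```

Calling `configure_logging()` with no arguments reads the real command line, so running the script with `--debug` turns the extra output on and leaving the flag off keeps it quiet.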

Can’t believe I have to say this

I think besides being overtly… ‘particular’/borderline-cargo-culty in points above, I’ve been a benevolent rubocop dictator. (All the metaphors!) Time for bad cop.
I don’t know who needs to hear this (jk, you unconscionable no-good-niks, you know who you are)…

Log.

It’s usually not hard and will save your butt later, it’s certainly cheaper and more practical than a time machine or ‘god mode’. Put whatever output you generate somewhere discoverable, e.g. in /private/var/log for system-wide or ~/Library/Logs, and check out the python and bash follow-up post for examples. If you then have those logs shipped/aggregated, know what value it will actually provide to put stuff at different ‘levels’ – ‘info’, ‘warning’, ‘error’, ‘debug’, etc., are those labels too numerous and too open to interpretation for your team? Maybe just assume only qualified individuals even look at/read this stuff anyway, so shove anything helpful in there and let your log ingest system discard whatever you don’t need, maybe? Consider writing json blobs if appropriate (or some other parseable format, just throw us a bone with UTC timestamps and delimiters or something, especially if you want to quickly turn the corner from shipped data to metrics). Got state you want to track locally? Consider shoving stuff into a database format, primarily if you are generating data that would benefit from ledger-style operations. Just leave a record somewhere, optimized for retrieval!
For more ‘vanilla’ logs, rotate and purge using system-native facilities (logrotate for Linux, apple system logging conf’s on Mac, etc. Don’t be like vanilla Docker.) Or don’t purge! If you don’t think it’ll be more than a couple hundred MBs in your average computer lifespan, maybe who cares, maybe disk is cheap and plentiful enough and this historical data local is helpful to have? Hey, if you break it you get to keep the pieces. (Or Apple will just helpfully dump it for you on OS point upgrades, surprise!)
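For the ‘JSON blobs with UTC timestamps’ idea, a minimal sketch with the standard logging module could look like this. The `JsonLineFormatter` and `build_logger` names and the field names in the payload are my own choices, not a standard; pick whatever your log ingest system expects.

```python
import json
import logging
from datetime import datetime, timezone


class JsonLineFormatter(logging.Formatter):
    """Format each record as one JSON object per line with a UTC timestamp."""

    def format(self, record):
        payload = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload, sort_keys=True)


def build_logger(name, path):
    """Return a logger that appends JSON lines to the file at `path`."""
    logger = logging.getLogger(name)
    handler = logging.FileHandler(path)
    handler.setFormatter(JsonLineFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

Pointing `path` at something under ~/Library/Logs (or /private/var/log for system-wide tools) keeps the output discoverable, and one object per line means anything that can read JSON can turn it into metrics.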

Continuing, ffs, go out of your way to trust (and confirm your code validates) TLS certs when transiting the series of tubes. It’s worth it not just because it encrypts bits on the move, it also assists in validating the host you’re talking to. Another side effect is, when it’s wrong and due to our inspection it gets fixed, we’re being part of the solution. Calling protecting people’s privacy a side effect is almost forgetting the oath we all took as sysadmins.
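In Python’s standard library, certificate and hostname validation are on by default when you use `ssl.create_default_context()`; the usual failure is someone switching them off ‘temporarily’. The sketch below makes that expectation explicit. The `verified_context` and `fetch` helpers are hypothetical names.

```python
import ssl
import urllib.request


def verified_context():
    """Return an SSL context that verifies the cert chain and the hostname."""
    context = ssl.create_default_context()
    # These are already the defaults; asserting them guards against someone
    # disabling verification and forgetting to turn it back on.
    assert context.verify_mode == ssl.CERT_REQUIRED
    assert context.check_hostname is True
    return context


def fetch(url, timeout=10):
    """Fetch an https URL, failing loudly on certificate problems."""
    if not url.startswith("https://"):
        raise ValueError("refusing non-https URL: %s" % url)
    with urllib.request.urlopen(
        url, timeout=timeout, context=verified_context()
    ) as response:
        return response.read()
```

If a cert fails validation, the better outcome is an exception that gets someone to fix the cert, not a flag that hides the problem.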

Are you pulling random dookie over the network to then pass to processes running as root? Checksum it. It’s literally the least you could do. Don’t let you and me get joined at the MacMule. Otherwise you may be able to join that startup working on Privesc-as-a-Service.
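A minimal checksum sketch, assuming you pinned a known-good SHA-256 ahead of time and obtained it through some channel other than the download itself. The `sha256_of` and `verify_download` names are hypothetical.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large installers do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_hex):
    """Raise if the file does not match the checksum you pinned ahead of time."""
    actual = sha256_of(path)
    if not hmac.compare_digest(actual, expected_hex.lower()):
        raise ValueError("checksum mismatch for %s: got %s" % (path, actual))
    return actual
```

Call `verify_download` before handing the file to anything running as root, and treat a mismatch as a hard stop rather than a warning.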

And finally, to wrap up Things You Should Already Know: don’t put api keys, passwords, or secrets-in-general in plaintext or trivially-reversible formats. Options are available, but for crissakes obfuscation shouldn’t be one of them – truffle hogs look for that stuff, you’re not being clever, herr hax0r. Private repo’s are not an excuse, what happens in obscurity ends up dehydrated on a casino rooftop in Vegas. Environment variables? Decoupling a credentials file? Those are (arguably) Less Turrible, but Still Not Great™, but again I’m not in security and can’t tell what your threat model is (although hopefully it’s not Mossad). Is SAML hard? Yes. Should we all find ways we should integrate with it when appropriate? It’s often the only game in town, take yer lumps. Using libraries written/evaluated by crypto pro’s? Yes pls. I’d add that PKI (e.g. client certs signed by a trusted CA) is another potential avenue to handle authentication/identification which you can therefore leverage as an ingredient when at least authorizing access, but don’t listen to me! I’m a rando on the interwebs. (They says, ~1800 words later.)
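As a sketch of the ‘arguably less terrible’ environment-variable option only (not a recommendation for any particular threat model), the code below reads a secret at runtime and refuses to continue without it, so nothing sensitive ever lands in the repo. The `get_secret` helper and the `EXAMPLE_API_TOKEN` variable name are hypothetical.

```python
import os


def get_secret(name, environ=None):
    """Return the secret stored in environment variable `name`, or raise."""
    environ = os.environ if environ is None else environ
    value = environ.get(name, "")
    if not value:
        # No hardcoded fallback: a missing secret should stop the run.
        raise RuntimeError("%s is not set; refusing to fall back to a default" % name)
    return value


if __name__ == "__main__":
    try:
        token = get_secret("EXAMPLE_API_TOKEN")  # hypothetical variable name
        print("token loaded (%d characters)" % len(token))  # never print the value
    except RuntimeError as error:
        print(error)
```

The missing hardcoded default is deliberate: a fallback value in source is just a plaintext secret with extra steps.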

Tune in for our next installment where we get into the actual programming language slingin’ specifics!
Got other general coding ‘best practices’ thoughts? Use your tweet horn to holler at me!

Allister Banks

Allister lives in Japan, has not read the Slack scroll back, and therefore has no idea what is going on.
