The (Web) Skills of Leonardo da Vinci

For the craic: if Leonardo da Vinci were a software developer sending his cover letter to that hot new startup. Adapted from Letters of Note: The Skills of Leonardo da Vinci.

My Most Illustrious Lord,

Having now sufficiently seen and considered the achievements of all those who count themselves masters and artificers of software, and having noted that the invention and performance of the said software is in no way different from that in common usage, I shall endeavour, while intending no discredit to anyone else, to make myself understood to Your Excellency for the purpose of unfolding to you my secrets, and thereafter offering them at your complete disposal, and when the time is right bringing into effective operation all those things which are in part briefly listed below:

1. I have plans for very light, strong and easily portable libraries with which to supplement and, on some occasions, replace existing software, and others, sturdy and indestructible either by hackers or at scale, easy and convenient to extend and maintain. Also means of exploiting and overloading those of competitors.

2. I know how, in the course of an ongoing software project, to remove features from the roadmap and how to make an infinite number of patches, duct tape fixes and scalable solutions and other changes necessary to such an enterprise.

3. Also, if one cannot, when engaged in a project, proceed by agile methodologies either because of the nature of the project or the recalcitrance of the developers, I have methods for destroying every protest or other objection unless it has been founded upon a genuine concern or so forth.

4. I have also types of shell scripts, most convenient and easily portable, with which to curl small websites almost like a hail-storm; and the traffic from the scripts will instill a great fear in competitors on account of the grave damage and confusion.

5. Also, I have means of arriving at deadlines through ugly hacks and spaghetti code constructed completely without documentation, even if it should be necessary to commit directly to master.

6. Also, I will make root kits, safe and unassailable, which will penetrate competitors and their firewalls, and there is no host of neckbeards so great that they would not break through it. And behind these the script kiddies will be able to follow, quite uninjured and unimpeded.

7. Also, should the need arise, I will make domain specific languages of very beautiful and functional design that are quite out of the ordinary.

8. Where the use of DSLs is impracticable, I will assemble modular, documented, dependency-free command line tools and other instruments of wonderful efficiency not in general use. In short, as the variety of circumstances dictate, I will make an infinite number of items for attack and defense.

9. And should a rewrite be occasioned, I have examples of many arguments which are highly suitable either in attack or defense, and code which will resist the fire of all the heaviest critics and company lifers.

10. In times of Java development I believe I can give as complete satisfaction as any other in the field of architecture, and the construction of both public and private methods, and in conducting business logic from one place to another.

Also I can execute scripts in Python, Perl and Ruby. Likewise in web design, I can do everything possible as well as any other, whosoever he may be.

Moreover, work could be undertaken on horse_ebooks which will be to the immortal glory and eternal honour of the auspicious memory of His Lordship your father, and of the illustrious house of Internet.

And if any of the above-mentioned things seem impossible or impracticable to anyone, I am most readily disposed to demonstrate them in your local coffee house or in whatsoever place shall please Your Excellency, to whom I commend myself with all possible humility.

Posted in Miscellaneous | Leave a comment

Pontoon – A Digital Ocean CLI (and library) in Python

Digital Ocean has been an excellent way for me to spin up test Ubuntu VMs for testing my expanding collection of PPAs, build scripts, and various other bits and pieces.

While I do stack testing with kitchen, chef, vagrant, and frequently AWS, it is really nifty to be able to boot a fresh VM on the cheap with Digital Ocean, test, nuke, and repeat.
The 512MB Droplet boots more quickly and feels much nippier than its EC2 equivalent (micro), and can actually be a bit more convenient than a local VM because it doesn’t take up a CPU core or local disk space.

I wasn’t really happy with the available options for CLI tools; they were either not featureful enough or required too much mental juggling of implementation details (like ID numbers), so I’ve written Pontoon.

One of the primary goals of Pontoon was that it be written for human consumption first. To me, this means safe defaults, intuitive usage and the option to do more complex things when required.

As a result, Pontoon is also a library, so more complex tasks can be hacked right into the tool if desired, or combined with others. The README is the primary source of documentation, though there is additional API documentation available.

Some quick examples of usage:

Creating a new Droplet:

$ pontoon droplet create foobar
Creating Droplet foobar (512MB using Ubuntu 12.04 x64 in Amsterdam 1)...

Listing available Droplets:

$ pontoon droplet list
foobar:         (512MB, Ubuntu 12.04 x64, Amsterdam 1, active)

SSHing into a Droplet:

$ pontoon droplet ssh foobar
Welcome to Ubuntu 12.04 LTS (GNU/Linux 3.2.0-23-virtual x86_64)
 * Documentation:
Last login: Fri May  3 18:23:56 2013

This is also the first time it’s being published for use by anyone other than myself, so, while I’ve gone to some length to ensure it is well tested, it is most certainly still in development.

Posted in Code | Comments closed

Pivoting yourself

In the startup world, it’s common to talk about “pivoting” a business; that is to say, taking what you are doing and refocusing on some aspect of the opportunity you hadn’t spent a lot of time on before. We aren’t all founders, though.

Today, coders have a lot of options when it comes to what they work on, but the golden ticket that companies like Github, Facebook, etc, offer, is the opportunity to work on something you use yourself. Working on your own tools, scratching your own itch, is the magic cheat sauce for motivation – it’s what compelled me to work long hours on and not mind being on call 24/7. It’s what compels open source authors to do what they do, and drags lots of hopefuls into the meatgrinder of the game development industry.

The obvious problem, of course, is that there aren’t a lot of these jobs to go around; certainly not well paid, or without crazy hours. Doing what you love might require starting a company. I think this is a huge missed opportunity for a lot of developers – they create their own career “filter bubble,” excluding themselves from potentially valuable opportunities, stuff they might love if they gave it half a chance.

I recently received a lovely and unexpected compliment at a friend’s barbecue. I was chatting with him while he cooked, and went to put my mug down on the side table of the barbecue; I thought, “this is an awkward place to put this mug if he needs to stick a plate there or something,” so quickly moved it off elsewhere. This wouldn’t have been a notable interaction other than that he then observed, “you know this is why you’re great – you’re considerate. Other people wouldn’t have given that a second thought; you realized I’d need to use that space.”

For me, this sort of crystallized how I approach software development. I’m deeply reliant on the concept of putting myself in another person’s position. It’s not exactly an engineering approach, which I would see as more detached, falsifiable, professional; something I constantly strive for and only ever seem to have moderate success with.

I have to believe in what I’m doing. This can be viewed as both a strength and a weakness; a strength because it empowers me to reason about the software I’m building in a very natural way, a weakness because it means that when I can’t find that internal motivation, I’m like a ship without a compass; I become lethargic, or worse, feckless. Building a product you don’t understand is fine if someone else understands it and can communicate that effectively, but it’s crazy if nobody understands.

Putting myself in someone else’s position is not always as simple as being a conscientious barbecue guest. Sometimes it requires a lot of hard work, a lot of talking, a lot of banging my head against the wall. I think that’s how it works for most people; if you don’t have the problem yourself, you have trouble grasping the requirements of the solution. In software, this can mean the developer forcing their incomplete interpretation of the problem on to the solution, or just not solving the problem at all, exasperating everyone involved.

If you can acquire the patience and empathy to understand other people’s problems, you unlock the master key for your career as a software developer. Most of the software development community’s attention is clustered around this 1-5% of jobs that happen to align with our collective interests, but there’s a whole world of opportunity dying for attention in the other 95%.

Make other people’s problems your problems.

Posted in Programming | Comments closed

Musical floppy drive maintenance –

I was going over journal notes from last year, and spotted my scribbles for working out MIDI notes from corresponding standard musical notation (C1, C#, etc). Maybe my Google-fu was weak, but I couldn’t find anything that described, in algorithmic terms, how to convert between these two systems at the time.
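For the curious, the arithmetic at the heart of that conversion is compact. Here’s a minimal Python sketch (the function names are mine, and it assumes the common convention where middle C, “C4”, is MIDI note 60; octave numbering varies between devices and notation systems):

```python
# Semitone offsets within an octave, sharps only for brevity.
SEMITONES = {"C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
             "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11}

def note_to_midi(note):
    """Convert notation like "A4" or "F#3" to a MIDI note number.

    Each octave spans 12 semitones, and under the C4 = 60 convention
    octave n starts at 12 * (n + 1).
    """
    name, octave = note[:-1], int(note[-1])
    return 12 * (octave + 1) + SEMITONES[name]

def midi_to_note(number):
    """The inverse: 69 -> "A4"."""
    octave, semitone = divmod(number, 12)
    names = {offset: name for name, offset in SEMITONES.items()}
    return "%s%d" % (names[semitone], octave - 1)
```

So A4 (concert pitch, 440Hz) comes out as 69, and C4 as 60.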

Beyond some brief flirtations with learning guitar in my teens, I’ve never had much of a musical education, so it’s likely that this is just old news to folks who do. In any case, it was a bit of fun to work out.

I was using it to help “debug” the drives I used for my musical floppy drives project last year. It’s obvious when you think about it, but due to each drive’s history and construction each produces a different range and timbre. This means when you’re making tunes, you get drives which work better as sort of “baritones” or “sopranos”, if you catch my drift.

I’d hook up the drives before playing anything, and run through each note in the desired range, and adjust the drives accordingly. Drives can start to wear down a bit when you’re using them for a purpose they were never intended for, so their ranges change; again, useful to test beforehand.

Another angle on this is that MIDI is a really weird old format, with various implementations of the spec at different times, and created for different target devices (an array of floppy drives was not an early contender, unsurprisingly).

I thought it would be fun to write code to create some of the songs I wanted to reproduce. Since “output to floppy drive” is a definite reduction in fidelity, I thought it wouldn’t be too much hassle, and would save me basically learning how to play piano. Someone else clearly thought this would be a good idea too, and created MIDIUtil to aid in doing so.

Reproducing songs turned out to be somewhat tricky (to put it mildly). I did, however, use it to help produce the tests for my drives. I wanted to start hooking up microphones and run an automated test and calibration suite on the drives, but at this point it was becoming a bit of an obsession and I decided to take my foot off the gas for a while.

I never ended up doing all I wanted to with the code, but I’ve now open sourced the few bits and pieces I did write. Of particular interest, to me anyway, is the implementation of the algorithm for converting from standard notation to pitches. Might be useful to someone else!

Posted in Code | Comments closed

ArchiveTeam Yahoo Messages Followup: Success

The response to my post last week was amazing. As I write this, there are nine items left processing out of what ended up being more than 200,000 – a mixture of groups of threads and forum pages.

For some perspective, when I posted that there were only eight days left, about 5,000 items had been processed in the five preceding days. Within 24 hours, more than 160,000 items had been processed, as various individuals and companies (including AirBnB engineering) spun up hundreds of instances to help tackle the task.


I had dropped off to sleep, exhausted, after spending all Friday night mucking around with the portion of the code responsible for the Yahoo Messages job, building the AMIs, writing a blog post and submitting it to Hacker News.

By the time I woke up, six hours later, it was on the front page and people were getting involved. I spent the rest of the day helping people get up and running along with others in the IRC channel, eventually making the AMI available in other regions (using the recently released AMI copy functionality).

I also ended up writing a tool to help people trim down the many spot instances they had launched (several hundred each), slayer (using boto and ansible). There turned out to be a long tail of very large threads, no more than a thousand, that have taken the last five days to slowly work down.
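The instance-picking half of that wind-down logic is simple enough to sketch. This is a hypothetical stand-in (the function name and interface are mine, not slayer’s): given your running instances and the number of work items left, keep just enough workers and return the oldest surplus ones, whose IDs the caller would then feed to boto for termination.

```python
def instances_to_slay(instances, items_remaining, items_per_instance=1):
    """instances: list of (instance_id, launch_time) pairs.

    Returns the IDs of the oldest surplus instances, keeping just
    enough to cover the remaining workload.
    """
    # ceil(items_remaining / items_per_instance) without floats
    needed = -(-items_remaining // items_per_instance)
    surplus = max(0, len(instances) - needed)
    oldest_first = sorted(instances, key=lambda pair: pair[1])
    return [instance_id for instance_id, _ in oldest_first[:surplus]]
```

With one item left, a three-instance fleet gets trimmed down to its single newest member.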

Parts of the archive are already being pushed to (though not in an immediately browsable format); you can see updates here.

I hadn’t been involved with the Archive Team before this, just a long time appreciator of the work they’ve done – it was supremely gratifying to be able to give back, and there are plans for spot instances to play a much bigger role in future Archive Team endeavours.

Thanks to all who got involved or spread the word, and major thanks to Jason Scott and the rest of the Archive Team who have been tirelessly saving our history from before we knew that was what we were losing.

If you’d like to get involved, check out, the excellent Tracker, and join us in #ArchiveTeam on EFnet.

Posted in Code | Comments closed

Shell blindness

I wonder how many programmers realize just how weird shell scripting is compared to pretty much every other programming activity one might engage in. I hadn’t really thought on it too deeply before, but I started digging into shell more deeply several months ago. Previously, I had been reasoning about it as just sort of an archaic programming language, and that usually got me where I needed to go, though usually with some frustration along the way.


The thing is, shell scripting really only bears a passing resemblance to general purpose programming languages. It has control flow, variables and arrays, but that’s about it. You can’t compile it. It invokes system programs by design – anything longer than a few lines is probably non-deterministic. Any individual executed command’s output may change drastically at any moment (through updates, redirections, etc). As a result of that, you can’t reasonably model its behaviour; you can only test it. Like tapping commands into the prompt directly, it is interpreted line by line. The OS will change, bits and pieces will move around – the environment changes piecemeal.

Different shells have different, completely (or worse, subtly) incompatible syntax. There’s a standard that remains unimplemented (unimplementable? undesirable?) despite valiant efforts, and a smattering of documents and tools in the OS building community to help navigate this minefield.

Despite this, shell scripts are the glue of modern *nix operating systems, deeply embedded in bootstrapping and service management. The POSIX specification (and man pages of every shell) actually spells it out quite clearly: the shell is a command language interpreter.

It’s a little embarrassing to admit that, despite considering myself 50/50 on the developer/sysadmin front, I’ve spent most of my career only using enough shell to bootstrap myself into a higher level language. Sometimes this has certainly been necessary, but I suspect a percentage of it would have been better (or just as well) served by pure shell. I guess it’s fair to call it a sort of schlep blindness – shell blindness.

“The most dangerous thing about our dislike of schleps is that much of it is unconscious. Your unconscious won’t even let you see ideas that involve painful schleps. That’s schlep blindness.” – Paul Graham, “Schlep Blindness”

Wikipedia actually posits something about shell scripting which is interesting: “Shell scripts often serve as an initial stage in software development.” To people used to building applications with frameworks like rails, this sounds strange. In fact, I have encountered relatively few developers who either really understand, or enjoy working in, shell.

It’s interesting to think that there are probably a swathe of programs one could write in carefully crafted shell that would work, without dependencies, across a wide variety of unix-style operating systems, and be compatible with systems stretching decades into the past.

To that end, I wrote shlint (shell lint) to try and help me get a better grasp on how to write portable shell.

Github statistics – while not representative of the entire software development world, certainly a reasonable representation of the developers I’ve worked with – suggest that shell is actually the fifth most used language, with 8% of the code hosted on Github being shell. That’s not inconsiderable – it’d be interesting to know how much of it is duplicated shell boilerplate/libraries of one sort or another.

Ryan Tomayko provides an excellent introduction to the world of shell programming in this talk. Definitely worth watching if you’re not someone who regularly writes shell.

Posted in Code | Comments closed

ArchiveTeam + Yahoo Messages Shuttering + EC2 Spot Instances = MegaCrawl

Update (24th March) Stop your engines! The response has been amazing, in 24 hours we’ve managed to crawl almost everything, and are processing the final few batches now! No need for more instances!

I’ve created a tool to help people who decided to fire up a whole bunch of spot instances slowly trim down their cluster as the workload winds down. Check out Slayer.

See the follow-up post.

Original post:

You may have missed the news that Yahoo is shuttering its old message boards, taking a huge amount of Internet history with it. Old news, maybe, but Internet historians (or people’s family, friends, or just interested parties) should have that data available to them in future.

Here’s where the Archive Team, fronted by Jason Scott (of comes to the rescue.

In this instance, the Yahoo Message Boards are shutting down in just eight days. There’s been very little notice and it’s become a race to try and get the entire history of the boards before they’re wiped from the Internet. The Archive Team have supplied a virtual appliance for use with Virtualbox, VMware, etc, as a sort of folding@home style distributed system. Unfortunately Yahoo is rate limiting, which is making for slow progress even with all those who are helping.

Over at the Archive Tracker, you can see the real time progress of the archival process.

If you can afford to throw a couple of dollars at this and have an AWS account, this is a good opportunity to experiment with spot instances on EC2 – I did.
I set it up to use micros at a cost of $0.005/hour, which is a good chunk below the standard on demand price for a micro, already quite low. I’m rocketing my way up the rankings right now (duggan on the leaderboard.)

To make it super easy, I’ve thrown together a public AMI (search for 149682410612 or see AMI id list below) which takes a username in the userdata field (if blank it’ll default to “hackernews”).
It’s just a basic Alestic Ubuntu 12.04 LTS image with some short installation scripts.

Update: due to demand, I’ve made the image available in all other regions:

N. Virginia: ami-2400984d
Ireland: ami-d8d2d8ac
Tokyo: ami-a361e1a2
Singapore: ami-6e703c3c
Sydney: ami-4e0e9f74
Sao Paulo: ami-9d7aa180
N. California: ami-94f6dbd1
Oregon: ami-cf9206ff

All will be called “ArchiveTeam Warrior Yahoo Messages” under account 149682410612 (you can also search for this number in the AMI screen). This means that you can now run a max of about 800 spot instances per account (max of 100 per region) without going to Amazon to look for more.

The script for setting up the image (without the scrubbing of history and private keys) can be found here, if you’d rather not trust my AMI :) It’s not complex.

Help save a bit of web history!

Note: The reason I’m advocating using spot instances to get around the rate limiting is that I don’t believe the rate limiting is intentional on Yahoo’s part (in the context of Archive Team’s cause), just bureaucratic slowness in removing anti-spam measures. Hopefully they’ll get on the case next week, but until then, we should do what we can.

Edit: conroy on Hacker News has provided this little snippet of Python for those who use boto:

import boto.ec2

conn = boto.ec2.connect_to_region("us-east-1")
conn.request_spot_instances('0.005', 'ami-2400984d',
                            instance_type='t1.micro', user_data='USERNAME')

In the comments, Ian McEwan has put together a quick and dirty guide for the AWS uninitiated:

a.) the AMI is only in us-east, as far as I can tell
b.) once you have an AWS account, go to the dashboard and to “AMIs” in the sidebar
c.) search “public images” and “all platforms”, wait a while for it to actually finish searching, filter to ‘warrior’, and choose the one that matches the number here
d.) click ‘spot request’ button, fill in form with price/etc. I’d recommend turning “Persistent” on and setting an end date of April 1.
e.) click through and do whatever bits it asks, mostly you don’t need to care
f.) profit

Posted in Technology | Comments closed

Programming is not my idea of “fun”

My motivation for learning to code was that I wanted to build things; not just anything, either, stuff that solved a problem I or someone else had. I don’t necessarily enjoy programming for the sake of programming, which seems to be unusual amongst a lot of developers I talk to.

Which is not to say I don’t like what I do. A lot of people seem to describe programming as “fun” (anecdotally, the most cited reason I’ve seen for writing ruby, a language I’ve been spending a lot of time with in the last few months).

“Fun”, to me, describes entertainment, recreation; like listening to music, watching a film or going quad biking.

Sometimes it’s fun, but that’s sort of a side-effect of being able to execute well. I pride myself on the overall elegance of the craft – being able to deliver a good, well-defined balance on the scope/cost/schedule axes, quickly get used to a new tool, or know when to toss out an oft-used one, etc.

That might mean a lot of frustration along the path to that point. It might mean knowingly incurring technical debt. It might mean working in a language or technology I’m unfamiliar with, doing something tedious-but-necessary, or working on a timescale I find uncomfortably tight.

These are challenges that make me a better developer. They take me out of my comfort zone. I’d like to think it helps me be more open, objective and skeptical (without being cynical).

Posted in Code | Comments closed

Naming things

I’ve been trying to articulate something which has been bugging me recently – the way we name software.

There’s a well known quote attributed to Phil Karlton on the subject:

“There are only two hard things in Computer Science: cache invalidation and naming things.”

Naming is hard, but there are some traditions and patterns we can use to avoid bad practice and confusion.

First, I’d like to draw a distinction between naming software and the already well established practice of using stylistic conventions in code. What I’m talking about is more akin to what title you’d give a book than how you’d structure the text inside.

Some observations

Naming software is, of course, a subset of general nomenclature, quite a complicated (and subjective) field. I think, though, that we can narrow down how we currently name software to a few general models: abbreviation, acronym, noun, proper name and metaphor.

There are plenty of exceptions to each of these, but I think it’s a useful way of understanding usage.

Abbreviation
Examples: grep, ls, man
Attributes: Tends to be used for single-purpose tools. Name encapsulates utility.

Acronym
Examples: JSON, HTTP, TCP/IP
Attributes: Used in protocols, standard libraries and aspirant standard tools. Names are composed of standard technical terminology when unrolled.

Noun
Examples: requests, hash, ping
Attributes: Like acronyms, used for standards, canonical resources and aspirants. Name is standard technical terminology.

Proper name
Examples: Ubuntu, Skype, git
Attributes: Product name. Can encapsulate a lot of disparate functionality and verbiage. Name doesn’t directly imply utility.

“A word that answers the purpose of showing what thing it is that we are talking about but not of telling anything about it.” (Wikipedia) Ubuntu, for example, is an African word meaning “humanity to others” but for all intents and purposes in computing, it refers to the Ubuntu operating system. Skype is similar. Git is another. They are names which do not convey utility in anything but the most abstract sense.

Metaphor
Examples: Homebrew, Parody, Chef
Attributes: Based on abstracted analogous relations. Uses real world objects to imply internal (and sometimes external) relationships. May allow one to infer function based on comprehension of the metaphor.

Metaphor involves taking the abstract concepts from a piece of software, finding commonalities with otherwise unrelated “real world” or pre-existing systems, and using these pre-existing terms as a scaffold for fleshing out the abstract software concepts. This is fundamental to userland software – the “desktop” and associated documents, folders, etc, is probably the best known metaphor in computing.

The problem

Metaphors bug me. They bug me because they seem so elegant in their first draft, and that sort of elegance appeals to software developers. They bug me because they can work well to inform proper names, but make “overmetaphorizing” easy. They bug me because widely used metaphors like the “desktop”, are so ingrained that people tend to forget how unintuitive they can be to a newcomer.

Issues that stem from metaphor:

  • When the software needs to be extended, it may stretch or break the metaphor (diluting meaning)
  • The metaphor may needlessly restrict the function of the software
  • It can be meaningless without context

What really bugs me though, is that metaphors are a promise often broken.

There is the inevitable mixing of metaphors as projects mature. The most usable software projects I use don’t beat about the bush with their naming system. Imagine “git clone” was “git boba” or something equally absurd, tossing out perfectly understandable terminology for a Star Wars reference.

Functional inference is the dragon chasing of software metaphor.

There’s so much software out there that performs mundane or straightforward tasks but has a “clever” name for one reason or another (you probably only have to look at the Gemfile of your nearest ruby project to find several examples). Python tends not to be too bad for this, but (I don’t wish to single this out, it’s just an example) there are still some silly things you’ll have to hear, like references to “the cheese shop”, which is the Python package index. This terminology seems to have fallen out of official usage, but older mentions of the package index by name, which might be seen by new users, tend to be accompanied by an explanation that it is the package index.

This sort of stuff is a bit of fun at first, but you can imagine that as these things build up and metaphors fracture, or other metaphors make their way in, one ends up with a tangled web of vocabulary which repeatedly requires explanation. It is needless additional cognitive burden. That’s what it boils down to – needless additional cognitive burden in software bugs the hell out of me. It’s one of many subtle “learning curve steepeners” that creep into projects. The individual effect might not be much, but the cumulative effect is frustration.

Things to consider when naming

  • If you’re building a simple tool that performs a specific task, try to use an abbreviation or English word(s).
  • If you’re building a library, especially one which aims to be a standard in its area, use industry terminology to name it.
  • Metaphor should be considered carefully before being used.

The future of naming

While discussing this with Eamon Leonard, he recalled the IUPAC naming scheme for chemicals that we were taught in school. The point of this system was to establish a limited vocabulary that one could use to describe any chemical compound. Biology also has several methods of classification and naming for organisms. While I think the software in its entirety is probably the only analogue to IUPAC naming, I wonder if we might find some inspiration in biology for giving canonical, descriptive names to our software?

Posted in Programming | Comments closed

Using gems to package polyglot CLI tools

Just in time for Halloween! A true horror, but with a sort of tortured beauty. How (and why, WHYYYY?!) you can annex one of the ruby community’s most pervasive technologies to distribute your filthy, heathen, non-ruby code.

I’m not going to lie to you – this is not for the faint of heart. There’s not a lot of ruby involved, but you will need to edit some. If you’re a veteran ruby developer and packager – I am so sorry.


Rubygems, for the uninitiated, are how the ruby community does ad-hoc package management. Despite the potential security nightmare involved in allowing anyone at all to add to the primary rubygems repository, it’s pretty much the only way packages get distributed for ruby.

Whatever qualms one may have about that particular aspect of the gem architecture, from a developer’s and consumer’s perspective (if not a sysadmin’s), it’s a fantastic system. It’s also very straightforward to actually build gems.

If you want to get some software in front of a bunch of developers, rubygems are a delicious low-hanging fruit. The only problem, really, is that it’s pretty much expected that you’re writing the bulk of your stuff in ruby. Maybe you’ll be patching in some native extensions, etc, but otherwise why would you be using gems?

Well, for starters, rubygems comes preinstalled on every OSX machine for the last few years, putting it one step ahead of the likes of brew, macports, etc in terms of penetration. It also does not have the centralized (if benevolent) authority of these resources (a mixed blessing).

Another reason is that there are other languages you can expect to be on a Unix system if Ruby is on it – some sort of shell for starters, but probably a smattering of others depending on your audience.

You’re a savvy, globetrotting developer; you know there’s more to programming than picking a language and running with it like someone who has just recently discovered scissors.

In any case, here’s what you need to get your abomination into the world.

Before we continue

You’re gonna want to have ruby and rubygems installed. This is sort of a pre-req. There are many truly horrifying ways to get ruby on to your system; you’re a coder (if you’re not, bail here), you’ll figure it out.

You will need to have your code, at least, all sitting in a directory on your hard drive.

Dependencies, etc, will be a bit of a show stopper. If you’re going to be dropping executables on machines, you’re probably going to expect whatever runs them to already be there. For the purposes of shlint, which uses POSIX shell and Perl, we can expect most Unix based systems to either have these preinstalled, or for them to land on a system before you’ll want something like shlint.

Yes, haphazard.
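If you want to be slightly less haphazard about it, a pre-flight check is cheap. Here’s a minimal sketch (the on_path? helper is hypothetical, not part of shlint) that tests whether the interpreters you plan to shell out to are actually on the PATH:

```ruby
# Hypothetical pre-flight check: is a given command somewhere on the PATH?
def on_path?(cmd)
  ENV["PATH"].split(File::PATH_SEPARATOR).any? do |dir|
    path = File.join(dir, cmd)
    File.file?(path) && File.executable?(path)
  end
end

puts on_path?("sh")    # a POSIX shell: true on any sane Unix
puts on_path?("perl")  # may or may not be there, depending on the system
puts on_path?("definitely-not-a-real-tool-9000")
```

You could run something like this at install or first-run time and bail with a friendly message instead of a cryptic “command not found” from deep inside a pipe.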

First step – the gem specification

So the major thing that makes a gem a gem is the gemspec. The gemspec is a file, written in ruby, which describes your code. It’s a really straightforward bit of code, and you can read the gemspec for shlint here:

# -*- encoding: utf-8 -*-
lib = File.expand_path('../lib/', __FILE__)
$:.unshift lib unless $:.include?(lib)

Gem::Specification.new do |s|
  s.name        = "shlint"
  s.version     = "0.1.1"
  s.platform    = Gem::Platform::RUBY
  s.authors     = ["Ross Duggan"]
  s.email       = [""]
  s.homepage    = ""
  s.summary     = "A linting tool for shell."
  s.description = "Checks the syntax of your shellscript against known and available shells."
  s.required_rubygems_version = ">= 1.3.6"
  s.files        = Dir.glob("{bin,lib}/**/*") + %w(LICENSE)
  s.executables  = ['shlint', 'checkbashisms']
  s.require_path = 'lib'
end

Ok, so the biggest chunk of this is pretty straightforward. I basically have no idea why the chunk at the top is there, other than that it was in the example I used as a reference. Ruby voodoo? (As far as I can tell, those first two lines just push the gem’s lib directory onto Ruby’s load path.)

The interesting things are the s.files, s.executables and s.require_path directives.
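To make s.files a bit more concrete, here’s a throwaway sketch (the file names are invented for illustration) showing what that Dir.glob pattern actually matches when evaluated from the gem root:

```ruby
require "fileutils"
require "tmpdir"

# Recreate the gem layout in a temp dir and evaluate the
# gemspec's glob pattern against it.
matched = Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, "bin"))
  FileUtils.mkdir_p(File.join(root, "lib"))
  FileUtils.touch(File.join(root, "bin", "shlint"))
  FileUtils.touch(File.join(root, "lib", "checkbashisms"))
  Dir.chdir(root) { Dir.glob("{bin,lib}/**/*").sort }
end
puts matched.inspect  # ["bin/shlint", "lib/checkbashisms"]
```

s.executables, meanwhile, names the command-line entry points relative to the bin directory, and s.require_path tells rubygems where any ruby code would live.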

We’re going to use this sort of directory structure (again, see shlint for a working example) – roughly:

    mypackage/
      bin/                # ruby shims, one per tool
      lib/                # the actual executables (shell, perl, etc)
      LICENSE
      mypackage.gemspec
For the sake of simplicity, you’re going to throw the executables you wish to run (shell, perl, etc) into a directory named lib. You are going to do something horrible in the bin directory (nobody will forgive you).

LICENSE (and any readme or similar) is going to be in the root of your gem, naturally.

Second step – the executable

Ok, so you’ve got your alien code in lib, but what are we putting in bin? Well, unfortunately, when you install a gem, anything in bin gets interpreted as ruby, meaning your unruby will cause it to throw a total fit.

(Un)fortunately, there’s a solution! For each tool you want available on the command line, you’re going to want something that looks like this:

#!/usr/bin/env ruby
spec = Gem::Specification.find_by_name("shlint")
gem_root = spec.gem_dir
gem_lib = gem_root + "/lib"
shell_output = ""
IO.popen("#{gem_lib}/shlint #{ARGV.join(" ")}", 'r+') do |pipe|
  shell_output = pipe.read
end
puts shell_output

Haha, it is so evil, but it works!

The gist of what’s happening:

spec = Gem::Specification.find_by_name("shlint")
gem_root = spec.gem_dir
gem_lib = gem_root + "/lib"

Here, we’re interrogating the gem system to find out where the hell we’re executing from, and using that to build a path to what we have sitting in the lib directory of our gem.


shell_output = ""
IO.popen("#{gem_lib}/shlint #{ARGV.join(" ")}", 'r+') do |pipe|
  shell_output = pipe.read
end
puts shell_output

We’re opening a pipe and basically shunting all arguments into our tool to deal with. You can get clever here if you like, but I felt just passing it all through was preferable. The result gets printed to screen.

Important side note here: if you’re executing the tool directly, like I am here, you’ll need the shebang set correctly. Otherwise, you’ll want to invoke the code prefixed by whatever executable you hope is going to be running it (like perl #{gem_lib}/ #{ARGV.join(" ")}, etc.)
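One thing the shim quietly drops, by the way, is the tool’s exit status – puts runs happily even if the child died screaming, so the wrapper always exits 0. If that matters to you (a linter’s exit code usually does), $? holds the child’s status after the popen block. A sketch, using a stand-in command rather than the real shlint:

```ruby
# Stand-in for "#{gem_lib}/shlint ..." -- a child that fails on purpose.
cmd = ["/bin/sh", "-c", "echo hello; exit 3"]

shell_output = ""
IO.popen(cmd, "r") do |pipe|
  shell_output = pipe.read
end
# After the block, $? holds the child's Process::Status.
status = $?.exitstatus

print shell_output
puts "child exited with status #{status}"
# In a real shim you'd finish with: exit status
```

That last line is the bit you’d actually want in bin – it lets CI scripts and shell conditionals see the failure.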

Third step – packaging

Once all the bits and pieces are in place, you can try packaging your code with:

gem build mypackage.gemspec

If you’re lucky, you’ll have gotten everything right the first time and you’ll now have a .gem file sitting in the directory you ran the command in. If not, the error output is pretty good; you’ll muddle through.

Next thing you’ll want to do is install your freshly forged gem with gem install mypackage-0.1.gem and see if it actually works. I went through several iterations to get the path stuff worked out, but you should find it easier.

Once you’re satisfied the gem is working, it’s time to magic it out to the world.

Fourth step – rubygems

So to get your gem out to the rest of the world, you’ll need to sign up for an account at rubygems.org. This process is simple, and once you’ve got an account, you can then proceed to run this command:

gem push mypackage-0.1.gem

The first time this is run, it’ll ask for your account details; fill these in and boom, your gem will be published to the world!

Now try booting a VM or something to see if you can install it from anywhere by just running gem install mypackage.

Final thoughts

This is a pretty horrible hack, but it has its benefits, the largest of which, frankly, is the sheer glee of building such a Frankenstein. Maybe, though, if you’ve got a system that you’re pretty confident has both ruby and a particular subset of other languages on it, it’s a fun way to subvert the intended usage.
