EC2: Create AMI from a running instance

Log into the AWS web console, find the instance you want to create an AMI from, right click and select “Create Image (EBS AMI)”. Follow the wizard.

All the top results in Google are for The Long Way™ to do this. The Long Way has a bunch of useful things to take into consideration (security, for example), but the actual process of creating an AMI from a running instance has been made simpler by Amazon in the last few months.

Right-click context menus in web pages aren’t the most obvious metaphor for people who have been using the web for a number of years. I know people new to EC2 who have completely missed this and could have saved themselves some time. Hopefully this will bubble up and save people a half hour or so :)
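For anyone who would rather script this than click through the console, the same operation is exposed by the AWS command line tools as `aws ec2 create-image`. The sketch below only builds the command; the instance ID and image name are made-up placeholders, and it assumes the AWS CLI is installed and configured, which isn't covered here.

```python
import shlex


def create_image_command(instance_id, name, no_reboot=False):
    """Build the argument list for `aws ec2 create-image`."""
    argv = ["aws", "ec2", "create-image",
            "--instance-id", instance_id,
            "--name", name]
    if no_reboot:
        # Ask EC2 not to reboot the instance while it is being imaged.
        argv.append("--no-reboot")
    return argv


if __name__ == "__main__":
    # Hypothetical instance ID and AMI name, for illustration only.
    argv = create_image_command("i-0123456789abcdef0", "my-server-backup")
    print(shlex.join(argv))
    # To actually run it: subprocess.run(argv, check=True)
```

Printing the command first is deliberate: it's easy to check the instance ID before anything is sent to AWS.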

Posted in Technology | Comments closed

Gender breakdown for software development in Ireland

This post was inspired by this TED talk from Sheryl Sandberg, which reminded me of this brilliant blog post by Jolie O’Dell. The whole women in tech thing is something which interests me (as a techie, as well as a human being), and there are loads of great discussions (and terrible ones) about it.

I’ve been trying to figure out a good way to get an idea of the size and gender breakdown of the somewhat nebulous software development community in Ireland. I’ve settled on LinkedIn as probably the best way to figure this out; Facebook appears to be useless for it (despite having that information). I want to compare these numbers to the official ones provided by government and other agencies to see how they measure up, though it involves a good few caveats.

These numbers are at least interesting to play around with. I suggest trying out LinkedIn’s DirectAds system yourself; you don’t have to pay to get to the “targeting” stuff at step 2.

Currently, LinkedIn lists 416,030 people as working in Ireland. The National Skills Bulletin 2010 (pdf) states that there were 1.88 million people working in full or part time employment in Q4 2009, which puts LinkedIn at about 22% of that figure. As the employment number has probably only gone down since then, I think LinkedIn has a pretty sizeable chunk of the workforce listed; maybe more than a quarter of the current total, though probably with a heavy bias towards networking-enabled industries with strong computer usage.

Of those listed on LinkedIn, 37,322 describe themselves as working in “Engineering” or “Information Technology” – two quite broad descriptions, both of which will include a portion of people who have absolutely nothing to do with software development. At this level, the breakdown is 28,387 male, 6,722 female. Looks to be about 2,213 who have not listed gender.

It gets tricky at this point – if you wish to break down by Industry, you’ve got a max of ten to choose from a list of hundreds, and techies work in almost every industry. To try and extract the main corpus of “software developers” from the Industry section, I did a quick poll of my own LinkedIn connections and chose: Computer Hardware, Computer Software, Computer Networking, Internet, Information Technology and Services, Computer & Network Security, Wireless, Online Media, Publishing, and Information Services. I figure this cuts out most of the mechanical/bio/pharma engineers.

This gives me 13,080 people in total. The gender breakdown for this is 10,141 male, 2,148 female (791 unknown).

To see how these numbers match up with regular LinkedIn search (if I were looking for software people), I searched for people in Ireland with a couple of different keywords as their current job title. “Engineer” turned up 13,848 (24,446 including past jobs). “Systems Administrator” turned up 330 (997 including past jobs). “Developer” turned up 2,930 (6,414 including past jobs) – I’m not sure how many of these might be “business developer”, “property developer” or similar, but the first few pages of results looked about right. “Operations” turned up too many misses (COO, ad operations, etc), so I haven’t included it.

This tots up to about 17,108 people. I might guess that my DirectAds version of this is missing 4,028 people who have listed themselves as being in finance, banking, etc, or engineers who have nothing to do with software. Either way, not totally disparate figures.

So, does 13,000 sound like a good ballpark figure for the number of software developers in Ireland? 84% male vs 16% female?
CareersPortal.ie has some information compiled from the CSO and the National Skills Bulletin 2010. They conclude that there are 9,000 people employed in software development in Ireland, and that the gender breakdown is 89% male, 11% female. Interesting that while there’s a difference in the “totals” for each, the percentages are pretty close. Considering the time gap between the Forfás data (about a year old, sometimes more) and the LinkedIn data (almost realtime), I think those numbers are a pretty good indicator.
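If you want to check the sums in this post, here is a small sketch that reruns them from the figures quoted above. Nothing in it comes from LinkedIn directly; the variable names are mine, and the one thing worth noticing is that the gender percentages shift depending on whether the 791 profiles with no gender listed are counted.

```python
# Figures quoted in the post.
linkedin_ireland = 416_030
employed_q4_2009 = 1_880_000

male, female, unknown = 10_141, 2_148, 791
total = male + female + unknown  # DirectAds total for the ten chosen industries

keyword_hits = {"Engineer": 13_848, "Systems Administrator": 330, "Developer": 2_930}
keyword_total = sum(keyword_hits.values())


def pct(part, whole):
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


if __name__ == "__main__":
    print("LinkedIn share of Q4 2009 workforce:", pct(linkedin_ireland, employed_q4_2009))
    print("Keyword total:", keyword_total, "gap vs DirectAds:", keyword_total - total)
    print("Female, of all profiles:", pct(female, total))
    print("Female, of profiles with a gender listed:", pct(female, male + female))
    print("Male, of profiles with a gender listed:", pct(male, male + female))
```

Counting only profiles that list a gender, the split comes out a little less lopsided than the headline figure, but still in the same neighbourhood as the official one.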

Posted in Technology | Comments closed

Dublin Bus route statuses

I’ve thrown together a little hack that gives current Dublin Bus route status information in JSON format.

Endpoint: http://rossduggan.ie/stuff/bus/

Simply calling the endpoint will return a JSON object of all bus routes and their associated statuses.

Append ?route=x to the endpoint and you’ll get just the results for the route specified (or what it thinks you mean if it doesn’t understand the route):

GET http://rossduggan.ie/stuff/bus/?route=15

You’ll get something like this:

{
	"status":"Operating on normal route.",
	"match":
		{
			"exact":"15"
		}
}

If you look for one of the routes that Dublin Bus merge together (for whatever reason; 42A and 42B are like this):

GET http://rossduggan.ie/stuff/bus/?route=42a

You’ll get:

{
	"status":"Unable to serve Edenmore or Harmondstown. Operating via Springdale Rd and Tonlagee Rd.",
	"match":
		{
			"closest":"42a\/b"
		}
}

The reason for this is that I threw this together in about 30 minutes as an excuse to familiarize myself with XPath, and I thought some error handling would be better than no error handling :)
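If you want to consume this from code, here is a rough Python sketch of a client. The function names are just for illustration, and it assumes every single-route response has a "match" object containing either "exact" or "closest", which is all the two examples above show.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ENDPOINT = "http://rossduggan.ie/stuff/bus/"


def route_url(route=None):
    """Return the endpoint URL, optionally narrowed to one route."""
    if route is None:
        return ENDPOINT
    return ENDPOINT + "?" + urlencode({"route": route})


def summarise(payload):
    """Turn a single-route response into (matched_route, is_exact, status)."""
    match = payload["match"]
    if "exact" in match:
        return match["exact"], True, payload["status"]
    return match["closest"], False, payload["status"]


def fetch_route(route):
    """Fetch and summarise one route (needs network access)."""
    with urlopen(route_url(route)) as response:
        return summarise(json.load(response))


if __name__ == "__main__":
    # Parse a canned response rather than hitting the network.
    sample = json.loads('{"status": "Operating on normal route.", "match": {"exact": "15"}}')
    print(summarise(sample))
```

Checking the is_exact flag lets a caller tell the user when the service has guessed at a merged route such as 42a/b instead of matching what they typed.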

Once the bad weather goes away this will probably break since Dublin Bus will change the layout of their notices, but I’ll try to keep it functional for as long as I can. I have a horrible feeling the data source is being copied and pasted by whoever is maintaining it from an old copy of MS Word into a WYSIWYG editor for the dublinbus.ie homepage.

In usage:

I pushed this out on Twitter at about 7:30pm, and by about 10:00pm @walmc had already thrown together a neat little node.js growl notifier using it!

Posted in Codetry | Comments closed

Links for Friday, 3rd December 2010

Paul Conroy has written a little bookmarklet to add imgur previews to the Twitter web interface, and has a detailed explanation of how he’s done it.

The Algorithm + the Crowd is not enough. I have a response post gestating.

Defecting by Accident, A Flaw Common to Analytical People. This is “How to Win Friends and Influence People” digested into a blog post for nerds.

Potentially Consistent, or why your MySQL master-slave setup is not “Eventually Consistent”.

This post and this video are a good introduction to Clojure.

Kafka is a roll-your-own SNS from the guys at LinkedIn (though I’m sure they wouldn’t describe it that way).

First class APIs, by @h.

What the HTTP is CouchApp?

Engineering Shortage Is Real. Former Digg Engineer Gets 7 Offers, Takes One for $150K. Good news for software engineers.

This is a really cool take on introducing an application.

Posted in Codetry, Technology | Comments closed

Extracting information from a lot of images on disk using find

If you need to extract information from a large number of images on disk (and you’re using a *nix system), you could do worse than using find with ImageMagick’s command line tools.

If you’re unfamiliar with find, I’d recommend reading the beginner’s guide on Linux.ie. It has terse and initially daunting syntax, but is one of the most powerful tools available to *nix users and proficiency with it is massively useful, especially for sysadmins and developers.

Here’s how you’d go about finding all jpg, gif, png and bmp images in a directory, excluding anything in a “thumbs” directory, getting their dimensions, image format and filesize, separating each piece of information with a comma and writing it out to a file:

find . -path "*/thumbs/*" -prune -o -type f \(\
 -iname "*.jp*g" -o -iname "*.gif" -o -iname "*.png" -o -iname "*.bmp"  \)\
  -exec identify -format "%i,%wx%h,%m,%[size]\n" {} + > /tmp/images.info

Broken down:

find .

Searches in the current directory (.) – you can specify a path just as easily (find /path/to/directory/)

-path "*/thumbs/*" -prune

Exclude (prune) paths that match the preceding pattern. You can specify this multiple times (or not at all).

-o

This is the OR operator. AND is implied between each modifier if left out.

-type f

Specifies that we’re looking for a file (a directory would be -type d)

\(\
 -iname "*.jp*g" -o -iname "*.gif" -o -iname "*.png" -o -iname "*.bmp" 
 \)\

( opens a group, ) closes it. The backslashes escape the parentheses and newline (I’ve just used the newline to make it more readable). The -iname directive specifies a case-insensitive filename, in this case matching file extensions. The usage of the -o operator is more obvious here, as without it we’d be asking that each file match .jpg AND .png AND .gif – which wouldn’t really work.
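To make the selection logic so far a bit more concrete, here is a rough Python sketch of the same idea: skip any “thumbs” directory, then keep files whose names match one of the patterns regardless of case. It is an approximation rather than a replacement for find (os.walk doesn't distinguish regular files the way -type f does), and the function names are made up for the example.

```python
import fnmatch
import os

PATTERNS = ("*.jp*g", "*.gif", "*.png", "*.bmp")


def is_image(filename):
    """Case-insensitive glob match on the file name, like -iname."""
    lowered = filename.lower()
    return any(fnmatch.fnmatchcase(lowered, pattern) for pattern in PATTERNS)


def find_images(root=".", skip_dir="thumbs"):
    """Yield image paths under `root`, never descending into `skip_dir`."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from entering those directories.
        dirnames[:] = [d for d in dirnames if d != skip_dir]
        for filename in filenames:
            if is_image(filename):
                yield os.path.join(dirpath, filename)


if __name__ == "__main__":
    for path in sorted(find_images(".")):
        print(path)
```

One thing this makes obvious: "*.jp*g" is a loose pattern. It matches .jpg and .jpeg, but also any name that contains ".jp" and ends in "g", such as notes.jpeg.log.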

-exec ... {} +

This executes a command on each item found, the “current” found item being contained in the {} placeholder. + is the terminator in this case. \; can also be used (again, backslash as escape), but the + terminator batches results and performs much better with large numbers of files. This is roughly equivalent to piping into xargs on older systems which may not have the + terminator available (pre-2005 builds).
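To get a feel for why batching matters, here is a toy Python sketch that groups paths into batches by total length and compares the number of batches to the number of files. The 100,000 character limit is an arbitrary number picked for the example; the real limit find works to depends on the system, so treat this as an illustration of the idea only.

```python
def batches(paths, max_chars=100_000):
    """Group paths so each group's total length stays within max_chars."""
    batch, used = [], 0
    for path in paths:
        cost = len(path) + 1  # +1 for the separator between arguments
        if batch and used + cost > max_chars:
            yield batch
            batch, used = [], 0
        batch.append(path)
        used += cost
    if batch:
        yield batch


if __name__ == "__main__":
    # Made-up file names, just to have something to count.
    paths = [f"./images/{n:06d}.jpg" for n in range(50_000)]
    per_file = len(paths)                     # one process per file, like \;
    batched = sum(1 for _ in batches(paths))  # one process per batch, like +
    print(per_file, "invocations vs", batched)
```

Each invocation means starting a new identify process, so cutting tens of thousands of launches down to a handful is where the speed-up comes from.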

identify -format "%i,%wx%h,%m,%[size]\n"

In this case, the command we’re executing is ImageMagick’s identify tool. There’s quite a lot of information available here, so it’s prudent to use the -format option to limit the information to what you need. Helpfully, there’s a list of escape characters to let you know what can be extracted.
Here, I’m getting the file path (%i), the width (%w) and the height (%h), with a literal ‘x’ between the two dimensions. After that, there’s the image format (%m) and the filesize with its unit (%[size]). I separate each value with a literal comma and end each line with a newline (\n).

> /tmp/images.info

Finally, rather than output this information to the screen (the default), we direct the output into a file in the /tmp directory. If there are a lot of files to process, you won’t immediately see data start to pour in here, as it’ll be batched using the + terminator mentioned before. You’ll probably see it populate in lumps of several thousand.

You should get a file containing results that look something like this:

./images/3tm9wzz4z9kzd51168cef0a9cc77ca616916128aaa3d.JPG,640x480,JPEG,22.8KB
./images/226te3jc3m85519d6348418bdde11ee08d77ffd338ff.JPG,626x639,JPEG,44.6KB
./images/2s9262f4uix2e26113b8007a2a3dfadb6aa3fa7aa0ee.JPG,384x288,JPEG,36.6KB
./images/3572wcuya3pi3fb0f68eff3d6104a7b94d5725b2b526.jpg,480x640,JPEG,50.9KB
./images/5wby49rxay9lcc890e914b4d52e9909700f8d5227bb9.jpg,354x142,JPEG,11.9KB
./images/1c6cf3icti8v9c2b997592c0c7c51c25e900969eaec4.JPG,478x640,JPEG,41.4KB
./images/53h1y0x1q37q22d65cc682f6d7994db2510cab013ddf.JPG,478x640,JPEG,28.1KB
./images/4r8ck3kn1ezi809f7d4a63c0fb95b4f07053641bd8d3.JPG,478x640,JPEG,33.5KB
./images/156m118zdn7n4a10fef7d6c88067482f0803db2837e6.JPG,478x640,JPEG,25.5KB
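Once you have the file, it's easy to pull into a script. Here's a small Python sketch that parses lines in the format above; the ImageInfo name and its fields are my own invention, and it assumes no file path contains a comma (which would throw off the columns).

```python
import csv
import io
from typing import NamedTuple


class ImageInfo(NamedTuple):
    path: str
    width: int
    height: int
    fmt: str
    size: str  # kept as text, e.g. "22.8KB"


def parse_row(row):
    """Convert one CSV row (path, WxH, format, size) into an ImageInfo."""
    path, dimensions, fmt, size = row
    width, height = dimensions.split("x")
    return ImageInfo(path, int(width), int(height), fmt, size)


def read_info(handle):
    """Parse every non-empty row from an open images.info file."""
    return [parse_row(row) for row in csv.reader(handle) if row]


if __name__ == "__main__":
    sample = io.StringIO(
        "./images/3tm9wzz4z9kzd51168cef0a9cc77ca616916128aaa3d.JPG,640x480,JPEG,22.8KB\n"
        "./images/226te3jc3m85519d6348418bdde11ee08d77ffd338ff.JPG,626x639,JPEG,44.6KB\n"
    )
    for info in read_info(sample):
        print(info.path, info.width * info.height)
```

For the real thing, swap the StringIO for open("/tmp/images.info", newline="").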

If you spot any typos, mistakes or ways you think this might be improved, feel free to let me know.

Posted in Codetry | Comments closed

Links for 7th April, 2010

Leslie Jensen-Inman, via A List Apart, examines the use of colour. Worth a read, especially for the tools she highlights.

Clay Shirky – “The Collapse of Complex Business Models”

Microsoft’s TERMINATOR project – solving the halting problem… for a finite number of “real” systems.

Damien Katz gives up on git for managing CouchDB. He’s a smart dude, hope he elaborates.

Bruce Schneier – Privacy and Control

Posted in Technology | Comments closed

Links for 31st March 2010

Prescient – “The End of Practical Obscurity” (dated 21/05/2003)

Further evidence supporting the idea that Apple is planning to cut the legs out from underneath Google.

Discussion: Students and professional developers take on the “skill shortage” in the IT sector (Boards.ie). Mark Dennehy’s take.

Somewhat prescient, somewhat wacky – What the future looked like in 1993 (video)

Novel usage of Twitter data – http://sleepingtime.org – find out Twitter users’ sleeping patterns.

Posted in Technology | Comments closed

Privacy by obscurity

In the same minute that I pressed “publish” on this blog entry, it was downloaded by search engines, chopped up into keywords and indexed for the world to find. It was downloaded into RSS readers. If it were something significant or vaguely interesting to someone, it might be copied to another blog, a forum, linked to or printed out. A wandering spambot will likely download it, scan it, and attempt to insert a comment promoting some sort of pharmaceuticals. It might also be used by that spambot in an attempt to stuff another fraudulent website with “real content” in order to boost that website’s search engine ranking.

Knowing all this, I still pressed that button. But how many others would hesitate to do the same if they knew this? When it comes to self-published blogs, it’s quite likely many authors do know this – at least on some level. They may not understand the full ramifications, but they likely have a reasonable grasp of the situation. If you’re a professional web developer or work in the online industry you are likely aware of all of this. What of the majority of Internet users though? What of the Facebook or Bebo users?

The title of this post is inspired by a phrase from the world of security engineering, “security by obscurity,” which describes a system secured by an outsider’s lack of knowledge/understanding of the design or implementation of the security system. Privacy by obscurity, then, could be defined as the belief that information is under control when it is, in fact, not.

Security through “obscurity” is effectively a social deterrent to crime; a locked door or closed window won’t stop a determined thief. Online privacy would be the same – if the house had no doors, windows, or walls.

Privacy has no easy template

People have a fuzzy, stratified concept of privacy. There are things that you will tell one friend that you wouldn’t tell another, or things you might share with your siblings that you might not with your parents, etc. The stratification is broad, multidimensional and different for each person. It changes between each data point in a particular block of information, per person, over time.

For example – Bob goes out to a nightclub on Wednesday and meets Sally. He has a few drinks and smokes a couple of cigarettes. A friend takes a polaroid photo of Bob and Sally dancing; Bob goes home at 2am feeling a bit ill due to some dodgy pints.

Bob’s friends want to know what he was up to on Wednesday, and as it was perfectly innocuous, he’s probably happy to tell most of them; however, some considerations:

  1. Bob might not want his workplace knowing that he was out til 2am during a working week, especially since his performance on Thursday was less than fantastic.
  2. Bob promised his mother that he would stop smoking.
  3. Bob’s friend, Joe, used to go out with Sally, and would not be happy to find out that Bob and Sally were hitting it off.
  4. In a few years’ time, Bob may not be so happy about that photo of him.

This is only a short example, but hopefully it illustrates that this block of information does not fit what most people would like broadcast about them on the nine o’clock news (in the imaginary situation that Bob suddenly received a lot of attention), nor is it exactly deeply personal.

However, it is also technically possible that one of Bob’s friends will inadvertently tell Bob’s mother that he smokes, or that the polaroid will be shown to Joe at some point. The idea that “unencrypted” information can be confined to the specific target audience of the person involved is flawed. In real life, we’re used to judging the probability of this bleeding of information because the number of data points (the polaroid, the fact Bob smoked) and nodes (Bob’s friends) are small enough for us to naturally calculate the risk.

Network effect

The problem occurs with the Internet because the data points are infinitely replicable at zero cost (copying a photo to your hard drive, or to your Facebook profile is trivial and takes virtually no time at all – in many cases even this is automated). The network effect of Internet technologies also means that each node is massively more connected than before (everyone you know is probably listed as a “friend” on your social networking profiles regardless of your relationship to them).

The low cost of replication and the network effect quickly eradicate this flimsy stratification, treating it as an error, and reduce our personal stratification of information control to its simple, boolean reality – private, or public. Just as heat increases the rate of chemical interactions by increasing the rate at which molecules collide, the Internet increases the rate at which data points collide with nodes.

This is not an entirely new concept, however. From the emergence of language to the invention of the printing press, ways of disseminating information faster and more easily have paved the way for light to be shone into the dark recesses of ignorance and secrecy. Sharing of information is literally the cornerstone of civilization – it’s one of the major behavioural traits that set us apart from the animal kingdom.

Much like the printing press, the Internet has catapulted us into a new and ever changing revolution in the way we see ourselves, each other and the world around us.

Privacy is boolean

With this understanding, comes the inevitable realization that information can really only be reliably defined in one of two fundamental ways:

  • Private – This is information only you have access to, stored by you – probably on your local machine or in a secure email inbox. Making this information non-private requires, at minimum, blackhat hacking of some description.
  • Public – This is information that someone other than you has access to. This could be information your friends can see, or just information an informal group has access to; a vague circle of trust.

Public is where the majority of a person’s online information falls (intentionally or not). There are a lot of people for whom what they consider private information actually falls very much into the public sphere, through simple lack of understanding of the Internet.

The Internet has a long memory, and it’s improving

Whoever falls within your circle of trust now may not do so in future – relationships of all kinds can change over time. Conversely, due to the persistent and transferable nature of bits, a copy of a given piece of information is just as good as the original, just as transferable, and importantly, eternally subject to the whims of whoever has access to that copy.

If you’re not routinely conscious of your personal information, this proposition should, to put it mildly, alarm you. Never in the history of humanity has so much information about people been willingly offered up. This information is being regularly scoured by farms of automated bots owned by criminals looking for information on how best to scam you. It’s also being regularly scoured by “legitimate” marketing agencies who want to find ways to make you buy their tat, as well as find ways to prevent you from talking about poor experiences with them or their clients. The more online services you use, the less anonymous you are.

There is an argument that people should stand over what they say and do online. Maybe they should think before going on a wild rant against a company they had a poor experience with. Maybe they shouldn’t publicize how they’re giving work the runaround by taking sick days and going on holiday. Maybe they shouldn’t fill out personality tests or give their email login details to “help” them find more “friends”. I agree, to an extent. People need to think before they share, but to do so they need to understand what they’re doing in the first place.

This tearing down of perceived walls may be something we’re going to have to come to terms with if we wish to live in a more connected society. We may be able to prolong the illusion of information siloing, but the longer it continues, the higher the risk to each of us and the higher the reward for those who exploit it.

Maybe we will decide the illusion is worth it, as each of us operating online as though we were a celebrity, endlessly pruning our public image, may have consequences we’re not prepared to accept. However, the concept of the Internet as an Orwellian monitoring device, powered by the gossip-hungry schadenfreude of the people who use it, may not be far from the truth in the near future.

Posted in Technology | Comments closed

Are Apple making a play to cripple Google/Microsoft?

Just a thought that came out of a café conversation with Conor; the increasing effort from Apple to eradicate Adobe’s Flash platform from their devices could be a subversive attack on the advertising revenue of both Google and Microsoft.

We know that Apple owns the high end of consumer laptop purchases and holds a significant chunk of the smartphone market. We also know that a savvy, well-to-do audience commands the best per-click advertising rates. It wouldn’t be strange then for one to speculate that in the demographic of “higher income earners” there is not insignificant crossover between high-end targeted online advertising and consumers of Apple mobile products.

Flash is popular because of video sites like YouTube, Vimeo, etc, and for various gaming applications (Farmville on Facebook being an obvious current example). Flash-based advertising has been a side effect of this widespread popularity, cleverly piggybacking on consumer desire for video/gaming content in order to deliver more effective campaigns. Flash advertising is the price we pay for video and gaming content.

Moves by Apple to explicitly ditch Flash coupled with the W3 Consortium’s inexorable march towards the implementation of HTML5 are converging to remove the reward for having Flash available as a content delivery mechanism. If the only thing Flash delivers is advertising, why would anyone wish to have it?

Advertisers rely heavily on Flash to deliver rich media advertising to Internet users.  It’s hard to imagine an industry, one which has had great difficulty in embracing the Internet in the first place, mobilizing to change their entire infrastructure from one based almost exclusively around Adobe’s proprietary Flash product to something unknown, overnight.

Clearly, this is not something that is actually going to occur overnight, though Apple, by jettisoning Flash from the devices of a high-yield demographic, could reap huge rewards relatively quickly. It doesn’t even need to come up with its own advertising solution to fill the hole it leaves – simply destroying that chunk of revenue for Google and Microsoft may be enough to begin destabilizing the incumbent online advertising market.

Posted in Technology | Comments closed

RE: Is Facebook unethical, clueless or unlucky?

1. Is Facebook clueless, unethical or just unlucky? Why?

I don’t believe that Facebook could possibly be clueless; they’re one of the few companies that gets to take their pick of what talent is available to the industry.

This is a slip into the unethical, at the very least it’s a slip into the grey area – every web company that survives on advertising revenue (ie, nearly every social media company) is under constant pressure from their own advertising, marketing and public relations teams (or whatever fulfills the tasks of such traditional elements in their business) for better, deeper information and bigger numbers for that information. At the end of the day, it boils down to targeted advertising, and must do for a company like Facebook, judging by their popular D.I.Y. advertising model.

I don’t for a moment believe that their motives are *evil*, but I do think they are misguided. I think they have probably “dogfooded” themselves into believing that it’s a harmless way to increase the value of their product (and their product is the users of Facebook, their customers are the people who buy the ad space).

2. Will Facebook’s latest behavior result in more lawsuits and/or industry regulation?

There are stirrings of regulation on this side of the pond (though still only rumbles) and one has to suspect that eventually the problem of regulating Internet companies’ activities online is going to get attention – but I’m not sure this particular event will be the final straw. I think it’s likely that companies will keep pushing the boundaries until one finally does something that creates a scandal and brings the whole privacy house of cards tumbling down.

3. Do you trust Facebook with your information?

I was initially wary of them for their “enterprising” (and now widely adopted) strategy of scraping address books and email inboxes for contact details to “helpfully” invite others to the service. I think the value they bring to the table is limited, and that their success is based, in large part, on the ferociousness and tenacity of their contact harvesting spampaign and gimmicky features that result in email notifications.

So no, I don’t trust Facebook to not exploit what information I give them; I’m waiting to see just how far that exploitation goes.

Posted in Technology | Comments closed