About Me
26-year-old software developer based in Dublin, Ireland.
I work for EngineYard on Orchestra, the PHP Platform as a Service.
Email: ross.duggan@acm.org
I run wthax.org and co-founded Ireland's first anime convention, EirtaKon.
Reading list for scaling Solr
Brain dump time. I kinda need this as a memory aid for myself, and I figure it’ll be useful to anyone else who is building a Solr cluster. There’s probably a lot of crossover here with tuning any JVM-based application that services a large number of requests, but this is my first one, so it’s all lumped together.
Some of the background for this list can be read here, and for some further context, this is some of what I read to build something probably more powerful than websolr’s top-tier offering (those guys are probably worth investigating before building your own cluster, by the way). There were some pretty “out there” requirements for Boards.ie, though (potentially thousands of filter query (fq) permutations per search phrase, lots of big, ugly, old data, etc.).
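For anyone new to Solr, a filter query is just an extra `fq` parameter on a `/select` request, and a single request can carry several of them. The sketch below builds such a URL; the core URL `http://localhost:8983/solr/boards` and the field names `forum_id` and `date` are hypothetical, chosen only for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.StringJoiner;

public class SolrFilterQueryUrl {
    // Builds a Solr /select URL: q is the search phrase, and each entry in
    // filters becomes its own repeated fq (filter query) parameter.
    static String selectUrl(String baseUrl, String q, List<String> filters) {
        StringJoiner params = new StringJoiner("&");
        params.add("q=" + encode(q));
        for (String fq : filters) {
            params.add("fq=" + encode(fq));
        }
        return baseUrl + "/select?" + params;
    }

    // URL-encodes a parameter value (spaces become '+', ':' becomes %3A, etc.).
    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical core URL and field names, for illustration only.
        String url = selectUrl("http://localhost:8983/solr/boards",
                "solr scaling",
                List.of("forum_id:42", "date:[NOW-1YEAR TO NOW]"));
        System.out.println(url);
    }
}
```

As a rough illustration of how permutations pile up: if a search page lets users toggle n independent filters, there are 2^n possible fq combinations for the same phrase, so ten toggles already give 1,024 variants.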
Some of the issues I ran into scaling Solr are relatively unique, but the general approach should be the same for everyone:
Figure out what you’re actually running
If you’re unfamiliar with the world of Java (or a rusty shade of green like me), you might be horrified (well, surprised) to discover that there are several different implementations (let alone versions) of the Java Virtual Machine (JVM) available to you. What’s more, the best documented and supported one, the “Oracle” JVM (still documented almost everywhere as the Sun JVM), is probably not what you’re running if you’re on Ubuntu Server, which defaults to OpenJDK. There’s also a difference between engineering version numbers and product version numbers (1.6 vs. Java 6, for example), which isn’t obvious at first, and the two often appear to be used interchangeably.
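A quick way to check which JVM a process is actually on is to read its standard system properties (`java -version` on the command line reports the same details). This is a minimal sketch: on OpenJDK builds `java.vm.name` typically starts with "OpenJDK", and the `productVersion` helper is a hypothetical name of mine that maps the engineering form used before Java 9 ("1.6.0_26") to the product number (6).

```java
public class WhichJvm {
    // Maps a java.version string to its product number. Releases before
    // Java 9 use the engineering form "1.N.x" (e.g. "1.6.0_26" is Java 6);
    // Java 9 and later report the product number first (e.g. "11.0.2").
    static int productVersion(String javaVersion) {
        String[] parts = javaVersion.split("[._\\-+]");
        int first = Integer.parseInt(parts[0]);
        return first == 1 ? Integer.parseInt(parts[1]) : first;
    }

    public static void main(String[] args) {
        // Standard JVM identity properties: implementation name, vendor, versions.
        String[] keys = {"java.vm.name", "java.vm.vendor", "java.version", "java.runtime.version"};
        for (String key : keys) {
            System.out.println(key + " = " + System.getProperty(key));
        }
        System.out.println("product version = "
                + productVersion(System.getProperty("java.version")));
    }
}
```

Running this with the same `java` binary that launches Solr (rather than whatever is first on your PATH) is the point: it tells you what the servlet container is really using.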
Understanding the JVM
Understanding Solr
Scaling Lucene and Solr - http://www.lucidimagination.com/content/scaling-lucene-and-solr
Somewhat passive-aggressive Lucid Imagination advice - http://www.lucidimagination.com/blog/2010/01/21/the-seven-deadly-sins-of-solr/
An example of some of the hilarious bureaucracy in Solr development - https://issues.apache.org/jira/browse/SOLR-1143
There’s probably plenty more, but those are the ones I have saved.