This is going to be a bit of a rant - but a rant prompted by something that came up recently, where someone was considering MongoDB.
I was just reading MongoDB Set to Become the ‘New Default’ Database…
Just… wow. Quite a bold statement there. To save people giving details on the form (another personal bugbear of mine… so I filled it with junk) - here’s the link to the relevant piece.
First things first, let's pick apart an error in the press release that eWeek clearly didn't check up on.
> All tests were performed with 400M records distributed across three servers, which represents a data set larger than RAM.

Yet the report itself says:

> Our setup consisted of one database server and one client server to ensure the YCSB client was not competing with the database for resources. Both servers were identical.

and:

> Load 20M records using the “load” phase of YCSB
So that'd be mistake one… it wasn't three servers at all, and it wasn't 400M records either. That's a gross error, as the read statistics for Cassandra would be way off as a result. In fact, they say as much in the Conclusions.
> We focused on single server performance in these tests. Multi-server deployments address high availability and scale out for all three databases. We believe that this introduces a different set of considerations, and that the trade offs may be quite different.
My point is that it looks like the creators of MongoDB commissioned and paid for this report. If they didn't, then the press release and the news around it are tripe; if they did, where's the disclosure of bias?
It's worth adding that the three databases tested are completely different! Cassandra, MongoDB and CouchBase each have very different use cases, so it's not really fair to pit them against each other. Pitting MongoDB against CouchDB would be fairer. CouchBase is really CouchDB, but prettier and with a very, very clever caching front end on it.
I have deployed a large Cassandra setup and a very large CouchDB setup. I wouldn't use either one for the other's workload.
Docker is a hot topic in the DevOps world at the moment. I use it almost every day, and I want to look at how security checks and monitoring for it can be automated.
Containers in computing aren't new. In fact, FreeBSD had containers before Google was using them on Linux, although it called them jails.
Docker is great in that it's brought containers to the masses. They were once the reserve of people with the patience to set up LXC on Linux, or jails on FreeBSD (side note: jails are very painful - I might talk about that another time).
We can talk to Docker via its RESTful API, and libraries exist for almost every language. The two obvious popular ones are Go and Python - I say obvious, but really I just prefer these two languages. I'm sure the Ruby one is awesome too.
The downside of Docker that's coming up more and more is managing the security of containers. People often use official images without a second thought, and these end up in production. There are plenty of FUD-laden posts on the topic already - but in general, how do you ensure your containers' operating system packages stay up to date?
Sounds like a task for a script. I broke it down into the following tasks:
- Connect to Docker (boot2docker in my case)
- Get a list of installed packages in debian:jessie image
- Get a list of packages from security.debian.org
- Compare the two
I should add that I used Python 3.4 for this, so the syntax may look a little odd to Python 2.x eyes.
Let’s get connecting out of the way:
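Something along these lines. Note this sketch uses only the standard library in place of a Docker client library, and the boot2docker address below is the old default - treat both as assumptions:

```python
import http.client
import json
import ssl

# Old boot2docker defaults - adjust for your VM (assumptions, not gospel).
DOCKER_HOST = "192.168.59.103"
DOCKER_PORT = 2376


def insecure_context():
    """TLS context with verification switched off - the OpenSSL 1.0.2a workaround."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def docker_get(path):
    """GET a JSON endpoint from the Docker remote API, e.g. docker_get('/version')."""
    conn = http.client.HTTPSConnection(DOCKER_HOST, DOCKER_PORT,
                                       context=insecure_context())
    conn.request("GET", path)
    try:
        return json.loads(conn.getresponse().read().decode("utf-8"))
    finally:
        conn.close()
```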
It took me a while to figure out that OpenSSL 1.0.2a causes problems with quite a few libraries when talking to APIs. To get around it for now, I disable the verify part of requests - it'll complain a lot about it.
Now we’re connected we can make a container and get some stuff out:
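A sketch of that step. For brevity this shells out to the docker CLI rather than driving the API, so the container plumbing here is an assumption; the parsing is the part that carries over:

```python
import subprocess


def parse_package_lines(text):
    """Turn 'name,version' lines (dpkg-query output) into a dict keyed by package name."""
    packages = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _, version = line.partition(",")
        packages[name] = version
    return packages


def installed_packages(image="debian:jessie"):
    """Run dpkg-query in a throwaway container and collect package -> version."""
    out = subprocess.check_output(
        ["docker", "run", "--rm", image,
         "dpkg-query", "-W", "-f", "${Package},${Version}\n"],
        universal_newlines=True)
    return parse_package_lines(out)
```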
We now have a stack of packages in a dictionary keyed by package name. To do this we make good use of dpkg-query to get a CSV-like list of package,version.
What we want next is a similar dict for up to date packages. Now, I know a lot of people who might read this would launch into apt-get update and then query the global list of packages. Would you do that in production? Really? You just want a list of stuff… Let’s just get it from security.debian.org directly.
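Roughly like so - the exact URL for the jessie security index is an assumption:

```python
import gzip
import requests

# Assumed location of the jessie security updates index for amd64.
PACKAGES_URL = ("http://security.debian.org/dists/jessie/updates/"
                "main/binary-amd64/Packages.gz")


def fetch_security_index(url=PACKAGES_URL):
    """Stream the gzipped Packages file and decompress it on the fly."""
    r = requests.get(url, stream=True)
    # r.raw behaves like a file object, so GzipFile can read straight from it.
    return gzip.GzipFile(fileobj=r.raw).read().decode("utf-8")
```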
A small point here… We make use of the gzip library directly to ungzip the file downloaded via Requests. To do this we use ‘r.raw’ like a file which GzipFile can use without any issue.
Now the format of this file is a bit weird. It’s a list of key value pairs for each package with a blank line between packages. The two keys we’re interested in for each package are Package (the name) and Version.
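A sketch of a parser for that format:

```python
def parse_packages_index(text):
    """Parse a Debian Packages index: 'Key: value' lines, blank line between stanzas."""
    updates = {}
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # Blank line: end of one package stanza.
            if "Package" in current:
                updates[current["Package"]] = current.get("Version")
            current = {}
        elif ":" in line and not line.startswith(" "):
            # Continuation lines (leading space) belong to the previous key; skip them.
            key, _, value = line.partition(":")
            current[key] = value.strip()
    # The last stanza may not have a trailing blank line.
    if "Package" in current:
        updates[current["Package"]] = current.get("Version")
    return updates
```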
Perfect! We now have a dict with all the security updates in Jessie keyed by the package name again.
With these two dicts we can intersect them and keep only the elements that appear in both. If the versions don't match, spit it out. I had to fake an update to test this properly, as there were no out-of-date packages when I ran it.
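A sketch of the comparison:

```python
def out_of_date(installed, security):
    """Packages in both dicts whose installed version differs from the security fix."""
    stale = {}
    for name in set(installed) & set(security):
        if installed[name] != security[name]:
            stale[name] = (installed[name], security[name])
    return stale
```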
And there we have it (spot the whining from requests about skipping verification).
Awesome - a quick way to grab and compare packages against containers.
It’s about time I updated this site. I go through stints of bothering with it; which is very common I find with a lot of people who still blog.
However, as I’m using Twitter less and less (can’t put my finger on why) and I like to keep my Facebook more private than most… it’s about time I bothered once more.
So, new theme. Went off, got the Octostrap theme. It’s awesome and well worth it.
I did look at Octopress 3 - but I don't like the way it works. The rake-based approach still works for me, and it seems like separating things for the sake of doing so… a bit like what Hubot has done over the past year too.
As for the ‘new start’ - I’m going to try and blog more. Adding to that I do have a Tumblr I post random things to as well which may be more up to date.
In fact - ways to find stuff I’m doing are:
And because it does happen…
I probably spend way too much time configuring my VIM setup. It tends to change depending on what I’m working on. So, at the moment the following things matter to me most:
There would be Scala, but I use the excellent IntelliJ IDEA product for that. Nothing can beat it, so there’s no point trying to get VIM to do it.
It matters to me that my editor works cross-platform too. I'm not so fussed about VIM on Windows (although it's nice when that works too), but more between OS X and Linux, as they're the two main operating systems I use.
So I felt I’d do a post about how I manage my VIM config as it may/may not be useful for others.
Let’s start nice and empty:
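The setup amounts to something like this (paths assumed):

```shell
# Keep the whole config under ~/.vim so one folder can live in Git.
mkdir -p ~/.vim/bundle
touch ~/.vim/vimrc
# Symlink the real vimrc into the place VIM expects it.
ln -sf ~/.vim/vimrc ~/.vimrc
```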
Why do this? Well, simply put - this way your .vim folder can be easily stored in Git or another VCS you fancy. Job done!
Right, so what next? vundle all the things.
Now you need a small bit at the top of your .vimrc file.
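The header follows Vundle's standard bootstrap, roughly like this (assuming Vundle has been cloned to ~/.vim/bundle/Vundle.vim):

```vim
set nocompatible              " drop vi compatibility; Vundle needs this
filetype off                  " required while Vundle runs

" Put Vundle on the runtime path and let it manage itself.
set rtp+=~/.vim/bundle/Vundle.vim
call vundle#begin()
Plugin 'VundleVim/Vundle.vim'

" ...your plugins go here, between begin() and end()...

call vundle#end()
filetype plugin indent on     " turn file type detection back on
```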
Now we have a basis of a working VIM we can work on. Let’s set up some cool stuff now…
Some obvious bootstrap things
By default, VIM likes to behave a little bit old fashioned. We want some niceties from the off - so let’s do that:
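The exact options are personal taste, but a sketch along the lines described - two-space indentation by default, with Python overridden to four:

```vim
syntax on                       " syntax highlighting
set number                      " line numbers
set backspace=indent,eol,start  " sane backspace behaviour

" Spaces, not tabs: two by default.
set expandtab
set tabstop=2
set shiftwidth=2
set softtabstop=2

" Per-language overrides - Python wants 4.
au FileType python setlocal tabstop=4 shiftwidth=4 softtabstop=4
```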
You’ll notice that 2 spaces is the default but, obviously, Python is a good example of a language that uses 4.
This way, you'll see, we get to customise each language. It's nice. 'au' is short for autocmd - as in, automatically run this when the FileType is python.
This is the Batman utility belt. It’s also easy to set up and serves as a good example of how Vundle works.
Job done. Make sure this goes between the Vundle begin and end calls.
Now save that and reload/install - the stock Vundle way is to run `vim +PluginInstall +qall` from your shell.
This will load up vim, install all the things and then exit when done.
Just spent the weekend at FOSDEM 2014. It’s the first time I have been to FOSDEM and checked out more of the Open Source world.
Seven of us went from Green Man Gaming, and the main thing I'll remember for next time is to turn up to talks I want to see well in advance - rooms packed out very quickly. I managed to meet lots of cool and awesome people though.
Go was the language of the conference
There was no way you could avoid this: Go is mainstream now. It's been heading this way for a while, but it's very clear that a language people once wondered the point of is now relevant to the point of obsession. The room was constantly rammed, with people staying for talk after talk.
I’m not the only one that laughs at MongoDB
Yep, turns out lots of people find the stability amusing.
There's a ton of PostgreSQL I don't know
This was always going to be a given. The RDBMS is still relevant and still attracting a lot of attention. There was a definite lack of MySQL and MariaDB however. Maybe that swings it a bit.
There's loads of stuff coming in 9.4.x for JSON and the like. The main thing I got from the talks, though, was an understanding of TOAST (The Oversized-Attribute Storage Technique - how PostgreSQL compresses and stores out-of-line any values too big to fit in a page).
I need to do more here
Next year… I need to make more of an effort to plan and attend more talks.
I will be here next year!
I was trying to make HTTP calls using Finagle today, and all I would get was an unhelpful traceback in my logs.
It turns out I needed to set up my ClientBuilder a little differently:
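A sketch of the shape (the host and timeout values are placeholders, not my real service):

```scala
import com.twitter.conversions.time._
import com.twitter.finagle.builder.ClientBuilder
import com.twitter.finagle.http.Http

// Placeholder host and durations - substitute your own.
val client = ClientBuilder()
  .codec(Http())
  .hosts("api.example.com:80")
  .tcpConnectTimeout(1.second)   // cap the TCP connect itself
  .requestTimeout(30.seconds)    // cap each individual request
  .hostConnectionLimit(1)
  .build()
```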
The important bits, which don't appear to be well documented, are tcpConnectTimeout and requestTimeout. The normal timeout usually set on the ClientBuilder is not what you want.
This was more a note for me - but figured people Googling might find it useful also.
Recently I was catching up on talks from DjangoCon EU 2012. Wish I could have been there. This talk on Flasky Goodness (or, why Django sucks) sort of rang a bell with me.
Why? Well, for me the point of writing everything like it’s going to be open source seems like a great way of doing things. It’s a great philosophy to have. Seriously.
Interesting tidbit posted by The Register last week. It's a topic fairly close to my heart, as I don't have a degree. I do, weirdly, get asked about this quite a bit: "should I get a degree?". If you think you should, you should. Don't let any "IT pro" sway your decision.
This is especially important now that degrees are just so damned expensive. According to El Reg, you're looking at £27k for your degree. Wow!
Personally, when I look at someone's CV, the first thing I'll do is Google them. Then I'll take a look at their GitHub profile. Then I look at their education. If they did a degree, it's important that they did well - but it doesn't matter if they didn't do one at all.
A few people spring to mind who this applies to. They know who they are.
Originally from QuestionCopyright.org
This trick has saved me today, and I've had to use it before… so I'm going to demonstrate it here in my own words, to save me Googling every time!
So imagine the scenario: you've done a load of commits, you're not ready to merge them back into master (or, even worse, you're on master), and you realise you made a massive mistake a few commits back and need to squash all the correction commits into it.
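Picture a history along these lines - an invented example, with made-up SHAs and messages:

```text
$ git log --oneline
9fceaf2 fix the fix to the fix
55b21c0 another go at fixing it
8c2d6a1 fix the mistake
3b9e1d4 the commit with the mistake in it
a1f0c3e earlier, perfectly good work
```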
Well that’s crap really isn’t it? So how do we go about sorting this? Well, we make a good use of rebase and tags to achieve this. There’s a nice answer to this on Stack Overflow - but I’ll replay it here. Props to Charles Bailey for this process.
So, here’s how we fix this:
Not overly simple - but brings, in this case, master to where I wanted it to be!