Everything is a Ghetto

While reading this controversial link bait, consider buying my product/service

Securing Cloud Backups With EncFS

Assuming you are on Debian/Ubuntu, install encfs

# apt-get install encfs

Simply run

encfs ~/PATH/TO/ENCRYPTED_FOLDER ~/PATH/TO/FOLDER

to create an encrypted version of a folder

$ encfs ~/ENCFS/ENC/ ~/ENCFS/CLEAR/
Creating new encrypted volume.
Please choose from one of the following options:
 enter "x" for expert configuration mode,
 enter "p" for pre-configured paranoia mode,
 anything else, or an empty line will select standard mode.
?> p

Paranoia configuration selected.

Configuration finished. The filesystem to be created has
the following properties:
Filesystem cypher: "ssl/aes", version 3:0:2
Filename encoding: "nameio/block", version 3:0:1
Key Size: 256 bits
Block Size: 1024 bytes, including 8 byte MAC header
Each file contains 8 byte header with unique IV data.
Filenames encoded using IV chaining mode.
File data IV is chained to filename IV.
File holes passed through to ciphertext.

New Encfs Password: 
Verify Encfs Password: 

Then copy a few files into CLEAR (the unencrypted folder). You can see that the encrypted folder holds the same number of files, of roughly the same size:

./CLEAR:
total 19M
-rw-r----- 1 thattommyhall thattommyhall 5.5M Jul 17 17:10 Developing_Backbonedotjs_Applications.epub
-rw-r----- 1 thattommyhall thattommyhall 5.2M Jul 17 17:10 Developing_Backbonedotjs_Applications.mobi
-rw-r----- 1 thattommyhall thattommyhall 7.7M Jul 17 17:10 Developing_Backbonedotjs_Applications.pdf

./ENC:
total 19M
-rw-r----- 1 thattommyhall thattommyhall 5.5M Jul 17 17:10 3R4b6qteJLZmTzMGit2cWajhzkVG,rLw6xry2PujL4LSbtg3EgftaMWfhQk4fM5mc-C
-rw-r----- 1 thattommyhall thattommyhall 7.7M Jul 17 17:10 EnMiuriWbkgXYz7F9GsAzRlCBSpmHub7kvObd8fyFswVkhlUi-FzHaT7twrIHXsEOy5
-rw-r----- 1 thattommyhall thattommyhall 5.2M Jul 17 17:10 Z8izkCiMcMqBf,9a3a2smF3C6RAHQmicTi8UIgl7eCZfcRnOUVtwDq4kxrLM61Yc-n6

Getting md5 sums for the files.

~/ENCFS/ENC$ md5sum *
027a7ff02f785dbe211a91e9f218bea7  3R4b6qteJLZmTzMGit2cWajhzkVG,rLw6xry2PujL4LSbtg3EgftaMWfhQk4fM5mc-C
bd2635929cfab11349ba5dde73e0b303  EnMiuriWbkgXYz7F9GsAzRlCBSpmHub7kvObd8fyFswVkhlUi-FzHaT7twrIHXsEOy5
08786c915984a10532a2485f0e21ee4e  Z8izkCiMcMqBf,9a3a2smF3C6RAHQmicTi8UIgl7eCZfcRnOUVtwDq4kxrLM61Yc-n6

If you edit one file, only the matching encrypted file changes

~/ENCFS/ENC$ md5sum *
73a2f5ec8389aba5806b5b3eb267ed72  3R4b6qteJLZmTzMGit2cWajhzkVG,rLw6xry2PujL4LSbtg3EgftaMWfhQk4fM5mc-C
bd2635929cfab11349ba5dde73e0b303  EnMiuriWbkgXYz7F9GsAzRlCBSpmHub7kvObd8fyFswVkhlUi-FzHaT7twrIHXsEOy5
08786c915984a10532a2485f0e21ee4e  Z8izkCiMcMqBf,9a3a2smF3C6RAHQmicTi8UIgl7eCZfcRnOUVtwDq4kxrLM61Yc-n6

If you create a directory structure, it is recreated in the encrypted folder with an obfuscated name

~/ENCFS/CLEAR$ mkdir -p one/two/three
~/ENCFS/CLEAR$ mv Developing_Backbonedotjs_Applications.epub one/two/three/
~/ENCFS/CLEAR$ cd ..
~/ENCFS$ tree ENC/
ENC/
├── EnMiuriWbkgXYz7F9GsAzRlCBSpmHub7kvObd8fyFswVkhlUi-FzHaT7twrIHXsEOy5
├── ViMb6X99i1TkJiAoyT-FHKTE
│   └── 7EVvTVP53WUQF0SiJFhG-jzO
│       └── azretaLwQ40MO38NDhqOlNXJ
│           └── mpIK4ZSAq0gRwTeQWOaJRE,rbxSAAW0V4Cp8VdGXFj37q6IRkHVqdwPeSU0sq0JGQi4
└── Z8izkCiMcMqBf,9a3a2smF3C6RAHQmicTi8UIgl7eCZfcRnOUVtwDq4kxrLM61Yc-n6

This means it will play nicely with file-based backups, since only the files you touch need to be copied again. The exception is tools that do a diff-style copy, as the whole encrypted file is likely to look changed after each update.
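To check which encrypted files a file-level backup would have to copy again, you can snapshot checksums before and after an edit. This is a minimal sketch, not part of EncFS: the helper names are made up for illustration, and the demo uses a throwaway temp directory standing in for ~/ENCFS/ENC. Note that the glob skips dotfiles such as the EncFS config file.

```ruby
require 'digest/md5'
require 'tmpdir'

# Map each file under dir (as a relative path) to the MD5 hex digest of its contents.
def checksum_snapshot(dir)
  Dir.glob('**/*', base: dir)
     .select { |rel| File.file?(File.join(dir, rel)) }
     .to_h { |rel| [rel, Digest::MD5.file(File.join(dir, rel)).hexdigest] }
end

# Paths that are new, or whose digest differs, between two snapshots.
def changed_files(before, after)
  after.reject { |path, digest| before[path] == digest }.keys.sort
end

# Demo on a temp directory (hypothetical file names and contents).
def demo_changed_files
  Dir.mktmpdir do |dir|
    File.write(File.join(dir, 'a.bin'), 'first file')
    File.write(File.join(dir, 'b.bin'), 'second file')
    before = checksum_snapshot(dir)
    File.write(File.join(dir, 'a.bin'), 'first file, edited')
    changed_files(before, checksum_snapshot(dir))
  end
end

p demo_changed_files # prints ["a.bin"]
```

Only the paths returned by changed_files would need uploading, which is the behaviour the md5sum listings above show by hand.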

To remount, simply run the same command as before and enter your password

$ encfs ~/ENCFS/ENC/ ~/ENCFS/CLEAR/
EncFS Password: 

You can put the encrypted folder in your Dropbox, copy it to S3, or rsync/scp it to a VPS somewhere, and the provider will never be able to read your data; neither will the NSA. As the listings above show, file sizes and the directory layout are still visible.

OSX

Of course, if you use OSX - it’s easier. According to Emmanuel Bernard you just have to remember that the encfs in homebrew uses fuse4x which is now deprecated in favour of OSXfuse and simply do

brew install https://raw.github.com/jollyjinx/encfs.macosx/master/encfsmacosxfuse.rb

from a random github repo and wait while it compiles.

Windows

I did not know that FUSE has been ported to Windows as Dokan, and that encfs is available here. It should be easy enough to set up.

Android

Cryptonite looks good, but will not be as easy to set up as the others

Heml.is Snakeoil

I have been meaning for a while to post about using public key crypto to secure cloud backups on services you can’t trust (ie all of them) but the recent launch of Heml.is made me get nerd-rage the other day and I just have to say something.

To quote them a little:

Open Source

We have all intentions of opening up the source as much as possible for scrutiny and help! What we really want people to understand however, is that Open Source in itself does not guarantee any privacy or safety. It sure helps with transparency, but technology by itself is not enough. The fundamental benefits of Heml.is will be the app together with our infrastructure, which is what really makes the system interesting and secure.

Your server only?

Yes! The way to make the system secure is that we can control the infrastructure. Distributing to other servers makes it impossible to give any guarantees about the security. We’ll have audits from trusted third parties on our platforms regularly, in cooperation with our community.

Technology like public key crypto does not rely on particular servers; that it works over insecure transports is kind of the point (unless they know better than 1000s of mathematicians and security experts over the last 4 decades).
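To make the point concrete, here is a minimal sketch using Ruby's OpenSSL bindings. It is plain RSA with OAEP padding, not PGP and not whatever Heml.is actually does. The untrusted_relay function is a made-up stand-in for any server you do not control. Real systems use hybrid encryption and also sign messages, which this skips.

```ruby
require 'openssl'

# Hypothetical untrusted relay: it can store and forward bytes but holds no keys.
def untrusted_relay(ciphertext)
  ciphertext.dup
end

# The recipient generates a key pair; only the public half is ever shared.
recipient_key = OpenSSL::PKey::RSA.new(2048)
public_pem    = recipient_key.public_key.to_pem

# The sender only ever sees the public key.
message     = 'meet at noon'
sender_view = OpenSSL::PKey::RSA.new(public_pem)
ciphertext  = sender_view.public_encrypt(message, OpenSSL::PKey::RSA::PKCS1_OAEP_PADDING)

# Any server can pass the ciphertext along; only the private key can open it.
relayed   = untrusted_relay(ciphertext)
decrypted = recipient_key.private_decrypt(relayed, OpenSSL::PKey::RSA::PKCS1_OAEP_PADDING)

puts decrypted
```

Nothing in the exchange depends on who runs the relay, which is the property the quoted FAQ seems to deny.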

I’m glad to see they use some of the existing work (PGP and XMPP), and good on them for raising 150k to build a pretty messaging app, but until they explain how only their servers can successfully pass around PGP encrypted messages then I’m calling bullshit.

What to do instead?

If you want secure IM encryption now just use libotr with XMPP and be interoperable

Take a look at the brilliant (and superbly named, I so wish I thought of it) Prism Break for more ways to protect yourself from snooping, not as colourful as Heml but ready to install on any operating system, for free, right now.

Moving on From Forward

After 2.5 years at Forward I am moving on.

The time there has been great; I worked with some of the smartest people I know on some interesting projects. Things that are now core to me I used for the first time there (Hadoop, Hive, HBase, Node.js, Puppet, Clojure), and I got a lot of exposure to ideas that have changed the way I think about solving problems. Highlights: working on a system handling 100s of millions of requests a day in pretty much all regions of EC2, merging 2 VMware infrastructures, trying to make sense of terabytes of data in Hadoop, speeding up and moving a bunch of high traffic action sports sites to EC2 in dealing with what was nicknamed the Greek bailout of tech debt, stream processing in Esper and Storm, being able to host DevOps meetups and the Clojure Dojos there, getting involved in the Coder Dojo, becoming a trustee of the Forward Foundation, plus the sum of all those tiny interactions that being in a place full of people who know more than you and are keen to share permits. My team in particular: fiercely smart, results focused, metrics driven and with a wonderful shared simplicity aesthetic. I will miss it.

I am moving to FutureLearn: not much to see at the moment, but an exciting take on the MOOC (Massive Open Online Course) concept. It is owned by the Open University and builds on their 40+ years of distance learning experience, partnering with the British Museum, the British Library and dozens of our top universities. I think it has a good chance to be a big part of the growing MOOC ecosystem. Anyone that knows me will know this is exciting and massively aligned with my values and interests. At the interview they spoke about pedagogy and engagement and really impressed me with their ambition; I will work hard to help realise it.

MOOC kind of started with the AI and ML courses last year and the two companies that came from them, Coursera and Udacity, are doing a great job, as is EdX but I’ve been following along with things from MIT’s OpenCourseware for years and lots of universities have always shared resources online. I think there is space to make something a little different, increase engagement and we have some amazing partners so keep an eye on us.

101 Goals in 1001 Days, Again

Some of you will remember my last 101 Goals In 1001 Days, it led to quite a lot of fun, gave me something to aim at over an extended period and completely transformed my fitness. People laughed when I said I would do a marathon and now I’ve done 2, I’m glad I surprised them.

I like that now people ask me “What adventures have you got planned?” and have been a bit sad about not having so much on the horizon and have been thinking about doing another 101 goals almost as soon as the last finished.

I held off officially starting as I have an ankle injury and a few of them are running and triathlon goals and I was still finalising the list, but I’ve decided to stop putting it off and just announce it. Let me know if you want to join in any.

I didn’t want to carry over any that I failed to do last time, but 3 on there are repeats I could not resist trying for again. I think the list is more realistic this time - less long and expensive trips on there, more ‘stepping stone’ type goals towards bigger ones.

Reading

1 - Read King James Bible
2 - Read Quran
3 - Read Paradise Lost
4 - Read Beowulf
5 - Memorise opening of Beowulf
6 - Read the Aeneid
7 - Read 5 Old English poems

German

8 - Read Faust
9 - Read Kant
10 - Read Marx
11 - Learn every day for 6 months

Fitness

12 - Attend MMA Classes for 6 months
13 - Fight in a cage
14 - Squat 150kg
15 - Deadlift 150kg
16 - Press 150kg
17 - 100 press ups
18 - 10 pullups

Nature/Travel

19 - Trans Siberian Railway
20 - Meet David Attenborough
21 - See bears in the wild
22 - Okavango Delta

Running

23 - 20 min 5k
24 - 45 min 10k
25 - 2h Half Mara
26 - 4h30 Mara

Learning

27 - A Student’s Introduction to English Grammar
28 - Watch all Teaching Company Linguistics Courses
29 - Finish 10 online courses
30 - Fill all country names on a blank world map from memory
31 - Learn to dance
32 - Get some sort of postgrad qualification
33 - Learn To Read Hieroglyphs
34 - Learn to tapdance

Ancient Civilisations

35 - Visit Çatalhöyük
36 - Cork Megaliths / Brú na Bóinne / Skellig Michael
37 - Skara Brae
38 - Uffington White Horse

Hacking

39 - Finish 4clojure
40 - Seasoned Schemer
41 - Reasoned Schemer
42 - AIMA in clojure
43 - Build a robot
44 - Write a Go AI using Monte Carlo Tree Search
45 - Start using paredit
46 - Start using org-mode
47 - Write me a scheme in 48h
48 - Purely Functional Data Structures
49 - Structure And Interpretation Of Classical Mechanics
50 - Take part in Lisp In Summer Projects
51 - Write a goal tracking site
52 - Get something on KickStarter

Cycling

53 - Long distance cycle
54 - London -> Brighton
55 - Hardknott Pass
56 - Cycle to Amsterdam

Swimming

57 - Quarterly Wild Swim
58 - Swim 100
59 - Swim 500m
60 - Swim 1 mile
61 - Swim the Hellespont
OK this is a bit of a ‘stretch’ goal :-D

Random

62 - Visit Mosslands (my secondary school)
63 - Visit 24th Wallasey (my old Scout troop)
64 - Visit Highclere Castle
65 - Play the harmonica
66 - Complete every Mario Game
67 - Write a book on infinity
68 - See Electric Six again

Triathlon

69 - Sprint
70 - Olympic
71 - Half Ironman
72 - Ironman

Outdoors

73 - Do bigish multi-pitch climb
74 - Get Single Pitch Award
75 - Climb HVS
76 - Ice climb
77 - Mountaineer a good grade
78 - Mt Blanc
79 - Elbrus
80 - Nehru Institute Of Mountaineering
This was on last time but I so so want to do it
81 - Multi-day canoe trip
82 - Sea kayaking
83 - Summer ML
84 - Winter ML
85 - Horseriding
86 - Catch and cook something
87 - Spend a night in a snow hole

Walking

88 - Walk London Loop
89 - Walk Capital Ring
90 - Yorkshire 3 peaks
91 - Do 20 Munros
92 - Do a UK long distance path at >30miles/day
93 - Walk 50 miles in a day
94 - Walk 75 miles in a day
95 - Walk 100 miles in a day
96 - Write an ebook on UK ultralighting
97 - Irish 3000ers
98 - Welsh 3000ers

Lifestyle

99 - Start a not (just) for profit venture
Being a trustee of the Forward Foundation I have been impressed by people doing amazing work in London and Africa and want to do something myself.

100 - No internet for a month or a day a week for 6 months
101 - Have a 3 month break from work
I am tempted by Hacker School but will see what I wind up doing, plan is to let side projects be the only projects for a while.

There it is then, any you want to join in on let me know.

Evolving Cellular Automata - the Code

My last post about automata was light on code, mostly because I got tired and lost some time doing the simulations inside the post in a way that worked in Firefox and Chrome.

This was a warmup exercise for Lisp In Summer Projects. If you like Lisp, get involved!

Just after starting on it I noticed David Nolen had ported @notch’s JavaScript Minecraft demo to ClojureScript. He kept it fast by using a few macros and sticking to straight-up JS datatypes while staying as functional as possible; check it out. Particularly impressive is that the output code is ~400 lines, thanks to the Google Closure compiler.

You may want to read up on Genetic Algorithms in general first, but in short you

  • find a way to encode a particular solution to the problem as a genome
  • start with an initial population of random genomes
  • work out their ‘fitness’ (ie how well they perform some task)
  • choose the next generation by selecting and ‘breeding’ them (with some mutation for novelty)
  • repeat until a good solution appears
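The steps above can be sketched in a few lines of Ruby. This is a toy, not the code behind the post. The fitness function (count the 1 bits), the tournament selection, the fixed number of generations and every name here are assumptions made for illustration.

```ruby
# Toy fitness (an assumption for illustration): number of 1 bits in the genome.
def fitness(genome)
  genome.sum
end

# A genome is just an array of random bits.
def random_genome(length, rng)
  Array.new(length) { rng.rand(2) }
end

# Tournament selection: the fitter of two randomly chosen individuals.
def select_parent(population, rng)
  a = population[rng.rand(population.size)]
  b = population[rng.rand(population.size)]
  fitness(a) >= fitness(b) ? a : b
end

# Single-point crossover followed by per-bit mutation for novelty.
def breed(mum, dad, rng, mutation_rate)
  cut = rng.rand(mum.size + 1)
  child = mum[0...cut] + dad[cut..]
  child.map { |bit| rng.rand < mutation_rate ? 1 - bit : bit }
end

# Run a fixed number of generations and return the fittest survivor.
def evolve(pop_size:, genome_length:, generations:, mutation_rate: 0.01, seed: 42)
  rng = Random.new(seed)
  population = Array.new(pop_size) { random_genome(genome_length, rng) }
  generations.times do
    population = Array.new(pop_size) do
      breed(select_parent(population, rng), select_parent(population, rng), rng, mutation_rate)
    end
  end
  population.max_by { |genome| fitness(genome) }
end

best = evolve(pop_size: 50, genome_length: 32, generations: 40)
puts "#{fitness(best)} of 32 bits set"
```

Swapping the toy fitness for "how often does this rule classify a random starting row correctly" gives the shape of the real problem.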

Remember the goal is to evolve a strategy to solve the Majority Problem using a 1D cellular automaton with radius 3. I opted to keep the main population on the server and do the selection and breeding there, but run the fitness simulations (hopefully most of the computation needed) in browsers. This meant I had to fudge the idea of generations a bit. So that no results are wasted, the workers get a sample of the population and post back the fitnesses, and the population grows until at some point I shrink it using the same selection method (fitness proportionate selection).
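A rough sketch of that grow-then-shrink idea follows. The class name, the maximum size and the choice to cut back to half the maximum are all assumptions; only the use of fitness proportionate (roulette wheel) selection comes from the post.

```ruby
# Roulette-wheel pick: index i is chosen with probability fitnesses[i] / total.
def pick_proportionate(fitnesses, rng)
  total = fitnesses.sum.to_f
  return rng.rand(fitnesses.size) if total.zero? # no signal at all: pick uniformly
  target = rng.rand * total
  running = 0.0
  fitnesses.each_with_index do |f, i|
    running += f
    return i if target < running
  end
  fitnesses.size - 1 # guard against floating point round-off
end

# Server-side population that grows as workers report and is culled when too big.
class GrowingPopulation
  attr_reader :scored

  def initialize(max_size, rng = Random.new)
    @max_size = max_size
    @rng = rng
    @scored = [] # [genome, fitness] pairs reported by workers
  end

  # Called when a (hypothetical) browser worker posts back its results.
  def report(results)
    @scored.concat(results)
    shrink if @scored.size > @max_size
  end

  private

  # Cut back to half of max_size, sampling without replacement.
  def shrink
    pool = @scored.dup
    survivors = Array.new(@max_size / 2) do
      pool.delete_at(pick_proportionate(pool.map(&:last), @rng))
    end
    @scored = survivors
  end
end
```

Fitter genomes are more likely to survive a cull, but weak ones still have a chance, which keeps some variety in the pool.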

Go here and help it evolve if you have not already

Evolving Cellular Automata

This post is about evolving Cellular Automata to perform computation. The example I use is from the book Complexity - A Guided Tour

The most famous cellular automaton is Conway’s Game of Life, a 2D automaton with simple rules to determine whether each cell lives or dies on each iteration based on the adjacent cells.

Above is the well known Glider Gun that is at the heart of many complex Life configurations, including a Universal Computer. A perfect example of complex results arising from simple rules.

The automata we are looking at today are much simpler and one dimensional.

Above is ‘rule 30’, so called because of its Wolfram Code (see here on the Mathworld site for more information). It is a binary (two-state), nearest-neighbour (each cell can “see” its 2 neighbours), one-dimensional automaton. As you can see, a cell’s neighbourhood can be in 8 ($2^{3}$) different states, and the outputs below them, read as a binary number, equal 30. The bottom half of the picture is a ‘space-time’ diagram: the top row is the initial configuration and each row beneath is the next step, after each cell lives or dies according to the rule.
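The way a Wolfram Code drives the automaton can be shown in a few lines. This is a minimal sketch with made-up function names. The wrap-around at the edges is an assumption, as the picture does not say how the edges are handled.

```ruby
# Advance a row of 0/1 cells one step under an elementary (nearest-neighbour) rule.
# rule is the Wolfram Code, 0..255; edges wrap around (an assumption).
def step(cells, rule)
  n = cells.size
  Array.new(n) do |i|
    left, centre, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
    index = (left << 2) | (centre << 1) | right # neighbourhood as a number 0..7
    (rule >> index) & 1                         # that bit of the rule is the new state
  end
end

# Rows of a space-time diagram, starting from the initial row.
def space_time(initial, rule, steps)
  rows = [initial]
  steps.times { rows << step(rows.last, rule) }
  rows
end

# One live cell in the middle, 15 steps of rule 30, drawn as text.
start = Array.new(31, 0)
start[15] = 1
space_time(start, 30, 15).each do |row|
  puts row.map { |cell| cell == 1 ? '#' : ' ' }.join
end
```

Since 30 is 00011110 in binary, the neighbourhoods 100, 011, 010 and 001 produce a live cell and the other four produce a dead one.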

The Majority Problem

The 1D automata I am interested in are also binary, but each cell’s neighbourhood includes its 3 adjacent neighbours on both sides (ie $2^{7} = 128$ different states, as it can see itself and 6 neighbours). The world wraps around into a ring, so the 0th cell is adjacent to the 100th in our 101 cell world. The problem we want to solve is the Majority Problem: if the initial configuration is mostly alive (or mostly dead), then after some number of steps every cell should be alive (or dead). If so, the rule has successfully performed the task.
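Here is a minimal sketch of such an automaton and of the success test. The names, the bit ordering of the neighbourhood and the choice to check the row after a fixed number of steps are assumptions. LOCAL_MAJORITY is only a naive baseline rule to have something to run, not a claimed solution.

```ruby
RADIUS = 3

# One step of a binary radius-3 automaton on a ring.
# rule is an array of 128 bits: rule[k] is the next state for neighbourhood k,
# where k reads the 7 cells left to right as a binary number (ordering assumed).
def step_radius3(cells, rule)
  n = cells.size
  Array.new(n) do |i|
    index = (-RADIUS..RADIUS).reduce(0) { |acc, d| (acc << 1) | cells[(i + d) % n] }
    rule[index]
  end
end

# True if, after steps iterations, every cell matches the initial majority.
def solves_majority?(rule, initial, steps)
  target = initial.sum * 2 > initial.size ? 1 : 0
  cells = initial
  steps.times { cells = step_radius3(cells, rule) }
  cells.all? { |cell| cell == target }
end

# Naive baseline: each cell copies the majority of its own 7-cell neighbourhood.
LOCAL_MAJORITY = Array.new(128) { |k| k.to_s(2).count('1') >= 4 ? 1 : 0 }

world = Array.new(101) { rand(2) }
puts solves_majority?(LOCAL_MAJORITY, world, 200)
```

A genome for the genetic algorithm is then just the 128-bit rule array, and its fitness is how often solves_majority? comes back true over many random starting rows.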

Go here and watch it evolve if you have not already.

Backblaze Storage Pods Available in EU

A while ago I saw the Backblaze storage pod and was impressed, super cheap and space for 45 drives in 4U.

I found a place to have them made in the UK (in matt black, not the Backblaze red). Though I sold most of them, I still have a few left for £800 each (including the nylon stand-offs and the port multipliers). Mail me at thattommyhall@gmail.com if you are interested; if you need more than one I can give you a discount too.

After the first run I had them completely done by the fabrication place so they are precision assembled (I was hopeless at it, almost as hopeless as I am at selling things.)

ZFS

As soon as you see so many disks in a case like that, it’s hard not to think of Sun’s Thumper and ZFS.

I’ve blogged about ZFS before and given talks on it. With so many disks to fail (either noisily or silently), data loss is inevitable (and worse, you may not even be alerted). ZFS would solve this (or at least ensure you know about it). Backblaze use custom application logic to work around this, using Tomcat and HTTPS.

It’s Not Highly Available

An ex-Sun guy has a critique here that is totally spot on and he makes a few great points about subtle changes to Sun’s design to accommodate vibration, noise and electromagnetic radiation. In so many ways the hardware is inadequate and does not have the uptime characteristics of an enterprise SAN/NAS. There are however a few smart software solutions to work around hardware failures, so the availability of a particular device is not so important.

It won’t be fast

That is largely a feature of the disks and the controllers, and using the port multipliers slows you down too. A very cool feature of ZFS, Hybrid Storage Pools, allows using SSDs as a second-level cache, which would help.

On Linux, dm-cache or one of these could probably achieve something similar.

How can you make it HA?

The landscape has shifted a little since I last blogged about it, with Ceph and RiakCS being interesting additions. Ceph has an object store, a block device and a POSIX filesystem with distributed metadata in the works; it is the one to watch, I think.

Filesystem

Object Store

Block Device

Moving DNS From Rackspace to Amazon Route53 Using Fog

Had to move a few domains and knocked up this script. It mostly just glosses over some differences in the formatting of the records, A and CNAME records translate pretty well but TXT and MX are combined in R53 and have separate attributes in Rackspace. We didn’t have any other record types.

I opened an issue on Fog for strange behaviour if you have 100+ domains in Rackspace.

require 'fog'
require 'pp'

RACKSPACE = {
  :provider => 'Rackspace',
  :rackspace_api_key => 'EDIT' ,
  :rackspace_username => 'EDIT'
}

R53 = {
  :provider => 'AWS',
  :aws_access_key_id => 'EDIT',
  :aws_secret_access_key => 'EDIT'
}

# Thin wrapper around a Fog DNS connection for a single provider.
class Provider
  def initialize(connection_spec)
    @connection = Fog::DNS.new(connection_spec)
  end

  # Records of the given types (default: CNAME and A) in the zone for domain.
  def get_records(domain,types=nil)
    types = %w[CNAME A] unless types
    get_domain(domain).records.select{|r| types.include? r.type}
  end

  # The zone for domain, or nil if there is none.
  # Matches with or without the trailing dot, as the providers differ.
  def get_domain(domain)
    @connection.zones.select{|z| z.domain == domain  or z.domain == add_dot(domain)}[0]
  end

  # Create the zone unless it already exists.
  def create_zone(domain)
    unless get_domain(domain)
      @connection.zones.create(:domain => domain)
    end
  end

  # Add a record unless one with the same name and type is already there.
  def new_record(domain,name,value,ttl,type)
    name = name.downcase
    # some of the rackspace entries had uppercase but when added to R53 it went to downcase.
    existing =  get_records(domain,[type]).select{|r| add_dot(name) == r.name}
    unless existing.empty?
      puts "#{name} => #{value} (#{type}) EXISTS"
      return
    end
    puts "#{name} => #{value} (#{type})"
    get_domain(domain).records.create({
                                        :name => name,
                                        :value => value,
                                        :ttl => ttl,
                                        :type => type
                                      })
  end

  # Normalise a name to its fully qualified form with a trailing dot.
  def add_dot(s)
    return s if s[-1] == '.'
    "#{s}."
  end
end

def migrate(domain)
  puts "MIGRATING #{domain}"
  r53 = Provider.new(R53)
  rs = Provider.new(RACKSPACE)
  r53.create_zone domain
  rs.get_records(domain).each do |record|
    r53.new_record(domain, record.name, record.value, record.ttl, record.type)
  end
  rs.get_records(domain,["TXT"]).group_by{|r| r.name}.each_pair do |name,records|
    texts = records.map(&:value)
    texts.map!{|t| "\"#{t}\""}
    r53.new_record(domain, name, texts , records[0].ttl, "TXT")
  end
  rs.get_records(domain,["MX"]).group_by{|r| r.name}.each_pair do |name,records|
    entries = records.map{|r| "#{r.priority} #{r.value}."}
    r53.new_record(domain, name, entries , records[0].ttl, "MX")
  end
end

DOMAINS = %w[something.com somethingelse.net]

DOMAINS.each do |d|
  migrate d
end
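The TXT and MX handling in migrate is the fiddly part, so here it is in isolation, without Fog or any network access. RsRecord is a made-up struct standing in for a Rackspace record, and the sample data is invented. Only the two string formats are taken from the script above.

```ruby
# Stand-in for a Rackspace record (hypothetical struct, not a Fog class).
RsRecord = Struct.new(:name, :type, :value, :priority, :ttl)

# Format the values of one name/type group the way the script sends them to R53.
def r53_values(type, records)
  case type
  when 'TXT' then records.map { |r| "\"#{r.value}\"" }           # each text wrapped in quotes
  when 'MX'  then records.map { |r| "#{r.priority} #{r.value}." } # "priority host." strings
  else raise ArgumentError, "unhandled record type #{type}"
  end
end

# Collapse the separate Rackspace TXT/MX records into one entry per name and type.
def combine(records)
  records.select { |r| %w[TXT MX].include?(r.type) }
         .group_by { |r| [r.name, r.type] }
         .map do |(name, type), group|
           { name: name, type: type, ttl: group.first.ttl, values: r53_values(type, group) }
         end
end

SAMPLE = [
  RsRecord.new('example.com', 'MX', 'mail.example.com', 10, 300),
  RsRecord.new('example.com', 'MX', 'backup.example.com', 20, 300),
  RsRecord.new('example.com', 'TXT', 'v=spf1 mx -all', nil, 300)
].freeze

combine(SAMPLE).each { |entry| p entry }
```

Like the script, this takes the TTL from the first record in each group, so mixed TTLs under one name are silently flattened.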

Typesetting Maths, Pretty Syntax Highlighting

So, the Wordpress import went OK - using exitwp. It was not perfect, but I fixed up the problems in the top 10 posts in analytics (annoyingly most of my traffic goes to this old post about measuring IOPS in Linux)

Some of my posts had used $\LaTeX$ to typeset mathematics so I wanted to fix that up and found these instructions

I can do inline $\LaTeX$ like $e^{i\pi} + 1 = 0$ and blocks like $$ \imath\hbar\frac{\partial}{\partial t}\Phi (x, t) = \hat{H}\Phi (x, t) $$ Pretty nice, the only downside is switching to kramdown has stopped the triple backtick code block filter from working.

Now I didn’t particularly like the syntax highlighting and stumbled across Gorgeous Octopress Codeblocks with CodeRay, which looked nice and also required kramdown (so maybe it’s a good idea…). I am not good at frontend stuff and have not looked into how exactly all the filters etc work, but I am not quite happy using the {% coderay %} filter for codeblocks in place of the triple backticks; somehow liquid just feels less nice in a markup that’s supposed to be human-readable.

(ns euler55
  (:require [clojure.string :as string]))

;; True if the string reads the same forwards and backwards.
(defn palindrome? [s]
  (= s
     (string/reverse s)))

;; Reverse-and-add up to 50 times; a number that never hits a palindrome
;; in that many steps is treated as Lychrel.
(defn is-lychrel?
  ([n] (is-lychrel? n 0))
  ([n depth]
     (let [numstring (str n)]
       (cond (and (> depth 0)
                  (palindrome? numstring))
             false

             (> depth 50)
             true

             :else
             (recur (+ n
                       (bigint (string/reverse numstring)))
                    (inc depth))))))

(println (count (filter is-lychrel? (range 10001))))

Again, pretty nice. I may try and extend the formatting to the gist embeds and get the triple backtick thing to use CodeRay.

Started Using Octopress

Well, I have long been wanting to get away from Wordpress and begin to blog again a little more.

I have decided to use Octopress; if it does not do what I need (like typeset maths etc), then hopefully I can fix it. I can write posts in Emacs and only need static file hosting, so I don’t have to worry about the expense.

Time to see if I can do an export from Wordpress!