I was pleased to speak at the Clojure Exchange recently (at least it was recent when I started writing this blog post). I suggested to Bruce that a talk on concurrency and parallelism would be good because, since most of the books were published, reducers and core.async have appeared giving us even more options in Clojure, and we have a new runtime in Clojurescript (I also ranted that our books get it a bit wrong).
The goals of the talk were:
Give people solid working definitions of concurrency and parallelism
Refresher on reference types (and that you don’t automatically get parallelism with them)
Show some easy ways to get parallelism in your programs
Demo a nice pattern serialising access to something non-threadsafe with an agent
Explain Reducers as a way of getting parallelism (and the idea reduce/fold itself is no good for it)
Introduce CSP via core.async as a way of programming concurrently (that sometimes runs in parallel)
Point people at people way smarter than me, talking for longer than I had, about just one of the things I tried to cover
Slides are here, video here, but I will expand on a few ideas here too.
Concurrency is not parallelism
It has been said better by Rob Pike in this talk but the definition is:
Concurrency is the composition of independently executing ‘processes’
Parallelism is the simultaneous execution of (possibly related) computations
Concurrency is about the way we structure programs
Parallelism is about the way they run
The nice thing about this definition is it’s then easy to find examples of concurrent but not parallel (running multiple threads on a single core) and parallel but not concurrent (instruction-level parallelism, data parallelism), and even when concurrency and parallelism are both present but to a different extent (running 50 threads multiplexed across 4 cores).
Joe Armstrong in Programming Erlang
I like this one as it brings out the two main ways of doing concurrency, either by message passing with no shared state or by managing access to the shared state.
I have the three main books on Clojure so took a look at how they introduced the idea
Both definitions assume shared state, though this one captures the idea that parallel means running at the same time, and that to get the most from your hardware you want to avoid coordination overhead.
Joy Of Clojure
I don’t like ‘roughly the same time’ here, and again it seems to presume shared state.
I think that saying parallel programs execute concurrently is wrong (replace it with ‘at the same time’ and I think it is right), and the first bullet misuses ‘concurrent’ at least once.
This is not to say it is a massive deal and stops people understanding and using the features of Clojure the books go on to explain, just that until seeing Rob’s talk I don’t think I could have given a clear definition of what the terms mean.
Again this was just a refresher and has probably been done better elsewhere.
All of the reference types hold a value and are updated by functions with the same signature
(<changer> reference f [args*])
All can be derefed with @reference or (deref reference)
The main difference between them is whether they are synchronous or asynchronous, coordinated or uncoordinated.
Synchronisation - Does the operation block?
Coordination - Can multiple operations be combined in a transaction?
(Must be inside a dosync transaction)
send (executed on a bounded thread pool)
send-off (executed on an unbounded thread pool)
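The update shape is the same across the reference types; a minimal sketch (my own toy values, not from the talk):

```clojure
;; Each reference type holds a value and is updated by passing a
;; function (plus optional args) that is applied to the current value.

(def counter (atom 0))          ; synchronous, uncoordinated
(swap! counter inc)             ; @counter is now 1

(def account (ref 100))         ; synchronous, coordinated
(dosync (alter account + 50))   ; must be inside a dosync transaction

(def logger (agent []))         ; asynchronous, uncoordinated
(send logger conj :started)     ; queued; applied later on a pool thread
(await logger)                  ; block until pending sends have run
```

In every case the reference is given a function of its current value, which is what makes the API feel uniform.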
Using agents to serialise access to non-threadsafe resources
If we start multiple threads and start writing to the console with println from them, very quickly you start to get overlapping writes.
One way around it is to wrap the contested resource in an agent
Here the write function takes something writeable and a string, writes the string to it with a newline at the end and then (importantly) returns the object (this is the behaviour of doto). This allows it to be passed to an agent: the value of the agent is always the writer, and as the agent serialises application of the functions sent to it, the writer is only written to by one function at a time.
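A sketch of that pattern (my reconstruction; the names write and safe-println are my own):

```clojure
(def writer-agent (agent *out*))  ; the agent's value is the shared writer

(defn write [^java.io.Writer w ^String s]
  ;; doto performs the writes and then returns w, so the writer
  ;; remains the agent's value for the next queued function
  (doto w
    (.write s)
    (.write "\n")))

(defn safe-println [s]
  ;; send-off because writing is IO; the agent applies the queued
  ;; write calls one at a time, so lines never interleave
  (send-off writer-agent write s))
```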
The guidance for whether to send or send-off to an agent is whether the function does any IO or not: send-off is for IO operations and is on an unbounded pool, whereas send is for CPU-bound operations and is on a fixed-size pool. Either way the message queue is unbounded and you might eventually suffer memory issues (this is one of the reasons core.async and CSP are better: they provide back-pressure and make you think up front about buffering and queue size).
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.RuntimeException: Agent is failed, needs restart
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
Agents and STM
Thanks to Philip Potter for reminding me to mention that agents are integrated with STM, any dispatches made in a transaction are held until it commits, and are discarded if it is retried or aborted so you can use them to safely trigger side-effects from within transactions. See the docs for more.
Delays and futures
runs code on deref, blocks
runs code in another thread (from a pool)
blocks on deref (till ready)
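A quick sketch of the difference, using nothing beyond core Clojure:

```clojure
;; delay: nothing runs until the first deref, which then blocks
;; the deref-ing thread while the body executes (once, cached)
(def d (delay (+ 40 2)))

;; future: the body starts immediately on a pool thread;
;; deref blocks only until it has finished
(def f (future (Thread/sleep 100) (+ 40 2)))

[@d @f] ;; => [42 42]
```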
So now we have a chance at some parallelism!
We can use future as a way to just kick off new threads, so we can use future where we had start-thread (and we gain a managed thread pool to run them on)
pmap and friends
Here is a version of a function called pmap that captures that idea
So we can do
and it will take 1s or thereabouts.
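A minimal sketch of the idea (my own simplified version, not the real pmap):

```clojure
;; Kick off a future per element, then deref them all.
;; doall forces every future to start before any deref blocks.
(defn naive-pmap [f coll]
  (map deref (doall (map #(future (f %)) coll))))

;; ten 1-second tasks finish in roughly 1s total, since each
;; runs on its own pool thread
(time (doall (naive-pmap (fn [x] (Thread/sleep 1000) (inc x))
                         (range 10))))
```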
The actual pmap looks like this, notice the warning that f needs to be expensive enough to justify the overhead
It also has a few friends
Waiting for futures
You need to call shutdown-agents to exit when the agents/futures are done.
;; Note: If you leave out the call to (shutdown-agents), the program will on
;; most (all?) OS/JVM combinations "hang" for 1 minute before the process exits.
;; It is simply waiting for background threads, created by the future call, to
;; be shut down. shutdown-agents will shut them down immediately, or
;; (System/exit <exit-status>) will exit immediately without waiting for them
;; to shut down.
;; This wait occurs even if you use futures indirectly through some other Clojure
;; functions that use them internally, such as pmap or clojure.java.shell/sh
Just pushed the bits I was playing with up to github
I hope that was interesting; I am going to stop for now as it has already taken 2 months to write this. I have already described CSP and core.async here and here so won’t cover that again, but I will try to describe reducers (and why reduce is considered harmful). To be honest you would probably be better off just looking at my references from the talk (especially the Guy Steele talk)
Sounds easy but I had not really swum for 20 years; I managed to remember to breathe enough to do this while on holiday in Greece in July. Some more swimming goals to come.
50 - Take part in Lisp In Summer Projects
I wanted to do an alife sim using Genetic Programming; while the sim itself is a bit raw I blogged about Genetic Programming and Cellular Automata work I did along the way. I loved the ‘hack month’ style of the contest, giving longer than the typical weekend to complete it.
44 - Go AI with Monte Carlo Tree Search
For the Clojure Cup, Jamie and I did one in the browser in Clojurescript. I am afraid I leaned on him quite a bit as I had to disappear and do the Shine London nighttime walking marathon. He blogged about it and later some optimisations.
Trivial really, but I just needed to sit down and do it at some point. Long overdue, and if you write parens and don’t use it, you should - it’s awesome
40 - Seasoned Schemer
Great book, I just wish I came to it sooner as not much was new to me.
43 - Build a robot
I decided to make my Raspberry Pi the brain of a car.
I did a bit of research and found I needed to get a RasPiRobot to drive motors, so I got one and did my first bit of soldering in years. Simon Monk’s book is excellent and describes pretty well how to build a robot with the Magician Chassis.
I went to NodeBots of London and it was an amazing day; the venue and the people were great. I arrived late so only had time to do the soldering of the RasPiRobot, but did later get it hooked up to the Pi. I was lucky I bought both kits, as the battery thing from the 4WD does not work with the RasPiRobot, so I had to use the Magician one.
Last week I joined the London Hackspace so should get this finished and be up to more roboty stuff in future.
39 - Finish 4clojure
I think Christmas 2011 I did my first ‘hard’ problem after getting bored of the easy ones; it took me an entire day. I did it intermittently in 2012 and had ~30 left coming up to Christmas. I got it to 25 by December 1st so I could do it as an advent calendar, and got down to (I think) 6 left by the new year. Some have since been added and I had another push last month, so I am down to 2 and have finally caught up with Jen (who had completed them all until the recent additions). I think this Christmas it’s getting done at last!!!
57 - Quarterly Wild Swim
I swam in the sea in Greece - a bit scary after having only just swum again for the first time in ages
5 - Memorise Beowulf Opening
As discussed above
28 - Watch all Teaching Company linguistics courses
I have pretty much come to the conclusion that till I have more time I just cannot keep up with their artificial deadlines so have taken to watching them at my own pace. I don’t get much value from the ‘massiveness’ of MOOCs (typically ‘there is a forum over there and deadlines every week’) and lots of them I would learn more just spending my time reading as the videos are not particularly good. I do think it is worth finding the real gems though.
42 - ‘Artificial Intelligence - A Modern Approach’ in Clojure
I started this a year ago, translating the Python and Common Lisp code examples into Clojure. I think having done the AI course I now know a lot more of the content of the book so I should revisit it soon.
17 - 100 Pressups
As I could not do one after breaking both my elbows, I am pleased I can now do 20.
65 - Play Harmonica
Well, I say ‘planning’ - my sister bought me one at least ;-)
I had some fun doing it last year so I thought I would have a go again. This weekend was the qualifier. You can see the questions here
We basically find the bottom-leftmost and top-rightmost black cells and check that the count of black cells is the same as the area of the square they describe
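A sketch of that check, assuming the grid is a sequence of strings with \B for black (solid-square? and the input format are my own assumptions):

```clojure
;; Collect the coordinates of black cells, take their bounding box,
;; and compare its area to the number of black cells.
(defn solid-square? [grid]
  (let [blacks (for [[y row] (map-indexed vector grid)
                     [x c]   (map-indexed vector row)
                     :when (= c \B)]
                 [x y])
        xs (map first blacks)
        ys (map second blacks)
        w  (inc (- (apply max xs) (apply min xs)))
        h  (inc (- (apply max ys) (apply min ys)))]
    (and (= w h)                        ; bounding box is square
         (= (* w h) (count blacks)))))  ; and completely filled
```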
If the input was bigger I would have found min/max in a single reduce, but I like the clarity here, and naming things is good, right?
Here we sort the players according to the rule, then get the two teams using take-nth starting at the first or second element.
The order the players come onto the pitch is a bit tricky: the ones with the lowest draft numbers are on first, but then they leave in order of highest first while the people on the bench come on lowest first. So the list of players in the order they come onto the pitch is (concat (reverse (take p team)) (drop p team)), which we cycle so it’s infinite, then use (partition p 1 cycled-team-list) to get a sliding window of p players moving up in ones. A speedup I did not do would have been using modular arithmetic for large m (both cycles repeat in the team size, so we could replace m with their LCM rather than taking the mth element of the infinite list directly)
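Putting those expressions together as a sketch (pitch-windows is my own name for it):

```clojure
;; p players are on the pitch at once; the windows slide along the
;; cycled on-pitch ordering described above
(defn pitch-windows [team p]
  (let [on-pitch-order (concat (reverse (take p team)) (drop p team))]
    (partition p 1 (cycle on-pitch-order))))

(take 3 (pitch-windows [:a :b :c :d :e] 3))
;; => ((:c :b :a) (:b :a :d) (:a :d :e))
```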
I did not get this one right: I thought the tree was too big to actually sum all of the different ways Tennison can win, so I tried a Monte Carlo approach, which works but is too slow to converge on the precision required.
After looking at the solutions, they suggest a nice dynamic programming approach, though they still calculate every way Tennison can win. Here is a Clojure version that I think is still a bit slow on the input. I could not find a way to use the memoize function with the recursive function.
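One idiom that does work, for what it’s worth: def the function through memoize, so the recursive calls resolve through the (now memoized) var. A toy example with Fibonacci rather than the tennison function itself:

```clojure
;; The inner fn calls fib via the var, so once fib is the memoized
;; wrapper, the recursive calls hit the cache too.
(def fib
  (memoize
   (fn [n]
     (if (< n 2)
       n
       (+ (fib (- n 1)) (fib (- n 2)))))))

(fib 50) ;; => 12586269025, near-instant thanks to the cache
```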
Please let me know any improvements to tennison you can think of, code is on github
I had the idea to do an asynchronous Game Of Life in the browser using Clojurescript’s core.async. I thought it would be fun if there was no global state, no external time and no data structures - have the cells be channels just signalling to their neighbours when they transitioned between dead and alive.
I will assume you have seen my other post with resources to learn CSP and core.async
I think the implementation is true to the idea of Asynchronous cellular automata, but I fear it has bugs, as the steady state often settles on something that is not a valid GoL position (in particular I often see clumps of alive cells with too many neighbours, all surviving)
As it is only 100 lines I may as well walk through them here.
Fairly uninteresting initialisation; prob is the probability a cell is alive/dead at the beginning. I made width, height and cell-size atoms because I reset them all on calling init later.
Standard canvas drawing stuff
I thought it neat to have the draw function as a channel too; the timeout here sets the velocity of the simulation via backpressure on the unbuffered channels
The hardest bit to explain (and probably where the remaining bugs are…)
cell takes an [x y] position and returns 2 channels: one for input, which neighbours can use to inform it about their alive/dead transitions, and one to be told of new neighbours via passing in their input channel - so it’s a channel of channels :-)
Each cell creates a single go block that listens for new neighbours and state transitions from neighbours on its input channels. Whenever something comes from a neighbour (either 1 or -1 to signal it becoming alive or dead) it updates its neighbour count, works out if its state transitions, tells its neighbours if it did, and also sends draw a signal. It is all kicked off by the alive cells telling their new neighbours they are alive as they connect.
There is an asymmetry in the alive/dead transitions: a dead cell becomes alive only with exactly 3 neighbours, while a live cell survives with either 2 or 3.
Getting neighbours for a position
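For reference, a sketch of such a neighbours function (names and signature are my assumptions):

```clojure
;; The eight surrounding positions of [x y], clipped to a
;; w-by-h grid so edge cells get fewer neighbours.
(defn neighbours [[x y] w h]
  (for [dx [-1 0 1]
        dy [-1 0 1]
        :when (not= [dx dy] [0 0])
        :let [nx (+ x dx) ny (+ y dy)]
        :when (and (< -1 nx w) (< -1 ny h))]
    [nx ny]))
```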
This is the loop that kicks everything off (OK, I cheat and have a collection here). We loop through the [x y] positions, create a cell (ie an [input new-neighbour] pair) for each, and store the result in the cells map. Then we look up the neighbours of each cell and pass its input channel into their new-neighbour channels. This starts our cascade.
The init function just sets the atoms, grows the canvas to fill the space it is given, draws the empty grid then kicks off draw-loop
Hope you enjoyed it, let me know if I can clear up anything (and if you spot a bug).
The select statement is a control structure, somewhat like a switch, that lets you control the behaviour of your program based on which communications are able to proceed at any moment. In fact this select statement is a key part of why concurrency is built into Go as a feature of the language rather than just a library. It’s hard to do control structures that depend on libraries…
This is where the power of macros is apparent: we can (somewhat) easily add CSP to Clojure as a first-class citizen alongside all the other great concurrency models that have existed since it was created. It is amazing that core.async works on the JVM and in Clojurescript.
I can’t resist including one example myself as it’s so neat a programming model
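Something like this (my reconstruction of the snippet, using core.async):

```clojure
(require '[clojure.core.async :refer [chan go >! alts!]])

(def c1 (chan))
(def c2 (chan))

;; a reader looping forever, taking from whichever channel is ready
(go (while true
      (let [[v _] (alts! [c1 c2])]
        (println "Got:" v))))

;; two writers, each parking until the reader takes its message
(go (>! c1 "hi"))
(go (>! c2 "there"))
```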
Here we create 2 channels and 3 go blocks that should be thought of as acting concurrently: the first is an infinite loop pulling off both channels and printing the message received (alts! gets whichever result is ready from the list of channels provided); the next two simply send (with the >! function) the strings “hi” and “there” into c1 and c2 respectively.
This post is another update on my Lisp In Summer Projects project. I blogged before (1, 2) about warm-up exercises but I have not yet explained the goal: I intend to write an Artificial Life simulation where the organisms themselves are Lisp programs that run (via Clojurescript) inside people’s browsers.
You may also want to check out a video of a talk I gave (in part) about this stuff for the London Clojurians; slides here.
After doing Genetic Algorithms, both in Coffeescript and Clojurescript, the next step was to look at Genetic Programming. GP is the same as GA except that the genome is the AST of a programming language (Lisp makes it easier as S-Expressions are directly manipulable).
Take a look at the Wikipedia article before we continue with how I did it in Clojure using zippers.
As I said in the other post about GA, the basic idea is to:
find a way to encode a particular solution to the problem as a genome
start with an initial population of random genomes
work out their ‘fitness’ (ie how well they perform some task)
choose the next generation by selecting and ‘breeding’ them (with some mutation for novelty)
repeat until a good solution appears
In the GP world the first step changes: the initial genomes are valid programs rather than fixed-length byte sequences. The ‘breed’ function is more complicated too; where before we could get away with choosing a random ‘split point’ and taking the left part of one and the right part of its mate (and vice versa), here we have to be careful about where we split and how we join the S-Expressions, and the length is not bounded anymore.
Enter The Zipper
In Clojure zippers are found in clojure.zip and I’ll quote the description given there:
A zipper is a data structure representing a location in a hierarchical data structure, and the path it took to get there. It provides down/up/left/right navigation, and localized functional ‘editing’, insertion and removal of nodes. With zippers you can write code that looks like an imperative, destructive walk through a tree, call root when you are done and get a new tree reflecting all the changes, when in fact nothing at all is mutated - it’s all thread safe and shareable. The next function does a depth-first walk, making for easy to understand loops.
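A small illustration of editing an S-Expression with clojure.zip (my own toy tree):

```clojure
(require '[clojure.zip :as zip])

;; navigate into (+ 1 (* 2 3)), swap the (* 2 3) subtree for
;; another expression, and get the whole new tree back
(-> (zip/seq-zip '(+ 1 (* 2 3)))
    zip/down                 ; at +
    zip/right                ; at 1
    zip/right                ; at (* 2 3)
    (zip/replace '(- 4 5))
    zip/root)
;; => (+ 1 (- 4 5))
```

This is the shape GP crossover needs: walk each parent to a random location (e.g. repeated zip/next) and replace one subtree with the other, and the result is still a well-formed S-Expression.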
$ encfs ~/ENCFS/ENC/ ~/ENCFS/CLEAR/
Creating new encrypted volume.
Please choose from one of the following options:
enter "x" for expert configuration mode,
enter "p" for pre-configured paranoia mode,
anything else, or an empty line will select standard mode.
Paranoia configuration selected.
Configuration finished. The filesystem to be created has
the following properties:
Filesystem cypher: "ssl/aes", version 3:0:2
Filename encoding: "nameio/block", version 3:0:1
Key Size: 256 bits
Block Size: 1024 bytes, including 8 byte MAC header
Each file contains 8 byte header with unique IV data.
Filenames encoded using IV chaining mode.
File data IV is chained to filename IV.
File holes passed through to ciphertext.
New Encfs Password:
Verify Encfs Password:
Then copy a few files into CLEAR (the unencrypted folder); you can see there are the same number and size of files in the encrypted folder.
I have been meaning for a while to post about using public key crypto to secure cloud backups on services you can’t trust (ie all of them) but the recent launch of Heml.is made me get nerd-rage the other day and I just have to say something.
To quote them a little:
We have all intentions of opening up the source as much as possible for scrutiny and help! What we really want people to understand however, is that Open Source in itself does not guarantee any privacy or safety. It sure helps with transparency, but technology by itself is not enough. The fundamental benefits of Heml.is will be the app together with our infrastructure, which is what really makes the system interesting and secure.
Your server only?
Yes! The way to make the system secure is that we can control the infrastructure. Distributing to other servers makes it impossible to give any guarantees about the security. We’ll have audits from trusted third parties on our platforms regularly, in cooperation with our community.
Technology like public key crypto does not rely on particular servers, that it works on insecure transports is kind of the point (unless they know better than 1000s of mathematicians and security experts over the last 4 decades)
I’m glad to see they use some of the existing work (PGP and XMPP), and good on them for raising 150k to build a pretty messaging app, but until they explain how only their servers can successfully pass around PGP encrypted messages then I’m calling bullshit.
Take a look at the brilliant (and superbly named, I so wish I thought of it) Prism Break for more ways to protect yourself from snooping, not as colourful as Heml but ready to install on any operating system, for free, right now.
The time there has been great; I worked with some of the smartest people I know on some interesting projects. Things that are now core to me I used for the first time there - Hadoop, Hive, HBase, Node.js, Puppet, Clojure - and I got a lot of exposure to ideas that have changed the way I think about solving problems. Working on a system handling 100s of millions of requests a day in pretty much all regions of EC2, merging 2 VMware infrastructures, trying to make sense of terabytes of data in Hadoop, speeding up and moving a bunch of high traffic action sports sites to EC2 while dealing with what was nicknamed the Greek bailout of tech debt, stream processing in Esper and Storm, being able to host DevOps meetups and the Clojure Dojos there, getting involved in the Coder Dojo, becoming a trustee of the Forward Foundation, plus the sum of all those tiny interactions that being in a place full of people who know more than you and are keen to share permits. My team in particular - fiercely smart, results focused, metrics driven and with a wonderful shared simplicity aesthetic. I will miss it.
I am moving to FutureLearn, not much to see at the moment but an exciting take on the MOOC (Massive Open Online Course) concept. Owned by the Open University and building on their 40+ years of distance learning experience, partnering with the British Museum, the British Library and dozens of our top universities. I think it has a good chance to be a big part of the growing MOOC ecosystem. Anyone that knows me will know this is exciting and massively aligned with my values and interests. At the interview they spoke about pedagogy and engagement and really impressed me with their ambition, I will work hard to help realise it.
MOOCs kind of started with the AI and ML courses last year, and the two companies that came from them, Coursera and Udacity, are doing a great job, as is EdX, but I’ve been following along with things from MIT’s OpenCourseware for years and lots of universities have always shared resources online. I think there is space to make something a little different and increase engagement, and we have some amazing partners, so keep an eye on us.