Archive for May, 2012

Playing with Go: a concurrent quicksort

Thursday, May 31st, 2012

A little while ago Google released a concurrent programming language called Go that appeals to my curiosity as both an academic and a guy who likes to write code. I road tested it by coding up a concurrent quicksort implementation. This post summarizes my generally positive impressions of the language.

Go appeals to me as an academic because it supports the CSP concurrency model. CSP, introduced in a paper by C. A. R. Hoare, is a programming system that supports concurrent threads that interact primarily by sending synchronous messages. The synchronous messages are unusual in that a message is never in flight. Sender and receiver both have to stop to exchange the data, so a message exchange is a combination of synchronization – like acquiring a lock – and data exchange. In order to put more concurrency into a system where both parties to a data exchange have to synchronize, CSP introduces non-determinism into some language constructs involving communication requests. A programmer can say “I’ve got three messages to send out. If any of the receivers is ready, send to them, and if more than one is ready I have no preference for which message you send.”

This combination encourages designers to think in unusual ways and it’s a fun paper to expose students to.  I’ve often wanted to sit down and code up some algorithm in CSP and see how well the model really works.  Go is probably the highest profile CSP-ish implementation, so it’s attractive.  I say CSP-ish because it keeps the spirit of the paper but extends the ideas in practical ways – communication channels are made independent of thread names and allow multiple threads to communicate on a channel; non-determinism is extended to both reading and writing; things like that.

Before I talk about Go, I’m going to review quicksort and how to make it concurrent.

Quicksort

Quicksort is a fast sorting algorithm that is amenable to concurrent implementation. Until I sat down to write this, it hadn’t occurred to me that quicksort was also created by C. A. R. Hoare. What that means – other than that Hoare is a smart guy – I don’t know, but I have a sudden desire to try to work monitors in here as well.

Quicksort is based on the clever observation that one can sort an array by partitioning the elements around a pivot element and then doing the same to each of the partitions. The partitioning step consists of picking a pivot element in the array, p, and moving the elements in the array around so that all the elements less than p come before it and all the elements greater come after it. Partitioning will usually move several array elements, including p, around. Notice that the partitioning has put p into the place it belongs in the array – no element will ever move from the set that’s smaller than p to the set that’s larger; once a partitioning step is complete, its pivot element is in its sorted position.

The diagram below illustrates partitioning on a small array of 5 integers (1 through 5).  The two arrows indicate the range of the array to partition (the whole array) and the pivot element is in green.  We arbitrarily pick the first element as the pivot here.  The array on top is the array before partition, and the one on the bottom (below the dotted line) is the partitioned array.


Partition Step
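The partition step described above can be sketched in Go. This is an illustrative stand-alone function, not the post's code; like the diagram, it arbitrarily uses the first element of the range as the pivot:

```go
package main

import "fmt"

// partition rearranges a[l:u] around the pivot a[l] and returns the
// pivot's final index. Elements less than the pivot end up before it,
// all other elements end up after it.
func partition(a []int, l, u int) int {
    p := a[l] // pivot: arbitrarily the first element of the range
    m := l    // m marks the boundary of the "less than p" region
    for i := l + 1; i < u; i++ {
        if a[i] < p {
            m++
            a[m], a[i] = a[i], a[m]
        }
    }
    a[l], a[m] = a[m], a[l] // move the pivot into its sorted position
    return m
}

func main() {
    a := []int{3, 1, 4, 5, 2}
    m := partition(a, 0, len(a))
    fmt.Println(a, "pivot index:", m) // pivot 3 lands at index 2
}
```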

Quicksort is a recursive algorithm – an algorithm that repeats the same procedure on sub-problems to get the answer. Once the partitioning step is completed, the process starts over again on the two sub-arrays on either side of the pivot element. The pivot element is in the right place, so it is in neither sub-array. A recursive call on a one-element or zero-element array returns immediately, without partitioning. The figure below shows the complete sorting of our 5-element array using quicksort.


One worker sorts an array

Time goes from top to bottom of that diagram, so the final, sorted array is the bottom array (on the right).  Each dotted line is a partition step, and each pair of solid lines is the recursive step of partitioning the sub-arrays.  Each partition step is carried out on the sub-array defined by the arrows that touch the array at that step.  The single thread of control follows the blue arrow.
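The serial recursion just described might be sketched like this (illustrative code, not the post's package; the partition helper picks the first element of the range as the pivot):

```go
package main

import "fmt"

// qsort sorts a[l:u] recursively: partition, then do the same to the
// sub-arrays on either side of the pivot.
func qsort(a []int, l, u int) {
    if u-l < 2 {
        return // zero- or one-element ranges are already sorted
    }
    m := partitionStep(a, l, u)
    qsort(a, l, m)   // elements before the pivot
    qsort(a, m+1, u) // elements after the pivot
}

// partitionStep rearranges a[l:u] around the pivot a[l] and returns
// the pivot's final index.
func partitionStep(a []int, l, u int) int {
    p, m := a[l], l
    for i := l + 1; i < u; i++ {
        if a[i] < p {
            m++
            a[m], a[i] = a[i], a[m]
        }
    }
    a[l], a[m] = a[m], a[l]
    return m
}

func main() {
    a := []int{3, 1, 4, 5, 2}
    qsort(a, 0, len(a))
    fmt.Println(a) // sorted ascending
}
```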

The quicksort algorithm has been studied a lot and is the basis of most in-memory sort routines. It is also amenable to a concurrent implementation. Each step in the recursion – each time the partitioning routine is called – operates on an independent sub-array. That’s the observation that leads directly to a concurrent implementation, as we’ll see below.

Concurrent Quicksort

My simple concurrent implementation – and I’m sure I’m not the first to do it – uses a collection of worker threads and a coordinator thread. The coordinator sends a message to an idle worker telling it to sort the array and waits to receive messages from the workers about the progress of the algorithm.

A worker partitions a sub-array, and every time that worker gets ready to call the partition routine on a smaller array, it checks to see if there is an idle worker to assign the work to. If so, it sends a message to the worker to start working on the sub-problem; if not, the current worker calls the partition routine itself.

After each partitioning, two recursive calls are (usually) made, so there are plenty of chances to start other workers.  The diagram below shows two workers sorting the same 5-element array. Each blue line represents the flow of control of a worker thread, and the red arrow represents the message sent from one worker to start the other. (Since the workers proceed working concurrently, it is no longer guaranteed that the smaller elements in the array will be ordered before the larger; what is certain is that the two workers will never try to manipulate the same elements.)


Two workers sort an array

A worker can complete working either because it has directly completed all the work sorting the subarray it was initially called on, or because it has ordered a subset of that array but has passed some or all of the remaining work to other workers.  In either case, it reports the number of elements it has ordered back to the coordinator.  (The number of elements a worker has ordered is the number of partitions of sub-arrays that have 1 or more members).

When the coordinator hears that all the elements in the array have been ordered, it tells the workers that there is nothing left to do, and the workers exit.

That’s the basic idea.  In the code attached, I incorporated some common optimizations to the quicksort algorithm, but the basic idea is that at every step where a worker might call the quicksort routine on a smaller sub-array, it passes that job to another worker if a worker is idle.

So, how do we do that in Go?

Implementing in Go

The whole commented package, including test code, is available. I’m only going to talk about the parts of it that coordinate the multiple workers traversing and sorting the array. Most of the code snippets below are directly from the package, and the one that is not is a minor simplification.

I’m not going to go into great detail about Go syntax here. The language specification, other references, and tutorials at http://golang.org are quite good.

My implementation is wrapped in a class that keeps the sorting parameters together. It holds the channels for communicating between the coordinator and the workers, and between workers, as well as the number of workers to start. In addition, when a sub-array is small enough, my implementation stops using the recursive quicksort algorithm and does a simple insertion sort to finish the job; the size of “small enough” is also stored in the class instance. The data structure that holds these looks like:

// Parameters that hold across the sort
type SortContext struct {
    workers int         // number of worker goroutines to start
    lim     int         // sub-arrays smaller than lim use insertion sort
    work    chan *Chunk // chunks for idle workers; a nil Chunk means exit
    done    chan int    // elements ordered, reported per completed chunk
}

The number of workers to start is in workers, an integer. If a sub-array has fewer than lim elements (an integer), insertion sort is used. Idle workers listen on work, a channel that carries pointers to chunks of work; a nil Chunk means that the worker can terminate. When a worker completes a chunk of work it sends the number of array elements it put in order to the coordinator on the done channel, which carries integers. The number ordered is the number of recursive quicksort calls plus the number of elements passed to insertion sort.
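The Chunk type never appears in the snippets here. A minimal definition consistent with the later uses (c.a, c.l, c.u, and &Chunk{a, 0, a.Len()}) might look like the following; I assume a plain int slice for illustration, though the a.Len() call in the coordinator code suggests the real package sorts a more general, sort.Interface-style type:

```go
package main

import "fmt"

// A hypothetical Chunk: a sub-array of a, from index l (inclusive) to
// u (exclusive), to be sorted by whichever worker receives it.
type Chunk struct {
    a []int // the slice being sorted (an int slice for illustration)
    l int   // lower bound of the sub-array
    u int   // upper bound of the sub-array
}

func main() {
    c := &Chunk{[]int{5, 4, 3, 2, 1}, 0, 5}
    fmt.Println(c.u - c.l) // number of elements in the chunk: 5
}
```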

Starting a worker thread in Go is as simple as prepending the go keyword to a function call.  Starting a set of workers is as simple as:

// Start up a set of workers
for i := 0; i < self.workers; i++ {
    go self.worker()
}

Each of those workers has access to the channels through the enclosing class structure. That loop is how the coordinator starts a concurrent sort.

The worker routine looks like this:

func (self *SortContext) worker() {
    for c := range self.work {
        if c == nil {
            break
        }
        self.done <- self.p_qsort(c.a, c.l, c.u)
    }
}

The worker loops, assigning each chunk it reads from the work channel to the c variable as it gets the chunks. If the chunk is a null pointer (nil) the worker breaks out of the loop and exits. Otherwise it calls the parallel quicksort function (p_qsort) with the array to sort and the bounds (c.l is the lower bound and c.u is the upper). That function returns the number of elements it directly ordered, which the worker sends to the done channel, and waits for the next chunk.

That’s fairly terse code, but pretty clear and idiomatic after a surprisingly short time. Of particular note is the for var := range channel idiom, which reads messages on a channel until the sender closes it. This code explicitly exits the loop because there are many potential senders on that channel; each worker may send on it, so no worker closes it.

Multiple workers can execute that loop without interfering with each other’s ability to read messages from the channel. If more than one worker is blocked on self.work and a worker sends a message on it, exactly one of the blocked workers will receive the message. There is no guarantee which, of course.
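That delivery guarantee can be seen in a small stand-alone example (hypothetical names, not from the post's package): several goroutines all receive on one shared channel, and every message sent is claimed by exactly one of them.

```go
package main

import (
    "fmt"
    "sync"
)

// fanOut sends msgs distinct messages to workers goroutines that all
// receive on one shared channel, and returns how many times each
// message was received.
func fanOut(workers, msgs int) map[int]int {
    work := make(chan int)
    seen := make(map[int]int)
    var mu sync.Mutex
    var wg sync.WaitGroup
    for w := 0; w < workers; w++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for m := range work { // each receive claims one message
                mu.Lock()
                seen[m]++
                mu.Unlock()
            }
        }()
    }
    for m := 0; m < msgs; m++ {
        work <- m
    }
    close(work) // lets the workers' range loops finish
    wg.Wait()
    return seen
}

func main() {
    seen := fanOut(3, 9)
    fmt.Println(len(seen)) // 9: every message was received exactly once
}
```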

That construct makes it simple to start as many or as few workers as the environment dictates.

After starting the workers, the coordinator does this:

// Deliver the work to the first worker who listens
self.work <- &Chunk{a, 0, a.Len()}

That chunk of work is to sort the array called a from its first element to its last element. The array a (really a slice for you Go sticklers) is a parameter to the member function that started the sort.

Making that call blocks the coordinator until a worker starts listening for a chunk of work. As soon as one blocks to receive a chunk, that first chunk is delivered, the worker starts working and the coordinator moves on to:

// Count up the ordered elements
for i := 0; i < sz; {
    i += <-self.done
}

The coordinator waits for each worker to report completion of each chunk of work it does. When the whole array is sorted (sz is set to the total elements in the array), that loop terminates and the coordinator shuts down the workers by sending them all a nil chunk of work.

// Tell the workers to terminate
for i := 0; i < self.workers; i++ {
    self.work <- nil
}

The only thing left to sketch is how workers start other workers. After each partition step, a worker has (at most) two sub-arrays that need to be partitioned. For each sub-array this worker does the following (where l and u are the lower and upper bound of the sub-array to be partitioned):

select {
case self.work <- &Chunk{a, l, u}:
default:
    rv += self.p_qsort(a, l, u)
}

The select statement will send the message (self.work <- &Chunk... ) if there is a worker listening on self.work. If not, the code will choose the default branch, which is a recursive call on self.p_qsort.

A couple of details: the case self.work ...: line is guarding an empty block after the colon. If statements followed it, they would be executed after the message was successfully sent; no such additional statements are required here. In the default: branch, the rv += ... adds the number of elements ordered by the recursive call to the number ordered so far in this call. This routine will return rv to the caller and eventually to the coordinator. You can see how that works in the complete code.

The select is actually slightly simplified from the running code, which includes another bypass used when the insertion sort is invoked. Again, see the complete code.
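The send-or-do-it-yourself behavior can be demonstrated deterministically in isolation. In this sketch (names are mine, not the package's), a one-slot buffered channel stands in for a worker that is ready to receive:

```go
package main

import "fmt"

// tryHandOff attempts a non-blocking send of job on work. It reports
// whether a receiver (or free buffer slot) took the job; when it
// returns false, the caller should do the job itself.
func tryHandOff(work chan int, job int) bool {
    select {
    case work <- job:
        return true // someone was ready to take the job
    default:
        return false // nobody listening; keep the job local
    }
}

func main() {
    work := make(chan int)           // unbuffered: a send needs a ready receiver
    fmt.Println(tryHandOff(work, 1)) // false: nobody is listening

    buf := make(chan int, 1)        // one free slot stands in for an idle worker
    fmt.Println(tryHandOff(buf, 1)) // true: the send proceeds immediately
    fmt.Println(tryHandOff(buf, 2)) // false: the slot is full, nobody ready
}
```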

This use of select is a particularly convenient construction. It is short enough that converting the recursion in the serial version to the choice of recursing or starting a worker in the concurrent version does not interrupt the flow of the code.

More importantly, it encodes a fairly sophisticated concurrent operation – “hand this work off if there is an idle worker, but do it in this worker if not” – succinctly and idiomatically. One can perform the operation in conventionally threaded languages, like Java, but each programmer generally rolls their own mechanism to implement it. That extra activation energy often gets the better of programmers. Go encodes the construction so simply that one hopes programmers will use it more often.
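Pulling these pieces together, here is a stripped-down, end-to-end sketch of the whole scheme. It is not the real package: it assumes plain int slices, omits the insertion-sort cutoff, lowercases the names, and inlines the partition step.

```go
package main

import "fmt"

// A chunk of work: sort a[l:u].
type chunk struct {
    a    []int
    l, u int
}

type sortContext struct {
    workers int
    work    chan *chunk // chunks for idle workers; nil means exit
    done    chan int    // elements ordered, reported per chunk
}

// pQsort sorts a[l:u], handing sub-arrays to idle workers when it can,
// and returns the number of elements it placed itself.
func (s *sortContext) pQsort(a []int, l, u int) int {
    if u-l < 2 {
        return u - l // 0- or 1-element ranges are already in place
    }
    p, m := a[l], l // partition around a[l]
    for i := l + 1; i < u; i++ {
        if a[i] < p {
            m++
            a[m], a[i] = a[i], a[m]
        }
    }
    a[l], a[m] = a[m], a[l]
    rv := 1 // the pivot is now in its sorted position
    for _, sub := range [][2]int{{l, m}, {m + 1, u}} {
        select {
        case s.work <- &chunk{a, sub[0], sub[1]}: // an idle worker took it
        default:
            rv += s.pQsort(a, sub[0], sub[1]) // do it ourselves
        }
    }
    return rv
}

func (s *sortContext) worker() {
    for c := range s.work {
        if c == nil {
            break
        }
        s.done <- s.pQsort(c.a, c.l, c.u)
    }
}

func (s *sortContext) sort(a []int) {
    for i := 0; i < s.workers; i++ {
        go s.worker()
    }
    s.work <- &chunk{a, 0, len(a)} // first listener gets the whole array
    for n := 0; n < len(a); {      // count up the ordered elements
        n += <-s.done
    }
    for i := 0; i < s.workers; i++ { // tell the workers to terminate
        s.work <- nil
    }
}

func main() {
    a := []int{9, 3, 7, 1, 8, 2, 6, 4, 5, 0}
    s := &sortContext{workers: 4, work: make(chan *chunk), done: make(chan int)}
    s.sort(a)
    fmt.Println(a) // sorted ascending
}
```

Every element's final placement is counted exactly once – each pivot by the call that partitions it, each handed-off chunk by the worker that receives it – so the done messages sum to the array length and the coordinator knows when to stop.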

Seeing For Yourself

The code includes a couple benchmarks that can show that this implementation does take advantage of multiple processors.  To run it, install the source in your GOPATH and compare

$ go test -run none -bench '.*' psort
$ go test -cpu 4 -run none -bench '.*' psort

On my 4-core i5 the results look like:

$ go test -cpu 1 -bench .\* -run none psort
PASS
BenchmarkOneWorker 1 4176997000 ns/op
BenchmarkFourWorkers 1 4104459000 ns/op
BenchmarkTenWorkers 1 4220084000 ns/op
ok psort 12.655s
$ go test -cpu 4 -bench .\* -run none psort
PASS
BenchmarkOneWorker-4 1 4165567000 ns/op
BenchmarkFourWorkers-4 1 1633784000 ns/op
BenchmarkTenWorkers-4 1 1632001000 ns/op
ok psort 7.526s

Each of these benchmarks sorts 5 million integers. That’s one “op.” The BenchmarkOneWorker benchmark uses one worker thread and takes about 4.2 seconds to sort the array; BenchmarkFourWorkers uses 4 workers and BenchmarkTenWorkers uses 10.

The first invocation, with -cpu 1, uses only one CPU, so no matter how many workers are used, there is no speedup. In the second invocation, with -cpu 4, adding workers adds CPUs, so there is an appreciable speedup, though not a completely linear one.

Feel free to play around with it or use it in projects, but there are corner cases where it will perform badly.

Closing Remarks

Overall I was pleased with the concurrency support.  It did take some time, but the language and features do lead a programmer toward efficient abstractions. 

The documentation often repeats the phrase “communicating for concurrency” which is an excellent guiding principle for getting the most out of Go.  I went down a couple blind alleys when I tried to do other things, like start as many goroutines as possible without communicating between them, or letting many messages pile up in buffered channels hoping that a goroutine would finally get around to reading them.  Neither of those works as well as sending a message when an idle worker is listening, which is the direction Go pushes you.

Review: Let’s Pretend This Never Happened

Saturday, May 12th, 2012

Jenny Lawson’s memoir, Let’s Pretend This Never Happened has clear roots in her entertaining blog.  I don’t read it regularly, but I follow references to it and have enjoyed much of what I have read there.  I know Lawson can write manic, funny anecdotes with great style.  I was happy to find her range to be wider than that.

The book is episodic, and each episode has its pleasures: apt turns of phrase, zany escalations of absurdity, and honest moments of revelation. This is an interesting and engaging person who tells her own story well.  The reader comes away with a sense of having met a singular person.

If I have a criticism, it is that the book still feels somewhat like episodes that were built into a larger narrative. There are worse recipes for a memoir, but I’m interested to see what Lawson could do if she were to build a work from the ground up. I’m interested, but if she decides she would rather continue putting out collections of this quality, I’ll stay pretty happy.

Recommended.

Review: Honor in the Dust

Thursday, May 10th, 2012

I really enjoyed James Loewen’s books about how we record and pass on history.  One of the points he made was how few books there are that chronicle the US war in the Philippines, so I decided that when one came up on my radar I’d be sure to have a look.  Enter Gregg Jones and Honor in the Dust, which discusses Roosevelt, that war, and US imperialism.

Jones does a nice job corralling his facts and following the chronology of the conflict.  He tracks the US’s grab for Cuba and the almost incidental grab for the Philippines in support of their revolutionaries.  It is a good place to start as it frames Roosevelt’s character and support for military intervention well against the times before getting into the dirty details of the Philippine War.

The details are pretty dirty, and not at all surprising to any 21st-century observer. US soldiers in a hostile and grueling environment are ordered to use extreme measures to put down insurrections led by desperate guerrilla fighters. Slaughter, torture, and betrayal abound, and when these actions come to light the high command denies everything – except that with more than 100 years of time and investigation there is much stronger consensus about the misdeeds committed and their origin. It makes for depressing reading, especially when it rings so much like foreshadowing.

Jones has his facts straight and writes clearly, but there is a lack of urgency to his narrative.  Events proceed inevitably but there is little tension.  Some of this may be due to a desire not to oversensationalize the events, which are quite appalling enough without embellishment.  Some of it may be that Theodore Roosevelt, the most larger-than-life of the players, disengages quite a bit from the war itself as he becomes president.  And perhaps some of the lack of tension is that we know that the public and military are going to largely forget about the lessons that the war teaches.  Whatever the reason, the narrative flags somewhat in the later part of the book.

This is important, gut-wrenching stuff to know about, but to an extent it feels like literary vegetables.  It is nutritious but does not go down easy.

Recommended.

Review: Drift

Wednesday, May 9th, 2012

In Drift: The Unmooring of American Military Power, Rachel Maddow lays out the proposition that through the late 20th century the executive has slowly pulled the power to take America to war away from the people. She does an excellent job both laying down the research that led to that position and explaining how it fits together and why Americans should care.

I was careful to say that the executive had taken the power from the people, not from Congress, though that’s true as well.  One of Maddow’s key observations is that the 20th and 21st centuries have steadily compartmentalized the sacrifice involved in going to war.  Sacrifice motivates people to assess the benefits of warfare; blunting that pain removes an incentive to consider it.  It is a keen observation that she explains clearly and supports strongly.  By itself it illuminates a fair amount of policy.

She’s also clear and precise about the other, more commonly heard arguments about how the executive has drawn this power to itself with few setbacks.  There were some important ones, however, that indicate that the trend need not be inevitable.  After Vietnam, Congress did assert some amount of power and pull back some of the rights from the executive.  But Congress is directly responsive to the people in the best and worst senses of that.  When supporters and donors lose interest, congresspeople fight other battles.

That is all only a curiosity if she does not argue that Americans should care.  While you will not find a chapter in Drift called “Why You Should Care,” Maddow does a good job of underlining the problems without beating you over the head with them. The philosophical and practical issues both get some time in the spotlight, from who should bear the risk and cost of war to what it means to commit the power of the US military on the say-so of a few of the powerful.  These are important issues given appropriate weight.

Overall this is a timely, clear argument about the current state of our warmaking engine and a history of how it got to be that way.  It is well worth understanding and probably changing.

Strongly recommended.

Review: 13 Things That Don’t Make Sense

Wednesday, May 9th, 2012

Michael Brooks starts with a good theme in 13 Things That Don’t Make Sense, but his execution comes off the rails for me.  His idea is to pick 13 places that scientific consensus is weak or non-existent and highlight them as areas in which breakthroughs could come with new thinking.  This is a sound idea.  Things we don’t understand are spots where people are looking and new ideas are necessary, which is a recipe for shaking things up.

The problem is that the actual phenomena he highlights are hit or miss. While there really is significant confusion, if not downright incredulity, around dark matter and dark energy, to say that there is merely less confusion about cold fusion and homeopathy is a big understatement: there is consensus that homeopathy is snake oil and that cold fusion is an over-hyped anomaly. The distinction is between what experts in the field make sense of and what the general public makes sense of. There isn’t much chance of a scientific breakthrough coming from studying how Penn and Teller catch bullets in their teeth, even though most people don’t know how it works.

Brooks’s choices are not uniformly bad.  I did enjoy and learn from parts of 13 Things.  Overall, though, I found the mixing of real conundrums and simple misunderstanding to be very distracting.