Alex Kahn

My Haskell Reading List

April 18, 2011

Lately I’ve been spending some time experimenting with Haskell, attempting to learn the fundamentals of functional programming. This post is meant to share some of the interesting resources I’ve come across in exploring this new world. How did I, a humble Rubyist, get mixed up with Haskell? There were a couple forces pushing me in the functional direction. One was Jonas Westerlund. Jonas is a Swedish hacker who works in Python and JavaScript by day and Haskell by night. I eventually became attracted to the folds, lazy evaluation, and infinite lists he showed off continually in IRC. Reading the source for his IRC bot, norby, especially its IRC parser and the main bot module – which responds to commands in separate threads using one little forkIO call – has been a gentle way to get acquainted with some real world Haskell code.

I was also exposed to functional programming ideas recently when I read Gregory Brown’s Ruby Best Practices. This is a fantastic book and definitely helped me level up in Ruby. Relevant here is that this book dedicates a whole chapter to discussing functional programming techniques in Ruby. The author concludes that while Ruby is by no means a functional language, its functional influences have yielded some very powerful abilities. The chapter goes through several examples of idiomatic Ruby that take functional approaches to problem-solving. Some examples are lazy evaluation (with a discussion of MenTaLguY’s lazy.rb), memoization, Enumerable#inject (and why folds are not very efficient in Ruby), infinite lists, and some snazzy Symbol#to_proc magic. This chapter is a testament to the elegance of functional approaches to solving problems and to the power Ruby unlocks by tapping into some of Smalltalk’s functional features. Flip through it and you might find a new appreciation for some of Ruby’s design decisions.

Another reason for my exploration of functional programming is Node.js. The explosive popularity of Node and the resulting backlash by concurrency curmudgeons has been interesting to follow but has not always been illuminating. The “events versus threads” debate has been going on for something like twenty years: which side is right? It turns out that, after brushing aside the trolls, flamebait and hype, this isn’t the right question to ask. Threads and evented programming are both tools that can be used to write concurrent programs; which technology is correct depends on the problem at hand. This is illustrated most clearly in a paper that Coda Hale pointed me to. In Combining Events And Threads For Scalable Network Services, published in 2007, Peng Li and Stephan Zdancewic describe building a network service that takes a hybrid approach by using threads and events. In the end, these don’t have to be competing techniques; they can compliment each other. Equally important, this paper demonstrates that callbacks are not a necessary byproduct of event-driven code. Indeed, the system the paper describes abstracts away the callback spaghetti that can often make up Node programs.

Much of the rest of my reading has been around concurrency. Continuing on the subject of non-blocking IO, Gregory Collins’ talk at the recent QCon London, High Performance Web Applications in Haskell (warning: 18 MB PDF), discussed Haskell’s threading model at length. From his slides, we learn that Haskell uses lightweight threads that are mapped N:M to OS threads. These threads are scheduled by the runtime system and have very low overhead in terms of memory use and context switching. All system calls in Haskell are non-blocking. This system yields performance similar to that of evented frameworks, but steers clear of the need for deeply nested callbacks.

Haskell has no problem dealing with tens of thousands threads, but what about synchronization? One option is software transactional memory (STM). STM allows the programmer to treat shared resources as they would a database transaction, rolling back changes if another thread has modified a resource. I first encountered this construct in David Nolen’s blog posts on using STM for bulk inserts with CouchDB and comparing concurrency in Clojure and Node.js. More recently I came across a 2006 interview with Simon Peyton-Jones and Tim Harris laying out their research and approach to implementing STM in Haskell. Other languages have follewed suit in implementing STM.

However, STM is not a silver bullet. Users of the feature have found that its performance can be unpredictable. In this discussion between Rich Hickey (creator of Clojure) and Cliff Click, Click describes a problem where a program written using STM becomes dependent on the runtime system and the hardware it’s run on. Changing the hardware or using a version of Clojure with updated STM internals resulted vastly different performance, sometimes requiring a rewrite of the program. This discussion, from 2008, illustrates some of the pitfalls of STM, and how it should be used appropriately. I’m curious to find out if and how these issues have been addressed in the years since this discussion.1

This covers some ground on the subject of concurrency on a single machine. But what about using Haskell for writing distributed programs? In Haskell for the cloud, a wonderfully clear and well-written paper, Jeff Epstein, Andrew P. Black, and Simon Peyton-Jones describe implementing a distributed system that emulates the message passing functionality of Erlang, with some added Haskell flair. An interesting diversion from Erlang is the implementation of typed channels, reminiscent of Go (thanks to Ilya Grigorik for his explanation of typed channels in Go, which better suit Haskell’s strong typing than the Erlang message passing model, where a message of any type can be dispatched to any process. The code for this system, known as Cloud Haskell, is available on GitHub.

Another great discussion of Erlang, Haskell, and concurrency is a talk given by Simon Peyton-Jones, one of the creators of Haskell, at Erlang Factory London 2009 entitled Haskell and Erlang: growing up together. This entertaining presentation provides a brief overview of the history of each language, describes the features that distinguish them, and argues that despite their differences, both languages (and functional programming generally) are very relevant for solving today’s engineering problems. The talk provides a great survey of concurrency in Haskell, including a brief mention of data parellelism, an approach I still need to read more on. Be sure to follow along with the slightly cheesy but information-rich slides.

There are many more articles and resources that I have found interesting in my functional journey, but I’m out of space. I want to close with a shoutout to Miran Lipovača for writing Learn You a Haskell for Great Good, a phenomenal book. Don’t be fooled by the illustrations – this is a more straightforward book than Why’s (Poignant) Guide to Ruby. It’s a gentle introduction to functional programming and it presents some fairly mind-bending ideas in a way that somehow manages to be accessible. For example, the book presents monads only after walking readers through functors, applicative functors and monoids. By the time you reach monads, you are fully comfortable with the idea of a value wrapped in a context. The book recognizes that programming in a purely functional language requires new ways of thinking for someone with an imperative, object-oriented background and teaches in a way that is comfortable for users of traditional languages while still remaining true to Haskell. If you’re interested in buying the book, I have a one-time-use 30% off coupon code for No Starch press. Let me know you’re interested and I’ll share the code with you.

  1. For additional reading on STM, check out Hans-J. Boehm’s Transactional Memory Should Be an Implementation Technique, Not a Programming Interface and MenTaLguY’s implementation of STM in Ruby (and thanks to MenTaLguY for referring me to all of the STM links in this post).