Thursday, March 11, 2010

Lazy graph traversal in Clojure

Clojure is a new lisp-like programming language, that runs on JVM. It has tons of interesting features that are worth exploring. To name a few, concurrency model based on immutable data and software transactional memory, lazy sequences, etc.

I am currently re-reading a book on algorithms, so I decided that it might be a good idea to implement some of these algorithms on Clojure. I have implemented breadth and depth first graph traversals, using lazy-seq paradigm.

Some aspects of the implementation may appear to be slightly inefficient; for example I represent graph as a sequence of pairs, and each time I need to get all the neighbors of a node I traverse the entire sequence. But I did it that way so that I could concentrate on the algorithm at question.

For the experienced reader, if there is anything in my code that you think may be considered as a bad tone, please don't hesitate to comment.

Without further ado, here it is:

(ns algo.traversal)

"
 1--2-6--7
 |   \\ /
 3--4-5
"

(def graph (seq {1 2, 1 3, 2 6, 6 7, 3 4, 4 5, 2 5, 5 7}))

(defn transitions
  "Finds all the transitions going to and from the node.
  Transition to the node are reversed."
  [graph node]
  (map #(if (= node (first %))
          %
          (reverse %))
       (filter #(or (= node (first %)) (= node (second %))) graph)))

(defn breadth-first
  "Returns a lazy sequence of transitions in breadth-first order
  starting from the given node."
  [graph start]
  (let [walk 
        (fn walk [past-nodes trans]
          (let [next-trans (drop-while #(past-nodes (second %)) trans)]
            (when-let [next-node (second (first next-trans))]
              (lazy-seq
               (cons (first next-trans)
                     (walk (conj past-nodes next-node)
                           (concat next-trans (transitions graph next-node))))))))]
    (walk #{start} (transitions graph start))))

(defn depth-first
  "Returns a lazy sequence of transitions in depth-first order
  starting from the given node."
  [graph start]
  (let [walk 
        (fn walk [past-nodes trans]
          (let [next-trans (drop-while #(past-nodes (second %)) trans)]
            (when-let [next-node (second (first next-trans))]
              (lazy-seq
               (cons (first next-trans)
                     (walk (conj past-nodes next-node)
                           (concat (transitions graph next-node) next-trans)))))))]
    (walk #{start} (transitions graph start))))

As I mentioned before I define graph as a sequence of pairs, all the transitions are navigable in both directions.

breadth-first and depth-first functions are almost the same except for one tiny part; I will talk about it later.

Now, let's take a closer look at the structure of these functions.

Both of them do two things: define inner function walk and call it. Apparently all juice is in this function. Here's the one from breadth-first:

(fn walk [past-nodes trans]
  (let [next-trans (drop-while #(past-nodes (second %)) trans)]
    (when-let [next-node (second (first next-trans))]
      (lazy-seq
       (cons (first next-trans)
             (walk (conj past-nodes next-node)
                   (concat next-trans (transitions graph next-node))))))))

walk is used to recursively traverse the graph. It is closed around the outer function arguments [graph start], and it has two of its own arguments:

  • past-nodes - hash-set of nodes that have already been traversed and
  • trans - transitions yet to walk.

Next line defines a var next-trans, which is a sequence of transitions. It is guaranteed to start with a transition to a new non-traversed node.

next-trans (drop-while #(past-nodes (second %)) trans)
After that next-node var is assigned the next node id. If next-trans is empty the whole block will return nil.
when-let [next-node (second (first next-trans))]
Following is the call to lazy-seq, this function wraps its arguments in a lazy sequence, which allows Clojure stop the computation until further data is requested. I pass it a sequence with its elements being: the next transition and a recursive call to walk.

The recursive call represents the sole difference between the two algorithms. In the case of breadth-first I build the sequence of transitions to traverse by appending transitions from the current node to the end of the sequence,

(walk (conj past-nodes next-node)
      (concat next-trans (transitions graph next-node)))
when in the depth-first case I do the opposite by appending to the head of the sequence:
(walk (conj past-nodes next-node)
      (concat (transitions graph next-node) next-trans))