Showing posts with label algorithms. Show all posts
Showing posts with label algorithms. Show all posts

Sunday, January 6, 2013

Strongly connected components in Clojure

Recently I had fun implementing strongly connected components algorithm in Clojure. I thought some of the Clojurians might want to see it.

(defn dfs
  "Depth first search. Short form of the method passes through all the 
  nodes of the graph even if it's disconnected .
  (nodes-fn graph) expected to return list of all the nodes in the graph.
  (child-fn graph node) expected to return list of all the nodes linked
   to the given node.
  Returns hash-map where nodes are associated with a pair :idx, :leader.
  :idx stores finishing index of the node traversal (post-order counter)
  :leader first finishing index of the current DFS."
  ([graph nodes-fn child-fn]
     (second
      (reduce ;; Start DFS from each node of the graph
       (fn [[idx result passed :as args] next-node]
         (if (not (passed next-node)) ;; Don't do DFS if node is marked
           (dfs idx idx result passed graph next-node child-fn)
           args))
       [0 {} #{}] ;;Initial index, result, set of passed nodes
       (nodes-fn graph))))
  ([idx leader result passed graph node child-fn]
     (let [[idx result passed]
           (reduce (fn [[idx result passed :as args] child-node]
                     (if (not (passed child-node))
                       (dfs idx leader result passed 
                            graph child-node child-fn)
                       args))
                   [idx result (conj passed node)]
                   (child-fn graph node))]
       [(inc idx)
        (assoc result node {:idx idx :leader leader})
        passed])))

(defn pass-two 
  "Calls DFS making sure that traversal is done in the reverse :idx order."
  [graph result child-fn]
  (let [nodes-fn 
        (constantly (->> result 
                         ;;Sort by :idx in reverse order
                         (sort-by (comp :idx second)) reverse 
                         ;;Return only nodes
                         (map first)))]
    (dfs graph nodes-fn child-fn)))
  
(defn scc 
  "Finds strongly connected components of the given directed graph.
  Returns lists of nodes grouped into SCC.
  (nodes-fn graph) expected to return list of all the nodes in the graph.
  (incoming-fn graph node) expected to return all the nodes with
   transitions towards the given node.
  (outgoing-fn graph node) expected to return all the nodes with
   transitions from the given node."
  [graph nodes-fn incoming-fn outgoing-fn]
  (let [result (dfs graph nodes-fn incoming-fn)
        leaders-idx (pass-two graph result outgoing-fn)]
    (for [scc-group (vals (group-by (comp :leader second) leaders-idx))]
      (for [[node & _] scc-group] node))))

Implementation is quite generic and can be used for different graph implementations. Here's an example using list multimap. I use the same graph as in the course given by Tim Roughgarden on Coursera:

(defn list-multimap 
  "Builds list multimap: {key1 [val1 val2 ...], key2 [val3 val4 ...]}.
   Each call adds value to the list associated with the key."
  [m [k v]]
  (if (m k)
    (update-in m [k] conj v)
    (assoc m k [v])))

(defn reverse-graph 
  "Reverses list multimap based graph, see below."
  [graph]
  (reduce
   list-multimap
   {}
   (for [[key values] graph v values] [v key])))

(def test-graph 
  {6 [9], 2 [8], 4 [7], 3 [6], 8 [5 6], 1 [4], 9 [3 7], 5 [2], 7 [1]})

;Same graph but with letters in case you confuse indices and node names.
;(def test-graph
;  {'a ['g], 'b ['e], 'c ['i], 'd ['a], 
;   'e ['h], 'f ['c 'h], 'g ['d 'i], 'h ['b], 'i ['f]})

(def reverse-test-graph
  (reverse-graph test-graph))

And here's how to use the dfs and scc functions:

(dfs test-graph 
     ;;fn that returns set of nodes
     (constantly (into #{} (flatten (seq test-graph))))
     ;;(get graph node) returns list of related nodes.
     get) 

;{2 {:idx 8, :leader 3},
; 8 {:idx 7, :leader 3},
; 6 {:idx 6, :leader 3},
; 9 {:idx 5, :leader 3},
; 3 {:idx 4, :leader 3},
; 5 {:idx 3, :leader 3},
; 1 {:idx 2, :leader 0},
; 4 {:idx 1, :leader 0},
; 7 {:idx 0, :leader 0}}

(scc test-graph 
     ;;fn that returns set of nodes
     (constantly (into #{} (flatten (seq test-graph))))
     ;;works as incoming-fn using cashed reversed graph
     #(get reverse-test-graph %2) 
     ;;(get graph node) returns list of related nodes
     get)

;((8 5 2) (9 3 6) (1 4 7))

Full source code is here.

Monday, August 20, 2012

Use case for mutable objects in Clojure

I'm currently implementing Aho-Corasic dictionary indexing algorithm. For that I have to write an augmented trie, where each node can point to another branch of the trie. Such links are called failure links. For example, let's say a trie contains two strings: "abcd" and "fgabch". Here letters in the second string (starting from "a") will have failure links to corresponding letters of the first word.

Here's my ASCII "art":

a-b-c-d
^_____
      |
f-g-a-b-c-h

a-b-c-d
  ^___
      |
f-g-a-b-c-h

a-b-c-d
    ^___
        |
f-g-a-b-c-h

This is needed to guarantee matching in time proportional to the length of the text. If the text is "fgabcd" then matching algorithm will match all letters in "fgabc" and will fail to match last "d", then it will use failure link fgabc -> abc and continue matching from there. As the result it will report matching word with start and end indices [2,5].

Without failure links algorithm would have to restart matching from "g" finding nothing, then again from "a" where it would succeed. This property of never going back is crucial in complexity analysis and gives so much desired O(n) execution time.

Now, I have tried hard to implement this trie using immutable data structures. Following description from the book I need to first construct simple trie with all the words in it and then traverse it starting from root in breadth-first order to compute failure links.

Problem started to emerge when I realized that failure link may point to its parent node, for example when indexing string "aaa", second letter "a" would have failure link to the first letter "a", its parent. As it is impossible to create cycles using constructor-based initialization, immutable collections initialized only in constructors wouldn't work.

I know about delay primitive and how it can be used to create cycles. Korma library makes a very nice use of it. But in my case it is prohibitively expensive (citation needed). This algorithm may be used to index thousands of strings, it is not practical to create tens of thousands of delay objects.

To be clear, I'm not trying to argue against immutable data structures. It's just that sometimes we have to venture into the realm of mutability to achieve efficient code. And I think transients in Clojure are some sort of proof of that.

Thursday, March 11, 2010

Lazy graph traversal in Clojure

Clojure is a new lisp-like programming language, that runs on JVM. It has tons of interesting features that are worth exploring. To name a few, concurrency model based on immutable data and software transactional memory, lazy sequences, etc.

I am currently re-reading a book on algorithms, so I decided that it might be a good idea to implement some of these algorithms on Clojure. I have implemented breadth and depth first graph traversals, using lazy-seq paradigm.

Some aspects of the implementation may appear to be slightly inefficient; for example I represent graph as a sequence of pairs, and each time I need to get all the neighbors of a node I traverse the entire sequence. But I did it that way so that I could concentrate on the algorithm at question.

For the experienced reader, if there is anything in my code that you think may be considered as a bad tone, please don't hesitate to comment.

Without further ado, here it is:

(ns algo.traversal)

"
 1--2-6--7
 |   \\ /
 3--4-5
"

(def graph (seq {1 2, 1 3, 2 6, 6 7, 3 4, 4 5, 2 5, 5 7}))

(defn transitions
  "Finds all the transitions going to and from the node.
  Transition to the node are reversed."
  [graph node]
  (map #(if (= node (first %))
          %
          (reverse %))
       (filter #(or (= node (first %)) (= node (second %))) graph)))

(defn breadth-first
  "Returns a lazy sequence of transitions in breadth-first order
  starting from the given node."
  [graph start]
  (let [walk 
        (fn walk [past-nodes trans]
          (let [next-trans (drop-while #(past-nodes (second %)) trans)]
            (when-let [next-node (second (first next-trans))]
              (lazy-seq
               (cons (first next-trans)
                     (walk (conj past-nodes next-node)
                           (concat next-trans (transitions graph next-node))))))))]
    (walk #{start} (transitions graph start))))

(defn depth-first
  "Returns a lazy sequence of transitions in depth-first order
  starting from the given node."
  [graph start]
  (let [walk 
        (fn walk [past-nodes trans]
          (let [next-trans (drop-while #(past-nodes (second %)) trans)]
            (when-let [next-node (second (first next-trans))]
              (lazy-seq
               (cons (first next-trans)
                     (walk (conj past-nodes next-node)
                           (concat (transitions graph next-node) next-trans)))))))]
    (walk #{start} (transitions graph start))))

As I mentioned before I define graph as a sequence of pairs, all the transitions are navigable in both directions.

breadth-first and depth-first functions are almost the same except for one tiny part; I will talk about it later.

Now, let's take a closer look at the structure of these functions.

Both of them do two things: define inner function walk and call it. Apparently all juice is in this function. Here's the one from breadth-first:

(fn walk [past-nodes trans]
  (let [next-trans (drop-while #(past-nodes (second %)) trans)]
    (when-let [next-node (second (first next-trans))]
      (lazy-seq
       (cons (first next-trans)
             (walk (conj past-nodes next-node)
                   (concat next-trans (transitions graph next-node))))))))

walk is used to recursively traverse the graph. It is closed around the outer function arguments [graph start], and it has two of its own arguments:

  • past-nodes - hash-set of nodes that have already been traversed and
  • trans - transitions yet to walk.

Next line defines a var next-trans, which is a sequence of transitions. It is guaranteed to start with a transition to a new non-traversed node.

next-trans (drop-while #(past-nodes (second %)) trans)
After that next-node var is assigned the next node id. If next-trans is empty the whole block will return nil.
when-let [next-node (second (first next-trans))]
Following is the call to lazy-seq, this function wraps its arguments in a lazy sequence, which allows Clojure stop the computation until further data is requested. I pass it a sequence with its elements being: the next transition and a recursive call to walk.

The recursive call represents the sole difference between the two algorithms. In the case of breadth-first I build the sequence of transitions to traverse by appending transitions from the current node to the end of the sequence,

(walk (conj past-nodes next-node)
      (concat next-trans (transitions graph next-node)))
when in the depth-first case I do the opposite by appending to the head of the sequence:
(walk (conj past-nodes next-node)
      (concat (transitions graph next-node) next-trans))