Showing posts with label immutable. Show all posts
Showing posts with label immutable. Show all posts

Tuesday, September 4, 2012

Key/Value in-memory storage for Java

Managing multiple properties in Java may be tiresome, especially if these properties are of different type. What is a property you ask me? Anything that doesn't come in thousands and doesn't get changed thousand times a second. For example a class mapped to a database table with more than ten pairs of get* and set* methods is a good candidate.

Problems start to appear when you need to:

  • Make this class thread safe.
  • Or, make a defensive copy of it due to difficulties with the first point.

Classical way around concurrency problems would be to use an immutable class. But managing immutable classes may be unwieldy, especially in the circumstances I've just described. You would need a huge constructor or a special builder class.

So this is the problem that I tried to solve and I hope you don't hate me after you read how I did it.

The idea was very simple – to use hashmap to store properties, but I didn't know how to make it type safe as values would be of different type. Then I came up with a special kind of key that dictates its value type. Here's an excerpt from PropertyKey.java:

public class PropertyKey<T> {
    private final Class<T> clazz;
    private final String name;

    public PropertyKey(Class<T> valueType, String name) {
        this.clazz = valueType;
        this.name = name;
    }

    public boolean checkType(Object value) {
        if (null == value) {
            return true;
        }
        return this.clazz.isAssignableFrom(value.getClass());
    }
}

Constructing new key looks like that:

PropertyKey<String> ENTITY_NAME = 
        new PropertyKey<String>(String.class, "ENTITY_NAME");

A bit verbose and I'm sorry for that, but what I get in return is a key for a field ENTITY_NAME of type String. Class has a method checkType that returns true if its argument extends T. All this is used by a special implementation of immutable hashmap.

public class PropertyHolder {

    private final ImmutableMap<PropertyKey<?>, ?> storage;

    public <T> T get(PropertyKey<T> key) {
        return (T) storage.get(key);
    }

    /**
     * Adds key/value pair to the state and returns new 
     * {@link PropertyHolder} with this state.
     * 
     * @param key {@link PropertyKey} instance.
     * @param value Value of type specified in {@link PropertyKey}.
     * @return New {@link PropertyHolder} with updated state.
     */
    public <T> PropertyHolder put(PropertyKey<T> key, T value) {
        Preconditions.checkNotNull(key, "PropertyKey cannot be null");
        Preconditions.checkNotNull(value, "Value for key %s is null", 
                key);
        Preconditions.checkArgument(key.checkType(value), 
                "Property \"%s\" was given " 
                + "value of a wrong type \"%s\"", key, value);
        // Creates ImmutableMap.Builder with new key/value pair.
        return new PropertyHolder(filterOutKey(key)
                .put(key, value).build());
    }
}

You can download full source code and complete unit tests from here. In full implementation I require values to be Serializable only because it is needed in my project, feel free to get rid of it.

PropertyHolder is an immutable class with interface very similar to Map. Its methods put and remove return new copy of PropertyHolder with modifications applied to it.

The way I do it is not very optimal as every modification creates full copy of ImmutableMap, one way to solve it would be to use persistent collections. But if you don't put there more than 20 objects and don't update it too often current implementation should do just fine.

Now declaring keys is a little ugly but once done it is very easy to use them:

// Adding new k/v pair
PropertyHolder holder = new PropertyHolder();
holder = holder.put(ENTITY_NAME, "Person");

// Reading value for a key in a typesafe way
String name = holder.get(ENTITY_NAME);

That's it. Please let me know if you find this solution useful or if there's a way to improve it.

Feel free to use it as is or modify it to your needs. I'm publishing full sources and unit tests under Eclipse Public License .

Monday, August 20, 2012

Use case for mutable objects in Clojure

I'm currently implementing Aho-Corasic dictionary indexing algorithm. For that I have to write an augmented trie, where each node can point to another branch of the trie. Such links are called failure links. For example, let's say a trie contains two strings: "abcd" and "fgabch". Here letters in the second string (starting from "a") will have failure links to corresponding letters of the first word.

Here's my ASCII "art":

a-b-c-d
^_____
      |
f-g-a-b-c-h

a-b-c-d
  ^___
      |
f-g-a-b-c-h

a-b-c-d
    ^___
        |
f-g-a-b-c-h

This is needed to guarantee matching in time proportional to the length of the text. If the text is "fgabcd" then matching algorithm will match all letters in "fgabc" and will fail to match last "d", then it will use failure link fgabc -> abc and continue matching from there. As the result it will report matching word with start and end indices [2,5].

Without failure links algorithm would have to restart matching from "g" finding nothing, then again from "a" where it would succeed. This property of never going back is crucial in complexity analysis and gives so much desired O(n) execution time.

Now, I have tried hard to implement this trie using immutable data structures. Following description from the book I need to first construct simple trie with all the words in it and then traverse it starting from root in breadth-first order to compute failure links.

Problem started to emerge when I realized that failure link may point to its parent node, for example when indexing string "aaa", second letter "a" would have failure link to the first letter "a", its parent. As it is impossible to create cycles using constructor-based initialization, immutable collections initialized only in constructors wouldn't work.

I know about delay primitive and how it can be used to create cycles. Korma library makes a very nice use of it. But in my case it is prohibitively expensive (citation needed). This algorithm may be used to index thousands of strings, it is not practical to create tens of thousands of delay objects.

To be clear, I'm not trying to argue against immutable data structures. It's just that sometimes we have to venture into the realm of mutability to achieve efficient code. And I think transients in Clojure are some sort of proof of that.