tag:blogger.com,1999:blog-19180838997296408572023-11-16T11:44:58.720+01:00Developer's cornerVarious programming challenges I've ever come across.Anonymoushttp://www.blogger.com/profile/09857000646589913051noreply@blogger.comBlogger9125tag:blogger.com,1999:blog-1918083899729640857.post-85795304210708911542013-01-06T08:12:00.000+01:002013-01-06T08:12:09.833+01:00Strongly connected components in Clojure<div dir="ltr" style="text-align: left;" trbidi="on">
<p>Recently I had fun implementing <a href="http://en.wikipedia.org/wiki/Strongly_connected_component">strongly connected components</a> algorithm in Clojure. I thought some of the Clojurians might want to see it.
<pre class="brush: clojure">
(defn dfs
"Depth first search. Short form of the method passes through all the
nodes of the graph even if it's disconnected .
(nodes-fn graph) expected to return list of all the nodes in the graph.
(child-fn graph node) expected to return list of all the nodes linked
to the given node.
Returns hash-map where nodes are associated with a pair :idx, :leader.
:idx stores finishing index of the node traversal (post-order counter)
:leader first finishing index of the current DFS."
([graph nodes-fn child-fn]
(second
(reduce ;; Start DFS from each node of the graph
(fn [[idx result passed :as args] next-node]
(if (not (passed next-node)) ;; Don't do DFS if node is marked
(dfs idx idx result passed graph next-node child-fn)
args))
[0 {} #{}] ;;Initial index, result, set of passed nodes
(nodes-fn graph))))
([idx leader result passed graph node child-fn]
(let [[idx result passed]
(reduce (fn [[idx result passed :as args] child-node]
(if (not (passed child-node))
(dfs idx leader result passed
graph child-node child-fn)
args))
[idx result (conj passed node)]
(child-fn graph node))]
[(inc idx)
(assoc result node {:idx idx :leader leader})
passed])))
(defn pass-two
"Calls DFS making sure that traversal is done in the reverse :idx order."
[graph result child-fn]
(let [nodes-fn
(constantly (->> result
;;Sort by :idx in reverse order
(sort-by (comp :idx second)) reverse
;;Return only nodes
(map first)))]
(dfs graph nodes-fn child-fn)))
(defn scc
"Finds strongly connected components of the given directed graph.
Returns lists of nodes grouped into SCC.
(nodes-fn graph) expected to return list of all the nodes in the graph.
(incoming-fn graph node) expected to return all the nodes with
transitions towards the given node.
(outgoing-fn graph node) expected to return all the nodes with
transitions from the given node."
[graph nodes-fn incoming-fn outgoing-fn]
(let [result (dfs graph nodes-fn incoming-fn)
leaders-idx (pass-two graph result outgoing-fn)]
(for [scc-group (vals (group-by (comp :leader second) leaders-idx))]
(for [[node & _] scc-group] node))))
</pre>
<p>Implementation is quite generic and can be used for different graph implementations. Here's an example using list multimap. I use the same graph as in <a href="https://www.coursera.org/course/algo">the course</a> given by Tim Roughgarden on Coursera:
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjC9VQScYQM2CcbkUcHiurx37lMMfOfo4e6yujkfkDvP1Pt67dltuU3e3jK0zJXtrtlWurThTUj79IGTzsrrzOm8qN08ou_ObVr6BXRpbNBVDjnGiVC67UxChYJ4x8zwhf780ukiPhFepaO/s1600/SCC+forward.png" imageanchor="1" style="margin-left:1em; margin-right:1em"><img border="0" height="110" width="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjC9VQScYQM2CcbkUcHiurx37lMMfOfo4e6yujkfkDvP1Pt67dltuU3e3jK0zJXtrtlWurThTUj79IGTzsrrzOm8qN08ou_ObVr6BXRpbNBVDjnGiVC67UxChYJ4x8zwhf780ukiPhFepaO/s400/SCC+forward.png" /></a></div>
<pre class="brush: clojure">
(defn list-multimap
"Builds list multimap: {key1 [val1 val2 ...], key2 [val3 val4 ...]}.
Each call adds value to the list associated with the key."
[m [k v]]
(if (m k)
(update-in m [k] conj v)
(assoc m k [v])))
(defn reverse-graph
"Reverses list multimap based graph, see below."
[graph]
(reduce
list-multimap
{}
(for [[key values] graph v values] [v key])))
(def test-graph
{6 [9], 2 [8], 4 [7], 3 [6], 8 [5 6], 1 [4], 9 [3 7], 5 [2], 7 [1]})
;Same graph but with letters in case you confuse indices and node names.
;(def test-graph
; {'a ['g], 'b ['e], 'c ['i], 'd ['a],
; 'e ['h], 'f ['c 'h], 'g ['d 'i], 'h ['b], 'i ['f]})
(def reverse-test-graph
(reverse-graph test-graph))
</pre>
<p>And here's how to use the <b>dfs</b> and <b>scc</b> functions:
<pre class="brush: clojure">
(dfs test-graph
;;fn that returns set of nodes
(constantly (into #{} (flatten (seq test-graph))))
;;(get graph node) returns list of related nodes.
get)
;{2 {:idx 8, :leader 3},
; 8 {:idx 7, :leader 3},
; 6 {:idx 6, :leader 3},
; 9 {:idx 5, :leader 3},
; 3 {:idx 4, :leader 3},
; 5 {:idx 3, :leader 3},
; 1 {:idx 2, :leader 0},
; 4 {:idx 1, :leader 0},
; 7 {:idx 0, :leader 0}}
(scc test-graph
;;fn that returns set of nodes
(constantly (into #{} (flatten (seq test-graph))))
;;works as incoming-fn using cashed reversed graph
#(get reverse-test-graph %2)
;;(get graph node) returns list of related nodes
get)
;((8 5 2) (9 3 6) (1 4 7))
</pre>
<p>Full source code is <a href="https://gist.github.com/4465693">here</a>.
<br /></div>Anonymoushttp://www.blogger.com/profile/09857000646589913051noreply@blogger.com0tag:blogger.com,1999:blog-1918083899729640857.post-55983389525895486742012-09-04T21:55:00.002+02:002012-09-04T21:55:43.056+02:00Key/Value in-memory storage for Java<div dir="ltr" style="text-align: left;" trbidi="on">
Managing multiple properties in Java may be tiresome, especially if these properties are of different type. <i>What is a property you ask me?</i> Anything that doesn't come in thousands and doesn't get changed thousand times a second. For example a class mapped to a database table with more than ten pairs of get* and set* methods is a good candidate.
<p>
Problems start to appear when you need to:
<ul>
<li>Make this class thread safe.</li>
<li>Or, make a defensive copy of it due to difficulties with the first point.</li>
</ul>
<p>
Classical way around concurrency problems would be to use an immutable class. But managing immutable classes may be unwieldy, especially in the circumstances I've just described. You would need a huge constructor or a special builder class.
<p>
So this is the problem that I tried to solve and I hope you don't hate me after you read how I did it.
<p>
The idea was very simple – to use hashmap to store properties, but I didn't know how to make it type safe as values would be of different type. Then I came up with a special kind of key that dictates its value type. Here's an excerpt from PropertyKey.java:
<pre class="brush: java">
public class PropertyKey<T> {
private final Class<T> clazz;
private final String name;
public PropertyKey(Class<T> valueType, String name) {
this.clazz = valueType;
this.name = name;
}
public boolean checkType(Object value) {
if (null == value) {
return true;
}
return this.clazz.isAssignableFrom(value.getClass());
}
}
</pre>
<p>
Constructing new key looks like that:
<pre class="brush: java">
PropertyKey<String> ENTITY_NAME =
new PropertyKey<String>(String.class, "ENTITY_NAME");
</pre>
<p>
A bit verbose and I'm sorry for that, but what I get in return is a key for a field ENTITY_NAME of type String. Class has a method checkType that returns true if its argument extends <b>T</b>. All this is used by a special implementation of immutable hashmap.
<pre class="brush: java">
public class PropertyHolder {
private final ImmutableMap<PropertyKey<?>, ?> storage;
public <T> T get(PropertyKey<T> key) {
return (T) storage.get(key);
}
/**
* Adds key/value pair to the state and returns new
* {@link PropertyHolder} with this state.
*
* @param key {@link PropertyKey} instance.
* @param value Value of type specified in {@link PropertyKey}.
* @return New {@link PropertyHolder} with updated state.
*/
public <T> PropertyHolder put(PropertyKey<T> key, T value) {
Preconditions.checkNotNull(key, "PropertyKey cannot be null");
Preconditions.checkNotNull(value, "Value for key %s is null",
key);
Preconditions.checkArgument(key.checkType(value),
"Property \"%s\" was given "
+ "value of a wrong type \"%s\"", key, value);
// Creates ImmutableMap.Builder with new key/value pair.
return new PropertyHolder(filterOutKey(key)
.put(key, value).build());
}
}
</pre>
<p>
You can download full source code and complete unit tests from <a href="https://sites.google.com/site/ivankoblik/web-storage/KVStore.zip">here</a>. In full implementation I require values to be Serializable only because it is needed in my project, feel free to get rid of it.
<p>
PropertyHolder is an immutable class with interface very similar to Map. Its methods <b>put</b> and <b>remove</b> return new copy of <b>PropertyHolder</b> with modifications applied to it.
<p>
The way I do it is not very optimal as every modification creates full copy of ImmutableMap, one way to solve it would be to use <a href="http://code.google.com/p/pcollections/">persistent collections</a>. But if you don't put there more than 20 objects and don't update it too often current implementation should do just fine.
<p>
Now declaring keys is a little ugly but once done it is very easy to use them:
<pre class="brush: java">
// Adding new k/v pair
PropertyHolder holder = new PropertyHolder();
holder = holder.put(ENTITY_NAME, "Person");
// Reading value for a key in a typesafe way
String name = holder.get(ENTITY_NAME);
</pre>
<p>
That's it. Please let me know if you find this solution useful or if there's a way to improve it.
<p>
Feel free to use it as is or modify it to your needs. I'm publishing <a href="https://sites.google.com/site/ivankoblik/web-storage/KVStore.zip">full sources and unit tests</a> under <a href="http://opensource.org/licenses/eclipse-1.0.php">Eclipse Public License </a>.
<br /></div>Anonymoushttp://www.blogger.com/profile/09857000646589913051noreply@blogger.com0tag:blogger.com,1999:blog-1918083899729640857.post-40994594889160898062012-08-20T21:54:00.000+02:002013-05-14T16:27:42.264+02:00Use case for mutable objects in Clojure<div dir="ltr" style="text-align: left;" trbidi="on">
<p>I'm currently implementing <a href="https://github.com/ikoblik/clj-index/blob/master/src/main/clojure/clj_index/aho_corasic.clj">Aho-Corasic</a> dictionary indexing algorithm. For that I have to write an augmented trie, where each node can point to another branch of the trie. Such links are called failure links. For example, let's say a trie contains two strings: "abcd" and "fgabch". Here letters in the second string (starting from "a") will have failure links to corresponding letters of the first word.
<br />
<p>Here's my ASCII "art":
<br />
<pre>a-b-c-d
^_____
|
f-g-a-b-c-h
a-b-c-d
^___
|
f-g-a-b-c-h
a-b-c-d
^___
|
f-g-a-b-c-h
</pre>
<p>This is needed to guarantee matching in time proportional to the length of the text. If the text is "fgabcd" then matching algorithm will match all letters in "fgabc" and will fail to match last "d", then it will use failure link fgabc -> abc and continue matching from there. As the result it will report matching word with start and end indices [2,5].
<p>Without failure links algorithm would have to restart matching from "g" finding nothing, then again from "a" where it would succeed. This property of never going back is crucial in complexity analysis and gives so much desired O(n) execution time.
<p>Now, I have tried hard to implement this trie using immutable data structures. Following description from the book I need to first construct simple trie with all the words in it and then traverse it starting from root in breadth-first order to compute failure links.
<p>Problem started to emerge when I realized that failure link may point to its parent node, for example when indexing string "aaa", second letter "a" would have failure link to the first letter "a", its parent. As it is impossible to create cycles using constructor-based initialization, immutable collections initialized only in constructors wouldn't work.
<p>I know about <b>delay</b> primitive and how it can be used to create cycles. Korma library makes a very nice use of it. But in my case it is prohibitively expensive (<i>citation needed</i>). This algorithm may be used to index thousands of strings, it is not practical to create tens of thousands of <b>delay</b> objects.
<p>To be clear, I'm not trying to argue against immutable data structures. It's just that sometimes we have to venture into the realm of mutability to achieve efficient code. And I think transients in Clojure are some sort of proof of that.
</div>Anonymoushttp://www.blogger.com/profile/09857000646589913051noreply@blogger.com0tag:blogger.com,1999:blog-1918083899729640857.post-33004967776562498422010-03-11T23:57:00.014+01:002010-03-14T02:23:23.584+01:00Lazy graph traversal in Clojure<p>
Clojure is a new lisp-like programming language, that runs on JVM. It has tons of interesting features that are worth exploring. To name a few, concurrency model based on immutable data and software transactional memory, lazy sequences, etc.
<p>
I am currently re-reading a book on algorithms, so I decided that it might be a good idea to implement some of these algorithms on Clojure. I have implemented breadth and depth first graph traversals, using lazy-seq paradigm.
<p>
Some aspects of the implementation may appear to be slightly inefficient; for example I represent graph as a sequence of pairs, and each time I need to get all the neighbors of a node I traverse the entire sequence. But I did it that way so that I could concentrate on the algorithm at question.
<p>
For the experienced reader, if there is anything in my code that you think may be considered as a bad tone, please don't hesitate to comment.
<p>
Without further ado, here it is:
<pre style="overflow: auto; width: 100%" class="brush: clojure">(ns algo.traversal)
"
1--2-6--7
| \\ /
3--4-5
"
(def graph (seq {1 2, 1 3, 2 6, 6 7, 3 4, 4 5, 2 5, 5 7}))
(defn transitions
"Finds all the transitions going to and from the node.
Transition to the node are reversed."
[graph node]
(map #(if (= node (first %))
%
(reverse %))
(filter #(or (= node (first %)) (= node (second %))) graph)))
(defn breadth-first
"Returns a lazy sequence of transitions in breadth-first order
starting from the given node."
[graph start]
(let [walk
(fn walk [past-nodes trans]
(let [next-trans (drop-while #(past-nodes (second %)) trans)]
(when-let [next-node (second (first next-trans))]
(lazy-seq
(cons (first next-trans)
(walk (conj past-nodes next-node)
(concat next-trans (transitions graph next-node))))))))]
(walk #{start} (transitions graph start))))
(defn depth-first
"Returns a lazy sequence of transitions in depth-first order
starting from the given node."
[graph start]
(let [walk
(fn walk [past-nodes trans]
(let [next-trans (drop-while #(past-nodes (second %)) trans)]
(when-let [next-node (second (first next-trans))]
(lazy-seq
(cons (first next-trans)
(walk (conj past-nodes next-node)
(concat (transitions graph next-node) next-trans)))))))]
(walk #{start} (transitions graph start))))
</pre>
<p>As I mentioned before I define <b>graph</b> as a sequence of pairs, all the transitions are navigable in both directions.
<p>
<b>breadth-first</b> and <b>depth-first</b> functions are almost the same except for one tiny part; I will talk about it later.
<p>
Now, let's take a closer look at the structure of these functions.
<p>
Both of them do two things: define inner function <b>walk</b> and call it. Apparently all juice is in this function. Here's the one from <b>breadth-first</b>:
<pre class="brush: clojure">(fn walk [past-nodes trans]
(let [next-trans (drop-while #(past-nodes (second %)) trans)]
(when-let [next-node (second (first next-trans))]
(lazy-seq
(cons (first next-trans)
(walk (conj past-nodes next-node)
(concat next-trans (transitions graph next-node))))))))</pre>
<p>
<b>walk</b> is used to recursively traverse the graph. It is closed around the outer function arguments [graph start], and it has two of its own arguments:
<ul>
<li><i>past-nodes</i> - hash-set of nodes that have already been traversed and </li>
<li><i>trans</i> - transitions yet to walk.</li>
</ul>
<p>
Next line defines a <i>var</i> <b>next-trans</b>, which is a sequence of transitions. It is guaranteed to start with a transition to a new non-traversed node.
<pre class="brush: clojure">next-trans (drop-while #(past-nodes (second %)) trans)</pre>
After that <b>next-node</b> <i>var</i> is assigned the next node id. If <b>next-trans </b>is empty the whole block will return <b>nil</b>.
<pre class="brush: clojure">when-let [next-node (second (first next-trans))]</pre>
Following is the call to <b>lazy-seq</b>, this function wraps its arguments in a lazy sequence, which allows Clojure stop the computation until further data is requested. I pass it a sequence with its elements being: the next transition and a recursive call to <b>walk</b>.
<p>
The recursive call represents the sole difference between the two algorithms. In the case of <b>breadth-first</b> I build the <u>sequence of transitions to traverse</u> by appending <u>transitions from the current node</u> to the <i>end </i>of the sequence,
<pre class="brush: clojure">(walk (conj past-nodes next-node)
(concat next-trans (transitions graph next-node)))</pre>
when in the <b>depth-first</b> case I do the opposite by appending to the <i>head</i> of the sequence:
<pre class="brush: clojure">(walk (conj past-nodes next-node)
(concat (transitions graph next-node) next-trans))</pre>Anonymoushttp://www.blogger.com/profile/09857000646589913051noreply@blogger.com0tag:blogger.com,1999:blog-1918083899729640857.post-82544875149329455412009-12-29T20:08:00.013+01:002010-03-14T01:35:24.553+01:00Rounding doubles in JavaStrangely enough there is no standard way in Java to round double values up to a certain order of 10 or 2. There are some ways to do that, but they seem to be too bulky and, maybe, not fast enough. For example, the one where you have to use BigDecimal, it seems to be too much of an overkill to create a new object only to round a value.<br/>
<br/>
After spending a little bit of time studying methods offered by standard java.lang.Math class I stumbled upon two methods that in combination would allow me to round values up to a power of two.<br/>
<br/>
<span style="font-weight: bold;">double Math.scalb(double value)</span> - multiplies given double value by the given power of two,<br/>
<span style="font-weight: bold;">double Math.iround(double value)</span> - discards everything to the right of the decimal point.<br/>
<br/>
So here's what needs to be done:<br/>
<ol><li>Divide the given double value by the given power of two.</li><li>Discard the decimal part.</li><li>Multiply the result by the given power of two.</li></ol>But there is a catch that you should know about; resulting value may still have something below given precision point, that's OK. As in fact what this method does is setting granularity up to the given power of two.<br/>
Here's the method:<pre class="brush: java">public static double round(double value, int power) {
// Divide value by 2^power
double scaled = Math.scalb(value, -power);
// Discard the fractional part, multiply back by 2^power
return Math.scalb(Math.rint(scaled), power);
}</pre>
This approach is great when you compare Doubles with tolerance and need equal hash codes for <i>equal</i> values.<br/>
<br/>
Don't forget that rounding is done up to a certain power of two:<br/>
<table border="1" bordercolor="#000000" cellpadding="3" cellspacing="0"><tbody><tr><td><b>Powers of two.</b></td><td><b>Values that the fractional part may assume.</b></td></tr><tr><td>-1</td><td>0.5</td></tr><tr><td>-2</td><td>0.25, 0.5, 0.75</td></tr><tr><td>-3</td><td>0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875</td></tr></tbody></table><br/>Anonymoushttp://www.blogger.com/profile/09857000646589913051noreply@blogger.com0tag:blogger.com,1999:blog-1918083899729640857.post-32038242155989542012009-06-28T14:24:00.016+02:002013-10-21T21:12:46.486+02:00EXIF with Python<div dir="ltr" style="text-align: left;" trbidi="on">
Recently I wrote a small script on Python that restores photo file's creation and modification dates to their original values, taken from EXIF.<br />
<br />
First of all you need to have <a href="http://www.python.org/download/" title="Python">Python</a> (make sure to get version 2.6.x, 3.x won't work). You will also have to install some libraries.<br />
<br />
<b>Getting the libraries</b><br />
<br />
<i>Win32 Extensions</i><br />
Linux users can skip this step. "Problem" with Windows is that files on FAT32 or NTFS have creation and modification date, while Python out of the box supports only the latter one. To fix that you'll need to get <a href="http://python.net/crew/mhammond/win32/Downloads.html" title="Win32 Extensions">Win32 Extensions</a> library and install it. If you skip this step, the script will work but it will be able to fix only the modification date, creation date will stay untouched.<br />
<br />
<i>pyexif</i><br />
To parse EXIF headers you will need to get the <a href="http://pyexif.sourceforge.net/" title="pyexif">pyexif</a> library. Just unpack it to the folder where you have python installed.<br />
<br />
<i>exif</i><br />
exif is a linux command line utility. In some respect it is better than the pyexif, as I had some pictures where <i>pyexif</i> failed to retrieve the data. For Linux users you need to check if you have <a href="http://dir.filewatcher.com/d/Debian/hurd-i386/graphics/exif_0.6.9-5_hurd-i386.deb.19958.html" title="exif">exif</a> installed.<br />
<br />
<b>The <a href="http://sites.google.com/site/ivankoblik/web-storage/exif_date.py?attredirects=0">script</a></b><br />
<pre class="brush: python">#!/usr/bin/env python
"""A simple utility to restore file creation and modification
dates back to their original values from EXIF.
This script requires exif module to be installed or the exif
command line utility to be in the path.
To function correctly under windows this script needs win32file and
win32con modules. Otherwise it will not be able to restore the creation
date."""
import os, sys, time, re, glob
from datetime import datetime, timedelta
try:
import win32file, win32con
__use_win_32 = True
except:
__use_win_32 = False
__path_to_exif = 'exif'
TEN_MINUTES = timedelta(minutes=10)
__description = """Restores file's creation and modification dates back to the original
value from EXIF.
usage: exif_date.py [File name mask]"""
def getExifCreationDate(path):
"""Gets the earliest date from the file's EXIF header, returns time tuple"""
timeStamp = None
try:
import exif
pf = exif.parse(path)
originalTime = pf.get('DateTimeOriginal')
if (originalTime):
timeStamp = datetime.strptime(originalTime, '%Y:%m:%d %H:%M:%S')
except:
pass
#sometimes exif lib failes to retrieve data
if (not timeStamp):
response = os.popen(__path_to_exif + ' -x "%s"' % path, 'r')
lines = response.read()
matches = re.findall('<Date_and_Time.+?>(.*?)</Date_and_Time.+?>', lines)
if (len(matches)):
timeStamp = min(*[datetime.strptime(x, '%Y:%m:%d %H:%M:%S') for x in matches])
return timeStamp
def getFileDates(path):
"""Returns a dictionary of file creation (ctime), modification (mtime), exif (exif) dates"""
dates = {}
dates['exif'] = getExifCreationDate(path)
dates['mtime'] = datetime.utcfromtimestamp(os.path.getmtime(path))
dates['ctime'] = datetime.utcfromtimestamp(os.path.getctime(path))
return dates
def setFileDates(fileName, dates):
"""Sets file modification and creation dates to the specified value"""
if __use_win_32:
filehandle = win32file.CreateFile(fileName, win32file.GENERIC_WRITE, 0, None, win32con.OPEN_EXISTING, 0, None)
win32file.SetFileTime(filehandle, *(dates['exif'],)*3)
filehandle.close()
else:
os.utime(fileName, (time.mktime(dates['exif'].utctimetuple()),)*2)
def fixFileDate(fileName):
"""Reads file's EXIF header, gets the earliest date and sets it to the file"""
dates = getFileDates(fileName)
if (dates['exif']):
cmp_time = lambda x, y: x - y > TEN_MINUTES
diff = [cmp_time(dates[x], dates['exif']) for x in ('mtime', 'ctime')]
if(sum(diff)):
setFileDates(fileName, dates)
return dates, diff
else:
return dates, None
def usage():
print __description
def main(args):
if (not len(args)):
usage()
return - 1
processedFiles = []
for fileNameMask in args:
if "*" in fileNameMask or "?" in fileNameMask:
print "Looking for files with mask " + fileNameMask
for fileName in filter(lambda x: x not in processedFiles, glob.glob(fileNameMask)):
processedFiles.append(fileName)
try:
dates, diff = fixFileDate(fileName)
except Exception, e:
print e
diff = None
print fileName + ' - ',
if (not diff):
print 'SKIP, NO EXIF'
else:
if (sum(diff) != 0):
print 'SET TO "%s" (updated M:%d, C:%d)' % (dates['exif'].strftime('%Y:%m:%d %H:%M:%S'), diff[0], diff[1])
else:
print 'OK'
return 0
if __name__ == '__main__':
sys.exit(main(sys.argv[1:]))</pre>
Click <a href="http://sites.google.com/site/ivankoblik/web-storage/exif_date.py?attredirects=0">here</a> or <a href="https://gist.github.com/ikoblik/7089165">here</a> to download the script.<br />
<br />
<b>Edit:</b> removed OptionParser from imports.<br />
<b>Edit 2:</b> improved handling of daylight savings time.</div>Anonymoushttp://www.blogger.com/profile/09857000646589913051noreply@blogger.com3tag:blogger.com,1999:blog-1918083899729640857.post-34562328559231470612009-02-18T20:44:00.020+01:002013-01-18T20:32:13.393+01:00XML persistence with Hibernate and XstreamNot a long time ago in the project I had been working on, we were writing a lot of blobs to a database. The information persisted was not of high interest, so we didn't even bother breaking the blobs into separate tables. Of course on very rare occasions we needed to see the data but couldn't.<br><br/>
The problem seemed to be hard to solve until I had stumbled across a very neat XML persistence library <a title="Xstream" target="_blank" href="http://xstream.codehaus.org/" id="t772">Xstream</a>. It allows you to persist and restore objects of any type to and from XML without <i>any</i> <i>additional configuration</i>. To learn more about the library I recommend you to skip through their very short but expressive <a title="tutorial" target="_blank" href="http://xstream.codehaus.org/tutorial.html" id="l7th">tutorial</a>.<br><br/>
We used <a title="Hibernate" target="_blank" href="http://www.hibernate.org/" id="tqex">Hibernate</a> for persistence, so I needed to teach it how to use Xstream. Thankfully Hibernate is an easily extensible library. I had written a persister that was capable of transparently persisting any object as an array of chars in form of XML. <br><br/>
My solution has a major drawback though, which I haven't fixed, max length of character array is restricted in the DBMS we use (in Oracle it is 4000 chars). So if you need to persist some really long sets of data or just <i>big</i> objects you might have to rewirte the persister to use CBLOB or any other alternative in your database.<br/>
<pre class="brush: java">package cern.oasis.util.persistence;
import java.io.Serializable;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Types;
import org.hibernate.Hibernate;
import org.hibernate.HibernateException;
import org.hibernate.usertype.UserType;
import com.thoughtworks.xstream.XStream;
/**
* Persists any serializable java object as XML string.
*
* @author Ivan Koblik
*/
public class XMLString implements UserType {
private static final int[] TYPES = { Types.VARCHAR };
private final static XStream stream = new XStream();
static {
//Here you can define aliases of some classes if you want to
//stream.alias("Impedance", Impedance.class);
}
@Override
public Object assemble(Serializable cached, Object owner)
throws HibernateException {
return null == cached ? null : stream.fromXML((String) cached);
}
@Override
public Serializable disassemble(Object value) throws HibernateException {
return null == value ? null : stream.toXML(value);
}
@Override
public Object deepCopy(Object value) throws HibernateException {
return null == value ? null : stream.fromXML(stream.toXML(value));
}
@Override
public boolean equals(Object x, Object y) throws HibernateException {
if (x == y) {
return true;
} else if (x == null || y == null) {
return false;
} else {
return x.equals(y);
}
}
@Override
public int hashCode(Object x) throws HibernateException {
return null == x ? 0 : x.hashCode();
}
@Override
public boolean isMutable() {
return true;
}
@Override
public Object nullSafeGet(ResultSet rs, String[] names, Object owner)
throws HibernateException, SQLException {
String persistedXml = (String) Hibernate.STRING.nullSafeGet(rs, names[0]);
return null == persistedXml ? null : stream.fromXML(persistedXml);
}
@Override
public void nullSafeSet(PreparedStatement st, Object value, int index)
throws HibernateException, SQLException {
String xmlToPersist = null == value ? null : stream.toXML(value);
if (null != xmlToPersist && xmlToPersist.length() > 4000) {
throw new RuntimeException(
"Can not persist strings longer then 4000 characters:\n" + xmlToPersist);
}
Hibernate.STRING.nullSafeSet(st, xmlToPersist, index);
}
@Override
public Object replace(Object original, Object target, Object owner)
throws HibernateException {
return this.deepCopy(original);
}
@Override
public Class returnedClass() {
return Serializable.class;
}
@Override
public int[] sqlTypes() {
return TYPES;
}
}</pre>That's it, the only thing that is left to do is to tell Hibernate to use this persister. Here's an example with annotations:<br/>
<pre style="font-family: Andale Mono, Lucida Console, Monaco, fixed, monospace; font-size: 11px; padding: 5px; overflow: auto; width: 100%" class="brush: java">
@Entity
public class EntityClass {
...
@AccessType("field")
@Type(type = "cern.oasis.util.persistence.XMLString")
public Map<String, Boolean> getValueThatIsPersistentByXStream() {
return this.map;
}
}</pre>
Hope you find it useful. Comments are more then welcome!<br/>Anonymoushttp://www.blogger.com/profile/09857000646589913051noreply@blogger.com4tag:blogger.com,1999:blog-1918083899729640857.post-3559986230490180522008-11-29T12:24:00.014+01:002010-03-14T01:29:02.080+01:00Tricking the RemoteGroupLast time I promised you to write of another useful trick that helped me optimize group queries in Atlassian Crowd.<br><br/>
If you've ever used Crowd web interface you would've noticed that, Crowd when querying for a group also retrieves a list of all the group members. It may be too much of an overhead to do it before even displaying group attributes. Fortunately there's a way to outsmart Crowd!<br><br/>
If you take a look at the RemoteDirectory interface, you'll be able to find those three methods:<ol><li>public List<RemotePrincipal> findAllGroupMembers(final String groupName);</li><li>public RemoteGroup findGroupByName(String groupName, boolean onlyFetchDirectMembers);</li><li>public RemoteGroup findGroupByName(String groupName);</li></ol>First one returns list of principals that are members of the group. Second and the third methods return an object representing group with the given name. Crowd uses latter methods when you click on group name in Crowd's web interface, it then displays a page with group description and its attributes. You may even not want to see the list of principals that is accessible through the Members tab. <br><br/>
Problem is that RemoteGroup instances returned by the second and third methods must already be populated with a list of members; otherwise Members tab will be empty. So there seem to be not much of a choice, either waste time querying for principals or have non functional connector.<br><br/>
Let's take a closer look at RemoteGroup, it has an interesting method setDirectory(Directory directory). It appears that Crowd uses this directory if it sees that the object is not fully initialized. Thus we can return an empty RemoteGroup instance with only group name and activity flag set. Later Crowd will do the following:<blockquote>directory.getImplementation().findAllGroupMembers(...)</blockquote>if it needs a list of members.<br><br/>
Here's my extension of the Directory class:<br/>
<pre style="font-family: Andale Mono, Lucida Console, Monaco, fixed, monospace; font-size: 11px; padding: 5px; overflow: auto; width: 100%" class="brush: java">class ReferencingDirectory extends Directory {
private static final long serialVersionUID = 1L;
/**
* Reference to the remote directory to avoid object creation.
*/
private final RemoteDirectory remoteDirectory;
public ReferencingDirectory(RemoteDirectory remoteDirectory) {
Utils.checkNull("Remote directory", remoteDirectory);
this.remoteDirectory = remoteDirectory;
}
@Override
public RemoteDirectory getImplementation() {
return remoteDirectory;
}
@Override
public String getImplementationClass() {
return remoteDirectory.getClass().getCanonicalName();
}
}</pre><br/>
I keep an instance of it in my implementation of RemoteDirectory and pass it to RemoteGroup whenever I need it.<br><br/>
That's it! Queries for groups are much faster now, and if really needed members are queried separately.<br/>Anonymoushttp://www.blogger.com/profile/09857000646589913051noreply@blogger.com0tag:blogger.com,1999:blog-1918083899729640857.post-47615530799269874992008-11-24T20:01:00.015+01:002010-03-14T01:27:32.847+01:00Atlassian Crowd Custom DirectoryToday I chose to share with you my experience with implementing custom directory connector to Atlassian Crowd. There is a rather straight forward <a href="http://confluence.atlassian.com/display/CROWD/Creating+a+Custom+Directory+Connector" id="eqjd" title="Crowd documentation, custom directory connector">interface</a> defined on Atlassian site which is not really hard to implement. What I would like to write about is a small number of tricks that helped me to achieve the desired results.<br/>
<div><div>By the way, if you take a look at the Atlassian documentation you will notice that implementation among others should extend <span style="font-family:Courier New;"><a title="DirectoryEntity" href="http://docs.atlassian.com/atlassian-crowd/current/com/atlassian/crowd/integration/model/DirectoryEntity.html" id="n-ph">DirectoryEntity</a> </span>which is not completely true, for me it was more than enough to implement the <span style="font-family:Courier New;"><a title="RemoteDirectory" href="http://docs.atlassian.com/atlassian-crowd/current/com/atlassian/crowd/integration/directory/RemoteDirectory.html" id="kjis">RemoteDirectory</a> </span>interface.<br/>
<span class="code-keyword"></span></div><div><h4>No roles</h4>Although notion of roles existed in my company's directory service still they were Windows specific and completely irrelevant to Atlassian products. I needed simple stub implementations of all the role related methods that would not break Crowd's work flow.<br/>
<br/>
</div>It was a good idea to return an empty list on <span style="font-family:Courier New;">findRoleMemberships(String principalName)</span><b> </b>method invocation, as throwing an exception would result in exception each time one would try to get principal's information on Crowd's site. Another method <span style="font-family:Courier New;">searchRoles(SearchContext context)</span> was also better off returning an empty list instead of throwing an exception.<br/>
</div><div><h4>Legacy data</h4>We had been using JIRA, Confluence, and Bamboo for several years and naturally user and group information hadn't been synchronized with the central directory. Our original intention <span style="background-color: rgb(255, 255, 255);">was</span> to integrate Atlassian tools as smoothly as possible, so for every tool I used a stack of two directories one of which was always a snapshot of the user information at the integration stage. Crowd was clever enough to merge data from both of them and it worked especially well for the users that chose same log-in names as in the company's directory, their group membership from the both directories was merged.<br/>
</div><div><u style="color: rgb(255, 0, 0);">Note</u>: order in which directories are listed actually matters, to use passwords from the central directory I had to put the custom directory first in the list.<br/>
</div><div><h4>Read-only<br/>
</h4>According to my company's safety rules I had to implement a read-only connector. So I chose to throw the <span style="font-family:Courier New;">UnsupportedOperationException </span>exception when user attempted to update any information concerning principal, group or role. Crowd behaved very well whenever the exception was thrown.<br/>
<h4><b></b>Modifiable internal directory<br/>
</h4>I also wanted to allow administrators, from JIRA for example, to modify information in Crowd's internal directory. So my policy was following, throw <span style="font-family:Courier New;">ObjectNotFoundException </span>exception if user and/or group didn't exist in the central directory, methods concerned were: <span style="font-family:Courier New;">addPrincipalToGroup</span><span style="font-family:Courier New;">, </span><span style="font-family:Courier New;">removeGroup</span><span style="font-family:Courier New;">, </span><span style="font-family:Courier New;">removeGroup</span><span style="font-family:Courier New;">, </span><span style="font-family:Courier New;">removePrincipalFromGroup</span><span style="font-family:Courier New;">, </span><span style="font-family:Courier New;">updateGroup</span><span style="font-family:Courier New;">, </span><span style="font-family:Courier New;">updatePrincipal</span><span style="font-family:Courier New;">, </span><span style="font-family:Courier New;">updatePrincipalCredential</span>. In the other case methods defaulted to throwing <span style="font-family:Courier New;">UnsupportedOperationException </span>exception. All this led to the next work flow (<span style="font-family:Courier New;">updateGroup </span>is taken as an example):<br/>
</div><div><ul><li>Invocation of updateGroup of the custom directory (CD).</li><li>Group doesn't exist in CD, throw the <span style="font-family:Courier New;">ObjectNotFoundException</span>.</li><li>Crowd proceeds to the next directory, which is the Internal Directory (ID).</li><li>ID processes the request graciously.</li></ul></div><div>To achieve the aforementioned functionality with groups I simply called <span style="font-family:Courier New;">this.findGroupByName(groupName)</span> and if it didn't throw an exception the method threw the<span style="font-family:Verdana;"> <span style="font-family:Courier New;">UnsupportedOperationException</span></span><b style="font-family: Verdana;">. </b><span style="font-family:Verdana;">To check </span><span style="font-family:Verdana;">principal existence it was enough to call <span style="font-family:Courier New;">this.findPrincipalByName(name)</span>.<br/>
<br/>
Here's an example:<br/><br/>
<pre style="font-family: Andale Mono, Lucida Console, Monaco, fixed, monospace; font-size: 11px; padding: 5px; overflow: auto; width: 100%" class="brush: java">public void addPrincipalToGroup(String name, String groupName)
throws ObjectNotFoundException
{
this.findGroupByName(unsubscribedGroup);
throw new UnsupportedOperationException(CANNOT_EDIT_DIRECTORY);
}</pre>
</span><span style="font-family:Courier New;"><span style="font-family:Verdana;">In contrary to the given example if the group actually existed in the central directory </span>UnsupportedOperationException </span><span style="font-family:Verdana;">was thrown</span><b style="font-family: Verdana;"> </b><span style="font-family:Verdana;">which </span><span style="font-family:Verdana;">forced Crowd to stop processing the request. This perfectly corresponded to my wishes.<br/>
</span><h4><span style="font-family:Verdana;">Conclusion</span></h4><h4><span style="font-family:Verdana;"></span></h4><span style="font-family:Verdana;">As the result I managed to keep the existing data in editable state merged with the central directory.</span><br/>
</div>What I have described in this post doesn't go outside of scope of abstract implementation of <span style="font-family:Courier New;">RemoteDirectory</span> which I called <span style="font-family:Courier New;">ReadOnlyRemoteDirectory</span>. Someday I will describe another trick that helped me minimize the group retrieval delay.<br/>Anonymoushttp://www.blogger.com/profile/09857000646589913051noreply@blogger.com5