Sunday, June 28, 2009

EXIF with Python

I have decided to share a small script I wrote in Python. It restores a photo file's creation and modification dates to their original values, provided the picture was taken with an EXIF-enabled camera.

First of all, you need Python (make sure to get version 2.6.x; 3.x won't work). You will also have to install some libraries.

Getting the libraries

Win32 Extensions
Linux users can skip this step. The "problem" with Windows is that files on FAT32 or NTFS have both a creation and a modification date, while Python out of the box can set only the latter. To fix that, you'll need to get the Win32 Extensions library and install it. If you skip this step, the script will still work, but it will fix only the modification date; the creation date will stay untouched.
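To illustrate the limitation, here is a minimal sketch of what plain Python can do on its own. It relies only on os.utime, which accepts an access time and a modification time and nothing else. The helper name set_modification_time and the sample date are made up for the example; the code should run on both Python 2.6 and 3.

```python
import os
import tempfile
import time

def set_modification_time(path, time_tuple):
    """Set access and modification time of path from a local time tuple."""
    seconds = time.mktime(time_tuple)
    # os.utime accepts exactly two values: (access time, modification time).
    # There is no slot for a creation time, hence the need for Win32 Extensions.
    os.utime(path, (seconds, seconds))
    return seconds

# Demonstration on a throwaway file (hypothetical date, not from a real photo).
handle, demo_path = tempfile.mkstemp(suffix='.jpg')
os.close(handle)
shot_at = time.strptime('2009:06:28 12:00:00', '%Y:%m:%d %H:%M:%S')
expected = set_modification_time(demo_path, shot_at)
actual = os.path.getmtime(demo_path)
os.remove(demo_path)
```

After the call, os.path.getmtime reports the new value, while the creation date shown by Windows Explorer is left as it was.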

pyexif
To parse EXIF headers you will need the pyexif library. Just unpack it into the folder where you have Python installed.

exif
exif is a Linux command-line utility. In some respects it is better than pyexif, as I had some pictures where pyexif failed to retrieve the data. Linux users should check that exif is installed.
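Whichever tool reads the header, the timestamps come back as text in the EXIF form YYYY:MM:DD HH:MM:SS. Here is a small sketch of how such text can be scanned for the earliest date. The function name earliest_exif_time and the sample text are invented for the example and are not real exif output.

```python
import re
import time

EXIF_TIME_FORMAT = '%Y:%m:%d %H:%M:%S'
EXIF_TIME_PATTERN = re.compile(r'\d{4}:\d{2}:\d{2} \d{2}:\d{2}:\d{2}')

def earliest_exif_time(text):
    """Return the earliest EXIF-style timestamp found in text, or None."""
    stamps = []
    for candidate in EXIF_TIME_PATTERN.findall(text):
        try:
            stamps.append(time.strptime(candidate, EXIF_TIME_FORMAT))
        except ValueError:
            # Skip values that look like a timestamp but are not a valid
            # date, e.g. an all-zero placeholder.
            continue
    if stamps:
        return min(stamps)
    return None

# Made-up sample text, only shaped like tool output.
sample = ("Date and Time: 2009:06:28 14:30:00\n"
          "Date and Time (Original): 2009:06:27 09:15:42\n"
          "Date and Time (Digitized): 0000:00:00 00:00:00\n")
earliest = earliest_exif_time(sample)
```

Time tuples compare field by field, so min picks the oldest date without any conversion to seconds.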

The script
#!/usr/bin/env python

"""A simple utility to restore file creation and modification
dates back to their original values from EXIF.

This script requires the exif module to be installed or the exif
command line utility to be in the path.

To function correctly under Windows this script needs the win32file and
win32con modules. Otherwise it will not be able to restore the creation
date."""

import os, sys, time, re, glob

__path_to_exif = 'exif'

__description = """Restores file's creation and modification dates back to the original
value from EXIF.
usage: exif_date.py [File name mask]"""

def getExifCreationDate(path):
    """Gets the earliest date from the file's EXIF header, returns time tuple"""
    timeStamp = None
    try:
        import exif
        pf = exif.parse(path)
        originalTime = pf.get('DateTimeOriginal')
        if (originalTime):
            timeStamp = time.strptime(originalTime, '%Y:%m:%d %H:%M:%S')
    except:
        pass

    # sometimes the exif lib fails to retrieve data, fall back to the utility
    if (not timeStamp):
        response = os.popen(__path_to_exif + ' -x "%s"' % path, 'r')
        lines = response.read()
        response.close()
        # NOTE: the tag pattern of the published listing was lost (it read
        # '(.*?)'); as a substitute, match anything shaped like an EXIF date.
        matches = re.findall(r'\d{4}:\d{2}:\d{2} \d{2}:\d{2}:\d{2}', lines)
        if (len(matches)):
            # min() over the list, so a single match is handled correctly too
            timeStamp = min([time.strptime(x, '%Y:%m:%d %H:%M:%S') for x in matches])
    return timeStamp

def getFileDates(path):
    """Returns a dictionary of file creation (ctime), modification (mtime), exif (exif) dates"""
    dates = {}
    dates['exif'] = getExifCreationDate(path)
    dates['mtime'] = time.localtime(os.path.getmtime(path))
    dates['ctime'] = time.localtime(os.path.getctime(path))
    return dates

def setFileDates(fileName, dates):
    """Sets file modification and creation dates to the specified value"""
    try:
        import win32file, win32con
        filehandle = win32file.CreateFile(fileName, win32file.GENERIC_WRITE, 0, None, win32con.OPEN_EXISTING, 0, None)
        win32file.SetFileTime(filehandle, *(dates['exif'],)*3)
        filehandle.close()
    except:
        # no Win32 Extensions (or the call failed): set the modification date only
        os.utime(fileName, (time.mktime(dates['exif']),)*2)

def fixFileDate(fileName):
    """Reads file's EXIF header, gets the earliest date and sets it to the file"""
    dates = getFileDates(fileName)
    if (dates['exif']):
        # compare everything except the last field (the DST flag)
        cmp_time = lambda x, y: list(x)[:-1] != list(y)[:-1]
        diff = [cmp_time(dates[x], dates['exif']) for x in ('mtime', 'ctime')]
        if (sum(diff)):
            setFileDates(fileName, dates)
        return dates, diff
    else:
        return dates, None

def usage():
    print __description

def main(args):
    if (not len(args)):
        usage()
        return -1
    processedFiles = []
    for fileNameMask in args:
        print "Looking for files with mask " + fileNameMask
        for fileName in filter(lambda x: x not in processedFiles, glob.glob(fileNameMask)):
            processedFiles.append(fileName)
            try:
                dates, diff = fixFileDate(fileName)
            except Exception as e:
                print e
                diff = None
            print fileName + ' - ',
            if (not diff):
                print 'SKIP, NO EXIF'
            else:
                if (sum(diff) != 0):
                    print 'SET TO "%s" (updated M:%d, C:%d)' % (time.strftime('%Y:%m:%d %H:%M:%S', dates['exif']), diff[0], diff[1])
                else:
                    print 'OK'
    return 0

if __name__ == '__main__':
    sys.exit(main(sys.argv[1:]))

Edit: removed OptionParser from imports.

Wednesday, February 18, 2009

XML persistence with Hibernate and Xstream

Not long ago, in a project I was working on, we were writing a lot of blobs to a database. The persisted information was not of high interest, so we didn't even bother breaking the blobs into separate tables. Of course, on rare occasions we needed to look at the data but couldn't.

The problem seemed hard to solve until I stumbled across a very neat XML persistence library, Xstream. It allows you to persist and restore objects of any type to and from XML without any additional configuration. To learn more about the library, I recommend skimming through its very short but expressive tutorial.

We used Hibernate for persistence, so I needed to teach it how to use Xstream. Thankfully, Hibernate is an easily extensible library. I wrote a persister capable of transparently persisting any object as a character string in the form of XML.

My solution has a major drawback though, which I haven't fixed: the maximum length of the character column is restricted by the DBMS we use (in Oracle it is 4000 characters). So if you need to persist really long sets of data or just big objects, you might have to rewrite the persister to use a CLOB or a similar alternative in your database.
package cern.oasis.util.persistence;

import java.io.Serializable;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Types;

import org.hibernate.Hibernate;
import org.hibernate.HibernateException;
import org.hibernate.usertype.UserType;

import com.thoughtworks.xstream.XStream;

/**
 * Persists any serializable java object as XML string.
 *
 * @author Ivan Koblik
 */
public class XMLString implements UserType {

  private static final int[] TYPES = { Types.VARCHAR };

  private final static XStream stream = new XStream();
  static {
    //Here you can define aliases of some classes if you want to
    //stream.alias("Impedance", Impedance.class);
  }

  @Override
  public Object assemble(Serializable cached, Object owner)
      throws HibernateException {
    return null == cached ? null : stream.fromXML((String) cached);
  }

  @Override
  public Serializable disassemble(Object value) throws HibernateException {
    return null == value ? null : stream.toXML(value);
  }

  @Override
  public Object deepCopy(Object value) throws HibernateException {
    return null == value ? null : stream.fromXML(stream.toXML(value));
  }

  @Override
  public boolean equals(Object x, Object y) throws HibernateException {
    if (x == y) {
      return true;
    } else if (x == null || y == null) {
      return false;
    } else {
      return x.equals(y);
    }
  }

  @Override
  public int hashCode(Object x) throws HibernateException {
    return null == x ? 0 : x.hashCode();
  }

  @Override
  public boolean isMutable() {
    return true;
  }

  @Override
  public Object nullSafeGet(ResultSet rs, String[] names, Object owner)
      throws HibernateException, SQLException {
    String persistedXml = (String) Hibernate.STRING.nullSafeGet(rs, names[0]);
    return null == persistedXml ? null : stream.fromXML(persistedXml);
  }

  @Override
  public void nullSafeSet(PreparedStatement st, Object value, int index)
      throws HibernateException, SQLException {
    String xmlToPersist = null == value ? null : stream.toXML(value);
    //Guard against the column length limit mentioned above
    if (null != xmlToPersist && xmlToPersist.length() > 4000) {
      throw new RuntimeException(
          "Can not persist strings longer than 4000 characters:\n" + xmlToPersist);
    }
    Hibernate.STRING.nullSafeSet(st, xmlToPersist, index);
  }

  @Override
  public Object replace(Object original, Object target, Object owner)
      throws HibernateException {
    return this.deepCopy(original);
  }

  @Override
  public Class returnedClass() {
    return Serializable.class;
  }

  @Override
  public int[] sqlTypes() {
    return TYPES;
  }

}
That's it. The only thing left to do is to tell Hibernate to use this persister. Here's an example with annotations:
  @Type(type = "cern.oasis.util.persistence.XMLString")
Hope you'll find it useful. Comments are more than welcome!

Saturday, November 29, 2008

Tricking the RemoteGroup

Last time I promised to write about another useful trick that helped me optimize group queries in Atlassian Crowd.

If you've ever used Crowd's web interface, you may have noticed that when Crowd queries for a group it also retrieves a list of all the group's members. That can be too much overhead to pay before even displaying the group's attributes. Fortunately, there's a way to outsmart Crowd!

If you take a look at the RemoteDirectory interface, you'll find these three methods:
  1. public List<RemotePrincipal> findAllGroupMembers(final String groupName);
  2. public RemoteGroup findGroupByName(String groupName, boolean onlyFetchDirectMembers);
  3. public RemoteGroup findGroupByName(String groupName);
The first one returns the list of principals that are members of the group. The second and third methods return an object representing the group with the given name. Crowd uses the latter methods when you click on a group name in its web interface; it then displays a page with the group's description and attributes. You may not even want to see the list of principals, which is accessible through the Members tab.

The problem is that RemoteGroup instances returned by the second and third methods must already be populated with a list of members; otherwise the Members tab will be empty. So there seems to be not much of a choice: either waste time querying for principals or have a non-functional connector.

Let's take a closer look at RemoteGroup. It has an interesting method, setDirectory(Directory directory). It appears that Crowd uses this directory if it sees that the object is not fully initialized. Thus we can return an empty RemoteGroup instance with only the group name and activity flag set. Later, Crowd will do the following:
directory.getImplementation().findAllGroupMembers(...)
if it needs a list of members.

Here's my extension of the Directory class:
class ReferencingDirectory extends Directory {
  private static final long serialVersionUID = 1L;
  /**
   * Reference to the remote directory to avoid object creation.
   */
  private final RemoteDirectory remoteDirectory;

  public ReferencingDirectory(RemoteDirectory remoteDirectory) {
    Utils.checkNull("Remote directory", remoteDirectory);
    this.remoteDirectory = remoteDirectory;
  }

  @Override
  public RemoteDirectory getImplementation() {
    return remoteDirectory;
  }

  @Override
  public String getImplementationClass() {
    return remoteDirectory.getClass().getCanonicalName();
  }
}

I keep an instance of it in my implementation of RemoteDirectory and pass it to RemoteGroup whenever I need it.

That's it! Queries for groups are much faster now, and members are queried separately only when they are really needed.
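The idea can be modelled outside Crowd in a few lines. The sketch below is written in Python for brevity and does not use Crowd's API: FakeDirectory and LazyGroup are hypothetical stand-ins for a RemoteDirectory implementation and a RemoteGroup, and the counter exists only to show when the expensive lookup happens.

```python
class FakeDirectory(object):
    """Stands in for a remote directory; counts expensive member lookups."""

    def __init__(self, members_by_group):
        self._members_by_group = members_by_group
        self.member_queries = 0

    def find_all_group_members(self, group_name):
        # The costly call we want to postpone for as long as possible.
        self.member_queries += 1
        return list(self._members_by_group.get(group_name, []))

    def find_group_by_name(self, group_name):
        # Return a shell holding only the name and a back-reference.
        return LazyGroup(group_name, self)


class LazyGroup(object):
    """Stands in for a group object that is not fully initialized."""

    def __init__(self, name, directory):
        self.name = name
        self._directory = directory
        self._members = None

    @property
    def members(self):
        # Ask the directory only on first access, then reuse the result.
        if self._members is None:
            self._members = self._directory.find_all_group_members(self.name)
        return self._members


directory = FakeDirectory({'admins': ['alice', 'bob']})
group = directory.find_group_by_name('admins')
queries_before = directory.member_queries   # nothing fetched yet
first_read = group.members                  # triggers the one lookup
second_read = group.members                 # served from the cached list
```

Fetching the group costs nothing up front; the member query runs once, and only if something actually asks for the members.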

Monday, November 24, 2008

Atlassian Crowd Custom Directory

Today I'd like to share my experience implementing a custom directory connector for Atlassian Crowd. There is a rather straightforward interface defined on the Atlassian site which is not really hard to implement. What I would like to write about is a small number of tricks that helped me achieve the desired results.
By the way, if you take a look at the Atlassian documentation you will notice it says that the implementation should, among other things, extend DirectoryEntity. That is not entirely true: for me it was more than enough to implement the RemoteDirectory interface.

No roles

Although the notion of roles existed in my company's directory service, the roles were Windows-specific and completely irrelevant to Atlassian products. I needed simple stub implementations of all the role-related methods that would not break Crowd's workflow.

It was a good idea to return an empty list from findRoleMemberships(String principalName), as throwing an exception there would cause an error every time someone tried to view a principal's information on Crowd's site. Another method, searchRoles(SearchContext context), was also better off returning an empty list instead of throwing an exception.

Legacy data

We had been using JIRA, Confluence, and Bamboo for several years, and naturally their user and group information had never been synchronized with the central directory. Our intention was to integrate the Atlassian tools as smoothly as possible, so for every tool I used a stack of two directories, one of which was always a snapshot of the tool's user information taken at the integration stage. Crowd was clever enough to merge data from both of them. This worked especially well for users who had chosen the same log-in names as in the company's directory: their group memberships from both directories were merged.
Note: the order in which directories are listed actually matters. To use passwords from the central directory I had to put the custom directory first in the list.

Read-only

According to my company's safety rules I had to implement a read-only connector. So I chose to throw UnsupportedOperationException whenever a user attempted to update any information concerning a principal, group or role. Crowd handled the exception gracefully every time.

Modifiable internal directory

I also wanted to allow administrators, from JIRA for example, to modify information in Crowd's internal directory. So my policy was the following: throw ObjectNotFoundException if the user and/or group didn't exist in the central directory. The methods concerned were addPrincipalToGroup, removeGroup, removePrincipalFromGroup, updateGroup, updatePrincipal, and updatePrincipalCredential. Otherwise the methods defaulted to throwing UnsupportedOperationException. All this led to the following workflow (taking updateGroup as an example):
  • Invocation of updateGroup of the custom directory (CD).
  • Group doesn't exist in CD, throw the ObjectNotFoundException.
  • Crowd proceeds to the next directory, which is the Internal Directory (ID).
  • ID processes the request gracefully.
To achieve this with groups, I simply called this.findGroupByName(groupName); if it didn't throw an exception, the method went on to throw UnsupportedOperationException. To check that a principal existed, it was enough to call this.findPrincipalByName(name).

Here's an example:

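public void addPrincipalToGroup(String name, String groupName)
  throws ObjectNotFoundException
{
  // Throws ObjectNotFoundException if the group is unknown to this directory
  this.findGroupByName(groupName);
  // The group exists here, but this directory is read-only
  throw new UnsupportedOperationException(CANNOT_EDIT_DIRECTORY);
}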


Conversely, if the group did exist in the central directory, UnsupportedOperationException was thrown, which forced Crowd to stop processing the request. This was exactly what I wanted.
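The whole fall-through can be pictured with a toy model. The sketch below is written in Python for brevity and is not Crowd's dispatch code: every class and function name in it is hypothetical, and it only mirrors the behaviour described above (a "not found" moves on to the next directory, while an "unsupported" stops the request).

```python
class ObjectNotFound(Exception):
    """Plays the role of ObjectNotFoundException."""

class UnsupportedOperation(Exception):
    """Plays the role of UnsupportedOperationException."""

class ReadOnlyDirectory(object):
    """Models the custom, read-only connector."""
    def __init__(self, group_names):
        self.group_names = set(group_names)

    def update_group(self, name, description):
        if name not in self.group_names:
            raise ObjectNotFound(name)           # let the next directory try
        raise UnsupportedOperation('read-only')  # group is ours: refuse

class InternalDirectory(object):
    """Models the editable internal directory."""
    def __init__(self, groups):
        self.groups = dict(groups)

    def update_group(self, name, description):
        if name not in self.groups:
            raise ObjectNotFound(name)
        self.groups[name] = description

def update_group(directories, name, description):
    """Try each directory in order; return the one that handled the update."""
    for directory in directories:
        try:
            directory.update_group(name, description)
            return directory
        except ObjectNotFound:
            continue  # unknown here, move on to the next directory
    raise ObjectNotFound(name)

central = ReadOnlyDirectory(['staff'])
internal = InternalDirectory({'jira-users': 'old', 'staff': 'legacy copy'})
handled_by = update_group([central, internal], 'jira-users', 'new')
```

An update of 'jira-users' falls through to the internal directory, whereas an update of 'staff' raises UnsupportedOperation from the first directory and never reaches the second.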

Conclusion

As a result, I managed to keep the existing data editable while merging it with the central directory.
What I have described in this post doesn't go beyond the scope of an abstract implementation of RemoteDirectory, which I called ReadOnlyRemoteDirectory. Someday I will describe another trick that helped me minimize the group retrieval delay.