Sunday, June 28, 2009

EXIF with Python

Recently I wrote a small Python script that restores a photo file's creation and modification dates to their original values, taken from EXIF.

First of all, you need Python. Make sure to get version 2.6.x; the script won't work on 3.x. You will also have to install some libraries.

Getting the libraries

Win32 Extensions
Linux users can skip this step. The "problem" with Windows is that files on FAT32 or NTFS have both a creation date and a modification date, while Python out of the box can set only the latter. To fix that, get the Win32 Extensions library and install it. If you skip this step, the script will still work, but it will fix only the modification date; the creation date will stay untouched.
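
To show what plain Python can do without the extensions, here is a minimal sketch that sets a file's access and modification times with os.utime, the same standard-library call the script falls back on. The temporary file and the sample date are assumptions made up for the example; on Windows this call leaves the creation date alone.

```python
import os
import tempfile
import time
from datetime import datetime

# Hypothetical "original" date, standing in for a value read from EXIF.
original = datetime(2009, 6, 28, 14, 30, 0)

# Create a throwaway file so the example does not touch real photos.
fd, path = tempfile.mkstemp(suffix='.jpg')
os.close(fd)

# Convert the local wall-clock time to seconds since the epoch.
stamp = time.mktime(original.timetuple())

# os.utime takes a tuple of (access time, modification time).
os.utime(path, (stamp, stamp))

# Read the modification time back as a local datetime.
restored = datetime.fromtimestamp(os.path.getmtime(path))
print(restored)

os.remove(path)
```

Reading the value back with os.path.getmtime is a quick way to confirm that the change took effect.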

pyexif
To parse EXIF headers you will need the pyexif library. Just unpack it into the folder where Python is installed.

exif
exif is a Linux command-line utility. In some respects it is better than pyexif: I had some pictures where pyexif failed to retrieve the data. Linux users should check that exif is installed.
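
Whichever tool reads the header, the dates come back as text in the EXIF format YYYY:MM:DD HH:MM:SS. As a rough sketch, the snippet below pulls every such timestamp out of a block of text and keeps the earliest one. The sample text and the helper name earliest_exif_date are made up for illustration; they are not real output or API of either tool.

```python
import re
from datetime import datetime

EXIF_FORMAT = '%Y:%m:%d %H:%M:%S'
EXIF_PATTERN = re.compile(r'\d{4}:\d{2}:\d{2} \d{2}:\d{2}:\d{2}')

def earliest_exif_date(text):
    """Return the earliest EXIF-style timestamp found in text, or None."""
    stamps = [datetime.strptime(m, EXIF_FORMAT)
              for m in EXIF_PATTERN.findall(text)]
    return min(stamps) if stamps else None

# Made-up sample text, not actual tool output.
sample = """
modified: 2009:06:28 18:05:12
taken:    2009:06:27 09:41:03
"""

print(earliest_exif_date(sample))
```

Matching on the timestamp format rather than on field names keeps the sketch independent of how a particular tool labels its output.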

The script
#!/usr/bin/env python

"""A simple utility to restore file creation and modification 
dates back to their original values from EXIF.

This script requires the exif module to be installed or the exif
command line utility to be in the path.

To function correctly under Windows this script needs the win32file and
win32con modules. Otherwise it will not be able to restore the creation
date."""

import os, sys, time, re, glob
from datetime import datetime, timedelta

try:
  import win32file, win32con
  __use_win_32 = True
except ImportError:
  __use_win_32 = False

__path_to_exif = 'exif'

TEN_MINUTES = timedelta(minutes=10)

__description = """Restores file's creation and modification dates back to the original 
value from EXIF.
usage: exif_date.py [File name mask]"""    

def getExifCreationDate(path):
  """Gets the earliest date from the file's EXIF header, returns time tuple"""
  timeStamp = None
  try:
    import exif
    pf = exif.parse(path)
    originalTime = pf.get('DateTimeOriginal')
    if (originalTime):
      timeStamp = datetime.strptime(originalTime, '%Y:%m:%d %H:%M:%S')
  except Exception:
    # fall through to the command line utility below
    pass

  # sometimes the exif lib fails to retrieve data
  if (not timeStamp):
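    response = os.popen(__path_to_exif + ' -x "%s"' % path, 'r')
    lines = response.read()
    response.close()
    # collect every EXIF-style timestamp (YYYY:MM:DD HH:MM:SS) in the output
    matches = re.findall(r'\d{4}:\d{2}:\d{2} \d{2}:\d{2}:\d{2}', lines)
    if (len(matches)):
      # pass the list itself to min() so that a single match works too
      timeStamp = min([datetime.strptime(x, '%Y:%m:%d %H:%M:%S') for x in matches])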
  return timeStamp

def getFileDates(path):
  """Returns a dictionary of file creation (ctime), modification (mtime), exif (exif) dates"""
  dates = {}
  dates['exif'] = getExifCreationDate(path)
  dates['mtime'] = datetime.utcfromtimestamp(os.path.getmtime(path))
  dates['ctime'] = datetime.utcfromtimestamp(os.path.getctime(path))
  return dates

def setFileDates(fileName, dates):
  """Sets file modification and creation dates to the specified value"""
  if __use_win_32:
    filehandle = win32file.CreateFile(fileName, win32file.GENERIC_WRITE, 0, None, win32con.OPEN_EXISTING, 0, None)
    win32file.SetFileTime(filehandle, *(dates['exif'],)*3)
    filehandle.close()
  else:
    os.utime(fileName, (time.mktime(dates['exif'].utctimetuple()),)*2)

def fixFileDate(fileName):
  """Reads file's EXIF header, gets the earliest date and sets it to the file"""
  dates = getFileDates(fileName)
  if (dates['exif']):
    cmp_time = lambda x, y: x - y > TEN_MINUTES
    diff = [cmp_time(dates[x], dates['exif']) for x in ('mtime', 'ctime')]
    if(sum(diff)):
      setFileDates(fileName, dates)
    return dates, diff
  else:
    return dates, None

def usage():
  print __description

def main(args):
  if (not len(args)):
    usage()
    return -1
  processedFiles = []
  for fileNameMask in args:
    if "*" in fileNameMask or "?" in fileNameMask:
      print "Looking for files with mask " + fileNameMask
    for fileName in filter(lambda x: x not in processedFiles, glob.glob(fileNameMask)):
      processedFiles.append(fileName)
      try:
        dates, diff = fixFileDate(fileName)
      except Exception as e:
        print e
        diff = None
      print fileName + ' - ',
      if (not diff):
        print 'SKIP, NO EXIF'
      else:
        if (sum(diff) != 0):
          print 'SET TO "%s" (updated M:%d, C:%d)' % (dates['exif'].strftime('%Y:%m:%d %H:%M:%S'), diff[0], diff[1])
        else:
          print 'OK'
  return 0

if __name__ == '__main__':
  sys.exit(main(sys.argv[1:]))

Edit: removed OptionParser from imports.
Edit 2: improved handling of daylight savings time.
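
For readers curious about where a one-hour shift can come from, here is a small sketch (not the script's own code) of how a local wall-clock time is converted to a timestamp. A time tuple carries a tm_isdst flag: -1 asks the C library to work out whether daylight saving applies on that date, while 0 forces standard time, which on most platforms moves a summer timestamp by an hour in zones that observe DST. The sample date is an assumption.

```python
import time
from datetime import datetime

# Hypothetical EXIF date, expressed in local wall-clock time.
taken = datetime(2009, 6, 28, 14, 30, 0)

# timetuple() on a naive datetime sets tm_isdst to -1 (let the library decide).
auto_dst = time.mktime(taken.timetuple())

# utctimetuple() on a naive datetime sets tm_isdst to 0 (force standard time).
forced_standard = time.mktime(taken.utctimetuple())

# Converting back to local time recovers the original wall-clock value.
round_trip = datetime.fromtimestamp(auto_dst)
print(round_trip)

# Difference in seconds; non-zero only where DST is in effect on that date.
print(auto_dst - forced_standard)
```

The second printed value depends on the time zone of the machine running the snippet, so it is a handy way to check how a given system treats the date in question.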

3 comments:

Olah Ambrus Sandor aolah76@freemail.hu said...

When running on Windows, there is always a 1-hour difference between the EXIF hour and the file date hour: the EXIF hour is always 1 greater than the file date hour. When I rerun the script, it reports that the file date was set to the EXIF date, but something is wrong.

Unknown said...

Good catch, I think it may be due to daylight savings time. I'll take a look.

Unknown said...

I have posted a new version of the script which fixes the issue. Please note that if you look at file properties, Windows offsets by one hour the time stamps of files dated in a different daylight saving period.

Anyway, the new script stays consistent and, once executed, doesn't re-date the files anymore.