Migrating away from Picasa 3

I’ve used Picasa for over 10 years to manage my family’s photo library. We have about 50,000 images in there with 22,000 tags, stars, album memberships, etc. Now that Picasa will no longer be supported by Google, I had to find a replacement. And it really hasn’t been easy. I thought I’d share my strategy for anybody who might be in a similar situation.

First off, I should explain my workflow:

  • save images in directories named with this convention: YYYY/MMDD - eventname; typically, I dump images from my camera into a folder after a trip, or family gathering, etc.
  • star the best photos immediately
  • do some face tagging, but I haven’t been 100% consistent about that — there are still thousands of photos with no face tagging
  • put some photos into albums — for example, I have an album of “artistic” shots where I might add a few pictures each year

I’m really happy with this process, and I refuse to give up my directory organization. If I were to lose all metadata, at least I would have the file organization to help me find photos from that trip to Jamaica in June of 2016, for example. And speaking of metadata, I really don’t want to lose all the work I’ve done in starring photos, face tagging, and filing photos in albums.
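One nice side effect of a strict naming convention is that the directory names themselves are machine-readable. As a rough sketch (the function name and the sample path are made up for illustration, and it assumes the folder name follows "YYYY/MMDD - eventname" exactly), this is how a script could recover the date and event name from a path:

```python
import re
from datetime import date
from pathlib import PurePosixPath

# Matches the trailing "YYYY/MMDD - eventname" part of a directory path.
_DIR_PATTERN = re.compile(r"(\d{4})/(\d{2})(\d{2}) - (.+)$")


def parse_event_dir(path):
    """Return (date, event name) for a directory path, or None if it does not match."""
    match = _DIR_PATTERN.search(PurePosixPath(path).as_posix())
    if match is None:
        return None
    year, month, day, event = match.groups()
    try:
        return date(int(year), int(month), int(day)), event
    except ValueError:
        # The digits were present but do not form a real calendar date.
        return None


# Hypothetical example path, not taken from my actual library.
print(parse_event_dir("/mnt/usb-primary/Photos/2016/0612 - Jamaica"))
```

Directories that don't follow the convention simply return None, so a script can skip or report them.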

One of the problems with Picasa is that its metadata is stored in databases external to the photos. These are stored as PMP files, presumably a proprietary format used by the Picasa team. You may find advice online saying that you can get this information from EXIF/IPTC data in the files, but I believe that refers to previous versions of Picasa. I did find metadata in EXIF for some of my photos, but definitely not all. I believe the only place to get all the metadata is the PMP database files.

My first thought was to try Amazon Photos or Google Photos because it would be so easy, and I could even get the storage for free (for Amazon, as a Prime member; Google is also free, but only if you’re willing to accept reduced resolution on your images).

Amazon Photos turned out to be very feature-incomplete. Not really good for much more than a backup of our photos.

Google Photos is richer, but still isn’t a 1-for-1 replacement for Picasa. You can’t control how your photo files are organized into folders (it’s essentially one big bucket), and it doesn’t support starring or tagging images, which is unfortunate. It does have cool searchability, though — you can search for photos of “flowers”, and even if you never tagged or captioned any of your photos with the word “flower” or “flowers”, it can find a lot of flower photos. Pretty incredible. But the direct photo management isn’t good enough for my purposes.

Having given up on cloud-based solutions, I hoped to find a nice Web-based solution that I could host on our home Raspberry Pi, allowing anybody in the family to access and manage the photos, regardless of what laptop (or other platform) they happen to be using. The nicest web-based photo manager I could find was Lychee. It really is a nice piece of work. But its feature set doesn’t really make it a good Picasa replacement:

  • photos are all in one giant directory with unique hashed filenames; my photos were already in nice YYYY/MMDD directory structures, and I’m not willing to give that up
  • a photo can only be in one album
  • tagging and starring is nice, but it doesn’t have the “filter” capability of Picasa, where, for example, you could view only starred images, but have them presented within their albums; if you view starred images in Lychee, you’ll see one giant “album” of starred images, with no context as to which albums they might belong

Ultimately, I gave up on Lychee. I hope that the developer continues to expand its capabilities, and someday, it might be a viable Picasa replacement.

After looking at a variety of image management desktop applications, the one that seemed the most versatile was Digikam. I haven’t found any other tool with as much power to inspect embedded metadata, and its UI is fairly similar to Picasa’s. It also doesn’t care how you organize the files on disk; it even has nice support for accessing your images over a network share. And I don’t think I’ll have problems with metadata lock-in: it can write metadata to the image files if you want, and its databases are sqlite3 databases, which means I’ll have no problems exporting data from them someday if need be.
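To illustrate why sqlite3 storage matters for lock-in, here is a minimal sketch of inspecting such a database with nothing but Python's standard library. It deliberately assumes nothing about the schema: it only lists whatever tables exist. The helper names are mine, and the database path you pass in is up to you to locate on your own system.

```python
import sqlite3


def open_read_only(db_path):
    """Open a SQLite file read-only so that inspecting it cannot modify it."""
    return sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)


def list_tables(conn):
    """Return the names of all tables in the connected database, sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]
```

You would call it as list_tables(open_read_only("/path/to/your/database.db")), where the path is a placeholder. From there, ordinary SELECT statements are enough to dump any table to CSV or another format.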

Having settled on Digikam, I needed to get the metadata out of the PMP files and into the EXIF/IPTC data. There were four key pieces of data I wanted:

  • stars
  • album membership
  • tags
  • face tagging

I figured the best way to get this information was to move all of it into tags inside of Picasa, and then find a way to read the tags and write them to embedded metadata in the images. Then Digikam could read the embedded metadata when it scanned the directory tree for new images.

First step was to get all my Picasa metadata into tags. From within Picasa:

  1. Turn on the “show only starred images” filter. Select all images in your entire library, and add the tag “pstar” to them.
  2. For each of your albums, select all images, and add a tag based on the album title, e.g. “palbum-architecture” or “palbum-wildlife”
  3. Open each group of face-tagged images, select all, and add a tag like “ppeople-john-doe”

Admittedly, some of these steps can be quite tedious. And there actually are other ways to get things like face tagging, but I preferred to use one technique for all my metadata.
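The prefixes are what make this scheme workable later on: any tool that sees the tags can tell a star from an album from a person. A small sketch of that idea follows (the function and the "kind" labels are my own invention for illustration, and it assumes you used exactly the "pstar", "palbum-" and "ppeople-" prefixes from the steps above):

```python
def classify_tag(tag):
    """Split a migration tag into (kind, value) based on its prefix."""
    if tag == "pstar":
        return ("star", None)
    for prefix, kind in (("palbum-", "album"), ("ppeople-", "person")):
        if tag.startswith(prefix):
            # Strip the prefix to recover the album title or person name.
            return (kind, tag[len(prefix):])
    # Anything else is an ordinary tag that existed before the migration.
    return ("other", tag)


for tag in ["pstar", "palbum-wildlife", "ppeople-john-doe", "beach"]:
    print(tag, "->", classify_tag(tag))
```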

Once I got the metadata into Picasa tags, I needed to find a way to read the metadata from the Picasa database files. Luckily for me, Wayne Vosberg has built a library to read these PMP files: picasa3meta. I was able to leverage it to build a tool to traverse my entire photo directory structure and extract tags for each file. The tool can then call exiftool to write embedded metadata to the image files. My tool is here: picasa_tags_to_exif.py. My script is very rough, and is intended to serve more as an example than a finished product. I can explain a few things about it.

My environment was pretty unusual. I was running Picasa on a Windows machine, but my photos lived on a Samba share on my Raspberry Pi. So on the Windows machine, the path to the photos was P:\Photos, but on the Pi, the path is something like /mnt/usb-primary/Photos. Picasa’s databases, of course, referenced the files by their Windows paths. So I needed to substitute /mnt/usb-primary/Photos for P:\Photos in the path of each file in the database, and to convert the path delimiters from backslashes to forward slashes. This had to be handled inside of the picasa3meta code. At line 157 of thumbindex.py, I inserted two lines:
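The exact two lines aren't reproduced here. As a standalone illustration of the same idea (not the actual patch to thumbindex.py), the following sketch maps a Windows path under P:\Photos to its Linux equivalent. The function name and the sample file name are hypothetical; the two roots are the ones from my setup, so adjust them for yours.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Roots from the setup described above; change these for your environment.
WINDOWS_ROOT = PureWindowsPath(r"P:\Photos")
LINUX_ROOT = PurePosixPath("/mnt/usb-primary/Photos")


def to_linux_path(windows_path):
    """Translate a path under WINDOWS_ROOT to the matching path under LINUX_ROOT.

    Raises ValueError if the path is not located under WINDOWS_ROOT.
    """
    relative = PureWindowsPath(windows_path).relative_to(WINDOWS_ROOT)
    # Rebuilding from the individual parts swaps backslashes for forward slashes.
    return str(LINUX_ROOT.joinpath(*relative.parts))


print(to_linux_path(r"P:\Photos\2016\0612 - Jamaica\IMG_0001.jpg"))
```

Using pathlib rather than plain string replacement means the drive letter comparison is case-insensitive, as it is on Windows, and paths outside the photo root fail loudly instead of being silently mangled.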

I am not submitting a pull request to make this change, because it doesn’t apply to all environments. It’s fairly unique to my environment where the files are in a Linux filesystem, but were indexed by Picasa on an SMB share.

Once the code change was made to thumbindex.py, I copied the Picasa database files from my Windows machine over to a directory, /tmp/picasa3db, on my Linux box, and I was ready to invoke the script (make sure you have exiftool installed and on your path):
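The exact command line isn't shown here. To give a feel for the exiftool side of the job, here is a hedged sketch of how a Python script could write a list of tags into one image. It is not lifted from picasa_tags_to_exif.py: the function names are made up, and the choice of writing both IPTC Keywords and XMP Subject is an assumption about what is useful, not a statement of what my script does.

```python
import shutil
import subprocess


def build_exiftool_command(image_path, tags):
    """Build an exiftool command line that adds each tag as a keyword."""
    command = ["exiftool", "-overwrite_original"]
    for tag in tags:
        # "+=" appends to a list-type field instead of replacing it.
        # Running this twice on the same file will add the tag twice.
        command.append(f"-IPTC:Keywords+={tag}")
        command.append(f"-XMP-dc:Subject+={tag}")
    command.append(image_path)
    return command


def write_tags(image_path, tags):
    """Run exiftool on one image; does nothing if there are no tags."""
    if not tags:
        return
    if shutil.which("exiftool") is None:
        raise RuntimeError("exiftool was not found on PATH")
    subprocess.run(build_exiftool_command(image_path, tags), check=True)
```

Note that -overwrite_original tells exiftool not to keep a backup copy of each file, so try it on a copy of a few photos before turning it loose on the whole library.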

This took many hours to run on 50,000 images. Be ready to let it run unattended. Use screen to continue running it in the background in case you get disconnected from your Linux machine.

At the end of the process, all the tagging is available inside of the images themselves, making it available to Digikam. I am running Digikam on an Ubuntu workstation, so I mounted the SMB share on my Raspberry Pi as a cifs volume, and then I pointed Digikam at that path, letting it scan for images, which took several hours. Once it was done, though, I had all my photos available, and I could restore the metadata that I had in Picasa.

Stars

To restore stars, use the left toolbar to select the “Tags” view. Select the tag “pstar”. You’ll see all your starred photos across all the albums you filed them in. Select all, and then star one of the images (all will get starred). It’s up to you how you want to translate a Picasa star into Digikam stars. You could give each Picasa-starred image a single star, which would give you flexibility to assign up to 4 additional stars to really good photos. Or maybe you’d prefer to give them all 3 stars, with levels above and below.

Albums

With my filing strategy, the directories I filed my photos in become Digikam “albums”. These don’t have quite the same meaning as an album in Picasa, where the photos may come from different directories on the filesystem, but belong together for some reason (subject, photo style, quality, etc.). If you opt for such a filing strategy, the Digikam albums won’t be very meaningful, but you can still create the old concept of Picasa albums via tags. It’s simple to view all the photos with a common tag by choosing the Tags view from the left toolbar and selecting the desired tag.

People

With this technique, the facial recognition from Picasa won’t be directly translated into the Digikam “People” concept. You’ll have tags for each person, so again, you would use the Tags view to filter your library for all the photos of your Great Aunt Matilda.

There might be a better way to get that face recognition data into Digikam. I have used exportpicasa to export all the recognition data into a nice XML file, complete with rectangles indicating where Picasa found each face. I just haven’t tried to get it into Digikam.

I hope this helps somebody who was feeling as frustrated as I was when Picasa was end-of-lifed. I think we have to be smart about our media management, and be realistic — no software tool will exist forever. But the value of a media library is in the curation, which is extremely time consuming. We absolutely need the metadata to be stored in a portable way. Storing the data inside the images themselves is a good way to make sure that any good media software can access it. Not everybody likes to do that for a variety of reasons (safety of rewriting the image files and performance issues come to mind), but even if you opt to not write the data to the images, Digikam’s data is stored in databases that use well-understood formats (sqlite3) and have public documentation of their schemas. I’m confident that this is a platform that will safely carry your media library into the next generation of media management.
