Creating a photobook from a loooot of images (the linux and android way) pt. 2

With our first kid, my wife and I created a new photobook about every 3 months. Once the second kid arrived, we could not keep that up (everyone who has kids will understand …). He is almost 14 months old now, we have not created a single book yet, and I am tasked with creating a photobook from the horrible amount of 25 thousand files. And of course, not only did we not create a photobook for over a year, we also did not create the “good” folders …

As a parent of two, I really cannot sit at the computer every day for weeks to sort this many photos. I can and only want to do that when the kids are not around, but I also don’t want to do it every evening when the kids are asleep, because then I wouldn’t have any time with my wife. So I figured I could probably do it bit by bit on my tablet during the train ride to and from work. But which tablet holds 127 gigabytes (that’s the “du -sh” of the 25k pictures …)? So first thing … downscale all the pictures. And here is the first subtle problem. With so many pictures, the camera's picture counter (10k) wrapped around, which means simply copying all pictures to one destination folder is unsafe because files with the same name would overwrite each other.
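
To see whether the counter wrap actually affects a given set of folders, one can list the file names that occur more than once. This is only a minimal sketch: the function name colliding_names is made up for illustration, and it assumes GNU find (for -printf) and the folder name photobookfiles used further down.

```shell
# List file names that occur in more than one folder below the given root.
# colliding_names is a hypothetical helper name; -printf requires GNU find.
colliding_names() {
    local root=$1
    # '%f\n' prints only the base name of each match, one per line
    find "$root" -type f \( -iname '*.jpg' -o -iname '*.jpeg' \) -printf '%f\n' \
        | sort | uniq -d
}

# Only run the check if the folder from this post exists here
if [ -d photobookfiles ]; then
    colliding_names photobookfiles
fi
```

If this prints nothing, no two pictures share a name and a flat copy would be safe. Any name it prints exists at least twice.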

So I moved all the date folders I needed to include in the photobook to one folder and then ran

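# Recreate the date folder structure below copytotablet and downscale each picture
find photobookfiles \( -iname "*jpg" -o -iname "*jpeg" \) -print0 | while IFS= read -r -d '' p; do
 d=$(dirname "$p")
 mkdir -p "copytotablet/$d"
 # Keeping the folder structure avoids overwriting files with the same name
 convert -geometry 1280 "$p" "copytotablet/$p"
done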

This took a couple of hours and left me with around 10 gigabytes of downscaled photos in the new folder “copytotablet”. I did what the folder name suggested and started looking for the definitive Android app to go through all these pictures and sort them efficiently.
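
After a conversion run of several hours, it may be worth checking that every original really has a downscaled counterpart before copying anything to the tablet. The following is a sketch under assumptions: missing_copies is a hypothetical helper name, it expects the same relative layout as above (copytotablet/photobookfiles/...), and it assumes file names without newlines.

```shell
# Print every original picture that has no counterpart below the target folder.
# missing_copies is a hypothetical helper; run it from the folder that contains
# both photobookfiles and copytotablet so the relative paths line up.
missing_copies() {
    local src=$1 dst=$2 p
    find "$src" -type f \( -iname '*.jpg' -o -iname '*.jpeg' \) |
        while IFS= read -r p; do
            # The downscaled copy is expected at <dst>/<original relative path>
            [ -e "$dst/$p" ] || printf '%s\n' "$p"
        done
}

if [ -d photobookfiles ] && [ -d copytotablet ]; then
    missing_copies photobookfiles copytotablet
fi
```

No output means every picture was converted. Any path that is printed can simply be converted again.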

I ended up using QuickPic. Not only does it display the pictures by folder (which gives you an idea of how far you’ve come), it also offers moving or copying images and, just like eog, remembers the last used destination folder for copy/move operations. Unlike a lot of other apps, it also does not slow down with so many pictures on the device. Aaaand, it automatically renames files if the target file already exists. That’s important because of the counter wrap issue explained above.

So for the next 6 weeks or so, I spent almost every train ride scrolling through pictures and hitting the “move to” button in QuickPic whenever I liked a picture. The 20 minutes of each ride are more than enough for one day because this task quickly becomes exhausting.

Now all the pictures I wanted in the photobook (still over 1000 files btw) were in one folder on the tablet, but the original files were still back in their own date folders, with the added problem of duplicate file names. I scripted for about an hour without a satisfying result. Then I asked Google and it pointed me to the wonderful tool findimagedupes. It can find duplicate images even if they have different resolutions. How it does that is explained in the man page.

I played around with it a bit and found that

findimagedupes -t 98 -R copiedfromtablet photobookfiles > dupes.txt

was a clean and effortless way (which took about 3 hours) to find the original full-resolution files that should go into the photobook. By default, a 90% match is considered a dupe, but this tends to be a problem if you, like me, take a lot of pictures within a short amount of time, because very similar shots then get matched as well. I found a 98% match to be a better value for this job. It produces output like

/home/dominik/Pictures/photobookfiles/20131009/IMG_8063.JPG /home/dominik/Pictures/copiedfromtablet/IMG_8063.JPG

To get the definitive list of full-resolution files for the photobook software, only one column is needed, which can easily be extracted with

grep -oE "\/\S+photobookfiles\S+" dupes.txt
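
With a threshold as strict as 98%, some picked images may end up without a match. A quick cross-check of dupes.txt against the tablet folder can reveal those. This is a sketch, not part of the workflow above: unmatched_picks is a hypothetical helper name, and it assumes that findimagedupes prints absolute paths as in the sample output and that no path contains whitespace.

```shell
# Print every file in the picks folder that is not mentioned in the dupes list.
# unmatched_picks is a hypothetical helper; it assumes absolute, whitespace-free
# paths in the dupes file, as in the sample output of this post.
unmatched_picks() {
    local dupes=$1 picks_dir=$2 matched
    matched=$(mktemp)
    # Collect every path that appears anywhere in the dupes file
    grep -oE '/[^[:space:]]+' "$dupes" | sort -u > "$matched"
    # comm -13 keeps only lines that occur in the second input (the picks)
    find "$(realpath "$picks_dir")" -type f | sort -u | comm -13 "$matched" -
    rm -f "$matched"
}

if [ -f dupes.txt ] && [ -d copiedfromtablet ]; then
    unmatched_picks dupes.txt copiedfromtablet
fi
```

Any file listed here has no original in dupes.txt and would silently be missing from the book, so it needs to be looked up by hand or matched again with a lower threshold.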

The final step is to copy all those files into one folder so they can be imported into the photobook software easily. Once again, it is important not to overwrite files. This simple script gets it done:

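grep -oE "\/\S+photobookfiles\S+" dupes.txt | while read -r f; do
 p=$(basename "$f")
 # File extension: everything after the last dot
 e=${f##*.}
 # The random number keeps equal file names from different folders apart
 cp -v "$f" "filesforthebook/${p}_${RANDOM}.${e}"
done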
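
A random suffix makes a clash unlikely but not impossible ($RANDOM only yields values from 0 to 32767). As a possible alternative, the name of the date folder can serve as a prefix. This is just a sketch with a made-up helper name, unique_name, and it assumes that every original sits directly in its date folder and that names are unique within one folder.

```shell
# Build a target name from the parent folder and the file name, e.g.
# .../photobookfiles/20131009/IMG_8063.JPG -> 20131009_IMG_8063.JPG
# unique_name is a hypothetical helper, not part of any tool used above.
unique_name() {
    printf '%s_%s\n' "$(basename "$(dirname "$1")")" "$(basename "$1")"
}

unique_name /home/dominik/Pictures/photobookfiles/20131009/IMG_8063.JPG
```

Inside the copy loop this would be used as cp -v "$f" "filesforthebook/$(unique_name "$f")". As a side effect, the files in the target folder then sort by date.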

And that’s it. Hope this saves some parents some time.

