Which Painting Do You Look Like? Comparing Faces Using Python and OpenCV


Many years ago, as I was wandering around the Louvre, I came across a painting which bore an uncanny resemblance to me!

Spooky, eh?

Yeah, yeah, it's not the greatest likeness ever, but people who know me seem to think I look like the chap on the left.

This got me thinking... Wouldn't it be great if when you entered an art gallery, a computer could tell you which painting you look most like?

Well, I think it would be great. This is my blog, so what I say goes!

Getting The Data

I'm using the Tate's Open Data Set to grab scans of all their artwork. ~60,000 images in total.

Finding Faces

Not all paintings are of people. Some artsy types like to paint landscapes, dogs, starry nights, etc.

Using Python and OpenCV, we can detect faces in paintings. Then crop out the face and save it. The complete code is on GitHub - but here are the essentials.

import sys, os
import cv2
import urllib
from urlparse import urlparse

def detect(path):
    img = cv2.imread(path)
    cascade = cv2.CascadeClassifier("haarcascade_frontalface_alt.xml")
    rects = cascade.detectMultiScale(img, 1.3, 4, cv2.cv.CV_HAAR_SCALE_IMAGE, (20,20))

    if len(rects) == 0:
        return [], img
    rects[:, 2:] += rects[:, :2]
    return rects, img

def box(rects, img, file_name):
    #   Strip the trailing newline and the extension once, before the loop
    base_name = file_name.strip().replace('.jpg', '')
    for i, (x1, y1, x2, y2) in enumerate(rects):   #   i tracks how many faces found
        print "Found face " + str(i)  #   Tell us what's going on
        cut = img[y1:y2, x1:x2] #   Defines the rectangle containing a face
        face_name = base_name + '_' + str(i) + '.jpg'   #   Prepare the filename
        print 'Writing ' + face_name
        cv2.imwrite('detected/' + face_name, cut)   #   Write the file

def main():
    #   all.txt contains a list of thumbnail URLs
    for line in open('all.txt'):
        file_name = urlparse(line).path.split('/')[-1]
        print "URL is " + line

        if (urllib.urlopen(line).getcode() == 200):
            #   Download to a temp file
            urllib.urlretrieve(line, "temp.jpg")
            #   Detect the face(s)
            rects, img = detect("temp.jpg")
            #   Cut and keep
            box(rects, img, file_name)
        else:
            print '404 - ' + line

if __name__ == "__main__":
    main()

We now have a directory of files. Each file is a separate face. We assume that no two faces are of the same person - this is important for the next stage...
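
The coordinate juggling in detect() and box() above is easy to get wrong, so here is a minimal sketch of just that step using NumPy alone. OpenCV reports each face as (x, y, width, height); adding the top-left corner to the size gives two corners, which can then be used to slice the image. The image and rectangle are made up for illustration, the helper names to_corners and crop_faces are invented for this example and are not part of OpenCV, and the snippet is written for Python 3.

```python
import numpy as np

def to_corners(rects):
    """Convert rows of (x, y, w, h) into rows of (x1, y1, x2, y2)."""
    corners = np.array(rects, dtype=np.int32)  # copy, so the input is left untouched
    if corners.size == 0:
        return corners.reshape(0, 4)
    corners[:, 2:] += corners[:, :2]  # x2 = x + w, y2 = y + h
    return corners

def crop_faces(img, rects):
    """Return one cropped sub-image per rectangle."""
    # NumPy images are indexed [row, column], i.e. [y, x].
    return [img[y1:y2, x1:x2] for x1, y1, x2, y2 in to_corners(rects)]

# A fake 100x200 greyscale "painting" and one hypothetical detection.
painting = np.zeros((100, 200), dtype=np.uint8)
faces = crop_faces(painting, [(30, 10, 40, 50)])
print(faces[0].shape)  # (height, width) of the crop
```

Note that the crop is indexed rows first, so a face that is 40 pixels wide and 50 tall comes back with the height listed before the width.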

Building Eigenfaces

Imagine that a picture of your face could be represented by a series of properties. For example:

How far apart your eyes are.
Distance from nose to mouth.
Ratio of ear length to nose width.
etc.

That is, in grossly simplified terms, what an Eigenface is.

If I have a database of Eigenfaces, I can take an image of your face and compare it with all the others and find the closest match.
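
In practice the "properties" are not hand-picked measurements like eye spacing. Eigenfaces come from principal component analysis: the computer finds the directions in which the training faces differ most, and each face is then described by how much of each direction it contains. Here is a rough NumPy-only sketch of that idea, assuming tiny made-up "images" of four pixels each. It is not OpenCV's implementation, the function names are invented for illustration, and it is written for Python 3.

```python
import numpy as np

def build_eigenfaces(images, num_components):
    """Return the mean face and the top principal components."""
    data = np.asarray(images, dtype=np.float64)   # one flattened image per row
    mean = data.mean(axis=0)
    # The rows of vt are the principal directions of the centred data.
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:num_components]

def project(image, mean, eigenfaces):
    """Describe an image by its weights along each eigenface."""
    return eigenfaces @ (np.asarray(image, dtype=np.float64) - mean)

def closest(image, images, mean, eigenfaces):
    """Index of the training image whose weights are nearest to this image's."""
    target = project(image, mean, eigenfaces)
    distances = [np.linalg.norm(project(im, mean, eigenfaces) - target)
                 for im in images]
    return int(np.argmin(distances))

# Three made-up 2x2 "faces", flattened to four pixels each.
gallery = [
    [10, 10, 200, 200],
    [200, 200, 10, 10],
    [100, 120, 100, 120],
]
mean, eigenfaces = build_eigenfaces(gallery, num_components=2)
visitor = [20, 15, 190, 210]   # a noisy version of the first face
print(closest(visitor, gallery, mean, eigenfaces))
```

The visitor should land nearest the first face in the gallery, because its weights along the two eigenfaces are almost the same as that face's.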

We'll split this process into two parts.

Generate the Eigenfaces

We need to arrange the images so that each unique face is in its own directory. If you know that you have more than one picture of each person, you can put those images in the same directory.

For example:

   path
    |-Alice
    | |-0.jpg
    | |-1.jpg
    |
    |-Bob
    | |-0.jpg
    |
    |-Carly
    ...
This code is adapted from Philipp Wagner's work.

It takes a directory of images, analyses them, and creates an XML file containing the Eigenfaces.

WARNING: This code will take a long time to run if you're using thousands of images. On a dataset of 400 images, the resulting file took up 700MB of disk space.

import os
import sys
import cv2
import numpy as np

def normalize(X, low, high, dtype=None):
    """Normalizes a given array in X to a value between low and high."""
    X = np.asarray(X)
    minX, maxX = np.min(X), np.max(X)
    # normalize to [0...1].
    X = X - float(minX)
    X = X / float((maxX - minX))
    # scale to [low...high].
    X = X * (high-low)
    X = X + low
    if dtype is None:
        return np.asarray(X)
    return np.asarray(X, dtype=dtype)

def read_images(path, sz=None):
    X,y = [], []
    count = 0
    for dirname, dirnames, filenames in os.walk(path):
        for subdirname in dirnames:
            subject_path = os.path.join(dirname, subdirname)
            for filename in os.listdir(subject_path):
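                try:
                    im = cv2.imread(os.path.join(subject_path, filename), cv2.IMREAD_GRAYSCALE)
                    # cv2.imread returns None (it does not raise) if the file
                    # is not a readable image, which would make resize fail
                    if im is None:
                        print "Skipping unreadable file:", os.path.join(subject_path, filename)
                        continue
                    # resize to given size (if given)
                    if (sz is not None):
                        im = cv2.resize(im, sz)
                    X.append(np.asarray(im, dtype=np.uint8))
                    y.append(count)
                except IOError, (errno, strerror):
                    print "I/O error({0}): {1}".format(errno, strerror)
                except:
                    print "Unexpected error:", sys.exc_info()[0]
                    raise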
            count = count+1
    return [X,y]

if __name__ == "__main__":
    if len(sys.argv) < 2:
        print "USAGE: eigensave.py /path/to/images"
        sys.exit()
    # Now read in the image data. This must be a valid path!
    [X,y] = read_images(sys.argv[1], (256,256))
    # Convert labels to 32bit integers. This is a workaround for 64bit machines.
    y = np.asarray(y, dtype=np.int32)

    # Create the Eigenfaces model.
    model = cv2.createEigenFaceRecognizer()
    # Learn the model. Remember our function returns Python lists,
    # so we use np.asarray to turn them into NumPy arrays to make
    # the OpenCV wrapper happy:
    model.train(np.asarray(X), np.asarray(y))

    # Save the model for later use
    model.save("eigenModel.xml")

After that has run - assuming your computer hasn't melted - you should have a file called "eigenModel.xml".
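
A back-of-the-envelope calculation gives a feel for why that file gets so large. This sketch assumes the model keeps one Eigenface per training image, which is believed to be what happens when no number of components is requested, and that each value costs roughly 25 bytes once written out as text in the XML. Both figures are assumptions rather than measurements, and small extras such as the labels are ignored.

```python
def estimate_model_values(num_images, width, height):
    """Count the numbers stored, assuming one eigenface per training image."""
    pixels = width * height
    eigenfaces = num_images * pixels        # one image-sized vector per component
    projections = num_images * num_images   # each training face as a set of weights
    mean_face = pixels                      # the average of all training faces
    return eigenfaces + projections + mean_face

BYTES_PER_VALUE_AS_TEXT = 25   # assumed average for a double written as text

values = estimate_model_values(400, 256, 256)
megabytes = values * BYTES_PER_VALUE_AS_TEXT // 10**6
print(values, "values, roughly", megabytes, "MB")
```

For 400 images at 256x256 that comes out at about 26 million numbers, which under these assumptions lands in the same ballpark as the 700MB mentioned in the warning above. The size grows with both the number of images and the number of pixels, so smaller images are the easiest saving.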

Compare A Face

So, we have a file containing the Eigenfaces. Now we want to take a photograph and compare it to all the other faces in our model.

This is called by running:

python recognise.py /path/to/images photo.jpg 100000.0

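The "100000.0" is a floating-point threshold which determines how close you want the match to be: it is the largest distance allowed between your photo and the nearest face in the model. A value of "100.0" would need a near-identical face. The larger the number, the less precise the match. If nothing is close enough, the predicted label comes back as -1.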

import os
import sys
import cv2
import numpy as np

if __name__ == "__main__":
    if len(sys.argv) < 4:
        print "USAGE: recognise.py /path/to/images sampleImage.jpg threshold"
        print "threshold is a float. Choose 100.0 for an extremely close match.  Choose 100000.0 for a fuzzier match."
        print str(len(sys.argv))
        sys.exit()

    # Create an Eigenface recogniser
    t = float(sys.argv[3])
    model = cv2.createEigenFaceRecognizer(threshold=t)

    # Load the model
    model.load("eigenModel.xml")

    # Read the image we're looking for
    sampleImage = cv2.imread(sys.argv[2], cv2.IMREAD_GRAYSCALE)
    sampleImage = cv2.resize(sampleImage, (256,256))

    # Look through the model and find the face it matches
    [p_label, p_confidence] = model.predict(sampleImage)

    # Print the confidence (really a distance: lower means a closer match)
    print "Predicted label = %d (confidence=%.2f)" % (p_label, p_confidence)

    # If the model found something, print the file path
    if (p_label > -1):
        count = 0
        for dirname, dirnames, filenames in os.walk(sys.argv[1]):
            for subdirname in dirnames:
                subject_path = os.path.join(dirname, subdirname)
                if (count == p_label):
                    for filename in os.listdir(subject_path):
                        print os.path.join(subject_path, filename)

                count = count+1

That will spit out the path to the face that most resembles the photograph.
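
How the threshold interacts with the predicted label can be sketched without OpenCV. Assuming the recogniser simply finds the nearest stored face and rejects it when the distance exceeds the threshold, the logic is roughly as follows. The weights below are invented, and this predict function is a stand-in written for illustration in Python 3, not OpenCV's method. What OpenCV reports as the distance for a rejected match is not something this sketch tries to mimic.

```python
import numpy as np

def predict(sample, stored, labels, threshold):
    """Return (label, distance) of the nearest stored face, or label -1 if too far."""
    stored = np.asarray(stored, dtype=np.float64)
    distances = np.linalg.norm(stored - sample, axis=1)  # one distance per stored face
    best = int(np.argmin(distances))
    distance = float(distances[best])
    if distance > threshold:
        return -1, distance   # nothing close enough
    return labels[best], distance

stored = [[0.0, 0.0], [30.0, 40.0]]   # made-up weights for two known faces
labels = [0, 1]
sample = np.array([3.0, 4.0])         # much nearer to the first face

print(predict(sample, stored, labels, threshold=100.0))
print(predict(sample, stored, labels, threshold=1.0))
```

With the generous threshold the sample is matched to label 0; with the strict one the same face is rejected and the label is -1. Only the single nearest face is ever considered, so a smaller second value means a closer match.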

Who Am I?

Well, it turns out that my nearest artwork in the Tate's collection is...

Sir John Drake!

So, there you have it. My laptop isn't powerful enough to crunch through the ~3,000 faces found in the Tate's collection. I'd love to see how this works given a powerful enough machine with lots of free disk space. If you fancy running the code - you'll find it all on my GitHub page.

22 thoughts on “Which Painting Do You Look Like? Comparing Faces Using Python and OpenCV”

Alex

Cool. You (or someone) should compile all this into a little web app.

Reply 2014-06-30 16:39
William

Hi Terence, I found this project very useful for my FGP. I'm having a problem when running the script eigensave.py. I'm using Python 2.7 on Windows 8 64-bit. I have all the libraries installed and my own database in .jpg format. When I run this script in the command window, an error appears:

"OpenCV Error: Assertion failed (ssize.area ()> 0) in unknown function, file .. .. .. src opencv modules imgproc src imgwarp.cpp, line 1723 Unexpected error: Traceback (most recent call last): File "C: Users William Desktop THESIS Python facerec2 eigensave.py", line 118, in [X, y] = read_images (sys.argv [1], (256,256)) File "C: Users William Desktop THESIS Python facerec2 eigensave.py", line 87, in read_images cv2.resize im = (im, sz) cv2.error: .. .. .. src opencv modules imgproc src imgwarp.cpp: 1723: error: (-215) ssize.area ()> 0 "

What can be the problem?

Thank you very much.

Reply 2014-08-04 17:05
1. Terence Eden
  
  Are all your images already square? I don't use Windows, so I'm not much help there - sorry!
  
  Reply 2014-08-05 15:12
William Cikel

Yes, they are square. Specifically, I have a problem with this function: [X, y] = read_images (sys.argv [1], (256,256))

Reply 2014-08-07 20:01
William Cikel

I managed to fix the read_images problem, now the problem is as follows:

File "C:UsersWilliamDesktopTESISPythonfacerec2eigensave.py", line 105, in y = np.asarray(y, dtype=np.int32) File "C:Python27Libsite-packagesnumpycorenumeric.py", line 462, in asarray return array(a, dtype, copy=False, order=order) ValueError: invalid literal for long() with base 10: 'C:UsersWilliamDesktopTESISPythonfacerec2datawils1.jpg'

I read that it is something about converting the label to an integer.

Thanks

Reply 2014-08-07 21:32
vtanvuong

i have a error here i run recognise.py OpenCV error: unspecified error (file can't be opened for writing!) traceback (most recent call last): file "recognise.py", line 35, in model.load("eigenModel.xml") cv2.error: ...facerec.cpp:398 help me, thanks.

Reply 2015-01-19 23:41
1. Terence Eden
  
  The error message is quite clear - your computer cannot write to the file. Ensure that your permissions are set correctly.
  
  Reply 2015-01-20 13:28
  1. vtanvuong
    
    thank you so much for your support
    
    Reply 2015-01-21 07:17
Chuck Fulminata

Is there any way to make this multithreaded?

Reply 2015-04-22 10:22
cleocredo

Good day!

First of all, congratulations on this! This is so interesting and I did some modifications on this one. I do have some questions, what is the confidence formula? what is it based? Thanks in advance!

Reply 2015-05-11 06:16
1. Terence Eden
  
  I'm not sure - take a look at the OpenCV Eigenface documentation.
  
  Reply 2015-05-11 09:40
jacop

I tried it on Mac OS 10.10 and I get this error. Thanks. Traceback (most recent call last): File "newfaces.py", line 47, in [X,y] = read_images(sys.argv[1], (256,256)) File "newfaces.py", line 31, in read_images im = cv2.resize(im, sz) cv2.error: /opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_release_tarballs_ports_graphics_opencv/opencv/work/opencv-2.4.10/modules/imgproc/src/imgwarp.cpp:1968: error: (-215) ssize.area() > 0 in function resize

Reply 2015-05-18 21:35
Vadim Mironov

Hello, Thank you! Scripts work very well on Centos 7 x64. But I have one question:

# Look through the model and find the face it matches [p_label, p_confidence] = model.predict(sampleImage)

It returns the best match, or first that fits the threshold ?

Thanks Again!

Reply 2015-07-06 15:33
some1else

Another result: http://www.tate.org.uk/art/artworks/gainsborough-gainsborough-dupont-n06242 Using Visual Similarity with Google Image Search: http://is.gd/FMJzvr

Reply 2015-07-12 14:15
gsever

Can the trained model return multiple matches or does it only return the best possible match?

Reply 2015-09-23 23:31
anusha

Hey I used the same code as ur's...Im getting This Result for any Images (0, 0.0) Predicted label = 0 (confidence=0.00) ....Any help?

Reply 2015-10-09 05:50
1. Terence Eden
  
  Looks like you either don't have the correct version of OpenCV installed, or you're not sending it any images.
  
  Reply 2015-10-09 10:25
Alistair

I have created Java version of your face extraction code here: https://github.com/alistairrutherford/faceit

Reply 2015-10-17 18:01
1. Terence Eden
  
  Brilliant 🙂
  
  Reply 2015-10-17 19:16
April lee

Hi,

I have receive this error:

[X, y] = read_images (sys.argv [1], (256,256)) IndexError: list index out of range

How do I solve the problem?

Thank you very much

Reply 2015-12-09 07:17
1. Terence Eden
  
  It looks like you're trying to read from a list element which doesn't exist. I'd suggest taking a quick look around some Python tutorials to see why.
  
  Reply 2015-12-10 19:04
  1. April lee
    
    Hi,
    
    I managed to solve the problem, but is there any way which it can match more accurately?
    
    Thank you
    
    Reply 2015-12-17 06:23