Friday, June 9, 2017

Wink Detection using Dlib and OpenCV

A couple of weeks ago, I was going through a tutorial for eye blink detection by Adrian at PyImageSearch. It was an excellent tutorial, which explained the use of Eye Aspect Ratio (EAR) in order to detect when an eye gets closed. Then, few weeks back, I was having a chat with Shirish Ranade, a reader of this blog and a fellow computer vision and machine learning enthusiast, on whether we can perform an action by winking at the computer. So, I decided to try out a code to detect winking.

Wink Detection Running with Dlib and OpenCV
Wink Detection Running with Dlib and OpenCV
It's an interesting idea to perform an action or a task just by winking at your computer. It can be thought as a form of gesture detection or facial expression detection as well. So, here's how you can build your own 'wink' detector for it.

We start by importing all the necessary packages,
 import numpy as np  
 import cv2  
 import dlib  
 from scipy.spatial import distance as dist  


I'll be using the same Eye Aspect Ratio method to detect when an eye gets closed, so I'm going to use the same eye_aspect_ratio function by Adrian,
 def eye_aspect_ratio(eye):  
   # compute the euclidean distances between the two sets of  
   # vertical eye landmarks (x, y)-coordinates  
   A = dist.euclidean(eye[1], eye[5])  
   B = dist.euclidean(eye[2], eye[4])  
   
   # compute the euclidean distance between the horizontal  
   # eye landmark (x, y)-coordinates  
   C = dist.euclidean(eye[0], eye[3])  
   
   # compute the eye aspect ratio  
   ear = (A + B) / (2.0 * C)  
   
   # return the eye aspect ratio  
   return ear  

Note: This code of the eye_aspect_ratio function is directly taken from here. All credits for it should go to Adrian at PyImageSearch.

In order to calculate the Eye Aspect Ratio, we need to detect the outline points of the eyes first. And, in order to detect the eye points, we first need to detect the face, and then detect the face landmarks. So, we'll be using the same steps we used when extracting individual facial features from Dlib Face Landmarks.

We first define the ranges of the Face Landmarks,
 FULL_POINTS = list(range(0, 68))  
 FACE_POINTS = list(range(17, 68))  
   
 JAWLINE_POINTS = list(range(0, 17))  
 RIGHT_EYEBROW_POINTS = list(range(17, 22))  
 LEFT_EYEBROW_POINTS = list(range(22, 27))  
 NOSE_POINTS = list(range(27, 36))  
 RIGHT_EYE_POINTS = list(range(36, 42))  
 LEFT_EYE_POINTS = list(range(42, 48))  
 MOUTH_OUTLINE_POINTS = list(range(48, 61))  
 MOUTH_INNER_POINTS = list(range(61, 68))  

We also define a couple of variables which helps us with the EAR calculations,
 EYE_AR_THRESH = 0.25  
 EYE_AR_CONSEC_FRAMES = 3  
   
 COUNTER_LEFT = 0  
 TOTAL_LEFT = 0  
   
 COUNTER_RIGHT = 0  
 TOTAL_RIGHT = 0  

The EYE_AR_THRESH defines the threshold for the EAR value where we consider the eye to be closed. You may need to experiment a bit with this value, as it may depend on your camera and the background lighting.

The EYE_AR_CONSEC_FRAMES defines the number of consecutive frames the eye needs to be 'closed' in order for us to detect it as a 'wink'. Again, experiment with this value a bit to get a good value for your application.

The COUNTER_* and TOTAL_* variables holds the consecutive 'eye closed' frames and the total number of 'winks' respectively.

With the variables ready, we start detecting the face, and its landmarks, to isolate the eyes.
 detector = dlib.get_frontal_face_detector()  
   
 predictor = dlib.shape_predictor(PREDICTOR_PATH)  
   
 # Start capturing the WebCam  
 video_capture = cv2.VideoCapture(0)  
   
 while True:  
   ret, frame = video_capture.read()  
   
   if ret:  
     gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  
   
     rects = detector(gray, 0)  
   
     for rect in rects:  
       x = rect.left()  
       y = rect.top()  
       x1 = rect.right()  
       y1 = rect.bottom()  
   
       landmarks = np.matrix([[p.x, p.y] for p in predictor(frame, rect).parts()])  
   
       left_eye = landmarks[LEFT_EYE_POINTS]  
       right_eye = landmarks[RIGHT_EYE_POINTS]  

Before we calculate the EAR values of the eyes, we want to draw the outline of the eyes. We use the convexHull and drawContours functions of OpenCV for this.
 left_eye_hull = cv2.convexHull(left_eye)  
 right_eye_hull = cv2.convexHull(right_eye)  
 cv2.drawContours(frame, [left_eye_hull], -1, (0, 255, 0), 1)  
 cv2.drawContours(frame, [right_eye_hull], -1, (0, 255, 0), 1)  

Now, we calculate the EAR for each eye, and draw the value on the video frame.
 ear_left = eye_aspect_ratio(left_eye)  
       ear_right = eye_aspect_ratio(right_eye)  
   
       cv2.putText(frame, "E.A.R. Left : {:.2f}".format(ear_left), (300, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 255), 2)  
       cv2.putText(frame, "E.A.R. Right: {:.2f}".format(ear_right), (300, 60), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 255), 2)  




With the EAR values of each eye calculated, we check whether the value for each eye has gone below the threshold we set earlier, for the minimum duration of frames we set, and increase the counters if it has done so.
     if ear_left < EYE_AR_THRESH:  
         COUNTER_LEFT += 1  
     else:  
         if COUNTER_LEFT >= EYE_AR_CONSEC_FRAMES:  
             TOTAL_LEFT += 1  
             print("Left eye winked")  
         COUNTER_LEFT = 0  
   
     if ear_right < EYE_AR_THRESH:  
         COUNTER_RIGHT += 1  
     else:  
         if COUNTER_RIGHT >= EYE_AR_CONSEC_FRAMES:  
             TOTAL_RIGHT += 1  
             print("Right eye winked")  
         COUNTER_RIGHT = 0  
   
 cv2.putText(frame, "Wink Left : {}".format(TOTAL_LEFT), (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 255), 2)  
 cv2.putText(frame, "Wink Right: {}".format(TOTAL_RIGHT), (10, 60), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 255), 2)  
   
 cv2.imshow("Faces found", frame)  

Check out the video below to see the Wink Detector in action,



Here's the complete code to get you started,
 import numpy as np  
 import cv2  
 import dlib  
 from scipy.spatial import distance as dist  
   
 PREDICTOR_PATH = "path to your shape_predictor_68_face_landmarks.dat file"  
   
 FULL_POINTS = list(range(0, 68))  
 FACE_POINTS = list(range(17, 68))  
   
 JAWLINE_POINTS = list(range(0, 17))  
 RIGHT_EYEBROW_POINTS = list(range(17, 22))  
 LEFT_EYEBROW_POINTS = list(range(22, 27))  
 NOSE_POINTS = list(range(27, 36))  
 RIGHT_EYE_POINTS = list(range(36, 42))  
 LEFT_EYE_POINTS = list(range(42, 48))  
 MOUTH_OUTLINE_POINTS = list(range(48, 61))  
 MOUTH_INNER_POINTS = list(range(61, 68))  
   
 EYE_AR_THRESH = 0.25  
 EYE_AR_CONSEC_FRAMES = 3  
   
 COUNTER_LEFT = 0  
 TOTAL_LEFT = 0  
   
 COUNTER_RIGHT = 0  
 TOTAL_RIGHT = 0  
   
 def eye_aspect_ratio(eye):  
   # compute the euclidean distances between the two sets of  
   # vertical eye landmarks (x, y)-coordinates  
   A = dist.euclidean(eye[1], eye[5])  
   B = dist.euclidean(eye[2], eye[4])  
   
   # compute the euclidean distance between the horizontal  
   # eye landmark (x, y)-coordinates  
   C = dist.euclidean(eye[0], eye[3])  
   
   # compute the eye aspect ratio  
   ear = (A + B) / (2.0 * C)  
   
   # return the eye aspect ratio  
   return ear  
   
 detector = dlib.get_frontal_face_detector()  
   
 predictor = dlib.shape_predictor(PREDICTOR_PATH)  
   
 # Start capturing the WebCam  
 video_capture = cv2.VideoCapture(0)  
   
 while True:  
   ret, frame = video_capture.read()  
   
   if ret:  
     gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  
   
     rects = detector(gray, 0)  
   
     for rect in rects:  
       x = rect.left()  
       y = rect.top()  
       x1 = rect.right()  
       y1 = rect.bottom()  
   
       landmarks = np.matrix([[p.x, p.y] for p in predictor(frame, rect).parts()])  
   
       left_eye = landmarks[LEFT_EYE_POINTS]  
       right_eye = landmarks[RIGHT_EYE_POINTS]  
   
       left_eye_hull = cv2.convexHull(left_eye)  
       right_eye_hull = cv2.convexHull(right_eye)  
       cv2.drawContours(frame, [left_eye_hull], -1, (0, 255, 0), 1)  
       cv2.drawContours(frame, [right_eye_hull], -1, (0, 255, 0), 1)  
   
       ear_left = eye_aspect_ratio(left_eye)  
       ear_right = eye_aspect_ratio(right_eye)  
   
       cv2.putText(frame, "E.A.R. Left : {:.2f}".format(ear_left), (300, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 255), 2)  
       cv2.putText(frame, "E.A.R. Right: {:.2f}".format(ear_right), (300, 60), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 255), 2)  
   
       if ear_left < EYE_AR_THRESH:  
         COUNTER_LEFT += 1  
       else:  
         if COUNTER_LEFT >= EYE_AR_CONSEC_FRAMES:  
           TOTAL_LEFT += 1  
           print("Left eye winked")  
         COUNTER_LEFT = 0  
   
       if ear_right < EYE_AR_THRESH:  
         COUNTER_RIGHT += 1  
       else:  
         if COUNTER_RIGHT >= EYE_AR_CONSEC_FRAMES:  
           TOTAL_RIGHT += 1  
           print("Right eye winked")  
         COUNTER_RIGHT = 0  
   
     cv2.putText(frame, "Wink Left : {}".format(TOTAL_LEFT), (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 255), 2)  
     cv2.putText(frame, "Wink Right: {}".format(TOTAL_RIGHT), (10, 60), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 255), 2)  
   
     cv2.imshow("Faces found", frame)  
   
   ch = 0xFF & cv2.waitKey(1)  
   
   if ch == ord('q'):  
     break  
   
 cv2.destroyAllWindows()  
   
Once you detect the wink, then it's just a matter of assigning an action to it (just like real life ;) )

Thanks Shirish Ranade for the idea of wink detection, and Adrian at PyImageSearch for the excellent tutorial that helped get me started.

Related posts:
http://www.codesofinterest.com/2017/04/extracting-individual-facial-features-dlib.html
http://www.codesofinterest.com/2016/10/getting-dlib-face-landmark-detection.html

Related links:
http://www.pyimagesearch.com/2017/04/24/eye-blink-detection-opencv-python-dlib/

Build Deeper: Deep Learning Beginners' Guide is the ultimate guide for anyone taking their first step into Deep Learning.

Get your copy now!

1 comment:

  1. Unable to open path to your shape_predictor_68_face_landmarks.dat file

    ReplyDelete