Tuesday, February 27, 2018

Track any object in a video with Dlib Correlation Trackers

Training an object detector is a bit of a complicated task. You need to have a proper training dataset with the relevant bounding boxes, and then use something like a HOG feature extractor with an SVM classifier for the detection - such as the Dlib Object Detection classes (link).

But that's a lot of work if you just need to track an object across a limited number of frames, or just need to detect motion or direction of movement. For that, we can easily use the Correlation Trackers feature in Dlib.

Object Tracking

See it in action,

Object Tracking in Action

Correlation Trackers - as their name suggests - work by correlating a set of pixels from one frame to the next.
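To get an intuition for the idea, here is a toy sketch (this is not Dlib's actual algorithm, which uses a more sophisticated discriminative correlation filter): we take a small patch of pixels from "frame 1" and find where it reappears in "frame 2" by scoring every candidate position with the sum of squared differences, where a lower score means a better match.

```python
def find_patch(frame, patch):
    """Return the (row, col) in frame where patch matches best."""
    ph, pw = len(patch), len(patch[0])
    best = None
    for y in range(len(frame) - ph + 1):
        for x in range(len(frame[0]) - pw + 1):
            # sum of squared differences between patch and this candidate window.
            score = sum(
                (frame[y + i][x + j] - patch[i][j]) ** 2
                for i in range(ph) for j in range(pw)
            )
            if best is None or score < best[0]:
                best = (score, (y, x))
    return best[1]

# a tiny 8x8 "frame" with a bright 2x2 object at row 2, col 3.
frame1 = [[0] * 8 for _ in range(8)]
frame1[2][3] = frame1[2][4] = frame1[3][3] = frame1[3][4] = 9
patch = [row[3:5] for row in frame1[2:4]]

# in the next "frame" the object has moved down 1 and right 2.
frame2 = [[0] * 8 for _ in range(8)]
frame2[3][5] = frame2[3][6] = frame2[4][5] = frame2[4][6] = 9

print(find_patch(frame2, patch))  # (3, 5) - the object's new top-left corner
```

A real tracker does this matching far more efficiently in the frequency domain, but the underlying question is the same: where in the new frame do these pixels best line up?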

Let's see how to build it.

First, we need a way to select which object to track in the video stream. We'll use a mouse event listener so we can select the area using the mouse.

# required imports for the code below.
import cv2
import dlib

# this variable will hold the coordinates of the mouse click events.
mousePoints = []

def mouseEventHandler(event, x, y, flags, param):
    # references to the global mousePoints variable
    global mousePoints

    # if the left mouse button was clicked, record the starting coordinates.
    if event == cv2.EVENT_LBUTTONDOWN:
        mousePoints = [(x, y)]

    # when the left mouse button is released, record the ending coordinates.
    elif event == cv2.EVENT_LBUTTONUP:
        mousePoints.append((x, y))

# create a named window in OpenCv and attach the mouse event handler to it.
cv2.namedWindow("Webcam stream")
cv2.setMouseCallback("Webcam stream", mouseEventHandler)

We create a named window in OpenCV and assign the mouseEventHandler function to it as the mouse callback. This function stores the coordinates of your selection (click, drag, and release).
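One subtlety worth noting: a drag can start from any corner, so the two recorded points are not guaranteed to be in (top-left, bottom-right) order. A small helper (hypothetical, not part of the original code) can normalize them so the rectangle stays valid whichever way you drag:

```python
def normalize_points(p1, p2):
    """Return ((left, top), (right, bottom)) regardless of drag direction."""
    (x1, y1), (x2, y2) = p1, p2
    return (min(x1, x2), min(y1, y2)), (max(x1, x2), max(y1, y2))

# dragging from bottom-right to top-left still yields a proper box.
print(normalize_points((200, 150), (50, 40)))  # ((50, 40), (200, 150))
```

You could apply this to mousePoints before building the dlib.rectangle later on, since Dlib expects left <= right and top <= bottom.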

We set up the VideoCapture and Correlation Tracker objects, along with the 'tracked' variable, which indicates whether an object is currently being tracked.

# create the video capture.
video_capture = cv2.VideoCapture(0)

# initialize the correlation tracker.
tracker = dlib.correlation_tracker()

# this is the variable indicating whether to track the object or not.
tracked = False

We set up the main program loop as usual,

while True:
    # start capturing the video stream.
    ret, frame = video_capture.read()

    if ret:
        image = frame

In the main loop, we will check whether a selection has been made, and draw the selection rectangle.

        # if we have two sets of coordinates from the mouse event, draw a rectangle.
        if len(mousePoints) == 2:
            cv2.rectangle(image, mousePoints[0], mousePoints[1], (0, 255, 0), 2)
            dlib_rect = dlib.rectangle(mousePoints[0][0], mousePoints[0][1], mousePoints[1][0], mousePoints[1][1])

Also in the loop, we setup some keyboard events to start, and reset the tracking.

        # show the current frame.
        cv2.imshow("Webcam stream", image)

    # capture the keyboard event in the OpenCV window.
    ch = 0xFF & cv2.waitKey(1)

    # press "r" to stop tracking and reset the points.
    if ch == ord("r"):
        mousePoints = []
        tracked = False

    # press "t" to start tracking the currently selected object/area.
    if ch == ord("t"):
        if len(mousePoints) == 2:
            tracker.start_track(image, dlib_rect)
            tracked = True
            mousePoints = []

    # press "q" to quit the program.
    if ch == ord('q'):

Pressing 't' on the keyboard will hand the current frame and the selection to the correlation tracker, and initialize the tracking.

Pressing 'r' will reset the tracking, and 'q' will quit the program.
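A side note on the `0xFF & cv2.waitKey(1)` expression used above: on some platforms waitKey returns the key code with extra state bits set in the higher bytes, so masking with 0xFF keeps only the low byte before comparing against ord(...). The raw values below are made up for illustration:

```python
# hypothetical raw waitKey return values for the 'q' key (0x71),
# with and without extra modifier/state bits in the upper bytes.
for raw in (0x71, 0x100071, 0x200071):
    # masking with 0xFF strips the upper bytes, leaving the key code.
    assert (0xFF & raw) == ord("q")

print("all masked values match 'q'")
```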

Back in the main loop, we check whether tracking is in progress, update the correlation tracker with the current frame, and get the new location of the object.

        # tracking in progress, update the correlation tracker and get the object position.
        if tracked:
            tracker.update(image)
            track_rect = tracker.get_position()
            x  = int(track_rect.left())
            y  = int(track_rect.top())
            x1 = int(track_rect.right())
            y1 = int(track_rect.bottom())
            cv2.rectangle(image, (x, y), (x1, y1), (0, 0, 255), 2)

tracker.update(image) gives the latest frame to the correlation tracker, and tracker.get_position() returns the coordinates of the object/area we're tracking.
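Since get_position() returns floating-point coordinates (hence the int() casts above), it's easy to derive extra information between frames, such as the direction of movement mentioned at the start. A small sketch with hypothetical helper names and made-up box coordinates:

```python
def center(left, top, right, bottom):
    """Center point of a bounding box."""
    return ((left + right) / 2, (top + bottom) / 2)

def motion(prev_box, curr_box):
    """(dx, dy) of the box center between two frames."""
    px, py = center(*prev_box)
    cx, cy = center(*curr_box)
    return cx - px, cy - py

# two successive box positions, as (left, top, right, bottom),
# e.g. read from tracker.get_position() on consecutive frames.
prev_box = (100.0, 80.0, 180.0, 160.0)
curr_box = (110.0, 78.0, 190.0, 158.0)

print(motion(prev_box, curr_box))  # (10.0, -2.0): moving right and slightly up
```

In image coordinates the y axis points downward, so a negative dy means the object is moving up the screen.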

Check out the video to see the code in action:

You can get the complete code at this GitHub Link.

Build Deeper: Deep Learning Beginners' Guide is the ultimate guide for anyone taking their first step into Deep Learning.

Get your copy now!

1 comment:

  1. What if we need to apply this in a real application?
    Here we track an object by using a mouse event. I think we would have to use a detector first, and then provide the coordinates of the bounding box to the tracker, which would then handle the rest.