In Your Face! Figuring Out Apple’s Face Detection API

Derek Andre, May 5th, 2016 (Last Updated: May 3rd, 2016)


I am making a native iOS app that has face detection. Apple has an awesome image detection API, CIDetector in Core Image, that can find faces, QR codes, and even rectangular shapes in images or video frames. Face detection arrived in iOS 5.0, but I thought an updated example using Swift 2.2 and Xcode 7.3 might help people out.

The code, located at https://github.com/dcandre/face-it, shows a video feed from your iOS device's camera and superimposes a view loaded from a XIB file on top of the preview at the left and right eye positions.

I am going to assume you can create a Single View Application in Xcode. My code restricts the device orientation to portrait mode. I have created a group in the Project Navigator called VideoCapture. In that group, create the file VideoCaptureController.swift.

VideoCaptureController Class

import Foundation
import UIKit

class VideoCaptureController: UIViewController {
    var videoCapture: VideoCapture?
    
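    override func viewDidLoad() {
        super.viewDidLoad()
        videoCapture = VideoCapture()
    }
    
    override func didReceiveMemoryWarning() {
        super.didReceiveMemoryWarning()
        // Release the camera and preview resources when memory runs low.
        stopCapturing()
    }
    
    func startCapturing() {
        do {
            try videoCapture?.startCapturing(self.view)
        }
        catch {
            // Report the VideoCaptureError instead of failing silently.
            print("Could not start video capture: \(error)")
        }
    }
    
    func stopCapturing() {
        videoCapture?.stopCapturing()
    }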
    
    @IBAction func touchDown(sender: AnyObject) {
        let button = sender as! UIButton
        button.setTitle("Stop", forState: UIControlState.Normal)
        
        startCapturing()
    }
    
    @IBAction func touchUp(sender: AnyObject) {
        let button = sender as! UIButton
        button.setTitle("Start", forState: UIControlState.Normal)
        
        stopCapturing()
    }    
}

The code that captures the video and performs the face detection is encapsulated in the VideoCapture class, which we will create next. For now, assume its interface has two methods: startCapturing and stopCapturing. Notice the two action methods: when the user presses the button, video capture starts, and when they lift their finger, it stops, much like Snapchat, Instagram, Vine, and other video capturing apps. You can check out the storyboard in my code, but feel free to create your own interface to start and stop the video capture.

The viewDidLoad and didReceiveMemoryWarning methods from the UIViewController class are overridden. viewDidLoad creates our video capture object, and didReceiveMemoryWarning stops capturing if the app receives a memory warning.

Go into your storyboard and select your view controller. In the Identity Inspector, set the custom class to VideoCaptureController. I connected the button's Touch Down event to the touchDown action, and its Touch Up Inside and Touch Up Outside events to the touchUp action.

Before I talk about the VideoCapture class, I want to summarize Apple's video capture process. To capture images or video from your iOS device's camera, you use the AVFoundation framework. The AVCaptureSession class connects inputs, such as a camera, to outputs, such as an image file. We will use an output called AVCaptureVideoDataOutput, which hands each captured video frame to our code so we can analyze it. A separate preview layer will let us see what the camera sees.

Go ahead and create a file in the VideoCapture group called VideoCapture.swift. The completed VideoCapture class can be found on GitHub. Here is the class declaration:

VideoCapture Class

import Foundation
import AVFoundation
import UIKit
import CoreMotion
import ImageIO

class VideoCapture: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    var isCapturing: Bool = false
    var session: AVCaptureSession?
    var device: AVCaptureDevice?
    var input: AVCaptureInput?
    var preview: CALayer?
    var faceDetector: FaceDetector?
    var dataOutput: AVCaptureVideoDataOutput?
    var dataOutputQueue: dispatch_queue_t?
    var previewView: UIView?
    
    enum VideoCaptureError: ErrorType {
        case SessionPresetNotAvailable
        case InputDeviceNotAvailable
        case InputCouldNotBeAddedToSession
        case DataOutputCouldNotBeAddedToSession
    }
    
    override init() {
        super.init()
        
        device = VideoCaptureDevice.create()
        
        faceDetector = FaceDetector()
    }
    
    func startCapturing(previewView: UIView) throws {
        isCapturing = true
        
        self.previewView = previewView
        
        self.session = AVCaptureSession()
        
        try setSessionPreset()
        
        try setDeviceInput()
        
        try addInputToSession()
        
        setDataOutput()
        
        try addDataOutputToSession()
        
        addPreviewToView(self.previewView!)
        
        session!.startRunning()
    }
    
    func stopCapturing() {
        isCapturing = false
        
        stopSession()
        
        removePreviewFromView()
        
        removeFeatureViews()
        
        preview = nil
        dataOutput = nil
        dataOutputQueue = nil
        session = nil
        previewView = nil
    }
}

This class needs to inherit from NSObject, because the AVCaptureVideoDataOutputSampleBufferDelegate protocol inherits from NSObjectProtocol. NSObject takes care of implementing NSObjectProtocol. We can talk about implementing captureOutput for AVCaptureVideoDataOutputSampleBufferDelegate later.

I have created an enumeration so users of this class can catch specific errors. The device and faceDetector objects, which are created in the overridden init method, are covered later. The startCapturing and stopCapturing methods are in place, but we have not yet implemented all of the methods they call. We will go through each of them and then implement the VideoCaptureDevice and FaceDetector classes.

Session

When you want to capture video or an image from an iOS device camera, you need an AVCaptureSession. In the startCapturing method, we create one and assign it to the session property. Then we call the setSessionPreset method, which you should add to your VideoCapture class.

private func setSessionPreset() throws {
    if (session!.canSetSessionPreset(AVCaptureSessionPreset640x480)) {
        session!.sessionPreset = AVCaptureSessionPreset640x480
    }
    else {
        throw VideoCaptureError.SessionPresetNotAvailable
    }
}

This checks whether the session can capture video at a 640×480 resolution. If it cannot, the method throws an error. I am using 640×480, but you can use other resolutions; the available presets, such as AVCaptureSessionPresetHigh and AVCaptureSessionPreset1280x720, are listed in Apple's AVCaptureSession documentation. Now that we have an AVCaptureSession object, we can start adding inputs and outputs.
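If you would rather fall back to another resolution than throw, one option is to try several presets in order of preference. The sketch below uses current Swift syntax and plain strings so it runs without AVFoundation: the `canSet` closure stands in for `session!.canSetSessionPreset`, and the preset names and the `firstSupportedPreset` helper are hypothetical.

```swift
// Returns the first preset in `preferred` that `canSet` accepts, or nil if none is supported.
// `canSet` stands in for AVCaptureSession's canSetSessionPreset.
func firstSupportedPreset(_ preferred: [String], canSet: (String) -> Bool) -> String? {
    for preset in preferred where canSet(preset) {
        return preset
    }
    return nil
}

// Hypothetical camera that supports only the "high" and "medium" presets.
let supportedPresets: Set<String> = ["high", "medium"]
let chosenPreset = firstSupportedPreset(["640x480", "high", "medium"], canSet: { supportedPresets.contains($0) })
print(chosenPreset ?? "none") // prints "high"
```

In the VideoCapture class, a nil result would be the point to throw VideoCaptureError.SessionPresetNotAvailable.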

Input Device

We are going to add two methods to our VideoCapture class. The setDeviceInput method creates an AVCaptureDeviceInput, which handles the input device's ports and lets the session use the camera on your iOS device. The addInputToSession method then adds that input to the session, throwing an error if the session cannot accept it.

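private func setDeviceInput() throws {
    // On the simulator (or any device without a camera) there is no device to wrap.
    guard let device = self.device else {
        throw VideoCaptureError.InputDeviceNotAvailable
    }
    
    do {
        self.input = try AVCaptureDeviceInput(device: device)
    }
    catch {
        throw VideoCaptureError.InputDeviceNotAvailable
    }
}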

private func addInputToSession() throws {
    if (session!.canAddInput(self.input)) {
        session!.addInput(self.input)
    }
    else {
        throw VideoCaptureError.InputCouldNotBeAddedToSession
    }
}

Data Output

We have session and input objects, but we still need to capture the video frames for face detection. Add another method to the VideoCapture class.

private func setDataOutput() {
    self.dataOutput = AVCaptureVideoDataOutput()
    
    var videoSettings = [NSObject : AnyObject]()
    videoSettings[kCVPixelBufferPixelFormatTypeKey] = Int(CInt(kCVPixelFormatType_32BGRA))
    
    self.dataOutput!.videoSettings = videoSettings
    self.dataOutput!.alwaysDiscardsLateVideoFrames = true
    
    self.dataOutputQueue = dispatch_queue_create("VideoDataOutputQueue", DISPATCH_QUEUE_SERIAL)
    
    self.dataOutput!.setSampleBufferDelegate(self, queue: self.dataOutputQueue!)
}

The AVCaptureVideoDataOutput class gives us the uncompressed frames from our video feed so we can process them. The videoSettings property is a dictionary, and we set one key/value pair. kCVPixelBufferPixelFormatTypeKey specifies the pixel format the frames should be delivered in, here 32-bit BGRA (kCVPixelFormatType_32BGRA). The format constant is a four-character code packed into an integer, which we convert to an Int so it can be stored in the settings dictionary.
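To make "four-character code" concrete, here is a plain-Swift sketch that packs a code such as "BGRA" into a 32-bit integer, one byte per character, with the first character in the most significant byte. The `fourCharCode` helper is hypothetical, not a Core Video function; for "BGRA" it should produce the same value as kCVPixelFormatType_32BGRA.

```swift
// Packs a four-character ASCII code into a UInt32, first character in the high byte.
// Returns nil if the string is not exactly four bytes long.
func fourCharCode(_ code: String) -> UInt32? {
    let bytes = Array(code.utf8)
    guard bytes.count == 4 else { return nil }
    return bytes.reduce(UInt32(0)) { ($0 << 8) | UInt32($1) }
}

let bgra = fourCharCode("BGRA")!
print(bgra)                     // prints 1111970369
print(String(bgra, radix: 16))  // prints 42475241 ('B' = 0x42, 'G' = 0x47, 'R' = 0x52, 'A' = 0x41)
```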

As you can imagine, the camera produces a lot of video frames, possibly 60 per second. That is why we use dispatch_queue_create to create a serial queue with Grand Central Dispatch. A serial queue processes one task at a time, in the order the tasks were added to the queue.
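The ordering guarantee of a serial queue can be seen with a small example. This sketch uses the current Swift Dispatch API, where `DispatchQueue(label:)` creates a serial queue (the Swift 2.2 code above uses `dispatch_queue_create` with `DISPATCH_QUEUE_SERIAL` instead). The `FrameLog` class and the integer "frames" are hypothetical stand-ins for real sample buffers.

```swift
import Dispatch

// Records the order in which the queue ran our blocks.
final class FrameLog {
    var frames: [Int] = []
}

// DispatchQueue(label:) creates a serial queue by default.
let frameQueue = DispatchQueue(label: "VideoDataOutputQueue")
let frameLog = FrameLog()

for frameNumber in 0..<5 {
    frameQueue.async {
        // Stand-in for per-frame work such as face detection.
        frameLog.frames.append(frameNumber)
    }
}

// A sync call on a serial queue runs only after every earlier block has finished.
frameQueue.sync { }
print(frameLog.frames) // prints [0, 1, 2, 3, 4]
```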

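Finally, setSampleBufferDelegate registers our VideoCapture object as the delegate that receives each sample buffer (video frame) on that queue. Because alwaysDiscardsLateVideoFrames is true, frames that arrive while we are still processing an earlier frame are dropped instead of queued, so slow face detection never builds up a backlog of stale frames.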
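The effect of alwaysDiscardsLateVideoFrames can be modeled with a little arithmetic. This is a simplified, hypothetical model, not AVFoundation's actual scheduling: a frame is delivered only if processing of the previously delivered frame has finished, and is dropped otherwise. Times are whole milliseconds, and the frame interval and 40 ms detection time are made-up numbers.

```swift
// Returns the indices of frames that reach the delegate when late frames are discarded.
// A frame is delivered if it arrives at or after the moment the previous delivered frame finished.
func deliveredFrames(arrivalTimes: [Int], processingTime: Int) -> [Int] {
    var delivered: [Int] = []
    var busyUntil = Int.min
    for (index, arrival) in arrivalTimes.enumerated() {
        if arrival >= busyUntil {
            delivered.append(index)
            busyUntil = arrival + processingTime
        }
    }
    return delivered
}

// A frame roughly every 16 ms (about 60 fps) with 40 ms of work per frame.
let arrivals = (0..<7).map { $0 * 16 }  // [0, 16, 32, 48, 64, 80, 96]
print(deliveredFrames(arrivalTimes: arrivals, processingTime: 40)) // prints [0, 3, 6]
```

With 40 ms of work per frame, only every third frame reaches the detector, but each frame it sees is recent rather than waiting behind a growing backlog.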

Next, add this method to your VideoCapture class.

private func addDataOutputToSession() throws {
    if (self.session!.canAddOutput(self.dataOutput!)) {
        self.session!.addOutput(self.dataOutput!)
    }
    else {
        throw VideoCaptureError.DataOutputCouldNotBeAddedToSession
    }
}

This adds the AVCaptureVideoDataOutput object to the AVCaptureSession.

Seeing is Believing

Add the following method to your VideoCapture class.

private func addPreviewToView(view: UIView) {
    self.preview = AVCaptureVideoPreviewLayer(session: session!)
    self.preview!.frame = view.bounds
    
    view.layer.addSublayer(self.preview!)
}

We create an AVCaptureVideoPreviewLayer, which displays the video frames from the input device. Then we add it as a sublayer of the view we pass in from the VideoCaptureController; in our example, that is the controller's main UIView. Notice that we set the layer's frame to the bounds of the enclosing view, so the preview fills the whole view.

If you look at the startCapturing method in the VideoCapture class, you will see that all of the methods it calls are now in place. We are definitely not done yet, though. How do we receive the video frames from the queue, and how do we actually detect faces in those frames?

AVCaptureVideoDataOutputSampleBufferDelegate Protocol

All of the methods in the AVCaptureVideoDataOutputSampleBufferDelegate protocol are optional. The one we need is captureOutput(_:didOutputSampleBuffer:fromConnection:), which is called with each new video frame.

func captureOutput(captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, fromConnection connection: AVCaptureConnection!) {
        
    let image = getImageFromBuffer(sampleBuffer)
    
    let features = getFacialFeaturesFromImage(image)
    
    let formatDescription = CMSampleBufferGetFormatDescription(sampleBuffer)
    
    let cleanAperture = CMVideoFormatDescriptionGetCleanAperture(formatDescription!, false)
    
    dispatch_async(dispatch_get_main_queue()) {
        self.alterPreview(features, cleanAperture: cleanAperture)
    }
}

The video frame arrives in the CMSampleBuffer passed to this method. We convert it to a CIImage, run face detection on it, and read the clean aperture (the valid picture area of the frame) from its format description. Because this delegate method runs on our background dataOutputQueue, we then dispatch the UI work asynchronously to the main queue with dispatch_get_main_queue. UIKit is not thread-safe, so views must only be added, moved, or removed on the main thread.

Let’s add the function getImageFromBuffer to your VideoCapture class.

private func getImageFromBuffer(buffer: CMSampleBuffer) -> CIImage {
    let pixelBuffer = CMSampleBufferGetImageBuffer(buffer)
    
    let attachments = CMCopyDictionaryOfAttachments(kCFAllocatorDefault, buffer, kCMAttachmentMode_ShouldPropagate)
    
    let image = CIImage(CVPixelBuffer: pixelBuffer!, options: attachments as? [String : AnyObject])
    
    return image
}

CMSampleBufferGetImageBuffer returns the frame's pixel buffer. CMCopyDictionaryOfAttachments copies the sample buffer's attachments (its metadata) that are marked to propagate, and we pass them as options when creating the CIImage from the pixel buffer. The method returns that Core Image object.

Go ahead and add the getFacialFeaturesFromImage method to your VideoCapture class. The face detection code is encapsulated in the FaceDetector class. We will get to that later.

private func getFacialFeaturesFromImage(image: CIImage) -> [CIFeature] {
    let imageOptions = [CIDetectorImageOrientation : 6]
    
    return self.faceDetector!.getFacialFeaturesFromImage(image, options: imageOptions)
}

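I have set the detector's image orientation to 6, which corresponds to portrait, since this app is locked in portrait mode. CIDetectorImageOrientation takes an EXIF orientation value, and 6 tells the detector that the frame's first row is on the right and its first column is at the top, so the landscape camera frame is treated as rotated a quarter turn. The getFacialFeaturesFromImage method on the FaceDetector class returns an array of CIFeature objects; in our case they are the CIFaceFeature subclass. A CIFaceFeature can tell you whether the eyes and mouth are visible and where they are positioned. It can even tell you whether the person is smiling, if you pass the CIDetectorSmile option. Before we create the FaceDetector class, let's look at the alterPreview method that we dispatch asynchronously to update the UI.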
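For reference, CIDetectorImageOrientation takes an EXIF orientation value from 1 to 8. The sketch below names those values and maps a device orientation to one of them in plain Swift. `DeviceOrientation` is a hypothetical stand-in for UIDeviceOrientation, and only the portrait case (6) is what this app uses; the upside-down and landscape cases follow a commonly used mapping and are an assumption that this portrait-locked app never exercises.

```swift
// EXIF/TIFF orientation values: where the image's 0th row and 0th column sit visually.
enum ExifOrientation: Int {
    case topLeft = 1, topRight, bottomRight, bottomLeft
    case leftTop, rightTop, rightBottom, leftBottom
}

// Hypothetical stand-in for UIDeviceOrientation so the sketch runs without UIKit.
enum DeviceOrientation {
    case portrait, portraitUpsideDown, landscapeLeft, landscapeRight
}

// Assumed mapping: the camera sensor is landscape, so a portrait device needs 6.
func exifOrientation(for device: DeviceOrientation, usingFrontCamera: Bool) -> ExifOrientation {
    switch device {
    case .portrait:
        return .rightTop                                     // 6, the value this app hard-codes
    case .portraitUpsideDown:
        return .leftBottom                                   // 8
    case .landscapeLeft:
        return usingFrontCamera ? .bottomRight : .topLeft    // 3 or 1
    case .landscapeRight:
        return usingFrontCamera ? .topLeft : .bottomRight    // 1 or 3
    }
}

print(exifOrientation(for: .portrait, usingFrontCamera: true).rawValue) // prints 6
```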

private func alterPreview(features: [CIFeature], cleanAperture: CGRect) {
    removeFeatureViews()
    
    if (features.count == 0 || cleanAperture == CGRect.zero || !isCapturing) {
        return
    }
    
    for feature in features {
        // Skip anything that is not a face instead of crashing on a forced unwrap.
        guard let faceFeature = feature as? CIFaceFeature else {
            continue
        }
        
        if (faceFeature.hasLeftEyePosition) {
            addEyeViewToPreview(faceFeature.leftEyePosition.x, yPosition: faceFeature.leftEyePosition.y, cleanAperture: cleanAperture)
        }
        
        if (faceFeature.hasRightEyePosition) {
            addEyeViewToPreview(faceFeature.rightEyePosition.x, yPosition: faceFeature.rightEyePosition.y, cleanAperture: cleanAperture)
        }
    }
    
}

private func removeFeatureViews() {
    if let pv = previewView {
        for view in pv.subviews {
            if (view.tag == 1001) {
                view.removeFromSuperview()
            }
        }
    }
}

private func addEyeViewToPreview(xPosition: CGFloat, yPosition: CGFloat, cleanAperture: CGRect) {
    let eyeView = getFeatureView()
    let isMirrored = preview!.contentsAreFlipped()
    let previewBox = preview!.frame
    
    previewView!.addSubview(eyeView)
    
    var eyeFrame = transformFacialFeaturePosition(xPosition, yPosition: yPosition, videoRect: cleanAperture, previewRect: previewBox, isMirrored: isMirrored)
    
    eyeFrame.origin.x -= 37
    eyeFrame.origin.y -= 37
    
    eyeView.frame = eyeFrame
}

In the alterPreview method, we first remove the views we previously positioned over the eyes, because we reposition them on every frame. If no facial features were found, or capturing has stopped, we return without doing anything else. If a left or right eye is found, we call the addEyeViewToPreview method, which relies on a couple of helper methods that we also need to add to our VideoCapture class. The getFeatureView method loads a XIB file, which I have named HeartView in my project.

private func getFeatureView() -> UIView {
    let heartView = NSBundle.mainBundle().loadNibNamed("HeartView", owner: self, options: nil)[0] as? UIView
    heartView!.backgroundColor = UIColor.clearColor()
    heartView!.layer.removeAllAnimations()
    heartView!.tag = 1001
    
    return heartView!
}

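private func transformFacialFeaturePosition(xPosition: CGFloat, yPosition: CGFloat, videoRect: CGRect, previewRect: CGRect, isMirrored: Bool) -> CGRect {
    // A zero-size rect at the feature's position in video-frame coordinates.
    var featureRect = CGRect(origin: CGPoint(x: xPosition, y: yPosition), size: CGSize(width: 0, height: 0))
    
    // The video frame is landscape and the preview is portrait, so the preview's
    // width is scaled against the video's height and vice versa.
    let widthScale = previewRect.size.width / videoRect.size.height
    let heightScale = previewRect.size.height / videoRect.size.width
    
    // Swap the axes (x' = widthScale * y, y' = heightScale * x); for a mirrored
    // preview, also flip horizontally within the preview's width.
    let transform = isMirrored ? CGAffineTransformMake(0, heightScale, -widthScale, 0, previewRect.size.width, 0) :
        CGAffineTransformMake(0, heightScale, widthScale, 0, 0, 0)
    
    featureRect = CGRectApplyAffineTransform(featureRect, transform)
    
    // Shift by the preview layer's origin within its superview.
    featureRect = CGRectOffset(featureRect, previewRect.origin.x, previewRect.origin.y)
    
    return featureRect
}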

The getFeatureView method loads the XIB file and tags the view with an integer, 1001, so that we can easily find and remove it later with the removeFeatureViews method. The transformFacialFeaturePosition method uses CGRectApplyAffineTransform to convert coordinates from one coordinate system to another. Why do we have to do that, you ask? The detector reports eye positions in the coordinate system of the landscape 640×480 video frame, but our preview view is in portrait mode with its own width and height, depending on how the view is rendered to fit the window. The two coordinate systems are represented by the CGRect objects videoRect and previewRect. The transform swaps the x and y axes, scales each axis to the preview's size, and flips horizontally if the preview is mirrored. Once we have a CGRect that represents an eye position in the preview view's coordinate system, we can add the eye view to previewView as a subview.
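The transform is easier to check with concrete numbers. This plain-Swift sketch reproduces the arithmetic of transformFacialFeaturePosition using Doubles instead of CGFloat and CGAffineTransform, so it runs without Core Graphics. The `Point`, `Size`, and `previewPoint` names and the 320×568 preview size are hypothetical.

```swift
struct Point { var x: Double; var y: Double }
struct Size { var width: Double; var height: Double }

// Same math as CGAffineTransformMake(0, heightScale, ±widthScale, 0, tx, 0) applied to a point:
// x' = ±widthScale * y + tx, y' = heightScale * x, then offset by the preview's origin.
func previewPoint(fromVideo p: Point, videoSize: Size, previewSize: Size,
                  previewOrigin: Point = Point(x: 0, y: 0), isMirrored: Bool) -> Point {
    // Landscape video, portrait preview: preview width pairs with video height.
    let widthScale = previewSize.width / videoSize.height
    let heightScale = previewSize.height / videoSize.width
    let x = isMirrored ? previewSize.width - widthScale * p.y : widthScale * p.y
    let y = heightScale * p.x
    return Point(x: x + previewOrigin.x, y: y + previewOrigin.y)
}

let video = Size(width: 640, height: 480)
let screen = Size(width: 320, height: 568)  // hypothetical portrait preview size
let center = previewPoint(fromVideo: Point(x: 320, y: 240), videoSize: video, previewSize: screen, isMirrored: false)
print(center)  // roughly x = 160, y = 284: the frame's center maps to the preview's center
```

The center of the frame lands at the center of the preview, and with mirroring on, the video origin moves to the preview's right edge, which is what the previewRect.size.width translation does.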

Our VideoCapture class is looking pretty good. We can finish up by creating our two remaining classes: the VideoCaptureDevice class and the FaceDetector class.

VideoCaptureDevice Class

Create a new file called VideoCaptureDevice.swift in the VideoCapture group. The complete class is in the GitHub repository. This class supplies the device object that the setDeviceInput method in the VideoCapture class uses.

import Foundation
import AVFoundation

class VideoCaptureDevice {
    
    // Returns the front camera if there is one, otherwise the default video device,
    // or nil when no camera is available (for example, on the simulator).
    static func create() -> AVCaptureDevice? {
        for case let videoDevice as AVCaptureDevice in AVCaptureDevice.devicesWithMediaType(AVMediaTypeVideo) {
            if (videoDevice.position == AVCaptureDevicePosition.Front) {
                return videoDevice
            }
        }
        
        return AVCaptureDevice.defaultDeviceWithMediaType(AVMediaTypeVideo)
    }
}

This class contains a static factory method that returns an AVCaptureDevice. This app captures video, so we use the devicesWithMediaType method to get the array of devices of type AVMediaTypeVideo. Since we are doing face detection, I thought the front camera would be ideal. If no front camera is found, the defaultDeviceWithMediaType method returns the default camera that can shoot video, which will most likely be the rear camera.
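The selection logic can be modeled in plain Swift. In this sketch, `Camera` and `CameraPosition` are hypothetical stand-ins for AVCaptureDevice and AVCaptureDevicePosition, and the function returns nil, rather than force-unwrapping, when there is no camera at all, which is the situation on the iOS simulator.

```swift
enum CameraPosition { case front, back }

// Hypothetical stand-in for AVCaptureDevice.
struct Camera {
    let name: String
    let position: CameraPosition
}

// Prefer a front camera; otherwise use the default camera, which may itself be nil.
func preferredCamera(from cameras: [Camera], defaultCamera: Camera?) -> Camera? {
    return cameras.first(where: { $0.position == .front }) ?? defaultCamera
}

let phoneCameras = [Camera(name: "Back Camera", position: .back),
                    Camera(name: "Front Camera", position: .front)]
print(preferredCamera(from: phoneCameras, defaultCamera: phoneCameras[0])?.name ?? "none") // prints "Front Camera"
print(preferredCamera(from: [], defaultCamera: nil)?.name ?? "none")                        // prints "none"
```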

FaceDetector Class

Add a file called FaceDetector.swift to the VideoCapture group. The complete class is in the GitHub repository.

import Foundation
import CoreImage
import AVFoundation

class FaceDetector {
    var detector: CIDetector?
    var options: [String : AnyObject]?
    var context: CIContext?
    
    init() {
        context = CIContext()
        
        options = [String : AnyObject]()
        options![CIDetectorAccuracy] = CIDetectorAccuracyLow
        
        detector = CIDetector(ofType: CIDetectorTypeFace, context: context!, options: options!)
    }
    
    func getFacialFeaturesFromImage(image: CIImage, options: [String : AnyObject]) -> [CIFeature] {
        return self.detector!.featuresInImage(image, options: options)
    }
}

The CIDetector class is the interface we use to detect faces in the CIImage object we created from the sample buffer. The CIDetectorTypeFace parameter is a string constant that tells CIDetector to search for faces. One of the options for CIDetector is CIDetectorAccuracy. We have set it to CIDetectorAccuracyLow so that the number of frames we process does not cause performance issues; CIDetectorAccuracyHigh is more accurate but slower.

Final Thoughts

Remember that the iOS simulator doesn't have a camera, so you need to test the app on a real device. When you run the app and start capturing, you should see the view from your XIB file superimposed over your eyes. In the future, I would like to improve this code so that it doesn't create new UIView objects for every frame. Instead, it would move the existing views, and remove them when video capture is complete.
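One possible shape for that improvement is a small reuse pool that hands out existing views and creates new ones only when more eyes are visible than before. The sketch below is generic plain Swift; `ReusePool` and `checkOut(count:)` are hypothetical names, and in the app the items would be the HeartView instances, with unused ones hidden instead of removed.

```swift
// Hands out `count` items, reusing ones it already made and creating new ones only when needed.
final class ReusePool<Item> {
    private(set) var pooled: [Item] = []
    private let make: () -> Item

    init(make: @escaping () -> Item) {
        self.make = make
    }

    func checkOut(count: Int) -> [Item] {
        while pooled.count < count {
            pooled.append(make())
        }
        return Array(pooled.prefix(count))
    }
}

// Integers stand in for views here; `created` counts how many were ever made.
var created = 0
let pool = ReusePool<Int>(make: {
    created += 1
    return created
})
_ = pool.checkOut(count: 2)  // two eyes: creates items 1 and 2
_ = pool.checkOut(count: 1)  // one eye: reuses item 1, creates nothing
_ = pool.checkOut(count: 3)  // three eyes: reuses 1 and 2, creates item 3
print(created)               // prints 3
```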

And that’s it! Let me know if you have any questions or comments!

Reference:

In Your Face! Figuring Out Apple’s Face Detection API from our JCG partner Derek Andre at the Keyhole Software blog.