Projects:2017s1-100 Face Recognition using 3D Data

Revision as of 18:22, 29 October 2017 by A1648351 (talk | contribs) (3D data from Xbox Kinect and Pre-processing)

Introduction

This project seeks to develop a system that is capable of recognising faces captured using commercial off-the-shelf devices. It will be able to capture depth imagery of faces and align them to a common facial pose, before using them to perform recognition. The project will involve elements of literature survey (both sensor hardware and algorithmic techniques), software development (in Matlab), data collection, and performance comparison with existing approaches.

Objectives

Develop a system that is capable of recognizing faces captured using commercial off-the-shelf devices such as the Xbox Kinect.

  • Recovery of 3D data from polarimetric imagery
  • Recovery of 3D data from Xbox Kinect and alignment to common pose
  • Facial recognition from 3D models

Project Team

Jesse Willsmore

Orbille Piol

Michael Sadler

Supervisors

Dr Brian Ng

Dr David Booth (DST Group)

Sau-Yee Yiu (DST Group)

Philip Stephenson (DST Group)

3D data from Polarimetry

3D data from Xbox Kinect and Pre-processing

This section covers the software that interfaces the newest Kinect camera (Kinect V2) with MATLAB on a computer. It also covers the canonical preprocessing applied to the 3D depth point cloud data, which produces a 3D facial image aligned to a frontal pose for face recognition.

Method & Results

Kinect and Eurecom Database Interface

Image acquisition with the Kinect V2 began by connecting the Kinect camera to MATLAB. The subject was positioned no further than 1 metre from the camera. This distance was chosen to work within the 1.5 mm depth resolution of the Kinect camera. The Image Acquisition Toolbox in MATLAB lets the user acquire three kinds of data: a 2D RGB image, a grayscale depth map, or an RGB-D point cloud. This thesis worked with the RGB-D point cloud to be consistent with the data in the available database. An added benefit is that MATLAB already aligns the colour image with its corresponding point cloud, which avoids the misalignment caused by the different camera locations. The figure below shows the result of the image acquisition step for the Kinect camera.

KinectImage.JPG


The Eurecom database was used because it already provides, for each subject, the arrays needed to create point clouds, and because it is the database currently used by the facial recognition software. Another reason is that its point clouds have already been processed so that the background and any outliers in front of the subject are separated from the point cloud of interest, which in this case is the person being captured. The image below shows the three different poses the subjects made, along with the initial processing already done on the database. The yellow and blue areas of the point cloud represent the outliers, both the background and the frontal outliers. The green shaded area is the main working area. The interface currently works with images that have no major occlusions, with eyeglasses as an exception. This restriction lets the nose-tip detection and face cropping stages described below run with relative ease.

EURECOMInitial.jpg


Nose Tip Detection

Nose-tip detection for the point cloud acquired from the Kinect is fully automatic and uses the Viola-Jones algorithm. The algorithm first detects the face in the already aligned colour image. The image is then cropped to the detected face and passed through Viola-Jones again to locate the nose. This two-stage approach reduces the incidence of false detections and improves the accuracy of the nose-tip detection. The point cloud is then cropped to the detected nose region, and the point closest to the camera is selected as the nose tip. Although this approach may give only a rough location, the nose tip is needed only as a rough centre point of the face for face cropping. The subsequent face cropping algorithm is robust enough to work as long as the detected nose tip is within a few points of the actual nose tip.

Because the Viola-Jones algorithm does not work for tilted heads, detection for a tilted head is done by first rotating the image until a face is detected. To improve robustness, the algorithm also checks that the other features needed for face cropping are detected. As the location of the head barely changes when the image is rotated, it is assumed that the face detected in the rotated image is in the same place as the tilted head, and that the nose tip lies within the same area as the nose point cloud. The figures below summarize the steps in determining the nose-tip point for the Kinect camera, for both frontal and tilted heads.

CroppingKinect.JPG TiltedHeadNose.JPG
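
The final selection step for Kinect data can be sketched as follows. This is only an illustration in Python rather than the project's MATLAB code. It assumes the point cloud is a list of (x, y, z) tuples, that a smaller z means closer to the camera, and that the nose detector returns a box as (x_min, y_min, x_max, y_max); the function name and box format are hypothetical.

```python
def nose_tip_in_box(points, box):
    """Return the point closest to the camera inside a 2D bounding box.

    points: iterable of (x, y, z) tuples; smaller z = closer (assumption).
    box: (x_min, y_min, x_max, y_max) of the detected nose (hypothetical format).
    """
    x_min, y_min, x_max, y_max = box
    # Crop the cloud to the detected nose region.
    inside = [p for p in points
              if x_min <= p[0] <= x_max and y_min <= p[1] <= y_max]
    if not inside:
        return None  # nothing fell inside the detected region
    # The nose tip is taken as the closest remaining point.
    return min(inside, key=lambda p: p[2])


cloud = [(0.0, 0.0, 900.0), (1.0, 2.0, 850.0),
         (30.0, 5.0, 700.0), (2.0, -1.0, 870.0)]
print(nose_tip_in_box(cloud, (-5.0, -5.0, 5.0, 5.0)))
```

The point at x = 30 is closer to the camera but lies outside the nose box, so it is ignored. This is why cropping to the detected nose first reduces false nose-tip picks.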


Because outliers and the background have already been removed from the database point clouds, nose-tip detection can also be automated, although the method differs slightly from the one used for the Kinect data. A major issue is that the colour images in the database are misaligned with the point cloud data, so the same method is not viable. Instead, nose-tip detection exploits the vertical facial symmetry of the average human face. Candidate points are found first: the points closest to the camera for frontal poses, or the points with the smallest or largest X values for left or right poses. A histogram of their Y values is then created. Points outside the middle section of the head are disregarded, keeping only points within two thirds of the distance from the centre. Within this enclosed area, the nose tip is the point closest to the camera for frontal poses, or the leftmost or rightmost point of the face for profile poses. The two-thirds leeway accounts for outliers such as beards or glasses, and for pose differences that have not otherwise been taken into account. The figure below illustrates this method.

NoseTipDetectionEurecom.JPG
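
A rough sketch of the frontal-pose case is given below. It is one possible reading of the description, not the project's implementation: the "two thirds" rule is interpreted here as keeping points whose X and Y offsets from the bounding-box centre are within two thirds of the half-width and half-height. The function name is hypothetical, a smaller z is assumed to mean closer to the camera, and the histogram step and the profile poses are left out.

```python
def central_nose_tip(points, fraction=2.0 / 3.0):
    """Pick the closest point inside the central part of a frontal face.

    points: list of (x, y, z) tuples; smaller z = closer (assumption).
    fraction: share of the half-width/half-height that is kept (assumed reading).
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    cx, cy = (min(xs) + max(xs)) / 2.0, (min(ys) + max(ys)) / 2.0
    half_w, half_h = (max(xs) - min(xs)) / 2.0, (max(ys) - min(ys)) / 2.0
    # Discard points far from the centre, e.g. hair or stray edge points.
    central = [p for p in points
               if abs(p[0] - cx) <= fraction * half_w
               and abs(p[1] - cy) <= fraction * half_h]
    if not central:
        return None
    return min(central, key=lambda p: p[2])


face = [(-50.0, 0.0, 600.0), (0.0, 0.0, 700.0), (5.0, 5.0, 650.0),
        (50.0, 0.0, 800.0), (0.0, -50.0, 800.0), (0.0, 50.0, 800.0)]
print(central_nose_tip(face))
```

In this toy cloud the closest point overall sits at the left edge of the face and is rejected, so the closest central point is returned instead.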



3D Face Cropping

The point cloud is put through a face cropping algorithm that keeps only the important features, such as the eyes, nose and mouth, while removing the hair and ears from the point cloud. The algorithm is initialized by assuming that the nose tip is at the centre of the face. This assumption is also necessary for the canonical preprocessing techniques, especially symmetric filling. The nose tip found in the previous step is used to shift the X and Y values of the point cloud so that the nose tip sits at the origin, while the depth data is left untouched. The cropping radius is set so that it removes the ears, the neck and any excess hair at the top of the head, leaving only a section of the forehead, the eyes, nose, mouth and a section of the chin.

The Eurecom database had all subjects at the same distance from the camera for every photo and every session. This meant that a single face cropping radius could be used throughout the whole database. The radius was selected by trial and error, and the optimal radius was found to be about 80 geometric points from the detected nose tip. The Kinect camera data did not have this luxury. Instead, the Viola-Jones algorithm was adjusted to find additional features, not only the nose. When possible, the eyes and the mouth were also located. Using the vertical facial symmetry of an average face and the cropped point clouds specific to the mouth, nose and eyes, the radius of the face is found even at varying distances from the Kinect camera. The point cloud is also arranged so that the width of the face lies along the X axis and the height of the face along the Y axis. The results are shown below.

KinectResultCropped.PNG EURECOMDATA.jpg
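
The centring and fixed-radius crop can be sketched as follows. This is a minimal illustration that assumes the radius is measured in the X-Y plane only (the text leaves the depth data untouched) and uses the 80-point radius reported for the Eurecom data as a default; the function name is hypothetical.

```python
import math


def crop_face(points, nose_tip, radius=80.0):
    """Shift X and Y so the nose tip is at the origin, then crop by radius.

    points: list of (x, y, z) tuples; z (depth) is left unchanged.
    nose_tip: (x, y, z) tuple from the nose-tip detection step.
    radius: crop radius in the X-Y plane (assumed to be a planar distance).
    """
    nx, ny, _ = nose_tip
    cropped = []
    for x, y, z in points:
        dx, dy = x - nx, y - ny
        if math.hypot(dx, dy) <= radius:
            cropped.append((dx, dy, z))
    return cropped


cloud = [(10.0, 10.0, 500.0), (50.0, 10.0, 510.0),
         (10.0, 95.0, 520.0), (100.0, 10.0, 530.0)]
print(crop_face(cloud, (10.0, 10.0, 500.0)))
```

For the Kinect data the same function could be called with a radius estimated from the detected eyes, nose and mouth instead of the fixed default.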



Pose Correction

Pose correction is done using a non-rigid ICP algorithm, which corrects the pose of the original facial model to a frontal pose. ICP is an accurate way of aligning two relatively similar point clouds. Because ICP needs a large number of iterations, registering the cropped point cloud to every feasible frontal face would be computationally expensive and time consuming. Instead, the reference face model of Mian et al. is used for point registration and alignment. This reference face was created by scanning and realigning every face in the FRGC and UWA databases. These databases were captured with a camera of higher depth resolution, since the low-accuracy Kinect data would introduce too many outliers and errors into the reference face.

To use the reference face with the query point cloud, a minimum correspondence between the two point clouds is needed. This is achieved by first centring the reference face on its nose tip, as for the query face (placing the nose-tip point at the origin), and then normalizing the reference data to span the same minimum and maximum values as the query facial model in all three coordinates. The normalized reference face and the query point cloud are then fed to the ICP algorithm, which iterates until the user-set threshold has been met.

Poses that are not fully frontal have some data points on the face that lie rather far from the reference face. To counteract this, the ICP is first run within a sphere of 15 mm radius (15 geometric points) centred on the nose tip. Starting from this small sphere is necessary because, whether the view is a profile or a frontal one, the nose tip and its surrounding area differ relatively little between views, so similarity with the reference face is still present there. The ICP is run six consecutive times, doubling the radius of the sphere each time; by the fifth run the whole frontal reference face is aligned with the query point cloud. The sixth and final run is done to complete the alignment between the query face and the reference face.

Another form of pose correction handles left and right tilting of the head, which causes misalignment issues. This was only done for the Kinect camera data, because the Eurecom database has no subjects with left- or right-tilted faces. The tilt was corrected by using the Viola-Jones algorithm to find the angle of rotation and then applying a rotation matrix to align the query face to a frontal pose. As mentioned above, the Viola-Jones algorithm can only detect faces that are frontal and upright. The image with the tilted face is run through the algorithm and rotated until a face is detected. To improve robustness, the algorithm is run again on the detected face to find the eyes, nose and mouth. The image is rotated until the face and all of these features are detected. The angle of rotation is then passed to a rotation matrix that rotates the query point cloud. The reference face and pose-corrected images are shown below.

Referenceface.jpg PoseCorrectedEurecom.JPG PoseCorrectedKinect.JPG
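
Two small pieces of this stage can be sketched without an ICP implementation: the sequence of sphere radii and the rotation that removes head tilt. The sketch below assumes the radius is doubled after every run starting from 15, and that tilt is a rotation about the depth (Z) axis; both function names are hypothetical, and the ICP registration itself is not shown.

```python
import math


def icp_radii(start=15.0, passes=6):
    """Radii of the spheres used for successive ICP runs (doubling each time)."""
    return [start * 2 ** i for i in range(passes)]


def rotate_about_z(points, angle_deg):
    """Rotate (x, y, z) points about the Z axis by angle_deg degrees.

    Uses the standard 2D rotation matrix on X and Y; depth is unchanged.
    """
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


print(icp_radii())
print(rotate_about_z([(1.0, 0.0, 5.0)], 90.0))
```

With these assumptions the fifth run uses a 240 mm sphere, which is already much larger than the 80-point face crop, consistent with the whole face being aligned by that run.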



Symmetric Filling

Symmetric filling is done to fill holes or missing data from non-frontal views, specifically the profile-posed subjects. A mirrored point cloud is first created from all the points in the original point cloud, with inverted X values (-X), and its points are then inserted into the original point cloud. Symmetric filling is intended only to fill in missing data, not to add further outliers. For this reason, a mirrored point is added only if no point of the original point cloud lies within a threshold distance of it. The threshold was set to 2 geometric points, which corresponds to a distance of 2 mm on the face: if a mirrored point lies within this distance of an existing point, it is not added. The threshold has to be balanced, because a poorly chosen value either makes the point cloud too noisy or leaves missing data unfilled. Symmetric filling is only necessary for the profile point clouds in the Eurecom database. The result of symmetric filling is shown below.

SymmetricFilling.JPG
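
The mirroring rule can be sketched as below. This is a brute-force illustration, assuming the face is already centred so that the symmetry plane is X = 0 and that distances are Euclidean in 3D; the function name is hypothetical, and a real implementation would likely use a spatial index instead of comparing every pair of points.

```python
import math


def symmetric_fill(points, threshold=2.0):
    """Add mirrored points (-x, y, z) only where the original cloud has a hole.

    points: list of (x, y, z) tuples centred on the nose tip (assumption).
    threshold: a mirrored point is skipped if any original point is this close.
    """
    filled = list(points)
    for x, y, z in points:
        mirrored = (-x, y, z)
        # Only compare against the original cloud, not earlier additions.
        if all(math.dist(mirrored, p) > threshold for p in points):
            filled.append(mirrored)
    return filled


cloud = [(10.0, 0.0, 5.0), (20.0, 0.0, 8.0), (-10.5, 0.0, 5.0)]
print(symmetric_fill(cloud))
```

Here only the mirror of the point at x = 20 is added, because the left side of the face already has data near x = -10.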


Mesh Fitting and Resampling

The final step in the preprocessing algorithm is to resample the point cloud data and fit a mesh to it. Resampling converts the point cloud into a Z-axis depth map while also stabilizing the noise in the point cloud data. It also down-samples the point cloud into a much smaller output, which saves time for the facial recognition software and makes it less computationally expensive. The resampling was done using the publicly available code gridfit, which fits the point cloud data with a smooth mesh using an approximation approach. This algorithm was chosen because it allows a constraint to be set on how much smoothing is done: too much smoothing distorts the results of the previous stages, while too little leaves noticeable noise and outliers. It also allows the interpolation method to be changed, which can reduce additional noise without removing any of the important features of the face. Resampling was done within the available minimum and maximum X and Y values, in order to align the face to a 2D grid with the X values as the width and the Y values as the height of the face. Additionally, a background of points was added to the point cloud to fully distinguish the edges of the facial model in the depth map. The overall results of this section are summarised in the images below.


FrontKinect.JPG TiltKinect.JPG FrontWhole.JPG LeftWhole.JPG RightWhole.JPG
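
The idea of turning a point cloud into a depth map on a regular X-Y grid can be sketched as follows. This is not gridfit: it is a much simpler stand-in that averages the depth of the points falling in each grid cell, with no smoothing constraint and no interpolation, and fills empty cells with a background value. The function name, grid size and background value are assumptions.

```python
def resample_to_grid(points, n_cols, n_rows, background=0.0):
    """Average z over an n_rows x n_cols grid spanning the X and Y ranges.

    points: list of (x, y, z) tuples. Empty cells receive `background`.
    Returns a list of rows, each a list of depth values.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    x_span = (max(xs) - x_min) or 1.0  # avoid division by zero
    y_span = (max(ys) - y_min) or 1.0
    sums = [[0.0] * n_cols for _ in range(n_rows)]
    counts = [[0] * n_cols for _ in range(n_rows)]
    for x, y, z in points:
        # Points on the upper edge are clamped into the last cell.
        col = min(int((x - x_min) / x_span * n_cols), n_cols - 1)
        row = min(int((y - y_min) / y_span * n_rows), n_rows - 1)
        sums[row][col] += z
        counts[row][col] += 1
    return [[sums[r][c] / counts[r][c] if counts[r][c] else background
             for c in range(n_cols)] for r in range(n_rows)]


cloud = [(0.0, 0.0, 10.0), (0.0, 0.0, 20.0), (10.0, 10.0, 30.0)]
print(resample_to_grid(cloud, 2, 2))
```

Averaging within cells gives some noise reduction and a fixed output size, which are the two effects the resampling step is described as providing.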

Facial recognition from 3D models

The proposed method for face recognition utilises sparse representation and is designed to be robust under occlusion and different facial expressions.

Method

A dictionary is built for each subject from that subject's training samples. These dictionaries are used to classify a test sample by exploiting the assumption that the samples of each subject lie on a linear subspace.
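
The linear subspace assumption can be illustrated with a simplified classifier. The sketch below is not the sparse representation method itself: it replaces the sparse coding step with a plain least-squares projection onto each subject's dictionary and assigns the test sample to the subject with the smallest residual. All names and the toy data are hypothetical.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _orthonormal_basis(vectors, eps=1e-12):
    """Gram-Schmidt basis for the span of the given vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coeff = _dot(w, b)
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(_dot(w, w))
        if norm > eps:  # skip linearly dependent samples
            basis.append([wi / norm for wi in w])
    return basis


def residual(sample, dictionary):
    """Distance from sample to the subspace spanned by the dictionary."""
    r = list(sample)
    for b in _orthonormal_basis(dictionary):
        coeff = _dot(r, b)
        r = [ri - coeff * bi for ri, bi in zip(r, b)]
    return math.sqrt(_dot(r, r))


def classify(sample, dictionaries):
    """Return the subject whose dictionary best reconstructs the sample."""
    return min(dictionaries, key=lambda s: residual(sample, dictionaries[s]))


dictionaries = {"A": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
                "B": [[0.0, 0.0, 1.0]]}
print(classify([0.9, 0.2, 0.1], dictionaries))
```

A test sample that lies close to the span of subject A's training samples has a small residual for A and a large one for B, which is the property the classification step relies on.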

Results

L1 differences2.PNG

Estimated sparse error.PNG