Face Tracking

Introduction to 3D Face Tracking and Its Applications

Three-dimensional (3D) human face tracking is a generic problem that has received considerable attention in the computer vision community. The main goal of 3D face tracking is to estimate two sets of parameters of human faces from video frames: i) the six Degrees of Freedom (DOF), consisting of the 3D translation and the three axial rotations of a person's head relative to the camera view. Following common usage in the literature, we adopt the terms Yaw (or Pan), Pitch (or Tilt), and Roll for the three axial rotations. The Yaw orientation is obtained when, for example, turning the head from right to left; the Pitch orientation corresponds to moving the head forward and backward; and the Roll orientation corresponds to bending the head from left to right. These 6 DOF are considered the rigid parameters. ii) The non-rigid parameters describe the facial muscle movements, or facial animation, which are usually an early step toward recognizing facial expressions such as happiness, sadness, anger, disgust, surprise, and fear. In practice, the non-rigid parameters are often represented by detecting and tracking facial points, known as fiducial points, feature points, or landmarks in the face processing community.

The word "indexing" in our report means that rigid and non-rigid parameters are estimated from video frames. Our aim is to read a video (or a webcam stream) capturing a single face (when multiple faces are present, the largest one is selected) and to output the rigid and non-rigid parameters for each frame. The non-rigid parameters are difficult to represent directly because their form depends on the application; in our study, they are represented indirectly by localizing or detecting feature points on the face.
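To make the two parameter sets concrete, the following is a minimal sketch of one possible per-frame container for them, written in Python with NumPy. The Euler-angle convention, the rotation composition order, and the choice of representing the non-rigid part as 2D landmarks are illustrative assumptions, not the specific representation used in this report.

```python
# Sketch only: one possible per-frame parameter container (not the tracker itself).
from dataclasses import dataclass, field
import numpy as np

@dataclass
class FaceParameters:
    # Rigid parameters: 6 DOF head pose relative to the camera.
    translation: np.ndarray = field(default_factory=lambda: np.zeros(3))  # (tx, ty, tz)
    yaw: float = 0.0    # pan: turning the head right <-> left
    pitch: float = 0.0  # tilt: moving the head forward <-> backward
    roll: float = 0.0   # bending the head left <-> right
    # Non-rigid parameters, represented here as N x 2 facial landmark positions.
    landmarks: np.ndarray = field(default_factory=lambda: np.empty((0, 2)))

    def rotation_matrix(self) -> np.ndarray:
        """Compose a 3x3 rotation from yaw, pitch, roll (radians).
        Assumes roll about z, yaw about y, pitch about x, applied as Rz @ Ry @ Rx."""
        cy, sy = np.cos(self.yaw), np.sin(self.yaw)
        cp, sp = np.cos(self.pitch), np.sin(self.pitch)
        cr, sr = np.cos(self.roll), np.sin(self.roll)
        Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])   # yaw
        Rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])   # pitch
        Rz = np.array([[cr, -sr, 0], [sr, cr, 0], [0, 0, 1]])   # roll
        return Rz @ Ry @ Rx
```

A tracker would then output one such parameter set per video frame, with the landmark array standing in for whatever non-rigid representation the application requires.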

Face tracking has potential applications in many domains. The most popular is recognizing facial behaviors in order to support automatic understanding of human communication. In this context, a person's visual focus of attention is an important cue to recognize: it is a form of nonverbal communication and an indicative signal in a conversation. For this problem, we first analyze the head pose to determine the direction in which people are likely looking in video sequences, since people tend to focus on someone or something while talking. Furthermore, head movements carry important meaning as a form of gesturing in a conversation. For example, nodding or shaking the head indicates, respectively, understanding or agreement versus misunderstanding or disagreement with what is being said. Emphasized head movements are a conventional way of directing someone to observe a particular object or location. In addition, head pose is intrinsically linked to gaze: it provides a coarse estimate of the gaze direction when the eyes are not visible, for example in low-resolution imagery, very low-bit-rate video, or when the eyes are occluded by sunglasses. Even when the eyes are visible, the head pose helps to predict the gaze direction more accurately. Other head gestures can indicate dissent, confusion, consideration, and so on. Facial animation analysis is also necessary to read what kind of expression people are displaying. Facial expressions occur naturally in human communication and are one of the most cogent means for human beings to infer the attitudes and emotions of other persons in the vicinity. Expression analysis, which requires facial animation detection, is a crucial topic not only in machine vision but also in psychology.

to be continued…