To study infants' preference for movement contingency and face familiarity during a self-recognition task, we built a real-time face-swapper for videos that acts as an accurate and instantaneous imitator. We present a constraint-free face-swapper based on 3D visual tracking that achieves real-time performance through parallel computing. Our imitator system is particularly suited to experiments involving children with Autism Spectrum Disorder, who are often strongly disturbed by the constraints of other methods.
Human neonates detect contingency between their movements and what they see (Rochat, 2009), but they cannot discriminate their own image from that of another infant before 5 months of age. Only by 18 months can they recognize themselves in a mirror. The 6- to 18-month period is thus a decisive developmental stage.
Behavioral studies have shown that 9-month-olds display a preference for familiar faces resembling their own (Sanefuji et al., 2006), and that 5-month-olds show differential visual fixation to a contingent video (Bahrick and Watson, 1985).
We propose to compare the roles of movement contingency and face familiarity in self-recognition in an experiment where an imitator reproduces the subject's head, arm, and body movements with or without delay. The imitator's face may be identical to the subject's face or look different. We therefore developed a face-swapper for videos that detects the face position and orientation of the current subject A, then superimposes the image of a subject B on A's face (fig. 1, where A's and B's faces are identical). To avoid disturbing the subject's behavior or appearance, we did not use special markers; the only installation was a camera. Our constraint-free real-time system integrates existing 3D head-posture trackers with an original face-swapper for videos. The very short delay of the face-swapper was achieved through parallel computing, including General-Purpose computing on Graphics Processing Units (GPGPU). The novelty of this work also lies in its easy calibration.
The overall system (fig. 2) includes a head tracker that determines the head position and orientation of the current subject A, and a face swapper that replaces the face of A with that of subject B. Its calibration uses only frontal face pictures of subjects A and B, and the camera video, as inputs.
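As an illustration, such a calibration could amount to little more than locating the face in each frontal picture. The following minimal sketch assumes OpenCV's stock Haar cascade; the function name locate_frontal_face is ours rather than part of the described system.

```python
import cv2

# Stock OpenCV frontal-face detector; the paper does not specify its
# detector, so this cascade stands in as an assumption.
_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def locate_frontal_face(image_path):
    """Return the (x, y, w, h) box of the largest frontal face in a
    calibration photo of subject A or B, or None if no face is found."""
    image = cv2.imread(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    faces = _cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    # Keep the largest detection, assumed to be the subject.
    return max(faces, key=lambda box: box[2] * box[3])
```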
Devices measuring the head's pose, such as magnetic sensors, link mechanisms, or motion capture, unfortunately alter the subjects' behavior or their natural appearance. Non-invasive alternatives exist, such as faceAPI, sparse-template-matching-based object tracking (Matsubara and Shakunaga, 2004), and CAMSHIFT. However, they are either commercial systems, where the information needed to adapt them for children and extend them to a face-swapper may be inaccessible, or they lack robustness. Matsumoto et al. (2009) propose an estimation of the 6-DOF motion of the face using a single camera, but require the heavy set-up of a personal 3D facial model. Lozano and Otsuka (2009) present a real-time visual tracker using stream processing and a particle filter with a generic 3D model of the face. Our head tracker adopts this approach to estimate the state $x = (T_x, T_y, \dot{T}_x, \dot{T}_y, S, R_x, R_y, R_z, \dot{R}_y, \alpha)$, where $T_x, T_y$ are the translation coordinates of the target object, $\dot{T}_x, \dot{T}_y$ are the velocities along the horizontal $x$ and vertical $y$ axes, $S$ is the scale, $R_x, R_y, R_z$ are the rotations about each axis, $\dot{R}_y$ is the angular velocity of the rotation about the vertical axis $y$, and $\alpha$ is a global illumination variable.
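To make the parameterization concrete, the following sketch represents each particle as a 10-dimensional vector in this state space and applies a constant-velocity prediction step for the translation and yaw components. The dynamics and noise model are illustrative assumptions, not the tuned values of the actual tracker.

```python
import numpy as np

# One particle per row: [Tx, Ty, Tx_dot, Ty_dot, S, Rx, Ry, Rz, Ry_dot, alpha]
STATE_DIM = 10

def predict(particles, dt, noise_scale):
    """Constant-velocity prediction for translation and yaw, plus a random
    walk on the remaining components (illustrative dynamics)."""
    p = particles.copy()
    p[:, 0] += dt * p[:, 2]   # Tx <- Tx + dt * Tx_dot
    p[:, 1] += dt * p[:, 3]   # Ty <- Ty + dt * Ty_dot
    p[:, 6] += dt * p[:, 8]   # Ry <- Ry + dt * Ry_dot
    p += np.random.normal(0.0, noise_scale, p.shape)  # process noise
    return p
```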
Our tracker relies on multi-processing and sparse-template-based particle filtering. Instead of a 3D face model, it uses a simpler ellipsoid model. The real-time constraint was met by running the camera-capture, head-tracking, face-swapping, and result-display threads in parallel. Moreover, the computation of the particle filter was sped up by the use of GPGPU with NVIDIA CUDA.
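The thread layout could be organized as in the following sketch, where the four stages run concurrently and exchange frames through bounded queues; StubTracker and StubSwapper are placeholders for the GPU particle-filter tracker and the face swapper, not the actual implementations.

```python
import queue
import threading
import cv2

class StubTracker:
    """Placeholder for the GPU particle-filter head tracker."""
    def estimate(self, frame):
        return None  # would return the state vector x_A

class StubSwapper:
    """Placeholder for the face swapper."""
    def replace_face(self, frame, pose):
        return frame  # would superimpose B's face at pose x_A

def run_stage(fn, in_q, out_q):
    """Pull an item, process it, and push it to the next stage, forever."""
    while True:
        out_q.put(fn(in_q.get()))

# Bounded queues decouple the stages without accumulating latency.
cam_q, pose_q, out_q = (queue.Queue(maxsize=2) for _ in range(3))
tracker, swapper = StubTracker(), StubSwapper()

def capture():
    cap = cv2.VideoCapture(0)
    while True:
        ok, frame = cap.read()
        if ok:
            cam_q.put(frame)

threading.Thread(target=capture, daemon=True).start()
threading.Thread(target=run_stage,
                 args=(lambda f: (f, tracker.estimate(f)), cam_q, pose_q),
                 daemon=True).start()
threading.Thread(target=run_stage,
                 args=(lambda fp: swapper.replace_face(*fp), pose_q, out_q),
                 daemon=True).start()

while True:  # display on the main thread
    cv2.imshow("imitator", out_q.get())
    if cv2.waitKey(1) == 27:  # Esc quits
        break
```

Keeping the queues small (here, two frames) is what bounds the end-to-end delay: a slow stage drops throughput rather than letting stale frames pile up.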
Once $x_A$, the face position and orientation of A, is detected, an image of B is superimposed on A's face.
Replacement of whole faces in still images has been addressed only recently (Zhu et al., 2009; Bitouk et al., 2008). However, we target videos, which add real-time constraints as well as continuity and movement factors.
Our system first automatically creates a set of replacement faces of subject B and tags each with its position and orientation $x$. The face-swapper thread compares the state parameters $x_A$ with those of B's replacement faces and selects the replacement face with the closest state $x_B$ to superimpose on A's face. To render temporal continuity, the replacement face is interpolated before superimposition, so that the replacement looks dynamic. The result is a system that automatically replaces faces in videos while rendering the dynamic movements of the head.
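Under the assumption that each replacement face of B is tagged with its state vector, the selection and interpolation steps could look like the following sketch; the weighted Euclidean distance and linear blending are illustrative choices, not necessarily those of the actual system.

```python
import numpy as np
import cv2

def select_replacement(x_A, tagged_faces, weights):
    """Pick the replacement face of B whose tagged state x_B is closest
    to the tracked state x_A under a weighted Euclidean distance.
    tagged_faces is a list of (x_B, image) pairs."""
    dists = [np.linalg.norm(weights * (x_A - x_B)) for x_B, _ in tagged_faces]
    _, face = tagged_faces[int(np.argmin(dists))]
    return face

def blend(prev_face, new_face, t):
    """Linearly interpolate between consecutive replacement faces so the
    superimposed face moves smoothly instead of jumping between poses."""
    return cv2.addWeighted(prev_face, 1.0 - t, new_face, t, 0.0)
```

Weighting the pose components more heavily than, say, the illumination term $\alpha$ would keep the selected face aligned with the subject's head even under lighting changes.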
A demonstration video is available at http://www.youtube.com/watch?v=qtYl4o4QoIo.
The real-time constraint was the greatest challenge. The use of GPGPU and parallel processing reduced the delay to 99 ms. In addition, our face-swapper is robust to background distractors such as other faces in the scene (fig. 1). The head tracker detects a wide range of face orientations, with head pitch angles of up to 70 degrees, and is robust to partial occlusion, such as when children bring toys or hands to their mouths or faces.
The system was evaluated against a motion capture system, with an adult subject moving at normal speed.