Face detection, because of its vast array of applications, is one of the most active research areas in computer vision. In this book, we review various approaches to face detection developed in the past decade, with more emphasis on boosting-based learning algorithms. We then present a series of algorithms that are empowered by the statistical view of boosting and the concept of multiple instance learning. We start by describing a boosting learning framework that is capable of handling billions of training examples. It differs from traditional bootstrapping schemes in that no intermediate thresholds need to be set during training, yet the total number of negative examples used for feature selection remains constant and focused (on the poorly performing ones). A multiple instance pruning scheme is then adopted to set the intermediate thresholds after boosting learning. This algorithm generates detectors that are both fast and accurate. We then present two multiple instance learning schemes for face detection, multiple instance learning boosting (MILBoost) and winner-take-all multiple category boosting (WTA-McBoost). MILBoost addresses the uncertainty in accurately pinpointing the location of the object being detected, while WTA-McBoost addresses the uncertainty in determining the most appropriate subcategory label for multiview object detection. Both schemes can resolve the ambiguity of the labeling process and reduce outliers during training, which leads to improved detector performance. In many applications, a detector trained with generic data sets may not perform optimally in a new environment. We propose detector adaptation as a promising solution to this problem. We present an adaptation scheme based on the Taylor expansion of the boosting learning objective function, and we propose to store the second-order statistics of the generic training data for future adaptation. We show that with a small amount of labeled data in the new environment, the detector's performance can be greatly improved. We also present two interesting applications where boosting learning was applied successfully. The first application is face verification for filtering and ranking image/video search results on celebrities. We present boosted multi-task learning (MTL), yet another boosting learning algorithm that extends MILBoost with a graphical model. Since the number of training images available for each celebrity may be limited, learning individual classifiers for each person may cause overfitting. MTL jointly learns classifiers for multiple people by sharing a few boosting classifiers in order to avoid overfitting. The second application addresses the need for speaker detection in conference rooms. The goal is to find who is speaking, given a microphone array and a panoramic video of the room. We show that by combining audio and visual features in a boosting framework, we can determine the speaker's position very accurately. Finally, we offer our thoughts on future directions for face detection.
Table of Contents: A Brief Survey of the Face Detection Literature / Cascade-based Real-Time Face Detection / Multiple Instance Learning for Face Detection / Detector Adaptation / Other Applications / Conclusions and Future Work

Product details

ISBN: 9783031006814
Published: 2010-09-20
Publisher: Springer International Publishing AG
Height: 235 mm
Width: 191 mm
Audience level: Professional/practitioner, P, 06
Language: English
Format: Paperback

Biographical note

Cha Zhang is a Researcher in the Communication and Collaboration Systems Group at Microsoft Research (Redmond, WA). He received the B.S. and M.S. degrees from Tsinghua University, Beijing, China in 1998 and 2000, respectively, both in Electronic Engineering, and the Ph.D. degree in Electrical and Computer Engineering from Carnegie Mellon University in 2004. His current research focuses on applying various machine learning and computer graphics/computer vision techniques to multimedia applications, in particular multimedia teleconferencing. During his graduate studies at CMU, he worked on various multimedia-related projects, including sampling and compression of image-based rendering data, 3D model database retrieval and active learning for database annotation, and peer-to-peer networking. Dr. Zhang has published more than 40 technical papers and holds 10+ U.S. patents. He won the best paper award at ICME 2007, the top 10% award at MMSP 2009, and the best student paper award at ICME 2010. He co-authored a book titled Light Field Sampling, published by Morgan and Claypool in 2006. Dr. Zhang is a Senior Member of IEEE. He was the Publicity Chair for the International Packet Video Workshop in 2002, the Program Co-Chair for the first Immersive Telecommunication Conference (IMMERSCOM) in 2007, the Steering Committee Co-Chair and Publicity Chair for IMMERSCOM 2009, the Program Co-Chair for the ACM Workshop on Media Data Integration (in conjunction with ACM Multimedia 2009), and the Poster & Demo Chair for ICME 2011. He has served as a TPC member for many conferences, including ACM Multimedia, CVPR, ICCV, ECCV, MMSP, ICME, ICPR, and ICWL. He has served as an Associate Editor for the Journal of Distance Education Technologies, IPSJ Transactions on Computer Vision and Applications, and ICST Transactions on Immersive Telecommunications. He was a guest editor for Advances in Multimedia, Special Issue on Multimedia Immersive Technologies and Networking.
Long Quan is a Professor in the Department of Computer Science and Engineering at the Hong Kong University of Science and Technology. He received his Ph.D. in Computer Science from INPL, France, in 1989. Before moving back to Hong Kong in 2001, he was a French CNRS senior research scientist at INRIA in Grenoble. His research interests are focused on 3D reconstruction, structure from motion, vision geometry, and image-based modeling. He has served as an Associate Editor of PAMI (IEEE Transactions on Pattern Analysis and Machine Intelligence) and as a Regional Editor of IVC (Image and Vision Computing Journal). He is currently on the editorial board of IJCV (the International Journal of Computer Vision), ELCVIA (the Electronic Letters on Computer Vision and Image Analysis), MVA (Machine Vision and Applications), and Foundations and Trends in Computer Graphics and Vision. He has served as area chair for ICCV (International Conference on Computer Vision), ECCV (European Conference on Computer Vision), CVPR (IEEE Computer Vision and Pattern Recognition), and ICPR (IAPR International Conference on Pattern Recognition). He was a Program Chair of ICPR 2006 Computer Vision and Image Analysis, and is a General Chair of ICCV 2011 in Barcelona.