Circles are searched for in an input image, and image fragments are formed within the areas corresponding to the circles found. In a cascade formed of classifiers of the same kind, each classifier classifies the input vectors corresponding to the image fragments as face or non-face. This processing is performed for every image of an image pyramid comprising a plurality of images, and the coordinates of the detected faces are calculated from the results of the processing for all the images.
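
The flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the circle search (`find_circles`) and the cascade stages (`stages`) are hypothetical callables supplied by the caller, the pyramid is built by naive 2x subsampling, the fragment is taken as the square bounding each circle, and the final coordinates are obtained simply by rescaling each accepted circle back to the input image, without merging overlapping detections.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]                   # grayscale image as rows of pixel values
Circle = Tuple[int, int, int]               # (center_x, center_y, radius), level coordinates
Stage = Callable[[Sequence[float]], bool]   # True = "face", False = "non-face"
CircleFinder = Callable[[Image], List[Circle]]


def downscale(image: Image, step: int = 2) -> Image:
    """Naive subsampling: keep every `step`-th pixel (assumed pyramid scheme)."""
    return [row[::step] for row in image[::step]]


def build_pyramid(image: Image, min_size: int = 4, step: int = 2) -> List[Tuple[Image, int]]:
    """Return (level_image, scale) pairs; `scale` maps level coordinates to the input."""
    levels, scale = [], 1
    while len(image) >= min_size and len(image[0]) >= min_size:
        levels.append((image, scale))
        image, scale = downscale(image, step), scale * step
    return levels


def fragment_vector(image: Image, circle: Circle) -> List[float]:
    """Flatten the square bounding the circle into an input vector, clipped to the image."""
    cx, cy, r = circle
    rows = image[max(cy - r, 0): cy + r + 1]
    return [p for row in rows for p in row[max(cx - r, 0): cx + r + 1]]


def cascade_accepts(stages: Sequence[Stage], vector: Sequence[float]) -> bool:
    """A fragment counts as a face only if every stage accepts it.

    all() short-circuits, so evaluation stops at the first rejecting stage.
    """
    return all(stage(vector) for stage in stages)


def detect_faces(image: Image, find_circles: CircleFinder,
                 stages: Sequence[Stage]) -> List[Circle]:
    """Run circle search and the cascade on every pyramid level.

    Returns accepted circles expressed in input-image coordinates.
    """
    faces = []
    for level, scale in build_pyramid(image):
        for circle in find_circles(level):
            if cascade_accepts(stages, fragment_vector(level, circle)):
                cx, cy, r = circle
                faces.append((cx * scale, cy * scale, r * scale))
    return faces


if __name__ == "__main__":
    # Toy stand-ins (hypothetical): one circle at the center of each level,
    # and two stages of the same kind that threshold the mean pixel value.
    def center_circle(level: Image) -> List[Circle]:
        return [(len(level[0]) // 2, len(level) // 2, 1)]

    def mean_above(threshold: float) -> Stage:
        return lambda v: bool(v) and sum(v) / len(v) > threshold

    bright = [[1.0] * 8 for _ in range(8)]
    print(detect_faces(bright, center_circle, [mean_above(0.25), mean_above(0.5)]))
    # prints [(4, 4, 1), (4, 4, 2)]
```

In this sketch the same candidate is reported once per pyramid level at which the cascade accepts it; a practical detector would typically combine such overlapping results into a single set of face coordinates.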