Multi-view Image Exploration, Release 1.1
=========================================

This is the source code to the image exploration tool used in the following
publications:

[1] V. Ferrari, T. Tuytelaars, L. Van Gool: Simultaneous Object Recognition
    and Segmentation by Image Exploration, ECCV 2004
[2] V. Ferrari, T. Tuytelaars, L. Van Gool: Integrating Multiple Model Views
    for Object Recognition, CVPR, Vol. 2, pp. 143-153, 2004
[3] A. Thomas, V. Ferrari, B. Leibe, T. Tuytelaars, B. Schiele, L. Van Gool:
    Towards Multi-View Object Class Detection, CVPR, Vol. 2, pp. 1589-1596,
    2006
[4] A. Thomas, V. Ferrari, B. Leibe, T. Tuytelaars, B. Schiele, L. Van Gool:
    Using Multi-view Recognition and Meta-data Annotation to Guide a Robot's
    Attention, International Journal of Robotics Research, 2009,
    doi:10.1177/0278364909340444

The original code (a mix of Matlab and C(++)) was programmed by T. Tuytelaars
and V. Ferrari, and was ported to a full C++ implementation by A. Thomas.

This source code is released "as-is". It may only be used for research
purposes or personal use. For any other use, you should contact the authors.
There is no warranty of fitness for a particular purpose. This code comes
without support: if you have any problems compiling or installing it, you
will have to fix them yourself. If you have any questions besides
compilation errors, you can contact the authors at:
  alexander.thomas@esat.kuleuven.be
  ferrari@vision.ee.ethz.ch

Version history:
1.0 (2007/01): Initial release
1.1 (2009/07): Fixed some issues with recent compilers

INSTALLATION:
-------------
Read the 'INSTALL' file.

USAGE:
------
Once installed, the binaries will reside in the 'bin' directory inside the
installation directory. The main program is 'learnMultiViewMosaic'. Most of
the other binaries perform subparts of the process described in [1, 2, 3].
Most of the programs will print short usage information when run without
arguments.
Some of the programs may be useless test programs; we didn't really clean up
everything before releasing this. Similarly, the source code may be a bit
messy in some parts.

'learnMultiViewMosaic' searches for dense multi-view correspondences as
described in [3]. The arguments should be a set of images containing
different views of the same object(s). File names should be of the form
objectName##suffix, where ## is the view ID and the suffix contains no
digits. You can process multiple objects at the same time by using different
prefixes. For instance, the file names could be:
  object1-001.png object1-002.png object1-003.png object2-00.png object2-01.png

Explicit segmentation masks can be provided in a subfolder 'maps' in the
same directory as the images (masks must not be given as arguments to the
program; they are detected automatically). Masks must be in PNG format,
either 8-bit grayscale or RGB color, without an alpha channel. For an image
called 'image.ext', the mask must be called 'image-map.png', and its
dimensions must match those of the image. Foreground is indicated by white
in the mask image, background by black. If no segmentation mask is found,
the program will try to detect a uniform background color and use it to
segment figure from background.

The program produces a lot of output files, most of which are intermediate
results that can be discarded after processing has completed. When aborted
and restarted, the program checks for intermediate results and uses them
instead of recalculating (unless the -f option is given). It is assumed that
consecutive view IDs are adjacent views, and by default only the 2 nearest
views in each direction of the current view are matched (this can be changed
with the -s option). A few other options are available to control parameters
of the process: see the program's default output. The defaults should be
good for images in which the object occupies an area of about 500x500
pixels.
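To make the objectName##suffix convention concrete, here is a small C++ sketch of how such a file name can be split into an object prefix and a view ID. The helper 'parseViewName' is hypothetical and not part of the released code; it only mirrors the rule stated above (the view ID is the last run of digits, and the suffix after it contains none).

```cpp
// Illustrative only: split "objectName##suffix" into an object prefix and a
// numeric view ID. Hypothetical helper, not part of the released code.
#include <cctype>
#include <string>

struct ViewName {
  std::string object;  // e.g. "object1-" for "object1-002.png"
  int viewID;          // e.g. 2
  bool valid;          // false if the name contains no digits
};

ViewName parseViewName(const std::string& fileName) {
  // Skip the digit-free suffix: scan backwards to the last digit.
  std::string::size_type end = fileName.size();
  while (end > 0 && !std::isdigit(static_cast<unsigned char>(fileName[end - 1])))
    --end;
  // Collect the run of digits that forms the view ID.
  std::string::size_type start = end;
  while (start > 0 && std::isdigit(static_cast<unsigned char>(fileName[start - 1])))
    --start;
  if (start == end)
    return ViewName{"", -1, false};  // no digits: not a valid view name
  return ViewName{fileName.substr(0, start),
                  std::stoi(fileName.substr(start, end - start)), true};
}
```

Note that a digit inside the object name itself (as in 'object1-') is harmless, since only the last run of digits before the suffix is taken as the view ID.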
The final output is provided in a 'model_objectName.tracks' file, one such
file per object. This file contains a list of all the 'tracks' found on the
object, where a track is a set of regions that correspond across the
different views. The structure of a .tracks file is as follows:

MVT [version number]
[number of views N]
[number of tracks M]
[int trackLength] [int viewID] [int type] [double[2] pt0] [double[2] pt1]
                  [double[2] pt2] [double[2] maj] [double[2] min]
                  [int viewID] [int ....

A track consists of at least two regions, where viewID is the ID of the
image in which the region exists. 'type' is a number indicating the type of
region, which will always be 21 (ellipse), unless the parallelogram regions
were activated. The remaining data describe the shape of the region: 'pt0'
is the center of the region; 'pt1' and 'pt2' are the transformed coordinates
of the points (1,0) and (0,1) if the region is considered an affine
transformation of the unit circle. 'maj' and 'min' are intended to be the
points on the major, resp. minor axis of the ellipse, but these values are
not calculated by the program, so they should be ignored.

There is no binary for the object recognition procedure described in [1, 2],
because it was not needed for [3]. Since it uses the same building blocks as
the image exploration procedure, it should be straightforward to implement,
using learnmvmosaic.cpp as a starting point.