
Workshop: From features to actions - Unifying perspectives in computational and robot vision


Detailed program
8.30
Welcome (D. Kragic & V. Kyrki)
8.50
James Little: "Maps, Places, and Worlds for Robots"   (abstract)
9.20
Zoran Zivkovic, Ben Kröse: "Part Based People Detection on a Mobile Robot" (abstract)
9.45
Christophe Doignon et al: "Model-based 3-D Pose Estimation and Feature Tracking for Robot Assisted Surgery with Medical Imaging"  (abstract)


10.10
Coffee break


10.45
David Hogg: "Reasoning and Vision"  (abstract)
11.10
Ruben Smits et al: "Image-Based Visual Servoing with Extra Task Related Constraints in a General Framework for Sensor-Based Robot Systems"  (abstract)
11.35
Norbert Krüger et al: "Early Reactive Grasping with Second Order 3D Feature Relations" (abstract)
12.05
Dov Katz, Oliver Brock: "Interactive Perception: Closing the Gap Between Action and Perception"  (abstract)


12.30
Lunch


14.00
Jiri Matas, Jan Sochman: "Wald’s Sequential Analysis for Time-constrained Vision Problems"   (abstract)
14.30
Gabe Sibley et al: "Constant Time Sliding Window Filter SLAM as a Basis for Metric Visual Perception"  (abstract)
14.55
Simon Lacroix et al: "More Vision for SLAM"  (abstract)
15.20
A. C. Murillo et al: "Topological and Metric Robot Localization through Computer Vision Techniques"  (abstract)


15.45
Coffee break


16.20
Darius Burschka: "Towards Robust Vision-Based Navigation Systems"  (abstract)
16.35
Tamim Asfour et al: "Perceiving Objects and Movements to Generate Actions on a Humanoid Robot"  (abstract)
17.00
Open floor
17.55
Closing






James Little
 "Maps, Places, and Worlds for Robots" 
(Computer Science, University of British Columbia, Vancouver, BC, Canada)

Vision is a powerful sense that permits a robot to look around itself and gather information both about the immediate present and the near future. The future arrives through the more distant physical space through which the robot can move, and the possible actions and events that may arise. To know the future, a robot needs to parse the world with the aid of its models and experience. Lasers and other active sensors have proven their ability to provide accurate geometric information. But the context and meaning of the space surrounding the robot, the objects and the actions they permit, are only accessible with the more complete sensory input of vision. Vision as a sensor is computationally demanding, but resources have improved to make it practical. Moreover, there has been a convergence of the interests of roboticists and vision scientists: both want to explore and act in the world. Many of us have accepted that we must learn the patterns of data using machine learning, but we must also integrate categorical descriptions of our world, prototypical information that no individual robot is yet capable of learning. Vision provides the anchoring for concepts. I will discuss recent advances and trends linking vision and robotics through spatial descriptions and the connections with objects, actions, and meaning.



Zoran Zivkovic and Ben Kröse
"Part Based People Detection on a Mobile Robot"
(University of Amsterdam, The Netherlands)

We design a robust people detection module for a mobile robot, inspired by the latest results on part-based representations from the computer vision community. The approach is based on the probabilistic combination of fast human body part detectors. The representation is robust to partial occlusions, part-detector false alarms and missed detections of body parts. Furthermore, we show how to use the fact that people walk on a known floor plane to detect them more reliably and efficiently. Finally, we show how our framework can be used to combine information from different sensors.
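
The abstract does not give implementation details, but the two core ideas lend themselves to a compact illustration. The Python sketch below (all function names, parameters and the level pinhole-camera assumption are ours, not the authors') shows a naive-Bayes-style fusion of independent part-detector scores and the floor-plane constraint that predicts how tall a person standing at a given image row should appear:

    import numpy as np

    # Illustrative sketch only: names and numbers are assumptions,
    # not details taken from the paper.

    def fuse_part_scores(part_log_likelihoods, part_priors):
        """Naive-Bayes-style combination of independent body-part detectors."""
        return sum(ll + np.log(p)
                   for ll, p in zip(part_log_likelihoods, part_priors))

    def expected_height_px(foot_row, horizon_row, focal_px,
                           camera_height_m, person_height_m=1.75):
        """Expected person height in pixels if the feet rest on the floor.

        For a level pinhole camera at height h above the floor, a floor
        point imaged dz pixels below the horizon lies at depth
        Z = f * h / dz, so a person there appears f * H / Z pixels tall."""
        dz = foot_row - horizon_row
        if dz <= 0:        # feet above the horizon: not on the floor
            return None
        depth = focal_px * camera_height_m / dz
        return focal_px * person_height_m / depth

A candidate detection whose bounding-box height disagrees strongly with the height predicted from its foot row can then be rejected or down-weighted.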


C. Doignon, F. Nageotte, B. Maurin and A. Krupa
"Model-based 3-D Pose Estimation and Feature Tracking for Robot Assisted Surgery with Medical Imaging"
(Louis Pasteur University of Strasbourg, ENSPS, Illkirch, France,
Cerebellum Automation Company, Chavanod, France and
IRISA - INRIA Rennes, France)


In this paper we address the problem of pose estimation based on multiple geometrical features, for monocular endoscopic vision with laparoscopes and for stereotaxy with CT scanners. Partial and full pose estimation (6 dofs) are considered, with applications to minimally invasive surgery. At the University of Strasbourg, we have been developing a set of techniques for assisting surgeons in navigating and manipulating the three-dimensional space within the human body. In order to develop such systems, a variety of challenging visual tracking and registration problems with pre-operative and/or intra-operative images must be solved. This paper integrates several issues where computational vision can play a role. Depth recovery (from the tip of a surgical instrument w.r.t. living tissue), the Plücker coordinates (4 dofs) of a markerless cylindrical instrument, the 6 dofs of a needle-holder with a heterogeneous set of features, and stereotaxy are the examples we describe. Projective invariants with perspective projection, quadrics of revolution and stereotactic markers are features which are useful to achieve the registration with uncalibrated or calibrated devices. Visual servoing-based tracking methods have been developed for image-guided robotic systems, for assisting surgeons in laparoscopic surgery and in interventional radiology. Real-time endoscopic vision and single-slice stereotactic registration have been proposed to retrieve out-of-field-of-view instruments, to position a needle, and to compensate for small displacements such as those due to patient breathing or any small disturbances which may occur during an image-guided surgical procedure.
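
As background to the Plücker-coordinate parameterization mentioned above (and not as the paper's estimation method), here is a minimal Python sketch of the representation itself, with illustrative function names:

    import numpy as np

    # Background sketch: the Plücker representation of a 3-D line,
    # e.g. the axis of a cylindrical instrument (not the paper's algorithm).

    def plucker_line(p1, p2):
        """Return (u, m): unit direction u and moment m = p1 x u.

        m is independent of the point chosen on the line and satisfies
        u . m = 0, leaving the 4 degrees of freedom of a 3-D line."""
        u = p2 - p1
        u = u / np.linalg.norm(u)
        return u, np.cross(p1, u)

    def point_to_line_distance(p, u, m):
        """Distance from a 3-D point p to the line (u, m)."""
        return np.linalg.norm(np.cross(p, u) - m)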


David Hogg
"Reasoning and Vision"
(School of Computing, University of Leeds, UK)

Representations and mechanisms of conceptual reasoning have traditionally been at the heart of research on artificial intelligence, and have generally (but not always) been absent from approaches to computer vision dealing with natural images and video.
The talk will examine recent work that attempts to integrate conceptual reasoning with computer vision to solve problems in tracking, object recognition, behaviour analysis, and human-machine interaction. Two important issues are the forms of representation used and the role of machine learning in the development of adaptive systems.



Ruben Smits, Duccio Fioravanti, Tinne De Laet,
Benedetto Allotta, Herman Bruyninckx and Joris De Schutter
"Image-Based Visual Servoing with Extra Task Related Constraints in a General Framework for Sensor-Based Robot Systems"
(Department of Mechanical Engineering, Katholieke Universiteit Leuven, Belgium and
Department of Energetics “Sergio Stecco”, Universita degli Studi di Firenze, Italy)

This paper reformulates image-based visual servoing (IBVS) as a constraint-based robot task, in order to integrate it seamlessly with other task constraints, in image space, in Cartesian space, in the joint space of the robot, or in the “image space” of any other sensor (e.g. force, distance). In this way, data from the different sensors are fused. The integration takes place via the specification of generic “feature coordinates”, defined in the different task spaces. Control loops are closed around the feature coordinate setpoints in each of these task spaces, and instantaneously combined into setpoints for a velocity-controlled robot that executes the task. The paper describes real-world experimental results for image-based visual tracking with extra Cartesian constraints. During the workshop, many more examples will be given, with constraints in all the different task spaces.
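
As a rough illustration of the constraint-combination idea (the names below and the damped least-squares solver are our assumptions, not necessarily the authors' formulation), constraints from several task spaces can be stacked and resolved into one joint-velocity command:

    import numpy as np

    # Minimal sketch in the spirit of the framework described above;
    # all names are hypothetical.

    def resolve_joint_velocities(jacobians, features, setpoints, gains,
                                 damping=1e-6):
        """Combine constraints from several task spaces into one
        joint-velocity command.

        jacobians : list of (m_i x n) matrices mapping joint velocities to
                    feature-coordinate velocities in each task space
        features, setpoints, gains : current values, targets and
                    proportional gains per task space
        """
        J = np.vstack(jacobians)
        # proportional control loop closed around each feature setpoint
        e_dot = np.concatenate([k * (sp - f)
                                for k, sp, f in zip(gains, setpoints,
                                                    features)])
        n = J.shape[1]
        return np.linalg.solve(J.T @ J + damping * np.eye(n), J.T @ e_dot)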


Daniel Aarno, Johan Sommerfeld, Danica Kragic, Nicolas Pugeault, Sinan Kalkan, Florentin Wörgötter, Dirk Kraft, Norbert Krüger
"Early Reactive Grasping with Second Order 3D Feature Relations"
(Royal Institute of Technology, Sweden,
University of Edinburgh, UK
University of Göttingen, Germany
Syddansk University and Aalborg University, Denmark)

One of the main challenges in the field of robotics is to make robots ubiquitous. To intelligently interact with the world, such robots need to understand the environment and situations around them and react appropriately: they need context-awareness. But how can robots be equipped to gather and interpret the information necessary for novel tasks through interaction with the environment, given only minimal prior knowledge? This has been a long-standing question and one of the main drives in the field of cognitive system development.
The main idea behind the work presented in this paper is that the robot should, like a human infant, learn about objects by interacting with them, forming representations of the objects and their categories that are grounded in its embodiment. For this purpose, we study an early learning process for object grasping in which the agent acts on the basis of a set of innate reflexes and knowledge about its embodiment. We stress that this is not work on grasping as such; it is a system that interacts with the environment based on relations between 3D visual features generated through a stereo vision system. We show how geometry, appearance and spatial relations between the features can guide early reactive grasping, which can later be used in a more purposive manner when interacting with the environment.
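
A minimal Python sketch of what pairwise, second-order relations between 3D features can look like (the specific relations and names here are illustrative assumptions, not the paper's exact definitions):

    import numpy as np

    # Illustrative sketch: second-order (pairwise) relations between two
    # 3-D edge features, each given as (position, unit direction).

    def second_order_relations(f1, f2):
        (p1, d1), (p2, d2) = f1, f2
        distance = np.linalg.norm(p2 - p1)
        # angle between the two edge orientations (direction sign ignored)
        angle = np.arccos(np.clip(abs(d1 @ d2), 0.0, 1.0))
        # ~0 when the two lines are coplanar (intersecting or parallel)
        coplanarity = abs((p2 - p1) @ np.cross(d1, d2))
        return distance, angle, coplanarity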



Dov Katz and Oliver Brock
 "Interactive Perception: Closing the Gap Between Action and Perception"
(Robotics and Biology Laboratory, Department of Computer Science, University of Massachusetts Amherst)

We introduce Interactive Perception as a new perceptual paradigm for autonomous robotics in unstructured environments. Interactive perception augments the process of perception with physical interactions, thus integrating robotics and computer vision. By integrating interactions into the perceptual process, it is possible to manipulate the environment so as to uncover information relevant for the robust and reliable execution of a task. Examples of such interactions include the removal of obstructions or object repositioning to improve lighting conditions. More importantly, forceful interaction can uncover perceptual information that would otherwise be imperceptible. In this paper, we begin to explore the potential of the interactive perception paradigm. We present an interactive perceptual primitive that extracts kinematic models from objects in the environment. Many objects in everyday environments, such as doors, drawers, and hand tools, contain inherent kinematic degrees of freedom. Knowledge of these degrees of freedom is required to use the objects in their intended manner. We demonstrate how a robot is capable of extracting a kinematic model from a variety of tools, using very simple algorithms. We then show how the robot can use the resulting kinematic model to operate the tool. The simplicity of these algorithms and their effectiveness in our experiments indicate that Interactive Perception is a promising perceptual paradigm for autonomous robotics.
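
As a rough illustration of the idea (not the authors' algorithm), a planar joint type can be inferred from how a tracked feature on one rigid body moves relative to the other: prismatic joints leave straight tracks, revolute joints leave circular arcs. A sketch with assumed names and tolerances:

    import numpy as np

    def classify_joint(track, tol=1e-2):
        """Classify a planar joint from a feature track (N x 2) expressed
        in the frame of the other link (illustrative sketch)."""
        # line fit via SVD: residual is the spread along the minor axis
        centered = track - track.mean(axis=0)
        s = np.linalg.svd(centered, compute_uv=False)
        line_residual = s[-1] / np.sqrt(len(track))

        # algebraic circle fit: x^2 + y^2 + a*x + b*y + c = 0
        x, y = track[:, 0], track[:, 1]
        A = np.column_stack([x, y, np.ones_like(x)])
        (a, b, c), *_ = np.linalg.lstsq(A, -(x**2 + y**2), rcond=None)
        cx, cy = -a / 2, -b / 2
        r = np.sqrt(max(cx**2 + cy**2 - c, 0.0))
        circle_residual = np.abs(np.hypot(x - cx, y - cy) - r).mean()

        if line_residual < tol:
            return "prismatic"
        if circle_residual < tol:
            return "revolute"
        return "unknown"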



Jiri Matas and Jan Sochman
"Wald’s Sequential Analysis for Time-constrained Vision Problems"
(Center for Machine Perception, Dept. of Cybernetics, Faculty of Electrical Engineering,
Czech Technical University in Prague, Czech Republic)


In detection and matching problems in computer vision, both classification errors and time to decision characterize the quality of an algorithmic solution. We show how to formalize such problems in the framework of sequential decision-making and derive quasi-optimal time-constrained solutions for three vision problems. The methodology is applied to face and interest point detection and to the RANSAC robust estimator. Error rates of the proposed face detection algorithm are comparable to state-of-the-art methods. In the interest point application, the output of the Hessian-Laplace detector is approximated by a sequential WaldBoost classifier which is about five times faster than the original, with comparable repeatability. A sequential strategy based on Wald’s SPRT for evaluation of model quality in RANSAC leads to significant speed-ups in geometric matching problems.
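
For background, Wald's sequential probability ratio test itself is compact enough to sketch (a generic illustration with assumed parameter names; the paper builds WaldBoost classifiers and a RANSAC test on top of this idea):

    import math

    def sprt(llr_stream, alpha=1e-3, beta=0.05):
        """Wald's SPRT: accumulate log-likelihood ratios
        log p(x|H1)/p(x|H0) and stop as soon as a decision with target
        error rates alpha (false positive) and beta (false negative) is
        possible; Wald's thresholds are A = (1-beta)/alpha and
        B = beta/(1-alpha)."""
        upper = math.log((1 - beta) / alpha)
        lower = math.log(beta / (1 - alpha))
        s = 0.0
        for llr in llr_stream:
            s += llr
            if s >= upper:
                return "accept H1"
            if s <= lower:
                return "accept H0"
        return "undecided"  # measurement budget exhausted

The appeal for time-constrained vision is that easy inputs are decided after very few measurements, while the error rates stay bounded by design.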


Gabe Sibley, Larry Matthies and Gaurav Sukhatme
"Constant Time Sliding Window Filter SLAM
as a Basis for Metric Visual Perception"
(Jet Propulsion Laboratory, California Institute of Technology, Pasadena California and
Robotic and Embedded Systems Laboratory, University of Southern California, Los Angeles, California)


This paper describes a Sliding Window Filter (SWF) that is an on-line, constant-time approximation to the feature-based 6-degree-of-freedom full batch least-squares Simultaneous Localization and Mapping (SLAM) problem. The ultimate goal is to develop a filter that can quickly and optimally fuse all data from a sequence of (stereo) images into a single, statistically accurate and precise spatial representation. Such a capability is highly desirable for mobile robots, though it is a computationally intense, dense data-fusion problem. The SWF is useful in this context because it can scale from exhaustive batch solutions to fast incremental solutions. For instance, if the window encompasses all time, the solution is algebraically equivalent to full SLAM; if only one time step is maintained, the solution is algebraically equivalent to the Extended Kalman Filter SLAM solution; if robot poses and environment landmarks are slowly marginalized out over time such that the state vector ceases to grow, then the filter becomes constant-time, like Visual Odometry. Interestingly, the SWF enables other properties, such as continuous submapping, lazy data association, undelayed or delayed landmark initialization, and incremental robust estimation. We test the algorithm in simulations using stereo vision exteroceptive sensors and inertial measurement proprioceptive sensors. Initial experiments show qualitatively that the SWF approaches the performance of the optimal batch estimator, even for small windows on the order of 5-10 frames.
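
For background (this is the standard mechanism rather than a detail quoted from the paper), sliding the window amounts to marginalizing old poses and landmarks out of the Gauss-Newton normal equations H dx = b via a Schur complement, which folds their information into a dense prior on the remaining states:

    import numpy as np

    # Generic illustration; variable names are not from the paper.

    def marginalize_first_k(H, b, k):
        """Remove the first k state variables from H dx = b, keeping
        their information as a dense prior on the remaining states."""
        H11, H12 = H[:k, :k], H[:k, k:]
        H21, H22 = H[k:, :k], H[k:, k:]
        b1, b2 = b[:k], b[k:]
        K = H21 @ np.linalg.inv(H11)
        return H22 - K @ H12, b2 - K @ b1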



Simon Lacroix, Thomas Lemaire and Cyrille Berger
"More Vision for SLAM"
(LAAS-CNRS, Toulouse, France)

Much progress has been made on SLAM so far, and various visual SLAM approaches have proven their effectiveness in realistic scenarios. However, there are many other improvements that vision can bring to SLAM, essentially in the mapping and data-association functionalities. This paper sketches some of these improvements, on the basis of recent work in the literature and of ongoing work.


A. C. Murillo, J. J. Guerrero and C. Sagues 
"Topological and Metric Robot Localization through Computer Vision Techniques"
(DIIS - I3A, University of Zaragoza, Spain)

Vision-based robotics applications have been widely studied in recent years. However, there is still a certain distance between these and pure computer vision methods, although there are many issues of common interest to computer vision and robotics. For example, object recognition and scene recognition are closely related, which makes object recognition methods quite suitable for robot topological localization, e.g. room recognition. Another important issue in computer vision, the structure-from-motion (SFM) problem, is similar to the Simultaneous Localization and Mapping problem. This work builds on previous work in which computer vision techniques are applied to robot self-localization: a vision-based method for room recognition and an approach to obtain metric localization from SFM algorithms for bearing-only data. Several experiments are shown for both kinds of localization, room identification and metric localization, using different image features and data sets from conventional and omnidirectional cameras.

Darius Burschka 
"Towards Robust Vision-Based Navigation Systems"

(Lab for Robotics and Real-Time Systems, Department of Informatics, Technische Universität München, Germany)


We present our work in the field of vision-based navigation from video sequences and discuss our current challenges in this field. We discuss the advantages and disadvantages of the different navigation techniques. Our challenge is to develop a low-cost navigation system based on video cameras that can be scaled depending on the required accuracies and available resources. A minimal configuration consists of a single video camera, which can be extended with inertial units, laser systems and additional cameras. We include experimental validations of the proposed system. The algorithm does not suffer from the range limitations present in similar, sampling-based algorithms.


T. Asfour, K. Welke, A. Ude, P. Azad, J. Hoeft and R. Dillmann
 "Perceiving Objects and Movements to Generate Actions on a Humanoid Robot"
(University of Karlsruhe, Germany and Jozef Stefan Institute, Slovenia)


Imitation learning has been suggested as a promising way to teach humanoid robots. In this paper we present a new humanoid active head which features human-like characteristics in motion and response and mimics the human visual system. We present algorithms that can be applied to perceive objects and movements, which form the basis for learning actions on the humanoid. For action representation we use an HMM-based approach to reproduce the observed movements and build an action library. Hidden Markov Models (HMMs) are used to represent movements demonstrated to a robot multiple times. They are trained with the characteristic features (key points) of each demonstration. We propose strategies for adapting movements to the given situation and for interpolating between movements stored in a movement library.
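
A minimal sketch of the HMM part, assuming one Gaussian HMM per library movement (the library choice, state count and feature dimensionality below are our assumptions, not details from the paper):

    import numpy as np
    from hmmlearn.hmm import GaussianHMM  # one possible HMM library

    # Stand-in key-point sequences (e.g. 3-D hand positions) from five
    # demonstrations of the same movement.
    demos = [np.random.randn(50, 3) for _ in range(5)]
    X = np.vstack(demos)
    lengths = [len(d) for d in demos]

    # one model per movement in the action library
    model = GaussianHMM(n_components=8, covariance_type="diag", n_iter=100)
    model.fit(X, lengths)

    # recognition: score a new observation sequence against a library entry
    log_likelihood = model.score(np.random.randn(50, 3))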




Organizers and contact
Danica Kragic
Centre for Autonomous Systems
Royal Institute of Technology
Sweden

Ville Kyrki
Department of Information Technology
Lappeenranta University of Technology
Finland

Contact:
Danica Kragic
CAS/CVAP - CSC, KTH
10044 Stockholm, Sweden
Phone: +468 7906729, Fax: +468 7230302

For any further information, please contact the organizers at (dani at kth dot se)