I'm a PhD student at the Royal Institute of Technology (KTH), Stockholm, Sweden, supervised by Prof. Patric Jensfelt and Prof. John Folkesson, working in the Computer Vision and Active Perception research group. My research interests include long-term mapping (typically weeks or months), object detection and machine learning. Prior to my PhD I worked on virtual/augmented reality and autonomous cars.
From June to November 2016 I am doing an internship at Robert Bosch North America, working on the semantic segmentation of large point clouds.
Robotics Perception and Learning (RPL) KTH,
Room 609, Teknikringen 14,
SE-100 44, Stockholm, Sweden
PhD student in the Robotics Perception and Learning group • June 2013 - Dec 2017
MSc, Smart Systems (Robotics) • Sep 2006 - Aug 2008
BSc, Electrical Engineering and Computer Science • Sep 2003 - Jun 2006
Machine Learning Research Scientist • Jan 2018 - Present
Research Scientist in the Machine Learning group at TRI working on deep learning for robotic perception, including self-supervised learning for depth and ego-motion prediction, 2D/3D object detection and semantic segmentation.
Robotics research intern • Jun 2016 - Nov 2016
Intern in the Robotics research group, working on the semantic segmentation of indoor point clouds (room-level segmentation as well as semantic labeling).
Project manager • Sep 2010 - May 2013
Project manager in the Defence Robotics group. The work covered the complete project life cycle, from proposal and requirements through planning, tracking, development and delivery, both for internally funded prototypes and for client-driven custom solutions.
Systems and Robotics Engineer • Sep 2008 - Aug 2010
Worked in the Ground Segments and Systems Group, assigned to various context- and location-sensitive Augmented Reality (AR) projects for the European Space Agency (ESA), notably WEarable Augmented Reality (WEAR) and the Portable Virtual Assembly Integration Testing Visualizer (PVAITV).
Intern • Jun 2005 - Aug 2005
Internship on the Semantic TeX (STeX) package. Goal: implementing a converter from LaTeX to CNXML using STeX. Area of work: defining and implementing new macros for the STeX package.
We present an automatic approach for the task of reconstructing a 2D floor plan from unstructured point clouds of building interiors. Our approach emphasizes accurate and robust detection of building structural elements and, unlike previous approaches, does not require prior knowledge of scanning device poses. The reconstruction task is formulated as a multiclass labeling problem that we approach using energy minimization. We use intuitive priors to define the costs for the energy minimization problem, and rely on accurate wall and opening detection algorithms to ensure robustness. We provide detailed experimental evaluation results, both qualitative and quantitative, against state-of-the-art methods and labeled ground truth data. Our method outperforms related approaches on the majority of the data we have tested.
Point cloud semantic segmentation, energy minimization
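For intuition, here is a toy sketch of multiclass labeling via energy minimization on a regular grid, using a simple Potts pairwise term and an iterated conditional modes (ICM) optimizer. The costs and the optimizer below are illustrative placeholders, not the priors or solver used in the paper.

    import numpy as np

    def icm(unary, pairwise_weight=1.0, iters=10):
        """unary: (H, W, K) per-cell label costs; lower is better."""
        H, W, K = unary.shape
        labels = unary.argmin(axis=2)  # initialize each cell with its best unary label
        for _ in range(iters):
            for y in range(H):
                for x in range(W):
                    cost = unary[y, x].copy()
                    # Potts smoothness term: penalize disagreeing with 4-neighbors
                    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < H and 0 <= nx < W:
                            cost += pairwise_weight * (np.arange(K) != labels[ny, nx])
                    labels[y, x] = cost.argmin()
        return labels

    unary = np.random.rand(32, 32, 3)  # e.g. 3 hypothetical classes: wall / opening / free space
    print(icm(unary).shape)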
In this work we address the problem of dynamic object segmentation in office environments. We make no prior assumptions on what is dynamic and static, and our reasoning is based on change detection between sparse and non-uniform observations of the scene. We model the static part of the environment, and we focus on improving the accuracy and quality of the segmented dynamic objects over long periods of time. We address the issue of adapting the static structure over time and incorporating new elements, for which we train and use a classifier whose output gives an indication of the dynamic nature of the segmented elements. We show that the proposed algorithms improve the accuracy and the rate of detection of dynamic objects through comparison with a labelled dataset.
Dynamic object segmentation, long-term mapping, SVM
We present a system for creating object models from RGB-D views acquired autonomously by a mobile robot. We create high-quality textured meshes of the objects by approximating the underlying geometry with a Poisson surface. Our system employs two optimization steps, first registering the views spatially based on image features, and second aligning the RGB images to maximize photometric consistency with respect to the reconstructed mesh. We show that the resulting models can be used robustly for recognition by training a Convolutional Neural Network (CNN) on images rendered from the reconstructed meshes. We perform experiments on data collected autonomously by a mobile robot both in controlled and uncontrolled scenarios. We compare quantitatively and qualitatively to previous work to validate our approach.
Autonomous object detection, segmentation, registration and detection.
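As a simplified illustration of the first, feature-based registration step, the sketch below estimates the relative pose between two grayscale views from ORB feature matches using OpenCV (cv2). The system itself registers RGB-D views and follows up with photometric refinement against the reconstructed mesh, neither of which is shown here.

    import cv2
    import numpy as np

    def register_pair(img1, img2, K):
        """Relative pose (R, t up to scale) between two grayscale views."""
        orb = cv2.ORB_create(2000)
        k1, d1 = orb.detectAndCompute(img1, None)
        k2, d2 = orb.detectAndCompute(img2, None)
        matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
        p1 = np.float32([k1[m.queryIdx].pt for m in matches])
        p2 = np.float32([k2[m.trainIdx].pt for m in matches])
        # Robustly estimate the essential matrix, then decompose it into R, t
        E, inliers = cv2.findEssentialMat(p1, p2, K, method=cv2.RANSAC)
        _, R, t, _ = cv2.recoverPose(E, p1, p2, K, mask=inliers)
        return R, t

With depth available, the translation scale is recovered directly, and the initial alignment can then be refined photometrically.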
In this article we present and evaluate a system which allows a mobile robot to autonomously detect, model and re-recognize objects in everyday environments. Whilst other systems have demonstrated one of these elements, to our knowledge we present the first system which is capable of doing all of these things, all without human interaction, in normal indoor scenes. Our system detects objects to learn by modelling the static part of the environment and extracting dynamic elements. It then creates and executes a view plan around a dynamic element to gather additional views for learning. Finally these views are fused to create an object model. The performance of the system is evaluated on publicly available datasets as well as on data collected by the robot in both controlled and uncontrolled scenarios.
Autonomous object modelling, segmentation of dynamic objects, next-best-view planning.
In this paper, a system for incrementally building and maintaining a database of 3D objects for robots with long run times is presented. The system is a step towards allowing robots to maintain and learn throughout their life cycle. The proposed solution iteratively fuses observations as they arrive into better and better models. Because the system is allowed to fuse data greedily, mistakes can be made. The system continuously seeks to detect and remove such errors, without the need for batch updates using all known data at once.
Lifelong incremental object registration.
We present a novel method for clustering segmented dynamic parts of indoor RGB-D scenes across repeated observations by performing an analysis of their spatial-temporal distributions. We segment areas of interest in the scene using scene differencing for change detection. We extend the Meta-Room method and evaluate the performance on a complex dataset acquired autonomously by a mobile robot over a period of 30 days. We use an initial clustering method to group the segmented parts based on appearance and shape, and we further combine the clusters we obtain by analyzing their spatial-temporal behaviors. We show that using the spatial-temporal information further increases the matching accuracy.
Spatial-temporal modelling, dynamic object segmentation
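A minimal sketch of the two-stage idea, assuming precomputed appearance descriptors and segment centroids/timestamps; the actual features, distances and cluster counts in the paper differ.

    import numpy as np
    from sklearn.cluster import AgglomerativeClustering

    rng = np.random.default_rng(0)
    appearance = rng.random((50, 128))  # appearance/shape descriptor per segment
    positions = rng.random((50, 3))     # segment centroids in the map frame
    times = rng.random((50, 1))         # observation timestamps

    # Stage 1: initial grouping on appearance and shape alone
    labels = AgglomerativeClustering(n_clusters=10).fit_predict(appearance)

    # Stage 2: merge clusters whose members behave similarly in space and time,
    # crudely approximated here by re-clustering the cluster means over (x, y, z, t)
    feats = np.hstack([positions, times])
    means = np.vstack([feats[labels == c].mean(axis=0) for c in np.unique(labels)])
    merged = AgglomerativeClustering(n_clusters=5).fit_predict(means)
    final = merged[labels]  # final cluster id per segment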
We present a novel approach to mobile robot search for non-stationary objects in partially known environments. We formulate the search as a path planning problem in an environment where the probability of object occurrences at particular locations is a function of time. We propose to explicitly model the dynamics of the object occurrences by their frequency spectra. Using the environment model proposed, our path planning algorithm can construct plans that reflect the likelihoods of object locations at the time the search is performed.
Three datasets collected over several months containing person and object occurrences in residential and office environments were chosen to evaluate the approach. Several types of spatio-temporal models are created for each of these datasets and the efficiency of the search method is assessed by measuring the time it took to locate a particular object. The experiments indicate that modeling the dynamics of objects’ occurrence reduces the average search time by 35% to 65% compared to maps that neglect these dynamics.
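To illustrate modelling occurrences by their frequency spectra, here is a toy sketch in the spirit of this approach: fit the dominant Fourier components of a binary presence signal, then evaluate the occurrence probability at a query time. The signal, sampling rate and number of components below are made up.

    import numpy as np

    t = np.arange(0, 14 * 24, 1.0)                       # hourly samples over 2 weeks
    s = (np.sin(2 * np.pi * t / 24) > 0).astype(float)   # toy daily presence signal

    mean = s.mean()
    spectrum = np.fft.rfft(s - mean) / len(s)
    freqs = np.fft.rfftfreq(len(s), d=1.0)               # cycles per hour

    # Keep the strongest few components as the object's "frequency spectrum"
    order = np.argsort(np.abs(spectrum))[::-1][:3]

    def occurrence_probability(query_t):
        p = mean
        for k in order:
            p += 2 * np.abs(spectrum[k]) * np.cos(
                2 * np.pi * freqs[k] * query_t + np.angle(spectrum[k]))
        return np.clip(p, 0.0, 1.0)

    print(occurrence_probability(6.0))   # probability of presence at hour 6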
Long-term autonomous learning of human environments entails modelling and generalizing over distinct variations in: object instances in different scenes, and different scenes with respect to space and time. It is crucial for the robot to recognize the structure and context in spatial arrangements and exploit these to learn models which capture the essence of these distinct variations. Table-tops possess a typical structure repeatedly seen in human environments and are identified as personal spaces of diverse functionalities which change dynamically due to human interactions. In this paper, we present a 3D dataset of 20 office table-tops manually observed and scanned 3 times a day as regularly as possible over 19 days (461 scenes) and subsequently manually annotated with 18 different object classes, including multiple instances. We analyse the dataset to discover spatial structures and patterns in their variations. The dataset can, for example, be used to study the spatial relations between objects and long-term environment models for applications such as activity recognition, context and functionality estimation and anomaly detection.
Dataset, long-term dynamics, spatial-temporal
We present a novel method for re-creating the static structure of cluttered office environments - which we define as the "meta-room" - from multiple observations collected by an autonomous robot equipped with an RGB-D depth camera over extended periods of time. Our method works directly with point clusters by identifying what has changed from one observation to the next, removing the dynamic elements and at the same time adding previously occluded objects to reconstruct the underlying static structure as accurately as possible. The process of constructing the meta-rooms is iterative and it is designed to incorporate new data as it becomes available, as well as to be robust to environment changes. The latest estimate of the meta-room is used to differentiate and extract clusters of dynamic objects from observations. In addition, we present a method for re-identifying the extracted dynamic objects across observations thus mapping their spatial behaviour over extended periods of time.
Meta-room, dynamics, object segmentation
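A minimal sketch of the underlying change-detection step, assuming two already-registered point clouds and an illustrative 5 cm threshold; the meta-room update itself also reasons about occlusions, which is not captured here.

    import numpy as np
    from scipy.spatial import cKDTree

    meta_room = np.random.rand(10000, 3)    # current estimate of the static structure
    observation = np.random.rand(12000, 3)  # newly acquired, registered observation

    # Points far from the static structure are candidate dynamic elements
    dist, _ = cKDTree(meta_room).query(observation)
    dynamic = observation[dist > 0.05]
    static = observation[dist <= 0.05]
    # 'dynamic' would be clustered into objects; 'static' refines the meta-room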
This paper presents a novel approach to model motion patterns of dynamic objects, such as people and vehicles, in the environment with the occupancy grid map representation. Corresponding to the ever-changing nature of the motion patterns of dynamic objects, we model each occupancy grid cell with an input-output hidden Markov model (IOHMM), an inhomogeneous variant of the HMM. This distinguishes our work from existing methods which use the conventional HMM, assuming motion evolving according to a stationary process. By introducing observations of neighboring cells in the previous time step as input to the IOHMM, the transition probabilities in our model are dependent on the occurrence of events in the cell's neighborhood. This enables our method to model the spatial correlation of dynamics across cells. A sequence processing example is used to illustrate the advantage of our model over conventional HMM based methods. Results from the experiments in an office corridor environment demonstrate that our method is capable of capturing the dynamics of such human living environments.
Hidden Markov Models, modelling human dynamics
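For intuition, here is one forward-filtering step for a single cell with two states (free, occupied), where the transition matrix depends on the neighbors' occupancy at the previous step; all probability values below are made-up placeholders.

    import numpy as np

    def transition(neighbor_occupancy):
        """Input-dependent transition matrix over (free, occupied)."""
        a = 0.2 + 0.6 * neighbor_occupancy     # more occupied neighbors ->
        return np.array([[1 - a, a],           # higher chance of becoming occupied
                         [0.3, 0.7]])

    def forward_step(belief, neighbor_occupancy, likelihood):
        """belief: P(state_{t-1} | obs_{1:t-1}); likelihood: P(obs_t | state_t)."""
        predicted = belief @ transition(neighbor_occupancy)
        posterior = predicted * likelihood
        return posterior / posterior.sum()

    belief = np.array([0.9, 0.1])
    print(forward_step(belief, neighbor_occupancy=0.5, likelihood=np.array([0.2, 0.8])))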
This paper details the software and hardware architecture of a minimally invasive system aimed at converting existing vehicles into autonomous platforms. The driving factor behind this work is the fact that field robotics technology is currently restricted to specific platforms which are not widely available to the general public. Our system addresses exactly this drawback by designing a platform-agnostic system which is field deployable and can be retrofitted in a matter of hours on any vehicle. The conversion process grants the vehicle teleoperation as well as autonomous capabilities geared towards the completion of an application-specific task. In addition, our system does not impair in any way the original functionality of the vehicle, allowing an operator to take full control of the vehicle at any point in time.
Autonomous driving, conversion, drive-by-wire, mapping, tele-operation
WEAR is a European Space Agency (ESA) funded project which aims at creating a lightweight, hands-free, location-based augmented reality system. It uses a head-mounted display (HMD) to present context-specific information, with the goal of assisting in the execution of maintenance procedures. The position and orientation of the user are tracked through a Kalman filter which fuses data from an IMU and a camera-based localization system. The WEAR project was demonstrated on board the International Space Station.
Augmented reality, vision-based localization, IMU, speech recognition, ESA, ISS
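As a rough illustration of the fusion scheme, here is a one-dimensional constant-velocity Kalman filter that predicts with IMU-style acceleration and corrects with camera-based position fixes; state dimensions and noise values are illustrative only.

    import numpy as np

    dt = 0.01
    F = np.array([[1, dt], [0, 1]])       # state transition for [position, velocity]
    B = np.array([[0.5 * dt**2], [dt]])   # control input (acceleration from the IMU)
    H = np.array([[1.0, 0.0]])            # the camera measures position only
    Q = 1e-4 * np.eye(2)                  # process noise
    R = np.array([[1e-2]])                # measurement noise

    x = np.zeros((2, 1)); P = np.eye(2)

    def predict(x, P, accel):
        x = F @ x + B * accel
        P = F @ P @ F.T + Q
        return x, P

    def update(x, P, z):
        y = z - H @ x                     # innovation
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)    # Kalman gain
        return x + K @ y, (np.eye(2) - K @ H) @ P

    x, P = predict(x, P, accel=0.1)
    x, P = update(x, P, z=np.array([[0.001]]))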
Exploration of unknown environments remains one of the fundamental problems of mobile robotics. It is also a prime example for a task that can benefit significantly from multi-robot teams. We present an integrated system for semi-autonomous cooperative exploration, augmented by an intuitive user interface for efficient human supervision and control.
In this preliminary study we demonstrate the effectiveness of the system as a whole and the intuitive interface in particular. Congruent with previous findings, results confirm that having a human in the loop improves task performance, especially with larger numbers of robots. Specific to our interface, we find that even untrained operators can efficiently manage a decently sized team of robots.
Mobile robots are increasingly used in unstructured domains without permanent supervision by a human operator. One example is Safety, Security and Rescue Robotics (SSRR), where human operators are a scarce resource. There are additional motivations in this domain to increase robot autonomy, e.g., the desire to decrease the cognitive load on the operator or to allow robot operations when communication to an operator's station fails. Planetary exploration has in this respect much in common with SSRR. Namely, it takes place in unstructured environments and it requires high amounts of autonomy due to the significant delay in communication. Here we present efforts to carry over results from research within SSRR, especially work on terrain classification for autonomous mobility, to planetary exploration. The simple yet efficient approach to terrain classification is based on the Hough transform of planes. The idea is to design a parameter space such that drivable surfaces lead to a strong single response, whereas non-drivable ones lead to data points spread over the parameter space. The distinction between negotiable and non-negotiable as well as other terrain types is then done by a decision tree. The algorithm is applied in the SSRR domain to 3D data obtained from two different sensors, namely a near-infrared time-of-flight camera and a stereo camera. Experimental results are presented for typical indoor as well as outdoor terrains, demonstrating robust real-time detection of drivable ground. The work is then carried over to the planetary exploration domain by using data from the Mars Exploration Rover Mission (MER).
Terrain classification, Hough transform, traversability
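A toy sketch of the core idea, with the plane parameter space reduced to a single dimension (the height of a horizontal plane): each point votes for the plane at its own height, and a dominant single bin marks a flat, drivable patch. Bin size and peak ratio are made-up values, and the full method votes over a richer plane parameterization.

    import numpy as np

    def drivable(points, bin_size=0.05, peak_ratio=0.6):
        """Hough-flavored check restricted to horizontal planes z = d."""
        votes, _ = np.histogram(points[:, 2],
                                bins=np.arange(points[:, 2].min(),
                                               points[:, 2].max() + bin_size,
                                               bin_size))
        # A strong single response means the patch is flat and drivable;
        # votes spread across many bins mean rough, non-drivable terrain.
        return votes.max() / max(votes.sum(), 1) >= peak_ratio

    patch = np.random.rand(500, 3) * [1, 1, 0.02]   # nearly flat local patch
    print(drivable(patch))                          # -> True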
We present a novel method for efficient querying and retrieval of arbitrarily shaped objects from large amounts of unstructured 3D point cloud data. Our approach first performs a convex segmentation of the data after which local features are extracted and stored in a feature dictionary. We show that the representation allows efficient and reliable querying of the data. To handle arbitrarily shaped objects, we propose a scheme which allows incremental matching of segments based on similarity to the query object. Further, we adjust the feature metric based on the quality of the query results to improve results in a second round of querying. We perform extensive qualitative and quantitative experiments on two datasets for both segmentation and retrieval, validating the results using ground truth data. Comparison with other state-of-the-art methods further reinforces the validity of the proposed method. Finally, we also investigate how the density and distribution of the local features within the point clouds influence the quality of the results.
Mapping, Mobile Robotics, Point Cloud, Retrieval
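A minimal sketch of querying such a feature dictionary with nearest-neighbor search, where each local feature of the query votes for the segments its neighbors came from. Feature dimensions, counts and the simple voting scheme are illustrative simplifications of the method.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    rng = np.random.default_rng(0)
    features = rng.random((5000, 64))        # local features from all stored segments
    segment_of = rng.integers(0, 200, 5000)  # which segment each feature belongs to
    index = NearestNeighbors(n_neighbors=10).fit(features)

    def query(query_features):
        scores = np.zeros(200)
        _, idx = index.kneighbors(query_features)
        for neighbors in idx:                # each query feature votes for the
            for j in neighbors:              # segments of its nearest neighbors
                scores[segment_of[j]] += 1
        return np.argsort(scores)[::-1][:5]  # top-5 candidate segments

    print(query(rng.random((30, 64))))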
Thanks to the efforts of our community, autonomous robots are becoming capable of ever more complex and impressive feats. There is also an increasing demand for, perhaps even an expectation of, autonomous capabilities from end-users. However, much research into autonomous robots rarely makes it past the stage of a demonstration or experimental system in a controlled environment. If we don't confront the challenges presented by the complexity and dynamics of real end-user environments, we run the risk of our research being ignored by the industries who will ultimately drive its uptake. In the STRANDS project we are tackling this challenge head-on. We are creating novel autonomous systems, integrating state-of-the-art research in artificial intelligence and robotics into robust mobile service robots, and deploying these systems for long-term installations in security and care environments. In this article we present an overview of the motivation and approach of the STRANDS project, describe the technology we use to enable long, robust autonomous runs in challenging environments, and describe how our robots are able to use these long runs to improve their own performance.
Service Robots; AI Reasoning Methods; Learning and Adaptive Systems; Autonomous Agents; Mobile Robots
We present an approach to automatically assign semantic labels to rooms in indoor environments, such as apartments, reconstructed from unstructured point clouds. Evidence for the room types is generated using state-of-the-art deep learning techniques for scene classification and object detection. The evidence is merged using Conditional Random Fields. We provide detailed experimental evaluation results.
Semantic labelling; Data fusion; RGB-D data; Convolutional Neural Networks; Conditional Random Fields; Scene Classification; Object Recognition
In this work we summarize the solution developed by Team KTH for the Amazon Picking Challenge 2016 in Leipzig, Germany. The competition simulated a warehouse automation scenario and was divided into two tasks: a picking task, where a robot picks items from a shelf and places them in a tote, and a stowing task, the inverse task, where the robot picks items from a tote and places them in a shelf. We describe our approach to the problem starting from a high-level overview of our system and later delving into the details of our perception pipeline and our strategy for manipulation and grasping. The solution was implemented using a Baxter robot equipped with additional sensors.
Amazon Picking Challenge; RGB-D data; Object Recognition; Manipulation; Behaviour Tree.
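A minimal behaviour-tree sketch of the kind used to sequence such tasks, with sequence and fallback composites; the node names and actions below are purely illustrative, not our competition tree.

    class Sequence:
        """Succeeds only if all children succeed, evaluated in order."""
        def __init__(self, *children): self.children = children
        def tick(self):
            return all(c.tick() for c in self.children)

    class Fallback:
        """Succeeds as soon as one child succeeds."""
        def __init__(self, *children): self.children = children
        def tick(self):
            return any(c.tick() for c in self.children)

    class Action:
        def __init__(self, name, fn): self.name, self.fn = name, fn
        def tick(self):
            ok = self.fn()
            print(f"{self.name}: {'ok' if ok else 'fail'}")
            return ok

    pick = Sequence(
        Action("detect_item", lambda: True),
        Fallback(Action("grasp_suction", lambda: False),   # try suction first,
                 Action("grasp_gripper", lambda: True)),   # fall back to the gripper
        Action("place_in_tote", lambda: True))
    pick.tick()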
In this paper we present an end-to-end object modeling pipeline for an unmanned aerial vehicle (UAV). We contribute a UAV system which is able to autonomously plan a path, navigate, and acquire views of an object in the environment from which a model is built. The UAV performs collision checking of the path and navigates only to those areas deemed safe. The data acquired is sent to a registration system which segments out the object of interest and fuses the data. We also show a qualitative comparison of our results with previous work.
Unmanned Aerial Vehicle (UAV); RGB-D data; Object Modelling; View Planning.
Robotic perception is related to many applications in robotics where sensory data and artificial intelligence/machine learning (AI/ML) techniques are involved. Examples of such applications are object detection, environment representation, scene understanding, human/pedestrian detection, activity recognition, semantic place classification, object modeling, among others. Robotic perception, in the scope of this chapter, encompasses the ML algorithms and techniques that empower robots to learn from sensory data and, based on learned models, to react and make decisions accordingly. The recent developments in machine learning, namely deep-learning approaches, are evident and, consequently, robotic perception systems are evolving in a way that new applications and tasks are becoming a reality. Recent advances in human-robot interaction, complex robotic tasks, intelligent reasoning, and decision-making are, to some extent, the results of the remarkable evolution and success of ML algorithms. This chapter will cover recent and emerging topics and use-cases related to intelligent perception systems in robotics.
Robotic perception, machine learning, advanced robotics, artificial intelligence.
Recent techniques in self-supervised monocular depth estimation are approaching the performance of supervised methods, but operate at low resolution only. We show that high resolution is key to high-fidelity self-supervised monocular depth prediction. Inspired by recent deep learning methods for Single-Image Super-Resolution, we propose a subpixel convolutional layer extension for depth super-resolution that accurately synthesizes high-resolution disparities from their corresponding low-resolution convolutional features. In addition, we introduce a differentiable flip-augmentation layer that accurately fuses predictions from the image and its horizontally flipped version, reducing the effect of left and right shadow regions generated in the disparity map due to occlusions. Both contributions provide significant performance gains over the state-of-the-art in self-supervised depth and pose estimation on the public KITTI benchmark. A video of our approach can be found at https://youtu.be/jKNgBeBMx0I.
Self-supervised learning, depth from mono, differentiable flip-augmentation, subpixel convolutional layer.
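A small PyTorch sketch of both ingredients: a subpixel (pixel-shuffle) upsampling layer for disparity features, and flip fusion at inference. The paper's fusion layer is learned and differentiable; the plain average below is only a stand-in, and the module names and shapes are illustrative.

    import torch
    import torch.nn as nn

    class SubPixelUp(nn.Module):
        """Upsample disparity features by pixel shuffle rather than
        interpolation, in the spirit of subpixel convolution."""
        def __init__(self, channels, scale=2):
            super().__init__()
            self.conv = nn.Conv2d(channels, channels * scale**2, 3, padding=1)
            self.shuffle = nn.PixelShuffle(scale)
        def forward(self, x):
            return self.shuffle(self.conv(x))

    def fused_flip(model, image):
        """Average the prediction with the un-flipped prediction on the
        horizontally flipped input (a stand-in for the learned fusion)."""
        d = model(image)
        d_flipped = torch.flip(model(torch.flip(image, dims=[3])), dims=[3])
        return 0.5 * (d + d_flipped)

    up = SubPixelUp(32)
    print(up(torch.randn(1, 32, 48, 64)).shape)   # -> (1, 32, 96, 128)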
Densely estimating the depth of a scene from a single image is an ill-posed inverse problem that is seeing exciting progress with self-supervision from strong geometric cues, in particular from training using stereo imagery. In this work, we investigate the more challenging structure-from-motion (SfM) setting, learning purely from monocular videos. We propose PackNet, a novel deep architecture that leverages new 3D packing and unpacking blocks to effectively capture fine details in monocular depth map predictions. Additionally, we propose a novel velocity supervision loss that allows our model to predict metrically accurate depths, thus alleviating the need for test-time ground-truth scaling. We show that our proposed scale-aware architecture achieves state-of-the-art results on the KITTI benchmark, significantly improving upon any approach trained on monocular video, and even achieves performance competitive with stereo-trained methods.
Self-supervised learning, depth from mono, velocity-supervision, 3D packing and unpacking.
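A PyTorch sketch of the two ideas in simplified form: a packing block that folds space into channels losslessly (pixel unshuffle) before learning to compress it with a 3D convolution, and a velocity supervision loss on the magnitude of the predicted camera translation. Channel counts and the exact block layout are illustrative, not the published architecture.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F   # pixel_unshuffle needs PyTorch >= 1.8

    class PackingBlock(nn.Module):
        """Fold space into channels (lossless), then learn to compress."""
        def __init__(self, in_ch, out_ch, r=2, d=8):
            super().__init__()
            self.r = r
            self.conv3d = nn.Conv3d(1, d, kernel_size=3, padding=1)
            self.conv2d = nn.Conv2d(in_ch * r * r * d, out_ch, 3, padding=1)
        def forward(self, x):
            x = F.pixel_unshuffle(x, self.r)   # (B, C*r^2, H/r, W/r), no information lost
            x = self.conv3d(x.unsqueeze(1))    # (B, d, C*r^2, H/r, W/r)
            b, d, c, h, w = x.shape
            return self.conv2d(x.reshape(b, d * c, h, w))

    def velocity_loss(t_pred, speed, dt):
        """Supervise the *magnitude* of the predicted translation with the
        instantaneous speed, which makes the learned depth metrically scaled."""
        return torch.abs(t_pred.norm(dim=-1) - speed * dt).mean()

    block = PackingBlock(16, 32)
    print(block(torch.randn(1, 16, 64, 64)).shape)   # -> (1, 32, 32, 32)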
Maximilian Durner*, Manuel Brucker*, Rares Ambrus*, Zoltan Csaba Marton,
Axel Wendt, Patric Jensfelt, Kai Arras, Rudolph Triebel
International Conference on Robotics and Automation, IEEE, 2018
*these authors contributed equally
Diogo Almeida, Rares Ambrus, Sergio Caccamo, Xi Chen, Silvia Cruciani, Joao F. Pinto B. De Carvalho, Joshua Haustein, Alejandro Marzinotto, Francisco E. Vina B., Yiannis Karayiannidis, Petter Ogren, Patric Jensfelt and Danica Kragic
ICRA Workshop: Warehouse Picking Automation, IEEE, 2017
Rares Ambrus, Nils Bore, John Folkesson, Patric Jensfelt
International Conference on Intelligent Robots and Systems (IROS), IEEE/RSJ, 2017
Nick Hawes, Christopher Burbridge, Ferdian Jovan, Lars Kunze, Bruno Lacerda, Lenka Mudrova, Jay Young, Jeremy Wyatt, Denise Hebesberger, Tobias Körtner, Rares Ambrus, Nils Bore, John Folkesson, Patric Jensfelt, Lucas Beyer, Alexander Hermans, Bastian Leibe, Aitor Aldoma, Thomas Fäulhammer, Michael Zillich, Markus Vincze, Muhannad Al-Omari, Eris Chinellato, Paul Duckworth, Yiannis Gatsoulis, David Hogg, Anthony Cohn, Christian Dondrup, Jaime Pulido Fentanes, Tomáš Krajník, João Machado Santos, Tom Duckett, Marc Hanheide
(to appear) Robotics and Automation Magazine (RAM), IEEE, 2017
Akshaya Thippur, Rares Ambrus, Gaurav Agrawal, Adria Gallart Del Brugo, Janardhan Haryadi Ramesh, Mayank Kumar Jha, Malepati Bala Siva Sai Akhil, Nishan Bhavanishankar Shetty, John Folkesson, Patric Jensfelt
International Conference on Control, Automation, Robotics and Vision, IEEE, 2014
D. de Weerdt, M. Ilkovitz, R. Ambrus, Y. Nevatia, D. Martinez Oliviera, L. Arguello
11th International Workshop on Simulation & EGSE Facilities for Space Programmes (SESP), 2011