Lane Detection for Autonomous Driving with F1Tenth Car
In this semester thesis, our goal is to enable an F1Tenth car, a 1:10-scale autonomous racing platform modeled on a Formula 1 car, to accurately detect its designated driving lane using RGB-D images captured by an onboard camera.
Metric (Semi-)Monocular Depth Estimation
The goal of the project is to augment existing monocular depth estimation models with measured sparse metric depth and fuse the information into accurate metric depth maps.
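As a baseline for such fusion, one can fit a per-image scale and shift so that the network's relative depth agrees with the sparse metric measurements; the following is a minimal numpy sketch of that idea (function and variable names are illustrative, not from an existing codebase):

```python
import numpy as np

def align_relative_depth(rel_depth, sparse_metric, mask):
    """Fit a per-image scale s and shift t so that s * rel_depth + t
    matches the sparse metric measurements in a least-squares sense
    (an assumed baseline, not the project's prescribed method)."""
    r = rel_depth[mask]           # relative depth at measured pixels
    m = sparse_metric[mask]       # metric depth at the same pixels
    A = np.stack([r, np.ones_like(r)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, m, rcond=None)
    return s * rel_depth + t      # dense, metrically scaled depth map

# Toy usage: a 4x4 relative map with three sparse metric samples.
rel = np.random.rand(4, 4)
mask = np.zeros((4, 4), dtype=bool)
mask[0, 0] = mask[1, 2] = mask[3, 3] = True
metric = 2.0 * rel + 0.5          # pretend the true depth follows s=2, t=0.5
print(align_relative_depth(rel, metric, mask))
```

A learned fusion network would replace this global alignment with spatially varying corrections, but the least-squares fit is a useful sanity check.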
Object Pose Estimation using Line and Point Features
We hope to push the state of the art on object pose estimation, especially for textureless objects, by using line features as well as point features.
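To illustrate how line features can enter a pose objective alongside points, here is a hedged numpy sketch of the two residual types a pose optimizer might minimize; the parameterization is an assumption for illustration, not the project's actual formulation:

```python
import numpy as np

def project(K, R, t, X):
    """Pinhole projection of Nx3 world points to Nx2 pixel coordinates."""
    Xc = X @ R.T + t                    # world -> camera
    x = Xc @ K.T                        # camera -> homogeneous pixels
    return x[:, :2] / x[:, 2:3]

def point_residuals(K, R, t, X, x_obs):
    """Standard 2D reprojection error of 3D points against keypoints."""
    return (project(K, R, t, X) - x_obs).ravel()

def line_residuals(K, R, t, segments, lines2d):
    """Point-to-line residuals for line features: each row of lines2d is a
    detected 2D line (a, b, c) normalized so a^2 + b^2 = 1, so its dot
    product with a projected homogeneous point is a signed distance."""
    res = []
    for (Xa, Xb), l in zip(segments, lines2d):
        for X in (Xa, Xb):              # both endpoints of the 3D segment
            u, v = project(K, R, t, X[None])[0]
            res.append(l @ np.array([u, v, 1.0]))
    return np.array(res)
```

Stacking both residual types into one nonlinear least-squares problem lets textureless regions, where keypoints are scarce, still constrain the pose through their edges.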
3D Hand Forecasting (HoloAssist: Interactive AI Assistants)
Implement an algorithm that can forecast 3D hand poses.
Action Recognition Using 3D Hand-Object Contact Map
The primary objective of this project is to use an enhanced representation of 3D hand-object interaction to improve action recognition accuracy.
Topological Object Search
In this project, we want to break with the requirement of geometric sensing and use only a camera. This has the advantage that it works both indoors and outdoors and is unaffected by reflective surfaces. To explore and search without geometric sensing, we base the algorithm on 'directions of interest': wherever the robot sees something interesting (e.g., an open door), it simply goes there.
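A toy sketch of such an exploration policy, with `detect_interest` and `navigate` as hypothetical callbacks standing in for a detector and a local planner:

```python
def explore(detect_interest, navigate, max_steps=100):
    """Greedy 'directions of interest' loop (an assumed policy sketch):
    repeatedly score interesting directions in the current image and
    head towards the strongest cue that has not been followed yet."""
    visited = []                          # bearings already pursued
    for _ in range(max_steps):
        candidates = detect_interest()    # list of (score, bearing) pairs
        fresh = [c for c in candidates
                 if all(abs(c[1] - v) > 0.3 for v in visited)]
        if not fresh:
            break                         # nothing interesting left
        score, bearing = max(fresh)       # strongest remaining cue
        visited.append(bearing)
        navigate(bearing)                 # drive towards that direction
```

A real system would need a topological memory of visited places rather than this flat list of bearings, but the control flow is the same.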
Automatic Scene Graph Generation
In this thesis, we want to create a tool that easily and automatically builds such scene graphs from a single iPad scan of a room, so that a robot can be deployed in the environment. We base this on LabelMaker, to which we add instance segmentation and the separation of rooms and floors.
3D Reconstruction of Water in a Glass
The project is about reconstructing a dynamic scene of water, a glass, and an object thrown into the water. The input consists of images from two to three synchronized RGB cameras. The expected output is a 3D reconstruction of each frame, ideally optimized so that the motion is temporally consistent.
GNSS/RTK-SLAM fusion for accurate positioning of geospatial data in Mixed Reality
The main objective of the project is to increase the accuracy and usability of the Mixed Reality solution developed by V-Labs. The V-Labs team expects that integrating a fusion algorithm, based either on Artificial Intelligence or on an Unscented Kalman Filter (UKF), will achieve that goal.
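As a structural illustration of the UKF option, here is a minimal planar fusion sketch built on the filterpy library; the constant-velocity state, noise values, and measurement model are all assumptions for the example, not V-Labs' pipeline:

```python
import numpy as np
from filterpy.kalman import UnscentedKalmanFilter, MerweScaledSigmaPoints

def fx(x, dt):
    """Constant-velocity motion model over the state [x, y, vx, vy]."""
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1.0]])
    return F @ x

def hx(x):
    """GNSS/RTK (and, once aligned, SLAM) observes planar position."""
    return x[:2]

points = MerweScaledSigmaPoints(n=4, alpha=1e-3, beta=2.0, kappa=0.0)
ukf = UnscentedKalmanFilter(dim_x=4, dim_z=2, dt=0.1,
                            fx=fx, hx=hx, points=points)
ukf.R = np.diag([0.02**2, 0.02**2])  # RTK fix noise, assumed ~2 cm
ukf.Q = np.eye(4) * 1e-4             # process noise, a tuning placeholder

for z in [np.array([0.00, 0.00]), np.array([0.01, 0.02])]:
    ukf.predict()
    ukf.update(z)                    # fuse one position fix per step
print(ukf.x)                         # fused state estimate
```

In the actual system the SLAM pose would enter either as a second measurement model or through the prediction step, and the state would be 6-DoF rather than planar.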
Accurate SLAM for Human-Robot Teams
We extend the lamar.ethz.ch benchmark to develop accurate visual SLAM methods that can co-register drones, legged robots, wheeled robots, smartphones, and mixed reality headsets.
Action Recognition with 3D Scene Graphs
This project explores the potential of 3D scene graphs to improve action recognition in AR/VR and robotic applications, addressing the challenges posed by the complexity and high dimensionality of video data. By leveraging 3D scene graphs, the project aims to overcome the limitations of 2D scene graphs, offering a more scalable and comprehensive approach to understanding egocentric actions in indoor environments.
Human-Robot Communication with Text Prompts and 3D Scene Graphs
This project extends previous work [a] on calculating similarity scores between text prompts and 3D scene graphs representing environments. The current method identifies potential locations based on user descriptions, aiding human-agent communication, but is limited by its coarse localization and inability to refine estimates incrementally. This project aims to enhance the method by enabling it to return potential locations within a 3D map and incorporate additional user information to improve localization accuracy incrementally until a confident estimate is achieved.
[a] Chen, J., Barath, D., Armeni, I., Pollefeys, M., & Blum, H. (2024). "Where am I?" Scene Retrieval with Language. ECCV 2024.
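A toy sketch of the intended incremental behavior, treating each prompt's similarity scores over candidate locations as likelihoods to accumulate (the `score_fn` interface and the stopping rule are assumptions for illustration, not the paper's method):

```python
import numpy as np

def refine_location(score_fn, prompts, n_locations, conf=0.9):
    """Accumulate per-prompt text-to-scene-graph similarity scores over
    candidate locations until one location is confident enough."""
    posterior = np.full(n_locations, 1.0 / n_locations)  # uniform prior
    for prompt in prompts:
        likelihood = score_fn(prompt)   # non-negative, shape (n_locations,)
        posterior *= likelihood         # fuse the new evidence
        posterior /= posterior.sum()    # renormalize
        if posterior.max() >= conf:     # confident enough to stop early
            break
    return int(posterior.argmax()), posterior
```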
Learning Affordances and Functionalities from Egocentric Actions
The primary objective of this project is to use egocentric videos to predict a 3D functionality map of the environment.
3D Surface Reconstruction from Sparse Viewpoints for Medical Education and Surgical Navigation
In medical education and surgical navigation, achieving accurate multi-view 3D surface reconstruction from sparse viewpoints is a critical challenge. This Master's thesis addresses this problem by first computing normal and optionally reflectance maps for each viewpoint, and then fusing this data to obtain the geometry of the scene and, optionally, its reflectance. The research explores multiple techniques for normal map computation, including photometric stereo, data-driven methods, and stereo matching, either individually or in combination. The outcomes of this study aim to pave the way for the creation of highly realistic and accurate 3D models of surgical fields and anatomical structures. These models have the potential to significantly improve medical education by providing detailed and interactive representations for learning. Additionally, in the context of surgical navigation, these advancements can enhance the accuracy and effectiveness of surgical procedures.
References:
Yu, Z., Peng, S., Niemeyer, M., Sattler, T., Geiger, A. MonoSDF: Exploring Monocular Geometric Cues for Neural Implicit Surface Reconstruction. NeurIPS 2022.
Brument, B., Bruneau, R., Quéau, Y., Mélou, J., Lauze, F., Durou, J.-D., Calvet, L. RNb-NeuS: Reflectance and Normal Based Reconstruction with NeuS. CVPR 2024.
Bae, G., Davison, A. J. Rethinking Inductive Biases for Surface Normal Estimation. CVPR 2024.
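For readers unfamiliar with the photometric-stereo branch mentioned above, this is the classical Lambertian formulation in a few lines of numpy; the thesis may combine it with data-driven and stereo cues rather than use it in this pure form:

```python
import numpy as np

def photometric_stereo(images, lights):
    """Classical Lambertian photometric stereo: with k images I (k, H, W)
    taken under k known light directions L (k, 3), each pixel satisfies
    I = L @ (albedo * n), so the albedo-scaled normal is the least-squares
    solution of that linear system."""
    k, H, W = images.shape
    I = images.reshape(k, -1)                        # (k, H*W)
    G, *_ = np.linalg.lstsq(lights, I, rcond=None)   # (3, H*W) scaled normals
    albedo = np.linalg.norm(G, axis=0)               # per-pixel reflectance
    normals = G / np.maximum(albedo, 1e-8)           # unit surface normals
    return normals.reshape(3, H, W), albedo.reshape(H, W)
```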
Enhancing 3D Reconstruction and Tracking of Anatomy for Open Orthopedic Surgery
Computer-assisted interventions have advanced significantly with computer vision, improving tasks like surgical navigation and robotics. While marker-based navigation systems have increased accuracy and reduced revision rates, their technical limitations hinder integration into surgical workflows. This master's thesis proposes using the OR-X research infrastructure to collect datasets of human anatomies with 3D ground truth under realistic surgical conditions. The project will evaluate state-of-the-art 3D reconstruction and tracking methods and adapt them to the orthopedic image domain, focusing on a promising marker-less optical camera-based approach for spine surgery. This work aims to enhance precision and integration in surgical navigation systems.
Advancing Camera Localization in Surgical Environments
OR-X (https://or-x.ch) is an innovative research infrastructure replicating an operating theater, equipped with an extensive array of cameras. This setup enables the collection of comprehensive datasets through densely positioned cameras, capturing detailed surgical scenes. A key challenge addressed in this master thesis is the computation of camera positions and orientations for dynamic egocentric views, such as those from head-mounted displays or GoPro cameras. Solving this issue can significantly impact applications in automatic documentation, education, surgical navigation, and robotic surgery.
Multi-view Keypoint Refinement
The goal of this project is to extend our recent two-view keypoint refinement method to work with multiple images. This is crucial for 3D reconstruction with, e.g., Structure-from-Motion.
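For context, the multi-view geometry the refined keypoints feed into can be sketched as linear (DLT) triangulation plus a reprojection-error check; this baseline is standard textbook material, not our refinement method itself:

```python
import numpy as np

def triangulate_dlt(proj_mats, pixels):
    """Linear multi-view triangulation: each 3x4 projection matrix P and
    observation (u, v) contribute the rows u*P[2]-P[0] and v*P[2]-P[1];
    the 3D point is the null-space direction of the stacked system."""
    rows = []
    for P, (u, v) in zip(proj_mats, pixels):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    X = Vt[-1]
    return X[:3] / X[3]                      # dehomogenize

def reprojection_errors(proj_mats, pixels, X):
    """Per-view pixel error; multi-view refinement would move the 2D
    keypoints (or the point X) to shrink these residuals jointly."""
    Xh = np.append(X, 1.0)
    errs = []
    for P, (u, v) in zip(proj_mats, pixels):
        x = P @ Xh
        errs.append(np.hypot(x[0] / x[2] - u, x[1] / x[2] - v))
    return np.array(errs)
```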
OpenSet Semantic SLAM
The goal of the project is to create a Simultaneous Localization and Mapping algorithm that, besides estimating the camera trajectory and the geometry of the scene, also obtains object instances. These object instances should not be restricted to a fixed set of classes (e.g., chair, table). Hence, the problem is one of open-set segmentation.
Learn to predict intent using commonsense knowledge
From robotics to human-computer interaction, numerous real-world tasks would benefit from practical systems that can anticipate future high-level actions and predict intentions and goals based on observations of the past. Intention prediction is important for care robots to anticipate people's actions and is a key challenge in the design of artificially intelligent systems.
Multi-level Image Matching with Semantic Cues
This project explores methods that leverage semantic cues to enhance image matching at various levels, including geometric and semantic pixel-level matching as well as object-level matching.
Large-scale Outdoor Semantic Segmentation
The objective of the project is to perform semantic/instance segmentation on city-scale reconstructions to find, e.g., buildings.
3D Reconstruction from Scene Graphs
The project aims to develop methods for reconstructing objects and scenes in 3D using scene graph embeddings.
Multi-view Contour-optimization for Semantic Segmentation
The goal is to find 3D object instances and their semantics as accurately as possible. This is usually done by aggregating 2D observations (object detections) in 3D. The project aims to optimize the 2D contours so that each object is recovered as accurately as possible.
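A minimal sketch of the usual aggregation baseline that the contour optimization would improve upon, assuming known 3x4 camera matrices and per-view 2D label images:

```python
import numpy as np

def aggregate_labels(points, views, n_classes):
    """Project each 3D point into every view, read the 2D semantic label
    there, and take a majority vote per point (the common baseline;
    contour optimization would sharpen the 2D masks being voted from)."""
    votes = np.zeros((len(points), n_classes), dtype=int)
    Xh = np.hstack([points, np.ones((len(points), 1))])
    for P, label_img in views:                # P is a 3x4 projection matrix
        x = Xh @ P.T
        z = x[:, 2]
        valid = z > 1e-9                      # in front of the camera
        u = np.full(len(points), -1)
        v = np.full(len(points), -1)
        u[valid] = np.round(x[valid, 0] / z[valid]).astype(int)
        v[valid] = np.round(x[valid, 1] / z[valid]).astype(int)
        H, W = label_img.shape
        ok = valid & (u >= 0) & (u < W) & (v >= 0) & (v < H)
        votes[np.flatnonzero(ok), label_img[v[ok], u[ok]]] += 1
    return votes.argmax(axis=1)               # majority label per 3D point
```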
3D Object Instance Segmentation
The project focuses on identifying distinct object instances (such as tables and chairs) within 3D reconstructions of indoor environments.
Action Label Correction from Videos with LLMs
The primary objective of this project is to leverage LLMs to correct action labels in videos, improving action recognition accuracy for the development of AI agents.
Learning to interact with objects through Pressure Map
The primary objective of this project is to use a pressure map representation of hands to improve action recognition accuracy and robotic manipulation.
