General information
In this work, we focus on developing and pretraining foundational vision and vision-language models tailored to aerial robotics applications. Our primary objective is to build a robust model that can understand and interpret aerial images for tasks in aerial robotics. These tasks include, but are not limited to, drone navigation from a single prompt and sensor input, and change detection, with applications in environmental conservation and surveillance. Traditionally, training computer vision models on aerial data has required substantial manual labeling effort. Our approach seeks to reduce this labeling burden by using advanced pretraining techniques.