The Greek sea Bob is dreaming of

, in Mediterranean-English was heard, as a woman screamed just right off the deck’s edge. The moment of truth! Bob takes a deep breath and sprints towards her, only to stare hesitantly, as the currents drag her little girl away from the vessel.“-Swimming, anyone??!!”“-Com’ on!”, Bob shouted silently to himself,“-You’ve learnt how to do it in the pool; now, jump into the sea! ”or not…

**It’s not all about buoyancy**

**waves**? Can he withstand the water **temperature **until real help arrives? Can he even keep his eyes open in **salty **water without his goggles? These, and many more, synthesize an ocean of possibilities for what we only know has little relevance to what Bob has been trained on. Waves in swimming pools are usually minuscule, water temperature is comforting and trainees always wear goggles. In the wild, on the other hand, waves can be anything from tiny to gigantic, waters can be freezing or boiling and people may need to submerge without equipment.

Challenges that Bob has to face in the wild sea

Bob's ideal benchmark

**Monocular 360° depth estimation**

Depth prediction from panoramic images

Our task is to train a neural network to predict depth from panoramic color images. Note that this gets especially challenging due to the spherical distortion of such images.

The most basic approach to training is to take some sampling of the real-world, for example a dataset of images like the Matterport3D [1], split it into two disjoint subsets, use one to train the model and the other to test its performance; all under the assumption that the test set is a good enough representative of real-world distribution. As we saw earlier, good performance in one sampling of the world does not guarantee generalization, as there may be significant distribution shift between this sampling and reality.

Lately, a new model-development scheme has been gaining traction. Instead of having train and test sets derive from the same sampling, a.k.a dataset, you use a totally different one to test, like for example the GibsonV2 [3], expecting it to exhibit some generalized distribution shift, enough to be a more reliable measure of real-world performance.

Our work differs by identifying three common modes of distribution shift in the task of monocular depth estimation, and introducing a scheme where multiple test-sets are used, each one expressing only some modes of it. This way, we can study their impact isolated from other factors and test generalized or targeted mitigation techniques for their performance, in a better-controlled environment.

The three modes of distribution shift we study in our benchmark are: Covariate shift, which is a shift of the input domain, Prior shift, which is a shift of the output domain and Concept shift which is a shift of the relationship between input and output.

To train, we use the default M3D train-split which comprises scans of residential buildings.

To produce a test-set that expresses only covariate shift, we render the M3D test-split using Blender™’s filmic view transform, which alters color to produce a more cinematic look. Depth distribution and content-type remain untouched.

Further, combine the covariate shift with the concept one, by rendering the Gibson fullplus split with the filmic view transform as well.

To understand the nature of these shifts in a quantitative way, we apply the benchmarking scheme for 360° depth estimation presented in Pano3D [4], evaluating the Unet baseline model. We rely on a set of complex indicators, aggregating metrics for depth estimation and its secondary traits, boundary preservation and smoothness, in a way that higher values are better.

As it is expected, all kinds of distribution shift seem to hurt the performance of the baseline model. Among the three tested, the more complex concept shift proved to be the hardest to overcome. As for the stackability of the shifts, it is definitely evident, hinting further to the highly challenging nature of generalized shifts.

Going on with our analysis, we put to the test two common distribution shift-addressing techniques, to measure their alleviating capacity against our isolated shifts and their combinations, putting in dispute their conventional picture.

First, we train the aforementioned model with photometric augmentation. Experiments suggest that performance improves significantly when covariate shifts manifest, with the boost being higher when covariate shift is dealt in isolation but lower when it is mixed with others, further pointing to the complex nature of generalized shifts.

Lastly, we explore the effect of pre-training by using the Pano3D’s PNAS model, pre-trained on ImageNet. At first glance, it seems to offer substantial gains for the original test-set in raw depth estimation terms, and for the covariate shift in all terms. Interestingly thought, it seems that it does so only in isolation, as all combination-shifts proved to be quite challenging for it. Overall, it feels like the model only benefits from a higher quality parameter initialization, rather than gaining generalization capacity.

Taking into account the challenging and stackable nature of distribution shifts, we believe that our work facilitates future research by offering a structured methodology to address multiple shifts, where progress can be measured and correctly accounted for, as we transition into better generalized-shift addressing methods.

Ah yes, the girl…

She was finally rescued by the properly trained women and men of the coast-guard operating nearby. She lived happily ever after.

- A. Chang, A. Dai, T. Funkhouser, M. Halber, M. Nießner, M. Savva, S. Song, A. Zeng, Y. Zhang. (2017) Matterport3d: Learning from rgb-d data in indoor environments. arXiv:1709.06158
- R. Ranftl, K. Lasinger, D. Hafner, K. Schindler, V. Koltun. (2020) Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer. arXiv:1907.01341
- F. Xia, A. Zamir, Z-Y. He, A. Sax, J. Malik, S. Savarese. (2018) Gibson env: Real-world perception for embodied agents. arXiv:1808.10654
- G. Albanis, N. Zioulis, P. Drakoulis, V. Gkitsas, V. Sterzentsenko, F. Alvarez, D. Zarpalas, P. Daras (2021) Pano3D: A Holistic Benchmark and a Solid Baseline for 360° Depth Estimation. arXiv:2109.02749

Made with Mobirise - Click here