CV_Depth-Estimation

Depth Perception

Cues for depth perception: 1. Binocular parallax 2. Prior knowledge 3. Lighting and shadows

Available Datasets:

Loss Functions

SSIM (structural similarity index) — commonly used as part of the photometric loss for self-supervised learning.
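A minimal NumPy sketch of the idea, assuming images scaled to [0, 1]. Real implementations compute SSIM over sliding Gaussian windows; a single global value keeps the sketch short. Self-supervised methods typically mix SSIM with an L1 term (alpha = 0.85 is a common choice):

```python
import numpy as np

def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Simplified global SSIM for two images in [0, 1].
    Production code uses local Gaussian windows instead of
    one global statistic per image."""
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den

def photometric_loss(x, y, alpha=0.85):
    """SSIM + L1 mix widely used as the appearance term
    in self-supervised depth losses."""
    return alpha * (1 - ssim(x, y)) / 2 + (1 - alpha) * np.abs(x - y).mean()
```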

Network Architectures

Autoencoder Structure with Skip Connections.

Challenges

Application Scenarios

Innovation Points

  • Semi-/Un-/Self-Supervised Learning
  • Attention Mechanisms
  • More Effective Loss Functions
  • More Efficient Networks
  • Reinforcement Learning
  • Knowledge Distillation
  • Structure from Motion (SfM)
  • Learning single-view 3D from registered 2D views
  • Warping-based view synthesis
  • Unsupervised/Self-supervised learning from video


SuperDepth: Self-Supervised, Super-Resolved Monocular Depth Estimation

>[Paper] [Code]
Conference: ICRA
Year: 2019
Institute: TRI
Author: Sudeep Pillai, Rares Ambrus, Adrien Gaidon
#Self-supervised Learning, Monocular, Stereo Imagery

What they did:

  • Proposed a sub-pixel convolutional layer extension for depth super-resolution.
  • Introduced a differentiable flip-augmentation layer.
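The sub-pixel layer builds on the pixel-shuffle (depth-to-space) rearrangement of Shi et al.; a minimal NumPy sketch of just that rearrangement follows (the full layer pairs it with a preceding convolution that produces the extra channels):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Depth-to-space: (C*r^2, H, W) -> (C, H*r, W*r).
    A convolution before this layer produces r^2 channels per
    output channel; shuffling them into the spatial dimensions
    yields a super-resolved feature map."""
    c2, h, w = x.shape
    c = c2 // (r * r)
    x = x.reshape(c, r, r, h, w)       # split the channel dimension
    x = x.transpose(0, 3, 1, 4, 2)     # -> (c, h, r, w, r)
    return x.reshape(c, h * r, w * r)  # interleave into space
```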

Why they did:
To solve: high resolution monocular depth prediction.

Innovation points:

Depth Prediction Without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos

>[Paper] [Code]
Conference: AAAI
Year: 2019
Institute: Google Brain
Author: Vincent Casser, Soeren Pirk, Reza Mahjourian, Anelia Angelova
#Unsupervised Learning, Monocular Video

What they did:

  • Modeled the 3D motion of individual moving objects in addition to camera ego-motion.
  • Introduced an online refinement method that adapts the model at inference time.

Why they did:
To solve:

Innovation points:

Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints

>[Paper] [Code]
Conference: CVPR
Year: 2018
Institute: Google Brain
Author: Reza Mahjourian, Martin Wicke, Anelia Angelova
#Unsupervised Learning, Monocular Video

What they did:

  • Proposed a novel unsupervised algorithm for depth and ego-motion from monocular video.
  • Took the 3D structure of the scene into account via a 3D loss function that aligns consecutive point clouds.

Why they did:
To solve:

Innovation points:

Single Image Depth Estimation Trained via Depth from Defocus Cues

>[Paper] [Code]
Conference: CVPR
Year: 2019
Institute: Facebook AI
Author: Shir Gur, Lior Wolf
#Unsupervised Learning, Defocus Cues

Terms:

  • Structure from Motion (SfM)
    Estimating three-dimensional structures from two-dimensional image sequences that may be coupled with local motion signals.

  • Point Spread Function (PSF)
    Describes the response of an imaging system to a point source or point object.

What they did:

  • Relied on shape from defocus instead of multiple-view geometry.
  • Proposed a novel Point Spread Function (PSF) layer, combined with the successful ASPP architecture.
  • Used dense connections and self-attention.

$I$ — all-in-focus image; $D_o$ — depth map; $\rho$ — camera-parameter vector (the aperture $A$, the focal length $F$, and the focal depth $D_f$).
There are two networks: $f$ for depth estimation and $g$ for focus rendering; only $f$ is learned.
The learned network $f$ takes $I$ as input and outputs a predicted depth $\bar{D}_o$. Then $I$, $\bar{D}_o$, and $\rho$ are fed to $g$, which outputs an estimated rendered focused image $\bar{J}$.
The fixed network $g$, built around the PSF layer, takes $I$, $D_o$, $\rho$ as input and outputs a rendered focused image $J$.
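The fixed rendering side can be sketched under the standard thin-lens model, which relates each pixel's blur (circle-of-confusion) size to its depth and the camera parameters $\rho$; the PSF layer then blurs each pixel of $I$ by that amount to render $J$. The formula below is the textbook thin-lens relation, used here as an illustrative assumption rather than code from the paper:

```python
import numpy as np

def coc_size(depth, aperture, focal_length, focal_depth):
    """Per-pixel circle-of-confusion size from the thin-lens model,
    given a depth map and camera parameters rho = (A, F, D_f).
    Pixels at the focal depth get zero blur; blur grows as depth
    departs from it. A depth-dependent blur kernel of this size
    would then render the focused image from the all-in-focus one."""
    return (aperture * np.abs(depth - focal_depth) / depth
            * focal_length / (focal_depth - focal_length))
```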

  • Atrous Convolution
  • Atrous Spatial Pyramid Pooling (ASPP)
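A minimal 1-D sketch of atrous (dilated) convolution with 'valid' padding: the kernel taps are spaced `rate` samples apart, enlarging the receptive field without extra parameters. Frameworks implement the 2-D version, and ASPP runs several rates in parallel and fuses the results:

```python
import numpy as np

def atrous_conv1d(x, kernel, rate):
    """1-D atrous convolution, 'valid' padding. With rate=1 this is
    ordinary correlation; larger rates skip samples between taps."""
    k = len(kernel)
    span = (k - 1) * rate + 1            # effective kernel extent
    out = np.empty(len(x) - span + 1)
    for i in range(len(out)):
        out[i] = sum(kernel[j] * x[i + j * rate] for j in range(k))
    return out
```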

Why they did:
To solve:

Innovation points:

Digging Into Self-Supervised Monocular Depth Estimation

>[Paper] [Code]
Conference: ICCV
Year: 2019
Institute:
Author: Clément Godard, Oisin Mac Aodha, Michael Firman, Gabriel Brostow
#self-supervised Learning, Monocular

What they did:

  • A minimum reprojection loss, designed to robustly handle occlusions.
  • A full-resolution multi-scale sampling method that reduces visual artifacts.
  • An auto-masking loss to ignore training pixels that violate camera motion assumptions.
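The minimum-reprojection idea can be sketched as follows (NumPy, with plain L1 standing in for the paper's SSIM+L1 photometric error): a pixel occluded in one source view is usually visible in another, so taking the per-pixel minimum over source views, rather than the average, keeps occlusions from polluting the loss.

```python
import numpy as np

def min_reprojection_loss(target, reprojected):
    """target: (H, W, 3); reprojected: (V, H, W, 3) stack of source
    frames warped into the target view. Per-pixel L1 error per view,
    minimum over views, mean over pixels."""
    errors = np.abs(reprojected - target[None]).mean(axis=-1)  # (V, H, W)
    return errors.min(axis=0).mean()
```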

Learning Depth from Monocular Videos using Direct Methods

>[Paper] [Code]
Conference: CVPR
Year: 2018
Institute:
Author: Chaoyang Wang, José Miguel Buenaposada, Rui Zhu, Simon Lucey
#unsupervised Learning, monocular

What they did:

  • Explained why scale ambiguity in current monocular methods is problematic.
  • Proposed a simple normalization strategy.
  • Incorporated a Direct Visual Odometry (DVO) pose predictor.
  • Considered the geometric relation between camera pose and depth.

Unsupervised Learning of Depth and Ego-Motion from Video

>[Paper] [Code]
Conference: CVPR
Year: 2017
Institute:
Author: Tinghui Zhou, Matthew Brown, Noah Snavely, David G. Lowe
#unsupervised Learning, monocular

What they did:

  • Used single-view depth and multi-view pose networks, with a loss based on warping nearby views to the target using the computed depth and pose.
  • An explainability mask that discounts pixels violating the model's assumptions.
  • Overcame gradient locality by using a convolutional encoder-decoder architecture with a small bottleneck.
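The masked view-synthesis loss can be sketched as below, assuming an L1 photometric error. The regularizer is the paper's cross-entropy against a constant all-ones label, which stops the trivial solution of predicting a zero mask everywhere; `reg_weight` is an illustrative value, not taken from the paper:

```python
import numpy as np

def masked_photometric_loss(target, warped, mask, reg_weight=0.2):
    """target, warped: (H, W, 3); mask: (H, W) explainability weights
    in (0, 1]. The mask learns to down-weight pixels that violate the
    static-scene assumption (moving objects, occlusions)."""
    photo = (mask * np.abs(warped - target).mean(axis=-1)).mean()
    reg = -np.log(np.clip(mask, 1e-6, 1.0)).mean()  # pushes mask toward 1
    return photo + reg_weight * reg
```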

Unsupervised Monocular Depth Estimation with Left-Right Consistency

>[Paper] [Code]
Conference: CVPR
Year: 2017
Institute:
Author: Clément Godard, Oisin Mac Aodha, Gabriel J. Brostow
#unsupervised Learning, binocular stereo footage

What they did:

  • Introduced a novel depth-estimation training loss featuring an inbuilt left-right consistency check.
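The consistency term can be sketched in 1-D per image row: project the right disparity map into the left view using the left disparities and penalise the difference. This sketch uses nearest-neighbour sampling where the paper uses bilinear, and the sign convention depends on the rectification setup:

```python
import numpy as np

def lr_consistency_loss(disp_left, disp_right):
    """disp_left, disp_right: 1-D disparity rows in pixels.
    For each left pixel x, sample the right disparity at x - d_l(x)
    and penalise disagreement with d_l(x)."""
    w = len(disp_left)
    idx = np.clip(np.round(np.arange(w) - disp_left).astype(int), 0, w - 1)
    return np.abs(disp_left - disp_right[idx]).mean()
```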

Real-Time Monocular Depth Estimation using Synthetic Data with Domain Adaptation via Image Style Transfer

>[Paper] [Code]
Conference: CVPR
Year: 2018
Institute:
Author: Amir Atapour-Abarghouei, Toby P. Breckon
#supervised Learning, monocular, image style transfer, domain adaptation

What they did:

  • Synthetic depth prediction: predict depth from high-quality synthetic depth training data (supervised learning).
  • Domain adaptation via style transfer.

Depth map prediction from a single image using a multi-scale deep network.

>[Paper]
Conference: NIPS
Year: 2014
Institute:
Author: David Eigen, Christian Puhrsch, Rob Fergus
#supervised Learning

What they did:

  • Used a scale-invariant error in addition to more common scale-dependent errors.
  • Two-stage network: one stage first estimates the global structure of the scene, then a second refines it using local information.
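The scale-invariant error can be written directly from the paper's definition: with d the element-wise log difference and lam = 1, the second term cancels any uniform shift in log space, so multiplying the prediction by a global scale leaves the error unchanged (the paper trains with lam = 0.5 as a compromise with absolute-scale error):

```python
import numpy as np

def scale_invariant_error(pred, gt, lam=1.0):
    """Eigen et al.'s scale-invariant log error:
    mean(d^2) - lam * mean(d)^2, where d = log(pred) - log(gt)."""
    d = np.log(pred) - np.log(gt)
    return (d ** 2).mean() - lam * d.mean() ** 2
```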

Why they did:
To solve: the global scale ambiguity inherent in monocular depth estimation.

Innovation points:

  • Collected indoor and outdoor datasets from websites, social media outlets, real estate listings, and shopping sites.

Deeper Depth Prediction with Fully Convolutional Residual Networks

>[Paper] [Code]
Conference: 3DV
Year: 2016
Institute:
Author: Iro Laina, Christian Rupprecht, Vasileios Belagiannis, Federico Tombari, Nassir Navab
#supervised Learning

What they did:

  • A fully convolutional architecture encompassing residual learning.
  • Efficient in-network feature-map up-sampling via an up-projection layer.
  • Reverse Huber (berHu) loss.
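The berHu loss is L1 for small residuals and L2-like beyond a threshold c, so large errors get stronger gradients while small ones keep L1's robustness. A NumPy sketch, with c set per batch to 20% of the maximum absolute residual as in the paper:

```python
import numpy as np

def berhu_loss(pred, gt):
    """Reverse Huber (berHu): |r| where |r| <= c,
    (r^2 + c^2) / (2c) otherwise, with c = 0.2 * max|r|.
    The two branches meet continuously at |r| = c."""
    r = np.abs(pred - gt)
    c = 0.2 * r.max()
    if c == 0:                       # perfect prediction: avoid 0/0
        return 0.0
    return np.where(r <= c, r, (r ** 2 + c ** 2) / (2 * c)).mean()
```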