[Paper Review] Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs by Implicitly Unprojecting to 3D

[2008.05711] Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs by Implicitly Unprojecting to 3D (arxiv.org)

Introduction & Limitation of previous methods

Multi-view perception should satisfy the following properties:
- Translation equivariance
- Permutation invariance
- Ego-frame isometry equivariance
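However, earlier BEV perception work had no end-to-end model. According to the paper, the multi-view result was assembled from 2D backbone features and then post-processed, so gradient descent could not be applied from the input all the way to the output.
This paper therefore proposes an end-to-end multi-view 3D perception model in which gradient descent reaches the entire model.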
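To make the permutation invariance property above concrete, here is a minimal sketch in plain Python. It assumes each camera has already been reduced to a list of (BEV bin id, value) pairs; the function name `fuse_cameras` and the toy values are hypothetical and are not taken from the paper. Because fusion is a plain sum, the order in which cameras are listed does not change the result.

```python
def fuse_cameras(per_camera_points):
    """Sum (bin_id, value) contributions from every camera into one BEV dict."""
    bev = {}
    for points in per_camera_points:
        for bin_id, value in points:
            bev[bin_id] = bev.get(bin_id, 0.0) + value
    return bev


# Toy contributions from three cameras (hypothetical values).
front = [(0, 1.0), (1, 2.0)]
left = [(1, 0.5), (2, 4.0)]
rear = [(0, 0.25)]

bev_a = fuse_cameras([front, left, rear])
bev_b = fuse_cameras([rear, front, left])  # same cameras, different order
print(bev_a == bev_b)
```

The toy values are chosen to be exactly representable in floating point, so the two results compare equal; with arbitrary floats the sums could differ in the last bits depending on summation order.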

Proposed method

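The method is divided into three main stages: Lift, Splat, and Shoot.
Lift: For every pixel, assume there are D candidate depth points along its ray in 3D space, together with a depth distribution a over those D depths (written α in the paper).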
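The "D depth points per pixel" idea can be sketched with a standard pinhole camera model. This is only an illustration: the function name `unproject_pixel`, the intrinsics (`fx`, `fy`, `cx`, `cy`), and the depth values are assumptions made for the example, and the points are in the camera frame (the extrinsics that would move them into the ego frame are left out).

```python
def unproject_pixel(u, v, depths, fx, fy, cx, cy):
    """Return one camera-frame 3D point (x, y, z) per candidate depth.

    Pinhole model: x = (u - cx) / fx * d, y = (v - cy) / fy * d, z = d.
    """
    return [((u - cx) / fx * d, (v - cy) / fy * d, d) for d in depths]


# Hypothetical intrinsics and D = 3 candidate depths (in meters).
points = unproject_pixel(
    u=400.0, v=300.0,
    depths=[4.0, 8.0, 16.0],
    fx=200.0, fy=200.0, cx=300.0, cy=200.0,
)
for p in points:
    print(p)
```

All D points lie on the same ray through the pixel; only the distance along the ray changes.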

In this step, the context vector obtained from the 2D backbone is multiplied by the depth distribution a, which gives a feature c_d for each depth d.
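A minimal sketch of this multiplication for a single pixel, in plain Python. It assumes the depth distribution comes from a softmax over D depth logits and that the context vector has C channels; `lift_pixel`, `softmax`, and the toy numbers are hypothetical names and values for illustration, not the paper's code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def lift_pixel(depth_logits, context):
    """Return a D x C nested list: c_d = a_d * c for every depth bin d."""
    a = softmax(depth_logits)
    return [[a_d * c for c in context] for a_d in a]


depth_logits = [0.0, 0.0, math.log(2.0)]  # D = 3, so a is about [0.25, 0.25, 0.5]
context = [1.0, -2.0]                     # C = 2
features = lift_pixel(depth_logits, context)

# Since a sums to 1, summing c_d over depth gives back the context vector.
recovered = [sum(row[ch] for row in features) for ch in range(len(context))]
print(recovered)
```

If a puts all of its mass on one depth, the pixel's feature lands at a single 3D point; if a is uniform, the feature is spread evenly along the ray.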

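Splat: The context features c_d obtained this way are placed into BEV space and aggregated with simple sum pooling. Done naively this is computationally expensive, so the points are grouped by BEV voxel and each voxel's sum is obtained by subtracting cumulative sums at the group boundaries. It is a kind of trick.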
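The cumulative-sum trick can be sketched as follows. This is a plain-Python illustration with scalar features rather than the tensor version used in practice; the function name `splat_sum_pool` and the toy data are assumptions made for the example.

```python
from itertools import accumulate


def splat_sum_pool(bin_ids, feats):
    """Sum features per BEV bin via sort, cumulative sum, boundary subtraction."""
    # 1. Sort points so that points in the same bin are contiguous.
    order = sorted(range(len(bin_ids)), key=lambda i: bin_ids[i])
    ids = [bin_ids[i] for i in order]
    vals = [feats[i] for i in order]

    # 2. One cumulative sum over all points.
    csum = list(accumulate(vals))

    # 3. At the last point of each bin, subtract the cumulative sum
    #    recorded at the end of the previous bin.
    pooled = {}
    prev = 0.0
    for k in range(len(ids)):
        is_last_in_bin = k == len(ids) - 1 or ids[k + 1] != ids[k]
        if is_last_in_bin:
            pooled[ids[k]] = csum[k] - prev
            prev = csum[k]
    return pooled


bin_ids = [2, 0, 2, 1, 0]             # BEV bin of each lifted point
feats = [1.0, 2.0, 3.0, 4.0, 5.0]     # scalar feature of each point
pooled = splat_sum_pool(bin_ids, feats)
print(pooled)  # {0: 7.0, 1: 4.0, 2: 4.0}
```

The point of the trick is that a single pass over sorted data replaces a separate reduction per voxel, and bins with very different point counts need no padding.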

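Shoot: The result is then treated like a 3D point representation and fed into a BEV encoder, which performs segmentation in the same way a LiDAR-based pipeline would. In the paper itself, the name "Shoot" refers to scoring template trajectories on the inferred BEV cost map for motion planning.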

The paper says a formula has to be set up this way for map segmentation and tracking, but I did not code this part myself, so I do not fully understand it...

Experiment

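In the reported results, the model achieves the best performance on the segmentation and mapping tasks.

It is also robust to extrinsic noise and to camera dropout.

In addition, when cameras that were not present at train time are added, performance at test time improves.

For depth prediction it does not outperform LiDAR, but for drivable area it comes close to LiDAR performance.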

License: Attribution, NonCommercial, NoDerivatives