[PaperReview] UniSeg: A Unified Multi-Modal LiDAR Segmentation Network and the OpenPCSeg Codebase


CVMaster 2024. 3. 30. 01:38


Introduction

Point cloud data is weak at representing color and texture, so the paper brings in RGB images to compensate.
For sensor fusion, it proposes a Learnable cross-Modal Association (LMA) module and a Learnable cross-View Association (LVA) module.
Point cloud processing falls broadly into three forms.
- Point-view : uses the point cloud as is. However, computation is slow and the data of neighboring points is not exploited well.
- Voxel-view : uses voxelization/rasterization. However, voxelization causes data loss.
- Range-view : turns the point cloud into a 2D image outright. Severe data loss.
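
To make the data loss in the voxel and range views concrete, here is a minimal sketch in plain Python. The function names, the 0.5 m voxel size, the 64 x 2048 range image, and the +3 / -25 degree vertical field of view are all illustrative assumptions, not values taken from the paper or from OpenPCSeg.

```python
import math

def voxel_index(point, voxel_size):
    """Map an (x, y, z) point to the integer index of the voxel containing it."""
    return tuple(math.floor(c / voxel_size) for c in point)

def range_pixel(point, width, height, fov_up_deg, fov_down_deg):
    """Spherically project an (x, y, z) point to a (row, col, depth) range-image entry."""
    x, y, z = point
    depth = math.sqrt(x * x + y * y + z * z)
    yaw = math.atan2(y, x)            # azimuth in [-pi, pi]
    pitch = math.asin(z / depth)      # elevation angle
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    col = int(0.5 * (1.0 - yaw / math.pi) * width)
    row = int((1.0 - (pitch - fov_down) / (fov_up - fov_down)) * height)
    # Clamp to the image bounds.
    col = min(max(col, 0), width - 1)
    row = min(max(row, 0), height - 1)
    return row, col, depth

# Four toy points (assumed values): two close together, one farther along the same ray.
points = [(10.0, 0.0, 0.0), (10.02, 0.01, 0.0), (20.0, 0.0, 0.0), (0.0, 5.0, -0.5)]

# Voxel view: the first two points fall into the same 0.5 m voxel.
voxels = {voxel_index(p, voxel_size=0.5) for p in points}

# Range view: the first and third points lie on the same ray, so they share one pixel.
pixels = {range_pixel(p, 2048, 64, 3.0, -25.0)[:2] for p in points}
```

In both cases four points end up in three cells, so one point's individual information is merged or dropped. The point view avoids this at the cost of working on the unordered raw points.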

Codebase

Segmentation codebases such as MM3D have so far covered only indoor methods, so the authors implemented a new one and released the code. It seems they had also provided a codebase called OpenPCDet before.
- https://github.com/PJLab-ADG/OpenPCSeg
- https://github.com/open-mmlab/OpenPCDet

Methods

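The overall flow: an encoder is placed for each view and each type of data, and the features coming out of them are then reconciled with one another.
- If that is the setup, it might be better to just run pretrained encoders and only finetune them before use???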

The LMA module exists for voxel-image fusion and range-image fusion.
The LVA module exists for range-point-voxel fusion.
Learnable Cross-Modal Association
- Point-image correspondence is computed with the camera calibration matrix.
  - This seems better than throwing away the other points the way a range image does. Of course, some duplicated data will still always show up.
- Voxel-image correspondence: the voxel center is projected onto the image, as on the left, to obtain the image pixel offsets, and the image features and voxel features are then fused as in the equation on the right.
- As in the lower-left figure, multi-head attention is performed with the voxel feature as the query and the image features as the key and value.
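
The two steps above (project a 3D location into the image, then attend from a voxel feature to image features) can be sketched roughly as follows. This is a simplified illustration, not the paper's LMA module. The 3x4 `calib` matrix is assumed to already combine intrinsics and LiDAR-to-camera extrinsics. The attention is single-head with no learned projections, whereas the paper uses multi-head attention. The keys and values stand in for image features sampled near the projected pixel. All names and numbers are hypothetical.

```python
import math

def project_to_pixel(point, calib):
    """Project a 3D point to (u, v) pixel coordinates with a 3x4 calibration matrix.

    Returns None when the point lies behind the camera (non-positive depth).
    """
    homo = (point[0], point[1], point[2], 1.0)
    u, v, w = (sum(row[i] * homo[i] for i in range(4)) for row in calib)
    if w <= 0:
        return None
    return u / w, v / w

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    fused = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return fused, weights

# Toy pinhole camera (assumed): focal length 100 px, principal point (50, 50).
calib = [
    [100.0, 0.0, 50.0, 0.0],
    [0.0, 100.0, 50.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
center_pixel = project_to_pixel((0.0, 0.0, 10.0), calib)   # on the optical axis
shifted_pixel = project_to_pixel((1.0, 0.0, 10.0), calib)  # 1 m to the side
behind = project_to_pixel((0.0, 0.0, -1.0), calib)         # behind the camera

# Voxel feature as the query, two image features as keys and values.
voxel_feature = [1.0, 0.0]
image_keys = [[1.0, 0.0], [0.0, 1.0]]
image_values = [[10.0, 0.0], [0.0, 10.0]]
fused, weights = cross_attention(voxel_feature, image_keys, image_values)
```

The image feature whose key matches the voxel query receives the larger weight, which is the sense in which the association is learned rather than a fixed one-to-one lookup at the projected pixel.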

Learnable Cross-View Association
- voxel2point, range2point : there are fewer voxels and range pixels than points, so trilinear/bilinear interpolation is used to fill in the points that have no direct counterpart.
- After concatenating the results, they are fused with the equation below. A residual sum with the point feature is then applied.
- Finally, point2voxel and point2range are performed.
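
A rough sketch of the range2point step and the residual sum, in plain Python. The bilinear sampling is standard. The fusion itself is only a stand-in: a plain average over the three views replaces the paper's learned fusion equation, which is not reproduced in these notes. Function names and values are hypothetical.

```python
def bilinear_sample(feature_map, row, col):
    """Bilinearly interpolate a 2D grid of feature vectors at fractional (row, col)."""
    height, width = len(feature_map), len(feature_map[0])
    # Clamp the sampling location to the valid grid.
    row = min(max(row, 0.0), height - 1.0)
    col = min(max(col, 0.0), width - 1.0)
    r0, c0 = int(row), int(col)
    r1, c1 = min(r0 + 1, height - 1), min(c0 + 1, width - 1)
    dr, dc = row - r0, col - c0
    dim = len(feature_map[0][0])
    return [
        (1 - dr) * (1 - dc) * feature_map[r0][c0][d]
        + (1 - dr) * dc * feature_map[r0][c1][d]
        + dr * (1 - dc) * feature_map[r1][c0][d]
        + dr * dc * feature_map[r1][c1][d]
        for d in range(dim)
    ]

def fuse_views(point_feat, voxel_feat, range_feat):
    """Average the three per-point view features (stand-in for the learned fusion),
    then add the original point feature back as a residual."""
    views = [point_feat, voxel_feat, range_feat]
    fused = [sum(v[d] for v in views) / len(views) for d in range(len(point_feat))]
    return [p + f for p, f in zip(point_feat, fused)]

# 2x2 range-view feature map with 1-channel features (toy values).
range_map = [[[0.0], [2.0]], [[4.0], [6.0]]]
range_feat = bilinear_sample(range_map, 0.5, 0.5)  # point projects between four pixels

point_feat = [1.0]
voxel_feat = [2.0]
output = fuse_views(point_feat, voxel_feat, range_feat)
```

The voxel2point direction works the same way with trilinear interpolation over the eight surrounding voxel centers instead of four pixels.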

After that, semantic segmentation is performed, and its predictions are then used to perform panoptic segmentation.
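
As a generic illustration of how semantic predictions can feed a panoptic step, the sketch below keeps only the points predicted as a "thing" class and groups them into instances by distance. This is an assumed, simplified procedure for intuition only. These notes do not describe the paper's actual panoptic head, and the class names, radius, and function name are hypothetical.

```python
import math
from collections import deque

def cluster_things(points, semantic_labels, thing_classes, radius):
    """Assign instance ids to points whose predicted class is a 'thing' class.

    Points of the same class within `radius` of each other are linked by a
    breadth-first search. Returns one id per point; 0 means 'stuff' (no instance).
    Brute-force O(n^2) neighbor search, fine for a toy example only.
    """
    instance_ids = [0] * len(points)
    next_id = 1
    for seed in range(len(points)):
        if semantic_labels[seed] not in thing_classes or instance_ids[seed]:
            continue
        instance_ids[seed] = next_id
        queue = deque([seed])
        while queue:
            i = queue.popleft()
            for j in range(len(points)):
                if (instance_ids[j] == 0
                        and semantic_labels[j] == semantic_labels[seed]
                        and math.dist(points[i], points[j]) <= radius):
                    instance_ids[j] = next_id
                    queue.append(j)
        next_id += 1
    return instance_ids

# Toy scene: two nearby car points, one distant car point, one road point.
points = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (10.0, 0.0, 0.0), (0.2, 0.0, 0.0)]
semantic_labels = ["car", "car", "car", "road"]
instance_ids = cluster_things(points, semantic_labels, {"car"}, radius=1.0)
```

Here the two nearby car points share one instance id, the distant car point gets its own, and the road point keeps id 0.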
The loss is defined as follows.

Experiments

The results show better performance than other sensor fusion methods and than plain point cloud methods.

Panoptic segmentation is also better.

The pruned models also beat the no-fusion methods, and they beat the other sensor fusion methods as well. Surprisingly, even the model with channels pruned to 0.2x was reportedly still better than the other fusion methods.

The paper shows that multimodal performs better than unimodal.

Next are the qualitative results.

