Reference video: https://www.bilibili.com/video/BV1d5ukedEsi/

GitHub repository: https://github.com/yunlongdong/Awesome-Embodied-AI

Scene Understanding

Image

| Name | Description | Paper | Code |
| --- | --- | --- | --- |
| SAM | Segmentation | https://arxiv.org/abs/2304.02643 | https://github.com/facebookresearch/segment-anything |
| YOLO-World | Open-vocabulary detection | https://arxiv.org/abs/2401.17270 | https://github.com/AILab-CVC/YOLO-World |

Related: "Segment Anything (SAM) paper and demo step-by-step tutorial" (CSDN blog)
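
To make the Image table concrete, here is a minimal sketch of point-prompted segmentation with the official segment-anything package; the image path, click coordinates, and checkpoint filename are placeholders.

```python
# Minimal sketch: prompt SAM with a single foreground click (segment-anything API).
# Assumes the sam_vit_h_4b8939.pth checkpoint has been downloaded locally.
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

image = cv2.cvtColor(cv2.imread("scene.jpg"), cv2.COLOR_BGR2RGB)

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)
predictor.set_image(image)

# One foreground click (pixel coordinates) as the prompt.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),
    point_labels=np.array([1]),
    multimask_output=True,
)
best_mask = masks[np.argmax(scores)]  # boolean HxW mask with the highest predicted IoU
```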

Point Cloud

| Name | Description | Paper | Code |
| --- | --- | --- | --- |
| SAM3D | Segmentation | https://arxiv.org/abs/2306.03908 | https://github.com/Pointcept/SegmentAnything3D |
| PointMixer | Understanding | https://arxiv.org/abs/2111.11187 | https://github.com/LifeBeyondExpectations/PointMixer |
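
The point-cloud entries build on the same 2D machinery: SAM3D, for instance, lifts per-frame SAM masks into 3D using depth and camera intrinsics before merging them across views. A minimal back-projection sketch; the intrinsics, depth values, and mask here are made-up stand-ins.

```python
# Sketch: lift a 2D segmentation mask into a 3D point set via pinhole back-projection.
import numpy as np

def mask_to_points(mask: np.ndarray, depth: np.ndarray, fx, fy, cx, cy):
    """mask: HxW bool, depth: HxW in meters. Returns Nx3 points in the camera frame."""
    v, u = np.nonzero(mask & (depth > 0))   # pixel rows/cols covered by the mask
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)

# Synthetic inputs for illustration; a real pipeline would use posed RGB-D frames.
depth = np.full((480, 640), 1.5, dtype=np.float32)
mask = np.zeros((480, 640), dtype=bool)
mask[200:280, 300:340] = True
points = mask_to_points(mask, depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
```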

Multi-Modal Grounding

| Name | Description | Paper | Code |
| --- | --- | --- | --- |
| GPT-4V | MLM (image + language -> language) | https://arxiv.org/abs/2303.08774 | |
| Claude 3 Opus | MLM (image + language -> language) | "Introducing the next generation of Claude" (Anthropic) | |
| GLaMM | Pixel grounding | https://arxiv.org/abs/2311.03356 | https://github.com/mbzuai-oryx/groundingLMM |
| All-Seeing | Pixel grounding | https://arxiv.org/abs/2402.19474 | https://github.com/OpenGVLab/all-seeing |
| LEO | 3D | https://arxiv.org/abs/2311.12871 | https://github.com/embodied-generalist/embodied-generalist |

Related: "ICML'24 open-source | LEO: the first embodied generalist agent in the 3D world" (Bilibili)
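
The MLM rows (GPT-4V, Claude 3 Opus) are usually accessed through hosted APIs rather than local checkpoints. A hedged sketch of an image-plus-language grounding query via the OpenAI Python client; the model name, image path, and prompt are illustrative choices, not part of the original list.

```python
# Sketch: image + language -> language query against a hosted multimodal model.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("scene.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",  # any multimodal chat model; illustrative choice
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "List the graspable objects in this scene and where they are."},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```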

Data Collection

From Video

| Name | Description | Paper | Code |
| --- | --- | --- | --- |
| Vid2Robot | | https://vid2robot.github.io/vid2robot.pdf | |
| RT-Trajectory | | https://arxiv.org/abs/2311.01977 | |
| MimicPlay | | https://mimic-play.github.io/assets/MimicPlay.pdf | https://github.com/j96w/MimicPlay |

Hardware

| Name | Description | Paper | Code |
| --- | --- | --- | --- |
| UMI | Two fingers | https://arxiv.org/abs/2402.10329 | https://github.com/real-stanford/universal_manipulation_interface |
| DexCap | Five fingers | https://dex-cap.github.io/assets/DexCap_paper.pdf | https://github.com/j96w/DexCap |
| HIRO Hand | Hand-over-hand | https://sites.google.com/view/hiro-hand | |

Generative Simulation

| Name | Description | Paper | Code |
| --- | --- | --- | --- |
| MimicGen | | https://arxiv.org/abs/2310.17596 | https://github.com/NVlabs/mimicgen_environments |
| RoboGen | | https://arxiv.org/abs/2311.01455 | https://github.com/Genesis-Embodied-AI/RoboGen |
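
MimicGen's core idea is to re-target object-centric segments of a source demonstration to new object poses, which multiplies a small set of human demos into many simulated ones. A minimal SE(3) sketch of that transform; poses are 4x4 homogeneous matrices and the numbers are made up.

```python
# Sketch: re-target an end-effector pose segment from a source object pose to a new one,
# in the spirit of MimicGen's object-centric segment transformation.
import numpy as np

def retarget_segment(ee_poses, T_obj_src, T_obj_new):
    """ee_poses: list of 4x4 EE poses recorded with the object at T_obj_src.
    Returns the same segment replayed as if the object were at T_obj_new."""
    T_rel = T_obj_new @ np.linalg.inv(T_obj_src)
    return [T_rel @ T for T in ee_poses]

# Toy example: the object moved 10 cm along +x, so the replayed segment shifts with it.
T_src = np.eye(4)
T_new = np.eye(4); T_new[0, 3] = 0.10
segment = [np.eye(4)]
print(retarget_segment(segment, T_src, T_new)[0][:3, 3])  # -> [0.1 0.  0. ]
```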

Action Output

Action planning

Generative Imitation Learning

| Name | Description | Paper | Code |
| --- | --- | --- | --- |
| Diffusion Policy | | https://arxiv.org/abs/2303.04137 | https://github.com/real-stanford/diffusion_policy |
| ACT | | https://arxiv.org/abs/2304.13705 | https://github.com/tonyzhaozh/act |
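
Both entries predict a chunk of future actions from recent observations and replan before the chunk runs out (receding-horizon execution). A policy-agnostic sketch of that loop; `policy`, `env`, and the horizon values are placeholders rather than either repo's actual interface.

```python
# Sketch: receding-horizon execution of a chunked policy (Diffusion Policy / ACT style).
import collections
import numpy as np

def run_episode(policy, env, obs_horizon=2, act_horizon=8, max_steps=200):
    """policy(obs_stack) -> (H, action_dim) chunk; execute a prefix, then replan."""
    obs_buffer = collections.deque(maxlen=obs_horizon)
    obs = env.reset()
    obs_buffer.extend([obs] * obs_horizon)          # pad observation history at episode start
    for _ in range(max_steps // act_horizon):
        chunk = policy(np.stack(obs_buffer))        # predict a full action chunk
        for action in chunk[:act_horizon]:          # execute only the first act_horizon steps
            obs, reward, done, info = env.step(action)
            obs_buffer.append(obs)
            if done:
                return
```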

Affordance Map

| Name | Description | Paper | Code |
| --- | --- | --- | --- |
| CLIPort | Pick & place | https://arxiv.org/pdf/2109.12098.pdf | https://github.com/cliport/cliport |
| Robo-Affordances | Contact & post-contact trajectories | https://arxiv.org/abs/2304.08488 | https://github.com/shikharbahl/vrb |
| Robo-ABC | | https://arxiv.org/abs/2401.07487 | https://github.com/TEA-Lab/Robo-ABC |
| Where2Explore | Few-shot learning from semantic similarity | https://proceedings.neurips.cc/paper_files/paper/2023/file/0e7e2af2e5ba822c9ad35a37b31b5dd4-Paper-Conference.pdf | |
| Move as You Say, Interact as You Can | Affordance-to-motion via a diffusion model | https://arxiv.org/pdf/2403.18036.pdf | |
| AffordanceLLM | Grounding affordance with an LLM | https://arxiv.org/pdf/2401.06341.pdf | |
| Environment-aware Affordance | | https://proceedings.neurips.cc/paper_files/paper/2023/file/bf78fc727cf882df66e6dbc826161e86-Paper-Conference.pdf | |
| OpenAD | Open-vocabulary affordance detection from point clouds | https://www.csc.liv.ac.uk/~anguyen/assets/pdfs/2023_OpenAD.pdf | https://github.com/Fsoft-AIC/Open-Vocabulary-Affordance-Detection-in-3D-Point-Clouds |
| RLAfford | End-to-end affordance learning with RL | https://gengyiran.github.io/pdf/RLAfford.pdf | |
| General Flow | Collecting affordance from video | https://general-flow.github.io/general_flow.pdf | https://github.com/michaelyuancb/general_flow |
| PreAffordance | Pre-grasping planning | https://arxiv.org/pdf/2404.03634.pdf | |
| SceneFun3D | Fine-grained functionality & affordance in 3D scenes | https://aycatakmaz.github.io/data/SceneFun3D-preprint.pdf | https://github.com/SceneFun3D/scenefun3d |
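
Most of these methods output a dense per-pixel (or per-point) affordance score, and the downstream controller simply acts at the best-scoring location. A minimal sketch of turning a CLIPort-style pick/place heatmap pair into pixel targets; the heatmaps here are random stand-ins for model predictions.

```python
# Sketch: reading pick/place targets off dense affordance heatmaps (CLIPort-style output).
import numpy as np

def best_pixel(heatmap: np.ndarray):
    """heatmap: HxW affordance scores. Returns (row, col) of the highest-scoring pixel."""
    return np.unravel_index(np.argmax(heatmap), heatmap.shape)

rng = np.random.default_rng(0)
pick_map = rng.random((224, 224))    # stand-in for a predicted pick-affordance map
place_map = rng.random((224, 224))   # stand-in for a predicted place-affordance map

pick_px = best_pixel(pick_map)
place_px = best_pixel(place_map)
# A real system would back-project these pixels to 3D using depth and camera intrinsics.
```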

Question & Answer from LLM

| Name | Description | Paper | Code |
| --- | --- | --- | --- |
| COPA | | https://arxiv.org/abs/2403.08248 | |
| ManipLLM | | https://arxiv.org/abs/2312.16217 | |
| ManipVQA | | https://arxiv.org/pdf/2403.11289.pdf | https://github.com/SiyuanHuang95/ManipVQA |

Language Corrections

| Name | Description | Paper | Code |
| --- | --- | --- | --- |
| OLAF | | https://arxiv.org/pdf/2310.17555 | |
| YAY Robot | | https://arxiv.org/abs/2403.12910 | https://github.com/yay-robot/yay_robot |

Planning from LLM

| Name | Description | Paper | Code |
| --- | --- | --- | --- |
| SayCan | API level | https://arxiv.org/abs/2204.01691 | https://github.com/google-research/google-research/tree/master/saycan |
| VILA | Prompt level | https://arxiv.org/abs/2311.17842 | |
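
SayCan's API-level planning multiplies what the LLM thinks is useful (the likelihood of a skill description as the next step for the instruction) by what the robot thinks is feasible (a learned affordance/value score), then picks the best skill and repeats. A toy sketch of that scoring loop; `llm_score` and `affordance_score` are hand-written placeholders for the real models, not SayCan's actual code.

```python
# Sketch: SayCan-style skill selection = LLM usefulness score x affordance feasibility score.
SKILLS = ["pick up the sponge", "go to the sink", "put the sponge in the sink", "done"]

def llm_score(instruction: str, history: list[str], skill: str) -> float:
    """Placeholder for the LLM's next-step likelihood; here it just prefers unused skills."""
    if skill == "done":
        return 0.9 if len(history) == len(SKILLS) - 1 else 0.01
    return 0.1 if skill in history else 0.5

def affordance_score(skill: str) -> float:
    """Placeholder for a learned value function estimating the skill's success probability."""
    return 1.0

def next_skill(instruction: str, history: list[str]) -> str:
    scores = {s: llm_score(instruction, history, s) * affordance_score(s) for s in SKILLS}
    return max(scores, key=scores.get)

plan = []
while (skill := next_skill("wipe the counter", plan)) != "done" and len(plan) < 10:
    plan.append(skill)
print(plan)
```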
