Reference video: https://www.bilibili.com/video/BV1d5ukedEsi/

GitHub repository: https://github.com/yunlongdong/Awesome-Embodied-AI

Scene Understanding

Image

| Name | Description | Paper | Code |
| --- | --- | --- | --- |
| SAM | Segmentation | https://arxiv.org/abs/2304.02643 | https://github.com/facebookresearch/segment-anything |
| YOLO-World | Open-vocabulary detection | https://arxiv.org/abs/2401.17270 | https://github.com/AILab-CVC/YOLO-World |

Related: "Segment Anything (SAM) paper and demo step-by-step tutorial" (CSDN blog)
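
To make the Image table concrete, here is a minimal sketch of point-prompted segmentation with the official segment-anything package; the image path, click coordinates, and checkpoint filename are placeholders.

```python
# Minimal sketch: prompt SAM with a single foreground click (segment-anything API).
# Assumes the sam_vit_h_4b8939.pth checkpoint has been downloaded locally.
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

image = cv2.cvtColor(cv2.imread("scene.jpg"), cv2.COLOR_BGR2RGB)

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)
predictor.set_image(image)

# One foreground click (pixel coordinates) as the prompt.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),
    point_labels=np.array([1]),
    multimask_output=True,
)
best_mask = masks[np.argmax(scores)]  # boolean HxW mask with the highest predicted IoU
```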

Point Cloud

| Name | Description | Paper | Code |
| --- | --- | --- | --- |
| SAM3D | Segmentation | https://arxiv.org/abs/2306.03908 | https://github.com/Pointcept/SegmentAnything3D |
| PointMixer | Understanding | https://arxiv.org/abs/2111.11187 | https://github.com/LifeBeyondExpectations/PointMixer |
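
The point-cloud entries build on the same 2D machinery: SAM3D, for instance, lifts per-frame SAM masks into 3D using depth and camera intrinsics before merging them across views. A minimal back-projection sketch; the intrinsics, depth values, and mask here are made-up stand-ins.

```python
# Sketch: lift a 2D segmentation mask into a 3D point set via pinhole back-projection.
import numpy as np

def mask_to_points(mask: np.ndarray, depth: np.ndarray, fx, fy, cx, cy):
    """mask: HxW bool, depth: HxW in meters. Returns Nx3 points in the camera frame."""
    v, u = np.nonzero(mask & (depth > 0))   # pixel rows/cols covered by the mask
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)

# Synthetic inputs for illustration; a real pipeline would use posed RGB-D frames.
depth = np.full((480, 640), 1.5, dtype=np.float32)
mask = np.zeros((480, 640), dtype=bool)
mask[200:280, 300:340] = True
points = mask_to_points(mask, depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
```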

Multi-Modal Grounding

| Name | Description | Paper | Code |
| --- | --- | --- | --- |
| GPT-4V | MLM (image + language -> language) | https://arxiv.org/abs/2303.08774 | |
| Claude 3 Opus | MLM (image + language -> language) | "Introducing the next generation of Claude" (Anthropic) | |
| GLaMM | Pixel grounding | https://arxiv.org/abs/2311.03356 | https://github.com/mbzuai-oryx/groundingLMM |
| All-Seeing | Pixel grounding | https://arxiv.org/abs/2402.19474 | https://github.com/OpenGVLab/all-seeing |
| LEO | 3D | https://arxiv.org/abs/2311.12871 | https://github.com/embodied-generalist/embodied-generalist |

Related: "ICML'24 open-source | LEO: the first embodied generalist agent in the 3D world" (Bilibili)
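
The MLM rows (GPT-4V, Claude 3 Opus) are usually accessed through hosted APIs rather than local checkpoints. A hedged sketch of an image-plus-language grounding query via the OpenAI Python client; the model name, image path, and prompt are illustrative choices, not part of the original list.

```python
# Sketch: image + language -> language query against a hosted multimodal model.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("scene.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",  # any multimodal chat model; illustrative choice
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "List the graspable objects in this scene and where they are."},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```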

Data Collection

From Video

| Name | Description | Paper | Code |
| --- | --- | --- | --- |
| Vid2Robot | | https://vid2robot.github.io/vid2robot.pdf | |
| RT-Trajectory | | https://arxiv.org/abs/2311.01977 | |
| MimicPlay | | https://mimic-play.github.io/assets/MimicPlay.pdf | https://github.com/j96w/MimicPlay |

Hardware

| Name | Description | Paper | Code |
| --- | --- | --- | --- |
| UMI | Two fingers | https://arxiv.org/abs/2402.10329 | https://github.com/real-stanford/universal_manipulation_interface |
| DexCap | Five fingers | https://dex-cap.github.io/assets/DexCap_paper.pdf | https://github.com/j96w/DexCap |
| HIRO Hand | Hand-over-hand | https://sites.google.com/view/hiro-hand | |

Generative Simulation

| Name | Description | Paper | Code |
| --- | --- | --- | --- |
| MimicGen | | https://arxiv.org/abs/2310.17596 | https://github.com/NVlabs/mimicgen_environments |
| RoboGen | | https://arxiv.org/abs/2311.01455 | https://github.com/Genesis-Embodied-AI/RoboGen |
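
MimicGen's core idea is to re-target object-centric segments of a source demonstration to new object poses, which multiplies a small set of human demos into many simulated ones. A minimal SE(3) sketch of that transform; poses are 4x4 homogeneous matrices and the numbers are made up.

```python
# Sketch: re-target an end-effector pose segment from a source object pose to a new one,
# in the spirit of MimicGen's object-centric segment transformation.
import numpy as np

def retarget_segment(ee_poses, T_obj_src, T_obj_new):
    """ee_poses: list of 4x4 EE poses recorded with the object at T_obj_src.
    Returns the same segment replayed as if the object were at T_obj_new."""
    T_rel = T_obj_new @ np.linalg.inv(T_obj_src)
    return [T_rel @ T for T in ee_poses]

# Toy example: the object moved 10 cm along +x, so the replayed segment shifts with it.
T_src = np.eye(4)
T_new = np.eye(4); T_new[0, 3] = 0.10
segment = [np.eye(4)]
print(retarget_segment(segment, T_src, T_new)[0][:3, 3])  # -> [0.1 0.  0. ]
```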

Action Output

Action planning

Generative Imitation Learning

| Name | Description | Paper | Code |
| --- | --- | --- | --- |
| Diffusion Policy | | https://arxiv.org/abs/2303.04137 | https://github.com/real-stanford/diffusion_policy |
| ACT | | https://arxiv.org/abs/2304.13705 | https://github.com/tonyzhaozh/act |
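
Both entries predict a chunk of future actions from recent observations and replan before the chunk runs out (receding-horizon execution). A policy-agnostic sketch of that loop; `policy`, `env`, and the horizon values are placeholders rather than either repo's actual interface.

```python
# Sketch: receding-horizon execution of a chunked policy (Diffusion Policy / ACT style).
import collections
import numpy as np

def run_episode(policy, env, obs_horizon=2, act_horizon=8, max_steps=200):
    """policy(obs_stack) -> (H, action_dim) chunk; execute a prefix, then replan."""
    obs_buffer = collections.deque(maxlen=obs_horizon)
    obs = env.reset()
    obs_buffer.extend([obs] * obs_horizon)          # pad observation history at episode start
    for _ in range(max_steps // act_horizon):
        chunk = policy(np.stack(obs_buffer))        # predict a full action chunk
        for action in chunk[:act_horizon]:          # execute only the first act_horizon steps
            obs, reward, done, info = env.step(action)
            obs_buffer.append(obs)
            if done:
                return
```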

Affordance Map

| Name | Description | Paper | Code |
| --- | --- | --- | --- |
| CLIPort | Pick & place | https://arxiv.org/pdf/2109.12098.pdf | https://github.com/cliport/cliport |
| Robo-Affordances | Contact & post-contact trajectories | https://arxiv.org/abs/2304.08488 | https://github.com/shikharbahl/vrb |
| Robo-ABC | | https://arxiv.org/abs/2401.07487 | https://github.com/TEA-Lab/Robo-ABC |
| Where2Explore | Few-shot learning from semantic similarity | https://proceedings.neurips.cc/paper_files/paper/2023/file/0e7e2af2e5ba822c9ad35a37b31b5dd4-Paper-Conference.pdf | |
| Move as You Say, Interact as You Can | Affordance-to-motion via a diffusion model | https://arxiv.org/pdf/2403.18036.pdf | |
| AffordanceLLM | Grounding affordance with an LLM | https://arxiv.org/pdf/2401.06341.pdf | |
| Environment-aware Affordance | | https://proceedings.neurips.cc/paper_files/paper/2023/file/bf78fc727cf882df66e6dbc826161e86-Paper-Conference.pdf | |
| OpenAD | Open-vocabulary affordance detection from point clouds | https://www.csc.liv.ac.uk/~anguyen/assets/pdfs/2023_OpenAD.pdf | https://github.com/Fsoft-AIC/Open-Vocabulary-Affordance-Detection-in-3D-Point-Clouds |
| RLAfford | End-to-end affordance learning with RL | https://gengyiran.github.io/pdf/RLAfford.pdf | |
| General Flow | Collecting affordance from video | https://general-flow.github.io/general_flow.pdf | https://github.com/michaelyuancb/general_flow |
| PreAffordance | Pre-grasping planning | https://arxiv.org/pdf/2404.03634.pdf | |
| SceneFun3D | Fine-grained functionality & affordance in 3D scenes | https://aycatakmaz.github.io/data/SceneFun3D-preprint.pdf | https://github.com/SceneFun3D/scenefun3d |
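
Most of these methods output a dense per-pixel (or per-point) affordance score, and the downstream controller simply acts at the best-scoring location. A minimal sketch of turning a CLIPort-style pick/place heatmap pair into pixel targets; the heatmaps here are random stand-ins for model predictions.

```python
# Sketch: reading pick/place targets off dense affordance heatmaps (CLIPort-style output).
import numpy as np

def best_pixel(heatmap: np.ndarray):
    """heatmap: HxW affordance scores. Returns (row, col) of the highest-scoring pixel."""
    return np.unravel_index(np.argmax(heatmap), heatmap.shape)

rng = np.random.default_rng(0)
pick_map = rng.random((224, 224))    # stand-in for a predicted pick-affordance map
place_map = rng.random((224, 224))   # stand-in for a predicted place-affordance map

pick_px = best_pixel(pick_map)
place_px = best_pixel(place_map)
# A real system would back-project these pixels to 3D using depth and camera intrinsics.
```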

Question & Answer from LLM

| Name | Description | Paper | Code |
| --- | --- | --- | --- |
| COPA | | https://arxiv.org/abs/2403.08248 | |
| ManipLLM | | https://arxiv.org/abs/2312.16217 | |
| ManipVQA | | https://arxiv.org/pdf/2403.11289.pdf | https://github.com/SiyuanHuang95/ManipVQA |

Language Corrections

| Name | Description | Paper | Code |
| --- | --- | --- | --- |
| OLAF | | https://arxiv.org/pdf/2310.17555 | |
| YAY Robot | | https://arxiv.org/abs/2403.12910 | https://github.com/yay-robot/yay_robot |

Planning from LLM

| Name | Description | Paper | Code |
| --- | --- | --- | --- |
| SayCan | API level | https://arxiv.org/abs/2204.01691 | https://github.com/google-research/google-research/tree/master/saycan |
| VILA | Prompt level | https://arxiv.org/abs/2311.17842 | |
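
SayCan's API-level planning multiplies what the LLM thinks is useful (the likelihood of a skill description as the next step for the instruction) by what the robot thinks is feasible (a learned affordance/value score), then picks the best skill and repeats. A toy sketch of that scoring loop; `llm_score` and `affordance_score` are hand-written placeholders for the real models, not SayCan's actual code.

```python
# Sketch: SayCan-style skill selection = LLM usefulness score x affordance feasibility score.
SKILLS = ["pick up the sponge", "go to the sink", "put the sponge in the sink", "done"]

def llm_score(instruction: str, history: list[str], skill: str) -> float:
    """Placeholder for the LLM's next-step likelihood; here it just prefers unused skills."""
    if skill == "done":
        return 0.9 if len(history) == len(SKILLS) - 1 else 0.01
    return 0.1 if skill in history else 0.5

def affordance_score(skill: str) -> float:
    """Placeholder for a learned value function estimating the skill's success probability."""
    return 1.0

def next_skill(instruction: str, history: list[str]) -> str:
    scores = {s: llm_score(instruction, history, s) * affordance_score(s) for s in SKILLS}
    return max(scores, key=scores.get)

plan = []
while (skill := next_skill("wipe the counter", plan)) != "done" and len(plan) < 10:
    plan.append(skill)
print(plan)
```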
