Smart Paper Notes

Biomedical image analysis competitions: The state of current participation practice

Matthias Eisenmann, Annika Reinke, Vivienn Weru, Minu Dietlinde Tizabi, Fabian Isensee, Tim J. Adler, Patrick Godau, Veronika Cheplygina, Michal Kozubek, Sharib Ali

Category: Computer Vision | Machine Learning

2022-12-16

The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice and the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive for participation (70%), while receiving prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for method development. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based; of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
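
The survey reports K-fold cross-validation and ensembling of identical models separately, but the two are often combined: training one model per fold yields K models of the same architecture whose predictions can be averaged. The following is a minimal Python sketch of that pattern. It assumes contiguous (unshuffled) folds and a hypothetical stand-in "model" that simply predicts the mean training label; a real pipeline would shuffle or stratify cases and train an actual network per fold.

```python
from typing import Callable, List, Sequence, Tuple


def kfold_splits(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds and return (train, val) index pairs."""
    if not 1 < k <= n:
        raise ValueError("need 1 < k <= n")
    # Spread the remainder over the first folds so fold sizes differ by at most one.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n) if not start <= i < start + size]
        splits.append((train, val))
        start += size
    return splits


def ensemble_mean(models: Sequence[Callable[[float], float]], x: float) -> float:
    """Ensemble of identical models: average their predictions for input x."""
    return sum(model(x) for model in models) / len(models)


if __name__ == "__main__":
    labels = [0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0]
    fold_models = []
    for train, _val in kfold_splits(len(labels), k=5):
        # Hypothetical stand-in for "train a network on this fold".
        mean_label = sum(labels[i] for i in train) / len(train)
        fold_models.append(lambda x, m=mean_label: m)
    print(round(ensemble_mean(fold_models, x=0.0), 6))  # prints 0.7
```

Each fold's validation split can be used for model selection, after which all K fold models are kept for the ensemble rather than retraining a single model on the full training set.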

WOC: A Handy Webcam-based 3D Online Chatroom

Chuanhang Yan, Yu Sun, Qian Bao, Jinhui Pang, Wu Liu, Tao Mei

Category: Computer Vision

2022-09-02

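We develop WOC, a webcam-based 3D virtual online chatroom for multi-person interaction, which captures the 3D motion of users and drives their individual 3D virtual avatars in real time. Compared to existing solutions based on wearable equipment, WOC offers convenient and low-cost 3D motion capture with a single camera. To promote an immersive chat experience, WOC provides high-fidelity virtual avatar manipulation, which also supports user-defined characters. With a distributed data flow service, the system delivers highly synchronized motion and voice to all users. Deployed on a website with no installation required, WOC lets users freely experience virtual online chat at https://yanch.cloud.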

Global Planning for Contact-Rich Manipulation via Local Smoothing of Quasi-dynamic Contact Models

Tao Pang, H. J. Terry Suh, Lujie Yang, Russ Tedrake

Category: Robotics

2022-06-22

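The empirical success of reinforcement learning (RL) in contact-rich manipulation leaves much to be understood from a model-based perspective, where the key difficulties are often attributed to (i) the explosion of contact modes, (ii) stiff, non-smooth contact dynamics and the resulting exploding/discontinuous gradients, and (iii) the non-convexity of the planning problem. The stochastic nature of RL addresses (i) and (ii) by effectively sampling and averaging the contact modes. Model-based methods, on the other hand, have tackled the same challenges by smoothing contact dynamics analytically. Our first contribution is to establish the theoretical equivalence of the two methods for simple systems, and to provide qualitative and empirical evidence of equivalence on a number of complex examples. To further alleviate (ii), our second contribution is a convex, differentiable, and quasi-dynamic formulation of contact dynamics, which is amenable to both smoothing schemes and has proven through experiments to be highly effective for contact-rich planning. Our final contribution resolves (iii): we show that classical sampling-based motion planning algorithms can be effective in global planning when contact modes are abstracted via smoothing. Applying our method to a collection of challenging contact-rich manipulation tasks, we demonstrate that efficient model-based motion planning can achieve results comparable to RL with dramatically less computation. Video: https://youtu.be/12ew4xc-vwa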
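
To make the equivalence between sampling-based (RL-style) smoothing and analytic smoothing concrete, here is a one-dimensional Python sketch. It assumes a toy non-smooth response `relu(x) = max(0, x)` as a stand-in for a contact force that switches on at contact, and Gaussian noise with standard deviation `sigma`. Averaging sampled evaluations converges to the closed-form Gaussian smoothing, which is differentiable everywhere. This illustrates the idea only and is not the paper's quasi-dynamic contact model.

```python
import math
import random


def relu(x: float) -> float:
    """Toy non-smooth 'contact' response: zero before contact, linear after."""
    return max(0.0, x)


def smoothed_relu_analytic(x: float, sigma: float) -> float:
    """Closed form of E[relu(x + w)] with w ~ N(0, sigma^2)."""
    z = x / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return x * cdf + sigma * pdf


def smoothed_relu_sampled(x: float, sigma: float, n: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same expectation (randomized smoothing)."""
    rng = random.Random(seed)
    return sum(relu(x + rng.gauss(0.0, sigma)) for _ in range(n)) / n


if __name__ == "__main__":
    for x in (-0.5, 0.0, 0.5):
        print(x, relu(x), smoothed_relu_analytic(x, 0.2), smoothed_relu_sampled(x, 0.2))
```

The derivative of the analytic version is Φ(x/σ), which rises smoothly from 0 to 1 instead of jumping at x = 0. That informative gradient near contact is what a gradient-based or sampling-based planner can exploit.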

CholecTriplet2021: A benchmark challenge for surgical action triplet recognition

Chinedu Innocent Nwoye, Deepak Alapatt, Tong Yu, Armine Vardazaryan, Fangfang Xia, Zixuan Zhao, Tong Xia, Fucang Jia, Yuxuan Yang, Hao Wang

Category: Computer Vision

2022-04-10

Context-aware decision support in the operating room can foster surgical safety and efficiency by leveraging real-time feedback from surgical workflow analysis. Most existing works recognize surgical activities at a coarse-grained level, such as phases, steps or events, leaving out fine-grained interaction details about the surgical activity; yet those details are needed for more helpful AI assistance in the operating room. Recognizing surgical actions as <instrument, verb, target> triplets delivers comprehensive details about the activities taking place in surgical videos. This paper presents CholecTriplet2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos. The challenge granted private access to the large-scale CholecT50 dataset, which is annotated with action triplet information. In this paper, we present the challenge setup and an assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge. A total of 4 baseline methods from the challenge organizers and 19 new deep learning algorithms from competing teams are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%. This study also analyzes the significance of the results obtained by the presented approaches, performs a thorough methodological comparison between them and an in-depth analysis of their results, and proposes a novel ensemble method for enhanced recognition. Our analysis shows that surgical workflow analysis is not yet solved, and it highlights interesting directions for future research on fine-grained surgical activity recognition, which is of utmost importance for the development of AI in surgery.
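
Since the challenge ranks methods by mean average precision (mAP), a short sketch of the metric may help. Triplet recognition is multi-label: a frame can contain several <instrument, verb, target> triplets, so each triplet class gets its own average precision (AP) over all frames, and mAP averages AP over classes. The Python below uses the common non-interpolated AP (mean precision at the rank of each positive) and skips classes with no positives. The official challenge evaluation may use a dedicated toolkit with different conventions for ties, per-video averaging, or absent classes, so treat these choices as assumptions.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP for one class: mean precision at the rank of each positive."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            precisions.append(hits / rank)
    if not precisions:
        raise ValueError("AP is undefined for a class without positive samples")
    return sum(precisions) / len(precisions)


def mean_average_precision(
    class_scores: Sequence[Sequence[float]], class_labels: Sequence[Sequence[int]]
) -> float:
    """mAP: average AP over triplet classes that have at least one positive frame.

    class_scores[c][f] is the score for class c on frame f; class_labels[c][f] is 0 or 1.
    """
    aps = [average_precision(s, y) for s, y in zip(class_scores, class_labels) if any(y)]
    return sum(aps) / len(aps)


if __name__ == "__main__":
    scores = [[0.9, 0.8, 0.7, 0.6], [0.1, 0.2, 0.3, 0.4]]  # two hypothetical classes, four frames
    labels = [[1, 0, 1, 0], [0, 0, 0, 1]]
    print(mean_average_precision(scores, labels))  # (5/6 + 1) / 2, about 0.917
```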

SEED: Series Elastic End Effectors in 6D for Visuotactile Tool Use

H. J. Terry Suh, Naveen Kuppuswamy, Tao Pang, Paul Mitiguy, Alex Alspach, Russ Tedrake

Category: Robotics

2021-11-02

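We propose the framework of Series Elastic End Effectors in 6D (SEED), which combines a spatially compliant element with visuotactile sensing to grasp and manipulate tools in the wild. Our framework generalizes the benefits of series elasticity to 6-DoF, while providing an abstraction of control using visuotactile sensing. We propose an algorithm for relative pose estimation from visuotactile sensing, and a spatial hybrid force-position controller capable of achieving stable force interaction with the environment. We demonstrate the effectiveness of our framework on tools that require regulation of spatial forces. Video link: https://youtu.be/2-yuifspdrk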
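
A spatial hybrid force-position controller typically decides, for each of the six task-space axes, whether that axis is position-controlled or force-controlled. The Python sketch below shows the textbook selection-matrix form of hybrid control (in the style of Raibert and Craig), assuming simple proportional gains `kp` and `kf` and an axis ordering of (x, y, z, roll, pitch, yaw). It is a generic illustration with hypothetical names, not SEED's controller, which additionally relies on series elasticity and visuotactile pose estimates.

```python
from typing import List, Sequence


def hybrid_command(
    selection: Sequence[int],
    pose_error: Sequence[float],
    wrench_error: Sequence[float],
    kp: float,
    kf: float,
) -> List[float]:
    """Selection-matrix hybrid force-position law on six task-space axes.

    selection[i] == 1: axis i is position-controlled (uses pose_error[i]).
    selection[i] == 0: axis i is force-controlled (uses wrench_error[i]).
    Returns a task-space wrench command, one entry per axis.
    """
    if not len(selection) == len(pose_error) == len(wrench_error) == 6:
        raise ValueError("expected 6D vectors ordered (x, y, z, roll, pitch, yaw)")
    return [
        s * kp * e_pose + (1 - s) * kf * e_force
        for s, e_pose, e_force in zip(selection, pose_error, wrench_error)
    ]


if __name__ == "__main__":
    # Regulate force along z (e.g., pressing a tool onto a surface), hold position elsewhere.
    selection = [1, 1, 0, 1, 1, 1]
    pose_err = [0.01, 0.0, 0.5, 0.0, 0.0, 0.0]    # the z entry is ignored (force-controlled)
    wrench_err = [3.0, 0.0, -2.0, 0.0, 0.0, 0.0]  # the x entry is ignored (position-controlled)
    print(hybrid_command(selection, pose_err, wrench_err, kp=100.0, kf=0.5))
```

Keeping the two error channels on complementary axes is what prevents the position and force loops from fighting each other along the same direction.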

GSPMD: General and Scalable Parallelization for ML Computation Graphs

Yuanzhong Xu, HyoukJoong Lee, Dehao Chen, Blake Hechtman, Yanping Huang, Rahul Joshi, Maxim Krikun, Dmitry Lepikhin, Andy Ly, Marcello Maggioni

Category: Machine Learning

2021-05-10

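We present GSPMD, an automatic, compiler-based parallelization system for common machine learning computations. It allows users to write programs in the same way as for a single device, then give hints through a few annotations on how to distribute tensors, based on which GSPMD will parallelize the computation. Its representation of partitioning is simple yet general, allowing it to express different or mixed paradigms of parallelism on a wide variety of models. GSPMD infers the partitioning for every operator based on limited user annotations, making it convenient to scale existing single-device programs. It solves several technical challenges for production usage, allowing GSPMD to achieve 50% to 62% compute utilization on up to 2048 Cloud TPUv3 cores for models with up to one trillion parameters.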
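
The core idea is that a sharding annotation on a tensor determines how each operator is partitioned and which communication it needs. The plain-Python simulation below illustrates this for a single matrix multiply `X @ W` on `n` simulated devices under two hypothetical annotations. Sharding X along its batch (row) dimension needs only a concatenation of local results, while sharding the contracted dimension of both operands produces partial sums that must be combined by an all-reduce. It assumes dimensions divisible by `n` and is a conceptual sketch, not GSPMD's API or its sharding-propagation algorithm.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain single-device matrix multiply: (m x k) @ (k x n)."""
    return [
        [sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def shard_rows(m: Matrix, n: int) -> List[Matrix]:
    """Split a matrix by rows into n equal shards, one per simulated device."""
    if len(m) % n:
        raise ValueError("row count must be divisible by the device count")
    step = len(m) // n
    return [m[d * step:(d + 1) * step] for d in range(n)]


def shard_cols(m: Matrix, n: int) -> List[Matrix]:
    """Split a matrix by columns into n equal shards."""
    if len(m[0]) % n:
        raise ValueError("column count must be divisible by the device count")
    step = len(m[0]) // n
    return [[row[d * step:(d + 1) * step] for row in m] for d in range(n)]


def batch_parallel_matmul(x: Matrix, w: Matrix, n: int) -> Matrix:
    """X sharded on its batch (row) dim, W replicated: local matmuls, then concatenate."""
    out: Matrix = []
    for x_shard in shard_rows(x, n):
        out.extend(matmul(x_shard, w))  # no cross-device reduction needed
    return out


def contraction_parallel_matmul(x: Matrix, w: Matrix, n: int) -> Matrix:
    """Contracted dim sharded (X by columns, W by rows): partial sums need an all-reduce."""
    partials = [matmul(xs, ws) for xs, ws in zip(shard_cols(x, n), shard_rows(w, n))]
    total = partials[0]
    for p in partials[1:]:
        # Element-wise sum stands in for the all-reduce across devices.
        total = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(total, p)]
    return total


if __name__ == "__main__":
    x = [[1, 2, 3, 4], [5, 6, 7, 8]]
    w = [[1, 0], [0, 1], [1, 1], [2, -1]]
    assert batch_parallel_matmul(x, w, 2) == contraction_parallel_matmul(x, w, 2) == matmul(x, w)
    print(matmul(x, w))  # prints [[12, 1], [28, 5]]
```

Both partitionings give the same result; they differ only in memory layout and communication, which is exactly the trade-off that sharding annotations let the user steer.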

Risk-Averse MDPs under Reward Ambiguity

Haolin Ruan, Zhi Chen, Chin Pang Ho

Category: Machine Learning

2023-01-03

We propose a distributionally robust return-risk model for Markov decision processes (MDPs) under risk and reward ambiguity. The proposed model optimizes the weighted average of mean and percentile performances, and it covers the distributionally robust MDPs and the distributionally robust chance-constrained MDPs (both under reward ambiguity) as special cases. By considering that the unknown reward distribution lies in a Wasserstein ambiguity set, we derive a tractable reformulation of our model. In particular, we show that the return-risk model can also account for risk from an uncertain transition kernel when one seeks only deterministic policies, and that a distributionally robust MDP under the percentile criterion can be reformulated as its nominal counterpart at an adjusted risk level. A scalable first-order algorithm is designed to solve large-scale problems, and we demonstrate the advantages of our proposed model and algorithm through numerical experiments.
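
The objective "weighted average of mean and percentile performances" can be illustrated on sampled returns. The Python sketch below assumes that percentile performance means the empirical lower alpha-quantile of returns, and it evaluates the objective under the nominal (empirical) distribution only; the paper's model instead takes the worst case over a Wasserstein ambiguity set. The two policies and their returns are hypothetical.

```python
import math
from typing import Sequence


def lower_percentile(samples: Sequence[float], alpha: float) -> float:
    """Empirical alpha-quantile: smallest sample v with at least a fraction alpha of samples <= v."""
    if not samples or not 0.0 < alpha <= 1.0:
        raise ValueError("need non-empty samples and 0 < alpha <= 1")
    ordered = sorted(samples)
    return ordered[math.ceil(alpha * len(ordered)) - 1]


def return_risk(samples: Sequence[float], weight: float, alpha: float) -> float:
    """weight * mean return + (1 - weight) * alpha-percentile return."""
    mean = sum(samples) / len(samples)
    return weight * mean + (1.0 - weight) * lower_percentile(samples, alpha)


if __name__ == "__main__":
    risky = [10.0] * 9 + [-50.0]  # hypothetical: usually good, occasionally disastrous
    safe = [3.0] * 10
    for w in (1.0, 0.5):
        print(w, return_risk(risky, w, alpha=0.1), return_risk(safe, w, alpha=0.1))
```

With `weight = 1.0` the objective is risk-neutral and prefers the risky policy (mean 4.0 versus 3.0); with `weight = 0.5` the heavy lower tail dominates (-23.0 versus 3.0), showing how the weight trades expected return against tail performance.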

More is Better: A Database for Spontaneous Micro-Expression with High Frame Rates

Sirui Zhao, Huaying Tang, Xinglong Mao, Shifeng Liu, Hanqing Tao, Hao Wang, Tong Xu, Enhong Chen

Category: Computer Vision

2023-01-03

As one of the most important psychological stress reactions, micro-expressions (MEs) are spontaneous and transient facial expressions that can reveal the genuine emotions of human beings. Thus, recognizing MEs automatically (MER) is becoming increasingly crucial in the field of affective computing, and provides essential technical support for lie detection, psychological analysis, and other areas. However, the lack of abundant ME data seriously restricts the development of cutting-edge data-driven MER models. Although several spontaneous ME datasets have recently been released to alleviate this problem, the amount of available data is still small. To address this data hunger, we construct a dynamic spontaneous ME dataset with the largest ME data scale to date, called DFME (Dynamic Facial Micro-expressions), which includes 7,526 well-labeled ME videos induced by 671 participants and annotated by more than 20 annotators over three years. We then apply four classical spatiotemporal feature learning models to DFME to perform MER experiments and objectively verify the validity of the DFME dataset. In addition, we explore different solutions to the class imbalance and key-frame sequence sampling problems in dynamic MER on DFME, so as to provide a valuable reference for future research. The comprehensive experimental results show that our DFME dataset can facilitate research on automatic MER and provide a new benchmark for MER. DFME will be published via https://mea-lab-421.github.io.
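
The abstract mentions class imbalance and key-frame sequence sampling without detailing the solutions explored. Two common baselines for these problems are sketched below in Python: evenly spaced sampling of a fixed number of frames between two key frames (for example, an annotated onset and offset, assumed here), and inverse-frequency loss weights for imbalanced emotion classes. Both are generic techniques chosen for illustration; the function names, parameters, and example labels are hypothetical and not taken from the DFME paper.

```python
from collections import Counter
from typing import Dict, List, Sequence


def uniform_frame_indices(start: int, end: int, clip_len: int) -> List[int]:
    """Return clip_len frame indices spread evenly over [start, end], inclusive.

    Short spans repeat frames, so every clip has the same length.
    """
    if end < start or clip_len < 1:
        raise ValueError("need start <= end and clip_len >= 1")
    if clip_len == 1:
        return [(start + end) // 2]
    span = end - start
    return [start + round(i * span / (clip_len - 1)) for i in range(clip_len)]


def inverse_frequency_weights(labels: Sequence[str]) -> Dict[str, float]:
    """Per-class loss weights proportional to 1 / class count, rescaled to average 1 over classes."""
    counts = Counter(labels)
    raw = {cls: 1.0 / n for cls, n in counts.items()}
    scale = len(raw) / sum(raw.values())
    return {cls: w * scale for cls, w in raw.items()}


if __name__ == "__main__":
    print(uniform_frame_indices(start=12, end=48, clip_len=8))  # hypothetical onset/offset frames
    print(inverse_frequency_weights(["disgust", "surprise", "disgust", "happiness", "disgust"]))
```

Rare classes receive proportionally larger weights, so a misclassified rare-class clip contributes more to a weighted loss than a misclassified clip from a dominant class.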

EZInterviewer: To Improve Job Interview Performance with Mock Interview Generator

Mingzhe Li, Xiuying Chen, Weiheng Liao, Yang Song, Tao Zhang, Dongyan Zhao, Rui Yan

Category: Natural Language Processing

2023-01-03

The interview has been regarded as one of the most crucial steps in recruitment. To fully prepare for interviews with recruiters, job seekers usually practice mock interviews with each other. However, such a mock interview with peers generally falls far short of the real interview experience: the mock interviewers are not guaranteed to be professional and are unlikely to behave like a real interviewer. Due to the rapid growth of online recruitment in recent years, recruiters increasingly conduct interviews online, which makes it possible to collect real interview data from real interviewers. In this paper, we propose a novel application named EZInterviewer, which aims to learn from online interview data and provide mock interview services to job seekers. The task is challenging in two ways: (1) interview data are now available but still low-resource; (2) generating meaningful and relevant interview dialogs requires a thorough understanding of both resumes and job descriptions. To address the low-resource challenge, EZInterviewer is trained on a very small set of interview dialogs. The key idea is to reduce the number of parameters that rely on interview dialogs by disentangling the knowledge selector and the dialog generator, so that most parameters can be trained with ungrounded dialogs as well as resume data, which are not low-resource. Evaluation results on a real-world job interview dialog dataset indicate that EZInterviewer achieves promising results in generating mock interviews. With the help of EZInterviewer, we hope to make mock interview practice easier for job seekers.

PanopticPartFormer++: A Unified and Decoupled View for Panoptic Part Segmentation

Xiangtai Li, Shilin Xu, Yibo Yang, Haobo Yuan, Guangliang Cheng, Yunhai Tong, Zhouchen Lin, Dacheng Tao

Category: Computer Vision

2023-01-03

Panoptic Part Segmentation (PPS) unifies panoptic segmentation and part segmentation into one task. Previous works use separate approaches to handle thing, stuff, and part predictions, without shared computation or task association. We aim to unify these tasks at the architectural level, designing the first end-to-end unified framework, named Panoptic-PartFormer. Moreover, we find that the previous metric, PartPQ, is biased toward PQ. To handle both issues, we make the following contributions. Firstly, we design a meta-architecture that decouples part features from things/stuff features. We model things, stuff, and parts as object queries and directly learn to optimize all three forms of prediction as a unified mask prediction and classification problem. We term our model Panoptic-PartFormer. Secondly, we propose a new metric, Part-Whole Quality (PWQ), to better measure this task from both pixel-region and part-whole perspectives. It can also decouple the errors of part segmentation and panoptic segmentation. Thirdly, inspired by Mask2Former and based on our meta-architecture, we propose Panoptic-PartFormer++ and design a new part-whole cross-attention scheme, a part-whole interaction method using masked cross-attention, to further boost part segmentation quality. Finally, extensive ablation studies and analysis demonstrate the effectiveness of both Panoptic-PartFormer and Panoptic-PartFormer++. Compared with the previous Panoptic-PartFormer, Panoptic-PartFormer++ achieves 2% PartPQ and 3% PWQ improvements on the Cityscapes PPS dataset and a 5% PartPQ improvement on the Pascal Context PPS dataset. On both datasets, Panoptic-PartFormer++ achieves new state-of-the-art results with a significant cost reduction of 70% in GFLOPs and 50% in parameters. Our models can serve as a strong baseline and aid future research in PPS. Code will be made available.
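
Masked cross-attention, the Mask2Former mechanism that Panoptic-PartFormer++ builds on, restricts each object query to the pixels inside its own predicted mask instead of attending to the whole image. The single-query, single-head Python sketch below illustrates that generic mechanism; it is not the paper's part-whole scheme. The 0.5 foreground threshold and the fall-back to full attention when a mask is empty are assumptions of this sketch.

```python
import math
from typing import List, Sequence


def masked_attention(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
    mask_probs: Sequence[float],
    threshold: float = 0.5,
) -> List[float]:
    """Single-query, single-head cross-attention limited to the query's predicted mask."""
    allowed = [p > threshold for p in mask_probs]
    if not any(allowed):
        allowed = [True] * len(keys)  # assumed fallback: attend everywhere if the mask is empty
    scale = 1.0 / math.sqrt(len(query))
    logits = [
        sum(q * k for q, k in zip(query, key)) * scale if ok else -math.inf
        for key, ok in zip(keys, allowed)
    ]
    top = max(logits)
    exps = [math.exp(logit - top) for logit in logits]  # exp(-inf) == 0.0: masked pixels get no weight
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(len(values[0]))]


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # hypothetical per-pixel key features
    values = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]  # the third pixel lies outside the mask
    print(masked_attention([1.0, 0.0], keys, values, mask_probs=[0.9, 0.9, 0.1]))
```

Because masked logits are set to negative infinity before the softmax, pixels outside the mask contribute exactly zero weight, so the query's update depends only on features from its own region.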
