The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
Causal chain reasoning (CCR) is an essential ability for many decision-making AI systems, which requires the model to build reliable causal chains by connecting causal pairs. However, CCR suffers from two main transitive problems: threshold effect and scene drift. In other words, the causal pairs to be spliced may have a conflicting threshold boundary or scenario. To address these issues, we propose a novel Reliable Causal chain reasoning framework~(ReCo), which introduces exogenous variables to represent the threshold and scene factors of each causal pair within the causal chain, and estimates the threshold and scene contradictions across exogenous variables via structural causal recurrent neural networks~(SRNN). Experiments show that ReCo outperforms a series of strong baselines on both Chinese and English CCR datasets. Moreover, by injecting reliable causal chain knowledge distilled by ReCo, BERT can achieve better performances on four downstream causal-related tasks than BERT models enhanced by other kinds of knowledge.
translated by 谷歌翻译
Human modeling and relighting are two fundamental problems in computer vision and graphics, where high-quality datasets can largely facilitate related research. However, most existing human datasets only provide multi-view human images captured under the same illumination. Although valuable for modeling tasks, they are not readily used in relighting problems. To promote research in both fields, in this paper, we present UltraStage, a new 3D human dataset that contains more than 2K high-quality human assets captured under both multi-view and multi-illumination settings. Specifically, for each example, we provide 32 surrounding views illuminated with one white light and two gradient illuminations. In addition to regular multi-view images, gradient illuminations help recover detailed surface normal and spatially-varying material maps, enabling various relighting applications. Inspired by recent advances in neural representation, we further interpret each example into a neural human asset which allows novel view synthesis under arbitrary lighting conditions. We show our neural human assets can achieve extremely high capture performance and are capable of representing fine details such as facial wrinkles and cloth folds. We also validate UltraStage in single image relighting tasks, training neural networks with virtual relighted data from neural assets and demonstrating realistic rendering improvements over prior arts. UltraStage will be publicly available to the community to stimulate significant future developments in various human modeling and rendering tasks.
translated by 谷歌翻译
Monocular depth estimation is a challenging problem on which deep neural networks have demonstrated great potential. However, depth maps predicted by existing deep models usually lack fine-grained details due to the convolution operations and the down-samplings in networks. We find that increasing input resolution is helpful to preserve more local details while the estimation at low resolution is more accurate globally. Therefore, we propose a novel depth map fusion module to combine the advantages of estimations with multi-resolution inputs. Instead of merging the low- and high-resolution estimations equally, we adopt the core idea of Poisson fusion, trying to implant the gradient domain of high-resolution depth into the low-resolution depth. While classic Poisson fusion requires a fusion mask as supervision, we propose a self-supervised framework based on guided image filtering. We demonstrate that this gradient-based composition performs much better at noisy immunity, compared with the state-of-the-art depth map fusion method. Our lightweight depth fusion is one-shot and runs in real-time, making our method 80X faster than a state-of-the-art depth fusion method. Quantitative evaluations demonstrate that the proposed method can be integrated into many fully convolutional monocular depth estimation backbones with a significant performance boost, leading to state-of-the-art results of detail enhancement on depth maps.
translated by 谷歌翻译
Recent years we have witnessed rapid development in NeRF-based image rendering due to its high quality. However, point clouds rendering is somehow less explored. Compared to NeRF-based rendering which suffers from dense spatial sampling, point clouds rendering is naturally less computation intensive, which enables its deployment in mobile computing device. In this work, we focus on boosting the image quality of point clouds rendering with a compact model design. We first analyze the adaption of the volume rendering formulation on point clouds. Based on the analysis, we simplify the NeRF representation to a spatial mapping function which only requires single evaluation per pixel. Further, motivated by ray marching, we rectify the the noisy raw point clouds to the estimated intersection between rays and surfaces as queried coordinates, which could avoid \textit{spatial frequency collapse} and neighbor point disturbance. Composed of rasterization, spatial mapping and the refinement stages, our method achieves the state-of-the-art performance on point clouds rendering, outperforming prior works by notable margins, with a smaller model size. We obtain a PSNR of 31.74 on NeRF-Synthetic, 25.88 on ScanNet and 30.81 on DTU. Code and data are publicly available at https://github.com/seanywang0408/RadianceMapping.
translated by 谷歌翻译
点对特征(PPF)广泛用于6D姿势估计。在本文中,我们提出了一种基于PPF框架的有效的6D姿势估计方法。我们介绍了一个目标良好的下采样策略,该策略更多地集中在边缘区域,以有效地提取复杂的几何形状。提出了一种姿势假设验证方法来通过计算边缘匹配度来解决对称歧义。我们对两个具有挑战性的数据集和一个现实世界中收集的数据集进行评估,这证明了我们方法对姿势估计几何复杂,遮挡,对称对象的优越性。我们通过将其应用于模拟穿刺来进一步验证我们的方法。
translated by 谷歌翻译
在无监督的域适应性(UDA)中,直接从源到目标域的适应通常会遭受明显的差异,并导致对齐不足。因此,许多UDA的作品试图通过各种中间空间逐渐和轻柔地消失域间隙,这些空间被称为域桥接(DB)。但是,对于诸如域自适应语义分割(DASS)之类的密集预测任务,现有的解决方案主要依赖于粗糙的样式转移以及如何优雅地桥接域的优雅桥梁。在这项工作中,我们诉诸于数据混合以建立用于DASS的经过经过经过经过讨论的域桥接(DDB),通过该域的源和目标域的联合分布与中间空间中的每个分布进行对齐并与每个分布。 DDB的核心是双路径域桥接步骤,用于使用粗糙和精细的数据混合技术生成两个中间域,以及一个跨路径知识蒸馏步骤,用于对两个互补模型进行对生成的中间样品进行培训的互补模型作为“老师”以多教老师的蒸馏方式发展出色的“学生”。这两个优化步骤以交替的方式工作,并相互加强以具有强大的适应能力引起DDB。对具有不同设置的自适应分割任务进行的广泛实验表明,我们的DDB显着优于最先进的方法。代码可从https://github.com/xiaoachen98/ddb.git获得。
translated by 谷歌翻译
COBB角度估计的自动化方法对脊柱侧弯评估的需求很高。现有方法通常从地标估计计算COBB角度,或者简单地将低级任务(例如,地标检测和脊柱分段)与Cobb角度回归任务相结合,而无需完全互相探索彼此的好处。在这项研究中,我们提出了一个名为SEG4REG+的新型多任务框架,该框架共同优化了分割和回归网络。我们对彼此之间的本地和全球一致性和知识转移进行彻底研究。具体而言,我们提出了一个关注正规化模块,利用了从图像分段对的类激活图(CAM),以在回归网络中发现其他监督,并且CAM可以用作利益区域的增强门,以促进分段任务进而促进分段任务。 。同时,我们设计了一种新颖的三角一致性学习,以共同训练两个网络以进行全球优化。对公共AASCE挑战数据集进行的评估证明了每个模块的有效性以及我们的模型与最先进方法的卓越性能。
translated by 谷歌翻译
我们研究基于{\ em本地培训(LT)}范式的分布式优化方法:通过在参数平均之前对客户进行基于本地梯度的培训来实现沟通效率。回顾田地的进度,我们{\ em识别5代LT方法}:1)启发式,2)均匀,3)sublinear,4)线性和5)加速。由Mishchenko,Malinovsky,Stich和Richt \'{A} Rik(2022)发起的5 $ {}^{\ rm th} $生成,由Proxskip方法发起通信加速机制。受到最近进度的启发,我们通过证明可以使用{\ em差异}进一步增强它们,为5 $ {}^{\ rm th} $生成LT方法的生成。尽管LT方法的所有以前的所有理论结果都完全忽略了本地工作的成本,并且仅根据交流回合的数量而被构成,但我们证明我们的方法在{\ em总培训成本方面都比{\ em em总培训成本}大得多当本地计算足够昂贵时,在制度中的理论和实践中,最先进的方法是proxskip。我们从理论上表征了这个阈值,并通过经验结果证实了我们的理论预测。
translated by 谷歌翻译
超声(US)广泛用于实时成像,无辐射和便携性的优势。在临床实践中,分析和诊断通常依赖于美国序列,而不是单个图像来获得动态的解剖信息。对于新手来说,这是一项挑战,因为使用患者的足够视频进行练习是临床上不可行的。在本文中,我们提出了一个新颖的框架,以综合高保真美国视频。具体而言,合成视频是通过基于给定驾驶视频的动作来动画源内容图像来生成的。我们的亮点是三倍。首先,利用自我监督学习的优势,我们提出的系统以弱监督的方式进行了培训,以进行关键点检测。然后,这些关键点为处理美国视频中的复杂动态动作提供了重要信息。其次,我们使用双重解码器将内容和纹理学习解除,以有效地减少模型学习难度。最后,我们采用了对抗性训练策略,并采用了GAN损失,以进一步改善生成的视频的清晰度,从而缩小了真实和合成视频之间的差距。我们在具有高动态运动的大型内部骨盆数据集上验证我们的方法。广泛的评估指标和用户研究证明了我们提出的方法的有效性。
translated by 谷歌翻译