The security of artificial intelligence (AI) is an important research area towards safe, reliable, and trustworthy AI systems. To accelerate the research on AI security, the Artificial Intelligence Security Competition (AISC) was organized by the Zhongguancun Laboratory, China Industrial Control Systems Cyber Emergency Response Team, Institute for Artificial Intelligence, Tsinghua University, and RealAI as part of the Zhongguancun International Frontier Technology Innovation Competition (https://www.zgc-aisc.com/en). The competition consists of three tracks: the Deepfake Security Competition, the Autonomous Driving Security Competition, and the Face Recognition Security Competition. This report introduces the competition rules of the three tracks and the solutions of the top-ranking teams in each track.
translated by Google Translate
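Graph anomaly detection (GAD) is a vital task, since even a few anomalies can pose a serious threat to benign users. Recent semi-supervised GAD methods, which effectively use the available labels as prior knowledge, outperform unsupervised methods. In practice, people often need to identify anomalies on new (sub)graphs to secure their business, but they may lack the labels needed to train an effective detection model. A natural idea is to apply a trained GAD model directly to the new (sub)graph for testing. However, we find that existing semi-supervised GAD methods suffer from poor generalization: a well-trained model does not perform well on an unseen area of the same graph (i.e., an area not accessible during training), which can cause serious trouble. In this paper, we build on this phenomenon and propose the general research problem of generalized graph anomaly detection, which aims to identify anomalies effectively on both the training-domain graph and unseen test graphs in order to eliminate potential dangers. This is a challenging task, because only limited labels are available and the normal background may differ between training and test data. We therefore propose a data augmentation method named AugAN (Augmentation for Anomaly and Normal distributions) to enrich the training data and improve the generalizability of GAD models. Experiments verify the effectiveness of our method in improving model generalizability.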
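Fine-grained entity typing (FET) aims to infer the specific semantic types of entity mentions in text. Modern FET methods mainly focus on learning what a certain type looks like. Few works directly model type differences, that is, let the model know the extent to which one type differs from the others. To alleviate this problem, we propose a type-enriched hierarchical contrastive strategy for FET. Our method directly models the differences between hierarchical types and improves the ability to distinguish similar types at multiple granularities. On the one hand, we embed types into the entity contexts so that type information is directly perceptible. On the other hand, we design a constrained contrastive strategy over the hierarchical structure to directly model type differences, which simultaneously captures the distinguishability between types at different granularities. Experimental results on three benchmarks, BBN, OntoNotes, and FIGER, show that our method achieves significant performance on FET by effectively modeling type differences.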
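Regression learning is classic and fundamental to medical image analysis. It provides the continuous mapping for many critical applications, such as attribute estimation, object detection, segmentation, and non-rigid registration. However, previous studies mainly used case-wise criteria, such as the mean squared error, as optimization objectives. They ignored the very important population-wise correlation criterion, which is exactly the final evaluation metric in many tasks. In this work, we revisit the classic regression task with a novel investigation into directly optimizing fine-grained correlation losses. We mainly explore two complementary correlation indexes as learnable losses: Pearson linear correlation (PLC) and Spearman rank correlation (SRC). The contributions of this paper are twofold. First, for PLC at the global level, we propose a strategy to make it robust to outliers and to regularize the key distribution factors. These efforts significantly stabilize learning and amplify the efficacy of PLC. Second, for SRC at the local level, we propose a coarse-to-fine scheme to ease the learning of the exact ranking order among samples. Specifically, we convert the learning of sample rankings into the learning of similarity relationships among samples. We extensively validate our method on two typical ultrasound image regression tasks: image quality assessment and biometric measurement. Experiments show that, with the fine-grained guidance of directly optimizing correlation, regression performance improves significantly. Our proposed correlation losses are general and can be extended to more important applications.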
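For readers unfamiliar with the two correlation indexes named in the abstract above, the following is a minimal pure-Python sketch of their textbook definitions. It is not the authors' implementation: the function names (`pearson`, `ranks`, `spearman`, `plc_loss`) are hypothetical, and the loss `1 - PLC` is only an assumed, commonly used way to turn a correlation into a quantity to minimize. The outlier handling, distribution regularization, and coarse-to-fine ranking scheme described in the abstract are not reproduced.

```python
import math


def pearson(x, y):
    """Pearson linear correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def ranks(values):
    """1-based ranks of the values; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def plc_loss(pred, target):
    """Assumed loss form: 0 when predictions are perfectly linearly correlated."""
    return 1.0 - pearson(pred, target)
```

Note that the ranking step in `spearman` is not differentiable, so this version is suitable for evaluation only; a trainable SRC loss needs some relaxation of the ranking.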
With the rapid development of artificial intelligence (AI) in medical image processing, deep learning in color fundus photography (CFP) analysis is also evolving. Although there are some open-source, labeled datasets of CFPs in the ophthalmology community, large-scale datasets for screening only have labels of disease categories, and datasets with annotations of fundus structures are usually small in size. In addition, labeling standards are not uniform across datasets, and there is no clear information on the acquisition device. Here we release a multi-annotation, multi-quality, and multi-device color fundus image dataset for glaucoma analysis as part of an original challenge, the Retinal Fundus Glaucoma Challenge 2nd Edition (REFUGE2). The REFUGE2 dataset contains 2000 color fundus images with annotations of glaucoma classification, optic disc/cup segmentation, and fovea localization. Meanwhile, the REFUGE2 challenge sets three sub-tasks of automatic glaucoma diagnosis and fundus structure analysis and provides an online evaluation framework. Based on the characteristics of multi-device and multi-quality data, some methods with strong generalization are provided in the challenge to make the predictions more robust. This shows that REFUGE2 brings attention to the characteristics of real-world multi-domain data, bridging the gap between scientific research and clinical application.
Learning on Graphs (LoG) is widely used in multi-client systems when each client has insufficient local data and multiple clients have to share their raw data to learn a model of good quality. One scenario is recommending items to clients who have limited historical data and share similar preferences with other clients in a social network. On the other hand, due to the increasing demand for the protection of clients' data privacy, Federated Learning (FL) has been widely adopted: FL requires models to be trained in a multi-client system and restricts the sharing of raw data among clients. The underlying data-sharing conflict between LoG and FL is under-explored, and how to benefit from both is a promising problem. In this work, we first formulate the Graph Federated Learning (GFL) problem, which unifies LoG and FL in multi-client systems, and then propose sharing hidden representations instead of the raw data of neighbors as a solution that protects data privacy. To overcome the biased gradient problem in GFL, we provide a gradient estimation method and its convergence analysis under a non-convex objective. In experiments, we evaluate our method on classification tasks on graphs. Our experiments show a good match between our theory and practice.
Unsupervised pre-training on millions of digital-born or scanned documents has shown promising advances in visual document understanding (VDU). While various vision-language pre-training objectives are studied in existing solutions, the document textline, as an intrinsic granularity in VDU, has seldom been explored so far. A document textline usually contains words that are spatially and semantically correlated, which can be easily obtained from OCR engines. In this paper, we propose Wukong-Reader, trained with new pre-training objectives to leverage the structural knowledge nested in document textlines. We introduce textline-region contrastive learning to achieve fine-grained alignment between the visual regions and texts of document textlines. Furthermore, masked region modeling and textline-grid matching are also designed to enhance the visual and layout representations of textlines. Experiments show that our Wukong-Reader has superior performance on various VDU tasks such as information extraction. The fine-grained alignment over textlines also empowers Wukong-Reader with promising localization ability.
Both goal-agnostic and goal-oriented tasks have practical value for robotic grasping: goal-agnostic tasks target all objects in the workspace, while goal-oriented tasks aim at grasping pre-assigned goal objects. However, most current grasping methods cope well with only one of the two tasks. In this work, we propose a bifunctional push-grasping synergistic strategy for goal-agnostic and goal-oriented grasping tasks. Our method integrates pushing with grasping to pick up all objects or pre-assigned goal objects with high action efficiency, depending on the task requirement. We introduce a bifunctional network, which takes in visual observations and outputs dense pixel-wise maps of Q values for pushing and grasping primitive actions, to increase the available samples in the action space. Then we propose a hierarchical reinforcement learning framework to coordinate the two tasks by considering the goal-agnostic task as a combination of multiple goal-oriented tasks. To reduce the training difficulty of the hierarchical framework, we design a two-stage training method that trains the two types of tasks separately. We pre-train the model in simulation and then transfer the learned model to the real world without any additional real-world fine-tuning. Experimental results show that the proposed approach outperforms existing methods in task completion rate and grasp success rate while using fewer motions. Supplementary material is available at https://github.com/DafaRen/Learning_Bifunctional_Push-grasping_Synergistic_Strategy_for_Goal-agnostic_and_Goal-oriented_Tasks
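To illustrate what "dense pixel-wise maps of Q values for pushing and grasping primitive actions" can mean in practice, here is a minimal sketch of greedy action selection over two such maps. It is an assumption about how the maps might be consumed, not the authors' code: the function name `select_action` and the plain nested-list representation are hypothetical, and real systems typically also index the maps by end-effector rotation, which is omitted here.

```python
def select_action(q_push, q_grasp):
    """Pick the primitive and pixel with the highest Q value.

    q_push and q_grasp are 2D lists (rows of floats) of the same shape,
    one Q value per pixel of the visual observation (hypothetical layout).
    Returns a tuple (primitive_name, row, col, q_value).
    """
    best = None
    for name, q_map in (("push", q_push), ("grasp", q_grasp)):
        for row_index, row in enumerate(q_map):
            for col_index, q_value in enumerate(row):
                # Keep the first maximum encountered; ties favor "push".
                if best is None or q_value > best[3]:
                    best = (name, row_index, col_index, q_value)
    return best
```

Because every pixel of each map is a candidate action, a single forward pass scores the whole action space at once, which is one reading of the abstract's remark about increasing the available samples in the action space.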
Stairs are common building structures in urban environments, and stair detection is an important part of environment perception for autonomous mobile robots. Most existing algorithms have difficulty combining the visual information from binocular sensors effectively and ensuring reliable detection at night and when visual cues are extremely blurry. To solve these problems, we propose a neural network architecture that takes both an RGB map and a depth map as input. Specifically, we design a selective module that lets the network learn the complementary relationship between the RGB map and the depth map and effectively combine their information in different scenes. In addition, we design a line clustering algorithm for post-processing the detection results, which makes full use of the detection results to obtain the geometric parameters of stairs. Experiments on our dataset show that our method achieves accuracy and recall that are 5.64% and 7.97% higher, respectively, than those of the previous state-of-the-art deep learning method. Our method also has extremely fast detection speed: a lightweight version achieves 300+ frames per second at the same resolution, which meets the needs of most real-time detection scenarios.
Sleep stage recognition is crucial for assessing sleep and diagnosing chronic diseases. Deep learning models such as Convolutional Neural Networks and Recurrent Neural Networks are trained on grid data, which makes them incapable of learning relationships in non-Euclidean spaces. Graph-based deep models have been developed to address this issue by investigating the external relationships of electrode signals across different brain regions. However, these models cannot capture the internal relationships between segments of electrode signals within a specific brain region. In this study, we propose a Pearson correlation-based graph attention network, called PearNet, as a solution to this problem. Graph nodes are generated from the spatial-temporal features extracted by a hierarchical feature extraction method, and the graph structure is then learned adaptively to build node connections. In our experiments on the Sleep-EDF-20 and Sleep-EDF-78 datasets, PearNet performs better than the state-of-the-art baselines.