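Certified robustness is a desirable property for deep neural networks in safety-critical applications, and popular training algorithms can certify the robustness of a neural network by computing a global bound on its Lipschitz constant. However, such a bound is often loose: it tends to over-regularize the neural network and degrade its natural accuracy. A tighter Lipschitz bound may provide a better tradeoff between natural and certified accuracy, but it is generally hard to compute exactly because of the non-convexity of the network. In this work, we propose an efficient and trainable \emph{local} Lipschitz upper bound by considering the interactions between activation functions (e.g., ReLU) and weight matrices. Specifically, when computing the induced norm of a weight matrix, we eliminate the corresponding rows and columns where the activation function is guaranteed to be constant in the neighborhood of each given data point, which provides a provably tighter bound than the global Lipschitz constant of the neural network. Our method can be used as a plug-in module to tighten the Lipschitz bound in many certifiable training algorithms. Furthermore, we propose to clip activation functions (e.g., ReLU and MaxMin) with a learnable upper threshold and a sparsity loss to help the network achieve an even tighter local Lipschitz bound. Experimentally, we show that our method consistently outperforms state-of-the-art methods in both clean and certified accuracy on the MNIST, CIFAR-10, and TinyImageNet datasets with various network architectures.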
translated by 谷歌翻译
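The row-and-column elimination described in the certified-robustness abstract above can be illustrated with a minimal sketch. It assumes a two-layer network f(x) = W2 · ReLU(W1 x + b1) and an l2 ball of radius eps around the data point; the function names, the interval-style bound on pre-activations, and the use of NumPy are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def spectral_norm(matrix):
    """Induced 2-norm of a matrix; an empty matrix contributes 0."""
    if matrix.size == 0:
        return 0.0
    return float(np.linalg.norm(matrix, 2))


def global_lipschitz_bound(w1, w2):
    """Product of layer norms: the usual global bound for a ReLU network."""
    return spectral_norm(w2) * spectral_norm(w1)


def local_lipschitz_bound(w1, b1, w2, x, eps):
    """Local bound valid inside the l2 ball of radius eps around x.

    Assumption: a hidden unit whose pre-activation cannot exceed 0 anywhere
    in the ball outputs the constant 0 there, so its row of w1 and its
    column of w2 can be dropped before taking the induced norms.
    """
    pre_activation = w1 @ x + b1
    # Largest value each pre-activation can reach within the ball.
    upper = pre_activation + eps * np.linalg.norm(w1, axis=1)
    active = upper > 0
    return spectral_norm(w2[:, active]) * spectral_norm(w1[active, :])


w1 = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b1 = np.array([0.5, -3.0, -4.0])
w2 = np.array([[1.0, 2.0, 3.0]])
x = np.array([0.2, 0.1])

local_bound = local_lipschitz_bound(w1, b1, w2, x, eps=0.1)
global_bound = global_lipschitz_bound(w1, w2)
print(local_bound, global_bound)
```

Because the norm of a submatrix never exceeds the norm of the full matrix, the local value in this sketch is never larger than the global one. Units that are guaranteed to stay in the linear region of ReLU are kept, since they still contribute slope 1.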
We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Stochastic Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density. Combining ReSQue with recent advances in ball oracle acceleration [CJJJLST20, ACJJS21], we develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings. For a SCO objective constrained to the unit ball in $\mathbb{R}^d$, we obtain the following results (up to polylogarithmic factors). We give a parallel algorithm obtaining optimization error $\epsilon_{\text{opt}}$ with $d^{1/3}\epsilon_{\text{opt}}^{-2/3}$ gradient oracle query depth and $d^{1/3}\epsilon_{\text{opt}}^{-2/3} + \epsilon_{\text{opt}}^{-2}$ gradient queries in total, assuming access to a bounded-variance stochastic gradient estimator. For $\epsilon_{\text{opt}} \in [d^{-1}, d^{-1/4}]$, our algorithm matches the state-of-the-art oracle depth of [BJLLS19] while maintaining the optimal total work of stochastic gradient descent. We give an $(\epsilon_{\text{dp}}, \delta)$-differentially private algorithm which, given $n$ samples of Lipschitz loss functions, obtains near-optimal optimization error and makes $\min(n, n^2\epsilon_{\text{dp}}^2 d^{-1}) + \min(n^{4/3}\epsilon_{\text{dp}}^{1/3}, (nd)^{2/3}\epsilon_{\text{dp}}^{-1})$ queries to the gradients of these functions. In the regime $d \le n \epsilon_{\text{dp}}^{2}$, where privacy comes at no cost in terms of the optimal loss up to constants, our algorithm uses $n + (nd)^{2/3}\epsilon_{\text{dp}}^{-1}$ queries and improves recent advancements of [KLL21, AFKT21]. In the moderately low-dimensional setting $d \le \sqrt n \epsilon_{\text{dp}}^{3/2}$, our query complexity is near-linear.
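The abstract only names the ReSQue estimator, so the following is a hedged one-dimensional sketch of the general reweighting idea rather than the authors' construction. It assumes the smoothed function is f convolved with a Gaussian of standard deviation rho, and it reuses samples drawn around a reference point `x_center` to estimate the smoothed gradient at a nearby point `x` through an importance weight. All names here are hypothetical.

```python
import math
import random


def gaussian_density_ratio(xi, delta, rho):
    """Ratio gamma_rho(xi - delta) / gamma_rho(xi) for a 1-D Gaussian density."""
    return math.exp((2.0 * xi * delta - delta * delta) / (2.0 * rho * rho))


def reweighted_gradient_estimate(grad_f, x, x_center, rho, num_samples, rng):
    """Estimate the gradient of the Gaussian-smoothed f at x.

    Samples are drawn around x_center (not x) and reweighted so that the
    average is unbiased for the smoothed gradient at x.
    """
    delta = x - x_center
    total = 0.0
    for _ in range(num_samples):
        xi = rng.gauss(0.0, rho)
        weight = gaussian_density_ratio(xi, delta, rho)
        total += weight * grad_f(x_center + xi)
    return total / num_samples


# For f(y) = y^2 / 2 the smoothed gradient at x equals x itself.
rng = random.Random(0)
estimate = reweighted_gradient_estimate(
    grad_f=lambda y: y, x=0.1, x_center=0.0, rho=1.0, num_samples=20000, rng=rng
)
print(estimate)
```

The weight stays close to 1 only while `x` remains near `x_center` relative to `rho`, which is consistent with pairing such an estimator with a ball-constrained oracle as the abstract describes.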
Exploring dense matching between the current frame and past frames for long-range context modeling, memory-based methods have recently demonstrated impressive results in video object segmentation (VOS). Nevertheless, because they lack instance understanding, these approaches are often brittle to large appearance variations or viewpoint changes resulting from the movement of objects and cameras. In this paper, we argue that instance understanding matters in VOS, and that integrating it with memory-based matching yields a synergy that follows naturally from the definition of the VOS task, i.e., identifying and segmenting object instances within the video. Towards this goal, we present a two-branch network for VOS, where the query-based instance segmentation (IS) branch delves into the instance details of the current frame and the VOS branch performs spatial-temporal matching with the memory bank. We employ the well-learned object queries from the IS branch to inject instance-specific information into the query key, with which the instance-augmented matching is then performed. In addition, we introduce a multi-path fusion block to effectively combine the memory readout with multi-scale features from the instance segmentation decoder, which incorporates high-resolution instance-aware features to produce the final segmentation results. Our method achieves state-of-the-art performance on DAVIS 2016/2017 val (92.6% and 87.1%), DAVIS 2017 test-dev (82.8%), and YouTube-VOS 2018/2019 val (86.3% and 86.3%), outperforming alternative methods by clear margins.
We study time-inhomogeneous episodic reinforcement learning (RL) under general function approximation and sparse rewards. We design a new algorithm, Variance-weighted Optimistic $Q$-Learning (VO$Q$L), based on $Q$-learning and bound its regret assuming completeness and bounded Eluder dimension for the regression function class. As a special case, VO$Q$L achieves $\tilde{O}(d\sqrt{HT}+d^6H^{5})$ regret over $T$ episodes for a horizon $H$ MDP under ($d$-dimensional) linear function approximation, which is asymptotically optimal. Our algorithm incorporates weighted regression-based upper and lower bounds on the optimal value function to obtain this improved regret. The algorithm is computationally efficient given a regression oracle over the function class, making this the first computationally tractable and statistically optimal approach for linear MDPs.
This paper focuses on analyzing and improving the commonsense ability of recent popular vision-language (VL) models. Despite the great success, we observe that existing VL-models still lack commonsense knowledge/reasoning ability (e.g., "Lemons are sour"), which is a vital component towards artificial general intelligence. Through our analysis, we find one important reason is that existing large-scale VL datasets do not contain much commonsense knowledge, which motivates us to improve the commonsense of VL-models from the data perspective. Rather than collecting a new VL training dataset, we propose a more scalable strategy, i.e., "Data Augmentation with kNowledge graph linearization for CommonsensE capability" (DANCE). It can be viewed as one type of data augmentation technique, which can inject commonsense knowledge into existing VL datasets on the fly during training. More specifically, we leverage the commonsense knowledge graph (e.g., ConceptNet) and create variants of text description in VL datasets via bidirectional sub-graph sequentialization. For better commonsense evaluation, we further propose the first retrieval-based commonsense diagnostic benchmark. By conducting extensive experiments on some representative VL-models, we demonstrate that our DANCE technique is able to significantly improve the commonsense ability while maintaining the performance on vanilla retrieval tasks. The code and data are available at https://github.com/pleaseconnectwifi/DANCE
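As a toy illustration of turning knowledge-graph triples into text that can be attached to a caption during training, consider the sketch below. The templates, the function names, and the word-matching rule are hypothetical choices for this example. The relation names follow ConceptNet conventions, but nothing here reproduces the DANCE pipeline itself.

```python
# Hypothetical templates: (forward reading, backward reading) per relation.
TEMPLATES = {
    "HasProperty": ("{head} is {tail}", "{tail} is a property of {head}"),
    "IsA": ("{head} is a kind of {tail}", "{tail} includes {head}"),
    "UsedFor": ("{head} is used for {tail}", "{tail} can be done with {head}"),
}


def linearize_triple(head, relation, tail, reverse=False):
    """Render one (head, relation, tail) triple as a short text fragment."""
    forward, backward = TEMPLATES[relation]
    template = backward if reverse else forward
    return template.format(head=head, tail=tail)


def augment_caption(caption, triples, reverse=False):
    """Append linearized facts whose head entity appears in the caption."""
    words = set(caption.lower().split())
    facts = [
        linearize_triple(head, relation, tail, reverse)
        for head, relation, tail in triples
        if head in words and relation in TEMPLATES
    ]
    if not facts:
        return caption
    return caption + " (" + "; ".join(facts) + ")"


triples = [("lemon", "HasProperty", "sour"), ("dog", "IsA", "animal")]
augmented = augment_caption("a lemon on a table", triples)
print(augmented)
```

Calling `augment_caption` with `reverse=True` reads each triple from tail to head, a simple stand-in for the bidirectional aspect mentioned in the abstract.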
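This paper presents OmniVL, a new foundation model that supports both image-language and video-language tasks with one universal architecture. It adopts a unified transformer-based visual encoder for both image and video inputs, and thus can perform joint image-language and video-language pretraining. We demonstrate, for the first time, that such a paradigm benefits both image and video tasks, as opposed to the conventional one-directional transfer (e.g., using image-language to help video-language). To this end, we propose a decoupled joint pretraining of image-language and video-language to effectively decompose vision-language modeling into spatial and temporal dimensions and obtain a performance boost on both image and video tasks. Moreover, we introduce a novel unified vision-language contrastive (UniVLC) loss to leverage image-text, video-text, image-label (e.g., image classification), and video-label (e.g., video action recognition) data together, so that both supervised and noisily supervised pretraining data are utilized as much as possible. Without incurring extra task-specific adaptors, OmniVL can simultaneously support visual-only tasks (e.g., image classification, video action recognition), cross-modal alignment tasks (e.g., image/video-text retrieval), and multi-modal understanding and generation tasks (e.g., image/video question answering, captioning). We evaluate OmniVL on a wide range of downstream tasks and achieve state-of-the-art or competitive results with similar model size and data scale.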
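Supply chain platforms (SCPs) provide downstream industries with numerous raw materials. Compared with traditional e-commerce platforms, data in SCPs is sparser because user interests are limited. To tackle the data sparsity problem, one can apply cross-domain recommendation (CDR), which improves recommendation performance in the target domain using information from the source domain. However, applying CDR to SCPs directly ignores the hierarchical structure of commodities in SCPs, which reduces recommendation performance. To leverage this structure, in this paper we take a catering platform as an example and propose GRES, a graph-based cross-domain recommendation model. The model first constructs a tree-shaped graph to represent the hierarchy of the different nodes for dishes and ingredients, and then applies our proposed Tree2Vec method, which combines GCN and BERT models, to embed the graph for recommendation. Experimental results on a commercial dataset show that GRES significantly outperforms state-of-the-art methods in cross-domain recommendation for supply chain platforms.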
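We seek to combine the nonlinear modeling capabilities of a wide class of neural networks with the safety guarantees of model predictive control (MPC) in a rigorous and online computationally tractable framework. The class of networks considered can be captured using Koopman operators and is integrated into a Koopman-based tracking MPC (KTMPC) for nonlinear systems to track piecewise constant references. The effect of model mismatch between the original nonlinear dynamics and its trained Koopman linear model is handled by a constraint tightening approach in the proposed tracking MPC strategy. By choosing two Lyapunov candidate functions, we prove that the solution is recursively feasible and input-to-state stable to a neighborhood of both the online and offline optimal reachable steady outputs in the presence of bounded modeling errors. Finally, we show the results of a numerical example and of an application in which an autonomous ground vehicle tracks given references.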
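The COVID-19 pandemic continues to spread rapidly around the world and is causing a tremendous crisis in global human health and the economy. Early detection and diagnosis are crucial for controlling its further spread. Many deep learning-based methods have been proposed to assist clinicians in automatic COVID-19 diagnosis based on computed tomography imaging. However, challenges remain, including low data diversity in existing datasets and unsatisfactory detection resulting from the insufficient accuracy and sensitivity of deep learning models. To enhance data diversity, we design augmentation techniques of incremental levels and apply them to the largest open-access benchmark dataset, COVIDx CT-2A. Meanwhile, similarity regularization (SR) derived from contrastive learning is proposed in this study to enable CNNs to learn more parameter-efficient representations, thus improving their accuracy and sensitivity. The results of seven commonly used CNNs demonstrate that CNN performance can be improved stably by applying the designed augmentation and SR techniques. In particular, DenseNet121 with SR achieves an average test accuracy of 99.44% over three trials for three-category classification: normal, non-COVID-19 pneumonia, and COVID-19 pneumonia. The precision, sensitivity, and specificity achieved for the COVID-19 pneumonia category are 98.40%, 99.59%, and 99.50%, respectively. These statistics suggest that our method surpasses the existing state-of-the-art methods on the COVIDx CT-2A dataset.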
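For readers less familiar with per-category metrics in a three-class setting, the sketch below shows how precision, sensitivity, and specificity for one class are commonly derived from a confusion matrix in a one-vs-rest fashion. The confusion matrix is made up purely for illustration and has no relation to the results reported in the abstract. The function names are hypothetical.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is 0."""
    return numerator / denominator if denominator else 0.0


def one_vs_rest_metrics(confusion, k):
    """Metrics for class k, where confusion[i][j] counts samples with
    true class i that were predicted as class j."""
    n = len(confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k][j] for j in range(n) if j != k)
    fp = sum(confusion[i][k] for i in range(n) if i != k)
    tn = sum(
        confusion[i][j]
        for i in range(n)
        for j in range(n)
        if i != k and j != k
    )
    return {
        "precision": _ratio(tp, tp + fp),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
    }


# Illustrative counts only: rows are true classes, columns are predictions.
# Class order: normal, non-COVID-19 pneumonia, COVID-19 pneumonia.
confusion = [
    [90, 5, 5],
    [4, 92, 4],
    [2, 1, 97],
]
metrics = one_vs_rest_metrics(confusion, k=2)
print(metrics)
```

Sensitivity measures how many true cases of the class are found, while specificity measures how many cases of the other classes are correctly kept out of it, which is why both are reported alongside precision.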
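Camouflaged object detection (COD), which segments objects that blend elegantly into their surroundings, is a valuable yet challenging task. Existing deep learning methods often struggle to accurately identify camouflaged objects with complete and fine object structure. To this end, in this paper we propose a novel boundary-guided network (BGNet) for camouflaged object detection. Our method explores valuable additional object-related edge semantics to guide the representation learning of COD, which forces the model to generate features that highlight object structure, thereby promoting camouflaged object detection with accurate boundary localization. Extensive experiments on three challenging benchmark datasets demonstrate that our BGNet significantly outperforms 18 existing state-of-the-art methods under four widely used evaluation metrics. Our code is publicly available at https://github.com/thograce/bgnet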