This technical report briefly describes our JDExplore d-team's Vega v2 submission on the SuperGLUE leaderboard. SuperGLUE is more challenging than the widely used general language understanding evaluation (GLUE) benchmark, containing eight difficult language understanding tasks, including question answering, natural language inference, word sense disambiguation, coreference resolution, and reasoning. [Method] Instead of arbitrarily increasing the size of a pretrained language model (PLM), our aim is to 1) fully extract knowledge from the input pretraining data given a certain parameter budget, e.g., 6B, and 2) effectively transfer this knowledge to downstream tasks. To achieve goal 1), we propose self-evolution learning for PLMs to wisely predict the informative tokens that should be masked, and supervise the masked language modeling (MLM) process with rectified smooth labels. For goal 2), we leverage the prompt transfer technique to improve the low-resource tasks by transferring the knowledge from the foundation model and related downstream tasks to the target task. [Results] According to our submission record (Oct. 2022), with our optimized pretraining and fine-tuning strategies, our 6B Vega method achieved new state-of-the-art performance on 4/8 tasks, sitting atop the SuperGLUE leaderboard on Oct. 8, 2022, with an average score of 91.3.
translated by 谷歌翻译
迅速调整,它冻结了预审计的语言模型(PLM),只有微调的几个额外软提示的参数,在PLM具有数十亿个参数时,对全参数微调(即模型调整)显示出具有竞争性的性能,但仍然显示出竞争力。在较小的PLM的情况下,性能差。因此,迅速转移(POT),通过训练有素的类似源任务的提示来初始化目标提示,最近提议改善及时调整。但是,这样的香草锅方法通常会实现次优的性能,因为(i)锅对源目标对的相似性和(ii)直接对目标提示进行初始提示的提示敏感,而目标任务可能会导致灾难性忘记来源知识。为了解决这些问题,我们提出了一个新的指标,以准确预测及时的转移性(关于(i)),以及一种利用知识蒸馏技术将“知识”从源提示转移到的新颖的锅方法(即熊猫)目标以微妙的方式提示,并有效缓解灾难性遗忘(关于(ii))。此外,为了实现每个源目标对的自适应及时转移,我们使用指标来控制熊猫方法中的知识转移。对PLM的5个量表的21个源和9个目标数据集的189组组合进行了广泛而系统的实验,表明:1)我们提出的指标很好地预测了及时的可传递性; 2)在所有任务和型号中,我们的熊猫始终优于香草锅的平均得分2.3%(最高24.1%); 3)通过我们的熊猫方法,及时调整可以比在各种PLM量表场景中的模型调整来实现竞争性甚至更好的性能。接受代码和模型将在接受后发布。
translated by 谷歌翻译
Sequence-to-sequence (seq2seq) learning is a popular fashion for large-scale pretraining language models. However, the prior seq2seq pretraining models generally focus on reconstructive objectives on the decoder side and neglect the effect of encoder-side supervision, which we argue may lead to sub-optimal performance. To verify our hypothesis, we first empirically study the functionalities of the encoder and decoder in seq2seq pretrained language models, and find that the encoder takes an important but under-exploitation role than the decoder regarding the downstream performance and neuron activation. Therefore, we propose an encoding-enhanced seq2seq pretraining strategy, namely E2S2, which improves the seq2seq models via integrating more efficient self-supervised information into the encoders. Specifically, E2S2 adopts two self-supervised objectives on the encoder side from two aspects: 1) locally denoising the corrupted sentence (denoising objective); and 2) globally learning better sentence representations (contrastive objective). With the help of both objectives, the encoder can effectively distinguish the noise tokens and capture high-level (i.e. syntactic and semantic) knowledge, thus strengthening the ability of seq2seq model to accurately achieve the conditional generation. On a large diversity of downstream natural language understanding and generation tasks, E2S2 dominantly improves the performance of its powerful backbone models, e.g. BART and T5. For example, upon BART backbone, we achieve +1.1% averaged gain on the general language understanding evaluation (GLUE) benchmark and +1.75% F_0.5 score improvement on CoNLL2014 dataset. We also provide in-depth analyses to show the improvement stems from better linguistic representation. We hope that our work will foster future self-supervision research on seq2seq language model pretraining.
translated by 谷歌翻译
基于方面的情感分析(ABSA)是一项精细的情感分析任务,它的重点是检测句子中的情感极性。但是,它始终对多方面的挑战敏感,在句子中,多个方面的特征将相互影响。为了减轻此问题,我们设计了一个新颖的培训框架,称为对比度跨通道数据增强(C3 DA),该框架利用了一个内域的发电机来构建更多的多种相应样本,然后通过对比度模型通过对比度学习的稳健性,从而通过对比度学习的稳健性这些生成的数据。实际上,鉴于生成预审预测的语言模型和一些有限的ABSA标记数据,我们首先采用一些参数效率的方法来执行内域微调。然后,所获得的内域发生器用于从两个通道(即方面增强通道和极性增强通道)生成合成句子,该句子分别在给定的方面和极性上生成句子条件。具体而言,我们的C3 DA以跨渠道的方式执行句子生成以获取更多句子,并提出了熵最小化过滤器以滤除低质量生成的样品。广泛的实验表明,我们的C3 DA可以在准确性和宏观上胜过约1%的基准,而不会增加1%。代码和数据在https://github.com/wangbing1416/c3da中发布。
translated by 谷歌翻译
基于宽高的情绪分析(ABSA)是一种细粒度的情绪分析任务。为了更好地理解长期复杂的句子,并获得准确的方面的信息,这项任务通常需要语言和致辞知识。然而,大多数方法采用复杂和低效的方法来结合外部知识,例如,直接搜索图形节点。此外,尚未彻底研究外部知识和语言信息之间的互补性。为此,我们提出了一个知识图形增强网络(kgan),该网络(kgan)旨在有效地将外部知识与明确的句法和上下文信息纳入。特别是,kgan从多个不同的角度来看,即基于上下文,语法和知识的情绪表示。首先,kgan通过并行地了解上下文和句法表示,以完全提取语义功能。然后,KGAN将知识图形集成到嵌入空间中,基于该嵌入空间,基于该嵌入空间,通过注意机制进一步获得了方面特异性知识表示。最后,我们提出了一个分层融合模块,以便以本地到全局方式补充这些多视图表示。关于三个流行的ABSA基准测试的广泛实验证明了我们康复的效果和坚固性。值得注意的是,在罗伯塔的预用模型的帮助下,Kggan实现了最先进的性能的新记录。
translated by 谷歌翻译
A recent study has shown a phenomenon called neural collapse in that the within-class means of features and the classifier weight vectors converge to the vertices of a simplex equiangular tight frame at the terminal phase of training for classification. In this paper, we explore the corresponding structures of the last-layer feature centers and classifiers in semantic segmentation. Based on our empirical and theoretical analysis, we point out that semantic segmentation naturally brings contextual correlation and imbalanced distribution among classes, which breaks the equiangular and maximally separated structure of neural collapse for both feature centers and classifiers. However, such a symmetric structure is beneficial to discrimination for the minor classes. To preserve these advantages, we introduce a regularizer on feature centers to encourage the network to learn features closer to the appealing structure in imbalanced semantic segmentation. Experimental results show that our method can bring significant improvements on both 2D and 3D semantic segmentation benchmarks. Moreover, our method ranks 1st and sets a new record (+6.8% mIoU) on the ScanNet200 test leaderboard. Code will be available at https://github.com/dvlab-research/Imbalanced-Learning.
translated by 谷歌翻译
Although deep learning has made remarkable progress in processing various types of data such as images, text and speech, they are known to be susceptible to adversarial perturbations: perturbations specifically designed and added to the input to make the target model produce erroneous output. Most of the existing studies on generating adversarial perturbations attempt to perturb the entire input indiscriminately. In this paper, we propose ExploreADV, a general and flexible adversarial attack system that is capable of modeling regional and imperceptible attacks, allowing users to explore various kinds of adversarial examples as needed. We adapt and combine two existing boundary attack methods, DeepFool and Brendel\&Bethge Attack, and propose a mask-constrained adversarial attack system, which generates minimal adversarial perturbations under the pixel-level constraints, namely ``mask-constraints''. We study different ways of generating such mask-constraints considering the variance and importance of the input features, and show that our adversarial attack system offers users good flexibility to focus on sub-regions of inputs, explore imperceptible perturbations and understand the vulnerability of pixels/regions to adversarial attacks. We demonstrate our system to be effective based on extensive experiments and user study.
translated by 谷歌翻译
Recently the deep learning has shown its advantage in representation learning and clustering for time series data. Despite the considerable progress, the existing deep time series clustering approaches mostly seek to train the deep neural network by some instance reconstruction based or cluster distribution based objective, which, however, lack the ability to exploit the sample-wise (or augmentation-wise) contrastive information or even the higher-level (e.g., cluster-level) contrastiveness for learning discriminative and clustering-friendly representations. In light of this, this paper presents a deep temporal contrastive clustering (DTCC) approach, which for the first time, to our knowledge, incorporates the contrastive learning paradigm into the deep time series clustering research. Specifically, with two parallel views generated from the original time series and their augmentations, we utilize two identical auto-encoders to learn the corresponding representations, and in the meantime perform the cluster distribution learning by incorporating a k-means objective. Further, two levels of contrastive learning are simultaneously enforced to capture the instance-level and cluster-level contrastive information, respectively. With the reconstruction loss of the auto-encoder, the cluster distribution loss, and the two levels of contrastive losses jointly optimized, the network architecture is trained in a self-supervised manner and the clustering result can thereby be obtained. Experiments on a variety of time series datasets demonstrate the superiority of our DTCC approach over the state-of-the-art.
translated by 谷歌翻译
Accurate and smooth global navigation satellite system (GNSS) positioning for pedestrians in urban canyons is still a challenge due to the multipath effects and the non-light-of-sight (NLOS) receptions caused by the reflections from surrounding buildings. The recently developed factor graph optimization (FGO) based GNSS positioning method opened a new window for improving urban GNSS positioning by effectively exploiting the measurement redundancy from the historical information to resist the outlier measurements. Unfortunately, the FGO-based GNSS standalone positioning is still challenged in highly urbanized areas. As an extension of the previous FGO-based GNSS positioning method, this paper exploits the potential of the pedestrian dead reckoning (PDR) model in FGO to improve the GNSS standalone positioning performance in urban canyons. Specifically, the relative motion of the pedestrian is estimated based on the raw acceleration measurements from the onboard smartphone inertial measurement unit (IMU) via the PDR algorithm. Then the raw GNSS pseudorange, Doppler measurements, and relative motion from PDR are integrated using the FGO. Given the context of pedestrian navigation with a small acceleration most of the time, a novel soft motion model is proposed to smooth the states involved in the factor graph model. The effectiveness of the proposed method is verified step-by-step through two datasets collected in dense urban canyons of Hong Kong using smartphone-level GNSS receivers. The comparison between the conventional extended Kalman filter, several existing methods, and FGO-based integration is presented. The results reveal that the existing FGO-based GNSS standalone positioning is highly complementary to the PDR's relative motion estimation. Both improved positioning accuracy and trajectory smoothness are obtained with the help of the proposed method.
translated by 谷歌翻译
In person re-identification (ReID) tasks, many works explore the learning of part features to improve the performance over global image features. Existing methods extract part features in an explicit manner, by either using a hand-designed image division or keypoints obtained with external visual systems. In this work, we propose to learn Discriminative implicit Parts (DiPs) which are decoupled from explicit body parts. Therefore, DiPs can learn to extract any discriminative features that can benefit in distinguishing identities, which is beyond predefined body parts (such as accessories). Moreover, we propose a novel implicit position to give a geometric interpretation for each DiP. The implicit position can also serve as a learning signal to encourage DiPs to be more position-equivariant with the identity in the image. Lastly, a set of attributes and auxiliary losses are introduced to further improve the learning of DiPs. Extensive experiments show that the proposed method achieves state-of-the-art performance on multiple person ReID benchmarks.
translated by 谷歌翻译