Text classification has been widely used in various natural language processing applications, such as sentiment analysis. Current applications typically use large transformer-based language models to classify input texts. However, there is a lack of systematic study on how much private information can be inverted when such models are published. In this paper, we formulate \emph{Text Revealer}, the first model inversion attack for text reconstruction against text classification with transformers. Our attack faithfully reconstructs private texts included in the training data, given access to the target model. We leverage an external dataset and GPT-2 to generate fluent text resembling the target domain, and then optimally perturb its hidden states using feedback from the target model. Our extensive experiments demonstrate that our attack is effective on datasets with different text lengths and can accurately reconstruct private texts.
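The core loop described above, perturbing a hidden state using the target model's feedback until the desired class confidence rises, can be illustrated with a minimal sketch. This is not the paper's implementation: a small linear classifier stands in for the target transformer, the hidden vector `h` stands in for a GPT-2 hidden state, and all shapes and step sizes are illustrative.

```python
import numpy as np

# Toy stand-in for the target classifier: logits = W @ h.
# In the attack setting, h would be a transformer hidden state;
# here it is just a small vector (all names/shapes are illustrative).
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 8))          # 3 classes, hidden size 8

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def perturb_hidden_state(h, target_class, steps=200, lr=0.5):
    """Gradient-ascent perturbation of a hidden state so the
    (white-box) classifier's confidence in `target_class` rises,
    i.e. the model-feedback loop in miniature."""
    h = h.copy()
    for _ in range(steps):
        p = softmax(W @ h)
        onehot = np.eye(3)[target_class]
        grad = W.T @ (onehot - p)        # d log p[target] / d h
        h += lr * grad
    return h

h0 = rng.standard_normal(8)
h1 = perturb_hidden_state(h0, target_class=2)
p0 = softmax(W @ h0)[2]
p1 = softmax(W @ h1)[2]
print(p1 > p0)                           # confidence in the target class rose
```

In the actual attack, the perturbed hidden states would then be decoded back to text by GPT-2; the sketch only shows the feedback-driven optimization step.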
Video compression plays a crucial role in video streaming and classification systems by maximizing the end-user quality of experience (QoE) at a given bandwidth budget. In this paper, we conduct the first systematic study of adversarial attacks on deep learning-based video compression and downstream classification systems. Our attack framework, dubbed RoVISQ, manipulates the Rate-Distortion ($\textit{R}$-$\textit{D}$) relationship of a video compression model to achieve one or both of the following goals: (1) increasing the network bandwidth, (2) degrading the video quality for end-users. We further devise new objectives for targeted and untargeted attacks on a downstream video classification service. Finally, we design an input-invariant perturbation that universally disrupts video compression and classification systems in real time. Unlike previously proposed attacks on video classification, our adversarial perturbations are the first to withstand compression. We empirically show the resilience of RoVISQ attacks against various defenses, i.e., adversarial training, video denoising, and JPEG compression. Our extensive experimental results on various video datasets show that RoVISQ attacks degrade peak signal-to-noise ratio by up to 5.6 dB and increase the bit-rate by up to $\sim$2.4$\times$, while achieving over 90$\%$ attack success rate on a downstream classifier. Our user study further demonstrates the effect of RoVISQ attacks on users' QoE.
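The distortion-side goal above, crafting a bounded input perturbation that degrades a learned codec's reconstruction quality, can be sketched with a toy PGD-style loop. This is an assumption-laden illustration, not the RoVISQ method: a low-rank linear projection stands in for the learned video compression model, a 16-dimensional vector stands in for a frame, and the budget and step sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy differentiable "codec": project onto a low-rank basis and back.
# Stands in for a learned compression model (illustrative only).
B = np.linalg.qr(rng.standard_normal((16, 4)))[0]   # 16-dim frame, 4-dim code

def codec(x):
    return B @ (B.T @ x)                            # encode, then decode

def distortion_attack(x, eps=0.1, steps=100, lr=0.05):
    """PGD-style perturbation that degrades reconstruction quality:
    maximize ||codec(x + d) - x||^2 under an L-infinity budget eps."""
    d = np.zeros_like(x)
    for _ in range(steps):
        r = codec(x + d) - x
        grad = 2 * B @ (B.T @ r)                    # d loss / d d
        d = np.clip(d + lr * np.sign(grad), -eps, eps)
    return d

x = rng.standard_normal(16)
d = distortion_attack(x)
mse_clean = np.mean((codec(x) - x) ** 2)
mse_adv = np.mean((codec(x + d) - x) ** 2)
print(mse_adv > mse_clean)                          # distortion increased
```

A rate-side objective would analogously push the perturbed input toward higher-entropy codes; the same projected-gradient loop applies with a different loss term.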