蒸馏工作导致语言模型更紧凑,没有严重的性能下降。蒸馏的标准方法培训了针对两个目标的学生模型:特定于任务的目标(例如,语言建模)和模仿目标,并鼓励学生模型的隐藏状态与较大的教师模型类似。在本文中,我们表明,增强蒸馏有利于第三个目标,鼓励学生通过交换干预培训(IIT)来模仿教师的因果计算过程。 IIT推动学生模型成为教师模型的因果抽象 - 一种具有相同因果结构的更简单的模型。 IIT是完全可差异的,容易实施,并与其他目标灵活结合。与伯特标准蒸馏相比,通过IIT蒸馏导致维基百科(屏蔽语言建模)逐步困惑,并对胶水基准(自然语言理解),队(问题接听)和Conll-2003(命名实体识别)进行了改进。
translated by 谷歌翻译
在许多领域,我们有很好的了解有关导致结构的洞察,这将使我们训练有素的型号有用,同时仍然可以以数据驱动的方式学习。为此,我们介绍了交换干预培训的新方法(IIT)。在IIT中,我们(1)与神经模型中的表示的因果模型中的变量和(2)列车在一个神经模型中,以匹配当两个模型中的对齐表示时的基本输入上的因果模型的反事行为它们是第二源输入的值。 IIT完全可分辨,灵活地与其他目标结合,并保证目标因果模型是当其损失最小化时神经模型的ACAUSAL抽象。我们在结构化视觉任务(MNIST-PVR)和导航指令任务(REARCAN)上评估IIT。我们将IIT与多任务培训目标和数据增强进行比较。在我们的所有实验中,IIT在他们实现目标因果模型的意义上实现了最佳结果,并产生了更可观的诠释。
translated by 谷歌翻译
AI正在经历范式转变,随着模型的兴起(例如Bert,Dall-E,GPT-3),这些模型经过大规模的数据训练,并且可以适应广泛的下游任务。我们称这些模型基础模型来强调其至关重要但不完整的特征。该报告提供了基础模型的机会和风险的详尽说明,包括其功能(例如语言,愿景,机器人技术,推理,人类互动)和技术原则(例如,模型架构,培训程序,数据,系统,安全,安全性,评估,理论)对其应用(例如法律,医疗保健,教育)和社会影响(例如不平等,滥用,经济和环境影响,法律和道德考虑)。尽管基础模型基于标准的深度学习和转移学习,但它们的规模导致了新的新兴能力,以及它们在许多任务中的有效性都激发了同质化。同质化提供了强大的杠杆作用,但要求谨慎,因为基础模型的缺陷均由下游的所有适应模型继承。尽管即将广泛地部署基础模型,但我们目前对它们的工作方式,失败以及由于其新兴属性的影响而缺乏清晰的了解。为了解决这些问题,我们认为基础模型的许多批判性研究都需要与他们的基本社会技术性质相称。
translated by 谷歌翻译
本文介绍了在结构因果模型(SCM)的一般空间上定义的一系列拓扑结构,介绍了因果推断的拓扑学习 - 理论观点。作为框架的说明,我们证明了拓扑因果层次结构定理,表明只有在微薄的SCM集中就可以实现了无实体的假设因果推断。由于弱拓扑结构和统计上可验证假设的开放集之间的已知对应关系,我们的结果表明,原则上的归纳假设足以许可有效的因果推论是统计上无可核实的。类似于无午餐定理的统计推断,目前的结果阐明了因果推断的实质性假设的必然性。我们拓扑方法的额外好处是它很容易容纳具有无限变量的SCM。我们终于建议该框架对探索和评估替代因果归纳的积极项目有所帮助。
translated by 谷歌翻译
View-dependent effects such as reflections pose a substantial challenge for image-based and neural rendering algorithms. Above all, curved reflectors are particularly hard, as they lead to highly non-linear reflection flows as the camera moves. We introduce a new point-based representation to compute Neural Point Catacaustics allowing novel-view synthesis of scenes with curved reflectors, from a set of casually-captured input photos. At the core of our method is a neural warp field that models catacaustic trajectories of reflections, so complex specular effects can be rendered using efficient point splatting in conjunction with a neural renderer. One of our key contributions is the explicit representation of reflections with a reflection point cloud which is displaced by the neural warp field, and a primary point cloud which is optimized to represent the rest of the scene. After a short manual annotation step, our approach allows interactive high-quality renderings of novel views with accurate reflection flow. Additionally, the explicit representation of reflection flow supports several forms of scene manipulation in captured scenes, such as reflection editing, cloning of specular objects, reflection tracking across views, and comfortable stereo viewing. We provide the source code and other supplemental material on https://repo-sam.inria.fr/ fungraph/neural_catacaustics/
translated by 谷歌翻译
Edge computing is changing the face of many industries and services. Common edge computing models offload computing which is prone to security risks and privacy violation. However, advances in deep learning enabled Internet of Things (IoTs) to take decisions and run cognitive tasks locally. This research introduces a decentralized-control edge model where most computation and decisions are moved to the IoT level. The model aims at decreasing communication to the edge which in return enhances efficiency and decreases latency. The model also avoids data transfer which raises security and privacy risks. To examine the model, we developed SAFEMYRIDES, a scene-aware ridesharing monitoring system where smart phones are detecting violations at the runtime. Current real-time monitoring systems are costly and require continuous network connectivity. The system uses optimized deep learning that run locally on IoTs to detect violations in ridesharing and record violation incidences. The system would enhance safety and security in ridesharing without violating privacy.
translated by 谷歌翻译
Cognitive Computing (COC) aims to build highly cognitive machines with low computational resources that respond in real-time. However, scholarly literature shows varying research areas and various interpretations of COC. This calls for a cohesive architecture that delineates the nature of COC. We argue that if Herbert Simon considered the design science is the science of artificial, cognitive systems are the products of cognitive science or 'the newest science of the artificial'. Therefore, building a conceptual basis for COC is an essential step into prospective cognitive computing-based systems. This paper proposes an architecture of COC through analyzing the literature on COC using a myriad of statistical analysis methods. Then, we compare the statistical analysis results with previous qualitative analysis results to confirm our findings. The study also comprehensively surveys the recent research on COC to identify the state of the art and connect the advances in varied research disciplines in COC. The study found that there are three underlaying computing paradigms, Von-Neuman, Neuromorphic Engineering and Quantum Computing, that comprehensively complement the structure of cognitive computation. The research discuss possible applications and open research directions under the COC umbrella.
translated by 谷歌翻译
Reading comprehension of legal text can be a particularly challenging task due to the length and complexity of legal clauses and a shortage of expert-annotated datasets. To address this challenge, we introduce the Merger Agreement Understanding Dataset (MAUD), an expert-annotated reading comprehension dataset based on the American Bar Association's 2021 Public Target Deal Points Study, with over 39,000 examples and over 47,000 total annotations. Our fine-tuned Transformer baselines show promising results, with models performing well above random on most questions. However, on a large subset of questions, there is still room for significant improvement. As the only expert-annotated merger agreement dataset, MAUD is valuable as a benchmark for both the legal profession and the NLP community.
translated by 谷歌翻译
The application of deep learning algorithms to financial data is difficult due to heavy non-stationarities which can lead to over-fitted models that underperform under regime changes. Using the Numerai tournament data set as a motivating example, we propose a machine learning pipeline for trading market-neutral stock portfolios based on tabular data which is robust under changes in market conditions. We evaluate various machine-learning models, including Gradient Boosting Decision Trees (GBDTs) and Neural Networks with and without simple feature engineering, as the building blocks for the pipeline. We find that GBDT models with dropout display high performance, robustness and generalisability with relatively low complexity and reduced computational cost. We then show that online learning techniques can be used in post-prediction processing to enhance the results. In particular, dynamic feature neutralisation, an efficient procedure that requires no retraining of models and can be applied post-prediction to any machine learning model, improves robustness by reducing drawdown in volatile market conditions. Furthermore, we demonstrate that the creation of model ensembles through dynamic model selection based on recent model performance leads to improved performance over baseline by improving the Sharpe and Calmar ratios. We also evaluate the robustness of our pipeline across different data splits and random seeds with good reproducibility of results.
translated by 谷歌翻译
In this work, we address the problem of unsupervised moving object segmentation (MOS) in 4D LiDAR data recorded from a stationary sensor, where no ground truth annotations are involved. Deep learning-based state-of-the-art methods for LiDAR MOS strongly depend on annotated ground truth data, which is expensive to obtain and scarce in existence. To close this gap in the stationary setting, we propose a novel 4D LiDAR representation based on multivariate time series that relaxes the problem of unsupervised MOS to a time series clustering problem. More specifically, we propose modeling the change in occupancy of a voxel by a multivariate occupancy time series (MOTS), which captures spatio-temporal occupancy changes on the voxel level and its surrounding neighborhood. To perform unsupervised MOS, we train a neural network in a self-supervised manner to encode MOTS into voxel-level feature representations, which can be partitioned by a clustering algorithm into moving or stationary. Experiments on stationary scenes from the Raw KITTI dataset show that our fully unsupervised approach achieves performance that is comparable to that of supervised state-of-the-art approaches.
translated by 谷歌翻译