We demonstrate how language can improve geolocation: the task of predicting the location where an image was taken. Here we study explicit knowledge from human-written guidebooks that describe the salient, class-discriminative visual features humans use for geolocation. We propose the task of Geolocation via Guidebook Grounding, which uses a dataset of StreetView images from a diverse set of locations paired with a textual guidebook for GeoGuessr, a popular interactive geolocation game. Our approach predicts a country for each image by attending over clues automatically extracted from the guidebook; supervising this attention with country-level pseudo-labels yields the best performance. Our approach substantially outperforms a state-of-the-art image-only geolocation method, improving Top-1 accuracy by over 5%. Our dataset and code can be found at https://github.com/g-luo/geolocation_via_guidebook_grounding.
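The abstract describes predicting a country by attending over guidebook clues with the image as the query. A minimal sketch of that idea, assuming precomputed image features, one embedding per extracted clue, and per-clue country scores (all function and variable names here are hypothetical, not the authors' code):

```python
import numpy as np

def predict_country(image_feat, clue_embs, clue_country_logits):
    """Attend over guidebook clues with the image as the query, then pool
    each clue's country scores by its attention weight (illustrative only).

    image_feat:          (d,)    image feature vector
    clue_embs:           (k, d)  one embedding per extracted guidebook clue
    clue_country_logits: (k, c)  per-clue country scores
    """
    scores = clue_embs @ image_feat              # (k,) image-clue similarity
    scores = scores - scores.max()               # numerical stability
    attn = np.exp(scores) / np.exp(scores).sum() # softmax over clues
    country_logits = attn @ clue_country_logits  # (c,) pooled country scores
    return attn, country_logits

# Toy example: k=4 clues, d=8 feature dims, c=3 candidate countries.
rng = np.random.default_rng(0)
attn, logits = predict_country(rng.normal(size=8),
                               rng.normal(size=(4, 8)),
                               rng.normal(size=(4, 3)))
```

The attention weights `attn` are what the paper's country-level pseudo-labels would supervise: clues known to describe the ground-truth country should receive higher weight.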
Detecting out-of-context media on Twitter (e.g., "miscaptioned" images) often requires detecting inconsistencies between the two modalities. This paper describes our approach to the Image-Text Inconsistency Detection Challenge of the DARPA Semantic Forensics (SemaFor) program. First, we collect Twitter-COMMs, a large-scale multimodal dataset of 884k tweets relevant to the topics of climate change, COVID-19, and military vehicles. We train our approach, based on the state-of-the-art CLIP model, leveraging automatically generated random and hard negatives. Our method is then tested on a hidden human-generated evaluation set. We achieve the best result on the program leaderboard, with an 11% improvement in detection over a zero-shot CLIP baseline.