IMCI: Integrate Multi-view Contextual Information
for Fact Extraction and Verification

Hao Wang$^{†}$, Yangguang Li$^{‡}$, Zhen Huang$^{†}$, Yong Dou$^{†}$

$^{†}$National University of Defense Technology, Changsha
$^{‡}$SenseTime, Beijing
{hao.wang, zhenhuang, yongdou}@nudt.edu.cn
liyangguang@sensetime.com

Abstract

With the rapid development of automatic fake news detection technology, fact extraction and verification (FEVER) has been attracting more attention. The task aims to extract the most related evidence sentences from millions of open-domain Wikipedia documents and then verify the credibility of the corresponding claims. Although several strong models have been proposed for the task and have made great progress, we argue that they fail to utilize multi-view contextual information, which limits their performance. In this paper, we propose to integrate multi-view contextual information (IMCI) for fact extraction and verification. For each evidence sentence, we define two kinds of context, i.e. intra-document context and inter-document context. Intra-document context consists of the document title and all the other sentences from the same document. Inter-document context consists of all other evidences, which may come from different documents. We then integrate the multi-view contextual information to encode the evidence sentences. Our experimental results on the FEVER 1.0 shared task show that our IMCI framework makes great progress on both fact extraction and verification, and achieves state-of-the-art performance with a winning FEVER score of 72.97% and label accuracy of 75.84% on the online blind test set. We also conduct an ablation study to examine the impact of multi-view contextual information. Our code will be released at https://github.com/phoenixsecularbird/IMCI.

1 Introduction

Fake news propagation is a severe social problem that may cause great loss and lead to serious consequences, e.g. panic, quarrel, opposition and even war. The situation has become a general concern since Brexit and the U.S. presidential campaign in 2016, and has grown far more intense during the COVID-19 pandemic (Martino et al., 2020). Against this background, automatic fake news detection has been developing rapidly. According to Ruffo et al. (2021), automatic fake news detection mainly includes textual-content based methods (Giachanou et al., 2019; Ghanem et al., 2020; Kaliyar et al., 2021), user-role based methods (Vo and Lee, 2019; Giachanou et al., 2020), multi-modal approaches (Zlatkova et al., 2019; Fung et al., 2021), and detection of bots and trolls (Stella et al., 2018; Sayyadiharikandeh et al., 2020). Among textual-content based methods, fact extraction and verification (FEVER) (Thorne et al., 2018) has been attracting growing attention. As shown in Figure 1, for a given claim, the task aims to select at most 5 of the most related sentences as evidence from millions of open-domain Wikipedia documents (fact extraction), and to combine the selected evidence to judge the claim as SUPPORTS, REFUTES or NOT ENOUGH INFO (NEI) (fact verification).

Figure 1: An example from FEVER 1.0 shared task. (Underlined sentence is the unlabeled intra-document context. Words in red involve alias name, coreference and multi-hop reasoning, which may lead to model confusion. Words in blue help to handle these issues.)

Recently, several strong models (Nie et al., 2019b, a; Zhou et al., 2019; Liu et al., 2020; Hidey et al., 2020; Subramanian and Lee, 2020) have been proposed for the task. Although they have made great progress and obtained excellent performance, we argue that they fail to utilize multi-view contextual information, which limits their performance. Specifically, we define two kinds of context for each evidence sentence, i.e. intra-document context and inter-document context. Intra-document context consists of the document title and all the other sentences from the same document. Inter-document context consists of all other evidences, which may come from different documents. Multi-view contextual information is of great importance for fact extraction and verification. For instance, as shown in Figure 1, intra-document context can help to clarify the relationship between different entities, e.g. “Edar Wright” and its alias name “Edar Howard Wright” in the first evidence, and “Gemini” and its coreference “this sign” in the second evidence. Besides, the two evidence sentences can be regarded as inter-document context of each other, and the interaction and fusion of information between them is essential to verify the claim in this multi-hop sample.

To this end, we propose to integrate multi-view contextual information (IMCI) for fact extraction and fact verification, where we introduce the multi-view contextual information to encode the evidence sentences to handle the task. In summary, our contributions are as follows:

$∙$ We propose an iterative multi-view fact extraction model. It retrieves related documents and extracts related evidence sentences in two iterations, with multi-view context information joined.

$∙$ We propose a multi-view fact verification model. Each evidence sentence is encoded from two views, and a dual evidence fusion graph is adopted to fuse the information from diverse views and different evidences.

$∙$ Our IMCI framework makes great progress on both fact extraction and verification, and achieves state-of-the-art performance with a winning FEVER score of 72.97% on the online blind test set.

2 Iterative Multi-view Fact Extraction

Our fact extraction model iteratively conducts document retrieval and sentence retrieval in two iterations to obtain corresponding candidate evidence sentences, and then reranks the candidates of different iterations for better performance.

2.1 Document Retrieval

Document retrieval includes coarse document retrieval in iteration 1 and refined document retrieval in iteration 2.

Coarse document retrieval aims to quickly obtain the most related documents from millions of open-domain Wikipedia documents, with recall as high as possible and acceptable precision. Inspired by UKP-Athene (Hanselowski et al., 2018) and SR-MRS (Nie et al., 2019b), coarse document retrieval combines constituency-based Wikipedia search with TF-IDF retrieval. These two components respectively utilize search engine power and statistical word frequency information. For constituency-based Wikipedia search, we also conduct mention filtering as in UKP-Athene (Hanselowski et al., 2018). That is, if the title of a document is not explicitly mentioned in the claim, we consider the document weakly related and remove it.
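The mention filtering step can be illustrated with a minimal sketch. The normalization used here (lowercasing, replacing underscores, dropping a trailing parenthetical disambiguation) and the function names are assumptions made for illustration; they are not taken from UKP-Athene or from the released IMCI code.

```python
import re

def normalize_title(title: str) -> str:
    """Lowercase a title, replace underscores, and drop a trailing
    parenthetical disambiguation (an assumed normalization)."""
    text = title.replace("_", " ").lower()
    text = re.sub(r"\s*\(.*?\)\s*$", "", text)  # "gemini (astrology)" -> "gemini"
    return " ".join(text.split())

def mention_filter(claim: str, candidate_titles: list) -> list:
    """Keep only candidate documents whose normalized title occurs in the
    claim as a whole-word match; the rest are treated as weakly related."""
    claim_norm = " ".join(claim.lower().split())
    kept = []
    for title in candidate_titles:
        needle = normalize_title(title)
        if not needle:
            continue
        pattern = r"(?<!\w)" + re.escape(needle) + r"(?!\w)"
        if re.search(pattern, claim_norm):
            kept.append(title)
    return kept

# Hypothetical claim and candidate titles, for illustration only.
claim = "Edgar Wright was born under the sign Gemini."
titles = ["Edgar_Wright", "Gemini_(astrology)", "Shaun_of_the_Dead"]
print(mention_filter(claim, titles))
```

In this hypothetical example the third title is removed because it is never mentioned in the claim.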

Refined document retrieval aims to improve on coarse retrieval, namely to reach higher recall together with higher precision and F1 score. It adopts dense semantic retrieval and utilizes Wikipedia hyperlinks. Specifically, in iteration 2, we decide the refined candidate documents according to the candidate evidences from iteration 1. That is, for each claim, all documents that contain at least one candidate evidence are taken into account. Furthermore, as the top one candidate evidences show high precision (86.11%, Table 5), we regard them as gold evidence, and take all documents hyperlinked with them as refined candidate documents to handle the multi-hop problem.
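A minimal sketch of how the refined candidate documents could be assembled for one claim is given below. The data layout (tuples of document title, sentence index and score, plus a dictionary of hyperlinks per sentence) is a hypothetical choice for illustration, not the format of the released code.

```python
def refined_candidate_documents(candidates: list, hyperlinks: dict) -> list:
    """Collect refined candidate documents for one claim.

    candidates: iteration-1 candidate evidences as (doc_title, sent_id, score).
    hyperlinks: maps (doc_title, sent_id) to the titles hyperlinked from that
                sentence (an assumed representation of Wikipedia hyperlinks).
    """
    if not candidates:
        return []
    documents = []
    # Every document that contains at least one candidate evidence.
    for doc_title, _, _ in candidates:
        if doc_title not in documents:
            documents.append(doc_title)
    # Documents hyperlinked with the top one candidate evidence.
    top_doc, top_sent, _ = max(candidates, key=lambda c: c[2])
    for linked in hyperlinks.get((top_doc, top_sent), []):
        if linked not in documents:
            documents.append(linked)
    return documents

# Hypothetical data, for illustration only.
candidates = [("A", 0, 0.9), ("B", 3, 0.4), ("A", 2, 0.2)]
hyperlinks = {("A", 0): ["C", "B"], ("B", 3): ["D"]}
print(refined_candidate_documents(candidates, hyperlinks))
```

Only the hyperlinks of the top one evidence are followed, which mirrors the reasoning that its precision is high enough to treat it as gold evidence.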

2.2 Sentence Retrieval

Sentence retrieval aims to select the most related sentences as evidence from the candidate documents. Previous models require a sampling strategy to obtain negative samples for training the neural retrieval model. Besides, these models encode and score each claim-sentence pair separately.

Figure 2: Sentence retrieval model. Sentences are encoded within intra-document context. In iteration 2, we insert top one candidate evidence (red dashed box) into the input sequence as inter-document context.

In contrast, to avoid designing a sampling strategy and to utilize multi-view contextual information, we encode each sentence within its corresponding intra-document context. Moreover, as mentioned, the top one candidate evidences of iteration 1 show high precision (86.11%, Table 5). Therefore, in iteration 2 we take them as inter-document context and insert them into the input sequence to handle the multi-hop problem.

Formally, as shown in Figure 2, the claim, the document title and all the sentences in the document are concatenated:

$$[\mathrm{CLS}]\ \mathrm{claim}\ [\mathrm{SEP}]\ \mathrm{sen}^{*}\ [\mathrm{SEP}]\ \mathrm{title}\ [\mathrm{SEP}]\ \mathrm{sen}_{1}\ \mathrm{sen}_{2}\ \dots\ [\mathrm{SEP}] \tag{1}$$

where $\mathrm{sen}^{*}$ denotes the top one candidate evidence of iteration 1, which is taken as inter-document context in iteration 2. The sequence is encoded by a BERT encoder. For the claim, we take the hidden state of the first claim token as the claim representation $E_{c}$. For the title, we take the hidden state of the first title token as the title representation $E_{t}$. For each sentence, we take the hidden state of the first sentence token as the sentence representation $E_{s}$. The sentence representation is enhanced through alignment with the title representation:

$$E_{ts} = W_{a} [E_{t}, E_{s}, E_{t} - E_{s}, E_{t} \odot E_{s}] \tag{2}$$

and the claim representation:

$$E_{cts} = W_{a}^{'} [E_{c}, E_{ts}, E_{c} - E_{ts}, E_{c} \odot E_{ts}] \tag{3}$$

where $\odot$ denotes the element-wise (Hadamard) product. Then, the score $\hat{y}$ of the sentence is obtained through a multi-layer perceptron (MLP) with a sigmoid activation function:

$$\hat{y} = \mathrm{Sigmoid}(\mathrm{MLP}(E_{cts})) \tag{4}$$
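The alignment and scoring steps of Eqs. (2) to (4) can be sketched in plain Python as follows. The weight shapes, the single linear layer standing in for the MLP, and all function names are assumptions for illustration; an actual implementation would use learned tensors on top of the encoder outputs.

```python
import math

def align(a: list, b: list, weight: list) -> list:
    """Compute W [a, b, a - b, a * b] for d-dimensional vectors a and b,
    where weight is a d x 4d matrix given as a list of rows."""
    features = (a + b
                + [x - y for x, y in zip(a, b)]
                + [x * y for x, y in zip(a, b)])
    return [sum(w * f for w, f in zip(row, features)) for row in weight]

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def score_sentence(e_c, e_t, e_s, w_a, w_a_prime, w_out, b_out) -> float:
    """Score one sentence: title alignment (Eq. 2), claim alignment (Eq. 3),
    then a sigmoid over a linear layer used here in place of the MLP (Eq. 4)."""
    e_ts = align(e_t, e_s, w_a)
    e_cts = align(e_c, e_ts, w_a_prime)
    logit = sum(w * x for w, x in zip(w_out, e_cts)) + b_out
    return sigmoid(logit)

# Toy weights (d = 2) that simply copy the first argument of the alignment.
w_copy = [[1, 0, 0, 0, 0, 0, 0, 0],
          [0, 1, 0, 0, 0, 0, 0, 0]]
print(score_sentence([1.0, 2.0], [0.5, 0.5], [3.0, 4.0],
                     w_copy, w_copy, [0.0, 0.0], 0.0))
```

The four-way concatenation exposes both the difference and the element-wise product of the two representations to the linear map, so the map can pick up agreement as well as mismatch between them.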

The training objective of sentence retrieval is defined as binary cross entropy loss, to maximize the probability of groundtruth evidence sentences:

$$L_{E} = -\frac{1}{\sum_{i=1}^{m} n_{i}} \sum_{i=1}^{m} \sum_{j=1}^{n_{i}} \left[ y_{ij} \cdot \log(\hat{y}_{ij}) + (1 - y_{ij}) \cdot \log(1 - \hat{y}_{ij}) \right] \tag{5}$$

where $m$ is the batch size, $n_{i}$ is the number of sentences in document $i$, and $y_{ij}$ is the label of sentence $j$ in document $i$, 1 for groundtruth evidence sentences and 0 for non-evidence sentences.
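As a small worked example of the loss in Eq. (5), the sketch below averages the binary cross entropy over all sentences of all documents in a batch. The clipping constant `eps` is an added numerical safeguard and is not part of the equation.

```python
import math

def sentence_retrieval_loss(labels: list, scores: list, eps: float = 1e-12) -> float:
    """Binary cross entropy averaged over every sentence in the batch.

    labels[i][j] is 1 for a groundtruth evidence sentence and 0 otherwise;
    scores[i][j] is the predicted probability for sentence j of document i.
    """
    total, count = 0.0, 0
    for doc_labels, doc_scores in zip(labels, scores):
        for y, p in zip(doc_labels, doc_scores):
            p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
            total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
            count += 1
    if count == 0:
        raise ValueError("the batch contains no sentences")
    return -total / count

# Two documents with 2 and 1 sentences; the normalizer is therefore 3.
loss = sentence_retrieval_loss([[1, 0], [0]], [[0.5, 0.5], [0.5]])
print(loss)
```

Because every sentence of a document is scored in one pass, non-evidence sentences serve as negatives directly and no separate negative sampling is needed.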

2.3 Full Pipeline

In each iteration, we have scored different sentences as candidate evidences. To obtain better performance, for each claim, we merge the results of different iterations, and rerank the sentences through their scores. Finally, according to the original setup of the task, we keep at most top 5 sentences as evidences, for further fact verification.
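The merge-and-rerank step can be sketched as below. How a sentence scored in both iterations is resolved is not specified in the text; keeping its maximum score is an assumption made for this example, as are the function name and data layout.

```python
def merge_and_rerank(iteration_results: list, top_k: int = 5) -> list:
    """Merge per-iteration sentence scores and keep the top_k sentences.

    iteration_results: one dict per iteration, mapping (doc_title, sent_id)
    to a score. Duplicates keep their highest score (an assumption).
    """
    best = {}
    for result in iteration_results:
        for key, score in result.items():
            if key not in best or score > best[key]:
                best[key] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]

# Hypothetical scores from two iterations.
iteration_1 = {("A", 0): 0.9, ("B", 1): 0.3}
iteration_2 = {("B", 1): 0.6, ("C", 2): 0.8}
print(merge_and_rerank([iteration_1, iteration_2]))
```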

3 Multi-view Fact Verification

3.1 Multi-view Contextual Encoding

For each evidence sentence, we respectively obtain its representations through intra-document encoding and inter-document encoding.

$∙$ Intra-document Encoding aims to capture intra-document contextual information of each evidence sentence. It is similar to the sentence retrieval model in Section 2.2. Each evidence sentence is encoded within its intra-document context. Then its intra-document representation is also obtained through alignment.

$∙$ Inter-document Encoding is utilized to capture token-level information interaction among different evidence sentences to handle multi-hop problem. The claim, all evidence sentences and their document titles are concatenated as another input sequence:

$$[\mathrm{CLS}]\ \mathrm{claim}\ [\mathrm{SEP}]\ \mathrm{title}_{1}\ [\mathrm{SEP}]\ \mathrm{evi}_{1}\ [\mathrm{SEP}]\ \mathrm{title}_{2}\ [\mathrm{SEP}]\ \mathrm{evi}_{2}\ [\mathrm{SEP}]\ \dots\ [\mathrm{SEP}] \tag{6}$$

The concatenation is also encoded by BERT encoder. Then, similarly, we obtain claim, title or evidence representation from the hidden state of the first token. Finally, for each evidence, we obtain its inter-document representation through alignment with the claim representation and its corresponding title representation.
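The two input layouts (intra-document, as in Section 2.2, and inter-document, as in this section) can be illustrated at the string level. This is only a sketch: a real pipeline would hand the pieces to the encoder's tokenizer, and omitting the top one evidence together with its separator in iteration 1 is an assumption.

```python
def build_intra_document_input(claim: str, title: str, sentences: list,
                               top_evidence: str = None) -> str:
    """Claim, optional top one evidence, title, then all document sentences."""
    parts = ["[CLS]", claim, "[SEP]"]
    if top_evidence is not None:  # inserted only in iteration 2
        parts += [top_evidence, "[SEP]"]
    parts += [title, "[SEP]"] + list(sentences) + ["[SEP]"]
    return " ".join(parts)

def build_inter_document_input(claim: str, evidences: list) -> str:
    """Claim followed by (title, evidence) pairs, each closed by [SEP]."""
    parts = ["[CLS]", claim, "[SEP]"]
    for title, evidence in evidences:
        parts += [title, "[SEP]", evidence, "[SEP]"]
    return " ".join(parts)

print(build_intra_document_input("c", "t", ["s1", "s2"], top_evidence="s*"))
print(build_inter_document_input("c", [("t1", "e1"), ("t2", "e2")]))
```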

3.2 Dual Evidence Fusion Graph

Through multi-view contextual encoding, for each evidence, we obtain two alignment representations from different contextual views. To further integrate multi-view evidence information and handle the multi-hop problem, inspired by multi-relational graph convolutional networks (Cao et al., 2019; Tu et al., 2019, 2020), we propose a dual evidence fusion graph network. As shown in Figure 3, one evidence sentence corresponds to two different nodes in this graph, whose initial representations respectively come from intra-document encoding and inter-document encoding. For each evidence sentence, the noun phrases and named entities are extracted as keywords with the spaCy tool (https://spacy.io/). Then, for a pair of nodes, the links between them are decided according to the following rules:

$∙$ Common Document Two nodes are linked if they come from the same document.

$∙$ Common Keyword Two nodes are linked if they share overlapped keywords.

$∙$ Claim Jump Two nodes are linked if each of them shares at least one keyword with the claim.
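The three linking rules can be sketched as a function that builds one adjacency matrix per edge type. The node layout (a dictionary with a document name and a keyword set) is hypothetical, and excluding self-links is an assumption, since the text does not say whether a node is linked to itself.

```python
def build_adjacency(nodes: list, claim_keywords: set) -> dict:
    """Return one adjacency matrix per edge type for the given nodes.

    nodes: list of dicts with keys "doc" (str) and "keywords" (set of str).
    """
    n = len(nodes)
    names = ("common_document", "common_keyword", "claim_jump")
    adj = {name: [[0.0] * n for _ in range(n)] for name in names}
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # assumption: no self-links
            a, b = nodes[i], nodes[j]
            if a["doc"] == b["doc"]:
                adj["common_document"][i][j] = 1.0
            if a["keywords"] & b["keywords"]:
                adj["common_keyword"][i][j] = 1.0
            if (a["keywords"] & claim_keywords) and (b["keywords"] & claim_keywords):
                adj["claim_jump"][i][j] = 1.0
    return adj

def row_normalize(matrix: list) -> list:
    """Divide each row by its sum; all-zero rows are left unchanged."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([v / total for v in row] if total else list(row))
    return normalized

# Hypothetical nodes and claim keywords.
nodes = [{"doc": "D1", "keywords": {"x", "y"}},
         {"doc": "D1", "keywords": {"z"}},
         {"doc": "D2", "keywords": {"y", "w"}}]
adjacency = build_adjacency(nodes, {"x", "w"})
print(adjacency["common_document"])
```

Note that the two nodes derived from the same evidence sentence always share a document and keywords, so under these rules they are linked to each other.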

Figure 3: Dual Evidence Fusion Graph Network. Node 1 and node I denote different representations of the same evidence sentence from different encoding methods, similarly for node 2 and node II, etc. We define three kinds of edges in total.

For each claim, $N$ selected evidence sentences introduce $2N$ evidence nodes. Let $H_{i} \in R^{2N \times d}$ denote the node representations at the $i$-th graph layer, where $d$ refers to the hidden dimension. The initial representation $H_{0}$ is the claim-title-evidence alignment representation obtained through multi-view contextual encoding. The updated information $U_{i} \in R^{2N \times d}$ after a single graph layer is defined as:

$$U_{i} = H_{i} W_{0} + \sum_{j=1}^{3} \tilde{A}_{j} H_{i} W_{j} \tag{7}$$

where $\tilde{A}_{j} \in R^{2N \times 2N}$ denotes the row-normalized adjacency matrix for the $j$-th kind of edge. Then the forget ratio $G_{i} \in R^{2N \times d}$ between the updated and old information is:

$$G_{i} = \mathrm{Sigmoid}(W_{g} [U_{i}, H_{i}]) \tag{8}$$

And the updated evidence representation through the graph layer is:

$$H_{i+1} = \mathrm{Activation}(U_{i}) \odot G_{i} + H_{i} \odot (1 - G_{i}) \tag{9}$$

In this way, with several stacked layers, evidence representations are updated and multi-view evidence information is fused.
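A single gated graph layer of this kind can be sketched in plain Python. The sketch applies the gate weight on the right of the concatenated rows (a row-vector convention), and uses tanh for the unspecified activation; both choices, like the function names, are assumptions for illustration.

```python
import math

def matmul(a: list, b: list) -> list:
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def graph_layer(h, adjs, w0, ws, wg, activation=math.tanh):
    """One layer: relational update, forget ratio, gated combination.

    h: 2N x d node matrix; adjs: row-normalized 2N x 2N matrices, one per
    edge type; w0 and each ws[j]: d x d; wg: 2d x d.
    """
    u = matmul(h, w0)
    for a_j, w_j in zip(adjs, ws):
        message = matmul(matmul(a_j, h), w_j)
        u = [[x + y for x, y in zip(ru, rm)] for ru, rm in zip(u, message)]
    concat = [ru + rh for ru, rh in zip(u, h)]  # each row is [U_i, H_i]
    g = [[sigmoid(v) for v in row] for row in matmul(concat, wg)]
    return [[activation(x) * gate + old * (1.0 - gate)
             for x, gate, old in zip(ru, rg, rh)]
            for ru, rg, rh in zip(u, g, h)]

# With all-zero weights the gate is 0.5 and the update is 0, so each node
# keeps half of its old representation.
zeros_dd = [[0.0, 0.0], [0.0, 0.0]]
zeros_2dd = [[0.0, 0.0] for _ in range(4)]
identity = [[0.0, 1.0], [1.0, 0.0]]
h0 = [[2.0, 4.0], [6.0, 8.0]]
print(graph_layer(h0, [identity] * 3, zeros_dd, [zeros_dd] * 3, zeros_2dd))
```

The gate lets each node decide, per dimension, how much of the neighborhood update to accept and how much of its previous state to keep.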

3.3 Confidence Aggregation

Aggregation aims to combine the evidence representations into a final inference representation to verify the claim. Among the selected evidence sentences of a claim, some are groundtruth ones while others are not. To utilize evidence label information to enhance fact verification, following Tu et al. (2020), we adopt confidence aggregation.

Formally, let $H_{k} \in R^{2N \times d}$ denote the evidence representations at the last graph layer. The confidence score $\hat{y}_{j}$ of the $j$-th evidence is obtained from its representation $H_{k}^{j}$:

$$\hat{y}_{j} = \mathrm{Sigmoid}(\mathrm{MLP}(H_{k}^{j})) \tag{10}$$

The final inference representation for fact verification $R_{v}$ is the weighted sum of the evidence representations, where the weights are the corresponding confidence scores:

$$R_{v} = \sum_{j=1}^{N} \hat{y}_{j} H_{k}^{j} \tag{11}$$

and the fact verification result is obtained through a 3-way classification network:

$$\hat{v} = \mathrm{Softmax}(W R_{v} + b) \tag{12}$$

The total loss consists of the binary cross entropy loss of evidence confidence and the cross entropy loss of 3-way fact verification:

$$L_{II} = \mathrm{BCE}(y, \hat{y}) + \mathrm{CE}(v, \hat{v}) \tag{13}$$

Here y is the evidence sentence label, 1 for groundtruth evidence sentences and 0 for non-evidence sentences. Besides, v is the fact verification label.
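Confidence aggregation and the final classification can be sketched as follows. A single linear layer stands in for the MLP, the sum runs over every row that is passed in, and all names and shapes are assumptions for illustration.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def softmax(logits: list) -> list:
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def confidence_aggregate(h, conf_w, conf_b, cls_w, cls_b):
    """Return (confidence scores, inference representation, class probabilities).

    h: evidence representations (rows of length d); conf_w: length-d weights
    of the confidence scorer; cls_w: 3 x d classifier weights; cls_b: length 3.
    """
    conf = [sigmoid(sum(w * x for w, x in zip(conf_w, row)) + conf_b) for row in h]
    dim = len(h[0])
    r_v = [sum(c * row[k] for c, row in zip(conf, h)) for k in range(dim)]
    logits = [sum(w * x for w, x in zip(w_row, r_v)) + b
              for w_row, b in zip(cls_w, cls_b)]
    return conf, r_v, softmax(logits)

# Toy example with zero weights: every evidence gets confidence 0.5 and the
# three classes are equally likely.
conf, r_v, probs = confidence_aggregate(
    [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], 0.0,
    [[0.0, 0.0]] * 3, [0.0, 0.0, 0.0])
print(conf, r_v, probs)
```

Weighting by confidence lets evidences judged unlikely to be groundtruth contribute less to the representation used for the final label.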

4 Experiment

4.1 Dataset

We conduct our experiments on the FEVER 1.0 shared task (Thorne et al., 2018), which consists of 185,445 annotated claims with 5,416,537 Wikipedia documents from the June 2017 dump. We adopt the original dataset split of the task, which includes a training set, a development set and an online blind test set. The detailed information is shown in Table 1.

Split	SUPPORTS	REFUTES	NEI	Total
train	80035	29775	35639	145449
dev	6666	6666	6666	19998
test	6666	6666	6666	19998

Table 1: Statistics of the FEVER 1.0 shared task (number of claims per label).

Moreover, a claim may have several groups of evidences, and each group alone is enough to verify the claim. To further study the impact of multi-view contextual information, we conduct a refined split on the development set. Specifically, samples of the development set are divided into 5 parts, and the ratio of each part is displayed in Table 2:

$∙$ Single. All evidence groups contain exactly one sentence.

$∙$ Single+. At least one evidence group contains only one sentence, and at least one group contains multiple sentences.

$∙$ Multi. All evidence groups contain exactly two sentences.

$∙$ Multi+. All evidence groups contain multiple sentences, and at least one group contains more than two sentences.

$∙$ NEI. The sample is labeled as NEI with no evidence groups annotated.

Single	Single+	Multi	Multi+	NEI
56.87	3.78	5.03	0.99	33.33

Table 2: Ratio (%) of different parts on the development set.
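The refined split can be expressed as a short classification function. The representation of a sample (a label string and a list of evidence groups) is a hypothetical choice for this sketch.

```python
def classify_sample(label: str, evidence_groups: list) -> str:
    """Assign a development sample to Single, Single+, Multi, Multi+ or NEI.

    evidence_groups: list of groups, each a list of evidence sentences.
    """
    if label == "NOT ENOUGH INFO" or not evidence_groups:
        return "NEI"
    sizes = [len(group) for group in evidence_groups]
    if all(size == 1 for size in sizes):
        return "Single"
    if any(size == 1 for size in sizes):
        return "Single+"  # mixes single-sentence and multi-sentence groups
    if all(size == 2 for size in sizes):
        return "Multi"
    return "Multi+"       # all groups multi-sentence, at least one above two

print(classify_sample("SUPPORTS", [["a"], ["b", "c"]]))
```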

4.2 Experiment Setup

Our IMCI is implemented with PyTorch 1.2.0, and our experiments are conducted on a computation node with 4 NVIDIA Titan V GPUs. A pre-trained BERT encoder (Devlin et al., 2019) is employed for all experiments. We also try a RoBERTa encoder (Liu et al., 2019) for fact verification. For the claims, we set the max length to 64, and longer claims are truncated. For the encoders, we set the max input sequence length to 512, and longer sequences are split with a stride window size of 128. We use the BERTAdam optimizer with an initial learning rate of 1e-5 and a warmup ratio of 0.1. For sentence retrieval, we adopt a mini-batch size of 4 and 8 gradient accumulation steps. In each iteration, we train for 2 epochs and select the top 5 sentences as candidate evidences. For fact verification, we adopt a mini-batch size of 1 and 32 gradient accumulation steps. For the dual evidence graph, we stack 3 graph layers, where the hidden dimension is the same as that of the encoder. In each condition, we run 4 random restarts, train for 4 epochs, and choose the model parameters with the best performance on the development set.

4.3 Evaluation Metric

We adopt the FEVER score as the dominant evaluation metric, which is the official primary metric. The FEVER score requires that the fact verification label is correctly predicted and, for SUPPORTS and REFUTES samples, that at least one complete group of evidence sentences is found. The second most important metric is label accuracy. For document retrieval and sentence retrieval, we take precision, recall and F1 into account. Here, we attach more importance to recall according to the task setting.
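The FEVER score as described above can be sketched as follows. This is an illustration of the definition given in the text and not the official scorer; the field names of a sample are hypothetical.

```python
def fever_score(samples: list, max_evidence: int = 5) -> float:
    """Fraction of samples with a correct label and, for SUPPORTS and REFUTES,
    at least one complete evidence group among the predicted evidences.

    Each sample is a dict with "label", "predicted_label",
    "predicted_evidence" (list of (doc, sent_id)) and
    "evidence_groups" (list of lists of (doc, sent_id)).
    """
    correct = 0
    for sample in samples:
        if sample["predicted_label"] != sample["label"]:
            continue
        if sample["label"] == "NOT ENOUGH INFO":
            correct += 1  # no evidence requirement for NEI
            continue
        predicted = set(sample["predicted_evidence"][:max_evidence])
        if any(set(group) <= predicted for group in sample["evidence_groups"]):
            correct += 1
    return correct / len(samples)

# Hypothetical predictions: the second sample has the right label but an
# incomplete evidence group, so it does not count.
samples = [
    {"label": "SUPPORTS", "predicted_label": "SUPPORTS",
     "predicted_evidence": [("A", 0), ("B", 1)],
     "evidence_groups": [[("A", 0)]]},
    {"label": "REFUTES", "predicted_label": "REFUTES",
     "predicted_evidence": [("A", 0)],
     "evidence_groups": [[("A", 0), ("C", 2)]]},
    {"label": "NOT ENOUGH INFO", "predicted_label": "NOT ENOUGH INFO",
     "predicted_evidence": [], "evidence_groups": []},
]
print(fever_score(samples))
```

The example makes visible why evidence recall matters most: a correct label alone earns no credit unless a complete evidence group was retrieved.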

5 Results

5.1 Main Results

Main results on the blind test set are shown in Table 3. With multi-view contextual information joined, our IMCI framework obtains a FEVER score of 70.10% and label accuracy of 73.04% with the BERT$_{base}$ encoder. This FEVER score is comparable to, and slightly higher than, the best one among all baselines with a BERT$_{base}$ encoder. Moreover, our model with the RoBERTa$_{base}$ encoder obtains a FEVER score of 72.97% and label accuracy of 75.84%, a higher FEVER score than all listed baselines with large encoders. These results indicate that our framework conducts more accurate fact extraction and verification.

Model	LA	FEVER
UKP-Athene (2018)	65.46	61.58
QFE (2019)	69.30	61.80
NSMN (2019a)	68.16	64.23
GEAR-BERT$_{base}$ (2019)	71.60	67.10
SR-MRS-BERT$_{base}$ (2019b)	72.56	67.26
DeSePtion-BERT$_{base}$ (2020)	72.47	68.80
Transformer-XH-BERT$_{base}$ (2020)	72.39	69.07
KGAT-BERT$_{base}$ (2020)	72.81	69.40
CorefBERT-BERT$_{base}$ (2020)	72.88	69.82
HESM-ALBERT$_{base}$ (2020)	73.25	70.06
HESM-BERT$_{base}$ (2020)	73.18	70.07
ours IMCI-BERT$_{base}$	73.04	70.10
KGAT-BERT$_{large}$ (2020)	73.61	70.24
CorefBERT-BERT$_{large}$ (2020)	74.37	70.86
HESM-ALBERT$_{large}$ (2020)	74.64	71.48
KGAT-RoBERTa$_{large}$ (2020)	74.07	70.38
CorefBERT-RoBERTa$_{large}$ (2020)	75.96	72.30
ours IMCI-RoBERTa$_{base}$	75.84	72.97

Table 3: Overall performance (%) on the online blind test set. FEVER is the official primary score. LA denotes label accuracy.

5.2 Document Retrieval

Document retrieval results of different iterations on the development set are displayed in Table 4. By combining search engine power through Wikipedia search with statistical word frequency information through TF-IDF retrieval, coarse document retrieval obtains a recall of 92.77%, higher than all baselines. Guided by the dense semantic retrieval model and the hyperlinks of the top one evidences, refined document retrieval shows an even higher recall of 95.69%. Furthermore, refined document retrieval has a much higher precision of 29.90% and F1 of 45.56%, which are absolute increases of 21.22% and 29.69% over coarse retrieval, respectively. Therefore, our iterative fact extraction model brings a clear improvement in document retrieval.

Model	P	R	F1
UKP-Athene(2018)	-	90.32	-
NSMN(2019a)	51.04	89.23	64.94
GEAR-BERT$_{base}$ (2019)	-	89.99	-
SR-MRS-BERT$_{base}$ (2019b)	18.11	92.03	30.27
Coarse Retrieval	8.68	92.77	15.87
Refined Retrieval	29.90	95.69	45.56

Table 4: Document retrieval results on the development set. - denotes that the item is not available.
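As a quick consistency check, the F1 values reported for coarse and refined retrieval appear to match the harmonic mean of the reported precision and recall, and the quoted absolute increases follow directly from the table. The sketch below reproduces this arithmetic; treating F1 as the harmonic mean of the aggregate precision and recall is an assumption about how the table was computed.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

coarse_f1 = round(f1(8.68, 92.77), 2)    # coarse retrieval row
refined_f1 = round(f1(29.90, 95.69), 2)  # refined retrieval row
print(coarse_f1, refined_f1)             # 15.87 45.56
print(round(29.90 - 8.68, 2), round(refined_f1 - coarse_f1, 2))  # 21.22 29.69
```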

Figure 4: Document retrieval results on different parts of the development set. NEI samples are not taken into consideration since no evidence sentences are annotated for them.

Moreover, document retrieval results on different parts of the development set are displayed in Figure 4. Coarse document retrieval can handle Single and Single+ samples, where the recalls are as high as 97.56% and 98.02%, respectively. However, coarse document retrieval fails to handle multi-hop samples, where the recalls of Multi and Multi+ samples are both low, at 46.82% and 31.31%, respectively. Compared to coarse document retrieval, refined document retrieval shows comparable recall but much higher precision and F1 score on Single and Single+ samples. Furthermore, refined document retrieval makes great progress on multi-hop samples. For Multi and Multi+ samples, refined document retrieval achieves absolute recall increases of 32.60% and 33.34%, respectively. Besides, the precision and F1 score are also significantly improved. These results indicate the effectiveness of our iterative multi-view fact extraction model. However, although refined document retrieval achieves a significant improvement, the recall on multi-hop samples is still far lower than on single-hop ones.

5.3 Sentence Retrieval

Sentence retrieval results on the development set are summarized in Table 5. Our IMCI framework obtains the highest recall of 92.86%, and significantly outperforms all baselines.

Model	P	R	F1
UKP-Athene (2018)	-	86.24	-
NSMN (2019a)	36.49	86.79	51.38
GEAR-BERT$_{base}$ (2019)	24.08	86.72	37.69
SR-MRS-BERT$_{base}$ (2019b)	44.47	86.60	58.77
HESM-BERT$_{base}$ (2020)$^{#}$	-	90.50	-
Iteration 1	25.31	90.30	39.54
w.o. Alignment	24.86	90.16	38.97
Iteration 1 (Top 1)	86.11	78.08	81.90
Iteration 2	25.90	91.98	40.42
IMCI$^{#}$	25.74	92.86	40.30

Table 5: Sentence retrieval results on the development set. According to the original task setup, we keep the top 5 sentences as evidence for each claim. $^{#}$ means the model adopts iterative sentence retrieval. w.o. means without the item.

For sentence retrieval in iteration 1, upstream coarse document retrieval obtains an extremely low precision of 8.68% (Table 4). Thus, for each claim, our sentence retrieval model must on average select the top 5 sentences as candidate evidences from more than 250 sentences. Under this condition, with intra-document contextual information joined, the model obtains a high recall of 90.30%, comparable with the state-of-the-art iterative sentence retrieval model HESM (Subramanian and Lee, 2020). Besides, the top one candidate evidences show a high precision of 86.11%. This shows the importance of intra-document context, and is the basis of refined document retrieval. The high precision also justifies using the top one candidate evidences as inter-document context in sentence retrieval of iteration 2. With multi-view contextual information joined, our sentence retrieval model of iteration 2 obtains an even higher recall of 91.98%. Moreover, full pipeline reranking further raises the recall to 92.86%. These results show the power of multi-view contextual information for fact extraction.

Moreover, sentence retrieval results on different parts of the development set are displayed in detail in Table 6. For Single and Single+ samples, the recalls are high at 96.33% and 95.90%, respectively, while the precision and F1 score are low. However, the recall of Multi samples is low at 64.41%, while that of Multi+ samples is far lower at 26.26%. Therefore, taking these results and the document retrieval results in Figure 4 into consideration, fact extraction for multi-hop samples remains a difficult problem, although our model has made some progress.

Part	P	R	F1
Single	23.00	96.33	37.14
Single+	52.06	95.90	67.48
Multi	32.98	64.41	43.63
Multi+	45.25	26.26	33.23

Table 6: Sentence retrieval results on different parts of the development set. NEI samples are not taken into consideration since no evidence sentences are annotated for them.

5.4 Fact Verification

Fact verification results on the development set are shown in Figure 5. With multi-view contextual information joined, our IMCI framework obtains the highest label accuracy of 75.83% and the highest FEVER score of 73.21%. When intra-document encoding is removed, the label accuracy and FEVER score drop to 74.23% and 71.58%, respectively. This indicates the importance of intra-document contextual information for fact verification. In comparison, inter-document encoding and the dual evidence fusion graph have a relatively weak influence on performance. These two components mainly aim to handle multi-hop samples. However, multi-hop samples account for a small share of the data (about 6.02% in total, Table 2). Even worse, multi-hop samples already suffer a serious performance loss in the upstream fact extraction task (Table 5). Evidence confidence aggregation also makes some contribution, indicating the usefulness of evidence label information. Besides, we also study the influence of fact extraction. It seems that progress on fact extraction mainly contributes to the FEVER score while having a weak influence on label accuracy.

Figure 5: Fact verification results on the development set. These are averaged results on 4 random starts. w.o. means without the item.

Moreover, for our IMCI framework, the statistics of prediction errors on fact verification are shown in Figure 6. The framework distinguishes SUPPORTS and REFUTES examples well, since SUPPORTS (REFUTES) and REFUTES (SUPPORTS) errors account for only about 3.66% and 11.51% of the errors, respectively. This may indicate that the logical boundary between SUPPORTS and REFUTES is relatively clear. Besides, the framework rarely mistakes SUPPORTS examples for NEI examples. However, it may be difficult for the framework to distinguish REFUTES examples from NEI examples, as well as NEI examples from non-NEI examples, since REFUTES (NEI), NEI (SUPPORTS), and NEI (REFUTES) errors account for 26.59%, 26.75%, and 20.76%, respectively. This may be due to the unbalanced label distribution of the training set (Table 1). Besides, NEI may involve more complex logic and semantics than the other two labels.

Figure 6: Statistics of prediction errors on fact verification. (The label outside the brackets denotes the groundtruth, while the label in brackets denotes the wrong prediction.)

6 Related Work

$∙$ Fake News Detection Fake news detection has been attracting more attention. Ruffo et al. (2021) give a detailed survey of the development of this field. Textual-content based methods (Giachanou et al., 2019; Ghanem et al., 2020; Kaliyar et al., 2021) aim at understanding the linguistic and semantic information in the text to detect fake news. User-role based methods (Vo and Lee, 2019; Giachanou et al., 2020) pay more attention to the role of users in the propagation of fake news. Multi-modal approaches (Zlatkova et al., 2019; Fung et al., 2021) involve multi-modal information, i.e. text, table, knowledge base, image, speech and video, to evaluate the credibility of news. Besides, bots and trolls aim at influencing users for commercial, political or ideological purposes by spreading disinformation deliberately. Their detection (Stella et al., 2018; Sayyadiharikandeh et al., 2020) is also an important direction. Moreover, Sheng et al. (2022) recently propose news environment perception for fake news detection, which focuses on the background environment of fake news.

$∙$ Fact Extraction Fact extraction includes document retrieval and sentence retrieval. For document retrieval, Hanselowski et al. (2018) propose a constituency-based Wikipedia search model. Nie et al. (2019a) utilize a keyword matching model based on the fast string matching algorithm FlashText (Singh, 2017). Nie et al. (2019b) further adopt a combination of keyword matching and TF-IDF retrieval.

For sentence retrieval, Hanselowski et al. (2018), Nie et al. (2019a), and Zhou et al. (2019) each modify the Enhanced Sequential Inference Model (ESIM) (Chen et al., 2017). These models separately encode the claim and an evidence sentence, and adopt a cross-attention mechanism for information interaction between the claim and the evidence sentence. Nie et al. (2019b) and Liu et al. (2020) adopt BERT-based models. Subramanian and Lee (2020) propose iterative fact verification models to retrieve evidence sentences and combine evidence sets.

$∙$ Fact Verification For fact verification, Nie et al. (2019b) concatenate the claim and the evidence sentences into a sequence as input to a BERT encoder, and take the hidden state of the first special token [CLS] as the final inference representation. Zhou et al. (2019) adopt a graph neural network for evidence aggregation and reasoning. Zhong et al. (2020) introduce semantic role information to construct a refined graph, and adopt a graph convolutional network to handle the task. Liu et al. (2020) propose a fine-grained kernel-based graph attention network for information interaction between the claim and the evidences. Subramanian and Lee (2020) propose to combine evidence sets during fact extraction, and conduct fact verification on evidence sets. Si et al. (2021) introduce a topic model and a stance detection model, and study the influence of topic and stance information on fact verification.

7 Conclusion

In this paper, we propose to integrate multi-view contextual information for fact extraction and verification. Our experimental results show that our IMCI model obtains state-of-the-art performance on the task. Moreover, the ablation study results indicate that multi-view contextual information is essential for both fact extraction and fact verification. In the future, we will explore stronger models that utilize contextual information more efficiently.

References

N. D. Cao, W. Aziz, and I. Titov (2019) Question answering by reasoning across documents with graph convolutional networks. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 2306–2317.
Q. Chen, X. Zhu, Z. Ling, S. Wei, H. Jiang, and D. Inkpen (2017) Enhanced LSTM for natural language inference. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pp. 1657–1668.
J. Devlin, M. W. Chang, K. Lee, and K. Toutanova (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 4171–4186.
Y. Fung, C. Thomas, R. G. Reddy, S. Polisetty, H. Ji, S. Chang, K. R. McKeown, M. Bansal, and A. Sil (2021) InfoSurgeon: cross-media fine-grained information consistency checking for fake news detection. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, pp. 1683–1698.
B. Ghanem, P. Rosso, and F. M. R. Pardo (2020) An emotional analysis of false information in social media and news articles. ACM Transactions on Internet Technology 20 (2), pp. 19:1–19:18.
A. Giachanou, E. A. Rissola, B. Ghanem, F. Crestani, and P. Rosso (2020) The role of personality and linguistic patterns in discriminating between fake news spreaders and fact checkers. In Proceedings of the 25th International Conference on Applications of Natural Language to Information Systems, Lecture Notes in Computer Science, pp. 181–192.
A. Giachanou, P. Rosso, and F. Crestani (2019) Leveraging emotional signals for credibility detection. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 877–880.
A. Hanselowski, H. Zhang, Z. Li, D. Sorokin, B. Schiller, C. Schulz, and I. Gurevych (2018) UKP-Athene: multi-sentence textual entailment for claim verification. CoRR abs/1809.01479.
C. Hidey, T. Chakrabarty, T. Alhindi, S. Varia, K. Krstovski, M. T. Diab, and S. Muresan (2020) DeSePtion: dual sequence prediction and adversarial examples for improved fact-checking. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 8593–8606.
R. K. Kaliyar, A. Goswami, and P. Narang (2021) FakeBERT: fake news detection in social media with a BERT-based deep learning approach. Multimedia Tools and Applications 80 (8), pp. 11765–11788.
Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov (2019) RoBERTa: a robustly optimized BERT pretraining approach. CoRR abs/1907.11692.
Z. Liu, C. Xiong, M. Sun, and Z. Liu (2020) Fine-grained fact verification with kernel graph attention network. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 7342–7351.
G. D. S. Martino, S. Cresci, A. B. Cedeno, S. Yu, R. D. Pietro, and P. Nakov (2020) A survey on computational propaganda detection. In Proceedings of the 29th International Joint Conference on Artificial Intelligence, pp. 4826–4832.
Y. Nie, H. Chen, and M. Bansal (2019a) Combining fact extraction and verification with neural semantic matching networks. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence, pp. 6859–6866.
Y. Nie, S. Wang, and M. Bansal (2019b) Revealing the importance of semantic retrieval for machine reading at scale. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, pp. 2553–2566.
K. Nishida, K. Nishida, M. Nagata, A. Otsuka, I. Saito, H. Asano, and J. Tomita (2019) Answering while summarizing: multi-task learning for multi-hop QA with evidence extraction. In Proceedings of the 57th Conference of the Association for Computational Linguistics, pp. 2335–2345.
G. Ruffo, A. Semeraro, A. Giachanou, and P. Rosso (2021) Surveying the research on fake news in social media: a tale of networks and language. CoRR abs/2109.07909.
M. Sayyadiharikandeh, O. Varol, K. Yang, A. Flammini, and F. Menczer (2020) Detection of novel social bots by ensembles of specialized classifiers. In Proceedings of the 29th ACM International Conference on Information and Knowledge Management, pp. 2725–2732.
Q. Sheng, J. Cao, X. Zhang, R. Li, D. Wang, and Y. Zhu (2022) Zoom out and observe: news environment perception for fake news detection. CoRR abs/2203.10885.
J. Si, D. Zhou, T. Li, X. Shi, and Y. He (2021) Topic-aware evidence reasoning and stance-aware aggregation for fact verification. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics, pp. 1612–1622.
V. Singh (2017) Replace or retrieve keywords in documents at scale. CoRR abs/1711.00046.
M. Stella, E. Ferrara, and M. D. Domenico (2018) Bots increase exposure to negative and inflammatory content in online social systems. Proceedings of the National Academy of Sciences 115 (49), pp. 12435–12440.
S. Subramanian and K. Lee (2020) Hierarchical evidence set modeling for automated fact extraction and verification. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, pp. 7798–7809.
J. Thorne, A. Vlachos, C. Christodoulopoulos, and A. Mittal (2018) FEVER: a large-scale dataset for fact extraction and verification. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 809–819.
M. Tu, K. Huang, G. Wang, J. Huang, X. He, and B. Zhou (2020) Select, answer and explain: interpretable multi-hop reading comprehension over multiple documents. In Proceedings of the 34th AAAI Conference on Artificial Intelligence, pp. 9073–9080.
M. Tu, G. Wang, J. Huang, Y. Tang, X. He, and B. Zhou (2019) Multi-hop reading comprehension across multiple documents by reasoning over heterogeneous graphs. In Proceedings of the 57th Conference of the Association for Computational Linguistics, pp. 2704–2713.
N. Vo and K. Lee (2019) Learning from fact-checkers: analysis and generation of fact-checking language. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 335–344.
D. Ye, Y. Lin, J. Du, Z. Liu, P. Li, M. Sun, and Z. Liu (2020) Coreferential reasoning learning for language representation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, pp. 7170–7186.
C. Zhao, C. Xiong, C. Rosset, X. Song, P. N. Bennett, and S. Tiwary (2020) Transformer-XH: multi-evidence reasoning with extra hop attention. In Proceedings of the 8th International Conference on Learning Representations.
W. Zhong, J. Xu, D. Tang, Z. Xu, N. Duan, M. Zhou, J. Wang, and J. Yin (2020) Reasoning over semantic-level graph for fact checking. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 6170–6180.
J. Zhou, X. Han, C. Yang, Z. Liu, L. Wang, C. Li, and M. Sun (2019) GEAR: graph-based evidence aggregating and reasoning for fact verification. In Proceedings of the 57th Conference of the Association for Computational Linguistics, pp. 892–901.
D. Zlatkova, P. Nakov, and I. Koychev (2019) Fact-checking meets fauxtography: verifying claims about images. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, pp. 2099–2108.
