Linguistic Approaches to the Identification of Fake News
Researchers are testing the following four linguistic approaches:
In the Bag of Words approach, each word in a sentence, paragraph, or article is treated as a separate unit with equal importance relative to every other word. Frequencies of individual words and identified multiword phrases are counted and analyzed. Part of speech, location-based words, and counts of pronouns, conjunctions, and negative-emotion words are all considered. The analysis can reveal patterns of word use, and certain patterns can reliably indicate that information is untrue. For example, deceptive writers tend to use verbs and personal pronouns more often, while truthful writers tend to use more nouns, adjectives, and prepositions.20
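To make the idea concrete, the Python sketch below counts word frequencies and a few cue categories over a short text. The pronoun and negative-emotion word lists are illustrative placeholders, not a validated lexicon.

```python
# Minimal bag-of-words cue counting: word frequencies plus rates for a few
# cue categories (pronouns, negative-emotion words). The cue lists are
# illustrative stand-ins for the lexicons a real system would use.
from collections import Counter
import re

PRONOUNS = {"i", "me", "my", "we", "us", "our", "he", "she", "they", "them"}
NEGATIVE_EMOTION = {"hate", "angry", "sad", "terrible", "awful", "fear"}

def bag_of_words_features(text: str) -> dict:
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = sum(counts.values()) or 1
    return {
        "word_counts": counts,
        "pronoun_rate": sum(counts[w] for w in PRONOUNS) / total,
        "negative_emotion_rate": sum(counts[w] for w in NEGATIVE_EMOTION) / total,
    }

features = bag_of_words_features("I swear we never saw them take the awful photo.")
print(features["pronoun_rate"], features["negative_emotion_rate"])
```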
In the Deep Syntax approach, language structure is analyzed by applying a set of rules that rewrite sentences into descriptions of their syntactic structure; noun phrases and verb phrases, for example, are identified in the rewritten sentences. Comparing the frequency of each kind of syntactic structure against syntax patterns known to accompany lies can yield a probability rating for veracity.21
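One rough way to picture the scoring step is to compare the counts of rewrite rules (productions) found in a text against reference profiles for deceptive and truthful writing. In the sketch below, the production labels, frequencies, and the logistic comparison are all assumptions made for illustration, not a published model.

```python
# Toy scoring of rewrite-rule (production) counts against reference profiles
# for deceptive and truthful writing. The production names and frequencies
# are invented placeholders; a real system would derive them from a parsed
# corpus (e.g., with a probabilistic context-free grammar).
import math

DECEPTIVE_PROFILE = {"S -> NP VP": 0.30, "VP -> V NP": 0.40, "NP -> PRP": 0.30}
TRUTHFUL_PROFILE  = {"S -> NP VP": 0.35, "VP -> V NP": 0.30, "NP -> DT NN": 0.35}

def log_likelihood(production_counts: dict, profile: dict) -> float:
    # Smooth unseen productions so the logarithm stays finite.
    return sum(n * math.log(profile.get(rule, 1e-6))
               for rule, n in production_counts.items())

def deception_probability(production_counts: dict) -> float:
    d = log_likelihood(production_counts, DECEPTIVE_PROFILE)
    t = log_likelihood(production_counts, TRUTHFUL_PROFILE)
    return 1 / (1 + math.exp(t - d))  # logistic comparison of the two scores

counts = {"S -> NP VP": 2, "VP -> V NP": 3, "NP -> PRP": 4}
print(round(deception_probability(counts), 3))
```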
In the Semantic Analysis approach, what is written about a topic is compared with actual experience of that topic. Comparing written texts from a number of authors about the same event or experience and deriving a compatibility score from the comparison can reveal anomalies that indicate falsehood. If one writer says the room was painted blue while three others say it was painted green, there is a chance that the first writer is providing false information.22
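The compatibility scoring can be sketched as a comparison of attribute/value claims across accounts of the same event. The claims in the example below are hard-coded placeholders; a real system would extract them from the texts themselves.

```python
# Toy compatibility score: for each author, the fraction of their claims
# that agree with the majority view of the other authors.
from collections import Counter

accounts = {
    "writer_1": {"room_color": "blue", "attendees": "12"},
    "writer_2": {"room_color": "green", "attendees": "12"},
    "writer_3": {"room_color": "green", "attendees": "12"},
    "writer_4": {"room_color": "green"},
}

def compatibility(author: str) -> float:
    agreements, total = 0, 0
    for attribute, value in accounts[author].items():
        others = [a[attribute] for name, a in accounts.items()
                  if name != author and attribute in a]
        if not others:
            continue
        total += 1
        majority, _ = Counter(others).most_common(1)[0]
        agreements += (value == majority)
    return agreements / total if total else 1.0

for name in accounts:
    print(name, round(compatibility(name), 2))
```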
In the Rhetorical Structure Theory (RST) approach, an analytic framework identifies relationships between the linguistic elements of a text. Those relationships can then be plotted in a vector space, with Vector Space Modeling (VSM) showing how close to the truth they fall.23
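A minimal illustration of the VSM step: represent each text as a vector of relation counts and measure its cosine similarity to a centroid built from texts known to be truthful. The relation labels and numbers below are placeholders, and the extraction of rhetorical relations is not shown.

```python
# Toy vector-space comparison: a text's counts of rhetorical relations are
# compared (by cosine similarity) to an average profile of truthful texts.
import math

RELATIONS = ["evidence", "elaboration", "contrast", "attribution"]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

truthful_centroid = [4.0, 6.0, 1.0, 3.0]  # average relation counts (hypothetical)
new_text_vector   = [1.0, 2.0, 5.0, 0.0]  # relation counts for the new text

print("similarity to truthful centroid:",
      round(cosine(new_text_vector, truthful_centroid), 3))
```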
Networks
In approaches that use network information, human classifiers identify instances of words or phrases that are indicators of deception. Known instances of words used to deceive are compiled to create a database. Databases of known facts are also created from various trusted sources.24 Examples from a constructed database of deceptive words or verified facts can be compared to new writing. Emotion-laden content can also be measured, helping to separate feeling from facts. By linking these databases, existing knowledge networks can be compared to information offered in new text. Disagreements between established knowledge and new writing can point to deception.25
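The comparison against a knowledge network can be sketched as checking (subject, predicate, object) claims drawn from new text against a database of trusted facts. Both the fact base and the claims below are invented placeholders for illustration.

```python
# Toy fact check: claims are compared with trusted (subject, predicate,
# object) triples and flagged as supported, contradicted, or unknown.
TRUSTED_FACTS = {
    ("eiffel tower", "located_in", "paris"),
    ("water", "boils_at_celsius", "100"),
}

def check_claims(claims):
    indexed = {(s, p): o for s, p, o in TRUSTED_FACTS}
    for s, p, o in claims:
        known = indexed.get((s, p))
        if known is None:
            print(f"UNKNOWN: {s} {p} {o}")
        elif known == o:
            print(f"SUPPORTED: {s} {p} {o}")
        else:
            print(f"CONTRADICTED: {s} {p} {o} (trusted value: {known})")

check_claims([("eiffel tower", "located_in", "berlin"),
              ("water", "boils_at_celsius", "100")])
```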
Social Network Behavior using multiple reference points can help social media platform owners to identify fake news.26 Authorship can be authenticated using internet metadata.27 Location coordination for messages can be used to indicate personal knowledge of an event. The inclusion or exclusion of hyperlinks can also distinguish trustworthy from untrustworthy sources. (For example, TweetCred, available as a browser plugin, is software that assigns a credibility score to tweets in real time, based on characteristics of a tweet such as content, characteristics of the author, and external URLs.28) The presence or absence of images, the total number of images from multiple sources, and their relationships and relevance to the text of a message can also be compared with known norms and serve as an indicator of the truth of the message. Ironically, all of this information can be collected by bots.
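As a rough illustration of this kind of feature-based credibility scoring, the sketch below combines a few metadata signals into a single score. The features and weights are invented for the example and do not represent TweetCred's actual model.

```python
# Illustrative credibility score built from simple message metadata
# (verified author, external links, images, geotag, account age).
def credibility_score(message: dict) -> float:
    score = 0.0
    score += 0.3 if message.get("author_verified") else 0.0
    score += 0.2 if message.get("has_external_link") else 0.0
    score += 0.2 if message.get("has_image") else 0.0
    score += 0.2 if message.get("geotag_matches_event") else 0.0
    score += 0.1 if message.get("account_age_days", 0) > 365 else 0.0
    return score  # 0.0 (weakest credibility signals) .. 1.0 (strongest)

example = {"author_verified": True, "has_external_link": False,
           "has_image": True, "geotag_matches_event": True,
           "account_age_days": 90}
print(credibility_score(example))  # 0.7
```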