The internet is full of lies. That maxim has become an operating assumption for any remotely skeptical person interacting anywhere online, from Facebook and Twitter to phishing-plagued inboxes to spammy comment sections to online dating and disinformation-plagued media. Now one group of researchers has suggested the first hint of a solution: They claim to have built a prototype for an “online polygraph” that uses machine learning to detect deception from text alone. But what they’ve actually demonstrated, according to a few machine learning academics, is the inherent danger of overblown machine learning claims.
In last month’s issue of the journal Computers in Human Behavior, Florida State University and Stanford researchers proposed a system that uses automated algorithms to separate truths and lies, what they refer to as the first step toward “an online polygraph system—or a prototype detection system for computer-mediated deception when face-to-face interaction is not available.” They say that in a series of experiments, they were able to train a machine learning model to separate liars and truth-tellers by watching a one-on-one conversation between two people typing online, while using only the content and speed of their typing—and none of the other physical clues that polygraph machines claim can sort lies from truth.
“We used a statistical modeling and machine learning approach to parse out the cues of conversations, and based on those cues we made different analyses” of whether participants were lying, says Shuyuan Ho, a professor at FSU’s School of Information. “The results were amazingly promising, and that’s the foundation of the online polygraph.”
But when WIRED showed the study to a few academics and machine learning experts, they responded with deep skepticism. The study, they say, is not necessarily a sound basis for any kind of reliable truth-telling algorithm, and it makes potentially dangerous claims: A text-based “online polygraph” that’s faulty, they warn, could have far worse social and ethical implications if adopted than leaving those determinations up to human judgment.
“It’s an eye-catching result. But when we’re dealing with humans, we have to be extra careful, especially when the implications of whether someone’s lying could lead to conviction, censorship, the loss of a job,” says Jevin West, a professor at the Information School at the University of Washington and a noted critic of machine learning hype. “When people think the technology has these abilities, the implications are bigger than a study.”
Real or Spiel
The Stanford/FSU study had 40 participants repeatedly play a game that the researchers called “Real or Spiel” via Google Hangouts. In the game, pairs of those individuals, with their real identities hidden, would answer questions from the other in a kind of roleplaying game. A participant would be told at the start of each game whether they were a “sinner” who lied in response to every question, or a “saint” who always told the truth. The researchers then took the resulting textual data, including the exact timing of each response, and used a portion of it as the training data for a machine learning model designed to sort sinners from saints, while using the rest of their data to test that model.
They found that by tuning their machine learning model, they could identify deceivers with as much as 82.5 percent accuracy. Humans who looked at the data, by contrast, barely performed better than guessing, according to Ho. The algorithm could spot liars based on cues like faster answers than truth-tellers, a greater display of “negative emotions,” more signs of “anxiety” in their communications, a greater volume of words, and expressions of certainty like “always” and “never.” Truth-tellers, by contrast, used more words of causal explanation like “because,” as well as words of uncertainty, like “perhaps” and “guess.”
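To make the cues above concrete, here is a minimal Python sketch of how such surface features might be counted for a single chat message. It is not the researchers' method: the word lists contain only the example words quoted in this article, and the function name `cue_features`, the feature names, and the sample message and timing are hypothetical, chosen purely for illustration.

```python
import re

# Illustrative word lists only: these hold just the example words quoted in
# the article, not the lexicons the study actually used (an assumption).
CERTAINTY_WORDS = {"always", "never"}
CAUSAL_WORDS = {"because"}
HEDGE_WORDS = {"perhaps", "guess"}


def cue_features(message, response_seconds):
    """Count a few surface cues in one chat message (hypothetical helper)."""
    # Lowercase the text and split it into simple word tokens.
    words = re.findall(r"[a-z']+", message.lower())
    return {
        "word_count": len(words),
        "certainty": sum(w in CERTAINTY_WORDS for w in words),
        "causal": sum(w in CAUSAL_WORDS for w in words),
        "hedges": sum(w in HEDGE_WORDS for w in words),
        "response_seconds": response_seconds,
    }


# Made-up message and response time, for demonstration only.
features = cue_features("I never lie, because I always tell the truth.", 2.4)
print(features)
```

Counts like these would only be inputs to a model. As the critics quoted below argue, a classifier trained on them may simply learn how people behave when told to play a role.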
The algorithm’s resulting ability to outperform humans’ innate lie detector might seem like a remarkable result. But the study’s critics point out that it was achieved in a highly controlled, narrowly defined game, not the freewheeling world of practiced, motivated, less consistent, unpredictable liars in real-world scenarios. “This is a bad study,” says Cathy O’Neil, a data science consultant and author of the 2016 book Weapons of Math Destruction. “Telling people to lie in a study is a very different setup from having someone lie about something they’ve been lying about for months or years. Even if they can determine who’s lying in a study, that has no bearing on whether they’d be able to determine if someone was a more studied liar.”
She compares the setup to telling people to be left-handed for the purposes of a study: Their signatures would be very different from those of real-world left-handed people. “Most people can get pretty good at a lie if they care enough,” O’Neil says. “The point is, the lab [scenario] is utterly artificial.”
FSU professor Ho counters that the study is merely a first step toward text-based lie detection, and that further studies would be needed before it could be applied. She points to caveats in the paper that clearly acknowledge the narrow context of its experiments. But even the claim that this could create a path toward a reliable online polygraph makes experts anxious.
Frowning Criminals, Performing Liars
Two different critics pointed to an analogous study they say captures the fallacy of making broad claims about machine learning’s abilities based on a narrow test scenario. Chinese researchers in 2016 announced that they’d created a machine learning model that could detect criminality based merely on looking at someone’s face. But that study was based on photos of convicted criminals that had been used as identification by police, while the non-convict photos in the same study were more likely to have been chosen by the person themselves or by their employer. The simple difference: The convicts were much less likely to be smiling. “They’d created a smile detector,” the University of Washington’s West says.
In the lie detection study, there’s almost certainly a similarly artificial difference in the study’s groups that doesn’t apply in the real world, says Kate Crawford, the co-founder of the AI Now Institute at New York University. Just as the criminality study was actually detecting smiles, the lie detection study is likely carrying out “performance detection,” Crawford argues. “You’re looking at linguistic patterns of people playing a game, and that’s very different from the way people really speak in daily life,” she says.
In her interview with WIRED, FSU’s Ho did acknowledge the artifice of the study. But in the same conversation, she also suggested that it could serve as a prototype for an online lie detector system that could be used in applications like online dating platforms, as an element in an intelligence agency polygraph test, or even by banks that are trying to assess the honesty of a person communicating with an automated chatbot. “If a bank implements it, they can very quickly know more about the person they’re doing business with,” she said.
Crawford sees those suggestions as, at best, a continuation of an already problematic history of polygraph tests, which have been shown for years to have scientifically dubious results that are prone to both false positives and being gamed by trained test takers. Now, the FSU and Stanford researchers are reviving that faulty technology, but with even fewer data sources than a traditional polygraph test. “Sure, banks might want a really cheap way to make decision[s about who] to give loans to or not,” Crawford says. “But do we want to be invoking this kind of problematic history based on experiments that are themselves questionable in terms of their methodology?”
The researchers may argue that their test is only a reference point, or that they’re not recommending it be used for real-world decisions. But Crawford says they nonetheless don’t seem to be truly weighing how a faulty lie detector could be applied—and its consequences. “They’re not thinking through the full social implications,” she says. “Realistically they need a lot more attention to the negative externalities of a tool like this.”