(Content in this article is based on a press release from the University of
Melbourne and an article by Professor Stephan Winer and
Marie Truelove published on Pursuit, an open-access website which is published
by the University. Any reproduced content is under the Creative Commons
Attribution-No Derivatives 3.0 Australia (CC BY-ND 3.0 AU). The paper can
be accessed here.)
Researchers at the University of Melbourne are using machine
learning to distinguish false information from the truth on the social media
platform Twitter.
Increasing numbers of people rely on their social media
feeds for news. But algorithms on social media platforms prioritise engagement
over accuracy, and unscrupulous content creators can easily create and post misleading
or even outright false information, motivated by financial, political or other
reasons.
Professor Stephan Winer and Marie Truelove from the Melbourne
School of Engineering have developed a framework to assess whether a tweet is a
witness account from a first-hand experience or not, relying on the principle
that witness accounts are more trustworthy than hearsay.
The framework analyses details of a tweet to determine
whether it is a witness account. It starts with checking the georeferenced or
location information in the metadata of tweets, but only a small fraction of
users turn on that option.
To identify more sources of evidence, the researchers turned
to the content of the tweet itself, that is, the text and the pictures.
The text of a tweet may contain observations of the event, such as
smoke in the sky during a bushfire (the example the researchers give
in their article). This information, combined with
images (of the smoke rising above the house or a live shot from a football
match) and location information (geotags from the relevant town or suburb), can
provide evidence that the Twitter user is a credible witness.
The researchers also look for counter-evidence which might
show that a tweeter is not a witness on the ground. For example, if they
describe themselves as being somewhere else or post an image of the event on
a TV screen, the case for their being an eyewitness is weakened by the contradictory
evidence.
This evidence can be extracted automatically using
machine learning and is used to assign each tweet a credibility measure,
from low to high.
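
To make the idea of weighing evidence against counter-evidence concrete, here is a minimal sketch in Python. It is not the researchers' model: the evidence fields, the equal weighting and the thresholds for "low", "medium" and "high" are all assumptions chosen for illustration, and in the framework described the evidence itself would be extracted by machine learning rather than supplied by hand.

```python
from dataclasses import dataclass


@dataclass
class TweetEvidence:
    """Hypothetical evidence flags for one tweet (names are illustrative)."""
    geotag_in_event_area: bool = False    # location metadata matches the event
    text_observation: bool = False        # e.g. "I can see smoke in the sky"
    image_of_event: bool = False          # attached photo shows the event
    states_other_location: bool = False   # counter-evidence: says they are elsewhere
    image_of_tv_screen: bool = False      # counter-evidence: photo of a broadcast


def credibility(e: TweetEvidence) -> str:
    """Map evidence minus counter-evidence to a coarse credibility level.

    Assumption: every item counts equally and the cut-offs are arbitrary.
    """
    support = sum([e.geotag_in_event_area, e.text_observation, e.image_of_event])
    counter = sum([e.states_other_location, e.image_of_tv_screen])
    score = support - counter
    if score >= 2:
        return "high"
    if score == 1:
        return "medium"
    return "low"


# A geotagged tweet describing smoke, with a photo of it, rates "high";
# a description of smoke alongside a photo of a TV screen rates "low".
strong = TweetEvidence(geotag_in_event_area=True, text_observation=True,
                       image_of_event=True)
weak = TweetEvidence(text_observation=True, image_of_tv_screen=True)
print(credibility(strong), credibility(weak))
```

A simple additive score like this keeps each piece of evidence inspectable, which matters if a journalist needs to see why a tweet was rated as it was.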
But Ms. Truelove said there were still challenges to
be addressed. For instance, the Twitter user could have learnt about the event by
watching it on TV. Attached pictures may be unattributed copies from other
sources, or feature historic events at the same place.
Tweeters can post their excited anticipation of attending an
event later in the day but not go, or alternatively delay posting their witness
accounts until they are on the way home after the event has ended.
The posting behaviour of witnesses can also vary, depending
on the type of event. For example, tweets reporting that an event has not occurred
will only appear if the event was predicted in the first place, such as
when predicted flooding and power outages associated with a cyclone do not happen.
The researchers are overcoming these challenges primarily by
investigating different evidence sources within tweets. A series of processes
is applied to remove tweets that cannot support an inference that the tweeter is at the
event, such as retweets. Supervised machine learning classification models are
then used to extract evidence from the remaining
tweets that supports the inference that the tweeter is at the event.
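
The two stages described above, filtering and then supervised classification, can be sketched as follows. This is only an illustration under stated assumptions: retweets are recognised by the conventional "RT @user" prefix, the classifier is a small hand-written naive Bayes model standing in for whatever models the researchers actually use, and the four training tweets and their labels are invented.

```python
import math
import re
from collections import Counter, defaultdict


def is_retweet(text: str) -> bool:
    # Assumption: a retweet starts with the conventional "RT @user" prefix.
    return re.match(r"\s*RT\s+@\w+", text) is not None


def tokenize(text: str) -> list:
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text: str) -> str:
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            score = math.log(n / total)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in tokenize(text):
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training examples; a real system would need many labelled tweets.
train = [
    ("I can see smoke rising over our street", "witness"),
    ("smoke everywhere here, I can smell it from my yard", "witness"),
    ("watching the bushfire coverage on tv", "not_witness"),
    ("news says the bushfire is spreading, stay safe everyone", "not_witness"),
]
model = NaiveBayes().fit([t for t, _ in train], [l for _, l in train])

incoming = [
    "RT @someone: I can see smoke rising over our street",
    "I can see smoke from my house",
    "watching the fire on tv news",
]
candidates = [t for t in incoming if not is_retweet(t)]  # stage 1: filter
for tweet in candidates:                                  # stage 2: classify
    print(model.predict(tweet), "-", tweet)
```

Filtering first keeps the classifier from being trained or evaluated on tweets that, by construction, cannot be first-hand accounts.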
The framework is in the early phases of development, but it could
be useful to journalists and news organisations around the world for checking
the veracity of social media sources.