Data Science Versus Manipulative News
“Even the best AI for spotting fake news is still terrible”, wrote MIT Technology Review at the end of 2018. That was the conclusion of what is probably the most extensive study on disinformation, conducted by MIT, the Qatar Computing Research Institute (QCRI), part of Hamad bin Khalifa University, and Sofia University “St. Kliment Ohridski”. Despite all the technological advances, recognizing disinformation (not preventing it) still depends too heavily on human fact-checking. Algorithms can already assess the reliability and authenticity of a source and, to a certain extent, the degree of propaganda in a particular article, but they are still at an early stage.
A hands-on approach
Natural language processing, a branch of artificial intelligence, currently seems to be the only suitable approach to the complex problem of disinformation. Most fact-checking is still done manually by journalists and by NGOs such as the UK's Full Fact, which is far from scalable. “Using data science, we can automate the identification of propaganda or manipulative news faster and more easily”, Sergi Sergiev, founder of the Data Science Society, told Trending Topics. This is also the goal of the upcoming Hack the News datathon, organized by the Data Science Society.
From 21 to 27 January, more than 200 data scientists and developers from around the world will take part in a global data hackathon called Hack the News. Specifically, the initiative aims to train algorithms capable of identifying misleading statements and conclusions in the news stream on three levels: the article level, the sentence level and the type of propaganda used. During the datathon, participants will work with a data set of 500 articles and 10,000 sentences.
“Our goal is to set the stage for the development of such applications, and I would be more than happy to see startups come out of this event, or people who try to develop and commercialize such products”, explained Sergiev, for whom this is the sixth datathon as a co-organizer.
20 of the world’s top 100 scientists
This Hack the News datathon is a community-driven initiative of the Data Science Society. Its three initiators are Preslav Nakov, a senior researcher at QCRI; Laura Tolosi-Halatcheva, a data scientist with more than 15 years of international experience; and Viktor Senderov, a guest researcher at the Naturhistoriska riksmuseet (Swedish Museum of Natural History). “These are names everyone in the field of propaganda detection knows. I’m happy to say that in this datathon we will have 20 of the top 100 experts in the world participating”, says Sergiev. Researchers from MIT, the University of Sheffield, the University of British Columbia and the Technical University of Darmstadt will also take part.
Developing software that tackles the complex phenomenon of fake and manipulative news seems to be something everyone is talking about but few are actually doing. In a conversation with Preslav Nakov several months ago, we learned that automation so far works best at the level of the media outlet: the reliability and political ideology of a particular outlet can be detected automatically through a platform QCRI has launched. His institute has released several tools in this domain. One of them is ClaimRank, which automatically identifies which claims in a given document are most check-worthy and should be prioritized for fact-checking (which is still a human task).
Surprising as it may be, R&D in this field still has a long way to go. Even though there is ample research suggesting that particular phrases and wording patterns are symptomatic of misleading news, algorithms cannot yet detect them reliably. And even if progress accelerates, it will probably benefit mostly the English-speaking and English-reporting world.
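To illustrate what looking for “phrases and wording patterns” can mean at the sentence level, here is a minimal Python sketch. The cue lists and the names `CUE_PATTERNS`, `flag_sentence` and `flag_article` are hypothetical and chosen only for illustration; they are not a validated propaganda lexicon and not the method used by QCRI, the study cited above or the Hack the News datathon.

```python
from __future__ import annotations

import re

# Hypothetical cue patterns for illustration only; NOT a validated propaganda
# lexicon and not the approach used by any of the projects in this article.
CUE_PATTERNS = {
    "loaded_language": re.compile(r"\b(disgraceful|treacherous|catastrophic)\b", re.IGNORECASE),
    "appeal_to_popularity": re.compile(r"\b(everyone knows|nobody believes)\b", re.IGNORECASE),
    "exaggeration": re.compile(r"\b(always|never|totally)\b", re.IGNORECASE),
}


def flag_sentence(sentence: str) -> list[str]:
    """Return the names of all cue patterns that match somewhere in the sentence."""
    return [name for name, pattern in CUE_PATTERNS.items() if pattern.search(sentence)]


def flag_article(sentences: list[str]) -> dict[int, list[str]]:
    """Map each sentence index to its matched cues, keeping only sentences with a match."""
    flags = {}
    for i, sentence in enumerate(sentences):
        matched = flag_sentence(sentence)
        if matched:
            flags[i] = matched
    return flags


if __name__ == "__main__":
    article = [
        "Everyone knows the minister's plan is disgraceful.",
        "The parliament will vote on the budget on Tuesday.",
    ]
    print(flag_article(article))  # {0: ['loaded_language', 'appeal_to_popularity']}
```

Hand-written patterns like these are brittle: they miss paraphrases and flag innocent uses of the same words, which is one reason such cues are only a starting point for trained models.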
The Bulgaria connection
It is interesting, but not surprising, that the initiator of Hack the News is a Bulgarian organization: the Data Science Society, founded in 2014. On the one hand, Bulgaria has accumulated expertise in the domain, with Preslav Nakov and companies such as the established, award-winning Ontotext, the scaling-up Sensika, the early-stage HyperNews and Google grantee Damocles Analytics.
On the other hand, propaganda and misleading news are a major issue in Bulgaria. The country keeps slipping in the World Press Freedom Index and ranked 111th last year, the lowest of any EU member state.
Yet tech ventures that aim to combat disinformation, even those developed in Bulgaria, do not target the local market and do not support the Bulgarian language. The reasons are both economic (there is no business case) and technical, as training AI algorithms requires massive data sets and scientific dedication.
Trending Topics will cover the results of the Hack the News hackathon and will keep following the topic of tech against disinformation. If you have insights, stories or opinions to share, drop us a line at email@example.com.