Artificial Intelligence in Journalism
On the Hunt for Hidden Patterns
Artificial Intelligence (AI) already plays a key role in journalism: algorithms find stories in large data sets and automatically generate thousands of texts. Very soon, AI could become a critical infrastructure of media production.
By Rebecca Ciesielski
‘I will work tirelessly to inform you’ were some of the first words uttered by the news anchor China unveiled at its World Internet Conference in 2018. “Tirelessly” could be taken literally in this case: the speaker was not a human, but a collection of video frames and audio files generated by Artificial Intelligence.
What sounds like science fiction is not being tested only in China: media outlets and companies in Europe are also working on automated presenters. Together with the London-based start-up Synthesia, the news agency Reuters has developed the prototype of an AI sports anchor that provides match summaries without a human having to write scripts or present them in front of the camera.
So, will journalists soon be replaced by AI algorithms? Hardly. ‘The best and worst thing that you could think of is an AI system writing articles,’ said Abishek Prasad of India's HT Media Group in early December 2021 at a panel titled The Future of AI in Journalism at the JournalismAI Festival organised by the London School of Economics and Political Science (LSE).
AI can help discover newsworthy stories

Generally, “data journalism” is thought of as the analysis of larger or smaller spreadsheets. One of the most time-consuming tasks of data journalists is to structure data sets in such a way that story-worthy correlations can be found in them. AI complements this workflow, recognises patterns in data sets, and can produce texts directly from the data.
Major news agencies such as Thomson Reuters, Bloomberg and AP have algorithms combing through huge datasets looking for anything that seems newsworthy: conspicuously changing stock prices, other market movements, or even salient social media comments. Bloomberg uses a whole assortment of AI tools to automatically create news stories about financial topics. So-called Named-Entity Recognition (NER) algorithms recognise people, companies and organisations in texts, and automatic sentiment analysis provides an assessment of how positive or negative a piece of news might be for a company.
Such AI tools can help journalists stay on top of the news and spot important events at an early stage.
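To illustrate the idea behind NER and sentiment scoring without a trained model, here is a deliberately tiny, rule-based sketch in Python. The entity list, keyword lexicon and headline are invented for the example; production systems use trained statistical models instead:

```python
import re

# Toy stand-ins for trained models: a fixed entity list and keyword lexicons.
# All names and words here are invented for illustration.
KNOWN_ENTITIES = {"Bloomberg": "ORG", "Thomson Reuters": "ORG", "Apple": "ORG"}
POSITIVE = {"record", "growth", "beat"}
NEGATIVE = {"loss", "lawsuit", "recall"}

def tag_entities(text: str) -> list:
    """Return (entity, label) pairs found in the text, a stand-in for NER."""
    return [(name, label) for name, label in KNOWN_ENTITIES.items()
            if re.search(r"\b" + re.escape(name) + r"\b", text)]

def sentiment_score(text: str) -> int:
    """Crude polarity: positive keyword hits minus negative keyword hits."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return len(words & POSITIVE) - len(words & NEGATIVE)

headline = "Apple faces lawsuit over battery recall"
print(tag_entities(headline))    # [('Apple', 'ORG')]
print(sentiment_score(headline)) # -2
```

A real newsroom pipeline would swap both functions for trained models, but the interface (text in, entities and a polarity score out) stays the same.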
AI shows surveillance flights, solar cells and non-authentic tweets

AI pattern recognition can also help investigative journalists analyse large and complex data sets based on hypotheses: BuzzFeed News used AI to discover the flight routines of secret US surveillance planes, and the Argentine newspaper La Nación counted solar farms in satellite images. And The Atlantic programmed a Twitter bot that used machine learning and natural language processing to determine which tweets were written by Donald Trump himself and which by his staff.
AI is particularly useful for research with large image datasets: reporters from Bayerischer Rundfunk (BR), Norddeutscher Rundfunk (NDR) and Westdeutscher Rundfunk (WDR) used image recognition AI to find hate symbols such as SS runes and Hitler images on Facebook. The Panama Papers research team similarly used Optical Character Recognition (OCR) to convert scanned IDs and contracts into machine-readable text data.
Sports, the stock market, crime – algorithms are already writing thousands of texts every day

AI algorithms are changing not only the analysis of data but also the creation of stories from it. ‘We have founded the world's only automated news agency,’ claims the British company RADAR on its website. Anyone who consumes local news in the UK is likely to have already read articles created automatically by the company's AI, as it serves hundreds of news websites, newspapers and radio stations across the country every day. According to the company's own information, six employees produce about 3,000 AI-supported articles every week – a good 70 texts per person per day.
To guarantee this enormous output, RADAR relies on data journalism that can be broken down regionally. This way, a few stories become several hundred in the blink of an eye.
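The regional fan-out can be sketched in a few lines of Python. The crime figures, column names and template wording below are invented for illustration and do not come from RADAR:

```python
import csv
import io

# Invented national crime figures, broken down by region.
DATA = """region,burglaries_2020,burglaries_2021
Kent,1200,980
Sussex,860,910
Devon,430,400
"""

TEMPLATE = ("Burglaries in {region} {verb} by {pct:.0f}% "
            "from {a} in 2020 to {b} in 2021.")

def localised_stories(csv_text: str):
    """Yield one templated story per region: one analysis, many local texts."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        a, b = int(row["burglaries_2020"]), int(row["burglaries_2021"])
        verb = "fell" if b < a else "rose"
        pct = abs(b - a) / a * 100
        yield TEMPLATE.format(region=row["region"], verb=verb, pct=pct, a=a, b=b)

for story in localised_stories(DATA):
    print(story)
```

With a single national dataset and one template, every additional region in the data becomes an additional local article at no extra writing cost.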
AI saves “humans-in-the-loop” time and stress

This does not just work for local news: we at the AI + Automation Lab of Bayerischer Rundfunk (BR), together with BR's sports editorial team and the Technical University of Munich, have developed a system that generates news reports for preliminary-round basketball games. The application automatically generates texts from results data, game schedules, league positions, and the players' shooting statistics, which can be checked and, if necessary, edited by the sports editors before publication. This saves the editors time and still ensures – and in fact requires – editorial fact-checking by humans.
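As a rough illustration of how such a generator turns structured results data into a draft text, here is a minimal Python sketch. The field names, teams and wording are hypothetical, not BR's actual schema, and in practice the draft would go to a sports editor for review rather than straight to publication:

```python
def game_report(game: dict) -> str:
    """Draft a short basketball report from structured results data."""
    home_win = game["home_pts"] > game["away_pts"]
    winner = game["home"] if home_win else game["away"]
    loser = game["away"] if home_win else game["home"]
    hi = max(game["home_pts"], game["away_pts"])
    lo = min(game["home_pts"], game["away_pts"])
    # Vary the wording based on the margin, so the texts read less mechanical.
    verb = "narrowly beat" if hi - lo <= 5 else "beat"
    return (f"{winner} {verb} {loser} {hi}:{lo}. "
            f"Top scorer: {game['top_scorer']} with {game['top_pts']} points.")

# Invented sample record; a real feed would supply many games per match day.
sample = {"home": "Bamberg", "away": "Ulm", "home_pts": 88, "away_pts": 79,
          "top_scorer": "J. Mustermann", "top_pts": 24}
print(game_report(sample))
```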
Text automation of this kind cannot replace humans, as it only works where there is a regular supply of predictable, clearly structured data: in business, sports, or crime reporting, for instance.
Data can also pose a risk

AI algorithms support journalists in their work and create opportunities that would not exist without them, but the systems are not without risk. AI does what all algorithms do: classify, sort, and score. In doing so, the systems are not always right. In research, this is often not a problem, as AI is used especially where a few incorrectly classified images, tweets or documents are tolerable and can be detected when checked individually.
The danger lurks where data about people is being assessed. The paywall of the Wall Street Journal (WSJ) uses machine-learning algorithms that record variables such as frequency of visits, devices used, and content consumed, score them, and then calculate a subscription probability for each reader. This probability score influences the number of free texts that each reader is allowed to view. Although the risk of discrimination in an example like this should be manageable in most cases, many media outlets take a particularly critical look at their own AI systems: Bayerischer Rundfunk has set itself ethical guidelines against which every AI system is to be assessed. The assessment criteria include the responsible use of resources, the frugal collection of data, its secure storage, and editorial control over systems: ‘Even with automated journalism and data journalism, the journalistic responsibility lies with the editorial boards’. Editors should check automatically generated media content and also critically assess the plausibility of data structures and data sources.
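Propensity scoring of this kind is often implemented as a logistic model over reader features. The sketch below is only a guess at the general shape, with invented weights and a made-up quota policy; the WSJ's actual model and thresholds are not public:

```python
import math

# Invented weights over reader behaviour; a real paywall learns these
# from historical subscription data.
WEIGHTS = {"visits_per_week": 0.4, "devices": 0.3, "finance_articles": 0.2}
BIAS = -3.0

def subscription_probability(reader: dict) -> float:
    """Logistic score: how likely this reader is to subscribe."""
    z = BIAS + sum(w * reader.get(k, 0) for k, w in WEIGHTS.items())
    return 1 / (1 + math.exp(-z))

def free_article_quota(p: float) -> int:
    """One possible policy: likely subscribers hit the paywall sooner."""
    if p > 0.7:
        return 1
    if p > 0.3:
        return 3
    return 5

reader = {"visits_per_week": 6, "devices": 2, "finance_articles": 5}
p = subscription_probability(reader)
print(round(p, 3), free_article_quota(p))  # 0.731 1
```

Because such a score directly changes what different readers get to see, the weights and thresholds are exactly the kind of thing the ethical guidelines mentioned above would put under editorial scrutiny.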
Some AI models form an environmental burden

The Norwegian Schibsted Group has also developed a framework for risk assessment of its own AI systems: FAST, which stands for Fairness, Accountability, Sustainability and Transparency, where “Sustainability” means social as well as ecological sustainability. New AI models are therefore also assessed for their carbon footprint. There are good reasons for this: training some AI models consumes as much energy as several cars over their entire lifecycle.
AI will have a huge impact on journalism, Agnes Stenbom, Data & AI Specialist at Schibsted, said at the JournalismAI Festival in December: ‘I believe that we will talk about it as infrastructure.’ The situation was similar after the invention of electricity: at first, it triggered diffuse anxieties in many people. ‘But today, we walk into a room and press a button, and if that button doesn't work, we are frustrated.’ A similar kind of integration of AI into the everyday work of journalists is likely.