An algorithm designed to expand Wikipedia in all languages


A Swiss researcher has created a system that scans Wikipedia for important articles that are missing in other languages. This project could help expand the online encyclopedia’s coverage in minority languages.

Robert West, researcher at EPFL

With 40 million articles in 293 languages, Wikipedia is the largest encyclopedia ever made. The 5.4 million pages in English are particularly varied, covering 60 times more topics than the Encyclopaedia Britannica. But not all languages enjoy such depth of coverage. “Information that some language groups need has not been translated,” says Robert West, a researcher at EPFL’s Data Science Lab.

“For example, global warming is a crucial issue in Madagascar, yet there are no articles on this topic in Malagasy.”

Closer to home, only 3,400 articles are available in Romansh versus 1.8 million in French and over two million in German. And it’s hard for Wikipedia editors to know which of the millions of pages they should translate in order to really make a difference. That’s where Robert West comes in: he used machine learning to identify and rank the most important articles missing in each language. But determining how relevant a given topic is for a culture is more complex than it appears.
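Detecting the gaps in the first place can be illustrated with Wikipedia’s own interlanguage links. The sketch below is a minimal illustration, not West’s actual pipeline: it asks the public MediaWiki API which English articles have no counterpart in a target language such as Malagasy (language code "mg").

```python
import requests

API = "https://en.wikipedia.org/w/api.php"

def missing_in(titles, target_lang="mg"):
    """Return the English Wikipedia titles that have no interlanguage
    link to target_lang (e.g. 'mg' for Malagasy)."""
    params = {
        "action": "query",
        "prop": "langlinks",
        "titles": "|".join(titles),
        "lllimit": "max",
        "format": "json",
        "formatversion": "2",
    }
    pages = requests.get(API, params=params).json()["query"]["pages"]
    return [
        page["title"]
        for page in pages
        if target_lang not in {ll["lang"] for ll in page.get("langlinks", [])}
    ]

print(missing_in(["Climate change", "Taylor Swift", "Pokémon"]))
```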

Objective machines

To help the machines assess how important an article would be in Romansh, for example, it was necessary to calculate how many views a missing article should theoretically generate. “Taylor Swift and Pokémon may be popular, but do they really count as important?” says West. “To avoid ethnocentric biases, we predicted page statistics by taking all languages into account, and the machine learning algorithms then figured out the weighting to apply to each language. For example, Japanese is more important than English when it comes to predicting the impact of a page in Chinese.”
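The idea can be sketched as a regression problem: for articles that exist in many languages, learn how strongly view counts in each source language predict view counts in the target language, then apply those weights to articles the target language is missing. Everything below (the language set, the synthetic view counts, the ridge regression) is an assumption for illustration, not the production model.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
langs = ["en", "ja", "fr", "de"]  # source languages used as features

# Synthetic training data: view counts for 1,000 articles that exist in
# every language. The "true" target (say, Chinese) leans on Japanese views.
X = rng.lognormal(mean=5.0, sigma=2.0, size=(1000, len(langs)))
y = 0.2 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(scale=10.0, size=1000)

# Fit in log space so a few blockbuster pages don't dominate the fit.
model = Ridge().fit(np.log1p(X), np.log1p(np.clip(y, 0.0, None)))

# The learned coefficients play the role of the per-language weights West
# describes: here 'ja' ends up weighted more heavily than 'en'.
for lang, weight in zip(langs, model.coef_):
    print(f"{lang}: {weight:.2f}")

# For an article missing in the target language, its view counts in the
# languages where it does exist then give an estimate of the demand for it.
```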

Once the algorithms have come up with the most neutral ranking possible, the lists of missing topics are displayed on a new platform called Wikipedia GapFinder. Volunteer editors are given recommended topics based on their languages and interests. With help from a translation tool provided on the platform, the humans then finish the job – artificial intelligence is not yet ready to take over the whole process. “Human intervention is still required to meet Wikipedia’s publication standards, since machine translation is not yet up to scratch,” adds West.

The platform, which was developed together with Stanford University and the Wikimedia Foundation, is open to the public and can publish 200 new articles per week. That’s not much compared to the 7,000 texts published daily on Wikipedia, but the focus is on quality, not quantity. West is working on a second project that uses data mining to find the key paragraphs in an article. This process, once mastered, will make it even easier to expand the online encyclopedia’s content in local languages.
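The article does not say how that second project ranks paragraphs; one plausible sketch is to score each paragraph by how central it is to the article as a whole, for instance via TF-IDF similarity:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_paragraphs(paragraphs):
    """Score paragraphs by centrality: their average TF-IDF cosine
    similarity to every paragraph in the article (illustrative only)."""
    tfidf = TfidfVectorizer().fit_transform(paragraphs)
    centrality = cosine_similarity(tfidf).mean(axis=1)
    order = centrality.argsort()[::-1]  # most central first
    return [(paragraphs[i], float(centrality[i])) for i in order]

article = [
    "Climate change is the long-term shift in global temperatures.",
    "Rising temperatures threaten agriculture in Madagascar.",
    "A 2016 concert tour visited twelve countries.",
]
for text, score in rank_paragraphs(article):
    print(f"{score:.2f}  {text}")
```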

Article by Sarah Aubort, EPFL Mediacom 

