In a galaxy far far away...
Using a new computer program, EPFL researchers offer unusual insight into the universe of Star Wars, which includes more than 20,000 characters spread among 640 communities over a period of 36,000 years.
Do you think you know all there is to know about Star Wars? You may change your mind after reading this article. Using a new computer program, EPFL researchers revealed some interesting statistics on the famous saga. Drawing on the principles of graph theory, which harnesses computing power and mathematical calculations, they analyzed hundreds of web pages devoted to the legendary film series created in the 1970s by producer George Lucas. Apart from highlighting Star Wars geeks’ prodigious output, their study could prove interesting for a large number of scientific fields when it comes to extracting and analyzing data.
The Star Wars universe is huge. In addition to seven full-length films, the story is further developed in ways the general public may not follow, such as books and video games, which have gradually added episodes and expanded the saga. The EPFL study, which was conducted at the Signal Processing Laboratory 2 (LTS2) under the direction of Professor Vandergheynst, gives some idea of the scope of the saga.
80% human beings
“Fans will be surprised to learn, for example, that we came up with over 20,000 characters,” said Kirell Benzi, a PhD student and the project lead. Among them, 7,500 play an important role. There are also 1,367 Jedi and 724 Sith. All the characters are spread among 640 different communities on 294 planets. And an analysis of the 10 largest communities reveals an aberration: nearly 80 per cent of the galaxy’s population is human.
In addition to counting the number of characters and identifying their tribe of origin, the program also situates them in the timeline of the story. A very long timeline, since the saga covers 36,000 years, broken down into six main periods: before the Republic, the Old Republic, the Empire, the Rebellion, the New Republic, and the Jedi Order.
“To put some order into this massive forest of data, we based our approach on network analysis. In other words, all the connections that one character has with all the others,” said Xavier Bresson, an LTS2 researcher.
“Using these cross-references, we are able to accurately determine the time period of the character almost without fail, when this information is not directly provided in the books or movies.”
Figure 1: Part of the Star Wars character graph colored by era. Black nodes represent missing values. Red nodes: the Rise of the Empire era (episodes 1, 2, 3). Blue nodes: the Rebellion era (episodes 4, 5, 6). Green nodes: both eras.
Figure 2: Result of the label propagation algorithm. Black nodes have been replaced by the best compromise using their neighbors.
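The idea of filling in a character's missing era from its neighbors can be illustrated with a small sketch. This is not the researchers' program: it is a minimal majority-vote version of label propagation, written under the assumption that each unlabeled character simply takes the most common era among its already labeled connections. The toy graph, the `unknown_*` node names and the function `propagate_labels` are invented for illustration.

```python
from collections import Counter

def propagate_labels(edges, known, max_rounds=10):
    """Fill in missing labels by majority vote among labeled neighbors.

    Simplified illustration only; not the method used in the EPFL study.
    """
    # Build an undirected adjacency map from the edge list.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    labels = dict(known)  # labels that are already known never change
    for _ in range(max_rounds):
        updates = {}
        for node in sorted(neighbors):
            if node in labels:
                continue
            # Count the eras of the neighbors that already have a label.
            votes = Counter(labels[n] for n in neighbors[node] if n in labels)
            if votes:
                # Most frequent era wins; ties are broken alphabetically.
                updates[node] = min(votes, key=lambda era: (-votes[era], era))
        if not updates:
            break  # nothing left to fill in
        labels.update(updates)
    return labels

# Hypothetical toy graph: two clusters plus three characters with no era.
edges = [
    ("Padme", "Mace"), ("Padme", "unknown_1"), ("Mace", "unknown_1"),
    ("Luke", "Leia"), ("Leia", "Han"), ("Luke", "Han"),
    ("Han", "unknown_2"), ("Leia", "unknown_2"),
    ("unknown_2", "unknown_3"),
]
known = {
    "Padme": "Rise of the Empire", "Mace": "Rise of the Empire",
    "Luke": "Rebellion", "Leia": "Rebellion", "Han": "Rebellion",
}

result = propagate_labels(edges, known)
for name in ("unknown_1", "unknown_2", "unknown_3"):
    print(name, "->", result[name])
```

In this toy example, `unknown_3` has no labeled neighbor at first and only receives an era in the second round, once `unknown_2` has been filled in. That is the sense in which labels "propagate" through the graph.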
Mapping out connections
The researchers want to use this study to demonstrate the program’s ability to extract and analyze digital data. “The program maps out connections in the mass of unorganized data available on the net,” said Benzi.
Given a huge amount of information, the algorithms developed by the LTS2 researchers offer a service that cannot be matched by human beings. In addition to extracting data according to extremely precise criteria, the algorithms can also create links among data points, sort them, quantify them, interpret them and find missing information. All this in very little time. The results are then presented in the form of interactive charts that are easy to read and understand.
In the long run, this tool could find applications in many fields. “Once enough documents and archives have been digitized,” said Bresson, “this method could be useful in filling knowledge gaps that remain in historical and sociological research and in numerous scientific fields as well.”
Kirell Benzi summarizes the results of the study on his blog. In the future, he intends to extend the analysis to the transmission of the “force” from Jedi and Sith masters to their pupils.
Article by Sarah Perrin, EPFL Mediacom