At the moment, Covid-19 is constantly changing lives everywhere, often for the worse but in some cases as a catalyst for innovation. And with every day that passes, we are collecting more and more knowledge about this dangerous virus, its behavior and its impact on humans.
From day to day, this knowledge enables health authorities around the world to track and announce statistics and figures about the current conditions at hospitals as well as the total number of people infected by the virus.
But these figures and statistics are retrospective and do not necessarily predict how the situation will look in the future. Yet they can be very valuable data tools, like a crystal ball we can consult for future Covid-19 predictions.
And that is exactly what Project Coordinator Dolores Romero Morales and the EU project NeEDS (Network of European Data Scientists) are attempting by using coronavirus data to build a ‘predictor’.
The complex data that surround us
NeEDS is funded by the EU. The network behind the project consists of six academic and eight industrial participants from five EU countries, the USA and Chile. CBS is one of its academic partners.
These participants exchange and combine expertise on Data Science, big data and machine learning to be used within industry sectors ranging from energy and retailing to insurance and banking, as well as national statistical offices.
And according to the Project Coordinator and Professor at the Department of Economics at CBS, Dolores Romero Morales, the project aims to unravel the complexities of dealing with data by developing more suitable ways of processing, analyzing and communicating data, and by helping to put data-driven decision making into practice.
“Over the past years, companies, public sectors and academics have been experiencing problems grappling with the growing amount of data that surrounds us,” Dolores Romero Morales says and explains:
“Companies and public sectors around Europe lack the capabilities necessary to extract and utilize data for data-driven decision making quickly enough, and in addition to that, Europe is remarkably far behind US academia in increasing Data Science capacity.”
“Therefore, this project aims to clarify, embed and improve the use of the data surrounding us,” she says.
However, when Covid-19 broke out, a new assignment landed on the Network of European Data Scientists’ table.
The project was launched on January 1, 2019, and one of the agendas that NeEDS has been working on involves issues related to artificial intelligence, that is, intelligence demonstrated by machines.
Artificial intelligence is based on algorithms, and according to Dolores Romero Morales, some people view these algorithms in a negative light as ‘black boxes’ that calculate decisions on the basis of data, while no one really understands what happens inside ‘the box’.
In other words, algorithms can be very ambiguous and difficult to engage with, and that is just one issue that Dolores Romero Morales and her colleagues at the Network of European Data Scientists are working to change.
“The EU is very interested in making algorithmic decision-making calculations more transparent, so that is a top priority on our agenda: making the boxes more grey than black,” says Dolores Romero Morales.
And while NeEDS was working on assignments such as improving algorithm transparency, the coronavirus gradually began to have a severe impact on several European countries, including Spain.
The rapidly deteriorating situation in Spain led the Spanish Commission of Mathematics to form a group of experts tasked with helping the data-driven decision making related to Covid-19 issues in the country.
Shortly after, NeEDS joined the Spanish Commission of Mathematics to add its expertise in the field. And they have been working together ever since.
“Our contribution to the collaboration is that we’re trying to build an accurate predictor for the different metrics related to Covid-19,” Dolores Romero Morales says and continues:
“This means we’re building an algorithmic calculator that we can feed with the number of people who are in intensive care, how many hospital beds are occupied as well as how many people are carrying the virus at any given time.”
“And based on all these types of data, we’re developing a predictor using state-of-the-art machine learning methodology.”
Dolores Romero Morales gives a specific example of how the predictor works in terms of the collaboration with Spain.
One NeEDS participant, the University of Seville, is collaborating with different academic teams of researchers in the south of Spain, including the Spanish Commission of Mathematics.
Every day, all the teams collect the most recently accumulated data on the number of people in intensive care, how many hospital beds are occupied, how many people are confirmed to be currently carrying the virus, how many people have died, and how many people have recovered in the different regions of Spain.
From that data, a total prediction for the next day is produced covering the same five categories.
This produces a collective overall prediction for every day that passes, which can then be used to build new and more accurate predictions as time passes.
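The article does not describe which machine learning methods the teams use, so the daily routine above can only be illustrated with a stand-in. The minimal Python sketch below assumes two deliberately simple forecasters (a linear trend over the last few days and a "same as today" rule) and averages them into one next-day figure for each of the five categories, after summing the regional counts into a national total. All function names, the category labels and the five-day window are hypothetical choices made for illustration, not the NeEDS predictor.

```python
from statistics import mean

# Hypothetical labels for the five categories named in the article.
METRICS = ("icu", "hospital_beds", "active_cases", "deaths", "recovered")


def linear_trend_forecast(series, window=5):
    """Fit a least-squares line to the last `window` days and extend it by one day."""
    recent = series[-window:]
    n = len(recent)
    if n == 1:
        return float(recent[0])
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(recent)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, recent))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    # Counts cannot be negative, so clip the extrapolated value at zero.
    return max(0.0, intercept + slope * n)


def last_value_forecast(series):
    """Assume tomorrow looks like today (a common naive baseline)."""
    return float(series[-1])


def national_totals(regional_history):
    """Sum daily values over regions: {region: {metric: [day values]}} -> {metric: [totals]}."""
    return {
        metric: [sum(day) for day in zip(*(r[metric] for r in regional_history.values()))]
        for metric in METRICS
    }


def combined_forecast(history):
    """Average the two stand-in forecasters into one next-day prediction per metric."""
    return {
        metric: mean([linear_trend_forecast(history[metric]),
                      last_value_forecast(history[metric])])
        for metric in METRICS
    }
```

Each day, the newest regional counts would be appended to the history and `combined_forecast(national_totals(regional_history))` rerun, which mirrors the daily cycle described above. Averaging several forecasters is only one plausible reading of a "collective overall prediction".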
In a nutshell, the predictor is based fundamentally on expert collaboration. However, there are no algorithmic predictors without data. And collecting the data required to help with coronavirus problems is not child’s play – not even for a project funded by the EU.
Easier said than done
Recently, various news media have reported on a group of Danish scientists who are criticizing the health authorities for holding back data that could potentially help reduce the societal and health-related consequences of Covid-19.
And even a network of European data scientists faces a range of challenges when collecting important data for an accurate algorithmic predictor, according to Dolores Romero Morales.
“Sometimes, the data we need are stored at the health authorities in different countries. And often, they’re not openly available to academics. In some instances, the data are not even accessible to public institutions such as national statistical offices,” she says and continues:
“So this is definitely one obstacle we’re dealing with but we constantly try to create awareness about the data collecting issues that we encounter. Sometimes, that’s easier said than done.”
At the same time, Dolores Romero Morales explains that the variables used in the predictor are defined in real time and reflect the current situation: the number of occupied hospital beds, the number of people carrying the virus and so on.
But since the situation is constantly changing, they must keep redefining these variables.
Therefore, the Network of European Data Scientists and its collaborators must always monitor the latest data and search for data that are not immediately accessible.
And although her role as coordinator of such a project may seem a demanding and somewhat discouraging job at times, Dolores Romero Morales is very proud to be at the helm.
“I believe that it’s an asset for me to lead the project. Providing answers to some of the essential questions about the problems we’re currently facing is very valuable. I’m confident that we’re making an important contribution,” she says.