How can data help in research on COVID-19?
Everything we do is based on data and in turn produces data. Some of these data we may never get our hands on, even though they exist.
Think about the number of left-handed people on this planet at any given moment. It is indeed a natural number, although, as far as I can tell, it does not make a life-changing difference. It would still be interesting to know, but the effort of finding out this number would outweigh the benefit of knowing it.
We think that some data in some contexts are more important than others, and since the world works on data, we came up with the idea that data on COVID-19 might open a door to an earlier view of some correlations that we would otherwise find in lab tests only years later.
So we ask citizens for altruistic, unpaid data donations. We employ an AI algorithm to sort these dirty data and find patterns that hint at medical correlations we did not know about. We then analyze the data against the batch of hypotheses that we generated a priori. For example, we want to understand how well Resochin derivatives protect those who take them regularly, most likely because they already receive them for rheumatoid disease. We also want to see whether habits such as smoking have an impact on how the disease progresses. Our AI tries to filter out the noise in the dirty data to get to the core.
Whenever we find traces, we pass the results and raw data on to scientific institutions. They double-check them and may then run targeted lab tests to see whether there are correlations, or even causal connections, between what we call our “computer-generated biomarkers” and clinical or laboratory findings. We hope to produce a set of these computer-generated biomarkers and so help in the quest to decipher the COVID-19 chameleon.
Are data donated by citizens, for example, from wearables or online questionnaires, reliable enough to draw conclusions based on them?
We have decided not to tap into gadgets and wearables as others do; instead, we explicitly ask people to provide longitudinal data about their health status. In this sense, we are not a tracking app or a tracing app but a citizen science program, i.e., a web app that invites people to participate and become citizen researchers.
What we do is collect “dirty data.” They are “dirty” compared with data from randomized controlled trials (RCTs), but what makes them interesting is their scale: to approach Big Data, we are looking at 500,000 data sets, i.e., big data rather than the small data of RCTs. Of course, we come from the school of evidence-based medicine, but we are moving fast into the area of emergence-based medicine, which might also be called data-driven medicine.
This age comes with the mastery of algorithms that do a lot of the work for you, be it separating data or pointing to interesting data sets. The magic lies in the combination of Big Data, Machine Learning, and Artificial Intelligence, all of them transferred into medicine and healthcare.
The project “Faster Than Corona” aims to gather data to learn more about the virus and “save lives.” What benefits do you expect?
As stated, we want to do exactly this: learn more about the virus by cutting through the data, separating patterns from noise, and using the patterns detected through machine learning and advanced mathematical methods to save lives. It is the proverbial golden needle in the haystack.
We hope to find out whether specific populations are at higher risk of contracting COVID-19, or how superspreaders, if they exist at all, behave and whether they can be detected from the data at an earlier point in time.
Over 11,000 data donations have so far been submitted by citizens.
Besides these hypotheses, which we have set up together with our medical experts, we think it feasible that the algorithm will detect further patterns and generate hypotheses from them by itself. All of these, of course, will have to be run through labs, where we need to see whether our data are reflected in biology.
But we hope to accelerate the expansion of scientific knowledge by setting the target a little narrower and defining a little more closely what we might be looking for. Honestly, we do not know a lot about COVID-19 today, and some of the older knowledge already seems dated, while we find it challenging to put the pieces together in our quest to approximate truth through data. It is a little bit as if every single person had a small torch and shone its light on COVID-19. We are trying to turn the light on in the room by synchronizing all the flames. Then we can get the bigger picture and will know more about how to save lives.
Are the first conclusions of the study available yet?
Yes, we do have some conclusions, which we have already presented at a data science congress. As our mission statement says, those findings do not constitute medical information or recommendations about taking medication. We leave that to authorized institutions. We have been reporting on the number of users, what they are looking at, where they come from, and how we got their attention. It makes us proud that, as an entirely self-funded project that includes a number of leading people from healthcare systems in Europe, we have managed to become the largest COVID-19 data donation platform on a global scale, with donors from 81 countries so far.
How would you rate citizens’ readiness to share data?
I think that, in general, people are open to sharing some data, even medical data. For that, they have to understand the overall benefit. Even altruism or the idea of contributing to scientific advances can be a good reason.
The governance structure of the group that asks for data donations needs to be transparent. I would think the fact that our project is pro bono and that we are not getting money or funds from anyone makes us more credible and less suspect. This helps people trust us. Next, it is essential to communicate correctly and to break even the most complicated things down into layperson’s language.
Our main pitfall is the overly obedient discussion of data security. On the verge of the lockdown, when we first initiated www.fasterthancorona.org, we thought that the time had finally come to do something new that in regular times would have been devoured by the official collective of change inhibitors, and that, given our agility, we could get by with what we do. For some time, it looked quite promising, but some people still got hung up on the data safety issue, even though it is all clearly formulated in the legal disclaimer on our website. This deprived us of the opportunity to generate more media attention and, through it, more data donors. It is a little bit like showing the moon to someone who keeps staring at the finger. We need to completely rethink data security anyway. Just think of acoustic biomarkers that detect diseases in the way you speak. How should that be regulated?
After all, it occurs to me that data security and data privacy are, at the end of the day, for healthy people. Those in dire need of a cure view data privacy quite differently if sharing their data carries the prospect of saving their life.
The current crisis shows that not everyone trusts scientific data. Do you have an idea of how to change it?
It is interesting what happens in a crisis. Politics and science get into a strange liaison. This tends to confuse a lot of people.
We found out in our “question of the day” survey on fasterthancorona.org that people tend to place more trust in those who can explain things understandably. This trust does not correlate with academic titles, and trust in social media and in politicians in general did not seem to be high.
From the beginning, we have gone the route of transparency. This means people can opt out of data donation at any time. We have a very understandable legal disclaimer that does not require a JD or a Ph.D. to read. We open our data vault to anyone with a valid research question, and we share our data cooperatively with trustworthy institutions. Our clear aim is to go from Citizen to Citoyen, to be a movement of citizen scientists who employ a new methodology to be faster than the natural spread of the virus, hence our name.
Do you want to donate data? Visit www.fasterthancorona.org