How To Verify Health Apps So That Doctors Can Prescribe Them

In November, the German parliament adopted a new law that is expected to accelerate digitalisation in healthcare. The new regulations also include an option to reimburse mobile medical apps. This makes Germany the first country in the world where a doctor will be able to prescribe a smartphone app just like a drug. However, these progressive regulations raise an essential question: how can the quality, usability and patient value of health apps be assessed?

Professional help or well-being-oriented tools?

Although only a few years have passed since smartphones entered the market, health apps have already gained immense popularity. There are currently over 330,000 different mobile health apps. Yet the vast majority of them have nothing to do with medical science. They are typically fitness apps whose purpose is to motivate users to exercise, monitor their physical exertion, or help them lose weight or stick to a diet.

Applications developed by or together with medical professionals and based on scientific evidence constitute only a tiny percentage of everything labelled a “health app”. One example is Kaia, which was designed to help patients with back pain by guiding them through the entire rehabilitation process while measuring the results. Another is Woebot, a chatbot based on cognitive-behavioural therapy that helps patients suffering from mild depression change their negative thought patterns and perception of reality.

Until now, all apps that even remotely touched on the subject of health, e.g. ones facilitating healthy eating, were classified as “health apps”. But as digital tools began to resemble medical devices more and more, the term “digital therapeutics” came into play. Digital therapeutics are usually designed using evidence-based medicine. And this is where problems begin to emerge. How can we confirm the effectiveness of such applications? When it comes to drugs, the process is unambiguous. It is necessary to conduct highly regulated clinical trials while abiding by all required safety guidelines, and then to go through the entire market authorisation process in each country individually. In the case of applications, however, no such guidelines exist. Their developers are guided by subjective criteria only.

We have to separate well-being and fitness apps from evidence-based therapeutic interventions.

For many applications, a certification obtained in a procedure meant for medical devices may serve as proof of their high quality. In Europe, the CE marking guarantees that a product is safe for patients (“indicates conformity with health, safety, and environmental protection standards for products sold within the European Economic Area”). Many applications, primarily those based on artificial intelligence, have already been approved for use as medical devices. As competition on the mobile health market increases, having a certification mark is becoming an advantage that helps convince investors and users. An increasing number of laws are being introduced in Europe to ensure that using such digital tools is safe. The high penalties provided for in the GDPR (General Data Protection Regulation) are meant to guarantee that any user data these tools collect will be processed appropriately. In addition, the Medical Devices Regulation (MDR), adopted by the European Parliament in May 2017, will start to apply on May 26, 2020. The MDR will cover numerous digital solutions.

Without going into the finer details of the new law: software designed to monitor physiological processes, and therefore applications measuring physiological parameters, will be classified as class IIa, or as class IIb if the parameters being monitored are vital to the patient’s life. Systems providing the information required to make diagnostic or therapeutic decisions will also be classified as IIa. If such decisions could lead to death or an irreversible deterioration of the patient’s health, the software will be marked as a class III product. Any other apps will still be designated as class I. Note that class I indicates the lowest level of risk associated with using the app, while class III means the highest.
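The class assignment described above can be sketched as a small decision function. This is only an illustration of the simplified rules as summarised in this article, not a legal reading of the MDR. It assumes that monitoring software is class IIa by default and IIb when the monitored parameters are vital, and that the "critical" case applies to software informing diagnostic or therapeutic decisions. All names (`RiskClass`, `classify_software` and its parameters) are hypothetical.

```python
from enum import Enum


class RiskClass(Enum):
    I = "I"
    IIA = "IIa"
    IIB = "IIb"
    III = "III"


# Ordered from lowest to highest risk, so classes can be compared.
RISK_ORDER = [RiskClass.I, RiskClass.IIA, RiskClass.IIB, RiskClass.III]


def classify_software(
    monitors_physiological_processes: bool = False,
    monitored_parameters_are_vital: bool = False,
    informs_diagnostic_or_therapeutic_decisions: bool = False,
    critical_for_patient: bool = False,
) -> RiskClass:
    """Return the highest risk class that applies (simplified, illustrative)."""
    candidates = [RiskClass.I]  # default for all other software

    if monitors_physiological_processes:
        # Assumption: IIa by default, IIb when the parameters are vital.
        candidates.append(
            RiskClass.IIB if monitored_parameters_are_vital else RiskClass.IIA
        )

    if informs_diagnostic_or_therapeutic_decisions:
        # Assumption: IIa by default, III when the outcome is critical.
        candidates.append(
            RiskClass.III if critical_for_patient else RiskClass.IIA
        )

    return max(candidates, key=RISK_ORDER.index)


if __name__ == "__main__":
    result = classify_software(
        monitors_physiological_processes=True,
        monitored_parameters_are_vital=True,
    )
    print(result.value)  # IIb
```

When several rules match, the sketch returns the highest class, which reflects the usual principle that the strictest applicable classification wins.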

What about clinical tests?

Since prescription drugs must go through clinical trials, why should medical apps not meet the same standards? The answer: in practice, clinical trials for apps make no sense. Research on new drugs may take years, and it certainly requires enormous financial resources. As such, only large pharmaceutical companies can afford it. Health apps are often developed by small startups which, despite occasionally having the generous backing of wealthy sponsors, cannot afford to undergo this type of complicated research process. Another critical issue is that digital technologies tend to age very rapidly. An application that is not updated and developed further for just 2 to 3 years becomes obsolete. By comparison, it typically takes around 8 to 10 years for a drug to actually reach the market following its invention.

It seems that the most sensible solution would be to develop guidelines for application developers and then follow up by creating an external validation system operated by an independent institution. The British healthcare system has already adopted such an approach, and the NHS now maintains its own Apps Library. Only solutions that successfully pass a screening process based on clearly defined criteria can be included in it. Thanks to this, app creators can decide from the outset whether they want to build just another “lifestyle” app or one that meets the very demanding NHS guidelines. As a result of this strict screening, the NHS Apps Library contained only 79 solutions as of March 2019. Even though this is a rather unimpressive number, patients can be sure that all these solutions have been thoroughly tested.

Case studies of different evaluation criteria

The criteria used by the NHS Apps Library include efficiency, medical safety, data protection, usability and availability, data interoperability, and technical stability. For each criterion, anywhere from a few to several dozen questions were prepared, with an appropriate number of points awarded for each answer.
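A points-per-answer scheme of this kind can be sketched in a few lines. The example below is a hypothetical illustration, not the actual NHS scoring method: the `Question` type, the point values and the 70% pass threshold are all assumptions made for the sake of the example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Question:
    criterion: str   # e.g. "data protection" (criterion names from the text)
    max_points: int  # points for the best possible answer (hypothetical)
    awarded: int     # points awarded for the answer actually given


def score_by_criterion(questions: Iterable[Question]) -> Dict[str, float]:
    """Return, per criterion, the fraction of achievable points awarded."""
    totals: Dict[str, tuple] = {}
    for q in questions:
        if q.max_points <= 0 or not 0 <= q.awarded <= q.max_points:
            raise ValueError(f"invalid points for question in {q.criterion!r}")
        got, possible = totals.get(q.criterion, (0, 0))
        totals[q.criterion] = (got + q.awarded, possible + q.max_points)
    return {c: got / possible for c, (got, possible) in totals.items()}


def passes(scores: Dict[str, float], threshold: float = 0.7) -> bool:
    """An app passes only if every criterion reaches the (assumed) threshold."""
    return all(score >= threshold for score in scores.values())


if __name__ == "__main__":
    answers = [
        Question("data protection", max_points=2, awarded=2),
        Question("data protection", max_points=2, awarded=1),
        Question("usability", max_points=4, awarded=3),
    ]
    scores = score_by_criterion(answers)
    print(scores)          # {'data protection': 0.75, 'usability': 0.75}
    print(passes(scores))  # True
```

Requiring every criterion to clear the threshold, rather than averaging across criteria, means that a strong usability score cannot compensate for weak data protection. Whether a real assessment works this way is a design choice left open here.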

To date, several different rating systems have been created. The “Mobile Application Rating Scale (MARS)”, developed in 2016, primarily uses qualitative assessments such as engagement, functionality, aesthetics, information quality, objective quality assessment, and application characteristics. Another solution called Enlight, a methodology created by a group of scientists for apps used in psychiatry, takes into account such factors as usability, graphic design, user involvement, content, therapeutic impact, objective overall assessment, the app’s value as part of coordinated therapeutic activities, trustworthiness and reliability, evidence-based effectiveness, data protection, level of privacy, as well as external verification of the app’s data protection mechanisms and verbal recommendations.

The American Psychiatric Association has also adopted its own evaluation method, which focuses on elements such as data security and privacy, benefits and effectiveness (clinical evidence), ease of use, and interoperability. In Germany, meanwhile, the Bertelsmann Foundation synthesised various approaches and created AppQ, a 9-point scale for the transparent assessment of health apps’ quality.

Although many of these approaches use similar criteria, such as data security, usability, privacy and interoperability, it is never easy to assess software objectively. The utility factor alone is challenging to measure. Is an app which allows patients to access electronic medical records useful, and does it add value? After all, access to information can increase patients’ engagement in their own health and reinforce preventive healthcare practices as a result. How does one measure aesthetics or ease of use? These factors would be rated differently by a young person who grew up with new technologies (a digital native) than by a person who did not grow up in the age of digitisation. And how do you measure the motivational factor, which depends heavily on the individual personality traits of the user? Considering the above, focusing primarily on concrete criteria, such as data security, data exchange within the healthcare system and interoperability, seems very reasonable.

New concepts still have to be developed

Subjective criteria would thus require testing conducted by users representing the target groups of given solutions. However, we have already established that we wish to avoid a clinical trial-like methodology. On the other hand, users of the Android and Apple app stores can already rate apps, with the highest-rated ones enjoying better positioning in search results and greater trust. In theory, however, this kind of evaluation can be manipulated. The solution proposed by the Bertelsmann Foundation instead includes forming a special coordinating commission. This raises another problem: can we develop objective criteria for apps at all? Their impact on patient behaviour always depends on the patient’s personal characteristics, expectations and needs. For some, even a seemingly ordinary fitness app can lead to lifestyle changes that help to avoid cardiovascular diseases.

Once an app passes the evaluation and verification process, doctors will be able to recommend such mobile health solutions with ease, as there will be no concerns regarding their personal responsibility. As of today, it is difficult to expect them to do so, since they are responsible for their patients’ health and cannot rely on unverified methods. Besides, introducing prescription apps is such a significant change in healthcare traditions that we are talking about a real cultural transformation, one that may well take years to be fully accepted.