What real-world data can be collected during a clinical study?
Traditional clinical trials capture only a fraction of patients' reality. Real-world data — collected via mobile applications, connected devices, EMA, and registries — revolutionize our understanding of cognitive and mental disorders. A comprehensive guide for researchers, clinicians, and patients.
Definitions: RWD, RWE and their fundamental differences
Before exploring the types of data and collection methods, it is essential to clarify the vocabulary — as the terms are often incorrectly used interchangeably.
Real-World Data (RWD) are all data related to patients' health status and care delivery collected outside of randomized controlled clinical trials. They can come from electronic health records, reimbursement databases, patient registries, wearable sensors, mobile applications, health social networks, or observational studies.
Real-World Evidence (RWE) is the clinical evidence generated by the rigorous analysis of RWD. RWD are the raw material; RWE is the result of applying scientific methodology to this material. Both the FDA and the European Medicines Agency have developed frameworks to accept RWE in marketing authorization submissions, a major transformation for the pharmaceutical industry and biomedical research. (The European Medicines Agency is also abbreviated EMA; in the rest of this guide, unless stated otherwise, EMA refers to Ecological Momentary Assessment.)
Why are RWD crucial in mental and cognitive health?
Mental and cognitive disorders have characteristics that make them particularly difficult to study within the framework of traditional clinical trials. Intra-individual variability is considerable: a depressed patient may feel very different from one day of the week to the next, from one season to another, or depending on their relational context. This variability is invisible in a monthly consultation assessment. Similarly, the cognitive manifestations of disorders like ADHD, the aftermath of a stroke, or the early stages of Alzheimer's disease are deeply contextual: the environment, fatigue, and stress modulate them in real time.
RWD allow us to capture this dynamic complexity. They make visible what happens between consultations, in the real lives of patients, which is where patients spend the overwhelming majority of their time.
The major categories of real-world data in clinical settings
1. Health system data (administrative and clinical data)
These are the most commonly used RWD in observational research. They include electronic health records (EHR), health insurance reimbursement data (in France, the National Health Data System, or SNDS), patient registries (cancer registries, rare disease registries, Alzheimer registries), hospital databases (PMSI, drug databases), and prescription data. These databases are valuable for large-scale epidemiological studies: they allow the analysis of hundreds of thousands or even millions of care pathways. Their limitation is that they only capture what is coded and reimbursed, so they miss subjective, behavioral, and contextual data.
One of the largest health data repositories in Europe
The National Health Data System (SNDS) covers all care reimbursements for the roughly 67 million insured individuals in France, making it one of the largest health databases in the world. Access is regulated through the Health Data Hub and requires authorization from the CNIL. For mental health research, it allows the large-scale study of care trajectories, treatment adherence, comorbidities, and hospitalizations, but it does not contain data on symptoms, daily functioning, or quality of life.
2. Data collected by patients themselves (PRO)
Patient-Reported Outcomes (PRO) are data reported directly by patients, without interpretation by a clinician — quality of life scores, pain levels, symptom intensity, satisfaction, treatment adherence. In mental health, they are particularly valuable because many key symptoms (mood, anxiety, energy, intrusive thoughts) are only accessible through self-reporting.
Traditional questionnaires administered in consultation (PHQ-9 for depression, GAD-7 for anxiety), alongside clinician-rated scales such as the MADRS, remain clinical references. But their point-in-time administration does not capture temporal variability. That is why EMA methods (see below) are changing how PROs are collected in contemporary research.
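To make the scoring of such a questionnaire concrete, here is a minimal sketch of PHQ-9 scoring in Python. It assumes the standard scheme (nine items rated 0 to 3, total from 0 to 27, conventional severity bands); the function names and the sample answers are hypothetical and only serve as illustration.

```python
from typing import List

# Conventional PHQ-9 severity bands as (lower bound, label), highest first.
PHQ9_BANDS = [
    (20, "severe"),
    (15, "moderately severe"),
    (10, "moderate"),
    (5, "mild"),
    (0, "minimal"),
]


def score_phq9(item_scores: List[int]) -> int:
    """Sum the nine PHQ-9 items, each rated from 0 to 3."""
    if len(item_scores) != 9:
        raise ValueError("PHQ-9 has exactly 9 items")
    if any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("each item must be an integer from 0 to 3")
    return sum(item_scores)


def phq9_severity(total: int) -> str:
    """Map a total score (0 to 27) to its conventional severity band."""
    if not 0 <= total <= 27:
        raise ValueError("total must be between 0 and 27")
    for lower_bound, label in PHQ9_BANDS:
        if total >= lower_bound:
            return label
    raise AssertionError("unreachable")


# Hypothetical answers from one patient at one consultation.
answers = [2, 1, 3, 1, 0, 2, 1, 0, 0]
total = score_phq9(answers)
print(total, phq9_severity(total))
```

A single total like this is exactly the kind of point-in-time summary discussed above: it says nothing about how the nine items fluctuated during the preceding two weeks.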
3. Digital behavioral data (Digital Biomarkers)
One of the most notable innovations in recent years is the ability to collect digital biomarkers: objective measures of behavior and physiology captured continuously by digital devices. These data include heart rate and its variability (via smartwatches), patterns of physical activity and sedentary behavior (accelerometers), sleep quality and duration (actigraphs), geographic movement patterns (GPS), frequency of phone calls and messages, typing patterns (keystroke dynamics), and voice data (prosody, fluency, pauses).
These passive digital biomarkers — collected without the patient having to "do anything" — are particularly valuable in mental health research. Studies have shown that changes in sleep, activity, and communication patterns can precede documented depressive or manic episodes by several days — opening new perspectives for relapse prevention.
4. Data from digital cognitive tests
Cognitive tests administered via mobile applications represent a revolution for research in cognitive neuroscience and psychiatry. Unlike annual neuropsychological assessments conducted in clinics, short digital tests can be administered daily or weekly — capturing the temporal variability of cognitive performance.
Tests like the Trail Making Test, the Stroop test, N-back working memory tests, or reaction time tests can be administered in 2 to 5 minutes on a smartphone. The collected data allow for the detection of subtle changes in cognitive performance that precede clinical manifestations, a promising application for early detection of Alzheimer's disease, monitoring recovery after a stroke, or tracking treatment effectiveness.
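As an illustration of how repeated short tests can be summarized, the sketch below computes, for a reaction time task, the mean and the coefficient of variation of each session, then the change of later sessions relative to a personal baseline. The function names, the choice of three baseline sessions, and the sample values are assumptions made for the example, not a validated protocol.

```python
from statistics import mean, stdev
from typing import Dict, List


def session_summary(reaction_times_ms: List[float]) -> Dict[str, float]:
    """Summarize one session: mean RT, sample SD, and coefficient of variation."""
    if len(reaction_times_ms) < 2:
        raise ValueError("need at least two trials")
    m = mean(reaction_times_ms)
    sd = stdev(reaction_times_ms)  # sample standard deviation
    return {"mean_ms": m, "sd_ms": sd, "cv": sd / m}


def change_from_baseline(session_means: List[float], baseline_n: int = 3) -> List[float]:
    """Difference between each later session mean and the mean of the first sessions."""
    if len(session_means) <= baseline_n:
        raise ValueError("need more sessions than baseline_n")
    baseline = mean(session_means[:baseline_n])
    return [m - baseline for m in session_means[baseline_n:]]


# Hypothetical weekly sessions for one participant (reaction times in ms).
sessions = [
    [412, 398, 455, 430],
    [420, 405, 441, 418],
    [409, 433, 427, 415],
    [468, 452, 490, 471],
]
means = [session_summary(s)["mean_ms"] for s in sessions]
print(change_from_baseline(means))
```

Tracking the within-session variability (the coefficient of variation) alongside the mean is a design choice: slowing and increased inconsistency are distinct signals, and a monitoring protocol may want both.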
DYNSEO cognitive tests — Memory Test, Concentration and Attention Test, Executive Functions Test — are examples of digital tools enabling regular and accessible assessment of cognitive functions outside the clinical context. These data, collected repeatedly, constitute a dynamic profile of cognitive evolution — valuable for both clinical follow-up and research.
EMA (Ecological Momentary Assessment): the revolution of real-time capture
Ecological Momentary Assessment (EMA) — also called experience sampling method — is a data collection method that involves asking participants about their state (mood, symptoms, behaviors, context) at multiple and varied moments in their daily lives, via a smartphone or a dedicated application.
Why EMA changes everything for mental health research
The fundamental problem of traditional clinical assessment is that it is retrospective and point-in-time. When a patient fills out a weekly depression questionnaire, they try to "average" their week — which generates considerable biases (recall bias, moment of assessment effect, anchoring bias). EMA solves this problem by capturing the person's real state at the very moment they respond.
In practice, EMA sends notifications several times a day (generally 3 to 8 times) at random or semi-random moments. The person responds to 5 to 15 short questions about their emotional state, symptoms, social context, and behaviors. The full set of responses over several weeks constitutes a dense time series that reveals patterns, triggers, cycles, and individual variability that point-in-time assessments would not detect.
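One common way to obtain "semi-random" prompts is to split the waking day into equal blocks and draw one random time inside each block. The sketch below illustrates that idea; the function name, the 9:00 to 21:00 window, and the choice of six prompts are assumptions for the example, not a recommendation from any specific EMA platform.

```python
import random
from datetime import datetime
from typing import List, Optional


def ema_schedule(day_start: datetime, day_end: datetime, n_prompts: int,
                 rng: Optional[random.Random] = None) -> List[datetime]:
    """Draw one uniformly random prompt time inside each of n_prompts equal blocks."""
    if n_prompts < 1:
        raise ValueError("n_prompts must be at least 1")
    if day_end <= day_start:
        raise ValueError("day_end must be after day_start")
    if rng is None:
        rng = random.Random()
    block = (day_end - day_start) / n_prompts  # length of one block
    prompts = []
    for i in range(n_prompts):
        block_start = day_start + i * block
        offset = block * rng.random()  # random position inside the block
        prompts.append(block_start + offset)
    return prompts


# Hypothetical study day: 6 prompts between 09:00 and 21:00.
schedule = ema_schedule(datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 21), 6,
                        rng=random.Random(0))
for t in schedule:
    print(t.strftime("%H:%M"))
```

Compared with fully random times, this stratified draw guarantees coverage of the whole day while keeping the exact moment unpredictable for the participant. A real protocol would usually add a minimum gap between consecutive prompts, which this sketch does not enforce.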
🔬 Examples of what EMA can reveal that traditional assessments miss
In depression: the times of day when mood is consistently lower, triggering social situations, the relationship between the quality of sleep the previous night and mood the next morning.
In ADHD: the moments of the day when attention is at its peak (allowing for the planning of demanding tasks), the impact of diet and exercise on concentration, triggers of impulsivity.
In early Alzheimer's: the first fluctuations in cognitive abilities, environmental factors that improve or deteriorate performance, the progression of difficulties over the weeks.
The challenges of EMA
EMA is not without limitations. The burden on the participant is real — responding to notifications several times a day for weeks generates fatigue and can affect compliance. Dropout rates in EMA studies are high if the burden is not well calibrated. Selection biases (participants who complete are different from those who drop out) can affect external validity. And the confidentiality of very granular data (behaviors, locations, emotional states) raises significant ethical questions.
Connected objects and wearables: passive sensors of real life
Actigraphs and smartwatches
Actigraphs (wrist-worn, accelerometer-based activity recorders) and smartwatches (Apple Watch, Garmin, Fitbit, Withings) continuously collect data on physical activity, sleep (duration, stages, nighttime awakenings), and heart rate. These passive data are particularly valuable in mental health research as they objectify constructs often reported subjectively: "I sleep poorly," "I am exhausted," "I do nothing anymore."
Studies have shown that heart rate variability (HRV) measured continuously is a proxy for the functioning of the autonomic nervous system — and reflects the state of stress, anxiety, and emotional regulation. Apps like Garmin Health or Apple Health generate daily HRV data that can serve as biomarkers in mental health studies.
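The text does not say which HRV metric is meant. As an illustration, the sketch below computes RMSSD (root mean square of successive differences between heartbeats), one widely used time-domain HRV index. The function name and the sample RR intervals are hypothetical, and consumer devices may report different or proprietary metrics.

```python
from math import sqrt
from typing import List


def rmssd(rr_intervals_ms: List[float]) -> float:
    """Root mean square of successive differences between RR intervals (in ms)."""
    if len(rr_intervals_ms) < 2:
        raise ValueError("need at least two RR intervals")
    # Differences between each interval and the one that follows it.
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    return sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical beat-to-beat intervals in milliseconds.
rr = [800, 810, 790, 800]
print(round(rmssd(rr), 2))
```

Real recordings contain ectopic beats and motion artifacts, so a research pipeline would filter the RR series before computing any index; that step is omitted here.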
Voice sensors and speech analysis
Automatic voice analysis represents one of the most promising frontiers of digital biomarkers in mental health. Vocal characteristics such as speech rate, pauses, pitch, energy, response latency, and intonation patterns change measurably in depression, schizophrenia, dementia, and other disorders. Machine learning algorithms trained on large corpora of recordings can detect these changes, with a precision that some studies report as comparable to standardized clinical assessments.
Behavioral analyses via smartphone
The smartphone itself is a sensor of daily behavior. The frequency and duration of calls, messaging patterns, geolocation (mobility, frequented places), ambient light (indicator of outdoor outings), and even micro-patterns of screen unlocking constitute dense behavioral data. Studies suggest that these passive data can help predict episodes of depression, anxiety, and psychosis, opening up prospects for early warning systems.
Mobile health applications in clinical studies
Mobile health applications — from simple mood tracking apps to validated cognitive stimulation tools — play a dual role in RWD studies: data collection (via usage logs and exercise results) and therapeutic intervention (whose adherence and effectiveness can be measured in real time).
Emotional regulation and symptom tracking applications
Apps like Daylio, Moodpath, or Woebot allow users to track their mood, behaviors, and thoughts daily. In a research context, the aggregated and anonymized data from these apps provide a valuable source of RWD to study emotional patterns in large populations.
Clinical tools like the DYNSEO Emotion Thermometer, the Emotional Regulation Toolkit, and the 12 Strategies for Calming Down allow for the collection of data on the actual use of regulation techniques — which strategy is chosen, in what contexts, with what effectiveness. These ecological usage data significantly enrich our understanding of the effectiveness of mental health interventions.
Cognitive stimulation and testing applications
Cognitive stimulation applications — like CLINT for adults or SCARLETT for seniors — generate valuable data on longitudinal cognitive performance. Usage logs (frequency, session duration, exercise results, level reached, dropout) constitute RWD that allow for the study of engagement in cognitive stimulation, its evolution over time, and the factors associated with adherence or dropout.
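To show how such usage logs can be turned into engagement indicators, here is a minimal sketch that works on a plain list of session dates. The function name, the metric names, and the 14-day inactivity threshold used to flag dropout are assumptions for the example; they do not describe the logging format of any particular application.

```python
from datetime import date
from typing import Dict, List, Union


def engagement_metrics(session_dates: List[date], study_end: date,
                       dropout_gap_days: int = 14) -> Dict[str, Union[int, float, bool]]:
    """Derive simple engagement indicators from one participant's session dates."""
    if not session_dates:
        raise ValueError("no sessions logged")
    first, last = min(session_dates), max(session_dates)
    if study_end < last:
        raise ValueError("study_end is before the last session")
    observed_days = (study_end - first).days + 1  # first session to study end, inclusive
    days_since_last = (study_end - last).days
    return {
        "n_sessions": len(session_dates),
        "active_days": len(set(session_dates)),
        "sessions_per_week": len(session_dates) / (observed_days / 7),
        "days_since_last_session": days_since_last,
        # Flag as dropout when the participant has been inactive for too long.
        "dropped_out": days_since_last >= dropout_gap_days,
    }


# Hypothetical log: four sessions over the first week, then nothing.
log = [date(2024, 1, 1), date(2024, 1, 3), date(2024, 1, 3), date(2024, 1, 8)]
print(engagement_metrics(log, study_end=date(2024, 1, 28)))
```

The dropout definition is the sensitive design choice here: a shorter threshold flags pauses as dropout, a longer one delays detection, so a study would fix it in the protocol before looking at the data.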
For research on digital interventions in Alzheimer's disease, Parkinson's disease, or after a stroke, these real usage data provide an ecological dimension that laboratory efficacy studies cannot offer. An application may show excellent results in a controlled clinical trial, but if patients do not use it in real life, its population impact will be limited. RWD allow for precise study of these adoption and engagement issues.
Methods for analyzing real-world data: methodological challenges
The confounding bias: the central challenge
The main limitation of RWD studies compared to randomized trials is the absence of randomization, and thus the potential presence of confounding bias. If patients receiving treatment A are systematically different from those receiving treatment B (younger, less ill, with better access to care), the comparison of their outcomes reflects these differences as much as the treatment effect. Several methods help reduce these biases, without eliminating unmeasured confounding: propensity score matching, instrumental variable analyses, carefully designed case-control comparisons, and structural causal models, often represented as directed acyclic graphs (DAGs).
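Propensity score matching can be sketched as follows. The example assumes that each patient's propensity score (the estimated probability of receiving treatment A given their characteristics) has already been computed, for instance with a logistic regression, and shows only the matching step: a greedy 1:1 nearest-neighbor match without replacement, within a caliper. Patient identifiers, scores, and the caliper of 0.05 are hypothetical.

```python
from typing import Dict, List, Tuple


def match_on_propensity(treated: Dict[str, float], controls: Dict[str, float],
                        caliper: float = 0.05) -> List[Tuple[str, str]]:
    """Greedy 1:1 nearest-neighbor matching on propensity score, without replacement."""
    available = dict(controls)  # controls not yet matched
    pairs = []
    # Process treated patients in a fixed order so the result is reproducible.
    for t_id, t_score in sorted(treated.items()):
        if not available:
            break
        # Closest remaining control in terms of propensity score.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # each control is used at most once
    return pairs


# Hypothetical propensity scores.
treated = {"T1": 0.30, "T2": 0.62, "T3": 0.95}
controls = {"C1": 0.28, "C2": 0.60, "C3": 0.65, "C4": 0.10}
print(match_on_propensity(treated, controls))
```

Treated patients with no control inside the caliper (here the one with a score of 0.95) stay unmatched and are excluded from the comparison, which improves comparability at the cost of sample size and generalizability. Outcomes are then compared within the matched pairs only.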
Time series analysis and longitudinal data
EMA and wearable data generate dense time series: hundreds or thousands of measurement points per participant over weeks or months. Analyzing these data requires specialized statistical methods that capture their temporal structure: mixed models with random effects, vector autoregressive (VAR) models to study relationships between variables over time, and network analysis to map dynamic interactions between symptoms.
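The building block behind these temporal models is the lagged relationship: does a variable at one time point relate to another variable at the next? The sketch below computes a simple lag-1 correlation for a single participant. It is a deliberately simplified stand-in, not a mixed model or a VAR model, and the variable names and values are hypothetical.

```python
from math import sqrt
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation between two equally long series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / sqrt(sxx * syy)


def lagged_correlation(x: List[float], y: List[float], lag: int = 1) -> float:
    """Correlate x at time t with y at time t + lag."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    if lag < 0 or lag >= len(x):
        raise ValueError("lag must be between 0 and len(x) - 1")
    if lag == 0:
        return pearson(x, y)
    return pearson(x[:-lag], y[lag:])


# Hypothetical daily series for one participant:
# sleep_hours[t] is the sleep recorded on day t, mood[t] the mood rating on day t.
sleep_hours = [7.0, 5.5, 8.0, 6.0, 4.5, 7.5, 6.5]
mood = [6, 6, 4, 7, 5, 3, 6]
print(lagged_correlation(sleep_hours, mood, lag=1))
```

A VAR model generalizes this idea by regressing every variable on the lagged values of all variables at once, and a mixed model pools many participants while letting each one have their own baseline and slopes.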
A methodological revolution for mental health
The network approach, developed notably by Borsboom and Cramer, conceptualizes psychiatric disorders not as discrete entities (a "disease" causing symptoms) but as networks of interconnected symptoms that sustain one another. In this model, longitudinal RWD make it possible to identify which symptoms are the most "central" (those exerting the most influence on the others), which links activate first during a relapse, and which interventions could most effectively deactivate the pathological network. This approach opens up new perspectives for personalized treatment.
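One simple way to quantify how "central" a symptom is consists of summing the absolute weights of its connections, an index usually called node strength. The sketch below assumes an undirected network whose edge weights have already been estimated from longitudinal data (for example from partial correlations); the symptoms and weights shown are hypothetical, and strength is only one of several centrality indices.

```python
from typing import Dict, Tuple


def strength_centrality(edges: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """Node strength: sum of absolute edge weights attached to each symptom."""
    strength: Dict[str, float] = {}
    for (a, b), weight in edges.items():
        # An undirected edge contributes to both of its endpoints.
        strength[a] = strength.get(a, 0.0) + abs(weight)
        strength[b] = strength.get(b, 0.0) + abs(weight)
    return strength


# Hypothetical symptom network for one patient.
edges = {
    ("insomnia", "fatigue"): 0.5,
    ("fatigue", "low_mood"): 0.25,
    ("insomnia", "low_mood"): 0.25,
    ("low_mood", "rumination"): 0.5,
}
strength = strength_centrality(edges)
most_central = max(strength, key=strength.get)
print(most_central, strength[most_central])
```

In a directed temporal network, one would distinguish outgoing strength (how much a symptom predicts the others) from incoming strength, which matches more closely the idea of a symptom "influencing" the rest of the network.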
Artificial intelligence and machine learning
The volume and complexity of RWD have made machine learning and artificial intelligence approaches essential. Deep learning algorithms can detect patterns in vocal, behavioral, and physiological data that escape traditional statistical analysis. The DYNSEO AI Coach illustrates this direction: an intelligent support system that learns usage patterns to personalize recommendations.
The ethical and regulatory framework of RWD in health
GDPR, HDS, and health data governance
Health data are sensitive personal data, protected by the GDPR (General Data Protection Regulation) and, for hosted health data, by HDS certification (Health Data Host) in France. Any collection of health data in a research context requires the informed consent of participants, the approval of a Committee for the Protection of Persons (CPP), and often authorization from the CNIL (National Commission on Informatics and Liberty).
The French Health Data Hub (a public interest group, or GIP, that facilitates access to SNDS data and their cross-referencing with other databases) has become a central tool for RWD research in France. Its use is governed by expert committees that assess scientific interest, the proportionality of the requested data, and the guarantees for the protection of individuals.
Selection biases in digital data
An important ethical and methodological challenge of digital RWD is their potential for representativeness bias. Users of smartwatches, smartphones, and health applications are not representative of the general population: they are on average younger, wealthier, more educated, and more engaged in their health. Studies relying on these data risk producing evidence that is valid for these populations but difficult to generalize to elderly people, disadvantaged individuals, or those with low digital literacy.
⚠️ The digital divide: a blind spot in RWD
The most vulnerable individuals in mental health (elderly people with dementia, homeless individuals, people living in severe poverty) are often the least represented in digital RWD. Studies that ignore this digital divide risk producing evidence that is relevant only for more advantaged populations, and may exacerbate health inequalities by directing innovations towards those who need them the least.
Practical applications for mental and cognitive health research
Early detection of dementia
One of the most promising applications of RWD in clinical neuroscience is the early detection of cognitive disorders, years before the clinical manifestation of dementia. Research teams have shown that digital biomarkers — subtle changes in GPS movement patterns, typing speed, performance on short cognitive tests — can detect changes that precede the first clinical symptoms of Alzheimer's disease by 2 to 5 years.
Regular monitoring of cognitive performance through tests like the DYNSEO Memory Test and the Concentration Test, conducted monthly at home on tablet or smartphone, could constitute an ecological longitudinal monitoring protocol for at-risk populations.
Monitoring interventions in psychiatry
Real-time monitoring of responses to psychiatric treatments is another area where RWD is transforming practice. Instead of waiting for the monthly consultation to know if an antidepressant is starting to take effect or if a patient is relapsing, weekly EMA data allows for continuous therapeutic adjustment. The DYNSEO Anxiety Cognitive Restructuring Sheet and the Emotional Regulation Toolbox fit into this ecological intervention logic — providing tools usable in daily life and whose use itself constitutes relevant research data.
Effectiveness of digital interventions
RWD allows for the assessment of the actual effectiveness of digital interventions — CBT applications, cognitive stimulation tools, mindfulness programs — in ecological conditions. Engagement (number of sessions, duration, regularity), performance trajectory (improvement, plateau, decline), and predictive factors of adherence provide valuable data to improve these tools and personalize recommendations.
Towards pragmatic and hybrid studies
The future of clinical research likely lies in hybrid studies that combine the rigor of randomized trials with the richness of RWD. Pragmatic trials collect data in real care conditions rather than in specialized research centers. Platform studies allow for the simultaneous evaluation of multiple interventions using adaptive designs. And "in silico" trials, which use digital twins or computational models powered by RWD, allow for the simulation of clinical trials before conducting them in real life, reducing costs and timelines.
Conclusion: RWD, the new frontier of personalized medicine
Real-world data transform our ability to understand mental and cognitive disorders in all their dynamic complexity. They allow us to move away from the "snapshot" model in consultations to access the "film" of the patient's daily life. This methodological revolution holds the promise of a more personalized, preventive, and equitable medicine — provided that ethical challenges (data protection, digital divide, representativeness bias) are fully addressed. DYNSEO contributes to this ecosystem with quality digital tools — cognitive tests, stimulation applications, emotional regulation tools — whose usage data can feed into tomorrow's research.
FAQ
What are real-world data (RWD)?
Health data collected outside of controlled clinical trials — medical records, reimbursements, mobile applications, sensors, registries. They capture the real life of patients outside the clinical context.
What is the difference between RWD and RWE?
RWD = raw data. RWE = scientific evidence generated by the rigorous analysis of RWD. The distinction is crucial for regulatory authorities (European Medicines Agency, FDA).
What is EMA and why is it valuable in mental health?
Ecological Momentary Assessment: questionnaires sent multiple times a day via smartphone to capture the actual state in real time. Reveals the variability of symptoms that is invisible in point-in-time assessments during consultations.
What ethical challenges do RWD pose?
Data protection (GDPR, HDS), informed consent, risks of re-identification, digital divide, representativeness bias, ownership and governance of health data.
Can mobile applications be used in clinical studies?
Yes — for EMA, repeated cognitive tests, behavioral and emotional tracking. They require rigorous validation as measurement instruments and a strict ethical framework.