Big Data for Patients

Big data in health is patient data, so patient participation is vital in creating beneficial and patient-centered research initiatives. The new and emerging field of data science is rapidly evolving, and the methods of accessing, integrating, and interpreting data are becoming increasingly complex.

Big Data For Patients (BD4P) was a two-year Reagan-Udall Foundation for the FDA program designed to give patient advocates a basic understanding of the science and enable full and effective participation in big data initiatives. It brought together multiple stakeholder groups to create a replicable training program, with tools and best practices now accessible on this site.

BD4P was partially funded through a Patient-Centered Outcomes Research Institute (PCORI) Eugene Washington Engagement Award, with generous support from the American Society of Clinical Oncology (ASCO), the Biotechnology Innovation Organization (BIO), Celgene Corporation, Kaiser Permanente and PatientsLikeMe.

Why Patients Need Big Data

Data from electronic health records (EHRs), electronic medical records (EMRs), patient-reported outcomes (PROs), and physicians’ notes can inform and improve patient care. But without training in methods and applications, patients and advocates may not feel comfortable or prepared to participate in big data design and evaluation. This is not only a setback for patient engagement but also a detriment to research. BD4P training helps patients understand what big data is, how it is used in research and medicine, its promises and limitations, its challenges, and its potential impacts.

Big data can benefit patients across the spectrum of healthcare. It can be used to identify the most efficient and cost-effective treatments, recognize disease trends, forecast and prevent adverse events, and offer new treatments and products. In direct patient care, it can increase the coordination of care between providers by utilizing EHRs and other innovations. Big data can also lead to more comprehensive research, with findings informing a broad group of stakeholders.

Patient populations who may benefit from the use of big data in health include:

  • Rare disease population – a gathering of rare disease data from around the world can create large compilations of information in areas where there is little available data 
  • People with chronic conditions – a large collection of multi-year data from disparate sources allows for the review of longitudinal effects and results
  • People with Parkinson’s/neurological disorders – wearables and similar methods of patient-provided input allow for continuous tracking of symptoms, effects, and other pertinent data
  • Healthy patients – may help to identify predispositions to certain diseases or conditions; could provide information on susceptibilities to illness by identifying specific risks and habits

What the Term Means

The term big data is often used to describe very large data sets that cannot be easily analyzed using traditional methods. Related to big data is data science, which refers to the multidisciplinary approach to analysis and evaluation of diverse data sets to garner information. The term “big data” is typically used for the tangible data, whereas “data science” is the practice of using and applying the data. 

Data science is the intersection of data analysis, computer science, and statistics, and can be used to discover new trends, analyze existing trends, and confirm the reliability of data sets. Data science now involves the gathering and analysis of data from disparate sources, including wearables (e.g., Fitbit or Garmin), smartphones, and social media platforms.

More definitions, and background on big data terminology, are available in the first BD4P training module, called "Introduction to Big Data and Data Science." All seven BD4P PowerPoint training modules are available to download from this site's "Training Tools" tab.

Big data in health is data created by patients, physicians, hospitals, and researchers, and encompasses behavioral, health, genetic, exposure, and other types of data. Big data can be used as a tool for better outcomes in health care delivery and may aid in the understanding of the risks and benefits of certain medications, identification of trends in clinical trials, and development of new medicines. Big data in health can be employed in the forecasting and prevention of adverse events and the improvement of care quality, research, and safety.

Big data in health can come from a variety of sources, including:

Big data is often described using "4 V's" -- volume, velocity, variety and veracity. Velocity refers to the speed of recorded or transmitted data and the rate of updates, either periodic or real-time. Volume is the term used to describe the size of the data set and to evaluate whether it is big enough to answer research questions with precision and help patients make informed decisions. Variety indicates the source and type of data being used (numerical, text, images), important in determining associations and illuminating trends. Veracity is a measure of how well the data reflects reality: its accuracy, trustworthiness and objectivity. 


How to Use Big Data

Data can be used in biomedical research to understand the inception and development of disease, detect new treatments, and develop new medical products to improve health. Utilizing big data in health and biomedicine is important because it allows for the global exchange of diverse data to supply the best information and improve decision-making. Ultimately big data in health can be used in the development of personalized analyses and treatments that are customized for each individual. 

Deriving meaning from health data is one of the main curriculum themes of BD4P and requires an understanding of data types and sources, as well as data science terminology. Issues of use are outlined in the training module "Making Big Data Useable." Working with data from different sources presents challenges when data aren't organized in the same way or don't use the same units of measurement. Combining data for analysis can be problematic because of issues of integration between devices or software and interoperability between different information technologies. Structured data, organized into pre-defined models, is often easier to work with than data that is unstructured or collected from new sources like wearable medical devices or personal genomics information. For patients to use big data, it's important to understand the differences between standalone, federated and distributed databases (an example of which is the Foundation's IMEDS program, which uses a common data model to ensure interoperability and data standardization).
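
For readers who want to see what "not organized in the same way" looks like in practice, here is a minimal Python sketch. It is not taken from the BD4P modules or from IMEDS. The record layouts, field names, and sample values are all hypothetical, chosen only to show how two sources that name fields differently and use different units can be mapped into one shared format before they are combined.

```python
from dataclasses import dataclass

# One international pound is defined as exactly 0.45359237 kilograms.
LB_TO_KG = 0.45359237


@dataclass
class WeightRecord:
    """A shared (hypothetical) format that every source is mapped into."""
    patient_id: str
    weight_kg: float
    source: str


def from_clinic(row: dict) -> WeightRecord:
    # Hypothetical clinic export: weight is already recorded in kilograms.
    return WeightRecord(row["patient_id"], float(row["weight_kg"]), "clinic")


def from_wearable(row: dict) -> WeightRecord:
    # Hypothetical wearable export: different field names, weight in pounds.
    kilograms = round(float(row["weight_lb"]) * LB_TO_KG, 2)
    return WeightRecord(row["user"], kilograms, "wearable")


# Made-up sample rows, for illustration only.
clinic_rows = [{"patient_id": "p1", "weight_kg": "70.5"}]
wearable_rows = [{"user": "p1", "weight_lb": "154.0"}]

# After mapping, records from both sources can be analyzed together.
combined = [from_clinic(r) for r in clinic_rows] + [
    from_wearable(r) for r in wearable_rows
]

for record in combined:
    print(record)
```

Real health data sets involve far more fields and much stricter standards, but the idea is the same: agree on one structure and one set of units, then convert each source into it.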

How Big Data is Different

Traditional research is often defined as being primarily hypothesis-driven and can answer causation and association questions. In traditional research approaches, a hypothesis is tested through experiments such as a randomized controlled trial or through analysis of observational data. Big data research does not follow a set method or necessarily follow a pre-specified data collection/compilation plan. In big data research approaches, a hypothesis is generated after data reveal patterns. By accessing larger sets and types of data, big data research can overcome some types of sampling bias and identify differences in outcomes for particular demographics, even years after a treatment.

Differences in approaches are outlined in the downloadable training module titled "Big Data vs. Traditional Research."

Policies and Privacy

It is important to remember that every benefit has risks and