Big Data for Patients

Big data in health is patient data, so patient participation is vital in creating beneficial and patient-centered research initiatives. The new and emerging field of data science is rapidly evolving, and the methods of accessing, integrating, and interpreting data are becoming increasingly complex.

Big Data For Patients (BD4P) was a two-year Reagan-Udall Foundation for the FDA program designed to give patient advocates a basic understanding of the science and enable full and effective participation in big data initiatives. It brought together multiple stakeholder groups to create a replicable training program, with tools and best practices now accessible on this site.

BD4P was partially funded through a Patient-Centered Outcomes Research Institute (PCORI) Eugene Washington Engagement Award, with generous support from the American Society of Clinical Oncology (ASCO), the Biotechnology Innovation Organization (BIO), Celgene Corporation, Kaiser Permanente and PatientsLikeMe.

Why Patients Need Big Data

Data from electronic health records (EHRs), electronic medical records (EMRs), patient-reported outcomes (PROs), and physicians’ notes can inform and improve patient care. But without training in methods and applications, patients and advocates may not feel comfortable or prepared to participate in big data design and evaluation. This is not only a setback for patient engagement but also a detriment to research. BD4P training helps patients understand what big data is, how it is used in research and medicine, its promises and limitations, its challenges and its potential impacts.

Big data can benefit patients across the spectrum of healthcare. It can be used to identify the most efficient and cost-effective treatments, recognize disease trends, forecast and prevent adverse events, and offer new treatments and products. In direct patient care, it can increase the coordination of care between providers by utilizing EHRs and other innovations. Big data can also lead to more comprehensive research, with findings informing a broad group of stakeholders.

Patient populations who may benefit from the use of big data in health include:

  • Rare disease population – a gathering of rare disease data from around the world can create large compilations of information in areas where there is little available data 
  • People with chronic conditions – a large collection of multi-year data from disparate sources allows for the review of longitudinal effects and results
  • People with Parkinson’s/neurological disorders – wearables and similar methods of patient-provided input allow for continuous tracking of symptoms, effects, and other pertinent data
  • Healthy patients – may help to identify predispositions to certain diseases or conditions; could provide information on susceptibilities to illness by identifying specific risks and habits

What the Term Means

The term big data is often used to describe very large data sets that cannot be easily analyzed using traditional methods. Related to big data is data science, which refers to the multidisciplinary approach to analysis and evaluation of diverse data sets to garner information. The term “big data” is typically used for the tangible data, whereas “data science” is the practice of using and applying the data. 

Data science is the intersection of data analysis, computer science, and statistics, and can be used to discover new trends, analyze existing trends, and confirm the reliability of data sets. Data science now involves the gathering and analysis of data from disparate sources, including wearables (e.g., Fitbit or Garmin), smartphones, and social media platforms.

More definitions, and background on big data terminology, are available in the first BD4P training module, called "Introduction to Big Data and Data Science." All seven BD4P PowerPoint training modules are available to download from this site's "Training Tools" tab.

Big data in health is data created by patients, physicians, hospitals, and researchers, and encompasses behavioral, health, genetic, exposure, and other types of data. Big data can be used as a tool for better outcomes in health care delivery and may aid in the understanding of the risks and benefits of certain medications, identification of trends in clinical trials, and development of new medicines. Big data in health can be employed in the forecasting and prevention of adverse events and the improvement of care quality, research, and safety.

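Big data in health can come from a variety of sources, including electronic health records, patient-reported outcomes, physicians' notes, wearables, smartphones, and social media platforms.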

Big data is often described using the "4 V's": volume, velocity, variety and veracity. Volume describes the size of the data set and is used to evaluate whether it is big enough to answer research questions with precision and help patients make informed decisions. Velocity refers to the speed at which data are recorded or transmitted and the rate of updates, either periodic or real-time. Variety indicates the source and type of data being used (numerical, text, images), which is important in determining associations and illuminating trends. Veracity is a measure of how well the data reflects reality: its accuracy, trustworthiness and objectivity.

 

How to Use Big Data

Data can be used in biomedical research to understand the inception and development of disease, discover new treatments, and develop new medical products to improve health. Utilizing big data in health and biomedicine is important because it allows for the global exchange of diverse data to supply the best information and improve decision-making. Ultimately, big data in health can be used in the development of personalized analyses and treatments that are customized for each individual.

Deriving meaning from health data is one of the main curriculum themes of BD4P and requires an understanding of data types and sources, as well as data science terminology. Issues of use are outlined in the training module "Making Big Data Useable." Working with data from different sources presents challenges when data aren't organized in the same way or don't use the same units of measurement. Combining data for analysis can be problematic because of issues of integration between devices or software and interoperability between different information technologies. Structured data, organized into pre-defined models, is often easier to work with than data that is unstructured or collected from new sources like wearable medical devices or personal genomics information. For patients to use big data, it's important to understand the differences between standalone, federated and distributed databases. One example of a distributed database is the Foundation's IMEDS program, which uses a common data model to ensure interoperability and data standardization.
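As a rough illustration of the integration problem described above, the short Python sketch below combines body-weight readings from two sources that use different field names and different units. Everything in it is an assumption made for illustration: the records, the field names and the shared layout are invented, and the sketch does not represent the IMEDS common data model or any BD4P material.

```python
# Hypothetical records describing the same measurement (body weight)
# with different field names and different units of measurement.
clinic_records = [
    {"patient_id": "A1", "weight_kg": 70.0},
    {"patient_id": "B2", "weight_kg": 82.5},
]
wearable_records = [
    {"user": "A1", "weight_lb": 154.0},
    {"user": "C3", "weight_lb": 200.0},
]

LB_PER_KG = 2.20462  # pounds in one kilogram


def from_clinic(record):
    """Map a clinic record onto the shared layout (already in kilograms)."""
    return {
        "patient_id": record["patient_id"],
        "weight_kg": record["weight_kg"],
        "source": "clinic",
    }


def from_wearable(record):
    """Map a wearable record onto the shared layout, converting pounds to kilograms."""
    return {
        "patient_id": record["user"],
        "weight_kg": round(record["weight_lb"] / LB_PER_KG, 1),
        "source": "wearable",
    }


# Once every record uses the same fields and units, the sources can be combined.
combined = [from_clinic(r) for r in clinic_records] + [
    from_wearable(r) for r in wearable_records
]

for row in combined:
    print(row)
```

The point of the sketch is only that each source needs its own translation step before the data can be analyzed together. Real common data models cover many more fields, coding systems and quality checks.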

How Big Data is Different

Traditional research is often defined as being primarily hypothesis-driven and can answer causation and association questions. In traditional research approaches, a hypothesis is tested through experiments such as a randomized controlled trial or through observational data. Big data research does not follow a set method or necessarily follow a pre-specified data collection/compilation plan. In big data research approaches, a hypothesis is generated after data reveal patterns. By accessing larger sets and types of data, big data research can overcome some types of sampling bias and identify differences in outcomes for particular demographics, even years after a treatment.

Differences in approaches are outlined in the downloadable training module titled "Big Data vs. Traditional Research."
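The pattern-first approach described in this section can be sketched in a few lines of Python. The example below is a minimal illustration under stated assumptions: the follow-up records, the age groups and the function name improvement_rate_by_group are all invented, and the code is not taken from the BD4P modules.

```python
from collections import defaultdict

# Made-up follow-up records, for illustration only.
records = [
    {"age_group": "18-39", "improved": True},
    {"age_group": "18-39", "improved": True},
    {"age_group": "18-39", "improved": True},
    {"age_group": "18-39", "improved": False},
    {"age_group": "40-64", "improved": True},
    {"age_group": "40-64", "improved": False},
    {"age_group": "65+", "improved": False},
    {"age_group": "65+", "improved": False},
    {"age_group": "65+", "improved": True},
    {"age_group": "65+", "improved": False},
]


def improvement_rate_by_group(rows, group_key):
    """Return the share of patients who improved within each group."""
    totals = defaultdict(int)
    improved = defaultdict(int)
    for row in rows:
        group = row[group_key]
        totals[group] += 1
        if row["improved"]:
            improved[group] += 1
    return {group: improved[group] / totals[group] for group in totals}


rates = improvement_rate_by_group(records, "age_group")
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.0%} improved")
```

A difference between groups found this way only suggests a hypothesis, for example that a treatment works less well in older patients. It does not show why the difference exists, which is where hypothesis-driven methods come back in.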

Policies and Privacy

It is important to remember that every benefit has risks and that big data is not a remedy for all concerns. However, some promises of big data use include better predictions about health risks, faster treatment developments, more rapid progress toward precision medicine, and more efficient use of health resources. Some risks of big data use include immature methodology resulting in incorrect interpretations and security breaches resulting in loss of privacy.

Concerns surrounding the use of big data in health include:

  • Privacy of collected data – who will see the data and how will confidentiality be protected?
  • Data control and sharing – who owns the data and how will it be shared?
  • Informed consent and returning results – who will make sure participants understand the risks, and will they be able to access the data?
  • Methods of data collection and analysis 
  • Ethical consideration of using predictive analytics to disseminate resources and make decisions 
  • Ethical concerns of responsible information use – who will ensure that decisions made from data results will be determined using sound judgement and understanding? 
  • Research priorities – what/who will be able to participate in research initiatives? 
  • Representative quality of collected data – is the data useful and representative of the correct population? 
  • Payment and sustainability – who will be responsible for paying for research and delivery initiatives, and how can these programs be sustained?

Privacy is the right to control access to ourselves and our personal information. As more and more data is being collected, several federal laws help protect patients' health data and privacy, including the Health Insurance Portability and Accountability Act (HIPAA), the Genetic Information Nondiscrimination Act (GINA) and a baseline standard of ethics in the Federal Policy for the Protection of Human Subjects, known as the Common Rule. How these laws impact patients is explored in further detail in the BD4P module titled "Legal and Ethical Issues."

Data access, which refers to who can obtain and use data, and data stewardship, which looks at responsibilities and accountability in big data, are covered in the BD4P training module "Data Access and Stewardship." The training gives examples of transparency, how to articulate the purpose of research and how to limit data collection to what is relevant to that purpose.

Role of Patient Advocates

A primary goal of BD4P was to develop a community of informed and empowered advocates who understand what big data is, how it is being used in research and medicine, its promises and limitations, the challenges, the impact on patients, and how they can use this knowledge. The final two training modules address "Advocates and other Big Data Stakeholders" and "Improving Advocacy Skills to Impact Big Data."

Patients can benefit from big data and help advance innovations by sharing data, outcomes, and experiences with other patients, physicians, and researchers. Examples of this are patient-focused data-sharing platforms in which patients upload their health information and consent to sharing it for research or with other patients. They can also participate in discussion boards and share information with other advocates. Big data can also benefit patients by providing access to diverse disease information in areas such as rare diseases, where the patient population may be small and spread out. Advocates versed in big data can collaborate with others by:

  • Providing scientific and technical knowledge to navigate the landscape of big data
  • Applying evidence-based decision making to key issues in big data research of relevance to their constituency
  • Partnering with scientists, clinicians, and others to ensure patient-centeredness in big data research
  • Providing an educated patient perspective at public forums and in print on issues relating to big data
  • Participating on scientific advisory committees, steering committees, review panels, study sections, etc.