Big Data and Big Data Analytics

Published by Daniel on April 1, 2022

Big Data refers to vast and unique data sets that are changing the frontiers of many practices; healthcare is no exception...

Big Data refers to datasets that standard data processing software cannot handle because of their size, variability, and complexity. Extracting valuable information from such datasets requires new designs, techniques, algorithms, and analytics. Once the amount of data in a dataset grows past a critical point, problems of data quantity turn into problems of quality. These include issues in capturing, processing, storing, analyzing, and visualizing the data.

In the context of health research, the Health Directorate of the Directorate-General for Research and Innovation of the European Commission defines Big Data as high-volume, high-diversity biological, clinical, environmental, and lifestyle information collected from single individuals to large cohorts, in relation to their health and wellness status, at one or several time points.(1)

Characteristics of Big Data

Although Big Data is frequently characterized by “the 4 Vs” (volume, velocity, variety, and veracity), what makes it unique goes beyond characteristics such as size or volume. Its ability to convey the status quo almost without bias, connect with other datasets, accumulate value over time, and create a multi-dimensional, system-level understanding should also be considered.(2) More recently, a fifth V has been proposed for Big Data: value.

Volume refers to the quantity of Big Data in healthcare. Total global data storage is projected to exceed 200 zettabytes by 2025 (the zettabyte is a measure of digital storage capacity equivalent to 1 billion terabytes). This includes data stored on private and public infrastructure, cloud data centers, personal computing devices, and IoT (Internet of Things) devices.

Velocity refers to the speed of data generation as well as data collection.
Variety refers to the different types of healthcare Big Data collected. The data are markedly heterogeneous and range from structured to unstructured medical data.
Veracity refers to the accuracy and trustworthiness of massive datasets. It is assessed by checking for inconsistencies, missing data, ambiguities, fraud, duplication, spam, and latency.
Value refers to the balance of costs and benefits for the decision-maker: Big Data allows decision-makers to act on insights derived from the data.
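The veracity checks listed above can be illustrated with a minimal sketch. It flags three of them (missing data, duplication, and implausible values as a simple kind of inconsistency) on a handful of hypothetical patient records. The field names and the plausibility range for systolic blood pressure are assumptions made for this example, not values taken from the sources cited here.

```python
from collections import Counter

# Hypothetical patient records; field names are illustrative only.
records = [
    {"patient_id": "P001", "age": 54, "systolic_bp": 132},
    {"patient_id": "P002", "age": None, "systolic_bp": 118},  # missing age
    {"patient_id": "P001", "age": 54, "systolic_bp": 132},    # exact duplicate
    {"patient_id": "P003", "age": 47, "systolic_bp": 410},    # implausible value
]

# Assumed plausibility range for systolic blood pressure (mmHg); a real
# pipeline would take such limits from clinical guidance.
SYSTOLIC_RANGE = (60, 260)

def veracity_report(rows):
    """Flag rows with missing values, count exact duplicates, flag out-of-range readings."""
    missing = [i for i, r in enumerate(rows) if any(v is None for v in r.values())]
    # Count identical records; every copy beyond the first is a duplicate.
    counts = Counter(tuple(sorted(r.items())) for r in rows)
    duplicates = sum(c - 1 for c in counts.values() if c > 1)
    lo, hi = SYSTOLIC_RANGE
    out_of_range = [
        i for i, r in enumerate(rows)
        if r["systolic_bp"] is not None and not lo <= r["systolic_bp"] <= hi
    ]
    return {"missing": missing, "duplicates": duplicates, "out_of_range": out_of_range}

print(veracity_report(records))
# {'missing': [1], 'duplicates': 1, 'out_of_range': [3]}
```

Checks for fraud, spam, or latency depend heavily on the data source and are not attempted in this sketch.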

Specific to healthcare, Dinov et al. mention two important additional characteristics of Big Data:

Energy refers to the holistic information contained in the data. The energy of an aggregated dataset is much higher than that of the individual databases, which makes it more useful for exploring associations.
Life-span refers to the value of the data beyond the time of acquisition, which decays at an exponential rate.(2)
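Exponential decay of data value can be written as V(t) = V0 · e^(−λt), where V0 is the value at acquisition and λ is a decay rate. The short sketch below evaluates this formula. The source does not give a decay rate, so the function takes a half-life instead (λ = ln 2 / half-life), and the 2-year half-life used in the example is purely hypothetical.

```python
import math

def data_value(initial_value, years_since_acquisition, half_life_years):
    """Exponential decay: V(t) = V0 * exp(-lam * t), with lam = ln(2) / half-life."""
    decay_rate = math.log(2) / half_life_years
    return initial_value * math.exp(-decay_rate * years_since_acquisition)

# Hypothetical example: a dataset valued at 100 units with a 2-year half-life.
for t in (0, 2, 4, 6):
    print(t, round(data_value(100.0, t, half_life_years=2.0), 1))
# 0 100.0
# 2 50.0
# 4 25.0
# 6 12.5
```

Expressing the rate as a half-life makes the consequence easy to read: the value halves every half-life, whatever the actual rate for a given dataset turns out to be.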

Sources of Healthcare Big Data

Some of the primary sources of Big Data in healthcare are administrative databases (insurance claims and pharmaceuticals), clinical databases, electronic health record data, and laboratory information system data. Other data come from biometric registries (wearable or sensor-generated), patient-reported data (standardized health surveys), social media, medical imaging, and biomarkers (including genomic, proteomic, and metabolomic data).(2)

Big Data Analytics

Big Data analytics commonly uses methods developed in data mining, such as classification, clustering, and regression. Rumsfeld et al. describe at least eight areas in which Big Data analytics can be applied to improve healthcare: 1) predictive modeling for risk and resource use, 2) population management, 3) drug and medical device safety surveillance, 4) disease and treatment heterogeneity, 5) precision medicine and clinical decision support, 6) quality of care and performance measurement, 7) public health, and 8) research applications.(3)
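To make one of these data-mining methods concrete, here is a minimal sketch of clustering using plain k-means (Lloyd's algorithm). The text does not name a specific clustering algorithm; k-means is chosen only as a common example. The six (age, BMI) pairs are hypothetical, and the features are left unscaled for brevity. Because of this, age dominates the distance, and a real analysis would usually standardize the features first.

```python
import math

def kmeans(points, k, iterations=100):
    """Plain k-means (Lloyd's algorithm), seeded with the first k points as centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Hypothetical (age in years, BMI) pairs for six patients; not real data.
patients = [(25, 22.0), (65, 31.0), (30, 24.0), (70, 33.0), (28, 23.0), (68, 30.0)]
labels, centroids = kmeans(patients, k=2)
print(labels)                                          # [0, 1, 0, 1, 0, 1]
print([[round(x, 1) for x in c] for c in centroids])   # [[27.7, 23.0], [67.7, 31.3]]
```

Seeding with the first k points keeps the example deterministic. Library implementations typically use smarter seeding, such as k-means++.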

Perspectives

The development of Big Data analytics faces many challenges, such as integrating structured, semi-structured, and unstructured data from many sources and managing fragmented data. Development is also hindered by the limitations of observational data, by data structure, and by data standardization issues. Other issues include data inaccuracy and inconsistency, data reliability, semantic interoperability, network bandwidth, scalability, and the cost of building the analytics system. Future research and real-world experience will direct the development of these technologies.(3)
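Data integration and standardization problems often come down to mapping records from different sources onto one schema with common units. The minimal sketch below illustrates this for a single measurement. Both source schemas and their field names are hypothetical, and glucose is used only because its unit conversion is well known: 1 mmol/L is about 18 mg/dL, based on a molar mass of roughly 180 g/mol.

```python
# Hypothetical schemas: source A stores glucose in mg/dL under "glucose_mgdl",
# source B stores it in mmol/L under "glu_mmol". Field names are illustrative.
MGDL_PER_MMOL_L = 18.0  # approximate factor for glucose (molar mass ~180 g/mol)

def harmonize(record, source):
    """Map a source-specific record onto a common schema with glucose in mmol/L."""
    if source == "A":
        return {
            "patient_id": record["id"],
            "glucose_mmol_l": record["glucose_mgdl"] / MGDL_PER_MMOL_L,
        }
    if source == "B":
        return {"patient_id": record["pid"], "glucose_mmol_l": record["glu_mmol"]}
    raise ValueError(f"unknown source: {source}")

source_a = [{"id": "P001", "glucose_mgdl": 99.0}]
source_b = [{"pid": "P002", "glu_mmol": 6.1}]

combined = [harmonize(r, "A") for r in source_a] + [harmonize(r, "B") for r in source_b]
for row in combined:
    print(row["patient_id"], round(row["glucose_mmol_l"], 2))
# P001 5.5
# P002 6.1
```

Real integration work also involves semantic interoperability, meaning that codes and terms agree across sources. That is beyond a per-field mapping like this one.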