Big Data refers to datasets that standard data processing software cannot handle because of their size, variability, and complexity. Extracting valuable information from such datasets requires new designs, techniques, algorithms, and analytics. Once the amount of data in a dataset grows beyond a critical point, problems of data quantity turn into problems of quality, affecting how data are captured, processed, stored, analyzed, and visualized.
In the context of health research, the Health Directorate of the Directorate-General for Research and Innovation of the European Commission defines Big Data as high-volume, high-diversity biological, clinical, environmental, and lifestyle information collected from single individuals to large cohorts, in relation to their health and wellness status, at one or several time points.(1)
Although Big Data is frequently characterized by “the 4 Vs” (volume, velocity, variety, and veracity), what makes it unique goes beyond size or volume. Its ability to reflect the status quo almost without bias, connect with other datasets, accumulate value over time, and provide a multidimensional, system-level understanding should also be considered.(2) More recently, a fifth V has been proposed for Big Data: value.
Specific to healthcare, Dinov et al. mention two important additional characteristics of Big Data:
Primary sources of Big Data in healthcare include administrative databases (insurance claims and pharmaceutical data), clinical databases, electronic health records, and laboratory information systems. Additional data come from biometric registries (wearable or sensor-generated data), patient-reported data (standardized health surveys), social media, medical imaging, and biomarkers (including genomic, proteomic, and metabolomic data).(2)
Big Data analytics commonly uses data mining methods such as classification, clustering, and regression. Rumsfeld et al. describe at least eight areas in which Big Data analytics can improve healthcare: 1) predictive modeling for risk and resource use, 2) population management, 3) drug and medical device safety surveillance, 4) disease and treatment heterogeneity, 5) precision medicine and clinical decision support, 6) quality of care and performance measurement, 7) public health, and 8) research applications.(3)
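To make one of these data mining methods concrete, the following is a minimal sketch of clustering using plain k-means (Lloyd's algorithm) in pure Python. The patient features (age, BMI) and all values are hypothetical, chosen only for illustration. In practice, features would usually be standardized before clustering, and a library implementation (for example, scikit-learn's `KMeans`) would typically replace a hand-rolled loop.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means (Lloyd's algorithm) on a list of numeric feature tuples.

    Returns (labels, centroids); labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct input points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
        # Converged: assignments did not change since the previous iteration.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Hypothetical (age, BMI) records for six patients.
patients = [(25, 22), (30, 24), (28, 23), (65, 31), (70, 33), (68, 30)]
labels, centroids = kmeans(patients, k=2)
print(labels, centroids)
```

With these toy records, the three younger patients end up in one cluster and the three older patients in the other; which cluster receives index 0 depends on the random initialization.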
The development of Big Data analytics faces many challenges, such as integrating structured, semi-structured, and unstructured data from many sources and managing fragmented data. Progress is also hindered by the limitations of observational data, by data structure, and by data standardization issues. Other issues include data inaccuracy and inconsistency, data reliability, semantic interoperability, network bandwidth, scalability, and the cost of building the analytics system. Future research and real-world experience will guide the development of these technologies.(3)
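Some of these integration problems (heterogeneous formats, fragmented records, inconsistent units) can be illustrated with a small Python sketch that joins a structured, CSV-style extract with a semi-structured JSON feed by patient identifier. The data, the field names (`patient_id`, `weight`, `steps`), the 2 kg tolerance, and the rule that the structured source wins on conflict are all hypothetical assumptions for illustration, not a description of any particular system. Unstructured data such as free-text notes are not covered.

```python
import csv
import io
import json

LB_TO_KG = 0.45359237  # exact pound-to-kilogram conversion factor

# Hypothetical structured extract (e.g. a tabular EHR export), weight already in kg.
STRUCTURED_CSV = """patient_id,age,weight_kg
P1,54,80.0
P2,61,72.5
P3,47,90.0
"""

# Hypothetical semi-structured feed (e.g. device or app JSON): optional fields, mixed units.
SEMI_STRUCTURED_JSON = """[
  {"id": "P1", "weight": {"value": 176.0, "unit": "lb"}, "steps": 6500},
  {"id": "P2", "weight": {"value": 80.0, "unit": "kg"}},
  {"id": "P4", "steps": 9000}
]"""


def to_kg(weight):
    """Normalize a {'value', 'unit'} weight object to kilograms; None if absent."""
    if weight is None:
        return None
    if weight["unit"] == "lb":
        return weight["value"] * LB_TO_KG
    if weight["unit"] == "kg":
        return weight["value"]
    raise ValueError(f"unknown unit: {weight['unit']}")


def integrate(structured_csv, semi_json, tolerance_kg=2.0):
    """Merge both sources by patient ID and collect data-quality issues."""
    rows = {r["patient_id"]: r for r in csv.DictReader(io.StringIO(structured_csv))}
    feed = {rec["id"]: rec for rec in json.loads(semi_json)}
    merged, issues = {}, []
    for pid in sorted(rows.keys() | feed.keys()):
        row, rec = rows.get(pid), feed.get(pid)
        if row is None or rec is None:
            # Fragmented data: the patient appears in only one source.
            missing = "structured" if row is None else "semi-structured"
            issues.append((pid, "missing in " + missing))
        ehr_kg = float(row["weight_kg"]) if row else None
        feed_kg = to_kg(rec.get("weight")) if rec else None
        if ehr_kg is not None and feed_kg is not None and abs(ehr_kg - feed_kg) > tolerance_kg:
            # Inconsistency: the sources disagree by more than the tolerance.
            issues.append((pid, "weight mismatch"))
        merged[pid] = {
            "age": int(row["age"]) if row else None,
            # Hypothetical precedence rule: prefer the structured value when present.
            "weight_kg": ehr_kg if ehr_kg is not None else feed_kg,
            "steps": rec.get("steps") if rec else None,
        }
    return merged, issues


merged, issues = integrate(STRUCTURED_CSV, SEMI_STRUCTURED_JSON)
print(issues)
```

Here P1's weight agrees across sources once pounds are converted to kilograms, P2's weights disagree by 7.5 kg and are flagged, and P3 and P4 each appear in only one source.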