Understanding Chronic Conditions Through Big Data

By Paul Cerrato

Anyone reading through the blogs, commentaries and news analyses on big data can see that the subject has captured the imagination of data scientists and clinicians alike. Unfortunately, many have let their imaginations run wild. Some see big data as the medical profession's savior, with the promise to transform it into a well-oiled machine that will pinpoint preventive strategies and produce accurate diagnoses and precisely calculated treatment options. Others demonize it, believing big data is mostly smoke and mirrors, the product of overly enthusiastic vendors and shortsighted health care executives.


There's little doubt that big data is more than a buzzword; it represents a cultural shift in the way many industries operate. However, to arrive at any meaningful conclusions on its value, we first have to define the term. One reason it's called "big" is its volume: there is a lot of it. While a personal computer may hold a terabyte (about 1,000 gigabytes) or more of data on its hard drive, the servers at Beth Israel Deaconess Medical Center (BIDMC) in Boston, one of the hospitals affiliated with Harvard Medical School, hold more than three petabytes of patient data, the equivalent of the contents of the Library of Congress.1 (One petabyte is equivalent to about 1,000 terabytes.)

Gaining a Deeper Understanding of Big Data

Big data is also defined by several other “V’s,” including variety and velocity. The huge storehouses of data currently being analyzed by health care organizations like BIDMC hold a diverse collection of information. It may include patients’ clinical information stored in electronic health records, billing information in the form of International Classification of Diseases (ICD) codes, the results of tens of thousands of patient satisfaction surveys, social media content, genome sequencing data and much more. The speed with which this data can be analyzed, through the use of supercomputers and networks of computers spanning the globe, would impress most futuristic-minded science fiction fans. The IBM Watson supercomputer, for example, can process more than 180,000 gigabytes of data per second. By way of comparison, a desktop computer can process only a small fraction of that amount.

The actionable insights being generated by data analytics are anything but fiction. A closer look at some of the studies demonstrates that analysis of these massive databases is improving health care. An evaluation of nearly 1 million patient records published in the Journal of the American Medical Informatics Association, for instance, was able to identify patients at highest risk for thromboembolism, in which blood clots obstruct vessels in the arms, legs, lungs or heart. Even more encouraging was the fact that the analysis, which was performed by Explorys, a data analytics division of IBM, required only minimal manpower and 125 hours to perform. A manual analysis would have taken far longer.2

Similarly, an Explorys project that looked at patient records from Catholic Health Partners in Cincinnati was able to increase breast cancer screenings by 13 percent. The same analysis was used to target diabetic patients in need of hemoglobin A1c (HbA1c) testing, which measures long-term blood glucose levels. The project resulted in a 3 percent increase in HbA1c testing.3

The actionable insights generated by analyzing large volumes of health data have also had an impact on clinicians trying to control high blood pressure, one of the nation's most serious public health concerns and a major risk factor for heart disease and stroke. Many patients receive a diuretic called hydrochlorothiazide to reduce their blood pressure, but one of the side effects of the drug is that it can cause the body to lose too much potassium, which in turn can cause palpitations, cramping of the arms and legs, fainting and psychological complications. To reduce that risk, some physicians also prescribe triamterene, which has potassium-sparing properties.

Many experts have suspected that triamterene lowers blood pressure while enhancing the benefits of hydrochlorothiazide. Wanzhu Tu, PhD, from Indiana University School of Medicine, and his colleagues studied electronic health records from more than 17,000 patients with hypertension, comparing patients taking hydrochlorothiazide alone to those taking it with several other antihypertensive agents. Their analysis revealed that combining triamterene with hydrochlorothiazide lowered blood pressure by an additional 3.8 mmHg, which is enough of an improvement to have clinical benefits.4

Why were Tu and his associates able to detect a benefit when earlier studies did not? In their study, they state: "The difference in findings may be due to the much larger sample size we were able to analyze in this study. Considering the large variability of BP, it is difficult for studies with small sample sizes to find moderate differences in BP." Their comments illustrate one of the advantages of big data: it can yield statistically significant results that emerge only when an especially large number of patients is examined. In this study, those results came from electronic health records.
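The sample-size point can be made concrete with a rough calculation. The sketch below is not the method Tu and his colleagues used; it is a textbook normal-approximation formula for comparing the means of two equal-sized groups. The 3.8 mmHg difference comes from the study as reported here, while the 15 mmHg standard deviation, the 5 percent significance level and the 80 percent power are illustrative assumptions, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def patients_per_group(delta_mmhg, sd_mmhg, alpha=0.05, power=0.80):
    """Approximate number of patients needed in each of two equal groups
    to detect a mean difference of delta_mmhg with a two-sided z-test.

    Uses the standard approximation n = 2 * ((z_alpha + z_power) * sd / delta)^2.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    n = 2 * ((z_alpha + z_power) * sd_mmhg / delta_mmhg) ** 2
    return math.ceil(n)


# Assumption for illustration only: blood pressure readings vary between
# patients with a standard deviation of about 15 mmHg.
ASSUMED_SD_MMHG = 15.0

# Smaller differences require far more patients to detect reliably.
for delta in (10.0, 3.8, 1.0):
    n = patients_per_group(delta, ASSUMED_SD_MMHG)
    print(f"difference of {delta} mmHg: about {n} patients per group")
```

Because the required group size grows with the square of the variability divided by the difference, a moderate effect in a highly variable measure such as blood pressure can easily be missed by a small study. Real observational analyses of health records also have to adjust for differences between the patient groups, which this simple formula ignores.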

The Precision Medicine Initiative

The American public will likely see even more pronounced benefits from a new big data initiative to be launched by the federal government. Called the Precision Medicine Initiative (PMI), this $215 million project will include the collection of health data from a million or more volunteers5 and will help address one of the shortcomings of big data.

To date, analysis of massive medical databases has been limited to what investigators call intermediate endpoints. The benefits mentioned above on hypertension, diabetes and heart disease suggest that big data analytics may reduce the incidence of these diseases and the mortality they incur, but they do not prove it. They only show that such analyses reduce risks, increase screenings and the like. Ultimately, large-scale studies are needed to demonstrate that these analyses directly reduce the nation's disease burden, reduce mortality rates and reduce health care expenditures.

There's every reason to believe that it's only a matter of time before these definitive studies are available. And as the research accumulates to demonstrate the value of data analytics in health care, public health professionals, health care executives and clinicians will need to become fully informed of these developments, looking for opportunities to put the new findings into effect “in the trenches.”


1. Cerrato, P. Beth Israel Deaconess Medical Center Reinvents Itself Once Again. Healthcare Informatics. May 7, 2014.

2. Kaelber DC et al. “Patient characteristics associated with venous thromboembolic events: a cohort study using pooled electronic health record data.” Journal of the American Medical Informatics Association.

3. Cerrato, P. "7 Big Data Solutions Try To Reshape Healthcare." InformationWeek Healthcare. Dec 18, 2012.

4. Tu W, et al. “Triamterene Enhances the Blood Pressure Lowering Effect of Hydrochlorothiazide in Patients with Hypertension.” Journal of General Internal Medicine. 2015; 31(1):30-36.

5. Precision Medicine Initiative (PMI) Working Group Report to the Advisory Committee to the Director, NIH. The Precision Medicine Initiative Cohort Program — Building a Research Foundation for 21st Century Medicine. Sept. 17, 2015.

Paul Cerrato has more than 30 years of experience working in health care and has written extensively on patient care, electronic health records, protected health information security, practice management and clinical decision support. He has served as editor of InformationWeek Healthcare, executive editor of Contemporary OB/GYN, senior editor of RN and contributing writer/editor for the Yale University School of Medicine, the American Academy of Pediatrics, InformationWeek, Medscape, Healthcare Finance News, iMedicalApps.com and MedPage Today.
