On top of the data mountain

ITIJ 191, December 2016
With corporations and institutions in all sectors amassing big data, Tatum Anderson looks at how insurers and medical providers are avoiding information overload and are instead making clever use of their accumulated facts and figures to enhance patient care, offer tailored products and make cost savings
Almost every company that deals with patients or insured clients gathers and stores gargantuan amounts of information every day. That data comes from multiple sources – claims data, electronic health records (EHR), scans, cost data that describes what services were provided and how they were reimbursed, real-time data on blood pressure, temperature and other vital signs from sensors – and this mountain of data keeps on growing.
The reason that the amount of medical data continues to grow is partly down to the increasing digitisation of medical records. Certainly, their use has risen sharply in recent years. Just 30 per cent of US office-based physicians and hospitals used even basic electronic medical records in 2005. By 2011, half of these doctors and three-quarters of hospitals were using them. And – although interoperability between different EHR systems is still an issue – increasing numbers of doctors are able to access yet more records from other physicians and hospitals, through hospital information exchanges and EHR vendor networks.
The mountain of medical-related data is also growing because decades of stored government data – everything from disease incidence to spending on treatments – are also being made accessible and searchable online. Even the Chinese government announced, in June, a massive big data project encompassing its health and medical sectors.
With so much information has come the promise that big data will transform healthcare. As well as better health outcomes, preventing unnecessary spending is a core reason for organisations to pursue healthcare-related big data projects – and many insurers are now very much focused on efforts in this area.
George N. Argesanu, head of advanced analytics at AIG, says insurers are investing heavily in analytics to help change business decisions and improve cost containment. “The recent advancements in technology, and availability and access of data, have opened a lot of other doors and this is why we see so much more focus on analytics within insurance,” he said. “[We are] creating a continuous dynamic feedback loop where developments/changes in risk are quickly turned back into actions that can be taken to limit the potential of future losses.”
Better together
There are multiple benefits of analysing these data sets. Correlating claims information with particular doctors could reveal their effectiveness and influence employment decisions and pay rates. Big data has already been used to refine treatment plans to reduce sepsis, the costliest hospital condition in the US, too. And it has started to deliver transparency. One big data project analysed 92 billion health insurance claims from 88 million people covered by three of the US’s largest insurance companies. It revealed enormous variations in the cost of private healthcare at different hospitals (and different spending patterns when compared with publicly funded treatments).
For these reasons, and for many others, a number of insurers are examining how analytics could optimise their operations and are developing big data strategies. Mathieu Lambert, chief of pricing at AXA Belgium P&C Retail, said his company’s plan will be ready towards the end of the year. “In AXA, we are currently in a transformation journey around data,” he commented.
Allianz has created a division dedicated to inventing and building powerful data analytics and applications, and US insurance companies including UnitedHealth and Aetna have even spun off analytics firms that carry out complex analysis for other companies too.
These big data projects are nontrivial. That’s because they are not just tasked with manipulating huge volumes of data; these projects must manipulate fast-changing data sources – such as from radio-frequency identification tags or sensors that operate in near real-time. They must manipulate data that is inconsistent, too, or that arrives in daily, seasonal or event-triggered peaks. They must also be able to cope with a variety of data formats – from structured, numeric data in traditional databases, to unstructured data, such as output from medical devices, doctors’ notes, lab results, imaging reports, medical correspondence and clinical data. Unstructured data, in particular, is seen as an invaluable resource for improving patient care and increasing efficiency. In pharmaceuticals, unstructured data may, for instance, describe a drug’s method of action, side effects and toxicity as well as patient behaviour, activities and preferences.
Crucially, big data projects are combining these datasets, sometimes for the first time, in order to discover new and creative cost containment measures. New techniques in applied mathematics and computer sciences are increasingly effective at recognising patterns in messy data too. Machine learning, a type of artificial intelligence, for instance, is helping these IT systems to learn without being explicitly programmed; they are teaching themselves to change when exposed to new data. 
Motivating factors
A move from fee-for-service to value-based reimbursement is a driver for evaluating costs and cost drivers, as are fines. In the US, Medicare has started to penalise hospitals for unnecessary readmissions, which are extremely costly. Hospitalisations generally account for more than 30 per cent of the two-trillion-dollar annual cost of healthcare in the US. But, worryingly, a fifth of all hospital admissions in the US occur within 30 days of a previous discharge.
So, analysts are now starting to combine clinical and claims information and create algorithms to better predict, for instance, which chronic heart failure patients might be most susceptible to readmission within 30 days. That way, treatment plans can be adjusted to prevent readmission.
This is predictive analytics. It’s a hot area in big data because it could potentially save money by preventing major hospital events. That involves predicting what might happen and then working out more effective interventions for chronic conditions and preventable infectious diseases. So, perhaps a patient’s blood pressure is below a level considered hypertensive. Analytics could spot patients who are living sedentary lifestyles, have been resistant to changing behaviour, or whose parents are hypertensive. Clinical decisions can be made about whether to put such patients on blood pressure medicine now, before they become fully hypertensive, or find alternative methods to help them modify their lifestyle.
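The kind of risk stratification described above can be sketched in a few lines of Python. This is purely illustrative: the features, weights and intervention threshold below are assumptions for the sake of example, not any insurer's actual model, which would typically be learned from claims and clinical data rather than hand-coded.

```python
# Hypothetical sketch of a predictive risk score for progression to
# hypertension. Feature names and weights are illustrative assumptions,
# not taken from any real clinical or actuarial model.

def hypertension_risk(patient):
    """Return a 0-1 risk estimate from a few illustrative features."""
    score = 0.0
    if patient.get("sedentary"):              # sedentary lifestyle
        score += 0.30
    if patient.get("parent_hypertensive"):    # family history of hypertension
        score += 0.35
    if patient.get("resistant_to_change"):    # poor adherence to lifestyle advice
        score += 0.15
    # systolic pressure just below the hypertensive threshold adds risk
    systolic = patient.get("systolic_bp", 120)
    if 130 <= systolic < 140:                 # elevated, but not yet hypertensive
        score += 0.20
    return min(score, 1.0)

def flag_for_intervention(patients, threshold=0.5):
    """Return IDs of patients whose risk exceeds the intervention threshold."""
    return [p["id"] for p in patients if hypertension_risk(p) >= threshold]
```

A clinician could then review the flagged cohort and decide between early medication and lifestyle-modification programmes, as the article describes.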
It may also enable insurers to cover big risks. “We did analytics on health record data in some countries. It turned out that machine learning can be used to predict certain severe diseases, such as diabetes,” says Andreas Braun, who created and now heads the Global Data & Analytics team at Allianz.
So, by combining predictions with new technologies, such as the Apple Watch, it might be possible to help and monitor potential patients’ control of their nutrition, workouts, doctor’s visits and medication. “We can insure a person with diabetes, as we can help to maintain certain procedures to control and mitigate the effect,” he said.
Predictive analytics is also popular because it could detect fraud. Patient records and billing information are being analysed to detect anomalies, such as patients receiving healthcare services from different hospitals in different locations simultaneously. It’s being used to compare charges against a fraud profile and to analyse a provider’s network of relationships, as fraudulent billers are often organised as tight networks. The quick returns from analysing fraud alone can easily justify investment in big data infrastructure, according to Braun. “Fraud analytics and money laundering alone pays for the whole initiative,” he said.
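One of the anomalies mentioned above – a patient apparently treated at two different hospitals at the same time – lends itself to a simple interval-overlap check. The sketch below is a minimal, hypothetical illustration of that one rule; the claim records and field names are invented, and real fraud engines combine many such rules with statistical and network analysis.

```python
# Hypothetical sketch: flag claims where one patient's service intervals
# at two different hospitals overlap in time. Record structure is an
# illustrative assumption, not any insurer's actual claims schema.
from collections import defaultdict
from datetime import datetime

def overlapping_claims(claims):
    """Return (claim_id, claim_id) pairs for the same patient at
    different hospitals whose service intervals overlap."""
    by_patient = defaultdict(list)
    for c in claims:
        by_patient[c["patient_id"]].append(c)

    anomalies = []
    for recs in by_patient.values():
        recs.sort(key=lambda c: c["start"])
        for i, a in enumerate(recs):
            for b in recs[i + 1:]:
                if b["start"] >= a["end"]:
                    break  # sorted by start, so no later record overlaps a
                if a["hospital"] != b["hospital"]:
                    anomalies.append((a["claim_id"], b["claim_id"]))
    return anomalies
```

Sorting each patient's claims by start time keeps the check close to linear per patient in the common, non-overlapping case.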
Population health analytics – large-scale studies of populations or parts of the population at risk – is another strand of big data analytics that could help in cost containment. Combining clinical or claims information with postcodes, for instance, could reveal more about patients in specific cohorts. Experts say population analysis could have profound effects on health outcomes. An example would be analysing how drugs affect the vast majority of adults. Today, drugs must be tested in tightly controlled clinical trials to gain regulatory approval. But those trials typically exclude many kinds of patients – older and younger patients, those with co-morbidities and other conditions. So years of further studies – if they are done at all – are usually required to understand how some drugs affect sub-populations, whether they have some toxicity or even work effectively at all. Analytics can determine a drug’s cost effectiveness for very specific cohorts.
Big data thinkers reckon even more data sources – from genomic to GPS – could help expand analytics and improve health outcomes too. Personal health applications could track physical activity in vulnerable populations such as smokers, provide GPS locations of asthma sufferers, or instant blood glucose readings for diabetic patients. Changes in electricity consumption and shopping patterns, for instance, might reveal that the health of a patient with a chronic long-term condition is deteriorating. Such data allows insurers to look at ways to prevent claims while maintaining insureds’ health.
All onboard?
Despite the big thinking, only a small percentage of available data is actually being analysed, say some sources. “There is definitely a lot of hype. Big data, advanced analytics, machine learning, and artificial intelligence – many people are talking about these, few are doing things,” says Argesanu. “We are just scratching the surface here and over time, by leveraging machine learning and artificial intelligence, we will see a significant impact on losses and hence loss ratios.” 
One consultancy, McKinsey – which predicted that big data could save the healthcare industry up to $450 billion – reckons healthcare has lagged behind other industries in the use of big data because there’s been considerable resistance to change. Medical providers have been accustomed to making treatment decisions independently using their own clinical judgment, rather than relying on protocols based on big data. Data hasn’t been shared, and older IT has prevented it from being manipulated. Others say the insurance industry is even more resistant to change.
But most see data protection and privacy as the biggest challenge ahead. The introduction of new data protection rules in Europe – the General Data Protection Regulation – could mean fines and reputational problems for companies that don’t address issues.
Others think there are solutions. Health apps, for instance, can actually help with privacy, says Shawn Dooley, industry leader for health and life science at big data specialist IT company Cloudera. “Creating apps solves several issues, including data security, access and permissions, and the better you can predict who is about to create lots of cost, the better you can prevent the cost,” he said.
But big data is also not widespread because it’s so complex to link, match, cleanse and transform data across different systems. There are a few off-the-shelf tools that can help. Some remove names and other personal information from records, clean data and preserve patient privacy. Others integrate with, or help to link, EHRs. Meanwhile, Apple is integrating health apps with certain EHRs. 
Some data solutions firms, such as Acxiom and Accurint, aggregate information to help companies learn about patients’ finances, buying preferences and other consumer spending characteristics. Enlitic helps radiologists combine previous findings and other data associated with existing images in its databases to spot patterns, flag likely mistakes and rule out extremely unlikely options. Furthermore, IBM’s Watson artificial intelligence platform is being used to study whether applying machine learning to large amounts of unstructured data like clinical guidelines, scientific literature and treatment protocols could help optimise cancer treatment.
But big data projects require a lot more than the off-the-shelf software – such as traditional databases and analytics tools – that has been used for decades, say analytics experts, including Allianz’s Braun. Many implementations actually largely depend on creative thinking, innovative programming, the cloud and free and open source software, he says.
That’s because companies now require a new generation of technologies that can store and manipulate unstructured data, which forms about 80 per cent of information in healthcare. Many are using Hadoop, an open source platform for unstructured data, as a basis. 
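The map/reduce pattern that Hadoop popularised can be illustrated without a cluster. The sketch below runs the two steps in a single process over free-text clinical notes; on Hadoop itself, the map and reduce functions would run in parallel across distributed file blocks. The notes and term list are invented for illustration.

```python
# Hypothetical sketch of the map/reduce pattern applied to unstructured
# text: counting mentions of conditions across free-text clinical notes.
# Term list and notes are illustrative assumptions.
from collections import Counter
from itertools import chain

CONDITIONS = {"sepsis", "diabetes", "hypertension"}

def map_note(note):
    """Map step: emit (condition, 1) for each known term in one note."""
    words = note.lower().replace(",", " ").replace(".", " ").split()
    return [(w, 1) for w in words if w in CONDITIONS]

def reduce_counts(mapped):
    """Reduce step: sum the 1s emitted for each condition."""
    totals = Counter()
    for term, count in chain.from_iterable(mapped):
        totals[term] += count
    return dict(totals)

notes = [
    "Patient admitted with suspected sepsis.",
    "History of diabetes, monitor for hypertension.",
    "Sepsis ruled out. Diabetes managed with insulin.",
]
counts = reduce_counts(map(map_note, notes))
```

Because the map step works on one note at a time and the reduce step only sums independent partial counts, the same logic scales out naturally when the notes are spread across many machines.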
A clear and comprehensive strategy is needed, therefore, and many companies are having to bring in talent from outside to help with their data analytics, as the skills and culture required to develop effective data mining systems are quite different from those traditionally found in a risk-averse insurance industry. “It requires careful planning, so that analytical skills are married with the right business skills and business understanding, otherwise we risk having analytics for the sake of analytics,” says AIG’s Argesanu.
The first step, then, is to collect and store as much relevant information as possible, says Cloudera’s Dooley, a Hadoop specialist. “If your international health plan, five years out, has the capacity to store large volumes of data and images at a patient level, has tested success at analysing clinical notes inside electronic health records, is collecting EHR data from clinical partners and using it to respond more quickly, and is stratifying patients more quickly, then you will be right on track as a big data practitioner ahead of the curve,” he said.