Table of Contents
Big Data Analytics
Statement of the Problem
BDA Research Workflow
Research data subsets
Reporting and Visualization
Virtual research data center
The Centers for Medicare & Medicaid Services (CMS) manages health care for over one hundred million patients. A large majority of these are Medicare members, for whom the CMS administers health care benefits through medical claims data. For this population, comfort with their physicians, quality of life, cost, and longevity are all of concern. Indeed, Medicare members’ financial and health concerns burden not only the members themselves, but families and taxpayers as well. The recent emergence of big data analytics (BDA) has provided solutions to issues like these in many other patient populations. The CMS has recently conducted some internal BDA and has shown efficacy in fighting fraud and lowering hospital readmissions. However, the agency cannot possibly conduct research to address all Medicare members’ needs through internal efforts alone. Other government agencies that house similar data provide easy access, and even some analytical discovery tools, to outside researchers in varying settings. The CMS has partners to help manage data and its availability to researchers; however, large barriers to BDA in Medicare still exist. In addition to the conventional challenges posed by big data characteristics, the CMS faces regulatory and legal hurdles in health care policy and data privacy. Additionally, extremely rigorous application processes and prohibitively high costs limit researchers’ access to Medicare data for discovery and early-stage research purposes. These barriers need to be overcome to extend the CMS’s successes in these areas to all the health needs of Medicare members.
Keywords: big data, analytics, Medicare, CMS, informatics, bioinformatics
Health care in the US makes up 17.9% of the gross domestic product, or $3.2 trillion annually, more than any other sector (Sukumar, Natarajan, & Ferrell, 2015). The Centers for Medicare & Medicaid Services (CMS) is the United States’ single largest payer in health care (Brennan, Oelschlaeger, Cox, & Tavenner, 2014). As of 2015, the CMS oversees the federal health insurance program for over fifty-five million Medicare beneficiaries (“CMS,” 2016). While those with certain disabilities and end-stage renal disease may qualify for Medicare benefits, the majority of members come from the general population over the age of sixty-five. Recent expansions under the Affordable Care Act have made the CMS the leader among US health care agencies (Brennan et al., 2014). The sheer number of patients covered under the program presents fundamental challenges in administering health care benefits alone. However, the new Affordable Care Act mandate expands the CMS’s goals to include both improving the quality of health services and reducing health care costs (Anoushiravani, Patton, Sayeed, El-Othmani, & Saleh, 2016). Research on the enormous amount of Medicare claims data needed to achieve these goals is possible only with appropriate BDA implementation.
Big Data Analytics
Big data can be thought of as data so large that it cannot be processed into useful information using the hardware or software found on a typical PC (Shaha, Sayeed, Anoushiravani, El-Othmani, & Saleh, 2016; Belle et al., 2015). BDA comprises all aspects of computing in big data, including infrastructure and architectural design, storage, software and algorithm development, data manipulation and statistical processing, and reporting and visualization. BDA is traditionally characterized by the three Vs: Volume, Velocity, and Variety. Recently, two additional Vs that are useful in BDA characterization have been incorporated from the business sector: Value and Veracity (Sukumar et al., 2015; Belle et al., 2015; Gray & Thorpe, 2015). As it relates to health care, Sukumar et al. (2015) explain that the medical claims data of just one million patients can easily occupy terabytes of storage (one terabyte is one thousand gigabytes) and that claims data can flow at a rate of twenty claims per minute. A variety of health care data exists in Medicare, drawn from enrollment, survey, claims, and other CMS data (Brennan et al., 2014). Different coverage plans comprise multiple claim types such as hospital, specialist, and pharmacy claims. It is important to note that despite the many forms of claims data available, all of them are considered structured data, which is the most important distinction when discussing data heterogeneity. Any characterization of Medicare data must also recognize the legal considerations involved, especially those surrounding the sharing of patients’ protected health information (PHI). Veracity can be assessed through conformity with laws, through general data quality and security efforts, and through the accuracy of proposed versus measured results from implemented programs. Value in health care BDA can be measured as the cost benefit, the outcome benefit, or both, created by actionable insights gained from the data.
The five Vs are not only characteristics of big data; they are also convenient and relevant categories under which the challenges of BDA implementation in Medicare can be addressed.
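To give a sense of scale for the Volume and Velocity figures quoted above from Sukumar et al. (2015), the following minimal sketch works through the arithmetic. The rate of twenty claims per minute and the one-thousand-gigabyte terabyte come from the text; the assumption that the rate is sustained around the clock, and the use of exactly one terabyte as a lower bound for "terabytes", are illustrative assumptions only.

```python
# Back-of-the-envelope arithmetic for the Volume and Velocity figures.
CLAIMS_PER_MINUTE = 20   # velocity quoted in the text (Sukumar et al., 2015)
GB_PER_TB = 1_000        # decimal convention stated in the text
MB_PER_GB = 1_000
PATIENTS = 1_000_000     # population size quoted in the text


def claims_per_day(rate_per_minute):
    """Claims per day, ASSUMING the per-minute rate holds around the clock."""
    return rate_per_minute * 60 * 24


def claims_per_year(rate_per_minute, days=365):
    """Claims per year under the same constant-rate assumption."""
    return claims_per_day(rate_per_minute) * days


def mb_per_patient(total_tb, patients):
    """Average storage per patient in megabytes for a given total volume."""
    return total_tb * GB_PER_TB * MB_PER_GB / patients


daily = claims_per_day(CLAIMS_PER_MINUTE)
yearly = claims_per_year(CLAIMS_PER_MINUTE)
# One terabyte is a hypothetical lower bound; the text says only "terabytes".
per_patient = mb_per_patient(1, PATIENTS)

print(f"{daily:,} claims per day")
print(f"{yearly:,} claims per year")
print(f"at least {per_patient:.1f} MB per patient")
```

Under these assumptions the quoted rate amounts to tens of thousands of claims per day and roughly ten million per year, and the storage floor is on the order of a megabyte per patient.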
Statement of the Problem
The CMS has led BDA efforts to conduct research to combat costs in health care. Among other undertakings, in 2011 a big data initiative allowed the agency to conduct a fraud analysis that identified $115 million in fraudulent claims in its very first year (Brennan et al., 2014). While this is a sizable amount, the CMS, as the leading health care agency, should be expected to account for more; the potential value of proper big data implementation in health care is estimated to be more than $300 billion (Belle et al., 2015).
In 2015, the Secretary of Health and Human Services, whose department oversees the CMS, announced that by 2018, 90% of Medicare payment programs are to be tied to some quality metric (Al Kazzi & Hutfless, 2015). To address other cost and health outcome issues, the CMS makes Medicare data available to outside researchers through partnerships with other institutions such as the Research Data Assistance Center and the Surveillance, Epidemiology, and End Results Program. However, because of the tremendous amount of Medicare data and the enormous potential to create value for members, more outside research, and therefore better access to BDA in Medicare, is necessary.
Even with recent data releases and new access procedures, the CMS is considerably behind the times in data sharing. Fallik (2014) asserts:
Although the data are available to the public, that doesn’t mean that they’re offered in an accessible or simple format. To paint any kind of clear picture, researchers must download millions of records and analyze them using statistical software. It’s not as if every consumer can now go online to Medicare.gov and start searching by doctor or zip code.
In this article, the barriers to outside researchers’ access to Medicare data are addressed. A research workflow is described, and each of the relevant five Vs of big data is covered at each workflow stage. Current CMS policy and methods of data sharing are described. Finally, recommendations are presented to improve access to Medicare data for the BDA research community.
An advanced literature search of the National Center for Biotechnology Information’s PubMed database was conducted. The initial search criterion was (big data[Title/Abstract] AND medicare[Title/Abstract]), which returned only nineteen results. This literature was curated by reading the abstracts and the introductions of the full texts to ensure the topic of research was not too narrow (e.g., big data only in Medicare cardiology patients). Five peer-reviewed journal articles from the initial search are included in the reference list.
To gain insight into current trends in BDA in health care overall, another search was conducted using the criteria (big data[Title/Abstract] AND health care[Title/Abstract]). Again, specific areas of medical practice were excluded. As this article is focused on revealing broad challenges in order for policy issues to be addressed, articles focused on algorithm development or other esoteric BDA topics such as artificial intelligence were not considered. Policy articles and articles that addressed BDA in health care as a whole were analyzed further through their abstracts and full text. Another five peer-reviewed articles were selected through this process.
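The two search criteria described above can be written down reproducibly. The sketch below only builds the query strings and a corresponding URL for NCBI's public E-utilities `esearch` endpoint; it does not send a request. The helper names `title_abstract_query` and `esearch_url` and the `retmax` default are hypothetical choices made for illustration, and nothing here is claimed to be the tooling used for the original search.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (public service; no request is made here).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def title_abstract_query(*terms):
    """Join terms with AND, restricting each to the Title/Abstract field."""
    tagged = [f"{term}[Title/Abstract]" for term in terms]
    return "(" + " AND ".join(tagged) + ")"


def esearch_url(query, retmax=20):
    """Build (but do not fetch) an esearch URL for a PubMed query string."""
    params = {"db": "pubmed", "term": query, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# The two criteria given in the text.
medicare_query = title_abstract_query("big data", "medicare")
health_care_query = title_abstract_query("big data", "health care")

print(medicare_query)
print(health_care_query)
print(esearch_url(medicare_query))
```

Result counts will differ from the nineteen reported in the text, since the database has grown since the search was run.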
Medicare, though a federal program, is not administered to beneficiaries exclusively by the government. Instead, private entities assist in providing members a blend of coverages known as Medicare Advantage. It is estimated these companies pre-process claims data and administer coverage for over 30% of the Medicare member population (“CMS,” 2016). Executives and shareholders of these companies, as well as the Medicare members they serve, have an interest in seeing BDA implemented for their benefit. For this reason, peer-reviewed articles from journals with input and readership from the privatized (industry) population alongside the scientific and medical community were considered particularly relevant. The CMS and affiliated entities’ (public, private, and academic) electronic resources are also included under references. These are necessary to provide summary BDA utilization, current practices, and policies of the network with ultimate oversight of Medicare data.
The author conducted the literature review, collecting qualitative data on the important challenges faced by big data in health care and their specific correlates in Medicare BDA. Opportunities noted in the review were evaluated both by how frequently the challenges were cited and by their potential to yield the research benefits of lowered health care costs and better health outcomes. Articles that quantified the potential gains from overcoming big data challenges (dollars, lives saved, quality of life), or that specifically addressed outside researchers’ barriers to Medicare BDA, were considered especially germane.
In this work, the author presents barriers faced by researchers in Medicare BDA and solutions to them. Unusually for this type of article, a big data workflow typically used by researchers is presented. Alongside this workflow are methods and examples, the challenges faced (in terms of the applicable five V characteristics), and the ownership of each stage. Finally, potential remedies are presented to the owners of the specific challenges and to policy makers in the medical and scientific research community.
BDA Research Workflow
A basic BDA research workflow is shown in Figure 1 and is described as follows. First, data must be gathered for analysis; this process is referred to as Acquisition. Second, the data must be stored: Storage. Third, the data are subsetted into a research workspace file, the Research Data Subset (RDS). Fourth, the RDS undergoes Manipulation to produce values that can be presented to decision makers. Fifth, Reporting and Visualization encompasses how the results of the analysis are rendered in text or other visual formats so that information can be easily gleaned from them.
[Figure not included in this excerpt]
Figure 1. Big data and analytics clinical research workflow using Medicare claims data: BDA class, workflow stage, users and owners, challenges, and overarching concerns.
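The five stages can be illustrated with a toy pipeline. This is a minimal sketch under loud assumptions: the three claim records, their field names, and every function name are hypothetical and invented for illustration. Real Medicare claims files are far larger and are subject to the access and privacy constraints discussed in this article. Only the stage names and the claim types (hospital, pharmacy) come from the text.

```python
from enum import Enum


class Stage(Enum):
    """The five workflow stages, in the order given in the text."""
    ACQUISITION = 1
    STORAGE = 2
    RESEARCH_DATA_SUBSET = 3
    MANIPULATION = 4
    REPORTING_AND_VISUALIZATION = 5


def acquire():
    """Acquisition: gather raw claims (hypothetical toy records)."""
    return [
        {"claim_id": 1, "type": "hospital", "cost": 1200.0},
        {"claim_id": 2, "type": "pharmacy", "cost": 45.5},
        {"claim_id": 3, "type": "hospital", "cost": 800.0},
    ]


def store(claims):
    """Storage: index claims by identifier (a dict stands in for a database)."""
    return {claim["claim_id"]: claim for claim in claims}


def subset(storage, claim_type):
    """Research Data Subset: keep only the claims relevant to the question."""
    return [c for c in storage.values() if c["type"] == claim_type]


def manipulate(rds):
    """Manipulation: reduce the subset to summary values."""
    total = sum(c["cost"] for c in rds)
    mean = total / len(rds) if rds else 0.0
    return {"n_claims": len(rds), "total_cost": total, "mean_cost": mean}


def report(summary):
    """Reporting and Visualization: render the summary as readable text."""
    return (f"{summary['n_claims']} claims, "
            f"total ${summary['total_cost']:,.2f}, "
            f"mean ${summary['mean_cost']:,.2f}")


def run_workflow(claim_type):
    """Run all five stages in order for one claim type."""
    return report(manipulate(subset(store(acquire()), claim_type)))


print([stage.name for stage in Stage])
print(run_workflow("hospital"))
```

Keeping each stage as a separate function mirrors the way the workflow assigns distinct owners and challenges to each stage.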