The Promise and Security Risks of Using Cloud-Based Informatics Solutions for Human Genomic Data Analysis
The ability of medical and research scientists around the globe to share data has accelerated medical discovery and improved healthcare delivery. Instant access to information through cloud-based technology lets researchers and healthcare providers spend less time obtaining data and more time processing and acting on it. However, the large bodies of data (a.k.a. big data) generated by new technologies such as next-generation sequencing (NGS) need special data-storage considerations.
Storage of a vast body of data for access by collaborators has been facilitated by cloud-based applications. The cloud is a network of servers allowing access to data and applications via the internet. With the advantages and global access provided by cloud-based applications come security challenges.
Collaborative Environment in the Health Sciences
Effective medical and research collaboration is only possible when all participants can easily access information and data in real time. The sharing and passing of data, regardless of the methods used, must be accomplished without risking data loss or breaches of confidentiality. Complicating the data-sharing process is the diversity in the types and formats of data. Twenty years ago, for example, sequencing data shared between scientists were mailed on media such as CDs, accompanied by supporting information as hard-copy text.
Traditional forms of sharing data with scientists in other countries and time zones add barriers to the rapid exchange of information. Delays in data receipt, difficulty in interpreting data, and interrupted communication about shared data can determine whether a research project or medical case evaluation succeeds or fails. The ability of all collaborators to use shared data in a timely manner when participating in medical and research-related decision-making is paramount.
Handling of Human Genomic Data
The same general challenges described for medical and research collaborative efforts exist for the multiuser exchange and use of human genomic data. Human genomic data contain a significant amount of sensitive health and familial information that can be misused if not properly protected. Concerns include, but are not limited to, discriminatory practices and the denial of coverage or employment by insurance agencies and employers. The Genetic Information Nondiscrimination Act (GINA) of 2008 restricts the access of health insurance issuers and employers to individuals' genetic information and prohibits genetic discrimination (1).
The outsourcing of genomic data into public cloud computing settings raises concerns over privacy and security. Given the growing use of cloud applications to handle genomic data, company or organizational policies must be in place to protect these data. This protection must be balanced with the access and ease of collaboration needed for clinical research endeavors. The NIH Genomic Data Sharing Policy provides guidelines that can be used to ensure the protection of patient genomic data while allowing necessary access to researchers.
The advent of NGS has provided valuable, yet massive, data for use in research and clinical medicine. Developments in this area may soon lead to the routine use of NGS in the clinical setting. Significant bioinformatics and computational tools are needed to store, process, and interpret NGS data. Secure procedures are needed to allow access to and processing of NGS data while protecting patient/client privacy.
Bhuvaneshwar et al. (2) conducted a case study of a solution to the challenges associated with NGS data management and analysis. The cloud-based system used, the Globus Genomics system, is a data management and analysis platform with rapid computation times and advanced data security. Illumina offers a cloud-based computing environment, BaseSpace Sequence Hub, for the management and processing of NGS data. These cloud tools and applications are examples of methods used to facilitate collaboration within and between labs globally.
Privacy and Data Security in the Cloud
Cloud technology has the advantage of providing a single space or access point for all the information needed by different users or collaborating project participants. For example, research team members can quickly access the data they need to carry out their specific roles in data processing and interpretation. A clinical medical team can review laboratory or sequencing results and contribute their specific expertise to determine diagnoses or the best treatment strategies. Unlike traditional methods of information sharing (hard copy mail, faxes, etc.), this process can occur in an instant.
Cloud applications require robust data protection methods, proper data disposal protocols, and other safeguards against improper access to private data. Data encryption is the primary strategy to ensure the security of sensitive information stored in the cloud. It has been suggested that data be encrypted before they are stored in the cloud (3,4). Other security measures include ensuring that the security service complies with current data privacy regulations and that the cloud provider maintains the highest standards of data security. Data protection strategies in use include protocols and guidelines to securely back up data and recover them in case of server or hardware crashes. Firewalls and specific restrictions, so that only authorized personnel can access certain types and levels of data, are also necessary. Organizations using cloud environments need legal and information technology policies that ensure the integrity of sensitive data.
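The idea of restricting certain types and levels of data to authorized personnel can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the three sensitivity tiers, the role names, and the `can_access` helper are hypothetical and are not taken from any of the platforms or policies mentioned above. A real deployment would rely on the cloud provider's own identity and access management rather than a lookup table like this one.

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    """Hypothetical sensitivity tiers, ordered from least to most restricted."""
    SUMMARY = 1        # aggregate statistics only
    DEIDENTIFIED = 2   # sequencing data with identifiers removed
    IDENTIFIABLE = 3   # data linked to an individual


# Hypothetical mapping from a role to the highest tier that role may read.
ROLE_CLEARANCE = {
    "collaborator": Sensitivity.SUMMARY,
    "analyst": Sensitivity.DEIDENTIFIED,
    "clinician": Sensitivity.IDENTIFIABLE,
}


@dataclass(frozen=True)
class Dataset:
    name: str
    sensitivity: Sensitivity


def can_access(role: str, dataset: Dataset) -> bool:
    """Return True only if the role's clearance covers the dataset's tier.

    Roles that are not listed are denied by default.
    """
    clearance = ROLE_CLEARANCE.get(role)
    if clearance is None:
        return False
    return clearance >= dataset.sensitivity


if __name__ == "__main__":
    genome = Dataset("patient_genome", Sensitivity.IDENTIFIABLE)
    for role in ("collaborator", "analyst", "clinician", "guest"):
        print(role, can_access(role, genome))
```

Denying unknown roles by default is the design choice worth noting: an account that has not been explicitly granted a clearance gets no access, rather than falling back to the lowest tier.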
Securely and effectively handling genomic data while enabling beneficial research and medical collaboration continues to be a challenge. New legal policies must be integrated with scientific and technical solutions. As more data become available and the nature and security implications of the data are better understood, new policies are proposed and tried. This is itself a multidisciplinary and ever-developing area of research.
1. National Human Genome Research Institute. Genetic Information Nondiscrimination Act (GINA) of 2008. Available at https://www.genome.gov/24519851/, last accessed March 16, 2012.
2. Bhuvaneshwar K, Sulakhe D, Gauba R, Rodriguez A, Madduri R, Dave U, Lacinski L, Foster I, Gusev Y, Madhavan S. A case study for cloud based high throughput analysis of NGS data using the Globus Genomics system. Comput Struct Biotechnol J. 2014 Nov 7;13:64-74.
3. Muhammad N. Hurdles for Genomic Data Usage Management. 2014 IEEE Security and Privacy Workshops, University of Illinois at Urbana-Champaign. 2014:44-48.
4. Rao RV. Data Security Challenges and Its Solutions in Cloud Computing. Procedia Computer Science. 2015;48:204-209.