How Amazon’s cloud unit is helping researchers analyze genetics

AWS has launched general availability for Amazon Omics, which helps researchers store and analyze omic data like sequences of DNA, RNA and proteins.

As health care becomes increasingly digitized, scientists, doctors and researchers must decipher unprecedented amounts of data to adequately personalize care. The volume of information available to these experts often outpaces their ability to consume and analyze it. Amazon’s cloud unit has been working to close that gap.

Amazon Web Services recently launched general availability for Amazon Omics, which helps researchers store and analyze omic data like sequences of DNA, RNA and proteins. The service provides customers with the underlying infrastructure they need to make sense of large amounts of data so they can spend more time making new scientific discoveries.

AWS generates a substantial share of Amazon’s revenue, pulling in $20.5 billion in the third quarter. The cloud-computing business has been expanding into health care, and while AWS doesn’t disclose revenue projections for particular services, the global genomic data analysis market is expected to reach $2.15 billion by 2030, according to a report from Straits Research.

Dr. Taha Kass-Hout, chief medical officer at AWS, said the vast majority of health care data is unstructured in nature, which means that about 97% of it goes unused. Indexing and making sense of this information is a challenge, especially when researchers are collecting omic data from tens of thousands of patients. 

Prior to his time at Amazon, Kass-Hout served two terms under President Barack Obama and was the first chief health information officer at the U.S. Food and Drug Administration.

Sequencing one human genome can require anywhere from 80 to 150 gigabytes of storage, Kass-Hout said, and some research projects deal with petabytes and exabytes of genomic information.   

“You’re talking about almost nine Harry Potter’s worth if you want to print it on a printer,” Kass-Hout told CNBC. “And that’s just for one human being.”
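For a rough sense of how the per-genome figure scales, the arithmetic can be sketched in a few lines of Python. The 80 to 150 gigabyte range comes from Kass-Hout; the cohort size of 10,000 genomes is a hypothetical chosen only for illustration, and the conversion assumes decimal units (1 petabyte = 1,000,000 gigabytes).

```python
# Back-of-the-envelope storage estimate for a sequencing cohort.
# Assumption: decimal units, so 1 petabyte = 1,000,000 gigabytes.
GB_PER_PB = 1_000_000


def cohort_storage_pb(genomes: int, gb_per_genome: float) -> float:
    """Return total storage in petabytes for a number of sequenced genomes."""
    return genomes * gb_per_genome / GB_PER_PB


if __name__ == "__main__":
    cohort = 10_000  # hypothetical cohort size, not a figure from AWS
    low = cohort_storage_pb(cohort, 80)    # low end of the quoted range
    high = cohort_storage_pb(cohort, 150)  # high end of the quoted range
    print(f"{cohort:,} genomes: roughly {low:.1f} to {high:.1f} petabytes")
```

Under those assumptions, a study of that size already lands around the petabyte mark, in line with the scale Kass-Hout described.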

Amazon Omics helps researchers sort through their data by providing them with three components that they can leverage individually or as a collective. Omics-aware object storage helps researchers store and share raw sequence data; Omics Workflows helps run workflows that process raw sequence data at scale; and Omics Analytics simplifies the output of the sequence processing. 

More than a dozen customers and partners tested a beta version of the service and are already using Amazon Omics.

For Jeffrey Pennington, chief research informatics officer at the Children’s Hospital of Philadelphia, it’s already made a noticeable impact.

Pennington works in the department of biomedical and health informatics, which uses data and technology to solve issues in child health. He said the department spent five years expanding the infrastructure to analyze omics data, and now it’s no longer something they need to build or maintain themselves. 

“We’re a big pediatric academic medical center, but we’re still not big enough to learn and build everything that is required to make productive use of omic data,” Pennington said. “Our time and energy, our effort, our financial wherewithal is much better spent putting the puzzle together rather than generating those pieces in the first place.”

Amazon Omics also encourages collaboration between large research groups, smaller clinical groups and intelligence and pharmaceutical companies, said Boris Oklander, co-founder and chief technology officer of C2i Genomics.

C2i is a biotechnology company that’s working to use genomic data to develop personalized treatments for cancer. Oklander said the company participated in the beta for Amazon Omics after trying to develop its own data-analysis technology.

He said Amazon Omics has created an ecosystem for collaboration that eliminates the need for researchers to build a complex technology from the ground up. 

“We’re just democratizing,” he said. “This type of service is something that allows [us] to unlock the value in the investments that different players in this space are doing.” 

Other major tech companies have developed similar tools. Microsoft’s cloud-computing platform Azure launched Microsoft Genomics in 2018 to help researchers interpret data generated by genomic technologies. Google’s Cloud Life Sciences technology also allows researchers to process biomedical data at a large scale.

Pennington said the Broad Institute and DNAnexus offer popular genomic data analysis services as well, but that they can be difficult to maintain and can analyze fewer data types than Amazon Omics.

Given the sensitive and deeply personal nature of omic data, Kass-Hout said privacy and patient data protection are “job zero” for AWS. He said AWS uses more than 300 security, compliance and governance services and supports 98 security standards and compliance certifications. In doing so, AWS goes “way beyond” regulatory compliance, Kass-Hout said, and it also provides best-practice resources and encryption tools to its customers.

Customers are also responsible for building secure applications on top of Amazon Omics’ services, which keeps AWS from seeing or using the data.

Kass-Hout said that ultimately, Amazon Omics serves as a way to efficiently index information so researchers can focus on making real advances in precision medicine. 

“If the last decade was about the digitization the health and life science industry has gone through, I truly believe the next decade is about making sense of this data in ways now [where] we can find new therapeutics, new diagnostics, more targeted therapies,” he said.