Genome storage challenges: Where will millions of genomic data go?

The staggering amount of storage capacity required for genome storage and analysis raises questions and concerns.
    • Author: Quantumrun Foresight
    • April 24, 2023

    The genomics industry's success has produced vast quantities of DNA sequencing data. Scientists struggle to analyze and make full use of this data because adequate tools are lacking. Cloud computing could ease this problem by letting scientists access and process data remotely over the internet.

    Genome storage challenges context

    The use of genomics in drug development and personalized healthcare has grown significantly as the cost of DNA sequencing has fallen. The first sequenced genome took 13 years and cost around $2.6 billion USD, but by 2021 a person’s genome could be sequenced in under a day for under $960 USD. Over 100 million genomes are predicted to have been sequenced by 2025 as part of various genomic projects. Both pharmaceutical companies and national population genomics initiatives are collecting large amounts of data, and these collections are expected to keep growing. With proper analysis and interpretation, this data has the potential to significantly advance the field of precision medicine.

    One human genome sequence generates around 200 gigabytes of raw data. If the life sciences industry succeeds in sequencing 100 million genomes by 2025, the world will have collected around 20 billion gigabytes (about 20 exabytes) of raw data. Data compression technologies can partially manage this volume: companies such as Petagene, based in the UK, specialize in reducing the size and storage costs of genomic data. Cloud solutions can also address storage problems and improve collaboration and reproducibility.
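    The storage estimate above can be reproduced with a short back-of-the-envelope calculation. The sketch below uses only the two figures quoted in this article (200 gigabytes per genome and 100 million genomes). The compression ratio is a purely hypothetical placeholder, not a figure from this article or from any vendor, and the function names are illustrative.

```python
# Back-of-the-envelope storage estimate using the figures quoted in the article.
GB_PER_GENOME = 200              # raw data per human genome (article's figure)
GENOMES_BY_2025 = 100_000_000    # projected number of sequenced genomes (article's figure)
ASSUMED_COMPRESSION_RATIO = 4.0  # HYPOTHETICAL value, for illustration only


def total_raw_storage_gb(genomes: int, gb_per_genome: int = GB_PER_GENOME) -> int:
    """Return the total raw storage needed, in gigabytes."""
    return genomes * gb_per_genome


def gb_to_exabytes(gigabytes: float) -> float:
    """Convert gigabytes to exabytes using decimal units (1 EB = 10**9 GB)."""
    return gigabytes / 10**9


def compressed_size_gb(raw_gb: float, ratio: float) -> float:
    """Return the size after compression for an assumed compression ratio."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return raw_gb / ratio


total_gb = total_raw_storage_gb(GENOMES_BY_2025)
print(f"Raw: {total_gb:,} GB = {gb_to_exabytes(total_gb):.0f} EB")

compressed_gb = compressed_size_gb(total_gb, ASSUMED_COMPRESSION_RATIO)
print(f"At an assumed {ASSUMED_COMPRESSION_RATIO}x ratio: {gb_to_exabytes(compressed_gb):.0f} EB")
```

    Even under a generous assumed ratio, the result remains in the exabyte range, which is why compression is described here as only a partial answer.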

    However, larger pharmaceutical companies tend to avoid taking risks with data security and prefer internal infrastructure for storage and analysis. Techniques like data federation reduce this risk by allowing computers in different networks to work together to analyze data securely. Companies like Nebula Genomics are also placing whole-genome sequencing data on blockchain-based platforms, which let users control who their data is shared with and let the organization access de-identified data to understand health trends.
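    To make the idea of data federation more concrete, here is a minimal sketch in which each site keeps its records local and shares only aggregate counts with a coordinator. Everything in it is an assumption made for illustration: the `Site` class, the function names, the variant ID "var1", and the toy records do not come from any real platform. A production system would also need authentication, encryption in transit, and safeguards against re-identification in small cohorts.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Site:
    """A hypothetical data holder; individual records never leave this object."""
    name: str
    # Each record maps a variant ID to the number of alternate alleles (0, 1 or 2).
    records: List[Dict[str, int]]

    def local_allele_counts(self, variant: str) -> Tuple[int, int]:
        """Return (alternate alleles, total alleles) for one variant at this site."""
        genotypes = [r[variant] for r in self.records if variant in r]
        return sum(genotypes), 2 * len(genotypes)


def federated_allele_frequency(sites: List[Site], variant: str) -> float:
    """Combine per-site aggregates into a pooled allele frequency."""
    alt_total = 0
    allele_total = 0
    for site in sites:
        # Only two integers per site cross the network boundary.
        alt, total = site.local_allele_counts(variant)
        alt_total += alt
        allele_total += total
    if allele_total == 0:
        raise ValueError(f"no site has data for {variant}")
    return alt_total / allele_total


# Toy data for two hypothetical sites.
site_a = Site("site_a", [{"var1": 1}, {"var1": 0}, {"var1": 2}])
site_b = Site("site_b", [{"var1": 0}, {"var1": 1}])
print(federated_allele_frequency([site_a, site_b], "var1"))  # 4 of 10 alleles -> 0.4
```

    The design choice worth noting is that the coordinator never sees an individual genotype, only per-site totals, which is the property that makes federation attractive to organizations reluctant to move raw data off their own infrastructure.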

    Disruptive impact 

    Genomic data storage challenges will likely encourage many more firms to move to cloud computing solutions to avoid high upfront IT infrastructure costs. As more storage providers compete to make their solutions stand out, the costs of these services will likely decrease, and new genome-specific technology will spring up in the 2030s. Though large firms will initially be hesitant, they will probably come to see the benefits of newer, more secure cloud computing techniques and begin employing them.

    Other potential solutions include data lakes, which are central repositories that store structured and unstructured information at any scale. Data warehousing, which centralizes information from multiple sources into a single, integrated system, can also be a viable method for storing and managing large amounts of genomic data. Specialized data management systems offer advanced features such as security, governance, and integration. In some cases, it may be necessary to store genomic data locally on in-house servers, an option that can suit small-scale projects or organizations with specific data security requirements.

    Blockchain-based solutions can be expected to become widely employed as well. A major benefit of using this technology is that it allows individuals to retain ownership of their genomic data. This feature is important because this information is highly sensitive, and individuals should have control over how it is used and shared.

    Implications of genome storage challenges

    Wider implications of genome storage challenges may include:

    • Novel opportunities for cybercriminals if genome storage systems are not made sufficiently secure.
    • Pressure on governments to introduce stronger policies regarding the use and protection of genomic data, particularly around obtaining consent.
    • Accelerated success in drug and therapy developments once the technical challenges around analyzing massive genomic databases are resolved.
    • An increasing number of cloud service providers that create specialized products and services for genomic data and scientific research.
    • Scientists and researchers being taught to operate blockchain-based data storage and management systems.

    Questions to consider

    • How do you think genomic data on individuals can be misused?
    • How do you think the storage and management of genomic data will change, and what impact will this have on healthcare and research?
