Supersized AI models: Giant computing systems are reaching the tipping point

Machine learning models are getting larger and more sophisticated every year, but experts think these expansive algorithms are approaching their peak.

Author: Quantumrun Foresight
June 2, 2023

    Since 2012, significant advances in artificial intelligence (AI) have occurred regularly, mainly driven by increasing computing power (“compute” for short). One of the biggest models, launched in 2020, utilized 600,000 times more compute than the first model from 2012. Researchers at OpenAI noted this trend in 2018 and warned that this growth rate would not be sustainable for long.

    Supersized AI models context

    Many machine learning (ML) developers use transformer models for deep learning (DL) because of their seemingly limitless potential. Examples of these models include Generative Pre-trained Transformer 2 (GPT-2), GPT-3, Bidirectional Encoder Representations from Transformers (BERT), and Turing Natural Language Generation (NLG). These algorithms often have real-world applications such as machine translation or time series prediction. 
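    To show how readily these transformer models can be applied, below is a minimal text-generation sketch using the open-source Hugging Face transformers library with the publicly released GPT-2 checkpoint; the library, prompt, and settings are illustrative choices, not tools referenced in this insight.

        # Minimal sketch: text generation with the public GPT-2 checkpoint
        # via Hugging Face's transformers library.
        # Assumes: pip install transformers torch
        from transformers import pipeline

        generator = pipeline("text-generation", model="gpt2")

        # Produce one short continuation of a prompt.
        outputs = generator("AI models keep growing because",
                            max_new_tokens=25, num_return_sequences=1)
        print(outputs[0]["generated_text"])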

    Artificial intelligence models have to expand to accommodate more training data and improve their predictions. This requirement has led to the rise of supersized models with billions of parameters (the variables an algorithm uses to make predictions). These models are exemplified by OpenAI’s GPT-3 (and ChatGPT, its conversational interface launched in November 2022), China-based PanGu-alpha, Nvidia’s Megatron-Turing NLG, and DeepMind’s Gopher. In 2020, training GPT-3 required a supercomputer that was among the five largest in the world. 
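    To make “billions of parameters” concrete, a rough rule of thumb (an approximation assumed here for illustration, which ignores embedding tables and biases) is that a decoder-style transformer holds about 12 x n_layers x d_model^2 weights, as sketched below.

        # Back-of-envelope parameter count for a decoder-style transformer.
        # Each layer holds ~4 * d_model^2 attention weights plus
        # ~8 * d_model^2 feed-forward weights, i.e. ~12 * d_model^2 per
        # layer; embeddings and biases are ignored.
        def approx_params(n_layers: int, d_model: int) -> int:
            return 12 * n_layers * d_model ** 2

        # GPT-3's published configuration: 96 layers, hidden width 12,288.
        print(f"{approx_params(96, 12288):.2e}")  # ~1.74e+11, i.e. ~175 billion

    Plugging in GPT-3’s published shape recovers its headline 175-billion-parameter count, which shows how quickly parameters balloon as models grow deeper and wider.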

    However, these models tend to require massive amounts of training data and energy-intensive compute. Deep learning’s progress has depended on ever-growing compute power, but this may soon change. Training is expensive, AI chips face physical limits, and training large models saturates processors, making them difficult to manage at scale. The more parameters a model has, the costlier it is to train. Experts agree that a point may come when supersized AI models become too expensive and energy-intensive to train. 

    Disruptive impact

    In 2020, OpenAI estimated the minimum amount of compute required to train numerous models, factoring in the number of parameters and the dataset size. These equations account for the fact that ML requires data to pass through the network many times, that the compute for each pass rises as the number of parameters increases, and that larger models need correspondingly more data.
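    Although this insight does not reproduce OpenAI’s equations, a widely cited approximation from the scaling-law literature captures the same relationship between parameters and dataset size; the formula below is an assumed stand-in for illustration, not OpenAI’s exact equation:

        % Approximate training compute for a dense transformer model:
        % ~2 FLOPs per parameter per token on the forward pass and ~4 on
        % the backward pass, where N = parameter count and D = training
        % tokens.
        C_{\text{train}} \approx 6\,N\,D \ \text{FLOPs}

    Plugging in GPT-3’s published figures (N of about 1.75 x 10^11 parameters and D of about 3 x 10^11 training tokens) gives roughly 3 x 10^23 FLOPs, consistent with widely reported estimates for that model.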

    According to OpenAI’s estimates, and assuming developers can achieve maximum efficiency, building GPT-4 (100 times bigger than GPT-3, at roughly 17.5 trillion parameters) would require 7,600 graphics processing units (GPUs) running for at least one year and cost approximately USD 200 million. A 100-trillion-parameter model would need 83,000 GPUs running for a year, costing more than USD 2 billion.
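    A hedged sketch of the arithmetic behind those figures (the per-GPU-hour price below is an illustrative assumption; only the GPU counts come from the estimate above):

        # Back-of-envelope yearly training cost: GPU count x hours in a
        # year x an assumed cloud price per GPU-hour ($3/hour is
        # illustrative, not part of the OpenAI estimate).
        HOURS_PER_YEAR = 365 * 24       # 8,760 hours
        PRICE_PER_GPU_HOUR = 3.00       # USD, assumed

        def yearly_cost(num_gpus: int) -> float:
            return num_gpus * HOURS_PER_YEAR * PRICE_PER_GPU_HOUR

        print(f"GPT-4-scale (7,600 GPUs):      ${yearly_cost(7_600):,.0f}")   # ~$200 million
        print(f"100T parameters (83,000 GPUs): ${yearly_cost(83_000):,.0f}")  # ~$2.2 billion

    At roughly USD 3 per GPU-hour, 7,600 GPUs running for a year comes to about USD 200 million, and 83,000 GPUs to about USD 2.2 billion, matching the magnitudes cited above.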

    Nonetheless, tech firms have been collaborating on, and pouring investment into, their ever-expanding supersized AI models as demand for ML solutions grows. For example, China-based Baidu and the Peng Cheng Lab released PCL-BAIDU Wenxin, with 280 billion parameters. PCL-BAIDU is already used in Baidu’s news feeds, search engine, and digital assistant. 

    DeepMind’s Gopher language model, released in December 2021, has 280 billion parameters. Google’s Switch Transformer and GLaM models have a staggering 1 trillion and 1.2 trillion parameters, respectively. Wu Dao 2.0 from the Beijing Academy of AI is even more massive, with a reported 1.75 trillion parameters. As smart cities and automation continue to drive disruption, experts are unsure how AI compute will support such a future. 

    Implications of supersized AI models

    Wider implications of supersized AI models may include: 

    • Increased investments and opportunities in developing AI computer chips that consume less energy. 
    • AI progress slowed down by the lack of computing power, leading to more funding for energy-conserving technologies and solutions.
    • ML developers creating alternatives to transformer models, which can lead to the discovery of more efficient algorithms.
    • AI solutions focusing on application-centric problems, adjusting or modifying compute as needed instead of simply supersizing models.
    • More complex datasets allowing AI programs to make better predictions, including weather forecasts, space discovery, medical diagnoses, and international trading.

    Questions to comment on

    • If you work in the AI sector, what progress have you seen in developing better ML models?
    • What other potential benefits might come from models with extensive training data to learn from?
