Current deep neural network systems such as ChatGPT could soon be trained with a 100-fold increase in energy efficiency, with “substantially several more orders of magnitude for future improvement.” Scientists from MIT and other institutions have demonstrated a new optical neural network training method that decisively outperforms state-of-the-art electronic microprocessors.
Not only that, but the demonstrated system also reached a compute density roughly two orders of magnitude higher than systems from Nvidia, Google, and Graphcore.
In practice, that means the most advanced models could be trained with a hundred times less energy, on hardware that takes up far less space, at the same speed.
Artificial neural networks imitate the way biological brains process information. These AI systems, built to learn, combine, and summarize information from large data sets, are reshaping the field of information processing. Current applications include image, object, and speech recognition, gaming, medicine, and physical chemistry.
Current AI models, now reaching hundreds of billions of artificial neurons, are growing exponentially and are straining the limits of current hardware.
The paper showed that an optical neural network (ONN) approach “with high clock rates, parallelism, and low-loss data transmission” could overcome the current limitations.
“Our technique opens an avenue to large-scale optoelectronic processors to accelerate machine learning tasks from data centers to decentralized edge devices,” the paper reads.
The ONN approach holds great promise to alleviate the bottlenecks of traditional processors such as transistor count, energy consumption in data movement, and semiconductor size. ONNs use light, which can carry lots of information at once due to large optical bandwidth and low-loss data transmission. Also, many photonic circuits may be integrated to scale the systems.
To move the light around for calculations, the MIT-led team used arrays of laser beams, a method the paper describes as “neuron encoding with volume-manufactured micron-scale vertical-cavity surface-emitting laser” (VCSEL) arrays.
“Our scheme is similar to the ‘axon-synapse-dendrite’ architecture in biological neurons,” the researchers explain.
They believe that the demonstrated system is scalable through mature wafer-scale fabrication processes and photonic integration.
Dirk Englund, an associate professor in MIT’s Department of Electrical Engineering and Computer Science and leader of the work, explained to SciTechDaily that models such as ChatGPT are limited in size by the power of today’s supercomputers, making it economically unviable to train larger ones.
“Our new technology could make it possible to leapfrog to machine-learning models that otherwise would not be reachable in the near future,” he claimed.
The paper, entitled “Deep Learning with Coherent VCSEL Neural Networks,” was published by a large team of scientists. The work was supported by the Army Research Office, NTT Research, and the NTT Netcast award, with additional financial support from the Volkswagen Foundation. Three researchers from the team have filed a patent related to the technology.