Nvidia and Microsoft have revealed their largest and most powerful monolithic transformer language model trained to date: Megatron-Turing Natural Language Generation (MT-NLG), a jointly developed model with a staggering 530 billion parameters, according to a press release.
MT-NLG outperforms prior transformer-based systems from both companies. It is substantially larger and more complex than Microsoft's Turing-NLG and Nvidia's Megatron-LM, with its 530 billion parameters (roughly three times as many as OpenAI's GPT-3) spread across 105 layers.
As the successor to Turing-NLG 17B and Megatron-LM, MT-NLG has achieved unrivaled accuracy across a wide range of natural language tasks, including completion prediction, reading comprehension, commonsense reasoning, natural language inference, and word sense disambiguation.
One of the world's largest and most powerful generative language models
MT-NLG was trained on Microsoft Azure NDv4 and Nvidia's Selene machine learning supercomputer, which is composed of 560 DGX A100 servers, each with eight A100 80GB GPUs, on a massive dataset known as The Pile. The dataset comprises multiple smaller datasets totaling 825 GB of text obtained from the internet, with sources ranging from Wikipedia articles and academic journal repositories to news clips.
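To give a sense of scale, the hardware and model figures quoted above can be turned into a quick back-of-the-envelope calculation (a rough sketch using only the numbers stated in the article, not official training details):

```python
# Figures from the article: 560 DGX A100 servers, 8 GPUs each, 80 GB per GPU.
servers = 560
gpus_per_server = 8
gpu_memory_gb = 80

total_gpus = servers * gpus_per_server                    # 4480 A100 GPUs
total_gpu_memory_gb = total_gpus * gpu_memory_gb          # 358,400 GB of aggregate HBM

# A 530-billion-parameter model stored in 16-bit precision needs about
# 2 bytes per parameter just for the weights (optimizer state excluded).
weights_gb = 530e9 * 2 / 1e9                              # ~1060 GB of raw weights

print(total_gpus, total_gpu_memory_gb, weights_gb)
```

Even the raw weights alone far exceed a single GPU's 80 GB of memory, which is why training a model of this size requires the kind of model parallelism that DeepSpeed and Megatron-LM provide.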
Thanks to all that, MT-NLG outperforms its predecessors in a wide range of natural language tasks, including auto-completion of phrases, question answering, reading comprehension, and reasoning. It can also perform such tasks with few or no task-specific examples, a capability known as few-shot or zero-shot learning.
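The difference between zero-shot and few-shot prompting can be illustrated with a small sketch (hypothetical prompts and a made-up helper, not the actual MT-NLG interface): in zero-shot the model receives only a task description and a query, while in few-shot a handful of worked examples are prepended to the prompt.

```python
def build_prompt(task: str, query: str, examples=None) -> str:
    """Assemble a plain-text prompt; `examples` is a list of (input, output) pairs."""
    lines = [task]
    for inp, out in (examples or []):          # few-shot: prepend demonstrations
        lines.append(f"Input: {inp}\nOutput: {out}")
    lines.append(f"Input: {query}\nOutput:")   # the model completes from here
    return "\n\n".join(lines)

# Zero-shot: task description only.
zero_shot = build_prompt("Classify the sentiment as positive or negative.",
                         "I loved this film.")

# Few-shot: the same task, preceded by two worked examples.
few_shot = build_prompt("Classify the sentiment as positive or negative.",
                        "I loved this film.",
                        examples=[("What a waste of time.", "negative"),
                                  ("Best meal I've had all year.", "positive")])
print(few_shot)
```

In both cases the model is never fine-tuned; it produces the answer purely by continuing the text after the final `Output:`.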
Because of the vast amount of data used to train the model, the researchers have not yet been able to scrub the dataset of toxic language. MT-NLG picks up stereotypes and biases from the data on which it is trained, which means that, unfortunately, it can produce offensive outputs that are potentially racist or sexist.
Researchers at Microsoft and Nvidia are committed to tackling this issue. While it is unknown whether MT-NLG will be made commercially available, the press release firmly states that any use of MT-NLG in production settings must ensure that appropriate mechanisms are in place to reduce and limit potential harm to users.
"The quality and results that we have obtained today are a big step forward in the journey towards unlocking the full promise of AI in natural language. The innovations of DeepSpeed and Megatron-LM will benefit existing and future AI model development and make large AI models cheaper and faster to train," the press statement reads.