Cerebras Systems Expands PyTorch Support, Delivers Capability for Giant Model Training

Expanded Software Platform Enables Developers to Seamlessly Scale Large Language Models.

SUNNYVALE, Calif. – Cerebras Systems, the pioneer in high-performance artificial intelligence (AI) computing, today released version 1.2 of the Cerebras Software Platform, CSoft, with expanded support for PyTorch and TensorFlow. In addition, customers can now quickly and easily train models with billions of parameters via Cerebras’ weight streaming technology.

PyTorch is the leading machine learning framework. Developers use it to accelerate the path from research prototyping to production deployment. As model sizes increase and transformer models become more popular, it is essential that machine learning practitioners have access to compute solutions that are fast and easy to set up and use, such as the Cerebras CS-2. With the CS-2 running CSoft, the developer community has a powerful tool to enable new breakthroughs in AI.

“From the start, our goal was to seamlessly support whichever machine learning framework our customers wanted to write in,” said Emad Barsoum, Senior Director, AI Framework, at Cerebras Systems. “Our customers write in TensorFlow and in PyTorch, and our software stack, CSoft, makes it quick and easy to express your models in the framework of your choice. By doing so, our customers gain access to the 850,000 AI-optimized cores and 40 gigabytes of on-chip memory in the Cerebras CS-2.”

The Cerebras CS-2 is the world’s fastest AI system. It is powered by the largest processor ever built – the Cerebras Wafer-Scale Engine 2 (WSE-2). The Cerebras WSE-2 delivers more AI-optimized compute cores, more fast memory, and more fabric bandwidth than any other deep learning processor in existence. Purpose-built for AI work, the CS-2 runs CSoft, which enables machine learning practitioners to write their models in the open-source frameworks TensorFlow or PyTorch and run them on the Cerebras CS-2 without modification. In fact, a model that was written for a graphics processing unit (GPU) or a central processing unit (CPU) can run under CSoft on the Cerebras CS-2 without any changes. With the CS-2 and CSoft, practitioners can seamlessly scale up from small models like BERT to the largest models in existence, like GPT-3.

Large models have demonstrated state-of-the-art accuracy on many language processing and understanding tasks. Training these large models on GPUs is challenging and time-consuming. Training from scratch on new datasets often takes weeks and tens of megawatts of power on large clusters of legacy equipment. Moreover, as the size of the cluster grows, power, cost, and complexity grow exponentially. Programming clusters of GPUs requires rare skills, different machine learning frameworks, and specialized tools that demand weeks of engineering time for each iteration.

The CS-2 was designed specifically to address these issues. Even the largest model takes only a few minutes to set up, and the CS-2 outperforms clusters with hundreds of graphics processing units. The CS-2 allows users to explore more ideas in less time because it takes less time to set up, configure, and train.

Cerebras is delivering industry-leading AI solutions to a growing roster of customers in the enterprise, government, and high-performance computing segments, including GlaxoSmithKline, AstraZeneca, TotalEnergies, nference, Argonne National Laboratory, Lawrence Livermore National Laboratory, Pittsburgh Supercomputing Center, Edinburgh Parallel Computing Centre (EPCC), and Tokyo Electron Devices, with customers in North America, Asia, Europe, and the Middle East.

For more information about the Cerebras Software Platform, please visit https://cerebras.net/software/.

By maheshmore1
