Lemmy.World

168,330 readers
7,151 users here now

The World's Internet Frontpage

Lemmy.World is a general-purpose Lemmy instance covering various topics, for the entire world to use.

Be polite and follow the rules ⚖ https://legal.lemmy.world/tos

Get started

See the Getting Started Guide

Donations 💗

If you would like to make a donation to support the cost of running this platform, please do so at the following donation URLs.

If you can, please use or switch to Ko-Fi; it has the lowest fees for us.

Ko-Fi (Donate)

Bunq (Donate)

Open Collective backers and sponsors

Patreon

Liberapay patrons

GitHub Sponsors

Join the team 😎

Check out our team page to join

Questions / Issues

More Lemmy.World

Follow us for server news 🐘

Mastodon Follow

Chat 🗨

Discord

Matrix

Alternative UIs

Monitoring / Stats 🌐

Service Status 🔥

https://status.lemmy.world

Mozilla HTTP Observatory Grade

Lemmy.World is part of the FediHosting Foundation

founded 1 year ago

and another commercially viable open-source LLM!

cross-posted from: https://lemmy.world/post/441601

MPT is back, and this time - better than ever. This is amazing news for free, open-source artificial intelligence and everyone at home wanting to engage with emerging LLM technologies.

Why?

MPT is a commercially licensable model, meaning anyone can use it in their business. This was big news when they initially announced MPT-7B. Now they are back with a higher parameter model, raising the bar for the quality of available AI to choose from.

I highly recommend supporting Mosaic by visiting their website for the full article. It's fantastically written with much more information.

For anyone without the time, here are a few highlight excerpts:

MPT-30B: Raising the bar for open-source foundation models

Introducing MPT-30B, a new, more powerful member of our Foundation Series of open-source models, trained with an 8k context length on H100s.

MPT-30B Family

Mosaic Pretrained Transformer (MPT) models are GPT-style decoder-only transformers with several improvements including higher speed, greater stability, and longer context lengths. Thanks to these improvements, customers can train MPT models efficiently (40-60% MFU) without diverging from loss spikes and can serve MPT models with both standard HuggingFace pipelines and FasterTransformer.

MPT-30B (Base)

MPT-30B is a commercial Apache 2.0 licensed, open-source foundation model that exceeds the quality of GPT-3 (from the original paper) and is competitive with other open-source models such as LLaMa-30B and Falcon-40B.
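
To make that concrete, here is a minimal sketch of loading the model through a standard HuggingFace pipeline, as the excerpt describes. The repo id mosaicml/mpt-30b and the trust_remote_code flag follow how MPT-7B was published and are assumptions here, not an official quick-start.

```python
# Minimal sketch: serving MPT-30B with a standard HuggingFace pipeline.
# The repo id and trust_remote_code follow how MPT-7B was published and
# are assumptions, not an official quick-start.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

name = "mosaicml/mpt-30b"

tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    torch_dtype=torch.bfloat16,  # 30B weights are heavy; half precision helps
    trust_remote_code=True,      # MPT ships custom modeling code on the Hub
)

generate = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generate("Open-source foundation models are", max_new_tokens=50)[0]["generated_text"])
```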

Using our publicly available LLM Foundry codebase, we trained MPT-30B over the course of 2 months, transitioning between multiple different A100 clusters as hardware availability changed (at MosaicML, we share compute with our customers!), with an average MFU of >46%. In mid-June, after we received our first batch of 256xH100s from CoreWeave, we seamlessly moved MPT-30B to the new cluster to resume training on H100s with an average MFU of >35%. To the best of our knowledge, MPT-30B is the first public model to be (partially) trained on H100s! We found that throughput increased by 2.44x per GPU and we expect this speedup to increase as software matures for the H100.
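
If "MFU" is new to you: model FLOPs utilization is roughly the model FLOPs you achieve per second divided by the hardware's peak FLOPs per second. A hedged back-of-the-envelope using the common 6 × parameters × tokens approximation; every concrete number below is an illustrative assumption, not a figure from MosaicML.

```python
# Rough meaning of MFU (model FLOPs utilization): achieved model FLOPs per
# second divided by hardware peak FLOPs per second, with training FLOPs
# approximated as 6 * parameters * tokens. All numbers are illustrative
# assumptions, not MosaicML's measurements.
params = 30e9                    # MPT-30B parameter count
tokens_per_sec_per_gpu = 800     # assumed observed training throughput
peak_bf16_flops_a100 = 312e12    # A100 peak dense BF16 throughput (FLOP/s)

achieved_flops = 6 * params * tokens_per_sec_per_gpu
mfu = achieved_flops / peak_bf16_flops_a100
print(f"MFU ~= {mfu:.1%}")       # ~46% with these assumed numbers
```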

As mentioned earlier, MPT-30B was trained with a long context window of 8k tokens (vs. 2k for LLaMa and Falcon) and can handle arbitrarily long context windows via ALiBi or with finetuning. To build 8k support into MPT-30B efficiently, we first pre-trained on 1T tokens using sequences that were 2k tokens long, and continued training for an additional 50B tokens using sequences that were 8k tokens long.
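
Because ALiBi biases attention by distance instead of relying on learned position embeddings, the usable context window can in principle be raised at load time. A sketch, assuming the max_seq_len config field from the published MPT-7B model cards carries over to MPT-30B:

```python
# Sketch: raising the context window at load time, which ALiBi makes possible
# because there are no learned position embeddings to outgrow. The
# `max_seq_len` field follows the published MPT-7B model cards; assuming the
# same for MPT-30B.
from transformers import AutoConfig, AutoModelForCausalLM

name = "mosaicml/mpt-30b"

config = AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 16384  # go beyond the 8k training length via ALiBi

model = AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    trust_remote_code=True,
)
```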

The data mix used for MPT-30B pretraining is very similar to MPT-7B (see the MPT-7B blog post for details). For the 2k context window pre-training we used 1T tokens from the same 10 data subsets as the MPT-7B model (Table 1), but in slightly different proportions.

It is only a matter of time before a version of this model is quantized or converted into GGML formats for the masses to tinker with and explore its possibilities.
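
Until GGML conversions appear, one stop-gap (not GGML, and only if you have the VRAM) is 8-bit loading via bitsandbytes through transformers; a sketch under those assumptions:

```python
# Stop-gap while waiting for GGML conversions: 8-bit weights via bitsandbytes.
# Requires `pip install bitsandbytes accelerate` and CUDA GPUs with enough
# combined memory for a 30B model; a sketch, not a recommendation.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "mosaicml/mpt-30b"

tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    load_in_8bit=True,     # int8 weight quantization on load
    device_map="auto",     # spread layers across available devices
    trust_remote_code=True,
)
```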

Not sure what any of this means? That's okay, it's a lot to absorb. I suggest starting with UnderstandGPT, a great resource for getting acquainted with AI/LLMs in < 10 minutes.

Want to tinker with AI? Consider visiting this crash course, which shows you how to install and download software that can get you chatting with a few of these models today.

Don't want to run AI, but stay in the know? Consider subscribing to /c/FOSAI where we do everything we can to effectively educate and share access to free, open-source artificial intelligence in its many emerging forms. We keep up with the latest updates so you don't have to.

For anyone wanting to tinker with this soon - this particular version of MPT will likely hit open-source communities within the next month or so.

In the meantime, its smaller sibling, MPT-7B, can be chatted with using much of the software detailed in this guide.

In my experience, gpt4all worked most easily out-of-the-box for Mosaic models, but recent updates to other platforms like oobabooga and koboldcpp have proven effective at running them as well.
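
If you'd rather script it than click through a GUI, gpt4all also ships Python bindings; the model filename below is an assumption based on the MPT-7B chat build it offered at the time:

```python
# Scripting gpt4all instead of using the GUI. The model filename is an
# assumption based on the MPT-7B chat build gpt4all offered at the time;
# swap in whatever MPT file your installation lists.
from gpt4all import GPT4All

model = GPT4All("ggml-mpt-7b-chat.bin")  # downloads the file if it's missing
reply = model.generate("Explain ALiBi in one short paragraph.", max_tokens=200)
print(reply)
```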

GL, HF, and happy devving!

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/llamaShill on 2023-06-22 19:45:41+00:00.
