Lemmy.World

168,330 readers
7,151 users here now

The World's Internet Frontpage

Lemmy.World is a general-purpose Lemmy instance covering various topics, for the entire world to use.

Be polite and follow the rules ⚖ https://legal.lemmy.world/tos

Get started

See the Getting Started Guide

Donations 💗

If you would like to make a donation to support the cost of running this platform, please do so at the following donation URLs.

If you can, please use or switch to Ko-Fi; it has the lowest fees for us.

Ko-Fi (Donate)

Bunq (Donate)

Open Collective backers and sponsors

Patreon

Liberapay patrons

GitHub Sponsors

Join the team 😎

Check out our team page to join

Questions / Issues

More Lemmy.World

Follow us for server news 🐘

Mastodon Follow

Chat 🗨

Discord

Matrix

Alternative UIs

Monitoring / Stats 🌐

Service Status 🔥

https://status.lemmy.world

Mozilla HTTP Observatory Grade

Lemmy.World is part of the FediHosting Foundation

founded 1 year ago

and another commercially viable open-source LLM!

cross-posted from: https://lemmy.world/post/441601

MPT is back, and this time - better than ever. This is amazing news for free, open-source artificial intelligence and everyone at home wanting to engage with emerging LLM technologies.

Why?

MPT is a commercially licensable model, meaning anyone can use it in their business. This was big news when they initially announced MPT-7B. Now they are back with a higher parameter model, raising the bar for the quality of available AI to choose from.

I highly recommend supporting Mosaic by visiting their website for the full article. It's fantastically written with much more information.

For anyone without the time, here are a few highlight excerpts:

MPT-30B: Raising the bar for open-source foundation models

Introducing MPT-30B, a new, more powerful member of our Foundation Series of open-source models, trained with an 8k context length on H100s.

MPT-30B Family

Mosaic Pretrained Transformer (MPT) models are GPT-style decoder-only transformers with several improvements including higher speed, greater stability, and longer context lengths. Thanks to these improvements, customers can train MPT models efficiently (40-60% MFU) without diverging from loss spikes and can serve MPT models with both standard HuggingFace pipelines and FasterTransformer.

MPT-30B (Base)

MPT-30B is a commercial Apache 2.0 licensed, open-source foundation model that exceeds the quality of GPT-3 (from the original paper) and is competitive with other open-source models such as LLaMa-30B and Falcon-40B.
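
To make that concrete, here is a minimal sketch of loading the model through a standard HuggingFace pipeline, as the excerpt describes. The repo id mosaicml/mpt-30b and the trust_remote_code flag follow how MPT-7B was published and are assumptions here, not an official quick-start.

```python
# Minimal sketch: serving MPT-30B with a standard HuggingFace pipeline.
# The repo id and trust_remote_code follow how MPT-7B was published and
# are assumptions, not an official quick-start.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

name = "mosaicml/mpt-30b"

tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    torch_dtype=torch.bfloat16,  # 30B weights are heavy; half precision helps
    trust_remote_code=True,      # MPT ships custom modeling code on the Hub
)

generate = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generate("Open-source foundation models are", max_new_tokens=50)[0]["generated_text"])
```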

Using our publicly available LLM Foundry codebase, we trained MPT-30B over the course of 2 months, transitioning between multiple different A100 clusters as hardware availability changed (at MosaicML, we share compute with our customers!), with an average MFU of >46%. In mid-June, after we received our first batch of 256xH100s from CoreWeave, we seamlessly moved MPT-30B to the new cluster to resume training on H100s with an average MFU of >35%. To the best of our knowledge, MPT-30B is the first public model to be (partially) trained on H100s! We found that throughput increased by 2.44x per GPU and we expect this speedup to increase as software matures for the H100.
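
If "MFU" is new to you: model FLOPs utilization is roughly the model FLOPs you achieve per second divided by the hardware's peak FLOPs per second. A hedged back-of-the-envelope using the common 6 × parameters × tokens approximation; every concrete number below is an illustrative assumption, not a figure from MosaicML.

```python
# Rough meaning of MFU (model FLOPs utilization): achieved model FLOPs per
# second divided by hardware peak FLOPs per second, with training FLOPs
# approximated as 6 * parameters * tokens. All numbers are illustrative
# assumptions, not MosaicML's measurements.
params = 30e9                    # MPT-30B parameter count
tokens_per_sec_per_gpu = 800     # assumed observed training throughput
peak_bf16_flops_a100 = 312e12    # A100 peak dense BF16 throughput (FLOP/s)

achieved_flops = 6 * params * tokens_per_sec_per_gpu
mfu = achieved_flops / peak_bf16_flops_a100
print(f"MFU ~= {mfu:.1%}")       # ~46% with these assumed numbers
```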

As mentioned earlier, MPT-30B was trained with a long context window of 8k tokens (vs. 2k for LLaMa and Falcon) and can handle arbitrarily long context windows via ALiBi or with finetuning. To build 8k support into MPT-30B efficiently, we first pre-trained on 1T tokens using sequences that were 2k tokens long, and continued training for an additional 50B tokens using sequences that were 8k tokens long.
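
Because ALiBi biases attention by distance instead of relying on learned position embeddings, the usable context window can in principle be raised at load time. A sketch, assuming the max_seq_len config field from the published MPT-7B model cards carries over to MPT-30B:

```python
# Sketch: raising the context window at load time, which ALiBi makes possible
# because there are no learned position embeddings to outgrow. The
# `max_seq_len` field follows the published MPT-7B model cards; assuming the
# same for MPT-30B.
from transformers import AutoConfig, AutoModelForCausalLM

name = "mosaicml/mpt-30b"

config = AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 16384  # go beyond the 8k training length via ALiBi

model = AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    trust_remote_code=True,
)
```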

The data mix used for MPT-30B pretraining is very similar to MPT-7B (see the MPT-7B blog post for details). For the 2k context window pre-training we used 1T tokens from the same 10 data subsets as the MPT-7B model (Table 1), but in slightly different proportions.

It is only a matter of time before a version of this model is quantized or converted into GGML formats for the masses to tinker with and explore its possibilities.
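
Until GGML conversions appear, one stop-gap (not GGML, and only if you have the VRAM) is 8-bit loading via bitsandbytes through transformers; a sketch under those assumptions:

```python
# Stop-gap while waiting for GGML conversions: 8-bit weights via bitsandbytes.
# Requires `pip install bitsandbytes accelerate` and CUDA GPUs with enough
# combined memory for a 30B model; a sketch, not a recommendation.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "mosaicml/mpt-30b"

tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    load_in_8bit=True,     # int8 weight quantization on load
    device_map="auto",     # spread layers across available devices
    trust_remote_code=True,
)
```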

Not sure what any of this means? That's okay, it's a lot to absorb. I suggest starting with UnderstandGPT, a great resource for getting acquainted with AI/LLMs in < 10 minutes.

Want to tinker with AI? Consider visiting this crash course, which shows you how to install and download software that can get you chatting with a few of these models today.

Don't want to run AI, but stay in the know? Consider subscribing to /c/FOSAI where we do everything we can to effectively educate and share access to free, open-source artificial intelligence in its many emerging forms. We keep up with the latest updates so you don't have to.

For anyone wanting to tinker with this soon - this particular version of MPT will likely hit open-source communities within the next month or so.

In the meantime, its smaller sibling, MPT-7B, can be chatted with using much of the software detailed in this guide.

In my experience, gpt4all worked most easily out-of-the-box for Mosaic models, but recent updates to other platforms like oobabooga and koboldcpp have proven effective at running them as well.
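
If you'd rather script it than click through a GUI, gpt4all also ships Python bindings; the model filename below is an assumption based on the MPT-7B chat build it offered at the time:

```python
# Scripting gpt4all instead of using the GUI. The model filename is an
# assumption based on the MPT-7B chat build gpt4all offered at the time;
# swap in whatever MPT file your installation lists.
from gpt4all import GPT4All

model = GPT4All("ggml-mpt-7b-chat.bin")  # downloads the file if it's missing
reply = model.generate("Explain ALiBi in one short paragraph.", max_tokens=200)
print(reply)
```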

GL, HF, and happy devving!

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/llamaShill on 2023-06-22 19:45:41+00:00.
