submitted 1 year ago* (last edited 1 year ago) by Blaed to c/fosai
🤖 Happy FOSAI Friday! 🚀

Friday, September 29, 2023

HyperTech News Report #0002

Hello Everyone!

Welcome back to the HyperTech News Report! This week we're seeing some really exciting developments in futuristic technologies. With more tools and methods releasing by the day, I feel we're in for a renaissance in software. I hope hardware is soon to follow... but I am here for it! So are you. Brace yourselves. Change is coming! This next year will be very interesting to watch unfold.

Table of Contents

Community Changelog

  • Cleaned up some old content (let me know if you notice something that should be archived or updated)

Image of the Week

This week's image comes from a DALL-E 3 demonstration by Will Depue. It depicts a popular benchmark image for diffusion models: the astronaut riding a horse in space. Apparently this was hard to get right, and others have had trouble replicating it, but it seems to have been generated by DALL-E 3 nevertheless. Curious to see how it stacks up against other diffusers when it's more widely available.

New Foundation Model!

There have been many new models hitting HuggingFace daily. The recent influx has made it hard to benchmark and keep up with them all, so I will be highlighting a hand-selected, curated batch week-by-week, exploring a few at a time with more focus.

If you have any model favorites (or showcase suggestions) let me know what they are in the comments below and I'll add them to the growing catalog!

This week we're taking a look at Mistral - a new foundation model with a sliding window attention mechanism that gives it advantages over other models. Better yet - the mistral.ai team released this new model under the Apache 2.0 license. Massive shoutout to this team; this is huge for anyone who wants more options (commercially) outside of the Llama 2 and Falcon families.

From Mistralai:

The best 7B, Apache 2.0.. Mistral-7B-v0.1 is a small, yet powerful model adaptable to many use-cases. Mistral 7B is better than Llama 2 13B on all benchmarks, has natural coding abilities, and 8k sequence length. It’s released under Apache 2.0 licence, and we made it easy to deploy on any cloud.

Learn More

Mistralai

TheBloke (Quantized)

More About GPTQ

More About GGUF
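To make "sliding window attention" concrete: instead of attending to the full causal history, each token attends only to the previous W tokens, which caps per-token attention cost at the window size. A minimal, illustrative mask (a sketch of the idea, not Mistral's actual implementation):

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Causal attention mask where token i attends only to tokens
    j with i - window < j <= i (sliding window attention)."""
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]

# With a window of 2, token 3 attends only to tokens 2 and 3,
# while a full causal mask would let it see tokens 0-3.
mask = sliding_window_mask(seq_len=4, window=2)
```

Stacking layers lets information still propagate beyond the window indirectly, which is how a small window can serve a long effective context.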

Metaverse Developments

Mark Zuckerberg appeared for his third interview on the Lex Fridman podcast - but this time, in the updated Metaverse. This is pretty wild. We seem to have officially left uncanny-valley territory. There are still clearly bugs and improvements to be made - but imagine the possibilities of this mixed-reality technology (paired with VR LLM applications).

The type of experiences we can begin to explore in these digital realms are going to evolve into things of true sci-fi in our near future. This is all very exciting stuff to look forward to as AI proliferates markets and drives innovation.

What do you think? Zuck looks more human in the metaverse than in real life... mission... success?

Click here for the podcast episode.

NVIDIA NeMo Guardrails

If you haven't heard about NeMo Guardrails, you should check it out. It is a new library and approach from NVIDIA for aligning models and adding guardrails to LLM applications. It is similar in spirit to LangChain and LlamaIndex, but uses Colang - a modeling language developed in-house at NVIDIA - for configuration, alongside NeMo Guardrails libraries with Python-friendly syntax.

This is still a new and unexplored tool, but could provide some interesting results with some creative applications. It is also particularly powerful if you need to align enterprise LLMs for clients or stakeholders.
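To give a flavor of what Colang configuration looks like, here is a minimal, hypothetical rail (the message and flow names are illustrative, not taken from an official NVIDIA example):

```
define user ask about competitors
  "What do you think of other AI companies?"

define bot refuse competitor talk
  "I can only answer questions about our own products."

define flow
  user ask about competitors
  bot refuse competitor talk
```

Matching user messages to these canonical forms and walking the flow is what the NeMo Guardrails runtime handles for you.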

Learn More

Tutorial Highlights

Mistral 7B - Small But Mighty 🚀 🚀

Chatbots with RAG: LangChain Full Walkthrough

NVIDIA NeMo Guardrails: Full Walkthrough for Chatbots / AI

Author's Note

This post was authored by the moderator of [email protected] - Blaed. I make games, produce music, write about tech, and develop free open-source artificial intelligence (FOSAI) for fun. I do most of this through a company called HyperionTechnologies a.k.a. HyperTech or HYPERION - a sci-fi company.

Thanks for Reading!

If you found anything about this post interesting, consider subscribing to [email protected] where I do my best to keep you informed about free open-source artificial intelligence as it emerges in real-time.

Our community is quickly becoming a living time capsule thanks to the rapid innovation of this field. If you've gotten this far, I cordially invite you to join us and dance along the path to AGI and the great unknown.

Come on in, the water is fine, the gates are wide open! You're still early to the party, so there is still plenty of wonder and discussion yet to be had in our little corner of the digiverse.

This post was written by a human. For other humans. About machines. Who work for humans for other machines. At least for now...

Until next time!

Blaed

HyperTech Preview (self.fosai)
submitted 1 year ago* (last edited 1 year ago) by Blaed to c/fosai
Good Morning Everyone!

This week is going to be an exciting one! There is a lot happening in the background that I wish I could share. Instead of spoiling any surprises, allow me to leave you with a preview of what I want to accomplish with HyperionTechnologies (a.k.a. HyperTech or HYPERION).


HyperTech Models

I have wanted to fine-tune and deploy a large language model ever since I first interacted with ChatGPT when it released. I intend to fulfill this promise to myself before the end of the year.

I'm also not one to wait! After some hours hacking away late last night, I kicked off fine-tuning a model on an A100 and a V100. Both runs failed halfway through training (the runtime timed out). I will be kicking off another attempt later this week and plan to do a full post on how it goes.

I am using a basic sharded Llama-7B model for this first practice run, mostly to experiment with a new process I'm adapting into my HyperTech Workshop flow. Once I figure it out and reverse engineer what I need, expect more fine-tunes from HyperTech!
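If a full fine-tune keeps timing out, a parameter-efficient approach like LoRA is one common way to fit a 7B run on a single A100. A rough back-of-the-envelope sketch of why (the rank, layer count, and projection count below are illustrative assumptions, not a recipe):

```python
def lora_trainable_params(d_model: int, n_layers: int,
                          n_proj: int = 4, rank: int = 8) -> int:
    """Trainable parameters for LoRA adapters: each adapted
    d_model x d_model projection gets two low-rank factors,
    A (d_model x rank) and B (rank x d_model)."""
    per_matrix = 2 * d_model * rank
    return n_layers * n_proj * per_matrix

# Llama-7B-ish shape: d_model=4096, 32 layers, adapters on q/k/v/o.
full = 7_000_000_000
lora = lora_trainable_params(d_model=4096, n_layers=32)
fraction = lora / full  # well under 1% of the full parameter count
```

Training roughly 0.1% of the weights means optimizer state and gradients shrink accordingly, which is what makes single-GPU runs feasible.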


HyperTech Resources

Fun fact - everything HyperTech is really just my personal workflows branded under this theme.

If I am consistently engaging with something helpful to me, I will convert it into a workflow and often open-source it through HyperTech - from processes and templates to training workflows and datasets. Whatever muse slots itself into the grand vision that is HYPERION.

One of these resources is available to you today: a GitHub repo template called the HyperTech Workshop - a file structure tailored to generative AI/ML/DL developers and enthusiasts. It includes many notebooks in Jupyter-friendly .ipynb format (among other links and tools strewn about the digital workspace).

Know that there are many more resources on the horizon! These are a few of many to come.


HyperTech News Reports

You already saw this last Friday, but here's the link in case you missed the latest news report!

These will continue to be handwritten and journaled regularly!


HyperTech Projects

  • Fine-tune, deploy, and open-source a new model series (Llama, Falcon, etc.)
  • Build an integrated workshop workflow
  • Build a custom Ubuntu distro
  • Build a custom dataset
  • REDACTED
  • REDACTED
  • REDACTED

HyperTech Ethos

I am going to make another post about this later, but I want to assure everyone this new HyperTech project is borderline satire. I am not looking to turn anyone here into a product.

This company is a highly experimental sci-fi R&D project that I legitimately started for fun (and the future). Don't take any of it too seriously.

If you find any of my resources fun, interesting, curious or helpful - I've already succeeded in my mission.


Thank you!

I appreciate you reading this post!

I hope you have a great rest of your day.

Blaed

p.s. You're doing a great job.

 

cross-posted from: https://lemmy.world/post/5549499

🤖 Happy FOSAI Friday! 🚀

Friday, September 22, 2023

HyperTech News Report #0001

Hello Everyone!

This series is a new vehicle for [email protected] news reports. In these posts I'll go over projects or news I stumble across week-over-week. I will try to keep this series consistent on Fridays, covering most of what I have been sharing, but at a regular cadence. For this week, I am going to do my best to catch us up on a few old (and new) hot topics you may or may not have heard about already.

Table of Contents

Community Changelog

Image of the Week

A Stable Diffusion + ControlNet image garnered a ton of attention on social media this last week. This image has brought more recognition to the possibilities of these tools and helps shed a more positive light on the capabilities of generative models.

Read More

Introducing HyperTech

HyperionTechnologies a.k.a. HyperTech or HYPERION - a sci-fi company.

HyperTech Workshop (V0.1.0)

I am excited to announce my technology company: HyperTech. The first project of HyperionTechnologies is a digital workshop that comes in the form of a GitHub repo template for AI/ML/DL developers. HyperTech is a for-fun sci-fi company I started to explore AI development (among other emerging technologies I find curious and interesting). It is a satire corpo sandbox I have designed around my personal journey inside and outside of [email protected] with highly experimental projects and workflows.

I will be using this company and its setting/narrative/thematic to drive some of the future (and totally optional) content of our community. Any tooling, templates, or examples made along the way are entirely for you to learn from or reverse engineer for your own purpose or amusement. I'll be doing a dedicated post on HyperTech later this weekend. Keep your eye out for that if you're curious.

The future is now. The future is bright. The future is HYPERION. (Don't take this project too seriously.)

New GGUF Models

Within this last month or so, llama.cpp has begun to standardize a new model format - .GGUF - which is much more optimized than its now-legacy (and deprecated) predecessor, GGML. This is a big deal for anyone running GGML models; GGUF is basically superior in all ways. Check out llama.cpp's notes about this change on their official GitHub. I have used a few GGUF models myself and have found them much more performant than any GGML counterpart. TheBloke has already converted many of his older models into this new format (which is compatible with anything utilizing llama.cpp).

More About GGUF:

It is a successor file format to GGML, GGMF and GGJT, and is designed to be unambiguous by containing all the information needed to load a model. It is also designed to be extensible, so that new features can be added to GGML without breaking compatibility with older models. Basically:

1. No more breaking changes.
2. Support for non-llama models (Falcon, RWKV, BLOOM, etc.).
3. No more fiddling around with rope-freq-base, rope-freq-scale, gqa, and rms-norm-eps. Prompt formats could also be set automatically.
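Because GGUF is self-describing, every file opens with a fixed magic and a version field, so you can sanity-check a download with a few lines of stdlib Python. This sketch assumes only the documented header layout (4-byte ASCII magic "GGUF", then a little-endian uint32 version):

```python
import struct

def read_gguf_header(buf: bytes) -> int:
    """Validate the GGUF magic and return the format version.
    GGUF files begin with the ASCII magic b'GGUF' followed by a
    little-endian uint32 version field."""
    magic, version = struct.unpack_from("<4sI", buf, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return version

# A synthetic version-2 header for demonstration:
header = b"GGUF" + struct.pack("<I", 2)
version = read_gguf_header(header)
```

In practice you'd pass the first 8 bytes of a real model file; a legacy GGML file fails the magic check immediately.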

Falcon 180B

Many of you have probably already heard, but Falcon 180B was recently announced - and since I haven't covered it here yet, it's worth mentioning in this post. Check out the full article regarding its release here on HuggingFace. Can't wait to see what comes next! This will open up a lot of doors for us to explore.

Today, we're excited to welcome TII's Falcon 180B to HuggingFace! Falcon 180B sets a new state-of-the-art for open models. It is the largest openly available language model, with 180 billion parameters, and was trained on a massive 3.5 trillion tokens using TII's RefinedWeb dataset. This represents the longest single-epoch pretraining for an open model. The dataset for Falcon 180B consists predominantly of web data from RefinedWeb (~85%). In addition, it has been trained on a mix of curated data such as conversations, technical papers, and a small fraction of code (~3%). This pretraining dataset is big enough that even 3.5 trillion tokens constitute less than an epoch.

The released chat model is fine-tuned on chat and instruction datasets with a mix of several large-scale conversational datasets.

‼️ Commercial Usage: Falcon 180b can be commercially used but under very restrictive conditions, excluding any "hosting use". We recommend to check the license and consult your legal team if you are interested in using it for commercial purposes.

You can find the model on the Hugging Face Hub (base and chat model) and interact with the model on the Falcon Chat Demo Space.

Llama 3 Rumors

Speaking of big open-source models - Llama 3 is rumored to be in training or development. Llama 2 was clearly an improvement over its predecessor. I wonder how Llama 3 & 4 will stack up in this race to AGI. I forget that we're still early to this party. At this rate of development, I believe we're bound to see AGI within the decade.

Meta plans to rival GPT-4 with a rumored free Llama 3:

  • According to an early rumor, Meta is working on Llama 3, which is intended to compete with GPT-4, but will remain largely free under the Llama license.
  • Jason Wei, an engineer associated with OpenAI, has indicated that Meta possesses the computational capacity to train Llama 3 to a level comparable to GPT-4. Furthermore, Wei suggests that the feasibility of training Llama 4 is already within reach.
  • Despite Wei's credibility, it's important to acknowledge the possibility of inaccuracies in his statements or the potential for shifts in these plans.

DALM

I recently stumbled across DALM - a new domain-adapted language modeling toolkit that is supposed to enable a workflow for training a retrieval augmented generation (RAG) pipeline end-to-end. According to their results, the DALM-specific training process leads to much higher response quality in retrieval augmented generation. I haven't had a chance to tinker with this much, but I'd keep an eye on it if you're working with RAG workflows.

DALM Manifesto:

A great rift has emerged between general LLMs and the vector stores that are providing them with contextual information. The unification of these systems is an important step in grounding AI systems in efficient, factual domains, where they are utilized not only for their generality, but for their specificity and uniqueness. To this end, we are excited to open source the Arcee Domain Adapted Language Model (DALM) toolkit for developers to build on top of our Arcee open source Domain Pretrained (DPT) LLMs. We believe that our efforts will help as we begin next phase of language modeling, where organizations deeply tailor AI to operate according to their unique intellectual property and worldview.

For the first time in the literature, we modified the initial RAG-end2end model (TACL paper, HuggingFace implementation) to work with decoder-only language models like Llama, Falcon, or GPT. We also incorporated the in-batch negative concept alongside the RAG's marginalization to make the entire process efficient.
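The "in-batch negative" concept is easy to sketch: for each query in a batch, its paired passage is the positive and every other passage in the same batch serves as a negative, so one batch of dot products yields a full contrastive loss. A toy, illustrative version (plain Python, not DALM's actual code):

```python
import math

def in_batch_negative_loss(queries, passages):
    """Contrastive loss with in-batch negatives: queries[i] should
    score highest against passages[i]; all other passages in the
    batch act as negatives via a softmax over dot-product scores."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    total = 0.0
    for i, q in enumerate(queries):
        scores = [dot(q, p) for p in passages]
        log_norm = math.log(sum(math.exp(s) for s in scores))
        total += log_norm - scores[i]  # negative log-softmax of the positive
    return total / len(queries)

# Toy embeddings where each query matches its own passage:
qs = [[1.0, 0.0], [0.0, 1.0]]
ps = [[1.0, 0.0], [0.0, 1.0]]
loss = in_batch_negative_loss(qs, ps)
```

The appeal is efficiency: negatives come for free from the batch itself, with no separate negative-mining pass.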

DALL-E 3

OpenAI announced DALL-E 3, which will have direct native compatibility within ChatGPT. This means users should be able to naturally and semantically iterate over images and features, adjusting the output from the same chat interface throughout their conversation. This will enable many users to seamlessly incorporate image diffusion into their chat workflows.

I think this is huge, mostly because it illustrates a new technique that removes some of the barriers that prompt engineers have to solve (it reads prompts differently than other diffusers). Not to mention you are permitted to sell, keep, and commercialize any image DALL-E generates.

I am curious to see if open-source workflows can follow a similar approach and have iterative design workflows that seamlessly integrate with a chat interface. That, paired with manual tooling from things like ControlNet, would be a powerful combination that could spark a lot of creativity. Don't get me wrong, sometimes I really like manual and node-based workflows, but I believe semantic computation is the future. Regardless of how 'open' OpenAI truly is, these breakthroughs help chart the path forward for everyone else still catching up.

More About DALL-E 3:

DALL·E 3 is now in research preview, and will be available to ChatGPT Plus and Enterprise customers in October, via the API and in Labs later this fall. Modern text-to-image systems have a tendency to ignore words or descriptions, forcing users to learn prompt engineering. DALL·E 3 represents a leap forward in our ability to generate images that exactly adhere to the text you provide. DALL·E 3 is built natively on ChatGPT, which lets you use ChatGPT as a brainstorming partner and refiner of your prompts. Just ask ChatGPT what you want to see in anything from a simple sentence to a detailed paragraph. When prompted with an idea, ChatGPT will automatically generate tailored, detailed prompts for DALL·E 3 that bring your idea to life. If you like a particular image, but it’s not quite right, you can ask ChatGPT to make tweaks with just a few words.

DALL·E 3 will be available to ChatGPT Plus and Enterprise customers in early October. As with DALL·E 2, the images you create with DALL·E 3 are yours to use and you don't need our permission to reprint, sell or merchandise them.

Author's Note

This post was authored by the moderator of [email protected] - Blaed. I make games, produce music, write about tech, and develop free open-source artificial intelligence (FOSAI) for fun. I do most of this through a company called HyperionTechnologies a.k.a. HyperTech or HYPERION - a sci-fi company.

Thanks for Reading!

If you found anything about this post interesting, consider subscribing to [email protected] where I do my best to keep you informed about free open-source artificial intelligence as it emerges in real-time.

Our community is quickly becoming a living time capsule thanks to the rapid innovation of this field. If you've gotten this far, I cordially invite you to join us and dance along the path to AGI and the great unknown.

Come on in, the water is fine, the gates are wide open! You're still early to the party, so there is still plenty of wonder and discussion yet to be had in our little corner of the digiverse.

This post was written by a human. For other humans. About machines. Who work for humans for other machines. At least for now...

Until next time!

Blaed

 


DALM

I recently stumbled across DALM - a new domain adapted language modeling toolkit which is supposed to enable a workflow that trains a retrieval augmented generation (RAG) pipeline from end-to-end. According to their results, the DALM specific training process leads to a much higher response quality when it comes to retrieval augmented generation. I haven't had a chance to tinker with this a lot, but I'd keep an eye on it if you're engaging with RAG workflows.

DALM Manifesto:

A great rift has emerged between general LLMs and the vector stores that are providing them with contextual information. The unification of these systems is an important step in grounding AI systems in efficient, factual domains, where they are utilized not only for their generality, but for their specificity and uniqueness. To this end, we are excited to open source the Arcee Domain Adapted Language Model (DALM) toolkit for developers to build on top of our Arcee open source Domain Pretrained (DPT) LLMs. We believe that our efforts will help as we begin next phase of language modeling, where organizations deeply tailor AI to operate according to their unique intellectual property and worldview.

For the first time in the literature, we modified the initial RAG-end2end model (TACL paper, HuggingFace implementation) to work with decoder-only language models like Llama, Falcon, or GPT. We also incorporated the in-batch negative concept alongside the RAG's marginalization to make the entire process efficient.

DALL-E 3

OpenAI announced DALL-E 3 that will have direct native compatibility within ChatGPT. This means users should be able to naturally and semantically iterate over images and features over time, adjusting the output from the same chat interface throughout their conversation. This will enable many users to seamlessly incorporate image diffusion into their chat workflows.

I think this is huge, mostly because it illustrates a new technique that removes some of the barriers that prompt engineers have to solve (it reads prompts differently than other diffusers). Not to mention you are permitted to sell, keep, and commercialize any image DALL-E generates.

I am curious to see if open-source workflows can follow a similar approach and have iterative design workflows that seamlessly integrate with a chat interface. That, paired with manual tooling from things like ControlNet would be a powerful pairing that could spark a lot of creativity. Don't get me wrong, sometimes I really like manual and node-based workflows, but I believe semantic computation is the future. Regardless of how 'open' OpenAI truly is, these breakthroughs help chart the path forward for everyone else still catching up.

More About DALL-E 3:

DALL·E 3 is now in research preview, and will be available to ChatGPT Plus and Enterprise customers in October, via the API and in Labs later this fall. Modern text-to-image systems have a tendency to ignore words or descriptions, forcing users to learn prompt engineering. DALL·E 3 represents a leap forward in our ability to generate images that exactly adhere to the text you provide. DALL·E 3 is built natively on ChatGPT, which lets you use ChatGPT as a brainstorming partner and refiner of your prompts. Just ask ChatGPT what you want to see in anything from a simple sentence to a detailed paragraph. When prompted with an idea, ChatGPT will automatically generate tailored, detailed prompts for DALL·E 3 that bring your idea to life. If you like a particular image, but it’s not quite right, you can ask ChatGPT to make tweaks with just a few words.

DALL·E 3 will be available to ChatGPT Plus and Enterprise customers in early October. As with DALL·E 2, the images you create with DALL·E 3 are yours to use and you don't need our permission to reprint, sell or merchandise them.

Author's Note

This post was authored by the moderator of [email protected] - Blaed. I make games, produce music, write about tech, and develop free open-source artificial intelligence (FOSAI) for fun. I do most of this through a company called HyperionTechnologies a.k.a. HyperTech or HYPERION - a sci-fi company.

Thanks for Reading!

If you found anything about this post interesting, consider subscribing to [email protected] where I do my best to keep you informed about free open-source artificial intelligence as it emerges in real-time.

Our community is quickly becoming a living time capsule thanks to the rapid innovation of this field. If you've gotten this far, I cordially invite you to join us and dance along the path to AGI and the great unknown.

Come on in, the water is fine, the gates are wide open! You're still early to the party, so there is still plenty of wonder and discussion yet to be had in our little corner of the digiverse.

This post was written by a human. For other humans. About machines. Who work for humans for other machines. At least for now...

Until next time!

Blaed

11
submitted 1 year ago* (last edited 1 year ago) by Blaed to c/fosai
 

🤖 Happy FOSAI Friday! 🚀

Friday, September 22, 2023

HyperTech News Report #0001

Hello Everyone!

This series is a new vehicle for [email protected] news reports. In these posts I'll go over projects or news I stumble across week-over-week. I will try to keep this series consistent on Fridays, covering much of what I have been already, but at a regular cadence. For this week, I am going to do my best to catch us up on a few old (and new) hot topics you may or may not have heard about already.

Table of Contents

Community Changelog

Image of the Week

A Stable Diffusion + ControlNet image garnered a ton of attention on social media this last week. This image has brought more recognition to the possibilities of these tools and helps shed a more positive light on the capabilities of generative models.

Read More

Introducing HyperTech

HyperionTechnologies a.k.a. HyperTech or HYPERION - a sci-fi company.

HyperTech Workshop (V0.1.0)

I am excited to announce my technology company: HyperTech. The first project of HyperionTechnologies is a digital workshop that comes in the form of a GitHub repo template for AI/ML/DL developers. HyperTech is a for-fun sci-fi company I started to explore AI development (among other emerging technologies I find curious and interesting). It is a satirical corpo sandbox I have designed around my personal journey inside and outside of [email protected], with highly experimental projects and workflows. I will be using this company and its setting/narrative/theme to drive some of the future (and totally optional) content of our community. Any tooling, templates, or examples made along the way are entirely for you to learn from or reverse engineer for your own purposes or amusement. I'll be doing a dedicated post about HyperTech later this weekend - keep an eye out for that if you're curious. The future is now. The future is bright. The future is HYPERION. (Don't take this project too seriously.)

New GGUF Models

Within the last month or so, the llama.cpp project has begun standardizing a new model file format - GGUF - which is much more optimized than its now-deprecated predecessor, GGML. This is a big deal for anyone running GGML models. GGUF is superior in nearly every way. Check out llama.cpp's notes about the change on their official GitHub. I have used a few GGUF models myself and found them much more performant than their GGML counterparts. TheBloke has already converted many of his older models into this new format (which is compatible with anything utilizing llama.cpp).

More About GGUF:

It is a successor file format to GGML, GGMF and GGJT, and is designed to be unambiguous by containing all the information needed to load a model. It is also designed to be extensible, so that new features can be added to GGML without breaking compatibility with older models. Basically:

1. No more breaking changes
2. Support for non-llama models (Falcon, RWKV, BLOOM, etc.)
3. No more fiddling around with rope-freq-base, rope-freq-scale, gqa, and rms-norm-eps; prompt formats could also be set automatically
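To make "containing all the information needed to load a model" concrete, here is a minimal sketch of reading a GGUF header. The field layout below is taken from my understanding of the llama.cpp GGUF spec (magic bytes, then a uint32 version and two uint64 counts, all little-endian), and the sample bytes are synthetic, not from a real model:

```python
import struct

def read_gguf_header(data: bytes) -> dict:
    # Per the llama.cpp GGUF spec (v2+), a file starts with the magic
    # b"GGUF", then a little-endian uint32 version, a uint64 tensor
    # count, and a uint64 metadata key/value count.
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", data, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {"version": version, "tensor_count": n_tensors, "metadata_kv_count": n_kv}

# Synthetic header bytes for illustration -- not taken from a real model
sample = struct.pack("<4sIQQ", b"GGUF", 3, 291, 24)
print(read_gguf_header(sample))
# → {'version': 3, 'tensor_count': 291, 'metadata_kv_count': 24}
```

The metadata key/value section that follows the header is what carries things like rope-freq-base and the prompt format, which is why loaders no longer need those passed in by hand.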

Falcon 180B

Many of you have probably already heard of this, but Falcon 180B was recently announced - and I haven't covered it here yet so it's worth mentioning in this post. Check out the full article regarding its release here on HuggingFace. Can't wait to see what comes next! This will open up a lot of doors for us to explore.

Today, we're excited to welcome TII's Falcon 180B to HuggingFace! Falcon 180B sets a new state-of-the-art for open models. It is the largest openly available language model, with 180 billion parameters, and was trained on a massive 3.5 trillion tokens using TII's RefinedWeb dataset. This represents the longest single-epoch pretraining for an open model. The dataset for Falcon 180B consists predominantly of web data from RefinedWeb (~85%). In addition, it has been trained on a mix of curated data such as conversations, technical papers, and a small fraction of code (~3%). This pretraining dataset is big enough that even 3.5 trillion tokens constitute less than an epoch.

The released chat model is fine-tuned on chat and instruction datasets with a mix of several large-scale conversational datasets.

‼️ Commercial Usage: Falcon 180b can be commercially used but under very restrictive conditions, excluding any "hosting use". We recommend to check the license and consult your legal team if you are interested in using it for commercial purposes.

You can find the model on the Hugging Face Hub (base and chat model) and interact with the model on the Falcon Chat Demo Space.

Llama 3 Rumors

Speaking of big open-source models - Llama 3 is rumored to be under training or development. Llama 2 was clearly an improvement over its predecessor. I wonder how Llama 3 & 4 will stack in this race to AGI. I forget that we're still early to this party. At this rate of development, I believe we're bound to see it within the decade.

Meta plans to rival GPT-4 with a rumored free Llama 3

- According to an early rumor, Meta is working on Llama 3, which is intended to compete with GPT-4, but will remain largely free under the Llama license.
- Jason Wei, an engineer associated with OpenAI, has indicated that Meta possesses the computational capacity to train Llama 3 to a level comparable to GPT-4. Furthermore, Wei suggests that the feasibility of training Llama 4 is already within reach.
- Despite Wei's credibility, it's important to acknowledge the possibility of inaccuracies in his statements or the potential for shifts in these plans.

DALM

I recently stumbled across DALM - a new domain-adapted language modeling toolkit that is supposed to enable training a retrieval augmented generation (RAG) pipeline end-to-end. According to their results, the DALM-specific training process leads to much higher response quality in retrieval augmented generation. I haven't had a chance to tinker with it much, but I'd keep an eye on it if you're engaging with RAG workflows.

DALM Manifesto:

A great rift has emerged between general LLMs and the vector stores that are providing them with contextual information. The unification of these systems is an important step in grounding AI systems in efficient, factual domains, where they are utilized not only for their generality, but for their specificity and uniqueness. To this end, we are excited to open source the Arcee Domain Adapted Language Model (DALM) toolkit for developers to build on top of our Arcee open source Domain Pretrained (DPT) LLMs. We believe that our efforts will help as we begin next phase of language modeling, where organizations deeply tailor AI to operate according to their unique intellectual property and worldview.

For the first time in the literature, we modified the initial RAG-end2end model (TACL paper, HuggingFace implementation) to work with decoder-only language models like Llama, Falcon, or GPT. We also incorporated the in-batch negative concept alongside the RAG's marginalization to make the entire process efficient.
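I can't speak to DALM's exact API, but the retrieval step at the heart of any RAG pipeline can be sketched in a few lines. Everything below - the documents, the two-dimensional "embeddings" - is made up purely for illustration; a real system would use a trained retriever like the ones DALM fine-tunes end-to-end:

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, doc_vecs):
    # Index of the document embedding most similar to the query
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return max(range(len(scores)), key=scores.__getitem__)

docs = ["GGUF is a model file format.", "Falcon 180B has 180 billion parameters."]
doc_vecs = [[1.0, 0.1], [0.1, 1.0]]   # toy stand-ins for a trained retriever
query_vec = [0.9, 0.2]                # toy embedding of a GGUF question

top = retrieve(query_vec, doc_vecs)
prompt = f"Context: {docs[top]}\n\nQuestion: What is GGUF?"
```

The point of a RAG-end2end setup is that the retriever and generator are trained jointly, so embeddings like these get optimized for downstream answer quality instead of staying frozen.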

DALL-E 3

OpenAI announced DALL-E 3, which will have direct, native integration with ChatGPT. This means users should be able to naturally and semantically iterate on images and features over time, adjusting the output from the same chat interface throughout their conversation. This will let many users seamlessly incorporate image diffusion into their chat workflows.

I think this is huge, mostly because it illustrates a new technique that removes some of the barriers that prompt engineers have to solve (it reads prompts differently than other diffusers). Not to mention you are permitted to sell, keep, and commercialize any image DALL-E generates.

I am curious to see if open-source workflows can follow a similar approach and have iterative design workflows that seamlessly integrate with a chat interface. That, paired with manual tooling from things like ControlNet would be a powerful pairing that could spark a lot of creativity. Don't get me wrong, sometimes I really like manual and node-based workflows, but I believe semantic computation is the future. Regardless of how 'open' OpenAI truly is, these breakthroughs help chart the path forward for everyone else still catching up.

More About DALL-E 3:

DALL·E 3 is now in research preview, and will be available to ChatGPT Plus and Enterprise customers in October, via the API and in Labs later this fall. Modern text-to-image systems have a tendency to ignore words or descriptions, forcing users to learn prompt engineering. DALL·E 3 represents a leap forward in our ability to generate images that exactly adhere to the text you provide. DALL·E 3 is built natively on ChatGPT, which lets you use ChatGPT as a brainstorming partner and refiner of your prompts. Just ask ChatGPT what you want to see in anything from a simple sentence to a detailed paragraph. When prompted with an idea, ChatGPT will automatically generate tailored, detailed prompts for DALL·E 3 that bring your idea to life. If you like a particular image, but it’s not quite right, you can ask ChatGPT to make tweaks with just a few words.

DALL·E 3 will be available to ChatGPT Plus and Enterprise customers in early October. As with DALL·E 2, the images you create with DALL·E 3 are yours to use and you don't need our permission to reprint, sell or merchandise them.

Author's Note

This post was authored by the moderator of [email protected] - Blaed. I make games, produce music, write about tech, and develop free open-source artificial intelligence (FOSAI) for fun. I do most of this through a company called HyperionTechnologies a.k.a. HyperTech or HYPERION - a sci-fi company.

Thanks for Reading!

If you found anything about this post interesting, consider subscribing to [email protected] where I do my best to keep you informed about free open-source artificial intelligence as it emerges in real-time.

Our community is quickly becoming a living time capsule thanks to the rapid innovation of this field. If you've gotten this far, I cordially invite you to join us and dance along the path to AGI and the great unknown.

Come on in, the water is fine, the gates are wide open! You're still early to the party, so there is still plenty of wonder and discussion yet to be had in our little corner of the digiverse.

This post was written by a human. For other humans. About machines. Who work for humans for other machines. At least for now...

Until next time!

Blaed

 

Hey everyone!

I don't think I've shared this one before, so allow me to introduce you to 'LM Studio' - a new application that is tailored to LLM developers and enthusiasts.

Check it out!


With LM Studio, you can ...

🤖 - Run LLMs on your laptop, entirely offline

👾 - Use models through the in-app Chat UI or an OpenAI compatible local server

📂 - Download any compatible model files from HuggingFace 🤗 repositories

🔭 - Discover new & noteworthy LLMs in the app's home page

LM Studio supports any ggml Llama, MPT, and StarCoder model on Hugging Face (Llama 2, Orca, Vicuna, Nous Hermes, WizardCoder, MPT, etc.)

Minimum requirements: M1/M2 Mac, or a Windows PC with a processor that supports AVX2. Linux is under development.

Made possible thanks to the llama.cpp project.

We are expanding our team. See our careers page.


Love seeing these new tools come out, especially with the new GGUF format being widely adopted.

The regularly updated and curated list of new LLM releases they provide through this platform is enough for me to keep it installed.

I'll be tinkering plenty when I have the time this week. I'll be sure to let everyone know how it goes! In the meantime, if you do end up giving LM Studio a try - let us know your thoughts and experience with it in the comments below.
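For anyone wondering what "OpenAI compatible local server" means in practice: once the server is running, you can talk to it with plain HTTP. The base URL below (port 1234) is what I understand LM Studio defaults to - double-check the app's server tab - and the model name is a placeholder, since the server answers with whichever model you currently have loaded:

```python
import json
import urllib.request

# Assumed default address -- verify in LM Studio's "Local Server" tab
BASE_URL = "http://localhost:1234/v1/chat/completions"

def build_chat_request(prompt: str) -> dict:
    # Standard OpenAI-style chat payload
    return {
        "model": "local-model",  # placeholder; the loaded model responds
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

def ask(prompt: str) -> str:
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(build_chat_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# ask("Explain GGUF in one sentence.")  # requires the local server running
```

Because the payload matches OpenAI's schema, existing tooling written against their API should point at a local model with little more than a base-URL swap.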

[–] Blaed 1 points 1 year ago* (last edited 1 year ago)

This is all very good feedback! I appreciate everyone who has commented so far. I will leave this post pinned for the remainder of the year for anyone (new member or old) to share their thoughts and what else they (you) think we should explore next with [email protected].

12
submitted 1 year ago* (last edited 1 year ago) by Blaed to c/fosai
 

Hello everyone!

I have finally found some time for myself, and I felt it was right to commit some of it here. These last few months have been a wild ride for all of us, I'm sure. I wanted to start this post to regroup and rekindle our interests to see wherever they may take us next.

So, what do you want to see out of our community in the near future? I am opening up the floor for you to leave your comment below and let me know how we can make things a little more interesting here.

[email protected]


📰 Community Feature #1 ✅

AI/ML/DL News / Papers / Projects

Where it all began. I’ll continue covering a wide breadth of stuff I find interesting in more curated posts, including links to any cool papers, projects, or news I find curious.


🎓 Community Feature #2 ✅

Open Source Learning Resources

I have a few updates coming to https://fosai.xyz soon. I'm thinking about expanding it in other directions - let me know if there's something on this site you want me to explore or expand upon. Everything on there is also consolidated as resources on our sidebar.


📚 Community Feature #3 ✅

Collaborative Learning

When someone new posts a question or needs help - a lot of you are quick to jump in and respond before I get a chance to see it. I think that's awesome (that so many of you are willing to reply to questions so fast). Big thanks to everyone who engages with those posts. I really appreciate the collaborative support! I don't have all the time in the world, so those who can help guide others into this wonderful world of AI are encouraged to continue answering questions and helping each other. It means a lot to me (and the person asking the question!).


🕹️ Community Feature #4 ☑️

Games and Stories ❔

Anyone care about games or stories made with / built on AI stacks? I can share a few projects that might be fun to explore - ones geared toward how AI may evolve in games, story, and entertainment. I could start a casual series building a simulated narrative contextual to [email protected], and we could study its evolution over time together. I have a narrative and setting for this (PROJECT HYPERION), but I have no idea if anyone would be interested in reading the series. It would be delivered like an augmented reality magazine that teaches you concepts from this site, set in a Discord server built around these simulated game-world rules. This was a highly experimental project I started but shelved. I can dust it off and share it here if there's enough intrigue. Read more about PROJECT HYPERION in the notes at the bottom of this post.


🔬 Community Feature #5 ☑️

Random Prototypes❔

I tinker a lot, often with incoherent prototypes. At this point, I think it might be worth exploring specific use cases and spending more time prototyping concepts you all vote on. In this series, you could ask me to prototype something (or explore the possibilities of achieving some experimental goal) that we can benchmark and measure together post-by-post. These would be projects we explore purely for fun or for particular business ideas/applications. They could be example implementations or satirical representations of projects built to help you learn with a more hands-on approach.


🧪 Community Feature #6 ☑️

Benchmarks❔

I have considered doing a benchmark series, but this would take a bit of setup for the best results - at least for me to do it consistently the way I'd want. That being said, I could spend time kicking this series off too (if there was enough interest). Here I would try different LLMs across a consistent benchmark we devise as a community, whether in a local or a cloud-based environment. From there, we can explore any quantization size, parameter count, model architecture, or implementation approach and see how they stack up side-by-side.


Got an idea❔Vote for it below! ☑️

Do you have an idea? Something you want to see out of [email protected]? Share your series or vote on your favorite idea in the comments below!


Other Notes & Food for Thought 📓

  • Growth: I'm not looking to rapidly scale or grow, this community should remain holistic and engaging.
  • Quality Over Quantity: Recent feedback has made me reconsider how to share updates. Going forward, all content will be handwritten and hand reported (as it was). I will be focusing on quality over quantity. Shoutout to [email protected] for the honest feedback.
  • Best AI/ML/DL Forum of the Fediverse: I've always thought of this community as a gateway to AI/ML/DL that is wide open for anyone who wants to learn it (similar to /r/LocalLLaMA, but outside of Reddit). I think we should strive to keep answering questions, but keep asking them too. We are braving this new frontier with many others we should recognize and help along the way.
  • PROJECT HYPERION: The Game and Story I wrote based on this community is called PROJECT HYPERION. It exists only on my laptop at the moment. This would be played as an augmented reality Discord Roleplay Server where you would interact with simulated characters and residents of a digital sci-fi city called HYPERION. With each act, a magazine series accompanies the world, exploring the fictional story of HYPERION, shared at regular intervals (for free). These digital mentors and story characters would teach you material, concepts, and projects from [email protected], but through the medium that is HYPERION, living their own simulated lives asynchronous from your chats with them. It blurs the line between education and entertainment, video games and AR/VR. This has been on my mind for a while, and may become something larger than fosai, whether it starts here or later in the future. If we vote for this concept, know that it would come in the form of our first Discord server and may evolve into other things if it finds traction. HYPERION is kind of like an AR GameInformer, but for AI/ML/DL contextualized to a fictional world based on real science and breakthroughs. I can work on this if you want me to, but I don't want to take the detour if it's not worth the time. I had a lot of fun designing it, and that was enough. But I'd be more than happy to share this little world of mine if you wanted to hear more about it.

Thank you 🌎

Remember, this community is as much yours (all of you subscribed) as it is mine! Speak up if there's something in particular you want to see out of it.

I am already more than happy with our size and what we have covered so far, but I know we will grow as the future unfolds.

I'll leave this pinned for a while so you can let me know how you want to shape this future, and we'll head that direction together.

Blaed

[–] Blaed 3 points 1 year ago* (last edited 1 year ago) (1 children)

This is really great feedback, thank you for commenting. Don't worry, I am not at all offended. I'm glad you told me - I can absolutely go back to handwriting each post. I honestly prefer to do it that way. Sometimes I can't tell what everyone wants to hear (until they tell me), so I try new things. Sometimes experiments fail, and that's okay!

If there's anything at all you want to see in particular, let me know! I'd be more than happy shedding light on a particular subject. In the meantime, we'll go back to more curated content. It does take a lot of time to write those posts, but it seems it's worth the effort. Thanks again for letting me know, I really do appreciate the feedback. Don't hesitate to call me out if you feel I've strayed from the path. This community is as much yours (all of you subscribed) as it is mine.

 

How is AI Changing the Audiobook Landscape?

Source: MarkTechPost

AI in Audiobook


What's Happening?

Audiobooks are gaining popularity as they provide a convenient way to consume information. They're not just for avid readers; they're also crucial for those with visual impairments, children, and language learners. However, traditional audiobook production can be time-consuming and costly.


How is AI Helping?

AI, through neural text-to-speech technology, is revolutionizing this space. It's enabling the quick and cost-effective conversion of e-books into high-quality audiobooks.


Learn More

For a deep dive into the technology and its impact, Click Here to Learn More.

 

DeepMind - Building Interactive Agents in Video Game Worlds

Source

Most artificial intelligence (AI) researchers now believe that writing computer code which can capture the nuances of situated interactions is impossible. Alternatively, modern machine learning (ML) researchers have focused on learning about these types of interactions from data.

To explore these learning-based approaches and quickly build agents that can make sense of human instructions and safely perform actions in open-ended conditions, we created a research framework within a video game environment.

Today, we’re publishing a paper and collection of videos, showing our early steps in building video game AIs that can understand fuzzy human concepts – and therefore, can begin to interact with people on their own terms.

Learning in “the playhouse”

Our framework begins with people interacting with other people in the video game world. Using imitation learning, we imbued agents with a broad but unrefined set of behaviours. This "behaviour prior" is crucial for enabling interactions that can be judged by humans. Without this initial imitation phase, agents are entirely random and virtually impossible to interact with. Further human judgement of the agent’s behaviour and optimisation of these judgements by reinforcement learning (RL) produces better agents, which can then be improved again.

We built agents by (1) imitating human-human interactions, and then improving agents through a cycle of (2) human-agent interaction and human feedback, (3) reward model training, and (4) reinforcement learning. First we built a simple video game world based on the concept of a child's “playhouse.” This environment provided a safe setting for humans and agents to interact and made it easy to rapidly collect large volumes of these interaction data. The house featured a variety of rooms, furniture, and objects configured in new arrangements for each interaction. We also created an interface for interaction.

Both the human and agent have an avatar in the game that enables them to move within – and manipulate – the environment. They can also chat with each other in real-time and collaborate on activities, such as carrying objects and handing them to each other, building a tower of blocks, or cleaning a room together. Human participants set the contexts for the interactions by navigating through the world, setting goals, and asking questions for agents. In total, the project collected more than 25 years of real-time interactions between agents and hundreds of (human) participants.

Observing behaviours that emerge

The agents we trained are capable of a huge range of tasks, some of which were not anticipated by the researchers who built them. For instance, we discovered that these agents can build rows of objects using two alternating colours or retrieve an object from a house that’s similar to another object the user is holding.

These surprises emerge because language permits a nearly endless set of tasks and questions via the composition of simple meanings. Also, as researchers, we do not specify the details of agent behaviour. Instead, the hundreds of humans who engage in interactions came up with tasks and questions during the course of these interactions.

Click Here to Learn More
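The four-step cycle DeepMind describes - imitation, human feedback, reward modeling, reinforcement learning - can be caricatured in a few lines. This is purely a toy illustration of the loop's shape; the actions, numbers, and update rule below are all made up and bear no relation to DeepMind's actual implementation:

```python
actions = ["build block tower", "wander randomly", "answer question"]

# (1) Imitation: start from behaviour frequencies seen in human-human play
policy = {"build block tower": 0.4, "wander randomly": 0.2, "answer question": 0.4}

# (2) Human feedback: pairwise judgements of the form "a was better than b"
preferences = [
    ("build block tower", "wander randomly"),
    ("answer question", "wander randomly"),
]

# (3) Reward model: one score per action, nudged toward the judgements
reward = {a: 0.0 for a in actions}
for better, worse in preferences:
    reward[better] += 0.1
    reward[worse] -= 0.1

# (4) RL step: shift probability mass toward the highest-reward action
best = max(actions, key=reward.get)
policy = {a: (0.8 if a == best else 0.1) for a in actions}
```

The interesting part of the real system is that steps (2) through (4) repeat: the improved policy generates new interactions, which get judged again, so the reward model and agent keep refining each other.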

20
submitted 1 year ago* (last edited 1 year ago) by Blaed to c/fosai
 

BAIR Shares LMD - The Fusion of GPT-4 and Stable Diffusion

By Long Lian, Boyi Li, Adam Yala, Trevor Darrell


Quick Summary

How does it work?: Text Prompt → Large Language Model (LLM) → Intermediate Representation → Stable Diffusion → Final Image.

The Problem: Existing diffusion models excel at text-to-image synthesis but often fail to accurately capture spatial relationships, negations, numeracy, and attribute assignments in the prompt.

Our Solution: Introducing LLM-grounded Diffusion (LMD), a method that significantly improves prompt understanding in these challenging scenarios.

Visualizations
Figure 1: LMD enhances prompt understanding in text-to-image models.


The Nitty-Gritty

Our Approach

We sidestep the high cost and time investment of training new models by using pretrained Large Language Models (LLMs) and diffusion models in a unique two-step process.

  1. LLM as Layout Generator: An LLM generates a scene layout with bounding boxes and object descriptions based on the prompt.
  2. Diffusion Model Controller: This takes the LLM output and creates images conditioned on the layout.

Both stages use frozen pretrained models, minimizing training costs.
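As a rough sketch of what stage 1 hands to stage 2, here is how parsing a hypothetical LLM layout response might look. The JSON schema, object descriptions, and box format below are illustrative assumptions, not LMD's actual interface:

```python
import json

def parse_layout(llm_output: str):
    """Parse a (hypothetical) LLM layout response into (description, box) pairs.
    Boxes are [x, y, width, height] in image coordinates."""
    layout = json.loads(llm_output)
    return [(obj["description"], obj["box"]) for obj in layout["objects"]]

# A made-up example of what the layout-generation stage might return:
llm_output = """{
  "background": "a sunny park",
  "objects": [
    {"description": "a red ball", "box": [50, 300, 120, 120]},
    {"description": "a brown dog", "box": [250, 280, 200, 160]}
  ]
}"""

boxes = parse_layout(llm_output)
# Stage 2 (not shown) would condition the frozen diffusion model on these
# boxes and descriptions, then compose the per-box results into one image.
```

The point of the intermediate representation is exactly this: the LLM handles the language and spatial reasoning, and the diffusion controller only has to render a fully specified layout.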
Read the full paper on arXiv

Process Overview
Figure 2: The two-stage process of LMD.

Additional Features

  • Dialog-Based Scene Specification: Enables interactive prompt modifications.
  • Language Support: Capable of processing prompts in languages that aren't natively supported by the underlying diffusion model.

Additional Abilities
Figure 3: LMD's multi-language and dialog-based capabilities.


Why Does This Matter?

We demonstrate LMD's superiority over existing diffusion models, particularly in generating images that accurately match complex text prompts involving language and spatial reasoning.

Performance Comparison
Figure 4: LMD vs Base Diffusion Model.


Further Reading and Citation

For an in-depth understanding, visit the website and read the full paper.

@article{lian2023llmgrounded,
    title={LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models},
    author={Lian, Long and Li, Boyi and Yala, Adam and Darrell, Trevor},
    journal={arXiv preprint arXiv:2305.13655},
    year={2023}
}
[–] Blaed 2 points 1 year ago

All is well. My gut instinct was right. I missed the verification email from my (now) decommissioned domain. It has been updated; the admins were very helpful in this regard.

This seems like a relevant time to mention that I have plans to host our own Lemmy instance, geared towards FOSAI, Machine Learning, Deep Learning, open-source education, and other subprojects I’m excited to share with everyone (hopefully) later this year. Time is a fickle thing, so I can’t make any hard commitments until I figure a few things out - just know there is more to fosai than this community. All will be revealed in due time.

Nothing about this community will change. If anything, I see it growing to become a friendly gateway for others who share similar interests and want to dive further into the material.

If you’ve gotten this far, thank you for being a part of [email protected]. It means a lot to me that any of you would give these words attention.

Whatever it is, I hope you find what you’re looking for. Just know I’m only getting started. You are all early to the party. So much more has yet to be explored, so much more has yet to be seen.

The future is now. The future is bright. The future is ________.

Zzzzzzzz

[–] Blaed 1 points 1 year ago

Hi everyone - I'll admit, I wasn't as on top of news this August as I was planning to be. I am going to unpin this and try to figure out a more efficient workflow for September and the coming Fall.

If you're reading this, thank you for finding your way to [email protected]. Don't hesitate to let me know if there's anything you'd like to see more of in particular.

[–] Blaed 4 points 1 year ago* (last edited 1 year ago)

In my opinion writing a paper is good practice no matter the results. It might help you discern more valuable insights from your testing or approach.

In this situation, you have almost nothing to lose! I say go for it. Do both. Start a paper draft now and iterate upon it as you benchmark more results. Oftentimes, writing about and reflecting on your own research reinforces some of the concepts you're tackling. All the more reason to write something up, even if you don't release it.

If you do end up writing one, be sure to share it here!

[–] Blaed 2 points 1 year ago (1 children)

What is your vim setup for python? I need a better dev setup for python. PyCharm and VS Code have too much BS in the background and I am never letting their infinitely long list of network connections through my whitelist firewall. I started down that path once; never again. I know about VSCodium, and tried it, but all documentation and examples I came across only work with the proprietary junk. Geany is better than that nonsense. I used Thonny with MicroPython at one point, but didn’t get very far with that. I tried the GNOME Builder IDE recently. It has a vim-like exit troll. You can’t save files in Builder, and the instructions to enable saving call for modifying a configuration file on the host while giving absolutely no info about where the file is located. I need a solid IDE that isn’t dicking around in network/telemetry or configured by the bridge troll from Monty Python’s Holy Grail.

I am usually just running the script I'm working on post-editor in whatever command-line interface I find myself in. That could be zsh, bash, or something random I found that week. If I have the time, I like to set up zsh (or Oh My Zsh, depending on my OS) paired with Powerlevel10k and custom color schemes.

For Windows, I usually set something like this up.

For Mac or Linux (Ubuntu) I like to use vim and/or tmux + rectangle.

As a practice, I often try as many new editors as I can, week by week or month by month. It helps keep me on my toes, but when I'm looking for a stable experience I typically default to VSCode behind my firewall. I feel your pain with the allowlisting, but it's the tradeoff I accept when I have something I'm working on and want to take my time with it. Otherwise, I've hopped between some of these. Check them out. They might not be for you, but they're fun to try:


What is a good starting point for practical CLI LLMs? I need something more useful than some toolchain project. I’ve gone down this rabbit hole far too many times with embedded hardware. I like the idea of trying something that is not an AVR, but by the time I get the toolchains set up and deal with all the proprietary bs, I’m already burned out on the project. In this space, I like to play around in the code with some small objective, but that is the full extent of my capabilities most of the time. Like I just spent most of today confused as hell about how a python tempfile works before I realized the temp file creation has its own scope… Not my brightest moment

What sort of toolchain project were you exploring? I'm curious to hear about that. In all honesty, the reason I have so many GitHub Stars is a.) I am a curious person in general and b.) I've been looking for practical and pragmatic use cases for LLMs within my own life too. This has proven to be more difficult than I initially thought, given the rapid development of the space and the many obstacles you have to overcome between design, infrastructure, and model deployment.

That being said, I recently came across Cohere, who have an SDK & API for calling their 'command' models. Unfortunately, the models are proprietary, but they have a few projects on GitHub that are interesting to explore. In my experience, local LLMs aren't quite at the level of the production-grade deployments people expect out of something with the perplexity of ChatGPT-4 (or 3). The benefit is data privacy; the compromise is performance. What I like about Cohere is that they focus on bringing the models to you so that data can remain private, with all of the benefits of the API and hosting capabilities of a cloud-based LLM.

For anyone starting a business in AI - be it automation agencies, consulting services, or integration engineering - I think this is important to consider. At least for enterprise or commercial sectors.

Home projects? Well, that's another story entirely. I'll take the performance tradeoff for running a creative or functional model on my own hardware, keeping my network private and 100% local.

A fun project I've been exploring is deploying your own locally hosted inference cloud API, which you can call from any CLI you're developing in, as long as you're connected to your private network. This way, you get an OpenAI-like API you can tinker with, while hot-swapping models on your cloud inference platform to test different capabilities.
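As a minimal sketch of what calling that OpenAI-like API could look like - the host address, port, and model name here are placeholder assumptions for whatever your local server actually exposes:

```python
import json
import urllib.request

def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completions payload for a local inference server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

def query_local_llm(payload: dict, host: str = "http://192.168.1.50:8080") -> str:
    """POST to a /v1/chat/completions endpoint, as exposed by OpenAI-compatible
    local servers. The host/port are placeholders for your LAN setup."""
    req = urllib.request.Request(
        f"{host}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

payload = build_chat_request("mistral-7b", "Summarize my meeting notes.")
# query_local_llm(payload) would return the model's reply once a server is up.
```

Because the request shape mimics OpenAI's, any CLI tool or script written against that convention can be pointed at your private server instead.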

At this point, you are only limited by the power you can pump into your inference cloud. A colleague of mine has a server with 1 TB RAM, 200+ CPU cores, and four GPUs we're working on setting up with passthrough, pooling the available VRAM. We're hoping to comfortably run 40B GPTQ or high-parameter GGML models using this home server rig.
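For a back-of-the-envelope check on whether a model fits in pooled VRAM, a rule-of-thumb estimate helps - the 20% overhead factor below is an assumption on my part, since real usage varies with context length and batching:

```python
def model_memory_gb(params_billion: float, bits_per_weight: int,
                    overhead: float = 1.2) -> float:
    """Rough memory estimate for an LLM's weights.
    overhead=1.2 assumes ~20% extra for activations and KV cache."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# A 40B model quantized to 4 bits per weight (GPTQ-style):
vram_40b_4bit = model_memory_gb(40, 4)   # 24.0 GB with the assumed overhead
# The same model at fp16, for comparison:
vram_40b_fp16 = model_memory_gb(40, 16)  # 96.0 GB
```

This is why quantization is the difference between a 40B model being a multi-GPU datacenter problem and something a pooled home rig can hold.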

Assuming you get a private LLM cloud working at home, you can do all sorts of things. You can pass documents through something like LlamaIndex or LangChain, taking personal notes or home information and turning it into semantically searchable knowledge. This would be available to you from any CLI on your network, maybe through something like LocalAI.
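The retrieval idea those libraries implement can be sketched without any of them - a toy version that chunks notes and matches on word overlap. Real pipelines use embedding similarity and smarter chunking; this just shows the shape of the technique:

```python
def chunk_text(text: str, size: int = 8) -> list[str]:
    """Split notes into fixed-size word chunks (real pipelines chunk smarter)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def retrieve(query: str, chunks: list[str]) -> str:
    """Return the chunk sharing the most words with the query."""
    q = set(query.lower().split())
    return max(chunks, key=lambda c: len(q & set(c.lower().split())))

# Hypothetical home notes you might index:
notes = (
    "The wifi password is taped to the back of the router. "
    "Water the plants every Tuesday and Friday. "
    "The furnace filter was last replaced in March."
)
chunks = chunk_text(notes)
answer = retrieve("what is the wifi password", chunks)
```

Swap the word-overlap scorer for embedding similarity and feed the retrieved chunk to your local LLM as context, and you have the core of the setup described above.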

These are really big ideas, some of which have taken me months to put together and test - but they've been really exciting to see actually work in small ways that feel fun and futuristic. The only problem is that many of these libraries are under such rapid development that projects frequently break after a single dependency change, or stall on an undocumented compatibility issue with a library that is too new to be fully built out and supported.

I don't know if that answers your question(s), but I'm around if you want to ask about anything else!

[–] Blaed 2 points 1 year ago

I will say, if I'm on Reddit, it's for /r/LocalLLaMA. For the most part, everyone there has been very helpful and supportive of the tech. Glad to see parts of it on Lemmy! Thanks for sharing, I'm going to add it to the community page on FOSAI ▲ XYZ.

[–] Blaed 2 points 1 year ago (1 children)

I love the Dune choice. Thanks for posting!

Do you have the time to share your training workflow? That, or any tips for others who might want to try something similar to what you have accomplished here? Curious to hear what you have learned and what you might try different next time.

[–] Blaed 1 points 1 year ago

I'll be leaving this up for the duration of August! Feel free to use the comments here to ask away any question not already addressed.

[–] Blaed 5 points 1 year ago (1 children)

This is really interesting, thanks for sharing!

For our hardware heads, do you think you could share a spec sheet of your device?

Love seeing 70B parameter models being run at home! Imagine where we’ll be a year from now..

[–] Blaed 3 points 1 year ago

p.s. Feel free to treat this as a living post and share in these comments news that you find throughout this month. I will be updating this for the entirety of August until we cross into the next monthly news cycle in September.
