submitted 1 year ago* (last edited 1 year ago) by Blaed to c/fosai
🤖 Happy FOSAI Friday! 🚀

Friday, September 29, 2023

HyperTech News Report #0002

Hello Everyone!

Welcome back to the HyperTech News Report! This week we're seeing some really exciting developments in futuristic technologies. With more tools and methods releasing by the day, I feel we're in for a renaissance in software. I hope hardware is soon to follow... but I am here for it! So are you. Brace yourselves. Change is coming! This next year will be very interesting to watch unfold.

Table of Contents

Community Changelog

  • Cleaned up some old content (let me know if you notice something that should be archived or updated)

Image of the Week

This week's image comes from a DALL-E 3 demonstration by Will Depue. It depicts a popular benchmark image for diffusion models: the astronaut riding a horse in space. Apparently this was hard to get right, and others have had trouble replicating it, but it seems to have been generated by DALL-E 3 nevertheless. Curious to see how it stacks up against other diffusers when it's more widely available.

New Foundation Model!

There have been many new models hitting HuggingFace daily. The recent influx has made it hard to benchmark and keep up with them all, so I will be highlighting a hand-selected, curated batch week-by-week, exploring a few at a time with more focus.

If you have any model favorites (or showcase suggestions) let me know what they are in the comments below and I'll add them to the growing catalog!

This week we're taking a look at Mistral - a new foundation model with a sliding window attention mechanism that gives it advantages over other models. Better yet - the mistral.ai team released this new model under the Apache 2.0 license. Massive shoutout to this team; this is huge for anyone who wants more options (commercially) outside of the Llama 2 and Falcon families.

From Mistralai:

The best 7B, Apache 2.0.. Mistral-7B-v0.1 is a small, yet powerful model adaptable to many use-cases. Mistral 7B is better than Llama 2 13B on all benchmarks, has natural coding abilities, and 8k sequence length. It’s released under Apache 2.0 licence, and we made it easy to deploy on any cloud.

Learn More

Mistralai

TheBloke (Quantized)

More About GPTQ

More About GGUF
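To make "sliding window attention" concrete: instead of attending to the full causal history, each token attends only to the previous W tokens, which caps per-token attention cost at the window size. A minimal, illustrative mask (a sketch of the idea, not Mistral's actual implementation):

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Causal attention mask where token i attends only to tokens
    j with i - window < j <= i (sliding window attention)."""
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]

# With a window of 2, token 3 attends only to tokens 2 and 3,
# while a full causal mask would let it see tokens 0-3.
mask = sliding_window_mask(seq_len=4, window=2)
```

Stacking layers lets information still propagate beyond the window indirectly, which is how a small window can serve a long effective context.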

Metaverse Developments

Mark Zuckerberg appeared for his third interview on the Lex Fridman podcast - but this time, in the updated Metaverse. This is pretty wild. We seem to have officially left uncanny-valley territory. There are still clearly bugs and improvements to be made - but imagine the possibilities of this mixed-reality technology (paired with VR LLM applications).

The type of experiences we can begin to explore in these digital realms are going to evolve into things of true sci-fi in our near future. This is all very exciting stuff to look forward to as AI proliferates markets and drives innovation.

What do you think? Zuck looks more human in the metaverse than in real life... mission... success?

Click here for the podcast episode.

NVIDIA NeMo Guardrails

If you haven't heard about NeMo Guardrails, you should check it out. It is a new library and approach from NVIDIA for aligning models and adding guardrails to LLM applications. It is similar in spirit to LangChain and LlamaIndex, but uses Colang - a modeling language developed in-house at NVIDIA - for configuration, alongside NeMo Guardrails libraries with Python-friendly syntax.

This is still a new and unexplored tool, but could provide some interesting results with some creative applications. It is also particularly powerful if you need to align enterprise LLMs for clients or stakeholders.
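To give a flavor of what Colang configuration looks like, here is a minimal, hypothetical rail (the message and flow names are illustrative, not taken from an official NVIDIA example):

```
define user ask about competitors
  "What do you think of other AI companies?"

define bot refuse competitor talk
  "I can only answer questions about our own products."

define flow
  user ask about competitors
  bot refuse competitor talk
```

Matching user messages to these canonical forms and walking the flow is what the NeMo Guardrails runtime handles for you.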

Learn More

Tutorial Highlights

Mistral 7B - Small But Mighty 🚀 🚀

Chatbots with RAG: LangChain Full Walkthrough

NVIDIA NeMo Guardrails: Full Walkthrough for Chatbots / AI

Author's Note

This post was authored by the moderator of [email protected] - Blaed. I make games, produce music, write about tech, and develop free open-source artificial intelligence (FOSAI) for fun. I do most of this through a company called HyperionTechnologies a.k.a. HyperTech or HYPERION - a sci-fi company.

Thanks for Reading!

If you found anything about this post interesting, consider subscribing to [email protected] where I do my best to keep you informed about free open-source artificial intelligence as it emerges in real-time.

Our community is quickly becoming a living time capsule thanks to the rapid innovation of this field. If you've gotten this far, I cordially invite you to join us and dance along the path to AGI and the great unknown.

Come on in, the water is fine, the gates are wide open! You're still early to the party, so there is still plenty of wonder and discussion yet to be had in our little corner of the digiverse.

This post was written by a human. For other humans. About machines. Who work for humans for other machines. At least for now...

Until next time!

Blaed

HyperTech Preview (self.fosai)
submitted 1 year ago* (last edited 1 year ago) by Blaed to c/fosai
Good Morning Everyone!

This week is going to be an exciting one! There is a lot happening in the background that I wish I could share. Instead of spoiling any surprises, allow me to leave you with a preview of what I want to accomplish with HyperionTechnologies (a.k.a. HyperTech or HYPERION).


HyperTech Models

I have wanted to fine-tune and deploy a large language model ever since I first interacted with ChatGPT when it released. I intend to fulfill this promise to myself before the end of the year.

I'm also not one to wait! After some hours hacking away late last night, I kicked off fine-tuning a model on an A100 and a V100. Both runs failed halfway through training (the runtime timed out). I will be kicking off another attempt later this week and plan to do a full post on how it goes.

I am using a basic sharded Llama-7B model for this first practice run, mostly to experiment with a new process I'm adapting into my HyperTech Workshop flow. Once I figure it out and reverse engineer what I need, expect more fine-tunes from HyperTech!
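If a full fine-tune keeps timing out, a parameter-efficient approach like LoRA is one common way to fit a 7B run on a single A100. A rough back-of-the-envelope sketch of why (the rank, layer count, and projection count below are illustrative assumptions, not a recipe):

```python
def lora_trainable_params(d_model: int, n_layers: int,
                          n_proj: int = 4, rank: int = 8) -> int:
    """Trainable parameters for LoRA adapters: each adapted
    d_model x d_model projection gets two low-rank factors,
    A (d_model x rank) and B (rank x d_model)."""
    per_matrix = 2 * d_model * rank
    return n_layers * n_proj * per_matrix

# Llama-7B-ish shape: d_model=4096, 32 layers, adapters on q/k/v/o.
full = 7_000_000_000
lora = lora_trainable_params(d_model=4096, n_layers=32)
fraction = lora / full  # well under 1% of the full parameter count
```

Training roughly 0.1% of the weights means optimizer state and gradients shrink accordingly, which is what makes single-GPU runs feasible.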


HyperTech Resources

Fun fact - everything HyperTech is really just my personal workflows branded under this theme.

If I am consistently engaging with something helpful to me, I will convert it into a workflow and often open-source it through HyperTech - from processes and templates to training workflows and datasets. Whatever muse slots itself into the grand vision that is HYPERION.

One of these resources is available to you today: a GitHub repo template called the HyperTech Workshop - a file structure tailored to generative AI/ML/DL developers and enthusiasts. It includes many notebooks in Jupyter-friendly .ipynb format (among other links and tools strewn about the digital workspace).

Know that there are many more resources on the horizon! These are a few of many to come.


HyperTech News Reports

You already saw this last Friday, but here's the link in case you missed the latest news report!

These will continue to be handwritten and journaled regularly!


HyperTech Projects

  • Fine-tune, deploy, and open-source a new model series (Llama, Falcon, etc.)
  • Build an integrated workshop workflow
  • Build a custom Ubuntu distro
  • Build a custom dataset
  • REDACTED
  • REDACTED
  • REDACTED

HyperTech Ethos

I am going to make another post about this later, but I want to assure everyone this new HyperTech project is borderline satire. I am not looking to turn anyone here into a product.

This company is a highly experimental sci-fi R&D project that I legitimately started for fun (and the future). Don't take any of it too seriously.

If you find any of my resources fun, interesting, curious or helpful - I've already succeeded in my mission.


Thank you!

I appreciate you reading this post!

I hope you have a great rest of your day.

Blaed

p.s. You're doing a great job.

 

cross-posted from: https://lemmy.world/post/5549499

🤖 Happy FOSAI Friday! 🚀

Friday, September 22, 2023

HyperTech News Report #0001

Hello Everyone!

This series is a new vehicle for [email protected] news reports. In these posts I'll go over projects or news I stumble across week-over-week. I will try to keep this series consistent on Fridays, covering most of what I have been sharing, but at a regular cadence. For this week, I am going to do my best to catch us up on a few old (and new) hot topics you may or may not have heard about already.

Table of Contents

Community Changelog

Image of the Week

A Stable Diffusion + ControlNet image garnered a ton of attention on social media this last week. This image has brought more recognition to the possibilities of these tools and helps shed a more positive light on the capabilities of generative models.

Read More

Introducing HyperTech

HyperionTechnologies a.k.a. HyperTech or HYPERION - a sci-fi company.

HyperTech Workshop (V0.1.0)

I am excited to announce my technology company: HyperTech. The first project of HyperionTechnologies is a digital workshop that comes in the form of a GitHub repo template for AI/ML/DL developers. HyperTech is a for-fun sci-fi company I started to explore AI development (among other emerging technologies I find curious and interesting). It is a satire corpo sandbox I have designed around my personal journey inside and outside of [email protected] with highly experimental projects and workflows.

I will be using this company and its setting/narrative/thematic to drive some of the future (and totally optional) content of our community. Any tooling, templates, or examples made along the way are entirely for you to learn from or reverse engineer for your own purpose or amusement. I'll be doing a dedicated post on HyperTech later this weekend. Keep your eye out for that if you're curious.

The future is now. The future is bright. The future is HYPERION. (Don't take this project too seriously.)

New GGUF Models

Within this last month or so, llama.cpp has begun to standardize a new model format - .GGUF - which is much more optimized than its now-legacy (and deprecated) predecessor, GGML. This is a big deal for anyone running GGML models; GGUF is basically superior in all ways. Check out llama.cpp's notes about this change on their official GitHub. I have used a few GGUF models myself and have found them much more performant than any GGML counterpart. TheBloke has already converted many of his older models into this new format (which is compatible with anything utilizing llama.cpp).

More About GGUF:

It is a successor file format to GGML, GGMF and GGJT, and is designed to be unambiguous by containing all the information needed to load a model. It is also designed to be extensible, so that new features can be added to GGML without breaking compatibility with older models. Basically:

1. No more breaking changes.
2. Support for non-llama models (Falcon, RWKV, BLOOM, etc.).
3. No more fiddling around with rope-freq-base, rope-freq-scale, gqa, and rms-norm-eps. Prompt formats could also be set automatically.
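Because GGUF is self-describing, every file opens with a fixed magic and a version field, so you can sanity-check a download with a few lines of stdlib Python. This sketch assumes only the documented header layout (4-byte ASCII magic "GGUF", then a little-endian uint32 version):

```python
import struct

def read_gguf_header(buf: bytes) -> int:
    """Validate the GGUF magic and return the format version.
    GGUF files begin with the ASCII magic b'GGUF' followed by a
    little-endian uint32 version field."""
    magic, version = struct.unpack_from("<4sI", buf, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return version

# A synthetic version-2 header for demonstration:
header = b"GGUF" + struct.pack("<I", 2)
version = read_gguf_header(header)
```

In practice you'd pass the first 8 bytes of a real model file; a legacy GGML file fails the magic check immediately.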

Falcon 180B

Many of you have probably already heard, but Falcon 180B was recently announced - and since I haven't covered it here yet, it's worth mentioning in this post. Check out the full article regarding its release here on HuggingFace. Can't wait to see what comes next! This will open up a lot of doors for us to explore.

Today, we're excited to welcome TII's Falcon 180B to HuggingFace! Falcon 180B sets a new state-of-the-art for open models. It is the largest openly available language model, with 180 billion parameters, and was trained on a massive 3.5 trillion tokens using TII's RefinedWeb dataset. This represents the longest single-epoch pretraining for an open model. The dataset for Falcon 180B consists predominantly of web data from RefinedWeb (~85%). In addition, it has been trained on a mix of curated data such as conversations, technical papers, and a small fraction of code (~3%). This pretraining dataset is big enough that even 3.5 trillion tokens constitute less than an epoch.

The released chat model is fine-tuned on chat and instruction datasets with a mix of several large-scale conversational datasets.

‼️ Commercial Usage: Falcon 180b can be commercially used but under very restrictive conditions, excluding any "hosting use". We recommend to check the license and consult your legal team if you are interested in using it for commercial purposes.

You can find the model on the Hugging Face Hub (base and chat model) and interact with the model on the Falcon Chat Demo Space.

Llama 3 Rumors

Speaking of big open-source models - Llama 3 is rumored to be in training or development. Llama 2 was clearly an improvement over its predecessor. I wonder how Llama 3 & 4 will stack up in this race to AGI. I forget that we're still early to this party. At this rate of development, I believe we're bound to see AGI within the decade.

Meta plans to rival GPT-4 with a rumored free Llama 3:

  • According to an early rumor, Meta is working on Llama 3, which is intended to compete with GPT-4, but will remain largely free under the Llama license.
  • Jason Wei, an engineer associated with OpenAI, has indicated that Meta possesses the computational capacity to train Llama 3 to a level comparable to GPT-4. Furthermore, Wei suggests that the feasibility of training Llama 4 is already within reach.
  • Despite Wei's credibility, it's important to acknowledge the possibility of inaccuracies in his statements or the potential for shifts in these plans.

DALM

I recently stumbled across DALM - a new domain-adapted language modeling toolkit that is supposed to enable a workflow for training a retrieval augmented generation (RAG) pipeline end-to-end. According to their results, the DALM-specific training process leads to much higher response quality in retrieval augmented generation. I haven't had a chance to tinker with this much, but I'd keep an eye on it if you're working with RAG workflows.

DALM Manifesto:

A great rift has emerged between general LLMs and the vector stores that are providing them with contextual information. The unification of these systems is an important step in grounding AI systems in efficient, factual domains, where they are utilized not only for their generality, but for their specificity and uniqueness. To this end, we are excited to open source the Arcee Domain Adapted Language Model (DALM) toolkit for developers to build on top of our Arcee open source Domain Pretrained (DPT) LLMs. We believe that our efforts will help as we begin next phase of language modeling, where organizations deeply tailor AI to operate according to their unique intellectual property and worldview.

For the first time in the literature, we modified the initial RAG-end2end model (TACL paper, HuggingFace implementation) to work with decoder-only language models like Llama, Falcon, or GPT. We also incorporated the in-batch negative concept alongside the RAG's marginalization to make the entire process efficient.
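The "in-batch negative" concept is easy to sketch: for each query in a batch, its paired passage is the positive and every other passage in the same batch serves as a negative, so one batch of dot products yields a full contrastive loss. A toy, illustrative version (plain Python, not DALM's actual code):

```python
import math

def in_batch_negative_loss(queries, passages):
    """Contrastive loss with in-batch negatives: queries[i] should
    score highest against passages[i]; all other passages in the
    batch act as negatives via a softmax over dot-product scores."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    total = 0.0
    for i, q in enumerate(queries):
        scores = [dot(q, p) for p in passages]
        log_norm = math.log(sum(math.exp(s) for s in scores))
        total += log_norm - scores[i]  # negative log-softmax of the positive
    return total / len(queries)

# Toy embeddings where each query matches its own passage:
qs = [[1.0, 0.0], [0.0, 1.0]]
ps = [[1.0, 0.0], [0.0, 1.0]]
loss = in_batch_negative_loss(qs, ps)
```

The appeal is efficiency: negatives come for free from the batch itself, with no separate negative-mining pass.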

DALL-E 3

OpenAI announced DALL-E 3, which will have direct native compatibility within ChatGPT. This means users should be able to naturally and semantically iterate over images and features, adjusting the output from the same chat interface throughout their conversation. This will enable many users to seamlessly incorporate image diffusion into their chat workflows.

I think this is huge, mostly because it illustrates a new technique that removes some of the barriers that prompt engineers have to solve (it reads prompts differently than other diffusers). Not to mention you are permitted to sell, keep, and commercialize any image DALL-E generates.

I am curious to see if open-source workflows can follow a similar approach and have iterative design workflows that seamlessly integrate with a chat interface. That, paired with manual tooling from things like ControlNet, would be a powerful combination that could spark a lot of creativity. Don't get me wrong, sometimes I really like manual and node-based workflows, but I believe semantic computation is the future. Regardless of how 'open' OpenAI truly is, these breakthroughs help chart the path forward for everyone else still catching up.

More About DALL-E 3:

DALL·E 3 is now in research preview, and will be available to ChatGPT Plus and Enterprise customers in October, via the API and in Labs later this fall. Modern text-to-image systems have a tendency to ignore words or descriptions, forcing users to learn prompt engineering. DALL·E 3 represents a leap forward in our ability to generate images that exactly adhere to the text you provide. DALL·E 3 is built natively on ChatGPT, which lets you use ChatGPT as a brainstorming partner and refiner of your prompts. Just ask ChatGPT what you want to see in anything from a simple sentence to a detailed paragraph. When prompted with an idea, ChatGPT will automatically generate tailored, detailed prompts for DALL·E 3 that bring your idea to life. If you like a particular image, but it’s not quite right, you can ask ChatGPT to make tweaks with just a few words.

DALL·E 3 will be available to ChatGPT Plus and Enterprise customers in early October. As with DALL·E 2, the images you create with DALL·E 3 are yours to use and you don't need our permission to reprint, sell or merchandise them.

Author's Note

This post was authored by the moderator of [email protected] - Blaed. I make games, produce music, write about tech, and develop free open-source artificial intelligence (FOSAI) for fun. I do most of this through a company called HyperionTechnologies a.k.a. HyperTech or HYPERION - a sci-fi company.

Thanks for Reading!

If you found anything about this post interesting, consider subscribing to [email protected] where I do my best to keep you informed about free open-source artificial intelligence as it emerges in real-time.

Our community is quickly becoming a living time capsule thanks to the rapid innovation of this field. If you've gotten this far, I cordially invite you to join us and dance along the path to AGI and the great unknown.

Come on in, the water is fine, the gates are wide open! You're still early to the party, so there is still plenty of wonder and discussion yet to be had in our little corner of the digiverse.

This post was written by a human. For other humans. About machines. Who work for humans for other machines. At least for now...

Until next time!

Blaed

 


DALM

I recently stumbled across DALM - a new domain adapted language modeling toolkit which is supposed to enable a workflow that trains a retrieval augmented generation (RAG) pipeline from end-to-end. According to their results, the DALM specific training process leads to a much higher response quality when it comes to retrieval augmented generation. I haven't had a chance to tinker with this a lot, but I'd keep an eye on it if you're engaging with RAG workflows.

DALM Manifesto:

A great rift has emerged between general LLMs and the vector stores that are providing them with contextual information. The unification of these systems is an important step in grounding AI systems in efficient, factual domains, where they are utilized not only for their generality, but for their specificity and uniqueness. To this end, we are excited to open source the Arcee Domain Adapted Language Model (DALM) toolkit for developers to build on top of our Arcee open source Domain Pretrained (DPT) LLMs. We believe that our efforts will help as we begin next phase of language modeling, where organizations deeply tailor AI to operate according to their unique intellectual property and worldview.

For the first time in the literature, we modified the initial RAG-end2end model (TACL paper, HuggingFace implementation) to work with decoder-only language models like Llama, Falcon, or GPT. We also incorporated the in-batch negative concept alongside the RAG's marginalization to make the entire process efficient.

DALL-E 3

OpenAI announced DALL-E 3 that will have direct native compatibility within ChatGPT. This means users should be able to naturally and semantically iterate over images and features over time, adjusting the output from the same chat interface throughout their conversation. This will enable many users to seamlessly incorporate image diffusion into their chat workflows.

I think this is huge, mostly because it illustrates a new technique that removes some of the barriers that prompt engineers have to solve (it reads prompts differently than other diffusers). Not to mention you are permitted to sell, keep, and commercialize any image DALL-E generates.

I am curious to see if open-source workflows can follow a similar approach and have iterative design workflows that seamlessly integrate with a chat interface. That, paired with manual tooling from things like ControlNet would be a powerful pairing that could spark a lot of creativity. Don't get me wrong, sometimes I really like manual and node-based workflows, but I believe semantic computation is the future. Regardless of how 'open' OpenAI truly is, these breakthroughs help chart the path forward for everyone else still catching up.

More About DALL-E 3:

DALL·E 3 is now in research preview, and will be available to ChatGPT Plus and Enterprise customers in October, via the API and in Labs later this fall. Modern text-to-image systems have a tendency to ignore words or descriptions, forcing users to learn prompt engineering. DALL·E 3 represents a leap forward in our ability to generate images that exactly adhere to the text you provide. DALL·E 3 is built natively on ChatGPT, which lets you use ChatGPT as a brainstorming partner and refiner of your prompts. Just ask ChatGPT what you want to see in anything from a simple sentence to a detailed paragraph. When prompted with an idea, ChatGPT will automatically generate tailored, detailed prompts for DALL·E 3 that bring your idea to life. If you like a particular image, but it’s not quite right, you can ask ChatGPT to make tweaks with just a few words.

DALL·E 3 will be available to ChatGPT Plus and Enterprise customers in early October. As with DALL·E 2, the images you create with DALL·E 3 are yours to use and you don't need our permission to reprint, sell or merchandise them.

Author's Note

This post was authored by the moderator of [email protected] - Blaed. I make games, produce music, write about tech, and develop free open-source artificial intelligence (FOSAI) for fun. I do most of this through a company called HyperionTechnologies a.k.a. HyperTech or HYPERION - a sci-fi company.

Thanks for Reading!

If you found anything about this post interesting, consider subscribing to [email protected] where I do my best to keep you informed about free open-source artificial intelligence as it emerges in real-time.

Our community is quickly becoming a living time capsule thanks to the rapid innovation of this field. If you've gotten this far, I cordially invite you to join us and dance along the path to AGI and the great unknown.

Come on in, the water is fine, the gates are wide open! You're still early to the party, so there is still plenty of wonder and discussion yet to be had in our little corner of the digiverse.

This post was written by a human. For other humans. About machines. Who work for humans for other machines. At least for now...

Until next time!

Blaed

11
submitted 1 year ago* (last edited 1 year ago) by Blaed to c/fosai
 

🤖 Happy FOSAI Friday! 🚀

Friday, September 22, 2023

HyperTech News Report #0001

Hello Everyone!

This series is a new vehicle for [email protected] news reports. In these posts I'll go over projects or news I stumble across week-over-week. I will try to keep this series consistent on Fridays, covering much of what I have been already, but at a regular cadence. For this week, I am going to do my best to catch us up on a few old (and new) hot topics you may or may not have heard about already.

Table of Contents

Community Changelog

Image of the Week

A Stable Diffusion + ControlNet image garnered a ton of attention on social media this last week. This image has brought more recognition to the possibilities of these tools and helps shed a more positive light on the capabilities of generative models.

Read More

Introducing HyperTech

HyperionTechnologies a.k.a. HyperTech or HYPERION - a sci-fi company.

HyperTech Workshop (V0.1.0)

I am excited to announce my technology company: HyperTech. The first project of HyperionTechnologies is a digital workshop that comes in the form of a GitHub repo template for AI/ML/DL developers. HyperTech is a for-fun sci-fi company I started to explore AI development (among other emerging technologies I find curious and interesting). It is a satirical corpo sandbox I have designed around my personal journey inside and outside of [email protected], with highly experimental projects and workflows. I will be using this company and its setting/narrative/theme to drive some of the future (and totally optional) content of our community. Any tooling, templates, or examples made along the way are entirely for you to learn from or reverse engineer for your own purposes or amusement. I'll be doing a dedicated post about HyperTech later this weekend - keep an eye out for that if you're curious. The future is now. The future is bright. The future is HYPERION. (Don't take this project too seriously.)

New GGUF Models

Within the last month or so, the llama.cpp project has begun standardizing a new model file format - GGUF - which is much more optimized than its now-deprecated predecessor, GGML. This is a big deal for anyone running GGML models. GGUF is superior in nearly every way. Check out llama.cpp's notes about the change on their official GitHub. I have used a few GGUF models myself and found them much more performant than their GGML counterparts. TheBloke has already converted many of his older models into this new format (which is compatible with anything utilizing llama.cpp).

More About GGUF:

It is a successor file format to GGML, GGMF and GGJT, and is designed to be unambiguous by containing all the information needed to load a model. It is also designed to be extensible, so that new features can be added to GGML without breaking compatibility with older models. Basically:

1. No more breaking changes
2. Support for non-llama models (Falcon, RWKV, BLOOM, etc.)
3. No more fiddling around with rope-freq-base, rope-freq-scale, gqa, and rms-norm-eps; prompt formats could also be set automatically
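To make "containing all the information needed to load a model" concrete, here is a minimal sketch of reading a GGUF header. The field layout below is taken from my understanding of the llama.cpp GGUF spec (magic bytes, then a uint32 version and two uint64 counts, all little-endian), and the sample bytes are synthetic, not from a real model:

```python
import struct

def read_gguf_header(data: bytes) -> dict:
    # Per the llama.cpp GGUF spec (v2+), a file starts with the magic
    # b"GGUF", then a little-endian uint32 version, a uint64 tensor
    # count, and a uint64 metadata key/value count.
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", data, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {"version": version, "tensor_count": n_tensors, "metadata_kv_count": n_kv}

# Synthetic header bytes for illustration -- not taken from a real model
sample = struct.pack("<4sIQQ", b"GGUF", 3, 291, 24)
print(read_gguf_header(sample))
# → {'version': 3, 'tensor_count': 291, 'metadata_kv_count': 24}
```

The metadata key/value section that follows the header is what carries things like rope-freq-base and the prompt format, which is why loaders no longer need those passed in by hand.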

Falcon 180B

Many of you have probably already heard of this, but Falcon 180B was recently announced - and I haven't covered it here yet so it's worth mentioning in this post. Check out the full article regarding its release here on HuggingFace. Can't wait to see what comes next! This will open up a lot of doors for us to explore.

Today, we're excited to welcome TII's Falcon 180B to HuggingFace! Falcon 180B sets a new state-of-the-art for open models. It is the largest openly available language model, with 180 billion parameters, and was trained on a massive 3.5 trillion tokens using TII's RefinedWeb dataset. This represents the longest single-epoch pretraining for an open model. The dataset for Falcon 180B consists predominantly of web data from RefinedWeb (~85%). In addition, it has been trained on a mix of curated data such as conversations, technical papers, and a small fraction of code (~3%). This pretraining dataset is big enough that even 3.5 trillion tokens constitute less than an epoch.

The released chat model is fine-tuned on chat and instruction datasets with a mix of several large-scale conversational datasets.

‼️ Commercial Usage: Falcon 180b can be commercially used but under very restrictive conditions, excluding any "hosting use". We recommend to check the license and consult your legal team if you are interested in using it for commercial purposes.

You can find the model on the Hugging Face Hub (base and chat model) and interact with the model on the Falcon Chat Demo Space.

Llama 3 Rumors

Speaking of big open-source models - Llama 3 is rumored to be under training or development. Llama 2 was clearly an improvement over its predecessor. I wonder how Llama 3 & 4 will stack in this race to AGI. I forget that we're still early to this party. At this rate of development, I believe we're bound to see it within the decade.

Meta plans to rival GPT-4 with a rumored free Llama 3

- According to an early rumor, Meta is working on Llama 3, which is intended to compete with GPT-4, but will remain largely free under the Llama license.
- Jason Wei, an engineer associated with OpenAI, has indicated that Meta possesses the computational capacity to train Llama 3 to a level comparable to GPT-4. Furthermore, Wei suggests that the feasibility of training Llama 4 is already within reach.
- Despite Wei's credibility, it's important to acknowledge the possibility of inaccuracies in his statements or the potential for shifts in these plans.

DALM

I recently stumbled across DALM - a new domain-adapted language modeling toolkit that is supposed to enable training a retrieval augmented generation (RAG) pipeline end-to-end. According to their results, the DALM-specific training process leads to much higher response quality in retrieval augmented generation. I haven't had a chance to tinker with it much, but I'd keep an eye on it if you're engaging with RAG workflows.

DALM Manifesto:

A great rift has emerged between general LLMs and the vector stores that are providing them with contextual information. The unification of these systems is an important step in grounding AI systems in efficient, factual domains, where they are utilized not only for their generality, but for their specificity and uniqueness. To this end, we are excited to open source the Arcee Domain Adapted Language Model (DALM) toolkit for developers to build on top of our Arcee open source Domain Pretrained (DPT) LLMs. We believe that our efforts will help as we begin next phase of language modeling, where organizations deeply tailor AI to operate according to their unique intellectual property and worldview.

For the first time in the literature, we modified the initial RAG-end2end model (TACL paper, HuggingFace implementation) to work with decoder-only language models like Llama, Falcon, or GPT. We also incorporated the in-batch negative concept alongside the RAG's marginalization to make the entire process efficient.
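I can't speak to DALM's exact API, but the retrieval step at the heart of any RAG pipeline can be sketched in a few lines. Everything below - the documents, the two-dimensional "embeddings" - is made up purely for illustration; a real system would use a trained retriever like the ones DALM fine-tunes end-to-end:

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, doc_vecs):
    # Index of the document embedding most similar to the query
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return max(range(len(scores)), key=scores.__getitem__)

docs = ["GGUF is a model file format.", "Falcon 180B has 180 billion parameters."]
doc_vecs = [[1.0, 0.1], [0.1, 1.0]]   # toy stand-ins for a trained retriever
query_vec = [0.9, 0.2]                # toy embedding of a GGUF question

top = retrieve(query_vec, doc_vecs)
prompt = f"Context: {docs[top]}\n\nQuestion: What is GGUF?"
```

The point of a RAG-end2end setup is that the retriever and generator are trained jointly, so embeddings like these get optimized for downstream answer quality instead of staying frozen.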

DALL-E 3

OpenAI announced DALL-E 3, which will have direct, native integration with ChatGPT. This means users should be able to naturally and semantically iterate on images and features over time, adjusting the output from the same chat interface throughout their conversation. This will let many users seamlessly incorporate image diffusion into their chat workflows.

I think this is huge, mostly because it illustrates a new technique that removes some of the barriers that prompt engineers have to solve (it reads prompts differently than other diffusers). Not to mention you are permitted to sell, keep, and commercialize any image DALL-E generates.

I am curious to see if open-source workflows can follow a similar approach and have iterative design workflows that seamlessly integrate with a chat interface. That, paired with manual tooling from things like ControlNet would be a powerful pairing that could spark a lot of creativity. Don't get me wrong, sometimes I really like manual and node-based workflows, but I believe semantic computation is the future. Regardless of how 'open' OpenAI truly is, these breakthroughs help chart the path forward for everyone else still catching up.

More About DALL-E 3:

DALL·E 3 is now in research preview, and will be available to ChatGPT Plus and Enterprise customers in October, via the API and in Labs later this fall. Modern text-to-image systems have a tendency to ignore words or descriptions, forcing users to learn prompt engineering. DALL·E 3 represents a leap forward in our ability to generate images that exactly adhere to the text you provide. DALL·E 3 is built natively on ChatGPT, which lets you use ChatGPT as a brainstorming partner and refiner of your prompts. Just ask ChatGPT what you want to see in anything from a simple sentence to a detailed paragraph. When prompted with an idea, ChatGPT will automatically generate tailored, detailed prompts for DALL·E 3 that bring your idea to life. If you like a particular image, but it’s not quite right, you can ask ChatGPT to make tweaks with just a few words.

DALL·E 3 will be available to ChatGPT Plus and Enterprise customers in early October. As with DALL·E 2, the images you create with DALL·E 3 are yours to use and you don't need our permission to reprint, sell or merchandise them.

Author's Note

This post was authored by the moderator of [email protected] - Blaed. I make games, produce music, write about tech, and develop free open-source artificial intelligence (FOSAI) for fun. I do most of this through a company called HyperionTechnologies a.k.a. HyperTech or HYPERION - a sci-fi company.

Thanks for Reading!

If you found anything about this post interesting, consider subscribing to [email protected] where I do my best to keep you informed about free open-source artificial intelligence as it emerges in real-time.

Our community is quickly becoming a living time capsule thanks to the rapid innovation of this field. If you've gotten this far, I cordially invite you to join us and dance along the path to AGI and the great unknown.

Come on in, the water is fine, the gates are wide open! You're still early to the party, so there is still plenty of wonder and discussion yet to be had in our little corner of the digiverse.

This post was written by a human. For other humans. About machines. Who work for humans for other machines. At least for now...

Until next time!

Blaed

 

Hey everyone!

I don't think I've shared this one before, so allow me to introduce you to 'LM Studio' - a new application that is tailored to LLM developers and enthusiasts.

Check it out!


With LM Studio, you can ...

🤖 - Run LLMs on your laptop, entirely offline

👾 - Use models through the in-app Chat UI or an OpenAI compatible local server

📂 - Download any compatible model files from HuggingFace 🤗 repositories

🔭 - Discover new & noteworthy LLMs in the app's home page

LM Studio supports any ggml Llama, MPT, and StarCoder model on Hugging Face (Llama 2, Orca, Vicuna, Nous Hermes, WizardCoder, MPT, etc.)

Minimum requirements: M1/M2 Mac, or a Windows PC with a processor that supports AVX2. Linux is under development.

Made possible thanks to the llama.cpp project.

We are expanding our team. See our careers page.


Love seeing these new tools come out, especially with the new GGUF format being widely adopted.

The regularly updated and curated list of new LLM releases they provide through this platform is enough for me to keep it installed.

I'll be tinkering plenty when I have the time this week. I'll be sure to let everyone know how it goes! In the meantime, if you do end up giving LM Studio a try - let us know your thoughts and experience with it in the comments below.
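For anyone wondering what "OpenAI compatible local server" means in practice: once the server is running, you can talk to it with plain HTTP. The base URL below (port 1234) is what I understand LM Studio defaults to - double-check the app's server tab - and the model name is a placeholder, since the server answers with whichever model you currently have loaded:

```python
import json
import urllib.request

# Assumed default address -- verify in LM Studio's "Local Server" tab
BASE_URL = "http://localhost:1234/v1/chat/completions"

def build_chat_request(prompt: str) -> dict:
    # Standard OpenAI-style chat payload
    return {
        "model": "local-model",  # placeholder; the loaded model responds
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

def ask(prompt: str) -> str:
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(build_chat_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# ask("Explain GGUF in one sentence.")  # requires the local server running
```

Because the payload matches OpenAI's schema, existing tooling written against their API should point at a local model with little more than a base-URL swap.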

[–] Blaed 1 points 1 year ago* (last edited 1 year ago)

This is all very good feedback! I appreciate everyone who has commented so far. I will leave this post pinned for the remainder of the year for anyone (new member or old) to share their thoughts and what else they (you) think we should explore next with [email protected].

12
submitted 1 year ago* (last edited 1 year ago) by Blaed to c/fosai
 

Hello everyone!

I have finally found some time for myself, and I felt it was right to commit some of it here. These last few months have been a wild ride for all of us, I'm sure. I wanted to start this post to regroup and rekindle our interests to see wherever they may take us next.

So, what do you want to see out of our community in the near future? I am opening up the floor for you to leave your comment below and let me know how we can make things a little more interesting here.

[email protected]


📰 Community Feature #1 ✅

AI/ML/DL News / Papers / Projects

Where it all began. I’ll continue covering a wide breadth of stuff I find interesting in more curated posts, including links to any cool papers, projects, or news I find curious.


🎓 Community Feature #2 ✅

Open Source Learning Resources

I have a few updates coming to https://fosai.xyz soon. I'm thinking about expanding it in other directions - let me know if there's something on this site you want me to explore or expand upon. Everything on there is also consolidated as resources on our sidebar.


📚 Community Feature #3 ✅

Collaborative Learning

When someone new posts a question or needs help - a lot of you are quick to jump in and respond before I get a chance to see it. I think that's awesome (that so many of you are willing to reply to questions so fast). Big thanks to everyone who engages with those posts. I really appreciate the collaborative support! I don't have all the time in the world, so those who can help guide others into this wonderful world of AI are encouraged to continue answering questions and helping each other. It means a lot to me (and the person asking the question!).


🕹️ Community Feature #4 ☑️

Games and Stories ❔

Anyone care about games or stories made with / built on AI stacks? I can share a few projects that might be fun to explore - ones geared toward how AI may evolve in games, story, and entertainment. I could start a casual series building a simulated narrative contextual to [email protected], and we could study its evolution over time together. I have a narrative and setting for this (PROJECT HYPERION), but I have no idea if anyone would be interested in reading the series. It would be delivered like an augmented reality magazine that teaches you concepts from this site, set in a Discord server built around these simulated game-world rules. This was a highly experimental project I started but shelved. I can dust it off and share it here if there's enough intrigue. Read more about PROJECT HYPERION in the notes at the bottom of this post.


🔬 Community Feature #5 ☑️

Random Prototypes❔

I tinker a lot, often with incoherent prototypes. At this point, I think it might be worth exploring specific use cases and spending more time prototyping concepts you all vote on. In this series, you could ask me to prototype something (or explore the possibilities of achieving some experimental goal) that we can benchmark and measure together post-by-post. These would be projects we explore purely for fun or for particular business ideas/applications. They could be example implementations or satirical representations of projects built to help you learn with a more hands-on approach.


🧪 Community Feature #6 ☑️

Benchmarks❔

I have considered doing a benchmark series, but this would take a bit of setup for the best results - at least for me to do it consistently the way I'd want. That being said, I could spend time kicking this series off too (if there was enough interest). Here I would try different LLMs across a consistent benchmark we devise as a community, whether in a local or a cloud-based environment. From there, we can explore any quantization size, parameter count, model architecture, or implementation approach and see how they stack up side-by-side.


Got an idea❔Vote for it below! ☑️

Do you have an idea? Something you want to see out of [email protected]? Share your series or vote on your favorite idea in the comments below!


Other Notes & Food for Thought 📓

  • Growth: I'm not looking to rapidly scale or grow, this community should remain holistic and engaging.
  • Quality Over Quantity: Recent feedback has made me reconsider how to share updates. Going forward, all content will be handwritten and hand reported (as it was). I will be focusing on quality over quantity. Shoutout to [email protected] for the honest feedback.
  • Best AI/ML/DL Forum of the Fediverse: I've always thought of this community as a gateway to AI/ML/DL that is wide open for anyone who wants to learn it (similar to /r/LocalLLaMA, but outside of Reddit). I think we should strive to keep answering questions, but keep asking them too. We are braving this new frontier with many others we should recognize and help along the way.
  • PROJECT HYPERION: The Game and Story I wrote based on this community is called PROJECT HYPERION. It exists only on my laptop at the moment. This would be played as an augmented reality Discord Roleplay Server where you would interact with simulated characters and residents of a digital sci-fi city called HYPERION. With each act, a magazine series accompanies the world, exploring the fictional story of HYPERION, shared at regular intervals (for free). These digital mentors and story characters would teach you material, concepts, and projects from [email protected], but through the medium that is HYPERION, living their own simulated lives asynchronous from your chats with them. It blurs the line between education and entertainment, video games and AR/VR. This has been on my mind for a while, and may become something larger than fosai, whether it starts here or later in the future. If we vote for this concept, know that it would come in the form of our first Discord server and may evolve into other things if it finds traction. HYPERION is kind of like an AR GameInformer, but for AI/ML/DL contextualized to a fictional world based on real science and breakthroughs. I can work on this if you want me to, but I don't want to take the detour if it's not worth the time. I had a lot of fun designing it, and that was enough. But I'd be more than happy to share this little world of mine if you wanted to hear more about it.

Thank you 🌎

Remember, this community is as much yours (all of you subscribed) as it is mine! Speak up if there's something in particular you want to see out of it.

I am already more than happy with our size and what we have covered so far, but I know we will grow as the future unfolds.

I'll leave this pinned for a while so you can let me know how you want to shape this future, and we'll head that direction together.

Blaed

[–] Blaed 3 points 1 year ago* (last edited 1 year ago) (1 children)

This is really great feedback, thank you for commenting. Don't worry, I am not at all offended. I'm glad you told me - I can absolutely go back to handwriting each post. I honestly prefer to do it that way. Sometimes I can't tell what everyone wants to hear (until they tell me), so I try new things. Sometimes experiments fail, and that's okay!

If there's anything at all you want to see in particular, let me know! I'd be more than happy shedding light on a particular subject. In the meantime, we'll go back to more curated content. It does take a lot of time to write those posts, but it seems it's worth the effort. Thanks again for letting me know, I really do appreciate the feedback. Don't hesitate to call me out if you feel I've strayed from the path. This community is as much yours (all of you subscribed) as it is mine.

 

How is AI Changing the Audiobook Landscape?

Source: MarkTechPost

AI in Audiobook


What's Happening?

Audiobooks are gaining popularity as they provide a convenient way to consume information. They're not just for avid readers; they're also crucial for those with visual impairments, children, and language learners. However, traditional audiobook production can be time-consuming and costly.


How is AI Helping?

AI, through neural text-to-speech technology, is revolutionizing this space. It's enabling the quick and cost-effective conversion of e-books into high-quality audiobooks.


Learn More

For a deep dive into the technology and its impact, Click Here to Learn More.

 

DeepMind - Building Interactive Agents in Video Game Worlds

Source

Most artificial intelligence (AI) researchers now believe that writing computer code which can capture the nuances of situated interactions is impossible. Alternatively, modern machine learning (ML) researchers have focused on learning about these types of interactions from data.

To explore these learning-based approaches and quickly build agents that can make sense of human instructions and safely perform actions in open-ended conditions, we created a research framework within a video game environment.

Today, we’re publishing a paper and collection of videos, showing our early steps in building video game AIs that can understand fuzzy human concepts – and therefore, can begin to interact with people on their own terms.

Learning in “the playhouse”

Our framework begins with people interacting with other people in the video game world. Using imitation learning, we imbued agents with a broad but unrefined set of behaviours. This "behaviour prior" is crucial for enabling interactions that can be judged by humans. Without this initial imitation phase, agents are entirely random and virtually impossible to interact with. Further human judgement of the agent’s behaviour and optimisation of these judgements by reinforcement learning (RL) produces better agents, which can then be improved again.

We built agents by (1) imitating human-human interactions, and then improving agents through a cycle of (2) human-agent interaction and human feedback, (3) reward model training, and (4) reinforcement learning. First we built a simple video game world based on the concept of a child's “playhouse.” This environment provided a safe setting for humans and agents to interact and made it easy to rapidly collect large volumes of these interaction data. The house featured a variety of rooms, furniture, and objects configured in new arrangements for each interaction. We also created an interface for interaction.

Both the human and agent have an avatar in the game that enables them to move within – and manipulate – the environment. They can also chat with each other in real-time and collaborate on activities, such as carrying objects and handing them to each other, building a tower of blocks, or cleaning a room together. Human participants set the contexts for the interactions by navigating through the world, setting goals, and asking questions for agents. In total, the project collected more than 25 years of real-time interactions between agents and hundreds of (human) participants.

Observing behaviours that emerge

The agents we trained are capable of a huge range of tasks, some of which were not anticipated by the researchers who built them. For instance, we discovered that these agents can build rows of objects using two alternating colours or retrieve an object from a house that’s similar to another object the user is holding.

These surprises emerge because language permits a nearly endless set of tasks and questions via the composition of simple meanings. Also, as researchers, we do not specify the details of agent behaviour. Instead, the hundreds of humans who engage in interactions came up with tasks and questions during the course of these interactions.

Click Here to Learn More
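The four-step cycle DeepMind describes - imitation, human feedback, reward modeling, reinforcement learning - can be caricatured in a few lines. This is purely a toy illustration of the loop's shape; the actions, numbers, and update rule below are all made up and bear no relation to DeepMind's actual implementation:

```python
actions = ["build block tower", "wander randomly", "answer question"]

# (1) Imitation: start from behaviour frequencies seen in human-human play
policy = {"build block tower": 0.4, "wander randomly": 0.2, "answer question": 0.4}

# (2) Human feedback: pairwise judgements of the form "a was better than b"
preferences = [
    ("build block tower", "wander randomly"),
    ("answer question", "wander randomly"),
]

# (3) Reward model: one score per action, nudged toward the judgements
reward = {a: 0.0 for a in actions}
for better, worse in preferences:
    reward[better] += 0.1
    reward[worse] -= 0.1

# (4) RL step: shift probability mass toward the highest-reward action
best = max(actions, key=reward.get)
policy = {a: (0.8 if a == best else 0.1) for a in actions}
```

The interesting part of the real system is that steps (2) through (4) repeat: the improved policy generates new interactions, which get judged again, so the reward model and agent keep refining each other.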

20
submitted 1 year ago* (last edited 1 year ago) by Blaed to c/fosai
 

BAIR Shares LMD - The Fusion of GPT-4 and Stable Diffusion

By Long Lian, Boyi Li, Adam Yala, Trevor Darrell


Quick Summary

How does it work?: Text Prompt → Large Language Model (LLM) → Intermediate Representation → Stable Diffusion → Final Image.

The Problem: Existing diffusion models excel at text-to-image synthesis but often fail to accurately capture spatial relationships, negations, numeracy, and attribute assignments in the prompt.

Our Solution: Introducing LLM-grounded Diffusion (LMD), a method that significantly improves prompt understanding in these challenging scenarios.

Visualizations
Figure 1: LMD enhances prompt understanding in text-to-image models.


The Nitty-Gritty

Our Approach

We sidestep the high cost and time investment of training new models by using pretrained Large Language Models (LLMs) and diffusion models in a unique two-step process.

  1. LLM as Layout Generator: An LLM generates a scene layout with bounding boxes and object descriptions based on the prompt.
  2. Diffusion Model Controller: This takes the LLM output and creates images conditioned on the layout.

Both stages use frozen pretrained models, minimizing training costs.
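As a rough sketch of what stage 1 hands to stage 2, here is how parsing a hypothetical LLM layout response might look. The JSON schema, object descriptions, and box format below are illustrative assumptions, not LMD's actual interface:

```python
import json

def parse_layout(llm_output: str):
    """Parse a (hypothetical) LLM layout response into (description, box) pairs.
    Boxes are [x, y, width, height] in image coordinates."""
    layout = json.loads(llm_output)
    return [(obj["description"], obj["box"]) for obj in layout["objects"]]

# A made-up example of what the layout-generation stage might return:
llm_output = """{
  "background": "a sunny park",
  "objects": [
    {"description": "a red ball", "box": [50, 300, 120, 120]},
    {"description": "a brown dog", "box": [250, 280, 200, 160]}
  ]
}"""

boxes = parse_layout(llm_output)
# Stage 2 (not shown) would condition the frozen diffusion model on these
# boxes and descriptions, then compose the per-box results into one image.
```

The point of the intermediate representation is exactly this: the LLM handles the language and spatial reasoning, and the diffusion controller only has to render a fully specified layout.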
Read the full paper on arXiv

Process Overview
Figure 2: The two-stage process of LMD.

Additional Features

  • Dialog-Based Scene Specification: Enables interactive prompt modifications.
  • Language Support: Capable of processing prompts in languages that aren't natively supported by the underlying diffusion model.

Additional Abilities
Figure 3: LMD's multi-language and dialog-based capabilities.


Why Does This Matter?

We demonstrate LMD's superiority over existing diffusion models, particularly in generating images that accurately match complex text prompts involving language and spatial reasoning.

Performance Comparison
Figure 4: LMD vs Base Diffusion Model.


Further Reading and Citation

For an in-depth understanding, visit the website and read the full paper.

@article{lian2023llmgrounded,
    title={LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models},
    author={Lian, Long and Li, Boyi and Yala, Adam and Darrell, Trevor},
    journal={arXiv preprint arXiv:2305.13655},
    year={2023}
}
[–] Blaed 2 points 1 year ago

All is well. My gut instinct was right. I missed the verification email from my (now) decommissioned domain. It has been updated; the admins were very helpful in this regard.

This seems like a relevant time to mention that I have plans to host our own Lemmy instance, geared towards FOSAI, Machine Learning, Deep Learning, open-source education, and other subprojects I’m excited to share with everyone (hopefully) later this year. Time is a fickle thing, so I can’t make any hard commitments until I figure a few things out - just know there is more to fosai than this community. All will be revealed in due time.

Nothing about this community will change. If anything, I see it growing to become a friendly gateway for others who share similar interests and want to dive further into the material.

If you’ve gotten this far, thank you for being a part of [email protected]. It means a lot to me that any of you would give these words attention.

Whatever it is, I hope you find what you’re looking for. Just know I’m only getting started. You are all early to the party. So much more has yet to be explored, so much more has yet to be seen.

The future is now. The future is bright. The future is ________.

Zzzzzzzz

[–] Blaed 1 points 1 year ago

Hi everyone - I'll admit, I wasn't as on top of news this August as I was planning to be. I am going to unpin this and try to figure out a more efficient workflow for September and the coming Fall.

If you're reading this, thank you for finding your way to [email protected]. Don't hesitate to let me know if there's anything you'd like to see more of in particular.

[–] Blaed 4 points 1 year ago* (last edited 1 year ago)

In my opinion writing a paper is good practice no matter the results. It might help you discern more valuable insights from your testing or approach.

In this situation, you have almost nothing to lose! I say go for it. Do both. Start a paper draft now and iterate upon it as you benchmark more results. Oftentimes, writing about and reflecting on your own research reinforces some of the concepts you're tackling. All the more reason to write something up, even if you don't release it.

If you do end up writing one, be sure to share it here!

[–] Blaed 2 points 1 year ago (1 children)

What is your vim setup for python? I need a better dev setup for python. PyCharm and VS Code have too much BS in the background and I am never letting their infinitely long list of network connections through my whitelist firewall. I started down that path once; never again. I know about VSCodium, and tried it, but all documentation and examples I came across only work with the proprietary junk. Geany is better than that nonsense. I used Thonny with MicroPython at one point, but didn’t get very far with that. I tried the GNOME Builder IDE recently. It has a vim-like exit troll. You can’t save files in Builder, and the instructions to enable saving call for modifying a configuration file on the host while giving absolutely no info about where the file is located. I need a solid IDE that isn’t dicking around in network/telemetry or configured by the bridge troll from Monty Python’s Holy Grail.

I am usually just running the script I'm working on post-editor in whatever command-line interface I find myself in. That could be zsh, bash, or something random I found that week. If I have the time, I like to set up zsh (or Oh My Zsh, depending on my OS) paired with Powerlevel10k and custom color schemes.

For Windows, I usually set something like this up.

For Mac or Linux (Ubuntu) I like to use vim and/or tmux + rectangle.

As a practice, I often try as many new editors as I can, week by week or month by month. It helps keep me on my toes, but when I'm looking for a stable experience I typically default to VSCode behind my firewall. I feel your pain with the allowlisting, but it's the tradeoff I accept when I have something I'm working on and want to take my time with it. Otherwise, I've hopped between some of these. Check them out. They might not be for you, but they're fun to try:


What is a good starting point for practical CLI LLMs? I need something more useful than some toolchain project. I’ve gone down this rabbit hole far too many times with embedded hardware. I like the idea of trying something that is not an AVR, but by the time I get the toolchains set up and deal with all the proprietary bs, I’m already burned out on the project. In this space, I like to play around in the code with some small objective, but that is the full extent of my capabilities most of the time. Like I just spent most of today confused as hell about how a python tempfile works before I realized the temp file creation has its own scope… Not my brightest moment

What sort of toolchain project were you exploring? I'm curious to hear about that. In all honesty, the reason I have so many GitHub Stars is a.) I am a curious person in general and b.) I've been looking for practical and pragmatic use cases for LLMs within my own life too. This has proven to be more difficult than I initially thought, given the rapid development of the space and the many obstacles you have to overcome between design, infrastructure, and model deployment.

That being said, I recently came across Cohere, who have an SDK & API for calling their 'command' models. Unfortunately, the models are proprietary, but they have a few projects on GitHub that are interesting to explore. In my experience, local LLMs aren't quite at the level of the production-grade deployments people expect out of something with the perplexity of ChatGPT-4 (or 3). The benefit is data privacy; the compromise is performance. What I like about Cohere is that they focus on bringing the models to you so that data can remain private, with all of the benefits of the API and hosting capabilities of a cloud-based LLM.

For anyone starting a business in AI - be it automation agencies, consulting services, or integration engineering - I think this is important to consider. At least for enterprise or commercial sectors.

Home projects? Well, that's another story entirely. I'll take the performance tradeoff for running a creative or functional model on my own hardware, keeping my network private and 100% local.

A fun project I've been exploring is deploying your own locally hosted inference cloud API, which you can call from any CLI you're developing in, as long as you're connected to your private network. This way, you get an OpenAI-like API you can tinker with, while hot-swapping models on your cloud inference platform to test different capabilities.
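As a minimal sketch of what calling that OpenAI-like API could look like - the host address, port, and model name here are placeholder assumptions for whatever your local server actually exposes:

```python
import json
import urllib.request

def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completions payload for a local inference server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

def query_local_llm(payload: dict, host: str = "http://192.168.1.50:8080") -> str:
    """POST to a /v1/chat/completions endpoint, as exposed by OpenAI-compatible
    local servers. The host/port are placeholders for your LAN setup."""
    req = urllib.request.Request(
        f"{host}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

payload = build_chat_request("mistral-7b", "Summarize my meeting notes.")
# query_local_llm(payload) would return the model's reply once a server is up.
```

Because the request shape mimics OpenAI's, any CLI tool or script written against that convention can be pointed at your private server instead.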

At this point, you are only limited by the power you can pump into your inference cloud. A colleague of mine has a server with 1 TB RAM, 200+ CPU cores, and four GPUs we're working on setting up with passthrough, pooling the available VRAM. We're hoping to comfortably run 40B GPTQ or high-parameter GGML models using this home server rig.
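For a back-of-the-envelope check on whether a model fits in pooled VRAM, a rule-of-thumb estimate helps - the 20% overhead factor below is an assumption on my part, since real usage varies with context length and batching:

```python
def model_memory_gb(params_billion: float, bits_per_weight: int,
                    overhead: float = 1.2) -> float:
    """Rough memory estimate for an LLM's weights.
    overhead=1.2 assumes ~20% extra for activations and KV cache."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# A 40B model quantized to 4 bits per weight (GPTQ-style):
vram_40b_4bit = model_memory_gb(40, 4)   # 24.0 GB with the assumed overhead
# The same model at fp16, for comparison:
vram_40b_fp16 = model_memory_gb(40, 16)  # 96.0 GB
```

This is why quantization is the difference between a 40B model being a multi-GPU datacenter problem and something a pooled home rig can hold.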

Assuming you get a private LLM cloud working at home, you can do all sorts of things. You can pass documents through something like LlamaIndex or LangChain, taking personal notes or home information and turning it into semantically searchable knowledge. This would be available to you from any CLI on your network, maybe through something like LocalAI.
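The retrieval idea those libraries implement can be sketched without any of them - a toy version that chunks notes and matches on word overlap. Real pipelines use embedding similarity and smarter chunking; this just shows the shape of the technique:

```python
def chunk_text(text: str, size: int = 8) -> list[str]:
    """Split notes into fixed-size word chunks (real pipelines chunk smarter)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def retrieve(query: str, chunks: list[str]) -> str:
    """Return the chunk sharing the most words with the query."""
    q = set(query.lower().split())
    return max(chunks, key=lambda c: len(q & set(c.lower().split())))

# Hypothetical home notes you might index:
notes = (
    "The wifi password is taped to the back of the router. "
    "Water the plants every Tuesday and Friday. "
    "The furnace filter was last replaced in March."
)
chunks = chunk_text(notes)
answer = retrieve("what is the wifi password", chunks)
```

Swap the word-overlap scorer for embedding similarity and feed the retrieved chunk to your local LLM as context, and you have the core of the setup described above.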

These are really big ideas, some of which have taken me months to put together and test - but they've been really exciting to see actually work in small ways that feel fun and futuristic. The only problem is that many of these libraries are under such rapid development that projects frequently break after a single dependency change, or stall on an undocumented compatibility issue with a library that is too new to be fully built out and supported.

I don't know if that answers your question(s), but I'm around if you want to ask about anything else!

[–] Blaed 2 points 1 year ago

I will say, if I'm on Reddit, it's for /r/LocalLLaMA. For the most part, everyone there has been very helpful and supportive of the tech. Glad to see parts of it on Lemmy! Thanks for sharing, I'm going to add it to the community page on FOSAI ▲ XYZ.

[–] Blaed 2 points 1 year ago (1 children)

I love the Dune choice. Thanks for posting!

Do you have the time to share your training workflow? That, or any tips for others who might want to try something similar to what you have accomplished here? Curious to hear what you have learned and what you might try different next time.

[–] Blaed 1 points 1 year ago

I'll be leaving this up for the duration of August! Feel free to use the comments here to ask away any question not already addressed.

[–] Blaed 5 points 1 year ago (1 children)

This is really interesting, thanks for sharing!

For our hardware heads, do you think you could share a spec sheet of your device?

Love seeing 70B parameter models being run at home! Imagine where we’ll be a year from now..

[–] Blaed 3 points 1 year ago

p.s. Feel free to treat this as a living post and share in these comments news that you find throughout this month. I will be updating this for the entirety of August until we cross into the next monthly news cycle in September.
