Blaed

joined 2 years ago

Latest News in Machine Learning - Analytics Vidhya Edition

A roundup of exciting developments in the world of machine learning, curated by FOSAI's new semi-automated news report! Let me know if you'd like to see the format of these new reports changed. I'll be experimenting with a few templates to see which ones stick.


Table of Contents

  1. Variational Autoencoders for Anomaly Detection
  2. AI and Image Generation Aesthetics
  3. Text to Sound with Large Language Models
  4. RLHF for High-Performance Decision-Making
  5. Generative Models in Semi-Supervised Learning
  6. Serverless Large Language Models with RunPod
  7. ChatGPT Plugins for Educational Enhancement
  8. Python in Excel for Advanced Analytics
  9. Harnessing Zero-shot and Few-shot Prompting in LLMs

Variational Autoencoders for Anomaly Detection

Intro: Explore the practical applications of generative AI in anomaly detection using Variational Autoencoders (VAEs).
Read More


AI and Image Generation Aesthetics

Intro: Dive into the creative and technical aspects of AI-powered artistic expression, including Neural Style Transfer and GANs.
Read More


Text to Sound with Large Language Models

Intro: Discover how AI can transform a musician's voice command into melodious guitar sounds through ‘Musician's Intent Recognition’.
Read More


RLHF for High-Performance Decision-Making

Intro: Learn about RLHF, an emerging field blending Reinforcement Learning and human feedback for optimizing complex system performance.
Read More


Generative Models in Semi-Supervised Learning

Intro: Understand how leveraging generative models can maximize the utility of limited labeled data in semi-supervised learning scenarios.
Read More


Serverless Large Language Models with RunPod

Intro: Explore how serverless computing and Generative AI can work in harmony, especially for developers lacking local high-GPU resources.
Read More


ChatGPT Plugins for Educational Enhancement

Intro: ChatGPT Plugins are customizing the educational experience, allowing users to browse the web and access specialized knowledge.
Read More


Python in Excel for Advanced Analytics

Intro: Microsoft integrates Python into Excel, enhancing its capabilities in data analysis, machine learning, and predictive analytics.
Read More


Harnessing Zero-shot and Few-shot Prompting in LLMs

Intro: Uncover the potential of Large Language Models in tasks like question-answering, creative writing, and critical analysis.
Read More


That's the roundup for now. Stay tuned for more updates from this new semi-automated workflow.

11
submitted 1 year ago* (last edited 1 year ago) by Blaed to c/fosai
 

I have returned!

I was right, it was a simple oversight on an email verification I missed. I have now regained access thanks to the admins. No funny business to be seen with the Ban of Blaed!

I know I said a lot of content is coming, and it is! But this week in particular is going to be rather busy for me. Please stay tuned and patient until we clear Thursday. I have a huge presentation to give for a full-stack Machine Learning application I have been developing for quite some time (since starting [email protected]!). It is crucial I take the time to polish some of the work needed to hit it home.

Let's see if everything I've learned up to this point has paid off! Keep in mind that I knew literally nothing about Machine Learning, Deep Learning, or AI prior to starting this project. All I have had to work with is a high school diploma and access to the internet. My study/source material has been everything I have shared with each and every one of you here so far (plus some of my own intuitions).

The point of saying all this is that if I can do it, so can you!

Excelsior,

Blaed

EDIT: Easter egg

 

Training Diffusion Models with Reinforcement Learning

Source


Written by: Kevin Black


Diffusion models have recently emerged as the de facto standard for generating complex, high-dimensional outputs. They are known for producing stunning AI art and hyper-realistic synthetic images, but have also been applied in drug design and continuous control. The core principle of diffusion models involves the iterative transformation of random noise into a sample, often guided by a maximum likelihood estimation approach.

However, many use cases aim not merely to replicate training data, but to achieve specific objectives. In this post, we discuss how reinforcement learning (RL) can train diffusion models to meet these unique goals. Specifically, we finetune Stable Diffusion for various objectives, incorporating feedback from a large vision-language model to enhance the model’s output quality. This demonstrates the potential for powerful AI models to enhance one another without human intervention.


A diagram illustrating the prompt-image alignment objective using LLaVA, a large vision-language model.


Denoising Diffusion Policy Optimization

In adapting diffusion to RL, we make a fundamental assumption: for a given sample (e.g. an image), a reward function can evaluate its quality. The goal is for the diffusion model to maximize this reward function. Traditional diffusion models are trained via maximum likelihood estimation (MLE); a simple way to bring rewards into that framework is reward-weighted regression (RWR), a method borrowed from existing RL algorithms.

However, this method presents challenges. Our denoising diffusion policy optimization (DDPO) algorithm overcomes these by considering the entire denoising sequence. By viewing the diffusion process as a Markov decision process (MDP), we leverage advanced RL algorithms that focus on multi-step MDPs, using exact likelihood calculations for each denoising step.

We've applied policy gradient algorithms due to their past success in language model finetuning. This led to two DDPO variants: DDPO_SF (which uses the REINFORCE score-function policy gradient) and DDPO_IS (which uses an importance-sampling estimator in the style of proximal policy optimization (PPO)).
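To make the MDP view concrete, here is a rough PyTorch-style sketch of a REINFORCE-style (DDPO_SF) update. `sample_trajectory` and `step_log_prob` are hypothetical helpers standing in for the model's denoising rollout and per-step likelihoods; this is a sketch, not the authors' code.

    # Rough REINFORCE-style (DDPO_SF) update; `sample_trajectory` and
    # `step_log_prob` are hypothetical helpers, not the authors' code.
    import torch

    def ddpo_sf_loss(model, prompts, reward_fn, num_steps=50):
        # Sample the full denoising trajectory x_T -> ... -> x_0, keeping every
        # intermediate latent so each denoising step can be treated as an MDP action.
        latents, images = sample_trajectory(model, prompts, num_steps)

        # The reward is computed only on the final sample (e.g. an aesthetic score).
        rewards = reward_fn(images, prompts)                      # shape: [batch]

        # REINFORCE: weight the sum of per-step log-likelihoods by the reward.
        log_probs = torch.stack([
            step_log_prob(model, latents[t], latents[t + 1], prompts, t)
            for t in range(num_steps)
        ]).sum(dim=0)                                             # shape: [batch]

        return -(rewards.detach() * log_probs).mean()             # minimizing this ascends the reward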


Finetuning Stable Diffusion Using DDPO

We finetuned Stable Diffusion v1-4 using DDPO_IS for the following reward functions:

  • Compressibility: Ease of image compression using JPEG.
  • Incompressibility: Difficulty of image compression using JPEG.
  • Aesthetic Quality: Evaluated by the LAION aesthetic predictor.
  • Prompt-Image Alignment: Uses LLaVA to describe the image, then matches this description to the prompt using BERTScore.

For finetuning, we provided prompts like “a(n) [animal]” for the first three tasks. For prompt-image alignment, we gave prompts like “a(n) [animal] [activity]”.
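As a concrete illustration of the first two objectives above, a JPEG-compressibility reward can be sketched in a few lines. This is my own illustration and not necessarily the exact reward used in the paper:

    # Toy JPEG-compressibility reward: smaller encoded size => higher reward.
    import io
    from PIL import Image

    def jpeg_compressibility_reward(image: Image.Image, quality: int = 95) -> float:
        buf = io.BytesIO()
        image.save(buf, format="JPEG", quality=quality)
        kilobytes = buf.tell() / 1024
        return -kilobytes  # use +kilobytes instead for the incompressibility objective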

Results on aesthetic, compressibility, and incompressibility

We also explored DDPO's application in prompt-image alignment, observing a trend toward a more cartoonish style.

Results on prompt-image alignment


There is a lot more they covered on their website!

Click here for the full report.

 

GPT-4 + Stable-Diffusion = ?: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models

Source

TL;DR: Text Prompt -> LLM -> Intermediate Representation (such as an image layout) -> Stable Diffusion -> Image.

Recent advancements in text-to-image generation with diffusion models have yielded remarkable results in synthesizing highly realistic and diverse images. However, despite their impressive capabilities, diffusion models such as Stable Diffusion often struggle to follow prompts accurately when spatial or common-sense reasoning is required.

The following figure lists four scenarios in which Stable Diffusion falls short in generating images that accurately correspond to the given prompts: negation, numeracy, attribute assignment, and spatial relationships. In contrast, our method, LLM-grounded Diffusion (LMD), delivers much better prompt understanding in text-to-image generation in those scenarios.


Figure 1: LLM-grounded Diffusion enhances the prompt understanding ability of text-to-image diffusion models.

One possible solution to address this issue is, of course, to gather a vast multi-modal dataset comprising intricate captions and train a large diffusion model with a large language encoder. This approach comes with significant costs: it is time-consuming and expensive to train both large language models (LLMs) and diffusion models.

Our Solution

To efficiently solve this problem with minimal cost (i.e., no training costs), we instead equip diffusion models with enhanced spatial and common sense reasoning by using off-the-shelf frozen LLMs in a novel two-stage generation process.

First, we adapt an LLM to be a text-guided layout generator through in-context learning. When provided with an image prompt, an LLM outputs a scene layout in the form of bounding boxes along with corresponding individual descriptions. Second, we steer a diffusion model with a novel controller to generate images conditioned on the layout. Both stages utilize frozen pretrained models without any LLM or diffusion model parameter optimization. We invite readers to read the paper on arXiv for additional details.


Figure 2: LMD is a text-to-image generative model with a novel two-stage generation process: a text-to-layout generator with an LLM + in-context learning and a novel layout-guided stable diffusion. Both stages are training-free.
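To make the two-stage idea concrete, here is a rough sketch in Python. `ask_llm` and `layout_guided_diffusion` are hypothetical placeholders for the frozen LLM and the layout-conditioned diffusion stage described in the paper:

    # Two-stage sketch: `ask_llm` and `layout_guided_diffusion` are hypothetical
    # placeholders, not the authors' actual API.
    import json

    LAYOUT_INSTRUCTIONS = (
        "Given an image prompt, return JSON with a 'background' string and a list of "
        "'objects', each having a 'description' and an [x, y, w, h] 'box'."
    )  # the real prompt also includes several in-context examples

    def generate(prompt: str):
        # Stage 1: a frozen LLM turns the text prompt into a scene layout.
        layout = json.loads(ask_llm(LAYOUT_INSTRUCTIONS, prompt))

        # Stage 2: a frozen diffusion model is steered to respect the boxes and
        # per-box descriptions; no parameters are updated in either stage.
        return layout_guided_diffusion(
            boxes=[obj["box"] for obj in layout["objects"]],
            descriptions=[obj["description"] for obj in layout["objects"]],
            background=layout["background"],
        )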

LMD’s Additional Capabilities

Additionally, LMD naturally allows dialog-based multi-round scene specification, enabling additional clarifications and subsequent modifications for each prompt. Furthermore, LMD is able to handle prompts in a language that is not well-supported by the underlying diffusion model.


Figure 3: Incorporating an LLM for prompt understanding, our method is able to perform dialog-based scene specification and generation from prompts in a language (Chinese in the example above) that the underlying diffusion model does not support.

Given an LLM that supports multi-round dialog (e.g., GPT-3.5 or GPT-4), LMD allows the user to provide additional information or clarifications after the first layout is generated; the LLM returns an updated layout in its next response, which is then used to generate an updated image. For example, a user could request to add an object to the scene, or change the location or description of existing objects (the left half of Figure 3).

Furthermore, by including an in-context example that pairs a non-English prompt with a layout and background description written in English, LMD accepts non-English prompts and generates layouts with box and background descriptions in English for the subsequent layout-to-image stage. As shown in the right half of Figure 3, this allows generation from prompts in a language that the underlying diffusion models do not support.

Visualizations

We validate the superiority of our design by comparing it with the base diffusion model (SD 2.1) that LMD uses under the hood. We invite readers to our paper for more evaluations and comparisons.


Figure 4: LMD outperforms the base diffusion model in accurately generating images according to prompts that necessitate both language and spatial reasoning. LMD also enables counterfactual text-to-image generation that the base diffusion model is not able to generate (the last row).

For more details about LLM-grounded Diffusion (LMD), visit our website and read the paper on arXiv.

Click Here to Learn More


Disclaimer: This report was generated by ChatGPT from third-party sources and may be subject to change. Always refer to the original source for the most up-to-date information. This automated news report is still experimental but open to improvements! More news soon!

4
submitted 1 year ago* (last edited 1 year ago) by Blaed to c/fosai
 

In this video, we'll learn how to build Large Language Model (LLM) + Retrieval Augmented Generation (RAG) pipelines using open-source models from Hugging Face deployed on AWS SageMaker. We use the MiniLM sentence transformer to power our semantic search component with Pinecone.
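For readers who want a feel for the retrieval half of such a pipeline, here is a rough sketch using a MiniLM sentence transformer with the classic pinecone-client API. Index name, keys, and documents are placeholders rather than values from the video, and newer Pinecone client versions use a different interface:

    # Retrieval half of an LLM + RAG pipeline: embed documents with MiniLM and
    # query them through Pinecone. Index name, key, environment, and documents
    # are placeholders, not taken from the video.
    import pinecone
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # 384-dim embeddings
    pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENVIRONMENT")
    pinecone.create_index("rag-demo", dimension=384, metric="cosine")
    index = pinecone.Index("rag-demo")

    docs = ["LocalAI runs LLMs on consumer hardware.",
            "SageMaker can host Hugging Face models."]
    index.upsert(vectors=[(str(i), model.encode(d).tolist(), {"text": d})
                          for i, d in enumerate(docs)])

    query = "How can I host an open-source model?"
    hits = index.query(vector=model.encode(query).tolist(), top_k=2, include_metadata=True)
    context = "\n".join(m["metadata"]["text"] for m in hits["matches"])
    # `context` is then prepended to the prompt sent to the LLM (e.g. on SageMaker).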

 

Hello everyone!

I am working on figuring out better workflows to bring back a more consistent posting schedule. In the meantime, I'd like to leave you with a new update from LocalAI & Continue.

Check these projects out! More info from the Continue & LocalAI teams below:

Continue

The open-source autopilot for software development: a VS Code extension that brings the power of ChatGPT to your IDE.

LocalAI

LocalAI is a drop-in replacement REST API that's compatible with the OpenAI API specification for local inferencing. It allows you to run LLMs (and not only LLMs) locally or on-prem with consumer-grade hardware, supporting multiple model families that are compatible with the ggml format. It does not require a GPU.

Combining the Power of Continue + LocalAI!


Note

From this release, the llama backend supports only gguf files (see 943). LocalAI still supports ggml files through a separate backend named llama-stable, which ships a version of llama.cpp from before that change. If you were manually specifying the llama backend to load ggml files, from this release on you should use llama-stable instead, or omit the backend entirely (LocalAI will handle it automatically).

Continue


This document presents an example of integration with continuedev/continue.


For a live demonstration, please click on the link below:

Integration Setup Walkthrough

  1. As outlined in continue's documentation, install the Visual Studio Code extension from the marketplace and open it.

  2. In this example, LocalAI will download the gpt4all model and set it up as "gpt-3.5-turbo". Refer to the docker-compose.yaml file for details.

    # Clone LocalAI
    git clone https://github.com/go-skynet/LocalAI
    
    cd LocalAI/examples/continue
    
    # Start with docker-compose
    docker-compose up --build -d
    
  3. Type /config within Continue's VSCode extension, or edit the file located at ~/.continue/config.py on your system with the following configuration:

    # Add these lines to the existing ~/.continue/config.py; keep the rest of the
    # default config in place ("..." stands for those unchanged fields).
    from continuedev.src.continuedev.libs.llm.openai import OpenAI, OpenAIServerInfo
    
    config = ContinueConfig(
       ...
       models=Models(
            default=OpenAI(
               api_key="my-api-key",  # placeholder; it does not need to be valid
               model="gpt-3.5-turbo",
               openai_server_info=OpenAIServerInfo(
                  api_base="http://localhost:8080",  # the LocalAI endpoint started above
                  model="gpt-3.5-turbo"
               )
            )
       ),
    )
    

This setup enables you to make queries directly to your model running in the Docker container. Note that the api_key does not need to be properly set up; it is included here as a placeholder.

If editing the configuration seems confusing, you may copy and paste the provided default config.py file over the existing one in ~/.continue/config.py after initializing the extension in the VSCode IDE.
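Before wiring up Continue, it can also help to sanity-check that LocalAI is answering on port 8080. A quick Python check (using the model name from the docker-compose example above) might look like:

    # Quick check that LocalAI is serving its OpenAI-compatible API on port 8080
    # before pointing Continue at it; the model name matches the example above.
    import requests

    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={"model": "gpt-3.5-turbo",
              "messages": [{"role": "user", "content": "Say hello"}]},
        timeout=120,
    )
    print(resp.json()["choices"][0]["message"]["content"])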

Additional Resources

 

cross-posted from: https://lemmy.world/post/3879861

Beating GPT-4 on HumanEval with a Fine-Tuned CodeLlama-34B

Hello everyone! This post marks an exciting moment for [email protected] and everyone in the open-source large language model and AI community.

We appear to have a new contender on the block: a model apparently capable of surpassing OpenAI's state-of-the-art GPT-4 in coding evals (evaluations).

This is huge. Not too long ago I made an offhand comment on us catching up to GPT-4 within a year. I did not expect that prediction to end up being reality in half the time. Let's hope this isn't a one-off scenario and that we see a new wave of open-source models that begin to challenge OpenAI.

Buckle up, it's going to get interesting!

Here are some notes from the blog, which you should visit and read in its entirety:


Blog Post

We have fine-tuned CodeLlama-34B and CodeLlama-34B-Python on an internal Phind dataset, achieving 67.6% and 69.5% pass@1 on HumanEval, respectively. GPT-4 achieved 67% according to their official technical report in March. To ensure result validity, we applied OpenAI's decontamination methodology to our dataset.

The CodeLlama models released yesterday demonstrate impressive performance on HumanEval.

  • CodeLlama-34B achieved 48.8% pass@1 on HumanEval
  • CodeLlama-34B-Python achieved 53.7% pass@1 on HumanEval

We have fine-tuned both models on a proprietary dataset of ~80k high-quality programming problems and solutions. Instead of code completion examples, this dataset features instruction-answer pairs, setting it apart structurally from HumanEval. We trained the Phind models over two epochs, for a total of ~160k examples. LoRA was not used; both models underwent native fine-tuning. We employed DeepSpeed ZeRO 3 and Flash Attention 2 to train these models in three hours using 32 A100-80GB GPUs, with a sequence length of 4096 tokens.

Furthermore, we applied OpenAI's decontamination methodology to our dataset to ensure valid results, and found no contaminated examples. 

The methodology is:

  • For each evaluation example, we randomly sampled three substrings of 50 characters or used the entire example if it was fewer than 50 characters.
  • A match was identified if any sampled substring was a substring of the processed training example.

For further insights on the decontamination methodology, please refer to Appendix C of OpenAI's technical report. Presented below are the pass@1 scores we achieved with our fine-tuned models:

  • Phind-CodeLlama-34B-v1 achieved 67.6% pass@1 on HumanEval
  • Phind-CodeLlama-34B-Python-v1 achieved 69.5% pass@1 on HumanEval

Download

We are releasing both models on Huggingface for verifiability and to bolster the open-source community. We welcome independent verification of results.


If you get a chance to try either of these models out, let us know how it goes in the comments below!

If you found anything about this post interesting, consider subscribing to [email protected].

Cheers to the power of open-source! May we continue the fight for optimization, efficiency, and performance.

 


41
submitted 1 year ago* (last edited 1 year ago) by Blaed to c/fosai
 

Introducing CodeLlama-34B - One of the First Open-Source Models to Beat OpenAI's ChatGPT-4!

Hello everyone! This post marks an exciting moment for [email protected] and everyone in the open-source large language model and AI community.

We appear to have a new contender on the block: a model apparently capable of surpassing OpenAI's state-of-the-art GPT-4 in coding evals (evaluations).

This is huge. Not too long ago I made an offhand comment on us catching up to GPT-4 within a year. I did not expect that prediction to end up being reality in half the time. Let's hope this isn't a one-off scenario and that we see a new wave of open-source models that begin to challenge OpenAI.

Buckle up, it's going to get interesting!

Here are some notes from the blog, which you should visit and read in its entirety:


Blog Post

We have fine-tuned CodeLlama-34B and CodeLlama-34B-Python on an internal Phind dataset, achieving 67.6% and 69.5% pass@1 on HumanEval, respectively. GPT-4 achieved 67% according to their official technical report in March. To ensure result validity, we applied OpenAI's decontamination methodology to our dataset.

The CodeLlama models released yesterday demonstrate impressive performance on HumanEval.

  • CodeLlama-34B achieved 48.8% pass@1 on HumanEval
  • CodeLlama-34B-Python achieved 53.7% pass@1 on HumanEval

We have fine-tuned both models on a proprietary dataset of ~80k high-quality programming problems and solutions. Instead of code completion examples, this dataset features instruction-answer pairs, setting it apart structurally from HumanEval. We trained the Phind models over two epochs, for a total of ~160k examples. LoRA was not used; both models underwent native fine-tuning. We employed DeepSpeed ZeRO 3 and Flash Attention 2 to train these models in three hours using 32 A100-80GB GPUs, with a sequence length of 4096 tokens.
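For readers unfamiliar with ZeRO 3, a minimal DeepSpeed configuration stub looks roughly like the following Python dict. These are illustrative values only; Phind has not published their exact settings:

    # Minimal DeepSpeed ZeRO-3 config stub, expressed as the dict handed to DeepSpeed;
    # illustrative values only, not Phind's actual settings.
    ds_config = {
        "train_micro_batch_size_per_gpu": 1,
        "gradient_accumulation_steps": 8,
        "bf16": {"enabled": True},
        "zero_optimization": {
            "stage": 3,           # partition optimizer state, gradients, and parameters
            "overlap_comm": True,
        },
    }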

Furthermore, we applied OpenAI's decontamination methodology to our dataset to ensure valid results, and found no contaminated examples. 

The methodology is:

  • For each evaluation example, we randomly sampled three substrings of 50 characters or used the entire example if it was fewer than 50 characters.
  • A match was identified if any sampled substring was a substring of the processed training example.
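The check described above amounts to something like this toy sketch (my own illustration, not Phind's actual decontamination code):

    # Toy version of the substring-overlap decontamination check.
    import random

    def is_contaminated(eval_example: str, train_example: str,
                        n_samples: int = 3, length: int = 50) -> bool:
        if len(eval_example) < length:
            substrings = [eval_example]  # use the whole example if it is shorter than 50 chars
        else:
            starts = [random.randrange(len(eval_example) - length + 1)
                      for _ in range(n_samples)]
            substrings = [eval_example[s:s + length] for s in starts]
        return any(sub in train_example for sub in substrings)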

For further insights on the decontamination methodology, please refer to Appendix C of OpenAI's technical report. Presented below are the pass@1 scores we achieved with our fine-tuned models:

  • Phind-CodeLlama-34B-v1 achieved 67.6% pass@1 on HumanEval
  • Phind-CodeLlama-34B-Python-v1 achieved 69.5% pass@1 on HumanEval

Download

We are releasing both models on Huggingface for verifiability and to bolster the open-source community. We welcome independent verification of results.


If you get a chance to try either of these models out, let us know how it goes in the comments below!

If you found anything about this post interesting, consider subscribing to [email protected].

Cheers to the power of open-source! May we continue the fight for optimization, efficiency, and performance.

15
submitted 1 year ago* (last edited 1 year ago) by Blaed to c/fosai
 

GPT-3.5 Turbo fine-tuning and API updates

Developers can now bring their own data to customize GPT-3.5 Turbo for their use cases.

Personally, I wish OpenAI opened up more, but this is a step in the right direction. The cost to fine-tune continues to fall, and more and more access is being given to this path of customization.

Here's the news:

Fine-tuning for GPT-3.5 Turbo is now available, with fine-tuning for GPT-4 coming this fall. This update gives developers the ability to customize models that perform better for their use cases and run these custom models at scale. Early tests have shown a fine-tuned version of GPT-3.5 Turbo can match, or even outperform, base GPT-4-level capabilities on certain narrow tasks. As with all our APIs, data sent in and out of the fine-tuning API is owned by the customer and is not used by OpenAI, or any other organization, to train other models.


Fine-tuning use cases

Since the release of GPT-3.5 Turbo, developers and businesses have asked for the ability to customize the model to create unique and differentiated experiences for their users. With this launch, developers can now run supervised fine-tuning to make this model perform better for their use cases.

In our private beta, fine-tuning customers have been able to meaningfully improve model performance across common use cases, such as:

Improved steerability: Fine-tuning allows businesses to make the model follow instructions better, such as making outputs terse or always responding in a given language. For instance, developers can use fine-tuning to ensure that the model always responds in German when prompted to use that language.

Reliable output formatting: Fine-tuning improves the model's ability to consistently format responses—a crucial aspect for applications demanding a specific response format, such as code completion or composing API calls. A developer can use fine-tuning to more reliably convert user prompts into high-quality JSON snippets that can be used with their own systems.

Custom tone: Fine-tuning is a great way to hone the qualitative feel of the model output, such as its tone, so it better fits the voice of businesses’ brands. A business with a recognizable brand voice can use fine-tuning for the model to be more consistent with their tone.

In addition to increased performance, fine-tuning also enables businesses to shorten their prompts while ensuring similar performance. Fine-tuning with GPT-3.5-Turbo can also handle 4k tokens—double our previous fine-tuned models. Early testers have reduced prompt size by up to 90% by fine-tuning instructions into the model itself, speeding up each API call and cutting costs.

Fine-tuning is most powerful when combined with other techniques such as prompt engineering, information retrieval, and function calling. Check out our fine-tuning guide to learn more. Support for fine-tuning with function calling and gpt-3.5-turbo-16k will be coming later this fall.


Fine-tuning steps

Step 1

Prepare your data

    {
      "messages": [
        { "role": "system", "content": "You are an assistant that occasionally misspells words" },
        { "role": "user", "content": "Tell me a story." },
        { "role": "assistant", "content": "One day a student went to schoool." }
      ]
    }

Step 2

Upload files

    curl https://api.openai.com/v1/files \
      -H "Authorization: Bearer $OPENAI_API_KEY" \
      -F "purpose=fine-tune" \
      -F "file=@path_to_your_file"

Step 3

Create a fine-tuning job

    curl https://api.openai.com/v1/fine_tuning/jobs \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $OPENAI_API_KEY" \
      -d '{
        "training_file": "TRAINING_FILE_ID",
        "model": "gpt-3.5-turbo-0613"
      }'

Once a model finishes the fine-tuning process, it is available to be used in production right away and has the same shared rate limits as the underlying model.

Step 4

Use a fine-tuned model

    curl https://api.openai.com/v1/chat/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $OPENAI_API_KEY" \
      -d '{
        "model": "ft:gpt-3.5-turbo:org_id",
        "messages": [
          { "role": "system", "content": "You are an assistant that occasionally misspells words" },
          { "role": "user", "content": "Hello! What is fine-tuning?" }
        ]
      }'

OpenAI will also be debuting a fine-tuning UI in the near future, which will give developers easier access to information about ongoing fine-tuning jobs, completed model snapshots, and more.
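For those who prefer Python over curl, the same four steps look roughly like this with the pre-1.0 openai package (0.27.9 or later); this is a sketch under that assumption, not OpenAI's official sample code:

    # The same four steps with the pre-1.0 `openai` Python package (0.27.9+).
    import openai  # reads OPENAI_API_KEY from the environment by default

    # Step 2: upload the prepared JSONL training file
    training_file = openai.File.create(file=open("training_data.jsonl", "rb"),
                                       purpose="fine-tune")

    # Step 3: create the fine-tuning job
    job = openai.FineTuningJob.create(training_file=training_file.id,
                                      model="gpt-3.5-turbo-0613")

    # Step 4: once the job finishes, call the resulting model by its "ft:" id
    response = openai.ChatCompletion.create(
        model="ft:gpt-3.5-turbo:org_id",
        messages=[
            {"role": "system", "content": "You are an assistant that occasionally misspells words"},
            {"role": "user", "content": "Hello! What is fine-tuning?"},
        ],
    )
    print(response.choices[0].message.content)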


Safety

It is very important to us that the deployment of fine-tuning is safe. To preserve the default model's safety features through the fine-tuning process, fine-tuning training data is passed through our Moderation API and a GPT-4 powered moderation system to detect unsafe training data that conflict with our safety standards.

Pricing

Fine-tuning costs are broken down into two buckets, the initial training cost and the usage cost:

  • Training: $0.008 / 1K Tokens
  • Usage input: $0.012 / 1K Tokens
  • Usage output: $0.016 / 1K Tokens

For example, a gpt-3.5-turbo fine-tuning job with a training file of 100,000 tokens that is trained for 3 epochs would have an expected cost of $2.40.
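The arithmetic behind that estimate, for anyone who wants to adapt it to their own token counts:

    # The arithmetic behind the example above.
    training_tokens = 100_000
    epochs = 3
    price_per_1k_training = 0.008  # USD
    print(training_tokens / 1000 * epochs * price_per_1k_training)  # 2.4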


Updated GPT-3 models

In July, we announced that the original GPT-3 base models (ada, babbage, curie, and davinci) would be turned off on January 4th, 2024. Today, we are making babbage-002 and davinci-002 available as replacements for these models, either as base or fine-tuned models. Customers can access those models by querying the Completions API.

These models can be fine-tuned with our new API endpoint /v1/fine_tuning/jobs. This new endpoint offers pagination and more extensibility to support the future evolution of the fine-tuning API. Transitioning from /v1/fine-tunes to the updated endpoint is straightforward and more details can be found in our new fine-tuning guide. This deprecates the old /v1/fine-tunes endpoint, which will be turned off on January 4th, 2024.

Pricing for base and fine-tuned GPT-3 models is as follows:


Read more on the updated documentation below.

 

Dolma

Dolma is an open dataset of 3 trillion tokens from a diverse mix of web content, academic publications, code, books, and encyclopedic materials. It was created as the training corpus for OLMo, AI2's language model.

Designed as a foundational training corpus for the AI2 language model, OLMo, Dolma offers an expansive playground for developers, researchers, and innovators.

Usage

This repository contains tools for generating and inspecting Dolma. To get started, install the Dolma Python library from PyPI.

pip install dolma

The dolma CLI can be accessed using the dolma command. To see the available commands, use the --help flag.

dolma --help

At the moment, the CLI supports three commands: tag, dedupe, and mix.

For all commands, configurations can be specified on the command line, or by passing a YAML or JSON file using the -c flag. For example:

dolma -c config.yaml dedupe --dedupe.name "test"

What Sets Dolma Apart?

Versatility: With its extensive and varied content, Dolma provides ample opportunities for experimentation and research across different AI domains.

Ease of Use: The Dolma Python library can be quickly installed from PyPI, allowing you to jump into your projects without delay.

Robust Tools: This repository equips you with tools for generating, tagging, deduplicating, and mixing the dataset, tailoring it to your specific needs.

Getting Started with Dolma

Utilize the dolma CLI to explore the available commands and configurations. From deduplication with dedupe to document mixing with mix, Dolma opens up numerous possibilities. Full usage instructions are available in the repository.

Contribute and Collaborate

Dolma isn't just a dataset; it's a community-driven initiative that welcomes contributions and ideas. Check out the development guide to see how you can get involved.


There are some questions about the fair use and licensing of this dataset, but if those concerns are cleared up, this may be a dataset worth starring or looking into.

[–] Blaed 2 points 1 year ago (1 children)

Went ahead and consolidated this resource to the main LLM Guide.

I will do another post later this week answering your initial questions!

[–] Blaed 1 points 1 year ago* (last edited 1 year ago)

Shoutout to @cll7793 for contributing to this guide!

See their original post here.

[–] Blaed 2 points 1 year ago* (last edited 1 year ago)

Hey thanks for sharing your post. While I am somewhat concerned about laws and regulations behind this innovative tech, I think we're a bit ahead of the curve here and don't have any real or immediate threat on the horizon. At least for now...

FOSAI is an idea that no one can take from me, from you, from us. Much like FOSS, it's a principle as much as it is a technology. I will advocate for this in the light of optimism and hope as much as I can in whatever theater this technology presents itself to me. At the moment, that's here, but if that changes - I will be sure to bring this torch with me wherever I find myself.

Don't get me wrong, the last thing I want to do is jump through legal hoops and hurdles to deploy an open-source model - but Congress and regulation move so slowly that I have a strong feeling many of us will be able to do exactly what we want without as much oversight as we might expect from hearings like this.

All the more reason to get involved with the tech now!

A good example of this is how Congress has done very little to actually solve digital piracy or rampant depression and loneliness that has come with the advent of social media. If they can't put up regulations with regular software, I have little worries they'll do anything seriously restricting for people like us.

In my opinion, there's no 'going back' to a pre-AI/LLM world. You cannot control this growth. The only way is forward, with each and every one of us empowered by this tech - building a brighter tomorrow because we finally have the ability and know-how to close the gaps in our social disparities.

Remember, apes together strong!

Jokes aside, we are in the calm before the storm of innovation that I believe will be this next decade. Let's hope we can have our way for quite some time, without restrictions from out-of-touch governments!

Momentum, growth, and innovation are our allies in this.

[–] Blaed 2 points 1 year ago* (last edited 1 year ago) (3 children)

Excellent resource. Thanks for sharing! Appreciate the time you put aside to make this for everyone.

These sorts of posts typically take a lot of time for me to make. As of late, I find myself with less and less time to make them as I dive further into my development projects (more educational resources I plan on debuting here when they're ready). But you are absolutely right. The only way to properly grow Lemmy is to continue putting out quality content people want to see.

For our community, it looks like that's a lot of learning and technical resources like this. Or whatever is missing and locked out of the other AI/LLM communities between Lemmy and Reddit (and anywhere else on the web really).

If it's alright with you, I'm going to pin this for the rest of the community and add it to our sidebar!

[–] Blaed 2 points 2 years ago

Really appreciate the info and insights. Helps me adjust and test my benchmarks a ton. It’s remarkable what we’re able to do with consumer hardware now. It’s exciting to imagine where we’ll be at even a year from now!

Let us know if you find a better setup and workflow in the future. Sounds pretty effective though. Curious to see how it powers up for you throughout the rest of the year.

Thanks again. All this info is very helpful for others looking to get something similar running.

[–] Blaed 2 points 2 years ago

Not 100% sure about the demo without clear tagging, but it appears officially out on huggingface so I’m sure we’ll have plenty of other demonstrations hit the web soon (if not this one).

This is all pretty fresh, so I'm sure the missing tag denoting v1.5 was just a small oversight from whoever manages the tool.

Very curious to see more benchmarks and user feedback though! A lot of people liked Vicuna. Let us know your experiences if you get a chance to interact with the model.

[–] Blaed 1 points 2 years ago

GL, HF, and happy devving!

[–] Blaed 3 points 2 years ago* (last edited 2 years ago) (1 children)

I used to feel the same way until I found some very interesting performance results from 3B and 7B parameter models.

Granted, it wasn’t anything I’d deploy to production - but using the smaller models to prototype quick ideas is great before having to rent a gpu and spend time working with the bigger models.

Give a few models a try! You might be pleasantly surprised. There’s plenty to choose from too. You will get wildly different results depending on your use case and prompting approach.

Let us know if you end up finding one you like! I think it is only a matter of time before we’re running 40B+ parameters at home (casually).

[–] Blaed 1 points 2 years ago

Appreciate you catching this so fast. Megathread is up!

[–] Blaed 3 points 2 years ago* (last edited 2 years ago)

Hey, thanks for commenting. You're not alone. I started my Machine Learning journey ~6 months ago in early 2023 without any knowledge of the underlying tech. Granted, I have some experience with infrastructure - but it has taken me a few months to absorb certain concepts and get things working the manual way too. 100% worth it though. I'm glad some of the resources I've found along the way are helping you and anyone else who comes across our community. It's an exciting time to be in this field and the perfect time to jump in.

Love to hear about your 1080 champing through inference. I have a 1080 TI I still hold onto for sentimental reasons... I have considered dusting it off as a standalone inference server. Glad to know it can reach 7B models. That's awesome.

I had no idea Stable Diffusion had a text2video extension.. I'll admit, I'm a big fan of SD, but don't have as much time to commit to it as I'd like. It's definitely something I plan on making more resources on after I reach a few of my text-based LLM goals.

I foresee some very exciting ecosystems in our near future, ones that combine text2image2video workflows to create some really innovating applications. That being said, if you ever run into something cool, don't hesitate to share it with us here!

[–] Blaed 1 points 2 years ago* (last edited 2 years ago)

You could try reducing your memory overhead by going down to 3B parameters. If you want to avoid that - maybe experiment with different models between both GPTQ & GGML formats?

If you're willing to spend a few dollars an hour, you could drastically increase overall memory and power and see if you can get it running on a rented GPU through something like vast.ai or runpod.ai. Might be worth exploring for any test of yours that might need extra oomph.

Given time, I think many of these models will become easier to run as new optimization and runtime methods begin to emerge.

[–] Blaed 2 points 2 years ago