Blaed

joined 2 years ago

cross-posted from: https://lemmy.world/post/3549390

stable-diffusion.cpp

Introducing stable-diffusion.cpp, a pure C/C++ inference engine for Stable Diffusion! This is a really awesome implementation to help speed up home inference of diffusion models.

Tailored for developers and AI enthusiasts, this repository offers a high-performance solution for creating and manipulating images using various quantization techniques and accelerated inference.


Key Features:

  • Efficient Implementation: Utilizing plain C/C++, it operates seamlessly like llama.cpp and is built on the ggml framework.
  • Multiple Precision Support: Choose between 16-bit, 32-bit float, and 4-bit to 8-bit integer quantization.
  • Optimized Performance: Experience memory-efficient CPU inference with AVX, AVX2, and AVX512 support for x86 architectures.
  • Versatile Modes: From original txt2img to img2img modes and negative prompt handling, customize your processing needs.
  • Cross-Platform Compatibility: Runs smoothly on Linux, Mac OS, and Windows.

Getting Started

Cloning, building, and running are made simple, and detailed examples are provided for both text-to-image and image-to-image generation. With an array of options for precision and comprehensive usage guidelines, you can easily adapt the code for your specific project requirements.

git clone --recursive https://github.com/leejet/stable-diffusion.cpp
cd stable-diffusion.cpp
  • If you have already cloned the repository, you can use the following commands to update it to the latest code:
cd stable-diffusion.cpp
git pull origin master
git submodule update

More Details

  • Plain C/C++ implementation based on ggml, working in the same way as llama.cpp
  • 16-bit, 32-bit float support
  • 4-bit, 5-bit and 8-bit integer quantization support
  • Accelerated memory-efficient CPU inference
    • Only requires ~2.3GB when using txt2img with fp16 precision to generate a 512x512 image (a rough back-of-envelope breakdown of this figure is sketched after this list)
  • AVX, AVX2 and AVX512 support for x86 architectures
  • Original txt2img and img2img mode
  • Negative prompt
  • stable-diffusion-webui style tokenizer (not all the features, only token weighting for now)
  • Sampling method
    • Euler A
  • Supported platforms
    • Linux
    • Mac OS
    • Windows
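
For a rough sense of where that ~2.3GB figure comes from, here is a back-of-envelope sketch in Python. The component parameter counts and the quantization labels below are assumptions for illustration (approximate Stable Diffusion 1.x sizes), not numbers pulled from the repo; the remainder of the reported footprint would be latents, activations, and scratch buffers.

# Back-of-envelope memory estimate for Stable Diffusion 1.x txt2img.
# Parameter counts are approximate assumptions, not taken from stable-diffusion.cpp.
PARAMS = {
    "unet": 860e6,       # denoising UNet
    "clip_text": 123e6,  # CLIP text encoder
    "vae": 84e6,         # VAE encoder/decoder
}
BYTES_PER_WEIGHT = {"f32": 4.0, "f16": 2.0, "q8_0": 1.0, "q4_0": 0.5}

def weight_memory_gb(precision: str) -> float:
    return sum(PARAMS.values()) * BYTES_PER_WEIGHT[precision] / 1e9

for precision in ("f32", "f16", "q8_0", "q4_0"):
    print(f"{precision}: ~{weight_memory_gb(precision):.1f} GB of weights")

# f16 weights come out to roughly 2.1 GB; the remaining few hundred MB of the
# reported ~2.3 GB would be latents, activations, and working buffers.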

This is a really exciting repo. I'll be honest, I don't think I'm as well versed in what's going on in diffusion inference - but I do know that more efficient and effective methods of running these models are always welcome by people who use diffusers frequently. Especially for those who need to multi-task and maintain performance headroom.

 


cross-posted from: https://lemmy.world/post/3439370

Empowering Vision-Language Models to Follow Interleaved Vision-Language Instructions

A wild new GitHub Repo has appeared!

Today we cover Cheetah - an exciting new take on interleaving image and text context & instruction.

For higher quality images, please visit the main project's repo to see their code and approach in all their glory.

I4 Benchmark

To facilitate research in interleaved vision-language instruction following, we build I4 (semantically Interconnected, Interleaved Image-Text Instruction-Following), an extensive large-scale benchmark of 31 tasks with diverse instructions in a unified instruction-response format, covering 20 diverse scenarios.

I4 has three important properties:

  • Interleaved vision-language context: all the instructions contain sequences of inter-related images and texts, such as storyboards with scripts, textbooks with diagrams.
  • Diverse forms of complex instructions: the instructions range from predicting dialogue for comics, to discovering differences between surveillance images, and to conversational embodied tasks.
  • Vast range of instruction-following scenarios: the benchmark covers multiple application scenarios, including cartoons, industrial images, driving recording, etc.

Cheetor: a multi-modal large language model empowered by controllable knowledge re-injection

Cheetor is a Transformer-based multi-modal large language model empowered by controllable knowledge re-injection, which can effectively handle a wide variety of interleaved vision-language instructions.

Cases

Cheetor demonstrates strong abilities to perform reasoning over complicated interleaved vision-language instructions. For instance, in (a), Cheetor is able to keenly identify the connections between the images and thereby infer the reason that causes this unusual phenomenon. In (b, c), Cheetor can reasonably infer the relations among the images and understand the metaphorical implications they want to convey. In (e, f), Cheetor exhibits the ability to comprehend absurd objects through multi-modal conversations with humans.

Getting Started

1. Installation

Git clone our repository and create a conda environment:

git clone https://github.com/DCDmllm/Cheetah.git
cd Cheetah/Cheetah
conda create -n cheetah python=3.8
conda activate cheetah
pip install -r requirement.txt

2. Prepare Vicuna Weights and Llama2 weights

The current version of Cheetor supports Vicuna-7B and LLaMA2-7B as the language model. Please first follow the instructions to prepare the Vicuna-v0 7B weights, then follow the instructions to prepare the LLaMA-2-Chat 7B weights.

Then modify the llama_model field in Cheetah/cheetah/configs/models/cheetah_vicuna.yaml to point to the folder that contains the Vicuna weights, and modify the llama_model field in Cheetah/cheetah/configs/models/cheetah_llama2.yaml to point to the folder that contains the LLaMA2 weights.
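
If you would rather script that edit than open the files by hand, here is a small, purely illustrative Python sketch. It assumes PyYAML is installed and simply rewrites any key named llama_model it finds, since I am not assuming anything about where that key sits inside Cheetah's config layout; the weight paths are hypothetical examples.

# Illustrative helper: point every `llama_model` key in a Cheetah config at a
# local weights folder. Assumes PyYAML (`pip install pyyaml`); paths are examples.
import yaml

def set_llama_model(node, weights_path):
    """Recursively overwrite any 'llama_model' key with weights_path."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "llama_model":
                node[key] = weights_path
            else:
                set_llama_model(value, weights_path)
    elif isinstance(node, list):
        for item in node:
            set_llama_model(item, weights_path)

def update_config(config_path, weights_path):
    with open(config_path) as f:
        cfg = yaml.safe_load(f)
    set_llama_model(cfg, weights_path)
    with open(config_path, "w") as f:
        yaml.safe_dump(cfg, f, sort_keys=False)

# Example usage (hypothetical local weight folders):
update_config("Cheetah/cheetah/configs/models/cheetah_vicuna.yaml", "/models/vicuna-7b-v0")
update_config("Cheetah/cheetah/configs/models/cheetah_llama2.yaml", "/models/llama-2-7b-chat")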

3. Prepare the pretrained checkpoint for Cheetor

Download the pretrained checkpoints of Cheetah according to the language model you prepare:

  • Checkpoint aligned with Vicuna 7B: Download
  • Checkpoint aligned with LLaMA2 7B: Download

For the checkpoint aligned with Vicuna 7B, please set the path to the pretrained checkpoint in the evaluation config file in Cheetah/eval_configs/cheetah_eval_vicuna.yaml at Line 10.

For the checkpoint aligned with LLaMA2 7B, please set the path to the pretrained checkpoint in the evaluation config file in Cheetah/eval_configs/cheetah_eval_llama2.yaml at Line 10.

Besides, Cheetor reuses the pretrained Q-former from BLIP-2 that matches FlanT5-XXL.

4. How to use Cheetor

Examples of using our Cheetah model are provided in the files Cheetah/test_cheetah_vicuna.py and Cheetah/test_cheetah_llama2.py. You can test your own samples by following the format shown in these two files, and you can run the test code as follows (taking the Vicuna version of Cheetah as an example):

python test_cheetah_vicuna.py --cfg-path eval_configs/cheetah_eval_vicuna.yaml --gpu-id 0

In the near future, we will also demonstrate how to launch the Gradio demo of Cheetor locally.


ChatGPT-4 Breakdown:

Imagine a brilliant detective who has a unique skill: they can understand stories told not just through spoken or written words, but also by examining pictures, diagrams, or comics. This detective doesn't just listen or read; they also observe and link the visual clues with the narrative. When given a comic strip without dialogues or a textbook diagram with some text, they can deduce what's happening, understanding both the pictures and words as one unified story.

In the world of artificial intelligence, "Cheetor" is that detective. It's a sophisticated program designed to interpret and respond to a mix of images and texts, enabling it to perform tasks that require both vision and language understanding.

Projects to Try with Cheetor:

Comic Story Creator: Input: A series of related images or sketches. Cheetor’s Task: Predict and generate suitable dialogues or narratives to turn those images into a comic story.

Education Assistant: Input: A page from a textbook containing both diagrams and some accompanying text. Cheetor’s Task: Answer questions based on the content, ensuring it considers both the visual and written information.

Security Analyst: Input: Surveillance footage or images with accompanying notes or captions. Cheetor’s Task: Identify discrepancies or anomalies, integrating visual cues with textual information.

Drive Safety Monitor: Input: Video snippets from a car's dashcam paired with audio transcriptions or notes. Cheetor’s Task: Predict potential hazards or traffic violations by understanding both the visual and textual data.

Art Interpreter: Input: Art pieces or abstract images with associated artist's notes. Cheetor’s Task: Explain or interpret the art, merging the visual elements with the artist's intentions or story behind the work.


This is a really interesting strategy and implementation! A model that can combine natural language understanding with high-quality image recognition and computer vision can lead to all sorts of wild new applications. I am excited to see where this goes in the open-source community and how it develops for the rest of this year.

 


24
submitted 1 year ago* (last edited 1 year ago) by Blaed to c/fosai
 

"How is LLaMa.cpp possible?" great post by @finbarrtimbers

https://finbarr.ca/how-is-llama-cpp-possible/

llama.cpp surprised many people (myself included) with how quickly you can run large LLMs on small computers, e.g. 7B runs @ ~16 tok/s on a MacBook. Wait, don't you need supercomputers to work with LLMs?

TLDR

At batch_size=1 (i.e. just generating a single stream of prediction on your computer), the inference is super duper memory-bound. The on-chip compute units are twiddling their thumbs while sucking model weights through a straw from DRAM. Every individual weight that is expensively loaded from DRAM onto the chip is only used for a single instant multiply to process each new input token. So the stat to look at is not FLOPS but the memory bandwidth.

Let's take a look: A100: 1935 GB/s memory bandwidth, 1248 TOPS. MacBook M2: 100 GB/s, 7 TFLOPS. The compute gap is ~200X, but the memory bandwidth gap is only ~20X. So the little M2 chip that could will only be ~20X slower than a mighty A100. This is ~10X faster than you might naively expect just looking at ops.

The situation is quite different when you run inference at a very high batch size (e.g. ~160+), such as when you're hosting an LLM engine that simultaneously serves a lot of parallel requests. Or in training, where you aren't forced to go serially token by token and can parallelize across both the batch and time dimensions, because the next-token targets (labels) are known. In these cases, once you load the weights into on-chip cache and pay that large fixed cost, you can re-use them across many input examples and reach ~50%+ utilization, actually making those FLOPS count.

So TLDR why is LLM inference surprisingly fast on your MacBook? If all you want to do is batch 1 inference (i.e. a single "stream" of generation), only the memory bandwidth matters. And the memory bandwidth gap between chips is a lot smaller, and has been a lot harder to scale compared to flops.
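
To make that concrete, here is a tiny back-of-envelope sketch. At batch size 1, each generated token has to stream essentially all of the model weights from memory once, so an optimistic ceiling is tokens/sec ≈ memory bandwidth / model size in bytes. The bandwidth numbers are the ones quoted above; the 7B-at-4-bit model size (~3.5 GB of weights) is my assumption for illustration.

# Roofline-style upper bound for batch-1 LLM decoding:
# every new token reads (roughly) all of the weights from memory once.
model_bytes = 7e9 * 0.5          # assumed: 7B params quantized to ~4 bits/weight

chips = {
    "A100":       1935e9,        # memory bandwidth, bytes/sec
    "MacBook M2":  100e9,
}

for name, bandwidth in chips.items():
    ceiling = bandwidth / model_bytes
    print(f"{name}: <= ~{ceiling:.0f} tokens/sec (memory-bandwidth bound)")

# The M2 ceiling comes out around ~29 tok/s, which is consistent with the
# ~16 tok/s people actually observe once real-world overheads are included.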

supplemental figure https://medium.com/riselab/ai-and-memory-wall-2cb4265cb0b8

 

cross-posted from: https://lemmy.world/post/3350022

Incognito Pilot: The Next-Gen AI Code Interpreter for Sensitive Data

Hello everyone! Today marks the first day of a new series of posts featuring projects in my GitHub Stars.

Most of these repos are FOSS & FOSAI focused, meaning they should be hackable, free, and (mostly) open-source.

We're going to kick this series off by sharing Incognito Pilot. It’s like the ChatGPT Code Interpreter but for those who prioritize data privacy.

Project Summary from ChatGPT-4:

Features:

  • Powered by Large Language Models like GPT-4 and Llama 2.
  • Run code and execute tasks with Python interpreter.
  • Privacy: Interacts with cloud but sensitive data stays local.
  • Local or Remote: Choose between local LLMs (like Llama 2) or API (like GPT-4) with data approval mechanism.

You can use Incognito Pilot to:

  • Analyse data, create visualizations.
  • Convert files, e.g., video to gif.
  • Internet access for tasks like downloading data.

Incognito Pilot ensures data privacy while leveraging GPT-4's capabilities.
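
To illustrate the data-approval idea in the abstract (a toy sketch of the pattern only, not Incognito Pilot's actual code or API), the core of it is a gate that shows you exactly what would leave your machine and refuses to send anything without explicit confirmation:

# Toy illustration of an approval gate for cloud LLM calls.
# Not Incognito Pilot's real implementation; send_to_llm_api is a placeholder.
from typing import Optional

def send_to_llm_api(payload: str) -> str:
    """Stand-in for a real API client."""
    raise NotImplementedError("placeholder for the real cloud call")

def approved_call(payload: str) -> Optional[str]:
    """Show exactly what would leave the machine; send it only on explicit approval."""
    print("About to send to the cloud API:")
    print(payload)
    if input("Approve? [y/N] ").strip().lower() != "y":
        print("Not sent - data stays local.")
        return None
    return send_to_llm_api(payload)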

Getting Started:

  1. Installation:

    • Use Docker (For Llama 2, check dedicated installation).
    • Create a folder for Incognito Pilot to access. Example: /home/user/ipilot.
    • Have an OpenAI account & API key.
    • Use the provided docker command to run.
    • Access via: http://localhost:3030
    • Bonus: Works with OpenAI's free trial credits (For GPT-3.5).
  2. First Steps:

    • Chat with the interface: Start by saying "Hi".
    • Get familiar: Command it to print "Hello World".
    • Play around: Make it create a text file with numbers.

Notes:

  • Only the messages you enter and the code results you approve are sent to cloud APIs.
  • All other data is processed locally.
  • Advanced users can customize Python interpreter packages for added functionalities.

FAQs:

  • Comparison with ChatGPT Code Interpreter: Incognito Pilot offers a balance between privacy and functionality. It allows internet access, and can be run on powerful machines for larger tasks.

  • Why use Incognito Pilot over just ChatGPT: Multi-round code execution, tons of pre-installed dependencies, and a sandboxed environment.

  • Data Privacy with Cloud APIs: Your core data remains local. Only meta-data approved by you gets sent to the API, ensuring a controlled and conscious usage.


Personally, my only concern with using ChatGPT has always been data privacy. This explores an interesting way to solve that while still getting the state-of-the-art performance that OpenAI has managed to maintain (so far).

I am all for these pro-privacy projects. I hope to see more emerge!

If you get a chance to try this, let us know your experience in the comments below!


Links from this Post

22
submitted 1 year ago* (last edited 1 year ago) by Blaed to c/technology
 


 

Generative Agents

Interactive Simulacra of Human Behavior

Paper

Code

Related Video(s)

Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools. In this paper, we introduce generative agents--computational software agents that simulate believable human behavior.

Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day.

To enable generative agents, we describe an architecture that extends a large language model to store a complete record of the agent's experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behavior. We instantiate generative agents to populate an interactive sandbox environment inspired by The Sims, where end users can interact with a small town of twenty five agents using natural language.

In an evaluation, these generative agents produce believable individual and emergent social behaviors: for example, starting with only a single user-specified notion that one agent wants to throw a Valentine's Day party, the agents autonomously spread invitations to the party over the next two days, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time.

We demonstrate through ablation that the components of our agent architecture--observation, planning, and reflection--each contribute critically to the believability of agent behavior. By fusing large language models with computational, interactive agents, this work introduces architectural and interaction patterns for enabling believable simulations of human behavior.
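
As a rough sketch of what that retrieval step can look like (a toy version of the recency/importance/relevance scoring the paper describes, with made-up weights, a made-up decay rate, and a stand-in for real text embeddings):

# Toy memory retrieval in the spirit of Generative Agents:
# score = recency + importance + relevance, each roughly normalized to [0, 1].
# The decay rate and the embedding vectors here are illustrative stand-ins.
import math
from dataclasses import dataclass

@dataclass
class Memory:
    text: str
    importance: float          # e.g. 1-10, as judged by the LLM when stored
    hours_since_access: float
    embedding: list[float]     # would come from a real embedding model

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def retrieve(memories, query_embedding, k=3, decay=0.99):
    def score(m):
        recency = decay ** m.hours_since_access
        importance = m.importance / 10.0
        relevance = cosine(m.embedding, query_embedding)
        return recency + importance + relevance
    return sorted(memories, key=score, reverse=True)[:k]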

Summary from ChatGPT-4

Revolutionizing Virtual Worlds: Meet the Next Generation of Smart Sims!

TL;DR: New software agents, inspired by popular games like The Sims, can mimic authentic human behavior, make decisions, remember past experiences, and plan their day-to-day lives - all using the power of advanced language models!

Key Points:

What are these agents? Picture a virtual person that gets up, prepares breakfast, goes to work, paints if they're an artist, or writes if they're an author. Just like us, they have opinions, make friends, and even chat about their day. These virtual beings are what we call generative agents.

The Magic Behind the Curtain: These agents are powered by an advanced system that remembers their experiences in a diary-like format, thinks back on them, and uses these memories to plan future actions. Imagine a digital brain that records every action and uses it to shape future behaviors.

Gaming Brought to Life: Imagine stepping into a game, not unlike The Sims, where you can interact with 25 of these agents in real-time. Instead of pre-defined actions, these agents will react based on their past experiences and your interactions.

Why This is Huge: Previous virtual characters often followed strict patterns, making them predictable. These new agents break the mold, reacting and planning dynamically, making the virtual world feel more real than ever.

Here we're blending the power of state-of-the-art language models with the fun of interactive gaming to create an experience that's dynamic, engaging, and incredibly believable! This could redefine how we design and experience virtual worlds.

This is really exciting! I remember seeing this demo earlier this year. I feel 2024 will be an exciting year for gaming breakthroughs. I am looking forward to this new age of advanced video game AI.

16
submitted 2 years ago* (last edited 2 years ago) by Blaed to c/fosai
 

Hello everyone!

I am back! I appreciate everyone who has been posting in the interim. Love to see new content being shared.

This thread is inspired by one submitted by @[email protected], who in this post detailed general LLM resources for everyone to discover and interact with (content which has now been consolidated to the full LLM guide here, thanks cll!).

All that being said, I promised I would get back to them on a few questions they had. I figured why not make a thread dedicated to questions any of you would like to ask me.


AMA

If you are here and subscribed to [email protected] - please feel more than welcome asking your question in the comments below. I will try to get back to you within a day or two. I will do my best to answer all questions within reason.

Note that I am not an expert in Machine Learning, but I do work as a professional close to this category of tech. Take my responses with a grain of salt and cross-reference other sources to make sure you are informed.

Always do your own research on top of any of my advice. I am not responsible for the success or failure of your endeavor. Own up to your builds, own up to your development. I am but a resource and friend along the way. A humble guide on your journey to the singularity.


Questions

from @[email protected]

Q1.) What people do you follow for AI? Such as on YT, Twitter, etc.

A1.) I am not on social media often. Lemmy is my safe haven, alongside very specific YouTube channels who give me the information I seek.

Generally speaking, I try to avoid the sensationalism and stay focused on how I can empower myself with this tech now. So far, the channels below have helped me stay informed with practical insights and information.

Whatever it is you're doing, be sure to have a goal in mind. You will spend an eternity learning with no direction.

For me, it was learning how to regularly and consistently train, fine-tune, and deploy open-source models on HuggingFace. That goal has coincidentally led to the creation of this community and my newfound passion for Machine Learning & AI.

Matthew Berman

I like Matthew. He’s cool. Always on top of AI news. 

AemonAlgiz

Aemon is great, provides both general education and technical tutorials.

Aitrepreneur

Useful info, just wish the channel wasn’t so clickbaity. I get it though.

AI Jason

Great for more technical tutorials that you can follow step-by-step.

Venelin Valkov

Another source for tutorials. 

Code Your Own AI

General education. 


Q2.) What other social media forums provide great information?

A2.) Like I mentioned in my last response, I’m not on social media often. I’m a year one gen-z’er (1997) who never really got into social media (until now). Weird, I know. That being said, the only other place I frequent for knowledge is Lex Fridman’s Podcasts - which are treasure troves of information if you enjoy the discussions and/or content. Particularly any of the episodes with tech-based content.

Even if you don’t like some of the guests, you can infer a lot of interesting information based on what context they share with Lex. Much of what I have pieced together for our community here at [email protected] is information I have cross referenced with other independent papers, news, and articles I find credible. I think that’s called journalism, but the definition of that has been skewed compared to today’s social media standards.

Aside from those podcasts, I am a big fan of the /r/LocalLLaMA community, but Reddit is hard for me to revisit nowadays. It just doesn’t feel the same after the API change. I feel the best parts have left or are tucked in corners that are hard to find.

In other words, I get all my info from those YouTube channels I shared above, random posts and comments I find in niche corners, and/or published papers or articles from firsthand sources (typically Arxiv or Medium).

That, and reading documentation. Lots and lots of documentation on whatever it is I'm using at the time...


Q3.) What GUI do you use for local LLMs?

A3.) I might be biased because I’m a linux & vim kind of guy, so I am totally fine working in a terminal - but if I need a consistent GUI for testing, I typically default to oobabooga’s text-generation-webui.

When I am using a GUI that’s not oobabooga’s, it’s for R&D. In the past I have found myself testing with SillyTavern, koboldAI’s client, and gpt4all - but none of them stuck with my workflows. I revisit them from time-to-time to see how they’ve developed. They’re definitely improving and worth taking a look at if you haven't already.

I am finding a lot of new interfaces to chat with LLMs, but not any in particular that stands out above the rest in terms of look, feel, and functionality. I think it's only a matter of time before we see even more UIs tailored for LLMs. Until then, I'll be using ooba and other side projects I tinker with from time to time.

When in doubt, you can always learn to code/design your own GUI, but that's pretty advanced if you're just starting to get into all of this. At the end of the day, you should use what's best for you or what you prefer (or like) using the most.


Q4.) What parameters are “best”?

A4.) This is a tricky question. It could mean a lot of things, I’ll do my best.

a.) If this is in reference to hyperparameters and general parameters, I suggest reading this article for starters.

b.) If this question is in reference to what size parameters I personally find 'best' within LLMs (3B, 7B, 13B, etc), well, I'd say the best model is anything you can comfortably run that outputs the results you're hoping for. That could mean a high parameter Falcon 40B model, or perhaps a low parameter 3B Nous model.

The results are so drastically different between use cases and performance levels that I say experiment with all of them and see which is most consistent for you in particular.

For fellow technologists running LLMs at home, that probably means you're running this on consumer-grade hardware. For desktop builds with NVIDIA GPUs I've found really consistent performance from 7B and 13B parameter models.

With some optimization, you can speed up inference depending on what base model you're using and what platform you're running it on. Sometimes even some of the 3B models surprise me with their coherence and perplexity. It's up to you to find what works for you best.
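
As a back-of-envelope way to think about which parameter counts fit on consumer hardware (the bytes-per-weight figures and the 20% overhead factor are rough assumptions for illustration; real usage varies with context length, loader, and runtime):

# Rough weight-memory estimate: billions of params * bytes-per-weight * overhead.
# Params in billions times bytes per weight gives GB directly.
def weights_gb(params_billion: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    return params_billion * (bits_per_weight / 8) * overhead

for size in (3, 7, 13):
    print(f"{size}B  fp16: ~{weights_gb(size, 16):.1f} GB   4-bit: ~{weights_gb(size, 4):.1f} GB")

# e.g. a 13B model at 4-bit is on the order of ~8 GB, which is why quantized
# 7B/13B models tend to be the sweet spot for a single consumer GPU.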

c.) If you're asking about specific parameters I apply during training - I don't have an answer to that yet. I haven't completed enough deployments to give you a worthy response.


Q5.) Is there a Wiki you use?

A5.) I have not found an open-source wiki that suits my needs, so I made FOSAI ▲ XYZ to help fill the gap and serve this community. It's still a work in progress. I sort of used this to practice my web dev. I have plans on upgrading it to a format that will better serve the variety of content it presents. It's a little dry IMO, but here nonetheless. I'll be sure to let everyone know when the new site is live!

When in doubt, read the native documentation of the software (or hardware) you choose to use. It might be a dense read, but nothing CTRL + F can't help with.

Q6.) Where do you go to learn about LLMs/AI/Machine Learning?

A6.) Like everything I do, I teach myself concept by concept with a results-based mindset, referencing all of the learning material and channels I shared a few questions ago.

I frequently fall victim to paralysis by analysis, so I combat this by dividing my ambition into actionable projects with the primary intention of getting them up and running (with reconciliation of notes, knowledge, and understanding after the fact).

I think it's important to adopt a mindset that is brave to jump right into the middle of something without understanding any of it. Almost like exposure therapy.

Over time - concepts will become less alien and things will begin to click. Consistency is key. In my experience, curiosity empowers wonder, which can lead to motivation to learn something you care about. If you care about Machine Learning as much as I do, you'll find ways to teach yourself through sheer will.

On my quest to better understand Machine Learning, I put together an online syllabus that is supposed to equip you with a core foundation of skills necessary to become a professional in the field. This is pieced together from a few videos I stumbled across and recommendations from colleagues in the field. Granted, I have the advantage of already being in tech, but I hold no degree - just a high school diploma.

Math is hard, but part of the fun once you realize the fact that it's the language of reality, physics, computers and science. If you want to learn the same material I am, check out the Machine Learning page from FOSAI ▲ XYZ. Pick and choose what you want to learn. It is non-linear, non-sequential, and here to be learned at your own pace.


Q7.) How do you find quality models?

A7.) Every other day I check and see what TheBloke has released on his HuggingFace. As far as I'm concerned, he has become an open-source hero for the field. I trust everything he puts out, often cherry picking the model I want to test based on their origin card or foundation model.

If there's a model worth running, Tom is fast to optimize a GPTQ or GGML model that he hosts on his page for all of us to interface with (for free). A true gigachad.

Support Tom on Patreon! Some of his tiers offer direct technical support if you're hitting a wall and need some hands-on help: https://www.patreon.com/TheBlokeAI


Q8.) What Awesome github repositories do you know?

A8.) I've been waiting for someone to ask me this! Haha. Finally...

I have been collecting all kinds of interesting repos for FOSS & FOSAI technologies. I am planning to do a new series soon where I dive into each one in more detail.

If you want a head start on that series, check out all of the cool tech people are developing here: https://github.com/adynblaed?tab=stars

I will be highlighting a few for AI / LLM particular use cases soon. You and everyone else here are welcome to explore some of the stars I've collected in the meantime.


Q9.) What do you think would be useful to share?

A9.) For [email protected], I think it's useful for everyone to share their experiences with using FOSS & FOSAI platforms (anything listed in our sidebar) from performance to discoveries, breakthroughs and celebrations.

To be more specific, it would be awesome to see how everyone is using LLMs on a day-to-day basis (if at all), and how it has impacted their lives productively and/or recreationally. For me, this has manifested in the fact I use Google Search 20% of the time I used to, since local LLMs + ChatGPT have replaced primary 'search' for me. This is a big deal because I work in tech and I can't tell you how much of my job was having decent Google-Fu. That has changed into how well I can prompt and how effectively I can extract usable information from a generative pre-trained transformer.

Now that this tech is here, spreading, and open-sourced - I feel we're in the calm before the storm - the crest of the rising tsunami of change the world is about to see in the next few years. I am excited to brave this ship and ride the crest with each and every one of you - but knowing where you want this ship to sail gives me insights that help me adjust the content I provide (however long I can provide it) and course correct as needed.

Everyone's experience with how AI / LLMs / free open-source artificial intelligence is impacting them now, today, is absolutely worth sharing, even if it seems small or insignificant. I use this community as a time capsule as much as I engage with it for learning, sharing, and staying in the know.

Looking back at what we're talking about in 2023 versus what we might be talking about in 2024 will be an interesting context to study, especially as the world begins to adopt AI.

Sharing ideas can lead to discussions, with enough reinforcement these ideas become concepts.

Concepts have opportunity to turn to projects, which develop into realities for all of us to explore together.

Whatever your idea is, please share it. No matter how large or small, silly or ridiculous. If it's satire, label it and let's have fun in the absurdity. If it's legitimate, let's explore the possibilities and draw collective curiosity towards a common goal. This is how we grow and empower one another.

To innovate is to think, to think is to live.

Share your innovations! [email protected] is a safe space to create, think, express, invent, and explore the mind, the self, and the machine.

20
submitted 2 years ago* (last edited 2 years ago) by Blaed to c/fosai
 

FOSAI Weekly News Roundup (August 2023)

Hello everyone! I am back with another news roundup, something I will try to do more consistently, every week or so. It has been hard with a recent change in work schedule. That being said, it's likely news will be delivered in this format for the time being. At least until I can free up more of my time.

Until then, I hope this news sates some of your curiosity. By all means, please post away if there's something important I miss on a day-to-day basis. As we grow, I will have less time to make frequent posts, so I need all the help I can get keeping us informed!

Besides that, I have other projects in the pipeline for this community I am excited to share later this year. The first one is live, and still a WIP - but here for you nonetheless.

Visit https://fosai.xyz for insights on all of the free open-source AI tools we use!

Now, here's some technology & AI news!


🧌 KoboldCPP v1.3.9 Update

KoboldCpp has received a recent update that enables 16K context! Incredible to see how far we've come from that 2K window.


🦅 New Models by Faldore

Faldore has been hard at work releasing two new models: WizardLM-1.0-Uncensored-Llama2-13b & dolphin-llama2-7b. Check them out!


🗞️ News Highlights

Here are some other news highlights from general sources outside of our community.


🎓 Learning Corner

There is plenty to cover over at https://fosai.xyz. If you're looking for other types of learning resources, I'll be including them in this new category week-by-week.

Let me know if there's anything else in particular you'd like me to report on for this weekly news cycle!

Thank you for subscribing to [email protected]. I am excited to see the future we build together!

GL, HF, and happy devving!

[–] Blaed 2 points 2 years ago

I am actively testing this out. It's hard to say at the moment. There's a lot to figure out deploying a model into a live environment, but I think there's real value in using them for technical tasks - especially as models mature and improve over time.

At the moment, though, performance is closer to GPT 3.5 than GPT 4, but I wouldn't be surprised if this is no longer the case within the next year or so.

[–] Blaed 2 points 2 years ago

Hey, thanks for commenting and sharing your experience. I'll be adding a terminology table soon, I'll be sure to include this on there!

I have been bouncing between GPTQ and GGML models since TheBloke first started releasing them - I have yet to come to a definitive conclusion on which I prefer the most, but if you need the extra overhead, I can see why you'd choose GGML.

I don't use text-gen and Stable Diffusion in tandem often, but this will be good to note when I do!

[–] Blaed 4 points 2 years ago (1 children)

After finally having a chance to test some of the new Llama-2 models, I think you're right. There's still some work to be done to get them tuned up... I'm going to dust off some of my notes and get a new index of those other popular gen-1 models out there later this week.

I'm very curious to try out some of these docker images, too. Thanks for sharing those! I'll check them when I can. I could also make a post about them if you feel like featuring some of your work. Just let me know!

[–] Blaed 2 points 2 years ago* (last edited 2 years ago) (2 children)

I have used all of the above. In my experience, Elevenlabs is the most natural sounding (and easy-to-use) with open-source alternatives (kind of) close behind it.

Unfortunately, Elevenlabs code is proprietary, so there’s a bit of a compromise there (unless you want to use one of the open-source alternatives you mentioned). To your point though, they aren’t the most user friendly.

TTS has definitely been a neglected area among the new tech accompanying this wave of AI development, but I think it's only a matter of time before new options emerge as startups and other projects take flight this year and next. It will be a crucial area to nail for immersive video game dialogue, and I'm sure someone will come up with a new platform or approach. Fingers crossed they make it open-source.

For now, my suggestion is sticking to whatever TTS workflow works best with your current tech stack until something new comes out.

If you end up finding something worth sharing, let us know! I’m very curious to see how audio and speech synthesis develops alongside all of this other fosai tech we’ve been seeing.

[–] Blaed 1 points 2 years ago* (last edited 2 years ago)

We got models overnight. How crazy is that!

GL, HF!

[–] Blaed 2 points 2 years ago* (last edited 2 years ago)

Appreciate the insight. I like that approach. I just learned you can become an Alchemist too. That's a nice touch.

Alternatively you can become an Alchemist worker which is much more lightweight and can even run on CPU (i.e. without a GPU).

If you're reading this, consider turning your PC into a Horde Worker! It's a great way to contribute to AI (text & image-based) if you have the power to spare.

[–] Blaed 2 points 2 years ago

Thanks for sharing!

[–] Blaed 6 points 2 years ago* (last edited 2 years ago) (2 children)

Hey! Appreciate your post. The AI Horde has been one of my favorite projects to see evolve over the course of this year. Consider me subbed.

For myself (and others not as knowledgeable on the project), do you think you could briefly describe the main differences between how The AI Horde approaches crowd compute / inference compared to something like Petals? I know you mentioned here that the horde doesn't do training. Is that the biggest difference to note?

Thanks again for your contribution to democratizing AI. Excited to see what The AI Horde can do with more supporters. I'll be dedicating a few more nodes when I have a chance to spin them up.

[–] Blaed 1 points 2 years ago

Assuming everything from the papers translate into current platforms, yes! A rather significant one at that. Time will tell us the true results as people begin tinkering with this new approach in the near future.

[–] Blaed 2 points 2 years ago

Thanks for reading! I'm glad you enjoy the content. I find this tech beyond fascinating.

Who knows, over time you might even begin to pick up on some of the nuance you describe.

We're all learning this together!

[–] Blaed 0 points 2 years ago* (last edited 2 years ago)

If you're looking for more tools, you're one click away from [email protected] where you'll find a plethora of resources on the sidebar.

If one click is too much, allow me to share the sidebar tools here!

Resources

The FOSAI Nexus & Lemmy Crash Course are the most complete catalogs in terms of open-source applications.

Fediverse / FOSAI

LLM Leaderboards

LLM Search Tools

LLM Eval & Benchmark Resources

Hope you find what you're looking for!

[–] Blaed 1 points 2 years ago* (last edited 2 years ago)

The KoboldAI Horde was the first thing that came to my mind when I heard about this too. After some research, it appears Petals and the AI-Horde are similar in concept, but different in strategy and execution.

The Kobold AI-Horde utilizes a 'kudos-based economy' to prioritize render/processing queues.

Petals seems to utilize a different routing/queue mechanism that prioritizes optimization over participation.

So you're not wrong. The AI-Horde accomplishes crowd compute through a similar high level approach, however, the biggest difference (at a glance) seems to be how the I/O is handled and prioritized between the two platforms. That's a bit of an oversimplification, but it communicates the idea.

I really like the concept of crowd-compute, but I'm not sure it'll get as popular as it needs to rival emerging (corporate) exaflops of compute. I hope Petals & AI-Horde benefit from the mutual competition. It would be really cool to see a future where George Hotz & tinycorp actually commoditize the petaflop for consumers. Maybe then crowd compute can begin to rival some of these big tech entities that otherwise dwarf available silicon.
