This AI newsletter is all you need #70

Published in

Towards AI

6 min readOct 24, 2023

What happened this week in AI by Louie

This week in AI, we were particularly interested in seeing two new agent models released. Nvidia has unveiled Eureka, an AI agent designed to guide robots in executing complex tasks autonomously. This agent, powered by GPT-4, can independently generate reward functions that surpass the performance of human experts in 83% of tasks, achieving an average enhancement of 52%. The fascinating demo shared by the company illustrates the agent’s ability to train a robotic hand to perform the rapid pen-spinning trick and a human. As mentioned by one of the authors in a blog post, this library utilizes generative AI and reinforcement learning to solve complex tasks.

In other agent news, Adept researchers have introduced a multi-modal architecture for AI agents named Fuyu, with 8 billion parameters. This model adopts a decoder-only architecture capable of processing images and text, simplifying the network design, scalability, and deployment. Additionally, unlike most existing models, it accepts images of varying dimensions, rendering it a valuable addition for use in agents. The model can generate responses for sizable images in just 100 milliseconds. We are excited about the recent progress on AI agents for physical and online applications. While still early in commercialization, agents capable of independently interacting with their environment and executing complex tasks create many opportunities for new AI products and applications.

- Louie Peters — Towards AI Co-founder and CEO

Hottest News

OpenAI Halted the Development of the Arrakis Model

OpenAI’s plans for developing the AI model Arrakis to reduce compute expenses for AI applications like ChatGPT have been halted. Despite this setback, OpenAI’s growth momentum continues, with a projected annual revenue of $1.3 billion. However, they may face challenges with Google’s upcoming AI model Gemini and scrutiny at an AI safety summit.

2. ‘Mind-Blowing’ IBM Chip Speeds Up AI

IBM has developed a brain-inspired computer chip (NorthPole) that significantly enhances AI’s speed and efficiency by reducing the need to access external memory. NorthPole is made of 256 computing units, or cores, each of which contains its own memory.

3. NVIDIA Breakthrough Enables Robots To Teach Themselves

NVIDIA researchers created an AI agent called Eureka, which can automatically generate algorithms to train robots — enabling them to learn complex skills faster. Eureka-generated reward programs outperform expert human-written ones on more than 80% of tasks.

4. Fuyu-8B: A Multimodal Architecture for AI Agents

Adept introduced Fuyu-8B, a powerful open-source vision-language model designed to comprehend and respond to questions regarding images, charts, diagrams, and documents. Fuyu-8B improves over QWEN-VL and PALM-e-12B on 2 out of 3 metrics despite having 2B and 4B fewer parameters, respectively.

5. After the ChatGPT Disruption, Stack Overflow Laid Off 28 Percent of Its Staff

Stack Overflow is letting go of 28% of its employees due to advancements in AI technology like ChatGPT. Chatbots like ChatGPT provide efficient coding assistance and heavily rely on content from sites like Stack Overflow. However, an important question arises regarding the sustainability of chatbots that gather data without benefiting their sources.

Five 5-minute reads/videos to keep you learning

Transformer Math 101

This article provides essential numbers and equations for working with large language models (LLMs). It covers topics such as compute requirements, computing optima, minimum dataset size, minimum hardware performance, and memory requirements for inference.

2. Why LLaVa-1.5 Is a Great Victory for Open-Source AI

LLaVa-1.5, a smaller yet powerful alternative to OpenAI’s GPT-4 Vision, proves the potential of open-source models for Large Multimodal Models (LMMs). It emphasizes the significance of understanding multimodality in AI, debunking doubts about the feasibility of open-source approaches.

3. GPT-4 Vision Prompt Injection

Vision Prompt Injection is a vulnerability that allows attackers to inject harmful data into prompts via images in OpenAI’s GPT-4. This risks system security, as attackers can execute unauthorized actions or extract data. Defending against this vulnerability is complex and may affect the model’s usability.

4. GPT-4 is Getting Faster

GPT-4 is rapidly improving its response speed, particularly in the 99th percentile, where latencies have decreased. GPT-4 and GPT-3.5 maintain a low latency-to-token ratio, indicating efficient performance.

5. Introducing The Foundation Model Transparency Index

A team of researchers from Stanford, MIT, and Princeton has developed a transparency index to evaluate the level of transparency in commercial foundation models. The index, known as the Foundation Model Transparency Index (FMTI), assesses 100 different aspects of transparency, and the results indicate that there is significant room for improvement among major foundation model companies.

Papers & Repositories

BitNet: Scaling 1-bit Transformers for Large Language Models

BitNet is a 1-bit Transformer architecture designed to improve memory efficiency and reduce energy consumption in large language models (LLMs). It outperforms 8-bit and FP16 quantization methods and shows potential for effectively scaling to even larger LLMs while maintaining efficiency and performance advantages.

2. HyperAttention: Long-context Attention in Near-Linear Time

HyperAttention is a novel solution that addresses the computational challenge of longer contexts in language models. It outperforms existing methods using Locality Sensitive Hashing (LSH), considerably improving speed. It excels on long-context datasets, making inference faster while maintaining a reasonable perplexity.

3. Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

This paper introduces a new framework called Self-RAG. It is an enhanced model that improves Retrieval Augmented Generation (RAG) by allowing language models to reflect on passages using “reflection tokens.” This improvement leads to better responses in knowledge-intensive tasks like QA, reasoning, and fact verification.

4. PaLI-3 Vision Language Models: Smaller, Faster, Stronger

This paper presents PaLI-3, a smaller, faster, and stronger vision language model (VLM) that compares favorably to similar models that are 10x larger. It utilizes a ViT model trained with contrastive objectives, which allows it to excel in multimodal benchmarks.

5. DeepSparse: Enabling GPU-Level Inference on Your CPU

DeepSparse is a robust framework that enhances deep learning on CPUs by incorporating sparse kernels, quantization, pruning, and caching of attention keys/values. It achieves GPU-like performance on commonly used CPUs, enabling efficient and robust deployment of models without dedicated accelerators.

Enjoy these papers and news summaries? Get a daily recap in your inbox!

The Learn AI Together Community section!

Meme of the week!

Meme shared by sikewalk

Featured Community post from the Discord

G.huy created a repository containing code examples and resources for parallel computing using CUDA-C. It provides beginners with a starting point to understand parallel computing concepts and how to utilize CUDA-C to leverage the power of GPUs for accelerating computationally intensive tasks. Check it out on GitHub and support a fellow community member. Share your feedback and questions in the thread here.

AI poll of the week!

Join the discussion on Discord.

TAI Curated section

Article of the week

Practical Considerations in RAG Application Design by Kelvin Lu

The RAG (Retrieval Augmented Generation) architecture has been proven efficient in overcoming the LLM input length limit and the knowledge cutoff problem. In today’s LLM technical stack, RAG is among the bedstones for grounding the discussed application on local knowledge, mitigating hallucinations, and making LLM applications auditable. This article discusses some of the practical details of RAG application development.

Our must-read articles

Unlocking the Mysteries of Diffusion Models: An In-Depth Exploration by Youssef Hosni

Introduction to Machine Learning: Exploring Its Many Forms by RaviTeja G

QLoRA: Training a Large Language Model on a 16GB GPU by Pere Martra

If you want to publish with Towards AI, check our guidelines and sign up. We will publish your work to our network if it meets our editorial policies and standards.