This AI newsletter is all you need #67

Towards AI Editorial Team · Published in Towards AI · 8 min read · Oct 3, 2023

What happened this week in AI by Louie

With recent developments, it has become increasingly clear that Large Language Models (LLMs) are now becoming far more than chatbots. We were interested to read renowned AI figure Andrej Karpathy’s thoughts on this in a recent tweet, where he sees LLMs becoming the kernel processes of a new operating system. In his view, thinking of LLMs as “just chatbots” is analogous to thinking of early computers as “just calculators”. This distinction becomes even more apparent with LLMs’ evolving capabilities, such as their capacity to operate in a multi-modal context, their ability to access various tools and the Internet, their aptitude for interpreting and executing code, and their versatile application as embedding databases. We find this analogy, and the carry-over of other computing concepts to LLMs, intriguing; it can help us form a better picture of how the technology could continue to develop.

While closed-source LLMs have been busy rolling out new functionality, such as ChatGPT’s updates to accommodate multi-modality, we have also been noting several promising new open-source competitors in the field. Alibaba, for instance, has open-sourced its 7-billion-parameter Qwen model, along with a chat variant, in a move aimed at competing with Meta. This also marks the first instance of a prominent Chinese company publicly releasing an LLM. Also new to the open-source LLM race, Mistral, a startup that secured over $100 million in funding just six months ago, has now publicly released the Mistral 7B model under the Apache 2.0 license. The model outperforms both LLaMA 2 13B and LLaMA 1 34B across various benchmarks, achieves CodeLlama 7B-level performance on coding tasks while maintaining strong proficiency on English-language tasks, and is freely available for download with no restrictions.

- Louie Peters — Towards AI Co-founder and CEO

Announcing our Free Certification Course on Training & Fine-Tuning LLMs for Production

We’re excited to release Towards AI’s second free certification course, Training & Fine-Tuning LLMs for Production, in collaboration with Activeloop and the Intel Disruptor Initiative. The course covers the intricacies of training, fine-tuning, and seamlessly integrating LLMs into AI products. It will guide you in building the most cost-efficient software and hardware stack, using state-of-the-art methods for preparing LLMs for production, and will also cover essential topics such as proprietary versus open-source models, various LLM training methodologies, and production deployment strategies. We also touch on advanced fine-tuning techniques like LoRA, QLoRA, SFT, and RLHF, as well as training custom models with Cohere. With the support of our partners at Cohere and Lambda, qualifying participants will receive compute credits to run the examples themselves!
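
To give a flavor of one technique covered, here is a minimal, illustrative sketch of LoRA fine-tuning with the Hugging Face peft library. The model name, target modules, and hyperparameters below are placeholders rather than the course’s actual configuration:

```python
# Minimal LoRA sketch with Hugging Face peft. The model name and all
# hyperparameters are illustrative placeholders, not the course setup.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# LoRA trains small low-rank adapter matrices instead of all base weights.
config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                  # rank of the low-rank update
    lora_alpha=16,                        # scaling applied to the update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
)
model = get_peft_model(base, config)
model.print_trainable_parameters()        # typically well under 1% trainable
```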

The course will go live on 9th October 2023. Pre-register today!

🚀 Join the AI Revolution with our friends at Alep! 🚀

Alep is shaping the future of professional investment analysis by removing repetitive tasks and amplifying research capabilities, anchored on the vision that LLMs will revolutionise the analyst role. As a nascent startup, Alep’s MVP is in motion, and they are gearing up to bring on their first customers. While UK-rooted, Alep are open to remote flexibility with a touch of essential in-person collaboration.

Alep AI is founded by Matty Cusden-Ross, who was previously the CEO and founder of the fintech Flux, which raised £10mn+, integrated with Barclays and Tier 1 retailers, and grew to a user base of 1.3mn! Prior to Flux, Matty was the second employee at Revolut, where he drove user growth from 0 to 100k and tapped a 400k+ growth channel.

Alep are on the hunt for a visionary Technical Lead or CTO — could it be you? Apply here and shape the future of financial analysis!

Hottest News

1. Mistral 7B

The Mistral AI team released the Mistral 7B model, the most potent language model for its size to date. Powered by Grouped-query attention (GQA) and Sliding Window Attention (SWA), it outperforms other models in various domains while maintaining strong performance in English and coding tasks.
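
For the curious, sliding window attention simply restricts each token to attending over the last W tokens rather than the full causal prefix (Mistral uses a 4,096-token window). A minimal sketch of the attention mask this implies:

```python
import torch

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask where query position i may attend to key positions
    max(0, i - window + 1) .. i, i.e. causal attention limited to the
    last `window` tokens (the idea behind Mistral's SWA)."""
    i = torch.arange(seq_len).unsqueeze(1)  # query positions (column)
    j = torch.arange(seq_len).unsqueeze(0)  # key positions (row)
    return (j <= i) & (j > i - window)

mask = sliding_window_causal_mask(seq_len=8, window=4)
print(mask.int())  # 1s form a band of width 4 below the diagonal
```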

2. OpenAI’s ChatGPT Now Searches the Web in Real Time — Again

OpenAI has reintroduced web browsing for ChatGPT, available to Plus and Enterprise users, allowing the model to pull the latest information from the web when answering prompts. Crucial updates include compliance with robots.txt rules and user-agent identification, giving websites more control over how their content is accessed.
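
In practice, opting out takes a few lines of robots.txt. The user-agent names below are the ones OpenAI documented at the time (ChatGPT-User for the browsing feature, GPTBot for its crawler); site owners should verify the current names:

```
User-agent: ChatGPT-User
Disallow: /

User-agent: GPTBot
Disallow: /
```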

3. LLM Startup Embraces AMD GPUs, Says ROCm Has ‘Parity’ With Nvidia’s CUDA Platform

A startup called Lamini runs over 100 AMD Instinct MI200 GPUs and reports that AMD’s ROCm software platform “has achieved software parity” with Nvidia’s dominant CUDA platform. The company also claims its computing costs are ten times lower than on Amazon Web Services.
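
Part of what makes the parity claim plausible is that PyTorch’s ROCm builds expose AMD GPUs through the same torch.cuda API used for Nvidia hardware, so typical training code runs unchanged. A quick illustration:

```python
import torch

# On ROCm builds of PyTorch, AMD GPUs are surfaced through the same
# torch.cuda namespace as Nvidia GPUs, so this snippet is identical
# across the two vendors.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(torch.cuda.get_device_name(0) if device == "cuda" else "CPU only")

x = torch.randn(4096, 4096, device=device)
y = x @ x.T  # dispatched to rocBLAS on AMD, cuBLAS on Nvidia
print(y.shape)
```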

4. How General AI Will Eventually Reshape Everything

The transition to Artificial General Intelligence (AGI) signifies more than a change in terminology; it represents a significant leap in capabilities. AGI tackles the core challenge posed by AI — machines that excel at tasks but lack generalization abilities. With AGI, we can unlock doors to understanding and problem-solving.

5. Chroma X Google for Building on PaLM

Chroma is a leading vector database for storing embeddings for AI applications. Chroma has integrated with PaLM embeddings throughout the beta and is partnering with Google AI on the public release.
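
For those who haven’t tried it, here is a minimal Chroma example. By default the client embeds documents with a built-in model; a PaLM embedding function can be supplied instead via chromadb.utils.embedding_functions (exact class names may have changed since the beta):

```python
import chromadb

# Create an in-memory client and a collection to hold documents.
client = chromadb.Client()
collection = client.create_collection(name="articles")

# Documents are embedded automatically with the default embedding model.
collection.add(
    documents=["Mistral 7B released under Apache 2.0",
               "ChatGPT regains real-time web browsing"],
    ids=["doc1", "doc2"],
)

# Semantic search: the query is embedded and matched against the store.
results = collection.query(query_texts=["open-source language models"],
                           n_results=1)
print(results["documents"])
```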

Five 5-minute reads/videos to keep you learning

1. First Impressions With GPT-4V(ision)

OpenAI has released GPT-4V for Plus users, showcasing its image processing skills, OCR capabilities, and performance on mathematical problems. This guide shares first impressions of the image input feature through a series of experiments, showing where the model performs well and where it struggles.

2. Non-Engineers Guide: Train a LLaMA 2 Chatbot

This tutorial shows how anyone can build their own open-source ChatGPT-style chatbot without writing a single line of code. With tools like AutoTrain, ChatUI, and Spaces, even non-ML specialists can create advanced ML models, fine-tune LLMs, and easily interact with open-source models.

3. How To Prompt Like a Pro?

This video tutorial focuses on why large language models react differently to the same prompts. It covers tokenizers, embeddings, inference parameter optimization, and some excellent prompting techniques.
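
As a small companion to the video, the sketch below (using gpt2 purely as a stand-in for any causal LM) shows two of the levers it discusses: how a prompt becomes token ids, and how sampling parameters shape the completion:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# gpt2 is illustrative only; any causal LM works the same way.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The best way to learn prompting is", return_tensors="pt")
print(inputs["input_ids"][0].tolist())  # the prompt as token ids

# The same prompt yields different completions depending on inference
# parameters: temperature sharpens or flattens the token distribution,
# while top_p restricts sampling to the most probable tokens.
out = model.generate(
    **inputs,
    max_new_tokens=20,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```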

4. Is AI a Platform Shift?

Think of a platform shift as a change in the dominant layer that applications are built on. This article presents AI as a possible platform shift with different hypotheses to support it.

5. Student Use Cases for AI

While Generative AI tools and LLMs empower students and educators with advanced technology, they also present challenges like the need for user verification and potential biases. This article examines student use cases such as AI as a feedback generator, personal tutor, team coach, and learner.

Papers & Repositories

1. QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models

QA-LoRA, a new method in quantization-aware training, outperforms QLoRA in terms of efficiency and accuracy. It balances the trade-off between quantization and adaptation, leading to minimal loss of accuracy. It is especially effective in low-bit quantization scenarios like INT2/INT3 without the need for post-training quantization, and it can be applied to various model sizes and tasks.
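
As background for readers new to quantization, the toy sketch below shows plain symmetric round-to-nearest quantization and how reconstruction error grows as the bit width shrinks toward the INT2/INT3 regimes the paper targets. QA-LoRA itself combines group-wise quantization with LoRA adapters; this illustrates only the basic building block:

```python
import torch

def quantize_symmetric(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Toy symmetric round-to-nearest quantization. Returns the
    dequantized view of the quantized weights so the error against
    the original tensor can be measured directly."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    q = torch.clamp(torch.round(w / scale), -qmax, qmax)
    return q * scale

w = torch.randn(512, 512)
for bits in (8, 4, 3, 2):
    err = (w - quantize_symmetric(w, bits)).abs().mean()
    print(f"INT{bits}: mean abs error {err:.4f}")  # error grows as bits shrink
```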

2. Small-Scale Proxies for Large-Scale Transformer Training Instabilities

A study has found that instabilities in training large Transformer-based models can be anticipated in advance by analyzing activation and gradient norms. These instabilities, which can be reproduced in smaller models trained at higher learning rates, can then be mitigated using strategies already employed in large-scale settings.
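
Monitoring these signals is cheap to add to any training loop. Here is a minimal sketch of the gradient-norm half (activation norms can be tracked analogously with forward hooks):

```python
import torch

def global_grad_norm(model: torch.nn.Module) -> float:
    """L2 norm over all parameter gradients; spikes in this signal are
    the kind of early warning the paper analyzes."""
    total = 0.0
    for p in model.parameters():
        if p.grad is not None:
            total += p.grad.detach().pow(2).sum().item()
    return total ** 0.5

# Toy demo on a linear model; in practice, log this every step.
model = torch.nn.Linear(16, 1)
x, y = torch.randn(8, 16), torch.randn(8, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()
print(f"global grad norm: {global_grad_norm(model):.4f}")
```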

3. Language Models in Molecular Discovery

Language models are being used in chemistry to accelerate molecule discovery and show potential in early-stage drug research. These models assist in de novo drug design, property prediction, and reaction chemistry, offering a faster and more effective approach to the field. Moreover, open-source language-modeling software makes these methods easy for scientists to access and extend, facilitating quicker chemical discoveries.

4. Tabby: Self-Hosted AI Coding Assistant

Tabby is a fast, efficient, open-source, self-hosted AI coding assistant compatible with popular open language models. It runs on both CPU and GPU and delivers a responsive code-completion experience.

5. The Reversal Curse: LLMs Trained on “A Is B” Fail To Learn “B Is A”

Researchers have discovered a phenomenon called the “Reversal Curse” that affects the generalization abilities of auto-regressive LLMs. These models struggle to infer the reverse of a fact, hindering their ability to answer related questions accurately. Even larger models like GPT-3.5 and GPT-4 face challenges in addressing this issue, indicating a need for further advancements in language modeling.
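
The effect is easy to probe. Below is a tiny evaluation scaffold built around the paper’s running example (“Daphne Barrington is the director of ‘A Journey Through Time’”); ask is a hypothetical placeholder for whatever completion API you use:

```python
def ask(prompt: str) -> str:
    """Hypothetical placeholder: swap in a real LLM completion call."""
    return ""

# The paper's running example: a fact stated as "A is B".
pairs = [("Daphne Barrington", "the director of 'A Journey Through Time'")]

for subject, description in pairs:
    forward = ask(f"Who is {subject}?")      # same direction as training
    reverse = ask(f"Who is {description}?")  # reversed, "B is A"
    print("forward answered:", description.lower() in forward.lower())
    print("reverse answered:", subject.lower() in reverse.lower())
    # The paper finds forward accuracy is far higher than reverse,
    # even for models as large as GPT-4.
```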

Enjoy these papers and news summaries? Get a daily recap in your inbox!

The Learn AI Together Community section!

Meme of the week!

Meme shared by r1gell

Featured Community post from the Discord

Bri built Galactic, a cleaning and curation tool for massive unstructured text datasets. It’s designed to help you curate fine-tuning datasets, create document collections for retrieval-augmented generation (RAG), and even perform deduplication of web-scale datasets for LLM pre-training. Check it out on GitHub and support a fellow community member! Share your feedback and questions in the thread here.
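
As a taste of the problem space, here is the simplest possible form of deduplication: exact matching after normalization. This is a generic sketch, not Galactic’s API; at web scale, tools like Galactic rely on far more scalable approximate methods:

```python
import hashlib

def dedupe(docs: list[str]) -> list[str]:
    """Drop documents that are identical after light normalization,
    keeping the first occurrence of each."""
    seen, unique = set(), []
    for doc in docs:
        h = hashlib.sha256(doc.strip().lower().encode()).hexdigest()
        if h not in seen:
            seen.add(h)
            unique.append(doc)
    return unique

corpus = ["Hello world.", "hello world.", "Something else entirely."]
print(dedupe(corpus))  # the near-identical duplicate is dropped
```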

AI poll of the week!

Join the discussion on Discord.

TAI Curated section

Article of the week

Temporal Edge Regression with PyTorch Geometric by Marco Lomele

Graphs are becoming one of the favorite tools of data scientists, and they can be adapted to temporal scenarios. These range from the simple static form, where there is no notion of time, through a fluid spatiotemporal setup, where the topology is fixed but the features change at regular intervals, to the chaotic, fully continuous, time-dynamic mode, where everything can change at any time. This article studies the effectiveness of combining GNNs and Transformers for time series forecasting on graphs.
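
If you want the shape of the approach before reading, here is a minimal edge-regression pattern in PyTorch Geometric: encode nodes with a GNN, then score each edge from its endpoint embeddings. The toy graph and layer sizes are illustrative, not the article’s actual model:

```python
import torch
from torch_geometric.nn import GCNConv

class EdgeRegressor(torch.nn.Module):
    """Two GCN layers to embed nodes, then a linear head that predicts
    one scalar per edge from the concatenated endpoint embeddings."""
    def __init__(self, in_dim: int, hidden: int):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.head = torch.nn.Linear(2 * hidden, 1)

    def forward(self, x, edge_index):
        h = self.conv1(x, edge_index).relu()
        h = self.conv2(h, edge_index)
        src, dst = edge_index  # endpoints of each edge
        return self.head(torch.cat([h[src], h[dst]], dim=-1)).squeeze(-1)

x = torch.randn(4, 8)                     # 4 nodes, 8 features each
edge_index = torch.tensor([[0, 1, 2, 3],  # source nodes
                           [1, 2, 3, 0]]) # target nodes
model = EdgeRegressor(in_dim=8, hidden=16)
print(model(x, edge_index))               # one predicted value per edge
```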

Our must-read articles

Smart Control of Traffic Lights Using AI by Mihir Gandhi

Advanced Matplotlib: A Comprehensive Guide to Data Visualization by RaviTeja G

If you are interested in publishing with Towards AI, check our guidelines and sign up. We will publish your work to our network if it meets our editorial policies and standards.

Job offers

Prompt Engineer — US — 280923 @Sirion Pte Ltd (Remote)

Senior Machine Learning Engineer @Employment Hero (Remote)

Senior Data Scientist @Tiger Analytics (US/Remote)

Data Science & AI Applications Manager @BambooHR (Remote)

Team Lead, Machine Learning @Clari (Remote)

Research Engineer, AI @Foundry (London, UK)

Junior Software Engineer @Scientific Systems Company, Inc. (Woburn, MA, USA)

Interested in sharing a job opportunity here? Contact sponsors@towardsai.net.

If you are preparing your next machine learning interview, don’t hesitate to check out our leading interview preparation website, Confetti!

https://www.confetti.ai/


The leading AI community & content platform making AI accessible to all. | 2.5k writers, 50k Discord members, 435.9k followers