Choose Your Weapon: Survival Strategies for Depressed AI Consultants

Kelvin Lu · Published in Towards AI · Sep 22, 2023 · 12 min read


Photo by Sepehr Samavati on Unsplash

A new Terminator movie has recently been released. In this new episode, the future human resistance sends a robot back in time to destroy OpenAI’s server farm, thereby preventing the emergence of ChatGPT and AGI. None of that is true, of course, but doesn’t it sound plausible?

As an AI practitioner, how do you feel about the recent AI developments? Besides the excitement about its new power, have you wondered how to hold your position in the rapidly moving AI stream?

The birth of ChatGPT has caused a stir. People are fascinated by its power but also fearful of its unknown future. This bewildering mix of love and fear is confusing the general public and AI professionals alike.

I recently came across a paper that is inspiring to read and truly one of a kind [1]. Instead of focusing on technical aspects, the authors explore the anxieties researchers experience under the rapid advancement of AI.

It is interesting to learn that academic researchers at top institutes are also worried about “not coping with the current pace of AI advancement”. The paper reveals that researchers face the same resource limitations as professionals in industry, which is not surprising given how expensive model training has become. The authors propose strategies for doing research with limited resources.

While I appreciate the authors’ frankness, their suggestions are aimed primarily at AI researchers rather than practitioners. In this article, I will explore how we, as AI practitioners, can adjust to this challenge. Since this topic is seldom discussed, I’d like to be the icebreaker. I hope you find the discussion inspiring; please note, however, that these are my personal views, and I welcome additional perspectives.

Why It Is Important

In the past, people had a clear understanding of which technologies and tools to use for their machine learning tasks. They were familiar with the processes of topic modeling and sentiment analysis; they knew all the libraries they used; they felt like conductors of a symphony. However, with the advent of LLMs, everything has changed. LLMs seem to rule them all, and, interestingly, no one fully understands how they work. Now people question whether they should still build solutions other than LLMs, yet they know little about how to make LLM-based solutions accountable.

There used to be clear distinctions between academic researchers, ML practitioners, and their clients. Researchers focused on developing new concepts with all their fascinating academic magic; clients knew that ML consultants had expertise they and their teams lacked; and the consultants took pride in delivering their unique contributions. Everyone was happy. However, emerging large language models have changed that world significantly.

Nowadays, generative AI makes advanced AI capabilities far more accessible to end users, while most academic researchers cannot work on new foundation models because training one is too expensive. Everyone uses LLMs in much the same way: mostly prompting, very rarely fine-tuning. Everyone is puzzled by the hallucination problem, and the solutions are all experimental. Advanced math skills are no longer very important in LLM development. Thus, all LLM users are solving the same problems with the same technology and achieving similar outcomes. As a result, LLMs have brought academic researchers, practitioners, and end clients much closer together.

ML consultants, who make a living by providing ML services, may ask: how do we justify our profession in this scenario, if researchers are solving practical issues and clients can use cutting-edge AI tools on their own?

Photo by Gioele Fazzeri on Unsplash

The First Principle

Generative AI is such a vigorous field that new models, new products, and new theories are announced every week or two. It is not easy to keep up with the pace. But first things first: before we dive deep into each emerging development, we need to go back to first principles and ask what problem the technology is meant to solve. This is a simple but effective strategy. There are so many new developments, and each of them is trying to help a certain group of people. If we apply this strategy, we can focus on the technologies relevant to our goal.

One example is prompt engineering. Prompt engineering has proved to be very useful, and many techniques have been developed: in-context learning, chain-of-thought, tree-of-thoughts, and so on. Some people have even foreseen the emergence of the prompt engineer as a new job title. Is this the future of the ML engineer?

Let’s think about why prompt engineering was developed in the first place. People built instruction-following LLMs so that users could interact with them in human language. Prompt engineering is mostly about structuring text in a way that generative AI models can interpret and follow more reliably. Although it has a set of proven patterns that make it look like a specialist skill, it is primarily aimed at helping less-experienced users. The current complexity of prompt engineering exists because we haven’t yet learned how to make LLMs understand prompts reliably. With further development, prompt engineering may become so mature that anyone can achieve satisfactory results with brief training. ML professionals should learn prompt engineering skills, but unfortunately, prompt engineering does not provide strong job security: its primary goal is to lower the threshold, not to maintain it. If you consider prompt engineering your core capability, the knowledge gap between you and your clients is only getting narrower.
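
To make this concrete, here is a minimal sketch of two of the patterns mentioned above, few-shot in-context learning and chain-of-thought prompting, expressed as plain prompt assembly. The `call_llm` function and the example questions are hypothetical placeholders; swap in whatever model client you actually use.

```python
# A minimal sketch of few-shot, chain-of-thought prompt construction.
# `call_llm` is a hypothetical placeholder for your model endpoint.

def call_llm(prompt: str) -> str:
    """Placeholder: swap in your actual LLM client call."""
    return "(model response)"

FEW_SHOT_EXAMPLES = [
    {
        "question": "A shop sells pens at $2 each. How much do 4 pens cost?",
        "reasoning": "Each pen costs $2, and 4 x $2 = $8.",
        "answer": "$8",
    },
]

def build_cot_prompt(question: str) -> str:
    """Assemble a few-shot prompt that asks the model to reason step by step."""
    parts = []
    for ex in FEW_SHOT_EXAMPLES:
        parts.append(
            f"Question: {ex['question']}\n"
            f"Reasoning: {ex['reasoning']}\n"
            f"Answer: {ex['answer']}\n"
        )
    parts.append(f"Question: {question}\nLet's think step by step.\nReasoning:")
    return "\n".join(parts)

if __name__ == "__main__":
    prompt = build_cot_prompt("A train travels 60 km/h for 2 hours. How far does it go?")
    print(call_llm(prompt))
```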

In addition to the CLI and GUI, some have proposed the idea of the NLI, or Natural Language user Interface. I believe this is a good idea, and if you share it, you’ll agree that prompting must be accessible to everyone. It should not be a tightly held secret at all.

Knowing the Limitations

Generative AI has received a lot of praise, but it has its limitations. The models are prone to hallucinate, not good at logical reasoning, and not easy to harness. All of this makes LLM applications difficult to develop, and that is precisely where we can provide professional services and make generative AI more useful for delivering business value. To do so, in addition to knowing what generative AI does well, it is essential to know where it falls short, how to evaluate its performance, and how to mitigate these problems. Building up this kind of know-how will be invaluable in the long run.

Let’s consider the RAG application as an example. RAG (retrieval-augmented generation) is an essential building block in the generative AI ecosystem. It’s the go-to option when we want the LLM to base its reasoning on local data, and the right option when we need to handle content much larger than the context window. Numerous RAG systems have been developed, and many vector databases have been created to keep up with the trend. However, RAG has its own pitfalls. In fact, building an impressive RAG demo is fairly easy, but turning it into a production-ready system can be quite challenging. Being aware of these limitations will help us become accountable consultants.

If you want to know more about RAG applications, see the following posts: the disadvantages of RAG [2], considerations when applying vector databases [3], how to select and host embedding models [4], and how to fine-tune an embedding LLM [5].
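
To illustrate why the demo-to-production gap is so wide, here is a deliberately minimal RAG sketch. TF-IDF similarity stands in for a real embedding model and vector database, the documents are invented, and `call_llm` is a hypothetical placeholder; everything that makes RAG hard in production (chunking, retrieval quality, context budgeting, evaluation) is left out on purpose.

```python
# A deliberately minimal RAG sketch: TF-IDF stands in for an embedding
# model + vector database; `call_llm` is a hypothetical model endpoint.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

DOCUMENTS = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available Monday to Friday, 9am to 5pm AEST.",
    "Enterprise customers get a dedicated account manager.",
]

def call_llm(prompt: str) -> str:
    """Placeholder: swap in your actual LLM client call."""
    return "(model response)"

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    vectorizer = TfidfVectorizer()
    vectors = vectorizer.fit_transform(DOCUMENTS + [query])
    scores = cosine_similarity(vectors[-1], vectors[:-1]).ravel()
    top = scores.argsort()[::-1][:k]
    return [DOCUMENTS[i] for i in top]

def answer(query: str) -> str:
    """Ground the model on retrieved context and ask it to admit gaps."""
    context = "\n".join(retrieve(query))
    prompt = (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )
    return call_llm(prompt)

if __name__ == "__main__":
    print(answer("How long do I have to return a product?"))
```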

Beyond the RAG example: do you know how LLM performance is evaluated? Do you know what prompt injection and prompt leakage are? You may have noticed in practice that prompts are sensitive to nuance, but have you tried to measure the robustness of your prompts? These are all practical questions and the focus of many researchers. Although there are no perfect solutions yet, being aware of the concerns and the available remediations still helps a great deal in solution design. I hope these questions show that there is so much we can do; all of these questions and their resolutions will build us into strong AI professionals.
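
As one concrete angle on the robustness question, the sketch below sends several paraphrases of the same request and measures how much the answers diverge. The paraphrases and the `call_llm` placeholder are assumptions, and a real test would use semantic similarity rather than raw string matching.

```python
# A minimal sketch for probing prompt robustness: send paraphrases of the
# same request and measure how much the answers diverge from each other.
# `call_llm` is a hypothetical placeholder for your model endpoint.
from difflib import SequenceMatcher
from itertools import combinations

def call_llm(prompt: str) -> str:
    """Placeholder: swap in your actual LLM client call."""
    return "(model response to: " + prompt[:40] + ")"

PARAPHRASES = [
    "Summarise the refund policy in one sentence.",
    "In a single sentence, summarise our refund policy.",
    "Give me a one-sentence summary of the refund policy.",
]

def robustness_score(prompts: list[str]) -> float:
    """Average pairwise similarity of the answers; 1.0 means fully stable."""
    answers = [call_llm(p) for p in prompts]
    pairs = list(combinations(answers, 2))
    return sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)

if __name__ == "__main__":
    print(f"Prompt robustness: {robustness_score(PARAPHRASES):.2f}")
```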

Get the Big Picture

Another important strategy is to think of a generative AI application as a whole system, not a single model. We know LLMs have many imperfections, and some of them are inherent flaws of the current LLM architecture that only ground-breaking new architectures can solve. As Yann LeCun put it, “LLMs are not the future.” While researchers are busy seeking alternative architectures, we need to recognize that quite a few of these issues must be addressed with engineering solutions.

There’s an interesting article in the MIT Technology Review about Meta’s failed LLM project last year [6]. Meta released a new LLM to generate scientific insights, but the model lived for only three days before it was pulled offline. What happened?

Trained on 48 million scientific articles, textbooks, and lecture notes, the model fabricated “scientific deep fakes”: it produced fake scientific papers and falsely attributed them to real researchers, and it hallucinated fictional wiki articles, such as a history of bears in space. Imagine a diligent scientist citing a fake reference that the LLM had provided.

This is, without doubt, the familiar LLM hallucination problem. Eliminating hallucinations is a tough battle, and even with all known remediations applied, we need a sense of what level of accuracy we can realistically achieve. If the project has zero tolerance for incorrect answers, a single LLM is probably not a good choice. Depending on the need, more advanced solutions, such as knowledge graph-backed ones, perform much better: the responses are more reliable and traceable, and the whole solution is more manageable.
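
As a toy illustration of what “knowledge graph-backed” can mean, the sketch below answers only from explicit triples, so every response is traceable to a stored fact, and it refuses when the fact is missing instead of guessing. The graph content and lookup scheme are invented purely for illustration.

```python
# A toy sketch of a knowledge-graph-backed answer: respond only from
# explicit triples so every answer is traceable, and refuse otherwise.
# The graph content here is invented purely for illustration.

KNOWLEDGE_GRAPH = {
    ("Australia", "capital"): "Canberra",
    ("Australia", "largest_city"): "Sydney",
}

def grounded_answer(subject: str, relation: str) -> str:
    """Answer from the graph with a source tag, or refuse if the fact is absent."""
    fact = KNOWLEDGE_GRAPH.get((subject, relation))
    if fact is None:
        # Refuse instead of hallucinating when the graph has no answer.
        return f"No recorded fact for ({subject}, {relation})."
    return f"The {relation.replace('_', ' ')} of {subject} is {fact} [source: knowledge graph triple]."

if __name__ == "__main__":
    print(grounded_answer("Australia", "capital"))
    print(grounded_answer("Australia", "national_animal"))
```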

Check out the following leaderboard to get a sense of realistic accuracy for a single model [8].

Responsibility

Everyone would agree that the most significant problem with LLMs is hallucination, and people have developed numerous strategies to control it. Besides hallucinations, are there any other issues that make decision-makers hesitate to apply generative AI in their systems? Let’s look at a few of them:

Explainability

Whenever we query an LLM, it always gives us an answer, but how the model came up with that answer is unclear. How can we be sure how the model derived its response? Did it refer to reliable information, or did it just make things up? Back to RAG: how can we know whether an answer includes all the important information or is incomplete?

Fairness

All LLMs are trained on large datasets and are designed to discover patterns in the data. In this sense, LLMs are just statistical models, and all statistical models are biased towards the most popular patterns. A common problem is that they don’t perform well on the long tail.

Consider a model built on the demographic data of Australia, where roughly 80% of the population lives in the eight largest cities. It delivers good overall performance, yet it overlooks the interests of regional communities and Aboriginal people. In some cases, such biases can be a show-stopper, and this has become more pressing recently because of rising concerns about AI safety.
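
To make the long-tail point tangible, here is a small synthetic sketch: a classifier trained on a heavily imbalanced population reaches respectable overall accuracy while serving the minority group noticeably worse. The data, groups, and numbers are all invented just to demonstrate the effect.

```python
# A small synthetic illustration of majority bias: overall accuracy looks
# fine while the minority group is served noticeably worse.
# All data here is invented purely to illustrate the effect.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def make_group(n, shift, noise):
    """Two features; the label depends on the first, with a group-specific threshold."""
    X = rng.normal(size=(n, 2)) + shift
    y = (X[:, 0] + rng.normal(scale=noise, size=n) > shift[0]).astype(int)
    return X, y

# 80% majority group, 20% minority group with a different feature distribution.
X_maj, y_maj = make_group(8000, shift=np.array([0.0, 0.0]), noise=0.3)
X_min, y_min = make_group(2000, shift=np.array([2.5, -1.0]), noise=1.0)

X = np.vstack([X_maj, X_min])
y = np.concatenate([y_maj, y_min])
group = np.array([0] * len(y_maj) + [1] * len(y_min))  # 0 = majority, 1 = minority

X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
    X, y, group, test_size=0.3, random_state=0
)

model = LogisticRegression().fit(X_tr, y_tr)
pred = model.predict(X_te)

print(f"Overall accuracy:  {(pred == y_te).mean():.2f}")
print(f"Majority accuracy: {(pred[g_te == 0] == y_te[g_te == 0]).mean():.2f}")
print(f"Minority accuracy: {(pred[g_te == 1] == y_te[g_te == 1]).mean():.2f}")
```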

Safety

Training a new LLM from scratch requires a huge amount of training data and tonnes of money. Except for those working at companies with very deep pockets, the majority of LLM application development has to be based on pre-trained foundation models, and this leads to two inherent risks:

  1. How can we know the training data is safe? It is easy to sneak in malicious data to create a backdoor.
  2. How can we be sure the resulting model is safe? Even if the training data is clean, the resulting model may be vulnerable to adversarial attacks. You may already know that a computer vision model can be fooled by adding imperceptible noise to an input image; LLMs are vulnerable to such attacks as well.

If we have to assume the training data and the model are unsafe, how can we detect, control, and manage the risk to prevent them from causing harm?

Privacy and Data Security

LLMs are merely machine learning models whose knowledge was built from a large training dataset. They have no sense of which data may be disclosed without restriction, which data should only be provided to certain groups of users, and which data should not be disclosed at all. So how do we guardrail these applications? How do we audit the privacy and security of the training data, and how do we instruct LLMs to forget something we don’t want them to remember?
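
As one thin layer of such a guardrail, here is a minimal sketch that screens model output for obvious personal identifiers before it reaches the user. The regex patterns are illustrative assumptions; a real guardrail stack would add policy checks, per-user-group allow-lists, and audit logging.

```python
# A minimal output-guardrail sketch: redact obvious personal identifiers
# from a model response before returning it to the user.
# The regex patterns are simple illustrative assumptions, not production-grade.
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{7,}\d")

def redact_pii(text: str) -> str:
    """Mask e-mail addresses and phone-like numbers in the model output."""
    text = EMAIL_RE.sub("[REDACTED EMAIL]", text)
    text = PHONE_RE.sub("[REDACTED PHONE]", text)
    return text

def guarded_response(raw_model_output: str) -> str:
    """Apply the redaction step before anything is shown to the user."""
    return redact_pii(raw_model_output)

if __name__ == "__main__":
    print(guarded_response("Contact Jane at jane.doe@example.com or +61 412 345 678."))
```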

Engineering is Our Home Field

Awareness of all the above gives us a sense of what belongs in our knowledge base and in our project proposals. But our ideas can’t go live without solid engineering.

There are plenty of blank spots in generative AI production. Quite a few aspects are either entirely new or need to be upgraded from old MLOps practices. Traditional MLOps focuses on keeping model performance up to expectations and on the efficiency of training and the runtime environment. LLM applications certainly need efficient model hosting, fine-tuning, and runtime performance enhancement, but in addition, we need an engineering solution for each of the concerns above. These unique challenges make LLM operations harder to maintain than traditional MLOps practices can cope with.

Take LLM model management as an example. We need to learn how to host a model efficiently on a GPU cluster, which is an MLOps skill. We also need the skills of LLM evaluation and tuning, how to prepare training data effectively, how to control model bias with human-in-the-loop processes, how to control model risk with RLHF, and so on. The task list is much longer, the technical stack is much more complicated, and we need to be hands-on with all of it. These are the ways we can help our clients.
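
Of these tasks, model evaluation is usually the first one a team hits. Here is a minimal evaluation-harness sketch: run a fixed question set through the model and score exact-match accuracy against reference answers. `call_llm` is again a hypothetical placeholder, and a real harness would add semantic scoring or an LLM judge rather than relying on exact match.

```python
# A minimal evaluation-harness sketch: run a fixed question set through the
# model and report exact-match accuracy against reference answers.
# `call_llm` is a hypothetical placeholder; real harnesses use richer scoring.

def call_llm(prompt: str) -> str:
    """Placeholder: swap in your actual LLM client call."""
    return "(model response)"

EVAL_SET = [
    {"question": "What is the capital of Australia?", "reference": "Canberra"},
    {"question": "How many days are in a leap year?", "reference": "366"},
]

def normalise(text: str) -> str:
    """Light normalisation so trivial formatting differences don't count as errors."""
    return text.strip().lower()

def exact_match_accuracy(eval_set: list[dict]) -> float:
    """Fraction of questions whose answer exactly matches the reference."""
    hits = 0
    for item in eval_set:
        prediction = call_llm(item["question"])
        hits += normalise(prediction) == normalise(item["reference"])
    return hits / len(eval_set)

if __name__ == "__main__":
    print(f"Exact-match accuracy: {exact_match_accuracy(EVAL_SET):.2f}")
```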

Software developers have spent decades trying to find the best project management model. Numerous methodologies have been developed based on countless project failures. Eventually, people realized that communication was the key to project success. They found the Agile model to be the most efficient method, and Scrum became the de facto standard.

ML projects are much more complicated to communicate, and it is much harder to keep everyone on the same page; however, there is no machine-learning-specific project management best practice yet. My interpretation is that this reflects the infancy of machine learning applications. I have seen communication hold projects back again and again, and I expect someone will come up with new Scrum adjustments to make ML projects run more smoothly.

How can we deal with this challenge? My suggestion is to cross the boundaries as much as you can. Don’t limit yourself to a narrow scope. Your project will benefit from your ability to bridge the gaps between stakeholders.

Parting Words

Emerging generative AI leads us into a blue ocean of challenges and opportunities. At the moment, everything seems murky and fast-moving, and there are so many things to learn, try, and solve every day. The good news is that, until the next ground-breaking innovation, the main generative AI challenges are practical: most of them concern real-world problems and don’t require advanced math skills. If we plan ahead and build up our expertise deliberately, we will find it easier to keep up with the latest developments. We can keep making ourselves resourceful ML consultants, and we have the opportunity to contribute our practices back to the ML community. I’m not sure how long this opportunity will last, but it is just awesome!

As you can see, this is a long journey. If you agree with my vision, please support me by subscribing and clapping, so I can share my progress with you.
