Searching for a Prompt Engineer with 10+ Years of Experience?

Who could be a great fit for this hyped new position

Evgeniya Sukhodolskaya
Towards AI


Image by author

Prompt engineering has been around for a while, but the term has gone viral with the recent boom in large language models (LLMs). The main driver of the phenomenon is that zero-shot and few-shot learning work well only with language models of a very large size.

The hype around prompt engineering was inevitable: after all, prompt wording can have a dramatic effect on the quality of LLM output (LLMs exhibit large variance across prompt templates). It is also an effective way to adapt language models to downstream tasks without retraining any model parameters or gathering the data required for traditional fine-tuning.

The research literature suggests a variety of promising prompt engineering techniques. But we have yet to discover a universal “find the best prompt for a certain task” algorithm suitable for every LLM. We have to experiment with each specific model and task to find a prompt that produces the desired output. That’s why the “Prompt Engineer” job became a thing.

Prompt Engineers

Image by author

As with every hyped position in IT (speaking from my personal work experience as a Developer Advocate), there is a lot of confusion among both employers and employees about what a Prompt Engineer has to know. The confusion is made worse by the common perception that prompt engineering is just reformulating questions to an oracle until you get the desired answer, something like a patient parent doing math homework with their kid. On the contrary, prompt engineering is a very deliberate procedure. Even more importantly, “prompt engineering is not just about designing and developing prompts. It encompasses a wide range of skills and techniques that are useful for interacting and developing with LLMs.”

There are plenty of helpful guides, courses and articles with techniques and best practices of prompt engineering, which I used myself to get the hang of it. And in most of them, based on my experience, you will see the same recommendation: good prompts are born out of experimentation, so you need to establish your own process of designing a prompt and develop a sixth sense for prompt engineering. Obviously, some of the intuition can come from working experience as a Machine Learning Engineer, which helps develop a deeper understanding of how LLMs are built and trained. But there are some less obvious professions that can naturally develop strong intuition in prompt engineering. One of them is a Crowd Solution Architect (CSA).

Crowd Solution Architects

Image by author

It might seem like I’m replacing one trendy, misunderstood IT job title with another one that is just as baffling and newfangled. Well, it’s actually not that new: Crowd Solution Architects have been around since the emergence of crowdsourcing platforms, so CSAs can easily have 15+ years of experience. The industry already has a solid understanding of the professional scope of a CSA.

So, what are their day-to-day tasks, and why can their experience be helpful for developing intuition in prompt engineering? To put it simply, CSAs design data labeling tasks for a crowd (expert or non-expert), converting initial specifications to a format that is easy for labelers to understand. This includes writing detailed instructions, creating a user-friendly labeling interface, setting up a fair pricing scheme and quality-control mechanisms, and, most importantly, applying sophisticated decomposition techniques to an initial problem. Decomposition is an important skill that turns a hard task into a set of easier subtasks. An example is to transform a ranking problem into a side-by-side labeling task followed by noisy Bradley-Terry aggregation. The subtasks are solved by the crowd, and then CSAs apply various aggregation techniques to get the final answer to the problem.
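
To make the decomposition example concrete, here is a minimal sketch of the aggregation step for the ranking case above: fitting a Bradley-Terry model to side-by-side (pairwise) labels with the classic MM algorithm. The `wins` matrix and the iteration count are hypothetical illustration values, not a production recipe.

```python
import numpy as np

def bradley_terry(wins: np.ndarray, iters: int = 100) -> np.ndarray:
    """Fit Bradley-Terry strengths from a pairwise win-count matrix.

    wins[i, j] = number of labelers who preferred item i over item j.
    Returns one strength score per item; higher means ranked higher.
    """
    n = wins.shape[0]
    p = np.ones(n)               # initial strengths
    comparisons = wins + wins.T  # total comparisons per pair
    for _ in range(iters):
        for i in range(n):
            # Classic MM update: p_i = W_i / sum_j n_ij / (p_i + p_j)
            denom = sum(comparisons[i, j] / (p[i] + p[j])
                        for j in range(n) if j != i)
            if denom > 0:
                p[i] = wins[i].sum() / denom
        p /= p.sum()             # renormalize for numerical stability
    return p

# Toy example: 3 items, noisy side-by-side judgments from a crowd.
wins = np.array([[0, 8, 9],
                 [2, 0, 6],
                 [1, 4, 0]])
print(np.argsort(-bradley_terry(wins)))  # items from strongest to weakest
```

The pattern is the same one CSAs apply daily: decompose a hard problem, collect cheap pairwise answers from the crowd, then aggregate them into the final result.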

Image by author

Why does designing crowdsourcing tasks help develop intuition in prompt engineering?

Image by author

While experimenting with prompt engineering, I noticed similarities between best practices in designing crowdsourcing tasks and best practices in prompt engineering. This led me to develop and test several hypotheses on how CSA methods could be used to improve prompt results. I strongly believe that new prompting techniques may be discovered by applying knowledge from adjacent areas of research, such as crowdsourcing.

Decomposition + Aggregation

Techniques like chain-of-thought prompting and least-to-most prompting demonstrate that a model performs better when a task is divided into subtasks or steps. Decomposition works the same way for crowdsourcing tasks, breaking them down into smaller problems that are much easier to solve. It is intuitively easy to see why the technique raises human labeling quality: the subtasks demand less concentration and less expertise from labelers. For LLMs, it may work because simpler subtasks are likely to occur more frequently in the web-scraped training data.
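
As an illustration, here is a minimal sketch of least-to-most-style decomposition for a labeling task. The task, the prompts, and the complete() helper are all hypothetical; wire the helper to whichever LLM API you use.

```python
def complete(prompt: str) -> str:
    """Hypothetical helper: send a prompt to your LLM and return its text."""
    raise NotImplementedError("wire this to your LLM provider")

def classify_review(review: str) -> str:
    # Subtask 1: solve the easier intermediate problem first.
    aspects = complete(
        "List the product aspects mentioned in this review, one per line.\n"
        f"Review: {review}"
    )
    # Subtask 2: answer the original question, reusing the intermediate result.
    return complete(
        "Given a review and the aspects it mentions, label the overall "
        "sentiment as positive, negative, or mixed.\n"
        f"Review: {review}\nAspects:\n{aspects}\nSentiment:"
    )
```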

CSAs have learned in practice how to decompose tasks for search relevance estimation, 3D transport labeling, ad moderation, personal information (PI) recognition in code, and many other use cases. They have developed a strong intuition for what is hard and what is easy to solve with a crowd. This intuition can be partially captured in a set of best practices. For example, there is a rule of thumb that the number of classes in a classification task shouldn’t exceed 5 or 6; otherwise, choice overload bias comes into play. Another rule is that eliminating classes one by one, from simple to complex, produces better results than labeling all classes at once. Can these rules be applied to labeling with LLMs? It seems so! We applied the latter rule to labeling with GPT-4, and the results showed 20% higher accuracy.
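
Here is a minimal sketch of that elimination rule applied to LLM labeling, assuming the same hypothetical complete() helper and made-up class names: instead of one prompt listing every class, ask a series of binary questions, from the simplest class to the most complex.

```python
def complete(prompt: str) -> str:
    """Hypothetical helper: send a prompt to your LLM and return its text."""
    raise NotImplementedError("wire this to your LLM provider")

# Hypothetical classes, ordered from easiest to hardest to recognize.
CLASSES_SIMPLE_TO_COMPLEX = ["spam", "advertisement", "off-topic", "legitimate"]

def label_by_elimination(text: str) -> str:
    for cls in CLASSES_SIMPLE_TO_COMPLEX[:-1]:
        answer = complete(
            f"Does the following text belong to the class '{cls}'? "
            f"Answer yes or no.\nText: {text}\nAnswer:"
        )
        if answer.strip().lower().startswith("yes"):
            return cls  # class found, stop eliminating
    return CLASSES_SIMPLE_TO_COMPLEX[-1]  # nothing matched, fall through
```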

Instructions

One of the keys to getting good results with crowdsourcing is to provide clear and detailed instructions (basically, a detailed prompt for humans). Since LLMs do not reason the way humans do, giving them instructions that are perfect for people might not lead to the best output quality, as shown in the article “The Turking Test: Can Language Models Understand Instructions?”

However, some best practices for prompt engineering and for crowdsourcing instruction design match one-to-one, proving that a skilled CSA could make a promising prompt engineer. Here are some practices that apply to both prompts and instructions (a minimal prompt sketch follows the list):

  • Short does not equal good. Be as clear as possible.
  • Since LLMs are few-shot learners, just like humans, examples matter a lot.
  • If there are multiple possible classes in the task, provide enough examples (at least 2–3) to illustrate each of them.
  • Use real-life (not synthetic) examples for your few-shot demonstrations.
  • If you do add rare cases to the task, make sure to explain them well in the instructions.
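
Putting those practices together, here is a minimal sketch of a few-shot labeling prompt. The task, classes, and examples are invented placeholders; a real prompt would use 2–3 real examples per class, as the list above recommends.

```python
# Hypothetical few-shot labeling prompt, trimmed to one example per class
# for brevity; a real prompt should carry 2-3 real examples per class.
FEW_SHOT_PROMPT = """You are labeling customer messages.
Classes: complaint, question, praise.
Rare case: a message mixing praise with a complaint counts as complaint.

Message: "My order arrived broken, and this is the second time!"
Label: complaint

Message: "Do you ship to Canada?"
Label: question

Message: "Fast delivery and great packaging, thank you!"
Label: praise

Message: "{message}"
Label:"""

def build_prompt(message: str) -> str:
    return FEW_SHOT_PROMPT.format(message=message)
```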

Key takeaways

  • There is a promising unexplored research field for applying CSA techniques to prompt engineering.
  • Intuition in prompt engineering can certainly be gained through other work experience.
  • At Toloka, we successfully use our CSAs as prompt engineers. If you happen to have someone on your team who worked with crowdsourcing, you don’t have to search for a prompt engineer!
