The Art Of Negotiation: CICERO AI

Mandar Karhade, MD, PhD
Published in Towards AI
6 min read · Nov 29, 2022

Since the days of Deep Blue’s victory over grandmaster Garry Kasparov, and more recently OpenAI’s OpenAI Five dominating Dota 2 and the poker-playing AI Pluribus outsmarting humans, game-playing AI has come a long way. The latest iteration is CICERO, an AI by Meta. This breakthrough AI plays the game of Diplomacy. CICERO is groundbreaking because Diplomacy is a highly complex game: it involves conversations with other players, forming and breaking alliances, and many other actions that fall under the umbrella of “negotiating”.

On Nov 22nd, Meta announced,

Today, we’re announcing a breakthrough toward building AI that has mastered these skills. We’ve built an agent — CICERO — that is the first AI to achieve human-level performance in the popular strategy game Diplomacy*. CICERO demonstrated this by playing on webDiplomacy.net, an online version of the game, where CICERO achieved more than double the average score of the human players and ranked in the top 10 percent of participants who played more than one game.

There are many similarities between poker and Diplomacy. Both games are more about playing the people than playing the cards or the board. Diplomacy involves understanding social norms and exercising social skills. Both games allow bluffing: projecting motivations, strategy, and language in a way that gets other players to respond in a manner that benefits you.

CICERO AI can learn to negotiate in the game of Diplomacy better than humans
Credits: https://unsplash.com/@santesson89

The details of how the AI is constructed can be found in the research paper Human-level play in the game of Diplomacy by combining language models with strategic reasoning. The CICERO website can be found here. If you are interested in gaining access to CICERO’s data, please submit an RFP here. The code for CICERO is available on GitHub.

How is CICERO built?

The language model supporting CICERO has 2.7B parameters. It was trained on text from the internet and on dialogue from 40,000 games played on webDiplomacy.net. The dialogue generated by this model is called “controllable dialogue”. CICERO interprets the game and plans its next negotiation in 4 distinct steps:

  1. Forming an initial prediction
  2. Refining the prediction and forming an intent
  3. Generating candidate messages as replies
  4. Filtering replies to identify the one with the most “value” in the game
Step 1: Using the board state and current dialogue, CICERO makes an initial prediction of what everyone will do.
Step 1: CICERO makes an initial prediction — Credits: Facebook AI blog.
Step 2: CICERO iteratively refines that prediction using planning and then uses those predictions to form an intent for itself and its partner.
Step 2: CICERO refines the prediction to form an intent — Credits: Facebook AI blog.
Step 3: CICERO generates candidate messages — Credits: Facebook AI blog.
Step 4: CICERO filters the output to maximize impact — Credits: Facebook AI blog.
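The four steps above can be sketched as a toy, runnable loop. All function names, the stub logic, and the scoring rule (longer message wins) are illustrative stand-ins, not Meta’s actual code or API:

```python
# A toy sketch of CICERO's four-step negotiation loop. Every name here is
# hypothetical; the real system uses a 2.7B-parameter language model and a
# planning engine in place of these stubs.

NUM_REFINEMENT_ITERS = 3

def predict_policies(board_state, dialogue):
    # Step 1: naive initial prediction -- assume every player holds.
    return {player: "hold" for player in board_state["players"]}

def refine_with_planning(board_state, policies):
    # Step 2: pretend planning discovers that "support" beats "hold".
    return {player: "support" for player in policies}

def generate_messages(intent):
    # Step 3: candidate messages conditioned on the chosen intent.
    return [f"Let's {intent} this turn.", f"I plan to {intent}; will you join?"]

def message_value(msg):
    # Step 4: score candidates; here, longer messages score higher.
    return len(msg)

def negotiate_turn(board_state, dialogue):
    policies = predict_policies(board_state, dialogue)
    for _ in range(NUM_REFINEMENT_ITERS):
        policies = refine_with_planning(board_state, policies)
    intent = policies["FRANCE"]  # CICERO's own refined intent
    candidates = generate_messages(intent)
    return max(candidates, key=message_value)

state = {"players": ["FRANCE", "ENGLAND", "GERMANY"]}
best = negotiate_turn(state, dialogue=[])
print(best)  # "I plan to support; will you join?"
```

The point of the structure is that message generation (step 3) never runs free-form: it is always conditioned on an intent that the planning loop (steps 1–2) has already vetted.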

OpenAI Five-like AI models were trained using self-play reinforcement learning (RL), where the agent plays against copies of itself, takes exploratory actions, and ends up with outcomes (either positive or negative). After each outcome, the model updates itself to favor the actions that led to positive outcomes over negative ones, making it a highly iterative process in which the agent keeps discovering better strategies than its previous self.
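To make the self-play idea concrete, here is a deliberately tiny, hypothetical illustration (not OpenAI Five’s training code): the agent samples actions, observes a win or loss, and nudges its value estimates toward actions that win more often:

```python
# Toy outcome-driven learning: a two-action "game" where one action wins
# more often. The environment and win probabilities are invented.
import random

random.seed(0)

values = {"aggressive": 0.0, "defensive": 0.0}
LEARNING_RATE = 0.1

def play_game(action):
    # Hypothetical environment: "aggressive" wins 70% of games, "defensive" 40%.
    win_prob = 0.7 if action == "aggressive" else 0.4
    return 1.0 if random.random() < win_prob else -1.0

for _ in range(2000):
    action = random.choice(list(values))        # explore uniformly
    outcome = play_game(action)                 # +1 for a win, -1 for a loss
    # Move the estimate toward the observed outcome.
    values[action] += LEARNING_RATE * (outcome - values[action])

print(values)
```

After many iterations, the estimate for the stronger action ends up clearly higher, which is the core feedback loop behind self-play RL (real systems scale this up with neural networks and far richer games).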

piKL model

The approach that CICERO uses is a hybrid of supervised learning on human data and strategic planning. The algorithm that Meta developed for this is called piKL. At each step, CICERO predicts everyone’s policy based on the dialogue it has shared with the other players. Then CICERO chooses a new policy for itself that best serves its intent during that step, given those predicted policies. This is similar to a Bayesian approach; however, the dependencies in the update are predicted by the AI itself. By first predicting the current policies of opposing players based on the messages exchanged with them, and then using those policy predictions to inform its own policy, CICERO avoids being tricked by very straightforward “lies” that would result in a loss.
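The core piKL idea, as publicly described, is to pick a policy that maximizes expected value while staying close (in KL divergence) to a human-imitation “anchor” policy. A minimal sketch, using the standard closed form of a KL-regularized policy, pi(a) ∝ anchor(a) · exp(Q(a) / λ), with invented numbers:

```python
# Sketch of a KL-regularized policy in the spirit of piKL. The anchor
# distribution, Q-values, and action names are illustrative, not CICERO's.
import math

def pikl_policy(anchor, q_values, lam):
    """Blend a human-imitation anchor policy with predicted action values Q.

    Small lam -> policy dominated by Q (pure value maximization);
    large lam -> policy stays close to the human-like anchor.
    """
    weights = {a: anchor[a] * math.exp(q_values[a] / lam) for a in anchor}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

anchor = {"ally_england": 0.6, "attack_england": 0.4}  # human-like prior
q = {"ally_england": 1.0, "attack_england": 1.5}       # predicted values

human_like = pikl_policy(anchor, q, lam=10.0)   # stays near the anchor
value_greedy = pikl_policy(anchor, q, lam=0.1)  # dominated by Q
```

The regularization term is what keeps CICERO’s play recognizably human: a pure value maximizer would drift into moves humans never make, which would also make its dialogue incoherent with its actions.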

Generating the dialogue

The first task, generating dialogue with other players in a way that elicits responses to inform policy predictions, is critical. By using controllable dialogue, CICERO is able to communicate clearly and strategize with other players.
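One way to picture controllable dialogue is that the language model is conditioned on a structured intent (the planned orders), so the generated message stays grounded in the plan rather than drifting into free-form chat. The prompt format below is entirely invented for illustration; Meta’s actual conditioning scheme differs:

```python
# Hypothetical intent-conditioned prompt construction. The serialization
# format ("INTENT: ...") is made up; only the idea of conditioning message
# generation on planned orders comes from the source.

def build_prompt(dialogue_history, intent):
    # Serialize the planned orders into the model's input so the generated
    # message is controlled by the plan.
    intent_str = "; ".join(f"{unit} -> {order}" for unit, order in intent.items())
    history = "\n".join(dialogue_history)
    return f"{history}\nINTENT: {intent_str}\nMESSAGE:"

intent = {"A PAR": "support A MAR", "F BRE": "hold"}
prompt = build_prompt(["ENGLAND: Shall we work together?"], intent)
print(prompt)
```

A dialogue model fed prompts like this can only talk about moves the planner has already committed to, which is what keeps CICERO’s words consistent with its actions.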

CICERO AI negotiating

What does it mean for human-AI interaction?

Meta mentions that “The emergence of goal-oriented dialogue systems in a game that involves both cooperation and competition raises important social and technical challenges in aligning AI with human intentions and objectives.” Meta believes that it has made significant headway in aligning language models with intentions. Part of this headway is identifying and prioritizing intentions. Meta would like to keep this model open-source to allow other researchers to continue building on what it has achieved.

Negotiation Credits: https://unsplash.com/@headwayio

Closing thoughts: Trust is earned over time

As with many other advances I have written about so far, the progress in AI here is impressive. Being able to generate text that sustains a semi-natural dialogue to achieve what the AI is programmed to do is a remarkable technical achievement.

However, and unfortunately, I had to use the headline “Trust is earned over time” for this closing section. We are currently living through the largest tech gap between generations. Boomers and Gen X did not grow up in a world of today’s technological complexity. Expecting them to identify and outsmart an AI or fake persona built specifically to out-negotiate them is unrealistic. Even millennials and Gen Z, the generation born with tech, have suffered from unregulated, profit-driven social media proliferation.

Meta hopes that other researchers will build on its code “in a responsible manner”, but can we really expect that? We need to invest in the ability to identify this technology and assess its social impact, because it will likely be used to manipulate humans. I strongly believe that without appropriate guardrails, this technology will be misused.
