More than half a decade after Microsoft's truly monumental Tay debacle, the incident still stands as a stark reminder of how quickly an AI can be corrupted by exposure to the potent toxicity of the Internet, and as a warning against building bots without sufficiently robust behavioral safeguards. On Friday, Meta's AI Research division will see whether the latest iteration of its BlenderBot AI can withstand the horrors of the interwebs with the public demo release of its 175-billion-parameter BlenderBot 3.
A major obstacle currently facing chatbot technology (and the natural language processing algorithms that power it) is sourcing. Traditionally, chatbots are trained in tightly curated environments — otherwise you invariably get another Tay — but that limits the topics they can discuss to the specific ones available in the lab. Conversely, you can have the chatbot pull information from the Internet, giving it access to a broad range of topics, but it could, and probably will, go full Nazi at some point.
“Researchers cannot possibly predict or simulate every conversational scenario in research settings alone,” Meta AI researchers wrote in a blog post Friday. “The AI field is still far from truly intelligent AI systems that can understand, engage and chat with us like other people can. In order to build models that are more adaptable to real-world environments, chatbots need to learn from a diverse, wide-ranging perspective with people ‘in the wild.’”
Meta has been working to address the issue since it first introduced the BlenderBot 1 chat app in 2020. Initially little more than an open-source NLP experiment, by the next year BlenderBot 2 had learned both to remember information it had discussed in previous conversations and how to search the web for additional details on a particular topic. BlenderBot 3 takes it one step further by evaluating not only the data it pulls from the Internet, but also the people it speaks to.
When a user flags an unsatisfactory response from the system — currently hovering around 0.16 percent of all training responses — Meta feeds that user's feedback back into the model to avoid it repeating the mistake. The system also employs the Director algorithm, which first generates a response using training data, then runs that response through a classifier to check that it falls on the right side of a scale of good and bad as defined by user feedback.
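The generate-then-classify flow described above can be sketched roughly as follows. This is a minimal, hypothetical illustration: the function names, the hard-coded candidate replies, and the keyword-based scoring rule are all assumptions for clarity, not Meta's actual Director implementation.

```python
# Hypothetical sketch of a generate-then-classify loop in the spirit of the
# Director approach: a generator proposes replies, a feedback-trained
# classifier filters them. All names and logic here are illustrative.

def generate_candidates(prompt):
    # Stand-in for the language model proposing candidate replies.
    return [
        "That is a toxic thing to say.",
        "Here is a helpful, on-topic answer.",
    ]

def classifier_score(reply):
    # Stand-in for a classifier trained on user feedback; higher is better.
    # A real classifier would be a learned model, not a keyword check.
    flagged_words = {"toxic"}
    return 0.0 if any(w in reply.lower() for w in flagged_words) else 1.0

def respond(prompt):
    # Both mechanisms must agree: generate candidates, then return the
    # first one the classifier accepts (or nothing, if none pass).
    scored = [(classifier_score(c), c) for c in generate_candidates(prompt)]
    acceptable = [c for score, c in scored if score > 0.5]
    return acceptable[0] if acceptable else None
```

In practice the classifier re-scoring happens during generation itself rather than as a post-hoc filter, but the division of labor — language model proposes, feedback-trained classifier disposes — is the same.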
“To generate a sentence, both the language modeling and classifier mechanisms must agree,” the team wrote. “Using data that indicates good and bad responses, we can train the classifier to penalize statements that are low-quality, toxic, contradictory, repetitive, or generally unhelpful.” The system also employs a separate user-weighting algorithm to detect unreliable or ill-intentioned feedback from the human conversational partner — essentially, the system learns not to trust what that person has to say.
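A trust-weighted feedback signal like the one described could look something like this. The weighting scheme here is purely an illustrative assumption — Meta has not published this code — but it shows the basic idea: feedback from users judged unreliable contributes proportionally less to what the model learns.

```python
# Hypothetical sketch of trust-weighted user feedback (illustrative only;
# the aggregation rule is an assumption, not Meta's published algorithm).

def trust_weighted_label(feedback):
    """feedback: list of (trust, label) pairs, where trust is in [0, 1]
    and label is 1 for 'good response' or 0 for 'bad response'."""
    total_trust = sum(trust for trust, _ in feedback)
    if total_trust == 0:
        return None  # no reliable feedback to learn from
    # Low-trust raters move the aggregate signal very little.
    return sum(trust * label for trust, label in feedback) / total_trust
```

With this scheme, a single low-trust troll marking a good answer as “bad” barely dents an aggregate built from trusted raters — which is exactly the property you want when learning from strangers on the Internet.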
“Our live, interactive, public demo enables BlenderBot 3 to learn from organic interactions with all kinds of people,” the team wrote. “We encourage adults in the United States to try the demo, conduct natural conversations about topics of interest, and share their feedback to help advance the research.”
BB3 is expected to speak more naturally and conversationally than its predecessor, thanks in part to its massively upgraded OPT-175B language model, which is nearly 60 times larger than BB2’s. “We found that, compared to BlenderBot 2, BlenderBot 3 delivers a 31 percent improvement in overall rating on conversational tasks, as evaluated by human judges,” the team said. “It’s also rated twice as knowledgeable, while being factually incorrect 47 percent less often. Compared to GPT-3, on topical questions it was found to be more up-to-date 82 percent of the time and more specific 76 percent of the time.”