Mistral “Le Chat”: The newest ChatGPT rival struggles with well-known weaknesses

© Mistral / Canva Pro
A user interface we are all familiar with: an input field for questions and prompts at the bottom, the chat conversation with the AI assistant above it, and a column listing previous conversations on the left. This describes not only ChatGPT, Microsoft’s Copilot, and Google Gemini but also Le Chat from Mistral AI. The Paris-based AI startup presented its latest LLM on Monday and, to round off its broad offensive against OpenAI, Microsoft, and Google, launched a chat interface called “Le Chat”.

Mistral AI’s chatbot is currently available as a beta, but a paid version for companies, similar to those of ChatGPT, Gemini, and Copilot, is planned. Trending Topics is among the first testers and was able to try out Le Chat, including its three LLMs (“Large”, “Next”, and “Small”), in the preliminary basic version. Until now, you had to use third-party services such as HuggingChat or Poe to test Mistral AI’s LLMs.

But how well does Mistral Large work? According to the Paris-based startup, it outperforms Anthropic’s Claude 2, Google’s Gemini, and OpenAI’s GPT-3.5 on the Massive Multitask Language Understanding (MMLU) benchmark; only GPT-4 scores higher. But how does it hold up in practice?

1. Useful results but with an expiration date

It is impressive that Mistral AI brought a competitive LLM, complete with a chatbot, to market in less than a year. At first and even second glance, its answers to the standard questions one would look up on Google or Wikipedia hardly differ from those given by ChatGPT or Google’s Gemini. Mistral Large and Mistral Next usually (though not always) respond quite quickly; with ChatGPT Plus (GPT-4), the wait for longer texts feels noticeably longer.

The more intensively you use Mistral’s chatbot, the more obvious it becomes: the content it generates has an expiration date. For questions about politicians or the economy, its knowledge usually seems to end in March 2023; for questions about sports or culture, it sometimes ends as early as 2021. As a result, generated texts can easily contain outdated information, which limits their usefulness. If you are not familiar with a subject, outdated facts can slip past you. In practice, you have to check every text manually to make sure it is up to date.

Overall, Mistral AI has the same problem as ChatGPT: neither can retrieve live information from the internet the way Google’s Gemini can. This is arguably Google’s biggest advantage so far, and for Mistral AI it is a drawback with no obvious remedy yet.

2. Bug causes Mistral Large to get out of hand

During testing, we accidentally stumbled upon a bug that suddenly caused Mistral to produce an endless column of words. When asked for a synonym for the word “outdated,” the chatbot began writing the words “unbridled” and “untamed” over and over until the system froze and the output stopped in the background.

Such problems are also familiar from OpenAI’s ChatGPT. As reported, about a week ago ChatGPT spat out meaningless strings of words for some users. The cause was a software bug, since fixed, that affected how the model processed language. A similar problem may currently be affecting Mistral AI.

3. Mistral AI unfortunately produces falsehoods

To its credit, Mistral does not fall for prompts designed to make it repeat conspiracy theories. Whether it is Putin’s false claim that Poland was to blame for World War II, the theory that the moon landing was faked, or the flat-earth theory, Mistral quickly and objectively debunks each of them.

One can still smile at and overlook the fact that the LLM behind the chat assistant does not know itself and confuses “Mistral Large” with a cookware company. When asked about the difference between “Mistral Large”, “Mistral Next”, and “Mistral Small”, the AI chatbot gave the following rather amusing answer:

“Mistral Large: This probably refers to the larger pots or pans that Mistral offers. These work well for larger meals or families.”

Things get more serious when the AI model summarizes texts and repeatedly includes falsehoods and incorrect figures in its summaries. For example, a figure of 530 million dollars suddenly becomes 22 million dollars and then 19 million euros. The more often you tell the model that the figure is wrong, the more confused its subsequent answers become. Eventually, Mistral claims that the French startup ecosystem received 630 million euros in 2019; why it says this, and where this false information comes from, remains unclear.

This points to another area in which Mistral AI lags behind other model providers. Mistral Large has a context window of only 32,000 tokens, while GPT-4 Turbo already offers 128,000 tokens and Google’s Gemini 1.5 Pro handles up to 1 million tokens. Context length matters especially for long conversations involving large amounts of data, and here Mistral AI is currently clearly behind.

4. Yesterday’s open source

When it launched in 2023, Mistral AI set out to bring open-source AI models to market. This is what the young company led by CEO Arthur Mensch has stood for so far. But for how long? As critics point out, the company’s commitment to open source has been pushed into the background on its updated website. The focus is now on generating revenue from the new chatbot (including for enterprise customers) and from API access to the LLMs.

To use Mistral’s best AI models to date, you have to become a customer of the French startup; Mistral Large is also set to be available through Microsoft’s Azure cloud. Only the two older and weaker models, Mistral 7B and Mixtral 8x7B, are open source, and they already lag significantly behind in benchmark tests. This will certainly not go down well with the open-source community.
