Generative AI


AI Chat playground accuracy metrics

Bob Gastineau on 19 Nov 2024
Latest activity Reply by Walter Roberson on 4 Dec 2024

I was curious to start up your new AI Chat Playground.
The first screen that popped up made the statement:
"Please keep in mind that AI sometimes writes code and text that seems accurate, but isnt"
Can someone elaborate on what exactly this means with respect to your AI Chat playground integration with the Matlab tools?
Are there any accuracy metrics for this integration?
Bob Gastineau on 19 Nov 2024 (Edited on 19 Nov 2024)
OK, so there are no metrics on MATLAB-plus-AI-LLM accuracy?
Let me understand what you are attempting to do here:
  1. Integrate MATLAB math and scientific tools with a third-party AI tool whose LLM is known to generate inaccurate results.
  2. There are no published metrics on your AI integration.
  3. There is no third-party qualification of the tool's accuracy and performance.
  4. You expect the community to debug your tool with no resolution on accuracy in sight?
Is it realistic to expect engineers to accept and use a tool that does not generate accurate results? Your MATLAB AI LLM integration apparently generates random garbage that needs to be fact-checked.
David on 19 Nov 2024
@Robert Gastineau, the statement reflects the fact that large language models are non-deterministic and can generate responses that are inaccurate. For example, MATLAB code in a response may appear correct, in that there are no obvious issues such as malformed statements. However, it can contain function names, properties, or arguments that are made up (hallucinations). Having the code editor in the AI Chat Playground allows you to quickly analyze and run the code to verify whether it's correct.
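To illustrate the failure mode (a hypothetical sketch in Python, since the same issue applies in any language; `fast_isqrt` is a made-up name used purely for illustration): a hallucinated function can look entirely plausible until you actually try to resolve it, which is why running the code is the quickest check.

```python
import math

def name_exists(module, name):
    """Return True if `name` actually resolves in `module`."""
    return hasattr(module, name)

# A real function resolves; a hallucinated one does not.
print(name_exists(math, "sqrt"))        # True
print(name_exists(math, "fast_isqrt"))  # False -- plausible-looking, but made up
```

Static inspection alone won't distinguish the two names; only resolving them against the real library does.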
Language models are improving quickly and these types of issues are becoming less frequent.
Bob Gastineau on 2 Dec 2024 (Edited on 2 Dec 2024)
Not only is your reply insulting, it is irresponsible. The known "general" accuracy of any given AI LLM on specific benchmark tests varies greatly: it can be as low as sub-40%, but also as high as the 90% range. 1-sigma or even 2-sigma accuracy is unacceptable for most engineering tools, and AI LLMs will never achieve 2 sigma without drastic augmentations. I am amazed that MathWorks is willing to sacrifice its credibility to support what appears to be a PR stunt to join in on the current AI hype.
David on 3 Dec 2024
The AI Chat Playground exists in the online community for free, not in any paid MathWorks product. There are prominent disclaimers, and it is not intended to contribute to solutions used in production scenarios.
Many users are happy with their experience using the Playground, as it offers a way to experience what a large language model is capable of in the context of MATLAB programming. With a simple prompt and a short conversation, anyone can get responses that accelerate their understanding without having to read documentation or write code. LLMs are very capable (worth the hype, IMO) and likely to become part of many productivity workflows.
I agree with you that at this time, big LLMs (that aren't that good with math yet) should be avoided or leveraged cautiously when precision is required.
Bob Gastineau on 3 Dec 2024
Suppose you have a program with 2-sigma accuracy. 1,000,000 people use the program daily. Each error the program produces causes 1 hour of wasted productivity. Within one year, what is the net cost of the errors incurred across all users, if each person is paid $30 per hour?
To calculate the net expense cost of errors in terms of wasted productivity and hourly pay, we can build on our previous calculation.
Here's the breakdown:
  1. Number of users per day: 1,000,000
  2. Error rate: 4.55% (2 sigma accuracy means 95.45% accurate, so 100% - 95.45% = 4.55% error rate)
  3. Number of errors per day: 1,000,000 * 0.0455 = 45,500 errors
  4. Hours of wasted productivity per day: 45,500 hours
  5. Number of days in a year: 365
  6. Total hours of wasted productivity in a year: 45,500 * 365 = 16,607,500 hours
  7. Hourly wage: $30 per hour
  8. Total expense cost in one year: 16,607,500 hours * $30/hour = $498,225,000
So, the net expense cost of errors incurred for all users in one year, considering each person is paid $30 per hour, would be $498,225,000.
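The arithmetic above can be sanity-checked in a few lines (shown in Python for illustration; the 1,000,000-users-per-day and 1-hour-per-error figures are the question's assumptions, not measured data):

```python
# Sanity check of the back-of-envelope numbers above.
# Inputs (1,000,000 users/day, 1 hour lost per error, $30/hour)
# are the question's assumptions, not measured data.
users_per_day = 1_000_000
accuracy = 0.9545                # 2-sigma coverage
error_rate = 1 - accuracy        # ~4.55%
days_per_year = 365
hourly_wage = 30                 # dollars

errors_per_day = users_per_day * error_rate
annual_hours_lost = errors_per_day * days_per_year   # 1 hour per error
annual_cost = annual_hours_lost * hourly_wage

print(round(errors_per_day))    # 45500
print(round(annual_cost))       # 498225000
```

The totals match the figures quoted above; whether the inputs are realistic is a separate question.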
Walter Roberson on 4 Dec 2024
I would be astonished if the usage is anywhere remotely close to 1 million users per day. A couple of thousand users per day seems more plausible.
Each error the program produces causes 1 hour of wasted productivity.
Evidence, please!
The AI Chat Playground is often used for exploration, to get ideas about how to approach problems, with the code not being executed at all. When the code is executed, the errors might or might not be obvious. It is likely that several iterations of code will be gone through. If the code problems are obvious (such as missing plots), then it does not take an hour to find that a problem exists (though it might possibly take an hour to find ways to rephrase the question so that the missing plot is also generated).
