Generative AI


AI Chat playground accuracy metrics

Bob Gastineau on 19 Nov 2024
Latest activity Reply by Walter Roberson on 4 Dec 2024

I was curious to start up your new AI Chat Playground.
The first screen that popped up made the statement:
"Please keep in mind that AI sometimes writes code and text that seems accurate, but isnt"
Can someone elaborate on what exactly this means with respect to your AI Chat playground integration with the Matlab tools?
Are there any accuracy metrics for this integration?
Bob Gastineau on 19 Nov 2024 (Edited on 19 Nov 2024)
OK, so there are no metrics on MATLAB-plus-AI-LLM accuracy?
Let me understand what you are attempting to do here:
  1. Integrate MATLAB math and scientific tools with a third-party AI tool whose LLM is known to generate inaccurate results.
  2. There are no published metrics on your AI integration.
  3. There is no third-party qualification of the tool's accuracy and performance.
  4. You expect the community to debug your tool with no resolution on accuracy in sight?
Is it realistic to expect engineers to accept and use a tool that does not generate accurate results? Your MATLAB AI LLM integration apparently generates random garbage that needs to be fact-checked.
David on 19 Nov 2024
@Robert Gastineau, the statement reflects the fact that large language models are non-deterministic and can generate responses that are inaccurate. For example, MATLAB code in a response may appear correct, in that there are no obvious issues such as malformed statements. However, it can contain function names, properties, or arguments that are made up (hallucinations). Having the code editor in the AI Chat Playground allows you to quickly analyze and run the code to verify whether it's correct.
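To illustrate the failure mode (a hypothetical sketch in Python, since the same issue applies in any language; `fast_isqrt` is a made-up name used purely for illustration): a hallucinated function can look entirely plausible until you actually try to resolve it, which is why running the code is the quickest check.

```python
import math

def name_exists(module, name):
    """Return True if `name` actually resolves in `module`."""
    return hasattr(module, name)

# A real function resolves; a hallucinated one does not.
print(name_exists(math, "sqrt"))        # True
print(name_exists(math, "fast_isqrt"))  # False -- plausible-looking, but made up
```

Static inspection alone won't distinguish the two names; only resolving them against the real library does.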
Language models are improving quickly and these types of issues are becoming less frequent.
Bob Gastineau on 2 Dec 2024 (Edited on 2 Dec 2024)
Not only is your reply insulting, it is irresponsible. The known "general" accuracy of any given AI LLM on specific benchmark tests varies greatly: it can be as low as sub-40%, but also as high as the 90% range. 1-sigma or even 2-sigma accuracy is unacceptable for most engineering tools, and AI LLMs will never achieve 2 sigma without drastic augmentations. I am amazed that MathWorks is willing to sacrifice its credibility to support what appears to be a PR stunt to join in on the current AI hype.
David on 3 Dec 2024
The AI Chat Playground exists in the online community for free, not in any paid MathWorks product. There are prominent disclaimers, and it is not intended to contribute to solutions used in production scenarios.
Many users are happy with their experience using the Playground, as it offers a way to experience what a large language model is capable of in the context of MATLAB programming. With a simple prompt and a short conversation, anyone can get responses that accelerate their understanding without having to read documentation or write code. LLMs are very capable (worth the hype, IMO) and likely to become part of many productivity workflows.
I agree with you that at this time, big LLMs (that aren't that good with math yet) should be avoided or leveraged cautiously when precision is required.
Bob Gastineau on 3 Dec 2024
Suppose you have a program with 2-sigma accuracy. 1,000,000 people use the program daily. Each error the program produces causes 1 hour of wasted productivity. Within one year, what is the net cost of the errors incurred across all users, if each person is paid $30 per hour?
To calculate the net expense cost of errors in terms of wasted productivity and hourly pay, we can build on our previous calculation.
Here's the breakdown:
  1. Number of users per day: 1,000,000
  2. Error rate: 4.55% (2 sigma accuracy means 95.45% accurate, so 100% - 95.45% = 4.55% error rate)
  3. Number of errors per day: 1,000,000 * 0.0455 = 45,500 errors
  4. Hours of wasted productivity per day: 45,500 hours
  5. Number of days in a year: 365
  6. Total hours of wasted productivity in a year: 45,500 * 365 = 16,607,500 hours
  7. Hourly wage: $30 per hour
  8. Total expense cost in one year: 16,607,500 hours * $30/hour = $498,225,000
So, the net expense cost of errors incurred for all users in one year, considering each person is paid $30 per hour, would be $498,225,000.
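The arithmetic above can be sanity-checked in a few lines (shown in Python for illustration; the 1,000,000-users-per-day and 1-hour-per-error figures are the question's assumptions, not measured data):

```python
# Sanity check of the back-of-envelope numbers above.
# Inputs (1,000,000 users/day, 1 hour lost per error, $30/hour)
# are the question's assumptions, not measured data.
users_per_day = 1_000_000
accuracy = 0.9545                # 2-sigma coverage
error_rate = 1 - accuracy        # ~4.55%
days_per_year = 365
hourly_wage = 30                 # dollars

errors_per_day = users_per_day * error_rate
annual_hours_lost = errors_per_day * days_per_year   # 1 hour per error
annual_cost = annual_hours_lost * hourly_wage

print(round(errors_per_day))    # 45500
print(round(annual_cost))       # 498225000
```

The totals match the figures quoted above; whether the inputs are realistic is a separate question.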
Walter Roberson on 4 Dec 2024
I would be astonished if the usage is anywhere remotely close to 1 million users per day. A couple of thousand users per day seems more plausible.
Each error the program produces causes 1 hour of wasted productivity.
Evidence, please!
The AI Chat Playground is often used for exploration, to get ideas about how to approach problems, with the code not being executed at all. When the code is executed, the errors might or might not be obvious. It is likely that several iterations of code will be gone through. If the code problems are obvious (such as missing plots), then it does not take an hour to find that a problem exists (though it might possibly take an hour to find ways to rephrase the question so that the missing plot is also generated).
