Code formatting in the forum

Asked by Jan Simon on 4 Apr 2013

Although this forum is now in its third year online and thousands of examples can be found, it is still a tedious task to ask beginners to format their code. The experienced contributors have explained the procedure thousands of times, and fewer than a handful of the beginners found the time to thank them for this.

The problem has already been mentioned exhaustively in the wish-list. It shouldn't be complicated to solve by showing explicit instructions the first 5 times a user posts a question. Obviously neither the "{} code" nor the "? Help" button encourages people to learn the basics of the forum. But I'd hope that they would spend the time to read text instructions like:

 Formatted code is a core feature of this forum. Insert a blank line before and after the code and start each line with at least 2 spaces.
 Follow the "? help" button to learn more.
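
The rule in this instruction text can also be applied programmatically before pasting. A minimal sketch, assuming a hypothetical helper named formatForForum (it is not part of MATLAB or of the forum): it prefixes every line with 2 spaces and surrounds the snippet with blank lines.

```matlab
function out = formatForForum(code)
% formatForForum  Hypothetical helper: prepare code for pasting in a post.
%   Each line gets 2 leading spaces; one blank line is added before and
%   after the snippet, as the proposed instruction text asks.
    nl    = sprintf('\n');                        % newline character
    lines = regexp(code, '\r?\n', 'split');       % split into lines
    lines = cellfun(@(s) ['  ' s], lines, ...     % indent every line
                    'UniformOutput', false);
    out   = [nl, strjoin(lines, nl), nl];         % blank line before/after
end
```

For example, formatForForum(sprintf('a = 1;\nb = 2;')) returns the two lines indented by 2 spaces, with a newline before the first and after the last.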

And since this message would disappear after the 5th posting, it could even get a red background and some flashing effects.

This would be much more efficient than letting the editors and other diligent users do this thankless job.

5 Comments

Cedric Wannaz on 6 Aug 2013

By the way, why two spaces? I find that, for single level code, I prefer one space so the code is a little shifted.

Evan on 6 Aug 2013

I agree that more voting would better utilize the whole point of the "reputation" system. It seems like the community on other help-forums use voting much less sparingly, while here 0 or 1 is the most common score for even excellent answers.

Oftentimes, I notice that a user has submitted a very detailed and on-point response to someone's question and think to myself it's a shame that the answer was never accepted. It's only lately that I'm realizing that, even though I'm not the OP, I actually have the ability to at least give the author some sort of feedback/credit for their effort.

Jan Simon on 15 Aug 2013

@Evan: Once you have more than 500 reputation points, you can accept an answer to another contributor's question if the author of the question has not selected one for 7 days. Currently 48 users have reached this level, but frequent voting will increase the number.

5 Answers

Answer by Cedric Wannaz on 6 Aug 2013
Edited by Cedric Wannaz on 8 Aug 2013

EDIT @ 4:30pm EST: strfind -> regexp with negative lookbehind, to avoid matching nbsp;.

Here is a simple crawler. It is not my original idea, which was a mechanism at Mathworks level and not at a user (one of us) level. I implemented a few criteria which are not those listed above, as the crawler has to work with content that was already parsed and "preformatted" by the forum.

The criteria implemented should be improved. Typically, the function call(s)/def(s) detection is too "simple" and generates false positives when users write function names followed by parentheses in normal text.

Anyhow, this is just a simple demo.

The whole code below (both functions) should be saved in forumCrawler.m, and you can set pageDepth to control how many forum pages you want to process.

----------------------------------------------------------------------------------------------------------------

 function forumCrawler
    pageDepth = 1 ;
    baseURL   = 'http://www.mathworks.com' ;
    for pageId = 1 : pageDepth
        fprintf('\n=== Processing page %d..\n', pageId) ;
        url = sprintf('%s/matlabcentral/answers/?page=%d', baseURL, pageId) ;
        thread  = regexp(urlread(url), '(?<=<h3><a href=").*?(?=")', 'match') ;
        nThread = length(thread) ;
        for tId = 1 : nThread
            fprintf(' - Analyzing thread %d/%d..\n', tId, nThread) ;
            url = sprintf('%s%s', baseURL, thread{tId}) ;
            htmlBuffer = urlread(url) ;
            % - Scan question.
            question = regexp(htmlBuffer, ...
               '(?<=class="question-body">).*?(?=</div>)', 'match') ;
            [tf, msg] = isLikelyUnformatted(question{1}) ;
            if tf
                fprintf('   [<a href="%s">question</a>] %s.\n', url, msg) ;
            end
            % - Scan answers.
            answer = regexp(htmlBuffer, ...
               '<div id="([^"]+)" class="answer-body">(.*?)</div>', 'tokens') ;
            for cId = 1 : length(answer)
                [tf, msg] = isLikelyUnformatted(answer{cId}{2}) ;
                if tf
                    answerUrl = sprintf('%s#%s', url, answer{cId}{1}) ;
                    fprintf('   [<a href="%s">answer</a>  ] %s.\n', ...
                       answerUrl, msg) ;
                end
            end
            % - Scan comments.
            comment = regexp(htmlBuffer, ...
               '<div id="([^"]+)" class="comment-body">(.*?)</div>', 'tokens') ;
            for cId = 1 : length(comment)
                [tf, msg] = isLikelyUnformatted(comment{cId}{2}) ;
                if tf
                    commentUrl = sprintf('%s#%s', url, comment{cId}{1}) ;
                    fprintf('   [<a href="%s">comment</a> ] %s.\n', ...
                       commentUrl, msg) ;
                end
            end
        end
    end
 end
 function [tf, msg] = isLikelyUnformatted(content)
    tf = true ;
    % Eliminate content within <pre>..</pre> and <tt>..</tt> tags,
    % so we work on what is meant to be text.
    buffer  = regexp(content, '<pre.*?</pre>', 'split') ;
    content = [buffer{:}] ;
    buffer  = regexp(content, '<tt.*?</tt>', 'split') ;
    content = [buffer{:}] ;
    % Check for a few indicators.
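    % Something like "1:n" in plain text suggests a range definition.
    if ~isempty(regexp(content, '\w:\w', 'once'))
        msg = 'range def. found' ;  return ;  end
    % A word character directly followed by "(" (parenthesis escaped).
    if ~isempty(regexp(content, '\w\(', 'once'))
        msg = 'function call(s)/def(s) found' ;  return ;  end
    % A paragraph ending with ";", unless it is the end of "&nbsp;".
    if ~isempty(regexp(content, '(?<!nbsp);</p>', 'once'))
        msg = '";</p>" found' ;  return ;  end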
    tf  = false ;
    msg = '' ;
 end
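
The switch from strfind to a negative lookbehind mentioned in the edit note can be checked in isolation. A small sketch with made-up HTML fragments (the two strings are illustrative assumptions, not taken from real forum pages):

```matlab
% A paragraph ending with ";" is a hint for unformatted code, but the
% HTML entity "&nbsp;" also ends with ";" and must not be flagged.
pattern   = '(?<!nbsp);</p>';
codeLike  = '<p>x = 1;</p>';
plainText = '<p>Some text&nbsp;</p>';

flagsCode       = ~isempty(regexp(codeLike,  pattern, 'once'));  % true
flagsPlain      = ~isempty(regexp(plainText, pattern, 'once'));  % false
strfindFlagsAll = ~isempty(strfind(plainText, ';</p>'));         % true
```

The last line shows why strfind alone was not enough: it cannot tell the entity apart from a trailing semicolon.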

4 Comments

Cedric Wannaz on 6 Aug 2013

Thank you Jan, but it does nothing but show which threads might contain an unformatted question/answer/comment. Then a forum editor still has to do all the work ;-) This is why, I think, the mechanism should be implemented at the forum level with some big warning message strongly suggesting to format the code (so editors don't have this as a burden).

Evan on 8 Aug 2013

This is a really slick little function. And if something similar were implemented on TMW's end, even false positives would be pretty harmless. I think we'd still end up with people neglecting formatting (after all, nowadays popup dialogs and warning messages are either 1) meant to be ignored or 2) an exercise for honing your ability to quickly close windows). Still, it's a simple enough feature that it's worth having.

Cedric Wannaz on 8 Aug 2013

I thought about it a little more and, somehow, I wouldn't mind automatically getting an intermediary page when we submit a question (not for comments or answers, but for questions only), with a big red message reminding us about formatting and displaying the post as a preview. We don't post that many questions after all, so it wouldn't be annoying.

I think that this mechanism is light enough so it wouldn't take Mathworks that much time/work to implement.

Answer by Jan Simon on 6 Aug 2013

Bump.

It is really tedious to remind so many newcomers in the forum to format their questions. But ignoring the questions due to the lack of readability would reduce the quality of the forum.

Is there really no idea for how newcomers could be motivated to apply proper formatting?

3 Comments

Cedric Wannaz on 6 Aug 2013

A quite simple test could be implemented actually: build a dictionary of MATLAB commands/functions and operators (without words/symbols that are frequently used outside of code), and have the forum detect code based on this and display a warning when code is detected in a new post.

Additional criteria could be used..

  • The presence of single new lines, because there is often no reason to use them in normal text (as two new line chars are needed to start a new paragraph) but they are frequent in code.
  • The presence of a single line return after a ;
  • The density of ";,:,=,(,),{,},[,],+,*,/,\,_", etc.: if there is, for example, more than one equal sign per 50 characters (thresholds could be determined from statistics computed on existing posts), it is probably code.
  • The presence of single letters like bcefghjklmnpqruvwxz.
  • The presence of e.g. words with a "camelCase" cap. convention.
  • Etc.
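
A few of the criteria listed above can be sketched with regular expressions. This is only a rough illustration: the function name looksLikeCode, the thresholds (0.05, one equal sign per 50 characters) and the "2 indicators" rule are assumptions, not values derived from statistics on real posts.

```matlab
function [tf, score] = looksLikeCode(txt)
% looksLikeCode  Hypothetical heuristic; all thresholds are guesses.
    n = max(numel(txt), 1);                       % avoid division by zero
    % Density of characters that are frequent in code.
    symDensity = numel(regexp(txt, '[;:=(){}\[\]+*/\\_]')) / n;
    % Number of equal signs per 50 characters.
    eqPer50 = 50 * sum(txt == '=') / n;
    % A single new line (not a blank-line paragraph break).
    hasSingleNewline = ~isempty(regexp(txt, '[^\n]\n[^\n]', 'once'));
    % A line return directly after a semicolon.
    hasSemicolonNewline = ~isempty(regexp(txt, ';[ \t]*\r?\n', 'once'));
    % Words following a camelCase convention, e.g. pageDepth.
    hasCamelCase = ~isempty(regexp(txt, '\<[a-z]+[A-Z]\w*', 'once'));
    % Count the indicators; two or more means "probably code".
    score = (symDensity > 0.05) + (eqPer50 > 1) + hasSingleNewline ...
          + hasSemicolonNewline + hasCamelCase;
    tf = score >= 2;
end
```

A short loop such as sprintf('for k = 1:10\n  x(k) = k*2;\nend') is flagged, while a plain sentence like 'Thank you very much for the help.' is not.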
Jan Simon on 6 Aug 2013

@Cedric: I have the impression that you are able to write a function which recognizes missing formatting. Matlab's urlread can grab the contents from the forum automatically. But I hesitate to auto-format other people's messages, remote controlled by my local Matlab. Adding a comment with the usual suggestions would not be dangerous, though.

I'd still prefer that TMW implements this to let us concentrate on the quality of the answers. Thunderbird recognizes in advance when I want to attach a file. Amazon offers opinions about what I want to buy. And if TMW guesses that all questions contain code, the false classification rate will be below 50%. So if a beginner (less than 4 questions) starts to type in any character, a popup appears saying "If you want to insert code...". I hate popups, but this could be a valid application. Hm, I even let my browser suppress popups. Perhaps the idea would not work in reality.

Cedric Wannaz on 6 Aug 2013

@Jan: I meant at Mathworks level, in PHP or whatever language they are using, they could implement a detection based on this list of criteria and display a warning if needed. These criteria would certainly catch most cases where there is unformatted code (and we don't need 100% accuracy), and their implementation is a matter of building a few regular expressions.

Also, this mechanism wouldn't prevent a user from submitting an answer/comment, but would just add a warning page displaying a big red message warning that some unformatted code seems to have been detected, and asking the user to either go back or confirm that he/she wants to post the current content.

That said, if there is any interest, I am probably able to build a MATLAB-based crawler which detects threads with unformatted code based on the aforementioned list of criteria, yes.

Answer by Iain on 6 Aug 2013

Why not have two textboxes, one for text, and one for code?

1 Comment

Cedric Wannaz on 6 Aug 2013

We often mix code with text actually.

If you look at this or this, I wouldn't have been able to manage these answers+comments with split text/code.

Answer by Evan on 6 Aug 2013
Edited by Evan on 6 Aug 2013

Is there any way to have two levels of permissions for editing another user's question? At the moment, assuming there are no users who have been granted privileges prematurely, there are only 15 users capable of editing a question. I would say 50% or less of these users have been very active on these forums over the past month or so.

I understand that editing another user's question is a privilege that has potential for abuse and should therefore be difficult to obtain, but if it were possible to split the permissions in some manner, allowing users with, say, a reputation of 750 or 1000 to use the "format code" feature without modifying the text of a question, would it be worth the effort?

Perhaps it's cynical, but I think that it's going to be near impossible to get new posters to adhere to the standards for formatting. We can put up announcements, add brightly colored textboxes to the "new question" page, and even flag certain keywords, but unless we actually make it impossible to submit a post without formatting those flagged keywords, people will continue submitting giant walls of unformatted code.

And not to hijack this topic, but another feature I would like to see is the ability to move comments and answers for those cases where users don't catch on to the differences between them.

2 Comments

Cedric Wannaz on 6 Aug 2013

This is related to the "janitor" type of work mentioned here in my answer at the bottom (copied below).

 " " "

I've seen Walter mentioning "janitor" type of work on the forum, and I think that a 500 rep. should allow people to do this kind of work actually, if they have time and energy for this (and if they are trusted; I'll develop this below). It is obviously tricky to give enough privileges to perform janitor work without giving all privileges, but it is certainly worth working on finding a solution.. in the sense that currently you have to be a high rep. member to spend your time on e.g. formatting questions instead of answering them (..).

Jan posted lately a question about formatting and I commented mentioning "trustees". I think that it is meaningful in the sense that active people in the top 10 rep. know roughly who is answering questions and have an idea about the quality of the answers; in other words, I think that privileges would be better distributed by a mechanism involving rep. points but more importantly a sponsorship/trustee mechanism involving these top 10 rep. active people.

Mixing this idea and the "janitor" type of work mentioned above, I believe that it would be quite interesting if members hitting 500 rep. points, and defined as trustees by top 10 members, would get a limited privilege for editing questions (maybe more interesting than giving a privilege for accepting answers). To illustrate, a logic could be:

  • Rep. points provide recognition as they should, but no privilege. These are separate aspects of the "life" on the forum.
  • Rep. points + the sponsorship/trustee flag provide privileges. E.g. 500 pts + trustee provide "janitor type of work" privileges. People with these privileges are thought to be able to know when/where they are proficient enough to accept answers, and hence have the privilege to accept answers. They can also edit questions without having the full editor privilege, which could be defined as: adding/deleting spaces, underscore, stars, and CR/LF. This would allow performing most of the formatting tasks, without leaving the possibility to change the content (addressing hence Jan's concern in his post mentioned above). It would be relatively easy to implement the check: after removal of these characters in both the original and the modified text, the strings must match.
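
The check described in the last point could look roughly like this. A minimal sketch, assuming a hypothetical function name isFormattingOnlyEdit and assuming "spaces, underscore, stars, and CR/LF" is exactly the set of characters a janitor may add or delete:

```matlab
function tf = isFormattingOnlyEdit(original, edited)
% isFormattingOnlyEdit  Hypothetical check for a limited edit privilege.
%   True when the two texts differ only by spaces, underscores, stars
%   and CR/LF characters, i.e. when no other content was changed.
    strip = @(s) regexprep(s, '[ _*\r\n]', '');   % drop allowed characters
    tf = strcmp(strip(original), strip(edited));  % the rest must match
end
```

Indenting every line of a post by 2 spaces passes this check, whereas changing a single digit or letter in the text does not.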
 " " "
Jan Simon on 6 Aug 2013

In other forums I see BBcode for formatting, e.g. [code]fprintf[/code]. If this would be recognized here also, the >500 rep janitors could be allowed to insert exactly these keys and nothing else. Then marking the text with the mouse and hitting the "{} Code" button would work without the need to open the message for editing.

The number of conflicts e.g. in "function [code] = createCode" is surprisingly small. But of course there is approximately the same number of beginners who omit the formatting also in the forums I'm talking of.
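
Why such conflicts stay rare can be illustrated by requiring a matching pair of tags. A small sketch, assuming a hypothetical helper findBBCodeSpans; this forum does not actually support BBcode:

```matlab
function spans = findBBCodeSpans(txt)
% findBBCodeSpans  Hypothetical helper: text between [code] and [/code].
%   A lone "[code]", as in "function [code] = createCode", has no
%   closing tag and therefore does not match.
    tokens = regexp(txt, '\[code\](.*?)\[/code\]', 'tokens');
    spans  = [tokens{:}];      % flatten the nested cells returned by regexp
end
```

With this, 'Use [code]fprintf[/code] here' yields the single span 'fprintf', and 'function [code] = createCode' yields nothing.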

Answer by Jan Simon on 8 Aug 2013

Another simple idea: Some additional buttons are inserted for the editors to insert standard comments: "Please follow the [? Help] link to learn how to..." and "Please consider the suggestions on [how to ask a good question]...".

In addition, an email notification would have to be sent when a comment is posted as well. Perhaps automatic closing would be useful too.

Then the editors still have to struggle with the forgotten formatting, but hitting one button is much less time consuming.

0 Comments

