Code formatting in the forum

Jan
Jan on 4 Apr 2013
Answered: Jan on 4 May 2015
Although this forum is now in its third year online and thousands of examples can be found, it is still a tedious task to ask beginners to format their code. The experienced contributors have explained the procedure thousands of times, and fewer than a handful of the beginners found the time to thank them for it.
The problem has already been discussed exhaustively in the wish-list. It shouldn't be complicated to solve it by showing explicit instructions the first 5 times a user posts a question. Obviously neither the "{} Code" nor the "? Help" button encourages people to learn the basics of the forum. But I'd hope that they would take the time to read a text instruction like:
Formatted code is a core feature of this forum. Insert a blank line before and after the code and start each line with at least 2 spaces.
Follow the "? Help" button to learn more.
And since this message would disappear after the 5th post, it could even get a red background and some flashing effects.
This would be much more efficient than letting the editors and other diligent users do this thankless job.
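The rule in the suggested message is mechanical enough to show in a few lines. A minimal sketch, assuming the snippet uses newline (char(10)) line endings; `indentLines` and `formatForForum` are hypothetical names for this example, not forum or MATLAB functions:

```matlab
% Sketch of the rule "insert a blank line before and after the code and
% start each line with at least 2 spaces".
% indentLines and formatForForum are hypothetical helpers for illustration.
nl = char(10);                                          % newline character
% 'lineanchors' makes ^ match at the start of every line, not only the first.
indentLines = @(code) regexprep(code, '^', '  ', 'lineanchors');
% Blank line before and after the indented block.
formatForForum = @(code) [nl, nl, indentLines(code), nl, nl];

snippet = sprintf('x = linspace(0, 1, 5);\ny = x.^2;');
post = ['Here is my code:', formatForForum(snippet), 'Why does it fail?'];
disp(post)
```

A single `regexprep` call indents the whole snippet, so the only remaining work is adding the surrounding blank lines.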
  5 Comments
Jan
Jan on 15 Aug 2013
@Evan: Once you have 500 reputation points, you can accept an answer to another contributor's question if its author has not accepted one within 7 days. Currently 48 users have reached this level, but frequent voting will increase that number.

Accepted Answer

Cedric Wannaz
Cedric Wannaz on 6 Aug 2013
Edited: Cedric Wannaz on 8 Aug 2013
EDIT @ 4:30pm EST: strfind -> regexp with a negative lookbehind to avoid matching &nbsp;.
Here is a simple crawler. It is not my original idea, which was a mechanism at the MathWorks level rather than at the user (one of us) level. I implemented a few criteria that differ from those listed above, because the crawler has to work with content that was already parsed and "preformatted" by the forum.
The criteria implemented should be improved. Typically, the function call(s)/def(s) detection is too "simple" and generates false positives when users write function names followed by parentheses in normal text.
Anyhow, this is just a simple demo.
The whole code below (both functions) should be saved in forumCrawler.m, and you can set pageDepth to control how many forum pages you want to process.
----------------------------------------------------------------------------------------------------------------
function forumCrawler
  % Crawl the first pageDepth pages of the Answers question list and print
  % links to posts that are likely to contain unformatted code.
  pageDepth = 1 ;
  baseURL = 'http://www.mathworks.com' ;
  for pageId = 1 : pageDepth
    fprintf('\n=== Processing page %d..\n', pageId) ;
    url = sprintf('%s/matlabcentral/answers/?page=%d', baseURL, pageId) ;
    % - Collect the relative URLs of the threads listed on this page.
    thread = regexp(urlread(url), '(?<=<h3><a href=").*?(?=")', 'match') ;
    nThread = length(thread) ;
    for tId = 1 : nThread
      fprintf(' - Analyzing thread %d/%d..\n', tId, nThread) ;
      url = sprintf('%s%s', baseURL, thread{tId}) ;
      htmlBuffer = urlread(url) ;
      % - Scan question (skipped if the pattern finds nothing).
      question = regexp(htmlBuffer, ...
        '(?<=class="question-body ).*?(?=</div>)', 'match') ;
      if ~isempty(question)
        [tf, msg] = isLikelyUnformatted(question{1}) ;
        if tf
          fprintf(' [<a href="%s">question</a>] %s.\n', url, msg) ;
        end
      end
      % - Scan answers (token 1: element id, token 2: body).
      answer = regexp(htmlBuffer, ...
        '<div id="([^"]+)" class="answer-body">(.*?)</div>', 'tokens') ;
      for cId = 1 : length(answer)
        [tf, msg] = isLikelyUnformatted(answer{cId}{2}) ;
        if tf
          answerUrl = sprintf('%s#%s', url, answer{cId}{1}) ;
          fprintf(' [<a href="%s">answer</a>] %s.\n', ...
            answerUrl, msg) ;
        end
      end
      % - Scan comments (token 1: element id, token 2: body).
      comment = regexp(htmlBuffer, ...
        '<div id="([^"]+)" class="comment-body">(.*?)</div>', 'tokens') ;
      for cId = 1 : length(comment)
        [tf, msg] = isLikelyUnformatted(comment{cId}{2}) ;
        if tf
          commentUrl = sprintf('%s#%s', url, comment{cId}{1}) ;
          fprintf(' [<a href="%s">comment</a>] %s.\n', ...
            commentUrl, msg) ;
        end
      end
    end
  end
end

function [tf, msg] = isLikelyUnformatted(content)
  tf = true ;
  % Eliminate content within <pre>.. and |..| (rendered as <tt>) tags,
  % so we work on what is meant to be text.
  buffer = regexp(content, '<pre.*?</pre>', 'split') ;
  content = [buffer{:}] ;
  buffer = regexp(content, '<tt.*?</tt>', 'split') ;
  content = [buffer{:}] ;
  % Check for a few indicators.
  if ~isempty(regexp(content, '\w:\w', 'ONCE'))
    msg = 'range def. found' ; return ; end
  if ~isempty(regexp(content, '\w\(', 'ONCE'))
    msg = 'function call(s)/def(s) found' ; return ; end
  if ~isempty(regexp(content, '(?<!nbsp);</p>', 'ONCE'))
    msg = '";</p>" found' ; return ; end
  tf = false ;
  msg = '' ;
end
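Because isLikelyUnformatted is a local function of forumCrawler.m, it cannot be called from outside that file. The sketch below restates its three indicators as anonymous functions (hypothetical names, with the <pre>/<tt> stripping step left out) so they can be tried on hand-written HTML fragments. The second sample is the kind of false positive mentioned above: plain text that names a function followed by parentheses.

```matlab
% Sketch: the three indicators of isLikelyUnformatted as anonymous functions.
% Assumes <pre> and <tt> content has already been removed, as in the crawler.
hasRange   = @(s) ~isempty(regexp(s, '\w:\w', 'once'));           % e.g. 1:10
hasCall    = @(s) ~isempty(regexp(s, '\w\(', 'once'));            % e.g. disp(
hasSemiEnd = @(s) ~isempty(regexp(s, '(?<!nbsp);</p>', 'once'));  % ";</p>" not from &nbsp;
looksUnformatted = @(s) hasRange(s) || hasCall(s) || hasSemiEnd(s);

samples = {'<p>x = 1:10;</p>', ...
           '<p>You can call disp(x) for that.</p>', ...
           '<p>Thanks, that works.</p>', ...
           '<p>&nbsp;</p>'};
for k = 1:numel(samples)
  % Print 1 if the fragment is flagged, 0 otherwise.
  fprintf('%d  %s\n', looksUnformatted(samples{k}), samples{k});
end
```

The first two samples are flagged and the last two are not; the lookbehind is what keeps the `&nbsp;</p>` paragraph from matching.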
  4 Comments
Cedric Wannaz
Cedric Wannaz on 8 Aug 2013
I thought about it a little more and, somehow, I wouldn't mind automatically getting an intermediate page when we submit a question (not for comments or answers, only for questions) with a big red message reminding about formatting and a preview of the post. After all, we don't post that many questions, so it wouldn't be annoying.
I think this mechanism is light enough that it wouldn't take MathWorks much time or work to implement.

More Answers (5)

Jan
Jan on 6 Aug 2013
Bump.
It is really tedious to remind so many newcomers in the forum to format their questions. But ignoring the questions due to the lack of readability would reduce the quality of the forum.
Is there really no idea how newcomers could be motivated to apply a proper formatting?
  3 Comments
Cedric Wannaz
Cedric Wannaz on 6 Aug 2013
@Jan: I meant at the MathWorks level, in PHP or whatever language they are using: they could implement a detection based on this list of criteria and display a warning if needed. These criteria would certainly catch most cases of unformatted code (and we don't need 100% accuracy), and implementing them is a matter of writing a few regular expressions.
Also, this mechanism wouldn't prevent a user from submitting an answer/comment; it would just add a warning page displaying a big red message that unformatted code seems to have been detected, and asking the user to either go back or confirm that he/she wants to post the current content.
That said, if it is of any interest, I could probably build a MATLAB-based crawler that detects threads with unformatted code based on the aforementioned list of criteria, yes.

Iain
Iain on 6 Aug 2013
Why not have two textboxes, one for text, and one for code?
  1 Comment
Cedric Wannaz
Cedric Wannaz on 6 Aug 2013
We often mix code with text actually.
If you look at this or this, I wouldn't have been able to manage these answers and comments with text and code split into separate boxes.

Evan
Evan on 6 Aug 2013
Edited: Evan on 6 Aug 2013
Is there any way to have two levels of permissions for editing another user's question? At the moment, assuming there are no users who have been granted privileges prematurely, there are only 15 users capable of editing a question. I would say 50% or less of these users have been very active on these forums over the past month or so.
I understand that editing another user's question is a privilege that has potential for abuse and should therefore be difficult to obtain, but if it were possible to split the permissions in some manner, allowing users with, say, a reputation of 750 or 1000 to use the "format code" feature without modifying the text of a question, would it be worth the effort?
Perhaps it's cynical, but I think it's going to be nearly impossible to get new posters to adhere to the formatting standards. We can put up announcements, add brightly colored text boxes to the "new question" page, and even flag certain keywords, but unless we actually make it impossible to submit a post without formatting those flagged keywords, people will continue submitting giant walls of unformatted code.
And not to hijack this topic, but another feature I would like to see is the ability to move comments and answers for those cases where users don't catch on to the differences between them.
  2 Comments
Jan
Jan on 6 Aug 2013
In other forums I see BBCode for formatting, e.g. [code]fprintf[/code]. If this were recognized here too, the janitors with >500 reputation could be allowed to insert exactly these tags and nothing else. Then selecting the text with the mouse and hitting the "{} Code" button would work without the need to open the message for editing.
Conflicts, e.g. in "function [code] = createCode", are surprisingly rare. But of course, roughly the same number of beginners omit the formatting in the forums I'm talking about, too.
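The [code] idea could be prototyped by translating the tags into the indented block format the forum already understands. A sketch, assuming lowercase, non-nested [code]...[/code] pairs and newline line endings; it is illustrative only and not how any forum software actually works:

```matlab
% Sketch: turn [code]...[/code] segments of a post into blocks indented by
% 2 spaces with a blank line before and after. Assumes non-nested tags.
nl = char(10);
post = sprintf('Try this:[code]x = 1;\ny = x + 1;[/code]Does it help?');

% tok{k}{1} is the code inside the k-th tag pair; rest holds the text
% around the tag pairs, so numel(rest) == numel(tok) + 1.
[tok, rest] = regexp(post, '\[code\](.*?)\[/code\]', 'tokens', 'split', 'dotall');
out = rest{1};
for k = 1:numel(tok)
  block = regexprep(tok{k}{1}, '^', '  ', 'lineanchors');  % indent each line
  out = [out, nl, nl, block, nl, nl, rest{k + 1}]; %#ok<AGROW>
end
disp(out)
```

Using the `'split'` output alongside `'tokens'` keeps the surrounding text intact without a second pass over the post.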

Jan
Jan on 8 Aug 2013
Another simple idea: add some buttons that let the editors insert standard comments: "Please follow the [? Help] link to learn how to..." and "Please consider the suggestions on [how to ask a good question]...".
In addition, an email notification would have to be sent when a comment is posted, too. Perhaps automatic closing would be useful as well.
Then the editors still have to deal with the forgotten formatting, but hitting one button is much less time-consuming.

Jan
Jan on 4 May 2015
And the next bump.
The number of questions with unreadable code is not decreasing. The frequent contributors still waste too much time asking for the "{} Code" format to be applied.
Please, TMW, force newcomers to the forum to read the instructions about code formatting! Remove the captcha (if there is one), but display 3 lines of code and accept the user only if he or she selects them with the mouse and presses the magic button.
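The check behind such a sign-up step could be as small as one regular expression. A sketch, assuming the "{} Code" button indents each selected line by 2 spaces; `isFormatted` is a hypothetical name for this example:

```matlab
% Sketch of the proposed check: the sample counts as formatted only if no
% line starts with zero or one space followed by a non-blank character.
% isFormatted is a hypothetical helper for illustration.
isFormatted = @(txt) isempty(regexp(txt, '^ ?\S', 'lineanchors', 'once'));

sample    = sprintf('a = 1;\nb = a + 2;\ndisp(b)');        % as displayed
submitted = sprintf('  a = 1;\n  b = a + 2;\n  disp(b)');  % after "{} Code"
fprintf('raw sample accepted: %d\n', isFormatted(sample));
fprintf('formatted sample accepted: %d\n', isFormatted(submitted));
```

Empty lines are ignored, and a line indented by a single space still counts as unformatted.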
