Rethinking Line Length in Scientific / MATLAB Code
Many widely cited code style guides originate from large-scale software engineering contexts: multi-developer teams, large codebases, separate reviewers, and tooling-driven workflows. While those constraints are valid in their domain, they often map poorly onto scientific and engineering scripting as it is typically practiced with MATLAB.
In laboratory and engineering environments, code serves a different role. It is frequently written by individuals or small groups, and then iteratively modified, copied, adapted, and extended as part of an evolving problem-solving process. In this context, the primary priorities are not strict stylistic consistency or tooling compatibility, but rather:
- maintaining clarity of underlying structure,
- minimizing the risk of errors during modification, and
- supporting rapid comprehension of mathematically or logically dense code.
This raises the question: should fixed line-length limits be replaced by context-aware principles? Could these be supported by a suitable AI tool?
The following proposal outlines a small set of heuristics governing line length, based on observations of real-world MATLAB usage, particularly for numerically intensive and structurally rich code. These heuristics aim to:
- preserve and expose meaningful structure (e.g. systems of equations, tables, repeated patterns),
- avoid formatting that obscures relationships or introduces errors, and
- treat different kinds of code (logic vs. data vs. structured expressions) appropriately.
Scope
These principles apply to scientific and engineering scripting, particularly:
- MATLAB-like environments
- numerically or structurally dense code
- monolithic or semi-monolithic workflows
- code that is frequently modified, copied, and adapted
They are not intended for large-scale commercial software engineering, where different constraints dominate.
Core Objective
Line length and formatting should maximize comprehension, structural clarity, and correctness under modification, rather than enforce arbitrary limits.
Hierarchy of Heuristics
Higher-numbered heuristics take precedence over lower-numbered ones.
1) Reasonable Line Length
Code intended for reading should use a reasonable line length, guided by:
- human visual comprehension when scanning
- clarity of expression
- preservation of logical units
In practice this tends toward 70-100 characters per line, depending on the density of the code.
2) Preserve Semantic Integrity of Lines
Line breaks must not split code in ways that degrade understanding.
Avoid:
- dangling fragments
- very short continuation lines
- separation of tightly coupled elements
- etc.
Prefer:
- keeping logically cohesive expressions intact
- breaking only at clear structural boundaries
One slightly longer line is preferable to two poorly structured lines.
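As a minimal sketch of this heuristic, the two assignments below compute the same quantity with different line breaks. The pendulum-style variables (m, g, L, c, theta, omega) are hypothetical and chosen only for illustration.

```matlab
% Hypothetical parameters, for illustration only.
m = 1.2;  g = 9.81;  L = 0.5;  c = 0.05;
theta = 0.3;  omega = 1.1;

% Poor: the breaks split tightly coupled factors and leave a dangling fragment.
alphaPoor = -(g / L) * sin( ...
    theta) - (c / (m * L^2)) * ...
    omega;

% Better: a single break at the boundary between the two physical terms.
alphaGood = -(g / L) * sin(theta) ...
            - (c / (m * L^2)) * omega;

% Both layouts evaluate the same expression; only the formatting differs.
assert(alphaPoor == alphaGood)
```

If the expression fits within roughly 100 characters, keeping it on one line would arguably be better still.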
3) Treat Data as Data (Not Prose/Code)
Code that primarily represents data rather than logic is not intended for sequential reading.
This includes:
- large numeric vectors
- lookup tables
- pasted datasets
- etc.
Such code:
- may exceed line length limits without restriction
- should prioritize density and structural stability
- is assumed to be accessed via search or indexing rather than visual parsing
Readability is not the objective; retrievability and integrity are.
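A small sketch of what this could look like, assuming a hypothetical pasted vector of calibration temperatures (calTemp is not from the original post):

```matlab
% Hypothetical calibration temperatures, pasted as-is on one long line (degC).
calTemp = [-40 -30 -20 -10 0 10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 180 190 200];

% Integrity check: a cheap guard against a dropped or duplicated value.
assert(numel(calTemp) == 25)

% The data are reached by indexing or search, not by reading the line.
idx = find(calTemp == 100);
```

The length check is one inexpensive way to protect the integrity of a line that nobody is expected to read by eye.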
4) Preserve and Expose 2D Structure
If code encodes a logical, mathematical, or tabular structure with inherent spatial relationships, it should be represented accordingly.
This includes:
- systems of equations
- tabulated data
- repeated structured expressions
- etc.
Requirements:
- alignment should be used where it improves comprehension
- patterns should be visually apparent
- deviations from patterns should be easily detectable
This principle should be applied strongly, tending toward mandatory use where feasible.
Exception
If a structure would become impractically wide, a compromise representation may be used.
Breaking meaningful spatial structure is considered harmful to comprehension and correctness.
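For illustration, here is a sketch with a hypothetical 3x3 linear system. The aligned form mirrors the equations it encodes, while the flattened form in the comment holds the same values with the row and column relationships hidden.

```matlab
% Hypothetical system A*x = b, for illustration only:
%
%    4*x1 -   x2          =  7
%     -x1 + 4*x2 -   x3   =  2
%         -   x2 + 4*x3   = 11

% Flattened alternative (same values, structure hidden):
%   A = [4 -1 0; -1 4 -1; 0 -1 4];  b = [7; 2; 11];

% Aligned: the tridiagonal pattern and the symmetry are visible at a glance.
A = [  4  -1   0
      -1   4  -1
       0  -1   4 ];

b = [  7
       2
      11 ];

x = A \ b;   % solve the system
```

A wrong sign or a misplaced zero in the aligned matrix breaks the visible diagonal pattern, which is the point of the layout.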
5) Preserve Structural Consistency Across Similar Code
Code segments representing similar or related logic should be expressed in consistent structure and layout.
This applies to:
- repeated formulas
- analogous computations
- structurally similar transformations
- etc.
Consistency enables:
- rapid comparison
- detection of inconsistencies
- safer modification
Similar logic should be represented in similar ways.
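A brief sketch of the idea, using hypothetical rectangular beam sections (the names b1, h1, I1 and so on are assumptions made for the example):

```matlab
% Hypothetical section widths and heights in metres.
b1 = 0.10;  h1 = 0.20;
b2 = 0.12;  h2 = 0.25;
b3 = 0.15;  h3 = 0.30;

% Inconsistent layout (left as comments): the wrong exponent is easy to miss.
%   I1 = b1*h1^3/12;
%   I2 = (1/12) * h2^3 * b2;
%   I3 = b3 * h3^2 / 12;

% Consistent layout: the three lines can be compared column by column.
I1 = b1 * h1^3 / 12;
I2 = b2 * h2^3 / 12;
I3 = b3 * h3^3 / 12;
```

With the consistent layout, an exponent of 2 in the third line would stand out directly beneath the two 3s above it.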
Meta-Principles
A. Structure Over Style
Line lengths should reflect the underlying structure of the problem, not conform to arbitrary limits.
B. Correctness Over Convention
Avoid line lengths and formatting that:
- obscure patterns
- hide inconsistencies
- increase the risk of modification errors
C. Optimize for Modification
Code in this domain is frequently:
- edited
- duplicated
- adapted for n
- extended
- commented out for testing different versions
- etc.
Line lengths should reduce the likelihood of errors during these operations, for example by keeping atomic concepts on the same line rather than splitting them up.
D. Anomaly Visibility
Formatting should make unexpected deviations immediately visible.
E. Tool Support
An intelligent tool should:
- respect and preserve structural layout
- avoid rigid line-length enforcement
- detect patterns and inconsistencies
- assist rather than constrain the programmer
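To make the idea concrete, here is a very rough sketch of a check that reports long lines but exempts lines that look like pure data. The sample lines, the soft limit of 100, and the regular-expression test for "data" are all assumptions made for illustration, not a description of any existing tool.

```matlab
% Hypothetical source lines to check, held as a cell array of char vectors.
src = { ...
    'x = linspace(0, 1, 101);', ...
    ['lut = [' repmat('0.125 ', 1, 40) '];'], ...
    ['y = ' repmat('a1 + ', 1, 30) 'a1;'] };

softLimit = 100;                   % assumed soft limit, not a fixed rule
flagged   = false(1, numel(src));

for k = 1:numel(src)
    txt = src{k};

    % Crude data test: a bracketed literal holding only digits, spaces,
    % separators, signs, and exponent markers.
    body   = regexp(txt, '\[(.*)\]', 'tokens', 'once');
    isData = ~isempty(body) && ~isempty(regexp(body{1}, '^[0-9 .,;eE+-]*$', 'once'));

    % Long data lines are exempt; long logic lines are reported, not rewritten.
    flagged(k) = numel(txt) > softLimit && ~isData;
    if flagged(k)
        fprintf('Line %d: %d characters, consider restructuring.\n', k, numel(txt));
    end
end
```

In this sample only the third line is reported: the lookup table is longer but is treated as data. A real tool would need a far better notion of structure than a single regular expression.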
I would be interested to hear how well these ideas match others’ experience, particularly in scientific or engineering workflows.
8 Comments
Looking at each of your points in turn:
Reasonable Line Length
I agree with this. Just as run-on sentences in a novel can be annoying, run-on lines of code can be equally difficult to interpret (for humans; MATLAB doesn't care as much.) I try to stick to the width of my screen (or half of it, if I have the Editor and Command Window side-by-side as I often do) so that I don't have to scroll to read the last few characters of a command. I do this when answering questions in MATLAB Answers as well; having two characters of line length force the Answers box to include a scroll bar at the end is mildly annoying to me.
Preserve Semantic Integrity of Lines
Often lines tend to get long when multiple name-value arguments are included in a function call or when a function has positional arguments that are naturally paired (like lb and ub as lower and upper bounds for a "function function".) In that case, I'll split the line with a line continuation (...) after each name-value argument or after a pair of name-value arguments (if they're short enough to fit in a reasonable length) or after the pair of positional arguments (avoiding lb on one line and ub on the next.)
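The splitting described here might look like the following sketch. It uses the base MATLAB integral function, whose two integration limits form a natural positional pair; the integrand and the tolerance values are hypothetical.

```matlab
% Hypothetical integrand, for illustration only.
f = @(t) exp(-t.^2);

% One long line: the paired limits and the options run together.
q1 = integral(f, 0, 2, 'RelTol', 1e-10, 'AbsTol', 1e-13, 'ArrayValued', false);

% Split after the paired limits, then one name-value argument per line.
q2 = integral(f, 0, 2, ...
              'RelTol',      1e-10, ...
              'AbsTol',      1e-13, ...
              'ArrayValued', false);
```

Both calls are identical to MATLAB. The second keeps the lower and upper limits together and makes each option easy to edit or comment out on its own.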
Treat Data as Data (Not Prose/Code)
I disagree slightly with one aspect of this.
The aspect I disagree with is the first part of your thought "Readability is not the objective; retrievability and integrity are." If inlining the data affects the readability of the code, it shouldn't be inlined IMO.
If you have a large block of data, rather than hard-coding it in the middle of the flow of the code I'd either move it to a separate file (see for example the bench.dat data file used by the bench function) or to a local function in the file (the defaultspy local function inside the spy function.)
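A minimal sketch of the separate-file option, assuming R2019a or later for writematrix and readmatrix. The file name and the table values are hypothetical, and the file is created here only so that the example is self-contained; in practice it would already exist next to the code.

```matlab
% Hypothetical calibration table: temperature (degC), correction factor.
calTable = [ -40   0.875
               0   1.000
              40   1.125
              80   1.250 ];

% Store the data outside the code (done once, or by whoever owns the data).
dataFile = fullfile(tempdir, 'cal_table_example.csv');   % hypothetical name
writematrix(calTable, dataFile);

% The analysis code then needs only one short line to obtain the table.
calLoaded = readmatrix(dataFile);
```

The flow of the main code stays readable, and the data file can be inspected or replaced without touching the logic.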
Preserve and Expose 2D Structure
I agree, as this can avoid problems with accidentally introduced mismatched line lengths. But if you're adding in a lot of extra spaces (because your data is very large and contains mixtures of floating-point and integer values, positive and negative numbers) I'd start asking "Should this be in a separate file, either human-readable or MAT-file?" as per my opinion on your third point.
Preserve Structural Consistency Across Similar Code
I'd take this a step further. If you're typing the same formula (or the same formula with very small modifications) repeatedly, I wouldn't just make them look similar. I'd consider encapsulating the implementation of the formula in a function (anonymous, local, or stand-alone) with potentially additional input arguments to reflect the minor differences. For instance, if I wanted to compute x^2+n*x+1 for various values of n, I would create an anonymous function:
f = @(x, n) x.^2+n.*x+1;
so that I could use it repeatedly. If one of the uses of it required a unary function (just the x input) that's easy enough to do with a binding pattern:
g = @(x) f(x, 2); % Bind n = 2
Next dimension
I find that line length (width) is less of an issue when reviewing code than the number of lines in a function or code section (height). Much of the code I write for work is testing code, and I try where possible to keep the test methods in my test classes short enough that they can fit on one screen. Scrolling to look back at where/how a particular variable was defined isn't that disruptive to my thought process when developing, but it's still a little more disruptive than simply looking earlier on the same screen.
I think your approach is excellent, Stephen. Well done!
On line length, specifically in conjunction with clarity.
My typical coding pattern (as will have been observed in the Answers forum) is to comment at the end of lines rather than in line, and with the advent of modern high-resolution monitors the days of the 80-column card length are long gone.
Consequently, while code lines are generally fairly short, the overall line length I'm comfortable with (even at an advanced age with much less acute vision) is on the order of 120 characters.
How would your algorithms fit with the above model? (Ignoring line length entirely for data is pure genius; that is where imposing any arbitrary line length, with the auto code folding that most editors currently implement, is wrongheaded.)
This is a terrific set of standards. Would they be possible as settings in your IDE, where the underlying code would not have to necessarily comply (based on writer or viewer preference), but the output in the viewer could comply if set?
I heartily agree with these recommendations.