Write the lines (sentences) of a 1513 x 1 string into separate lines in a text file, keeping the whole length of each sentence, without breaking them.

This is the function I am using: writelines(newstr,"c:\users\lnitz\downloads\tarnowagain.txt")
newstr is a 1513 x 1 string array consisting of whole sentences from a story. Many sentences are quite long.
The text below was two sentences (each on a single line) in the string array, but it appears broken into four lines in the writelines output.
Sie hatte keine Kinder, und mir wurde daher
der ganze Reichthum ihres Herzens an Liebe.
Seit meinem zehnten Jahre war sie mir Pflegerin
und Erzieherin, und mein Herz liebte und ehrte sie wie eine zweite Mutter.
My ideal is to have either a text file with one sentence per line or a spreadsheet with one sentence in the first cell of each row. This would be input to a text program that focuses on sentences.

Accepted Answer

Samay Sagar on 25 Apr 2024 at 9:36
Edited: Samay Sagar on 28 Apr 2024 at 4:51
The issue likely stems from the contents of “newstr” itself. If the sentences in “newstr” contain newline characters, those characters are written to the file as they are, so a single sentence ends up spread over several lines. You can use the “replace” function to remove the newline characters.
To make sure each sentence is identified as a separate element, you can use MATLAB's “regexp” function to split the text into sentences.
Here is how you can implement this:
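Before removing anything, it may help to confirm that the elements really do contain line breaks. The snippet below is a minimal sketch: the two-element array is a hypothetical stand-in for the real 1513 x 1 “newstr”, and the variable names hasLF, hasCR and affected are made up for illustration.

```matlab
% Hypothetical stand-in for the 1513 x 1 string array from the question.
first  = "Sie hatte keine Kinder, und mir wurde daher" + newline + ...
         "der ganze Reichthum ihres Herzens an Liebe.";
second = "Ein Satz ohne Umbruch.";
newstr = [first; second];

% Logical masks: true where an element contains the character in question.
hasLF = contains(newstr, newline);     % line feed, char(10)
hasCR = contains(newstr, char(13));    % carriage return, common in Windows text

fprintf("%d of %d elements contain a line feed.\n", nnz(hasLF), numel(newstr));
fprintf("%d of %d elements contain a carriage return.\n", nnz(hasCR), numel(newstr));

% Indices of the affected elements, for closer inspection.
affected = find(hasLF | hasCR);
```

If both counts are zero, the line breaks in the output file are coming from somewhere else (for example, the viewer wrapping long lines), and removing newline characters will not change anything.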
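newstr = "This is a sentence. Here's another one! And yet another one?";
% Remove line feed characters, char(10)
newstr = replace(newstr, newline, "");
% Remove literal backslash-n pairs ("\n" is two characters in a MATLAB string)
newstr = replace(newstr, "\n", "");
% Match from a non-whitespace character up to the next '.', '!' or '?'
sentences = regexp(newstr, '\S.*?[\.\!\?]', 'match');
sentencesStrArray = string(sentences);
filePath = 'sentences.txt';
% Write one sentence per line
writelines(sentencesStrArray, filePath);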
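If “newstr” already holds one sentence per element, as in the question, a variation is to clean each element in place and skip the sentence splitting. The sketch below assumes that the line breaks fall between words, so it replaces them with a space instead of deleting them (deleting would join "daher" and "der" into one word). The sample data, the variable name cleaned and the file name are illustrative only.

```matlab
% Hypothetical stand-in for the 1513 x 1 string array from the question.
s1 = "Sie hatte keine Kinder, und mir wurde daher" + newline + ...
     "der ganze Reichthum ihres Herzens an Liebe.";
s2 = "Seit meinem zehnten Jahre war sie mir Pflegerin" + newline + ...
     "und Erzieherin, und mein Herz liebte und ehrte sie wie eine zweite Mutter.";
newstr = [s1; s2];

% Replace each run of line breaks (CR, LF, or CRLF), plus any whitespace
% around it, with one space so the words on either side stay separated.
cleaned = regexprep(newstr, "\s*[\r\n]+\s*", " ");

% Trim whitespace left over at the start or end of an element.
cleaned = strtrim(cleaned);

% Each element of the array becomes one line of the file.
filePath = "sentences_one_per_line.txt";   % illustrative file name
writelines(cleaned, filePath);
```

This keeps the 1513 elements exactly as they are apart from the line breaks, so it does not depend on punctuation to find sentence boundaries.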
To store your sentences in a spreadsheet, you can use “cellstr” to convert the sentences into a cell array and then write it to a CSV file.
sentencesCellArray = cellstr(sentences)';
% Specify the CSV file path
csvFilePath = 'sentences.csv';
% Write the cell array to the CSV file
writecell(sentencesCellArray, csvFilePath);
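As an alternative sketch for the spreadsheet case, the sentences can be placed in a one-column table and written to an Excel workbook with “writetable”, which avoids having to think about how commas inside a sentence are handled in a CSV file. The sample sentences, the variable name "Sentence" and the file name are assumptions made for this example.

```matlab
% Illustrative sentences; in practice this would be the cleaned string array.
sentences = ["This is a sentence."; ...
             "Here's another one, with a comma!"; ...
             "And yet another one?"];

% Put the sentences in a one-column table. The variable name "Sentence"
% is an arbitrary choice for this sketch.
T = table(sentences, 'VariableNames', {'Sentence'});

% Write the column to an Excel workbook without a header row, so that
% row k of the sheet holds sentence k in its first cell.
xlsxPath = 'sentences.xlsx';               % illustrative file name
writetable(T, xlsxPath, 'WriteVariableNames', false);
```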
For more information about “regexp”, “cellstr” and “writecell”, refer to the MATLAB documentation.
Hope this helps!
  1 Comment
Stephen23 on 28 Apr 2024 at 4:22
Edited: Stephen23 on 28 Apr 2024 at 4:25
Note that the superfluous type conversion using CELLSTR is easily avoided using WRITELINES instead of WRITECELL.
The comment "Removes all types of newline characters" is incorrect because NEWLINE is defined to be exactly one character: char(10). Nothing else. Although it could be argued that there are different types of newline character, NEWLINE returns exactly one of them: char(10).
The code commented "For explicit \n characters" is of unclear benefit.
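The two points above can be checked directly. This is a small sketch; the variable names lf, literal, interpreted, sample and result are made up for illustration.

```matlab
% newline returns a single character: the line feed, code 10.
lf = newline;
disp(double(lf));

% In a double-quoted MATLAB string, "\n" is not an escape sequence; it is
% the two characters backslash and n.
literal = "\n";
disp(strlength(literal));

% compose (like sprintf) does interpret the escape and yields a line feed.
interpreted = compose("\n");
disp(interpreted == string(newline));

% So replace(str, "\n", "") only removes a literal backslash-n pair and
% leaves real line feeds untouched.
sample = "first" + newline + "second\nthird";
result = replace(sample, "\n", "");
disp(contains(result, newline));
```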
Also note that assuming periods end sentences is not always correct (but it might be for that data set).
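A short illustration of that last caveat, using the same pattern as in the answer on a made-up sample with abbreviations. The sample text is hypothetical and contains two sentences.

```matlab
% Two sentences, but several periods that do not end a sentence.
sample = "Dr. Meyer arrived at 3 p.m. on Friday. He left early.";

% Same pattern as in the answer: lazily match up to the next '.', '!' or '?'.
pieces = regexp(sample, '\S.*?[\.\!\?]', 'match');

% Show each piece on its own line, then compare the count with the
% number of real sentences.
disp(pieces');
fprintf("%d pieces found for 2 sentences.\n", numel(pieces));
```

Every period counts as a boundary here, so the abbreviations are split off as separate pieces. If the data already has one sentence per element, cleaning each element and writing it with WRITELINES avoids this problem entirely.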


Release

R2024a
