How do I insert punctuation in unpunctuated text?

Hi all,
I am currently working on a very fun project, but unfortunately came across a problem I haven't been able to solve for some time. What I am trying to do is punctuate text that contains no punctuation. At the time of writing I have a lot of text files that contain proper punctuation, and matching text files without punctuation.
Initially, I thought that MATLAB would have a neural network that I could train with the input and output files I have, but unfortunately it does not.
Therefore, I am reaching out in the hope that someone can help me punctuate unpunctuated text.

8 Comments

Please give us a small example.
You would not remove the punctuation in your case -- but you could convert them into tokens.
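
To illustrate the token idea, here is a minimal sketch in Python (the thread contains no code, so the language is my choice and the logic carries over to MATLAB). The token names such as `<COMMA>` and the helper `punctuation_to_tokens` are made up for illustration, and only four marks are handled:

```python
import re

# Hypothetical token names; any strings that cannot occur as ordinary words would do.
PUNCT_TOKENS = {",": "<COMMA>", ".": "<PERIOD>", "?": "<QUESTION>", "!": "<EXCLAIM>"}

def punctuation_to_tokens(text):
    """Replace each punctuation mark with a standalone token and lowercase the words."""
    # Words may contain apostrophes (what's, wasn't); each mark is matched on its own.
    pieces = re.findall(r"[A-Za-z']+|[,.?!]", text)
    return [PUNCT_TOKENS.get(piece, piece.lower()) for piece in pieces]

sample = "What's happened to me? he thought. It wasn't a dream."
tokens = punctuation_to_tokens(sample)
print(" ".join(tokens))
```

With the punctuation kept as tokens, the punctuated files alone are enough to derive both the input (tokens dropped) and the target (tokens kept).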

@Jan Simon:

The input is a txt file without punctuation:

What s happened to me he thought it wasn t a dream his room a proper human room although a little too small lay peacefully between its four familiar walls a collection of textile samples lay spread out on the table samsa was a travelling salesman and above it there hung a picture that he had recently cut out of an illustrated magazine and housed in a nice gilded frame it showed a lady fitted out with a fur hat and fur boa who sat upright raising a heavy fur muff that covered the whole of her lower arm towards the viewer

The output has to be the input with punctuation:

What's happened to me? he thought. It wasn't a dream. His room, a proper human room although a little too small, lay peacefully between its four familiar walls. A collection of textile samples lay spread out on the table - Samsa was a travelling salesman - and above it there hung a picture that he had recently cut out of an illustrated magazine and housed in a nice, gilded frame. It showed a lady fitted out with a fur hat and fur boa who sat upright, raising a heavy fur muff that covered the whole of her lower arm towards the viewer.

@Walter Roberson:
I will take a look at the link provided.
I have no idea how to solve this. It seems a very difficult problem, particularly as there seem to be many valid ways of applying punctuation to the given sample, e.g.:
"What's happened to me?" he thought. It wasn't a dream.
What's happened to me? He thought it wasn't a dream.
@Guillaume,
I want to train a Neural Network to punctuate texts. With enough training it should be able to do this, and I have tons of ebooks to train the NN with.
No neural network is going to be able to say which of the following is more correct:
What's happened to me? He thought. It wasn't a dream.
"What's happened to me?", he thought. It wasn't a dream.
What's happened to me? He thought it wasn't a dream.
Without a ton of context no human can do that either. And even with context, it can still be ambiguous.
The traditional example:
Eats shoots and leaves. (Panda)
Eats, shoots, and leaves. (Gunman)
I don't need it to say whether the punctuation is correct. All it needs to do is put some punctuation in. I will then build a table of the inserted punctuation marks and compare it with a table I've made from the original text.
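
A rough sketch of such a comparison, in Python for illustration only. It assumes both versions have already been reduced to one label per word (the mark that follows the word, or an empty string), so the two sequences line up. The function name `punctuation_scores` is hypothetical:

```python
from collections import Counter

def punctuation_scores(reference, predicted):
    """Per-mark (precision, recall) for two equal-length label sequences.

    Each sequence holds one label per word: the mark that follows the word,
    or "" when nothing follows it.
    """
    if len(reference) != len(predicted):
        raise ValueError("sequences must cover the same words")
    hits, ref_counts, pred_counts = Counter(), Counter(), Counter()
    for ref, pred in zip(reference, predicted):
        if ref:
            ref_counts[ref] += 1
        if pred:
            pred_counts[pred] += 1
        if ref and ref == pred:
            hits[ref] += 1
    scores = {}
    for mark in sorted(set(ref_counts) | set(pred_counts)):
        precision = hits[mark] / pred_counts[mark] if pred_counts[mark] else 0.0
        recall = hits[mark] / ref_counts[mark] if ref_counts[mark] else 0.0
        scores[mark] = (precision, recall)
    return scores

# Words: what's happened to me he thought it wasn't a dream
reference = ["", "", "", "?", "", ".", "", "", "", "."]  # "... me? he thought. It wasn't a dream."
predicted = ["", "", "", "?", "", "", "", "", "", "."]   # "... me? He thought it wasn't a dream."
scores = punctuation_scores(reference, predicted)
print(scores)
```

Because several punctuations of the same sentence can be valid, a low score against one reference does not necessarily mean the output is wrong.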
I've heard that this problem is better suited to a deep LSTM, but I have no idea how to implement this in MATLAB. Do you know how to do it?
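
One common way to frame this for a recurrent network such as an LSTM is sequence labeling: every word gets a label naming the punctuation mark that follows it, and the network learns to predict the label sequence from the word sequence. The sketch below only prepares such training pairs from punctuated text and does not build or train a network. It is written in Python for illustration, and the helper name `words_and_labels` and the set of four marks are assumptions:

```python
import re

PUNCT = set(",.?!")  # assumed label set; extend as needed

def words_and_labels(text):
    """Split punctuated text into lowercase words and, per word, the mark that follows it."""
    words, labels = [], []
    for piece in re.findall(r"[A-Za-z']+|[,.?!]", text):
        if piece in PUNCT:
            if words:
                labels[-1] = piece  # attach the mark to the preceding word
        else:
            words.append(piece.lower())
            labels.append("")  # "" means no punctuation after this word
    return words, labels

text = ("It wasn't a dream. His room, a proper human room although a little "
        "too small, lay peacefully between its four familiar walls.")
words, labels = words_and_labels(text)
for word, label in zip(words, labels):
    print(word, repr(label))
```

The `words` list is the network input and `labels` is the target. Capitalization and paired marks such as quotes or dashes would need extra labels or a separate step.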


Answers (0)


Asked:

r r
on 22 Jan 2018

Commented:

r r
on 22 Jan 2018
