fastBERTtokens: Tokenizing for BERT in parallel

This function divides your text into batches and tokenizes them in parallel, providing a significant speed-up over single-threaded tokenization.


Function to use the MATLAB BERT tokenizer in parallel
This function divides your text into batches and tokenizes them in parallel. Because the MATLAB tokenizer is very slow when run on a single processor for large data sets, this provides a significant speed-up: on an i7-10875H laptop with 8 logical cores, tokenizing 76k sentences takes about 100 seconds.
Also note that passing in the MATLAB BERT model is important, as different BERT models use different encodings for the special BERT tokens such as [SEP].
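The batch-parallel approach described above can be sketched as follows. This is an illustrative sketch only, not the submission's actual implementation: the function name `fastBERTtokensSketch` is hypothetical, and it assumes a pretrained model loaded via `bert()` from the Transformer Models package, whose tokenizer is encoded with `encode(mdl.Tokenizer, …)`.

```matlab
% Illustrative sketch of batch-parallel BERT tokenization.
% Assumes the Transformer Models package (bert, encode) and
% Parallel Computing Toolbox (gcp, parfor) are available.
function tokens = fastBERTtokensSketch(txt, mdl)
    % One batch per parallel worker
    nBatches = max(1, gcp().NumWorkers);
    sizes    = diff(round(linspace(0, numel(txt), nBatches + 1)));
    batches  = mat2cell(txt(:), sizes);
    out = cell(nBatches, 1);
    parfor b = 1:nBatches
        % Tokenize one batch with the model's own tokenizer, so the
        % special tokens ([CLS], [SEP], ...) match this BERT variant
        out{b} = encode(mdl.Tokenizer, batches{b});
    end
    tokens = vertcat(out{:});
end
```

Usage would then look like `mdl = bert(); tokens = fastBERTtokensSketch(sentences, mdl);`, where `sentences` is a string array of documents.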

Cite As

Ralf Elsas (2026). fastBERTtokens: Tokenizing for BERT in parallel (https://www.mathworks.com/matlabcentral/fileexchange/125295-fastberttokens-tokenizing-for-bert-in-parallel), MATLAB Central File Exchange. Retrieved .

Acknowledgements

Inspired by: Transformer Models

General Information

MATLAB Release Compatibility

  • Compatible with R2021a and later releases

Platform Compatibility

  • Windows
  • macOS
  • Linux
Version History

  • 1.0.0