Top-k Sparse Attention Layer

A top-k sparse attention layer implemented with Deep Learning Toolbox, using the custom deep learning layer template.

Chuguang Pan

Version 1.0.0 (6.79 KB)

2 Jun 2026

The parameterized top-K sparse attention mechanism is an efficient approximation of standard scaled dot-product attention. For each query, it retains only the K largest similarity scores across all keys and sets the rest to $-\infty$. After softmax, the masked scores receive exactly zero weight, so the attention weights are strictly sparse. For a sequence of length $N$, this reduces both memory and computational complexity from $O(N^2)$ to $O(NK)$. The hyperparameter K directly controls the sparsity-accuracy trade-off.
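The masking step can be written out explicitly. The notation here is assumed for illustration and is not taken from the submission: $q_i$ and $k_j$ are query and key vectors of dimension $d$, $v_j$ are value vectors, and $\mathcal{T}_i$ is the set of the K key indices with the largest scores for query $i$.

```latex
s_{ij} = \frac{q_i^{\top} k_j}{\sqrt{d}}, \qquad
\tilde{s}_{ij} =
\begin{cases}
  s_{ij}, & j \in \mathcal{T}_i \\
  -\infty, & j \notin \mathcal{T}_i
\end{cases}
\qquad
a_{ij} = \frac{\exp(\tilde{s}_{ij})}{\sum_{l} \exp(\tilde{s}_{il})}, \qquad
y_i = \sum_{j \in \mathcal{T}_i} a_{ij}\, v_j
```

Because $\exp(-\infty) = 0$, every weight $a_{ij}$ with $j \notin \mathcal{T}_i$ is exactly zero, and each output $y_i$ is a weighted sum of only K value vectors.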

In the implementation, sparsification relies on a hard threshold mask derived from the top-K selection. The mask is treated as a constant rather than differentiated, so gradients flow only through the selected scores, a formulation that follows the straight-through estimator approach.

The layer can be plugged directly into a dlnetwork. It is well suited to tasks involving long sequences where standard attention is prohibitively expensive, such as efficient Transformers or resource-constrained time-series forecasting.
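The forward computation described above can be sketched on plain numeric matrices. This is a minimal illustration under stated assumptions, not the code shipped with the submission: the variable names, the shapes, and the use of `maxk` to find the top-K indices are all choices made here, and the sketch omits the custom-layer class, learnable projections, and `dlarray` handling.

```matlab
% Minimal sketch of top-K sparse attention on plain matrices.
% Assumed shapes (hypothetical, not taken from the submission):
%   Q: nQ-by-d queries, Kmat: nK-by-d keys, V: nK-by-dv values.
rng(0);
nQ = 4; nK = 6; d = 8; dv = 5; topK = 2;
Q    = randn(nQ, d);
Kmat = randn(nK, d);
V    = randn(nK, dv);

% Scaled dot-product scores, one row per query.
S = (Q * Kmat.') / sqrt(d);

% Column indices of the topK largest scores in each row.
[~, idx] = maxk(S, topK, 2);

% Logical mask that is true only at the selected positions.
rows = repmat((1:nQ).', 1, topK);
mask = false(nQ, nK);
mask(sub2ind([nQ, nK], rows, idx)) = true;

% Unselected scores become -Inf so that softmax assigns them weight 0.
S(~mask) = -Inf;

% Row-wise softmax, shifted by the row maximum for numerical stability.
E = exp(S - max(S, [], 2));
A = E ./ sum(E, 2);

% Attention output: each row mixes only topK value vectors.
Y = A * V;
```

Each row of `A` then has `topK` nonzero entries that sum to 1. The mask is built from integer indices, so it carries no gradient of its own, which matches the description of the mask as a constant.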

Cite As

Chuguang Pan (2026). Top-k Sparse Attention Layer (https://www.mathworks.com/matlabcentral/fileexchange/184003-top-k-sparse-attention-layer), MATLAB Central File Exchange. Retrieved June 27, 2026.

MATLAB Release Compatibility

Compatible with R2025a to R2026a

Platform Compatibility

Windows
macOS
Linux

Version	Published	Release Notes
1.0.0	2 Jun 2026	-