The parameterized top‑K sparse attention mechanism is an efficient approximation of standard scaled dot‑product attention. For each query, it retains only the K largest similarity scores across all keys and sets the rest to $-\infty$. After softmax, this produces strictly sparse attention weights, reducing both memory and computational complexity from $O(N^2)$ to $O(NK)$, where $N$ is the sequence length. The hyperparameter K directly controls the sparsity–accuracy trade‑off.
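The masking step described above can be sketched for a single query as follows. This is a minimal illustration in plain Python, not the submission's MATLAB code; the function name `topk_sparse_attention_weights` and the tie-breaking rule (lower index wins) are assumptions made for the example.

```python
import math


def topk_sparse_attention_weights(scores, k):
    """Softmax over the k largest scores; all other positions get weight 0."""
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and len(scores)")
    # Indices of the k largest scores (assumed tie-break: lower index first).
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    keep = set(order[:k])
    # Non-selected scores are replaced by -inf before the softmax.
    masked = [s if i in keep else -math.inf for i, s in enumerate(scores)]
    # Subtract the maximum for numerical stability; exp(-inf) evaluates to 0.0.
    m = max(masked)
    exps = [math.exp(s - m) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]


weights = topk_sparse_attention_weights([2.0, 0.5, 1.0, -1.0], k=2)
print(weights)  # only positions 0 and 2 are nonzero
```

With `k` equal to the number of keys, the function reduces to an ordinary softmax, which is one way to see how K trades sparsity against fidelity to full attention.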
In the implementation, sparsification relies on a hard threshold mask derived from the top‑K selection. The mask is treated as a constant during backpropagation, so gradients flow only through the selected scores, a formulation that follows the straight‑through estimator approach. The layer can be plugged directly into a dlnetwork and is well‑suited to tasks involving long sequences where standard attention is prohibitively expensive, such as efficient Transformers or resource‑constrained time‑series forecasting.
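To make the gradient behavior concrete, the sketch below computes the gradient of a scalar loss with respect to the scores while holding the top‑K mask fixed. It is an illustration in plain Python under stated assumptions, not the submission's MATLAB code: the helper names are hypothetical, and the loss is assumed to be a weighted sum of the attention weights with coefficients `upstream`.

```python
import math


def topk_indices(scores, k):
    """Indices of the k largest scores (assumed tie-break: lower index first)."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return set(order[:k])


def masked_softmax(scores, keep):
    """Softmax restricted to the indices in keep; other weights are 0."""
    m = max(scores[i] for i in keep)
    exps = [math.exp(s - m) if i in keep else 0.0 for i, s in enumerate(scores)]
    total = sum(exps)
    return [e / total for e in exps]


def score_gradient(scores, upstream, k):
    """Gradient of sum(upstream[i] * weights[i]) w.r.t. scores, mask constant."""
    keep = topk_indices(scores, k)
    p = masked_softmax(scores, keep)
    # Standard softmax backward pass, applied only to the selected entries.
    dot = sum(p[i] * upstream[i] for i in keep)
    return [p[i] * (upstream[i] - dot) if i in keep else 0.0
            for i in range(len(scores))]


grad = score_gradient([2.0, 0.5, 1.0, -1.0], [1.0, 2.0, 3.0, 4.0], k=2)
print(grad)  # entries 1 and 3 (not selected) are exactly 0.0
```

Because the selection itself is not differentiated, a score outside the top K receives no gradient signal on that step, even if it is close to the threshold.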
Cite As
Chuguang Pan (2026). Top-k Sparse Attention Layer (https://www.mathworks.com/matlabcentral/fileexchange/184003-top-k-sparse-attention-layer), MATLAB Central File Exchange. Retrieved .
General Information
- Version 1.0.0 (6.79 KB)
MATLAB Release Compatibility
- Compatible with R2025a to R2026a
Platform Compatibility
- Windows
- macOS
- Linux
| Version | Published | Release Notes |
|---|---|---|
| 1.0.0 | | |
