CMA_MOMAB

Source code for the article "Covariance Matrix Adaptation for Multiobjective Multiarmed Bandits"
117 Downloads
Updated 19 Jan 2019

Upper confidence bound (UCB) is a successful multiarmed bandit algorithm for regret minimization. The covariance matrix adaptation (CMA) for Pareto UCB (CMA-PUCB) algorithm considers stochastic reward vectors with correlated objectives. We upper bound the cumulative pseudoregret of pulling suboptimal arms for the CMA-PUCB algorithm by a quantity that is logarithmic in the number of arms K, objectives D, and samples n, namely O(ln(nDK) ∑i (||Σi||²/Δi)), using a variant of the Bernstein inequality for matrices, where Δi is the regret of pulling the suboptimal arm i. For unknown covariance matrices between objectives Σi, we upper bound the approximation of the covariance matrix using the number of samples by O(n ln(nDK) + ln²(nDK) ∑i (1/Δi)). Simulations on a three-objective stochastic environment show the applicability of our method.
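
To give a feel for how the two expressions above scale, the following minimal sketch evaluates the terms inside the O(·) for given inputs. It is not taken from the submission's code. The function names are hypothetical, constant factors hidden by the O-notation are ignored, and the Frobenius norm is an assumption made for illustration only, since the abstract does not say which matrix norm ||Σi|| denotes.

```python
import math


def frobenius_norm(matrix):
    """Frobenius norm of a matrix given as a list of rows (assumed norm choice)."""
    return math.sqrt(sum(x * x for row in matrix for x in row))


def known_cov_bound_term(n, D, K, covariances, gaps):
    """Evaluate ln(nDK) * sum_i ||Sigma_i||^2 / Delta_i, ignoring constants.

    covariances: one D x D matrix per suboptimal arm.
    gaps: the regret Delta_i > 0 of each of those arms, in the same order.
    """
    log_term = math.log(n * D * K)
    total = sum(frobenius_norm(s) ** 2 / gap for s, gap in zip(covariances, gaps))
    return log_term * total


def unknown_cov_bound_term(n, D, K, gaps):
    """Evaluate n ln(nDK) + ln^2(nDK) * sum_i 1 / Delta_i, ignoring constants."""
    log_term = math.log(n * D * K)
    return n * log_term + log_term ** 2 * sum(1.0 / gap for gap in gaps)


# Hypothetical setting: 3 objectives, 3 arms, 2 of them suboptimal,
# identity covariance between objectives for both suboptimal arms.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
known = known_cov_bound_term(1000, 3, 3, [identity, identity], [0.5, 0.25])
unknown = unknown_cov_bound_term(1000, 3, 3, [0.5, 0.25])
print(known, unknown)
```

In this sketch, smaller gaps Δi and larger covariance norms both inflate the first term, while the second expression is dominated by the n ln(nDK) term once n is large.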

Cite As

Drugan, Madalina. “Covariance Matrix Adaptation for Multiobjective Multiarmed Bandits.” IEEE Transactions on Neural Networks and Learning Systems, Institute of Electrical and Electronics Engineers (IEEE), 2019, pp. 1–10, doi:10.1109/tnnls.2018.2885123.

MATLAB Release Compatibility
Created with R2018b
Compatible with any release
Platform Compatibility
Windows macOS Linux

Version Published Release Notes
1.0.3

Contains a Readme file

1.0.2

Comparison with uniform sampling
Improved cumulative regret plots

1.0.1

A bug was detected
A plot file is present

1.0.0

To view or report issues in this GitHub add-on, visit the GitHub Repository.