Network routing with multi-armed bandit algorithms


In this project, I used bandit algorithms to solve the routing problem for the graph below:

Algorithms

To solve this task, I used several multi-armed bandit algorithms, including UCB and epsilon-greedy. Each route in the network is treated as one arm of a multi-armed bandit, and each arm yields its own stochastic reward because of the delays that may occur along the route.
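To make the route-as-arm idea concrete, here is a minimal sketch of both selection rules. It is not the project's actual code: the three routes, their mean delays, the Gaussian noise, the reward defined as negative delay, and the parameter values (`epsilon=0.1`, `c=2.0`) are all assumptions chosen for illustration, and the UCB rule shown is one common variant (estimate plus `c * sqrt(ln t / n)`), since the exact variant is not stated here.

```python
import math
import random

# Hypothetical mean delays (arbitrary units) for three candidate routes.
ROUTE_MEAN_DELAY = [5.0, 3.0, 4.0]


def sample_reward(route, rng):
    """Assumed reward model: negative noisy delay, so faster routes score higher."""
    delay = max(0.0, rng.gauss(ROUTE_MEAN_DELAY[route], 1.0))
    return -delay


def epsilon_greedy(values, rng, epsilon=0.1):
    """With probability epsilon pick a random route, otherwise the best estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(values))
    return max(range(len(values)), key=lambda a: values[a])


def ucb(counts, values, t, c=2.0):
    """Pick the route with the highest estimate plus exploration bonus."""
    for a, n in enumerate(counts):
        if n == 0:
            return a  # try every route once before using the bonus
    return max(
        range(len(values)),
        key=lambda a: values[a] + c * math.sqrt(math.log(t) / counts[a]),
    )


def run(policy, steps=2000, seed=0):
    """Simulate one run and return per-route pull counts and reward estimates."""
    rng = random.Random(seed)
    k = len(ROUTE_MEAN_DELAY)
    counts = [0] * k
    values = [0.0] * k
    for t in range(1, steps + 1):
        if policy == "ucb":
            a = ucb(counts, values, t)
        else:
            a = epsilon_greedy(values, rng)
        r = sample_reward(a, rng)
        counts[a] += 1
        # Incremental sample-average update of the route's estimated reward.
        values[a] += (r - values[a]) / counts[a]
    return counts, values


if __name__ == "__main__":
    for policy in ("epsilon_greedy", "ucb"):
        counts, values = run(policy)
        print(policy, counts, [round(v, 2) for v in values])
```

Under these assumptions, both policies should end up sending most traffic over the route with the lowest mean delay (index 1 in this toy setup), while still sampling the other routes occasionally.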

Results

Below you can see the average reward for the epsilon-greedy algorithm and the percentage of optimal action selection for the UCB and epsilon-greedy algorithms:

Average reward for the epsilon-greedy algorithm

Percentage of optimal action selection for UCB and epsilon-greedy
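For readers who want to see how these two curves are typically computed, the sketch below averages per-step reward and optimal-route selection over many independent epsilon-greedy runs. The mean rewards, noise model, number of runs, and step count are hypothetical and are not the settings used to produce the plots above.

```python
import random

# Hypothetical mean rewards (negative mean delay) for three routes.
MEAN_REWARDS = [-5.0, -3.0, -4.0]
OPTIMAL = max(range(len(MEAN_REWARDS)), key=lambda a: MEAN_REWARDS[a])


def one_run(steps, epsilon, rng):
    """Run epsilon-greedy once; return per-step rewards and optimal-choice flags."""
    k = len(MEAN_REWARDS)
    counts, values = [0] * k, [0.0] * k
    rewards, optimal = [], []
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)
        else:
            a = max(range(k), key=lambda i: values[i])
        r = rng.gauss(MEAN_REWARDS[a], 1.0)
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]
        rewards.append(r)
        optimal.append(1.0 if a == OPTIMAL else 0.0)
    return rewards, optimal


def evaluate(runs=200, steps=500, epsilon=0.1, seed=0):
    """Average both curves over independent runs."""
    rng = random.Random(seed)
    avg_reward = [0.0] * steps
    pct_optimal = [0.0] * steps
    for _ in range(runs):
        rewards, optimal = one_run(steps, epsilon, rng)
        for t in range(steps):
            avg_reward[t] += rewards[t] / runs
            pct_optimal[t] += 100.0 * optimal[t] / runs
    return avg_reward, pct_optimal


if __name__ == "__main__":
    avg_reward, pct_optimal = evaluate()
    print(f"final average reward: {avg_reward[-1]:.2f}")
    print(f"final % optimal action: {pct_optimal[-1]:.1f}")
```

Averaging over runs smooths out the noise of any single run, which is why both quantities are usually reported as curves over time steps rather than as single numbers.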


Amir Mesbah

Master's student in Computer Engineering (with a major in AI)

I am a Master's graduate of the University of Tehran.