Amir Mesbah

Master's student in Computer Engineering (with a major in AI)

University of Tehran

Biography

Welcome! I graduated from the University of Tehran with a master’s degree in AI. Currently, I am a Research Assistant at the Cognitive Systems Lab in the School of ECE at the University of Tehran. I completed my bachelor's in Computer Engineering at the University of Tabriz (ECE faculty).

During my master’s, I worked on different Reinforcement Learning (RL) problems and gained valuable hands-on experience in Deep Learning (DL). My thesis focused on enabling RL agents to find subgoal states in the environment faster using Subspaces and a Free Energy Paradigm. I was fortunate to work under the supervision of Dr. Nili Ahmadabadi, Dr. Hosseini, and Dr. Shariatpanahi. My goal is to pursue my academic career through a Ph.D. and gain more experience and knowledge in both theoretical and practical aspects of RL and DL.

Download my resumé.

Interests

Reinforcement Learning
Deep Learning
Trustworthy AI

Education

MSc in Computer Engineering (with a major in AI)

University of Tehran
BSc in Computer Engineering, 2020

University of Tabriz

Publications

Amirhossein Mesbah, Reshad Hosseini, Seyed Pooya Shariatpanahi, Majid Nili Ahmadabadi

December, 2024 Preprint on arXiv - Submitted to IEEE Transactions on Neural Networks and Learning Systems

Subgoal Discovery Using a Free Energy Paradigm and State Aggregations


Banafsheh Karimian, Erfan Mirzaei, Amirhossein Mesbah, Reshad Hosseini, Seyed Pooya Shariatpanahi, Majid Nili Ahmadabadi

November, 2022 Submitted to IEEE Transactions on Cognitive and Developmental Systems

Who to Learn from: A Preference-based Method for Social Reinforcement Learning

Reinforcement Learning (RL) is mainly inspired by studies on animal and human learning. However, RL methods suffer higher regret than natural learners in real-world tasks. This is partly due to the lack of social learning in RL agents. We propose a social learning method for improving the performance of RL agents in the multi-armed bandit setting. The social agent observes other agents’ decisions, while their rewards are private. The agent uses a preference-based method, similar to the policy gradient learning method, to determine whether any agents in the heterogeneous society are worth learning from in order to improve its own performance. The heterogeneity is the result of diversity in learning algorithms, utility functions, and expertise. We compare our method with state-of-the-art studies and demonstrate that it results in higher performance in most scenarios. We also show that the performance improvement increases with problem complexity and is inversely correlated with the population of unrelated agents.