During my master’s, I worked on a range of Reinforcement Learning (RL) problems and gained hands-on experience in Deep Learning (DL). My thesis focused on enabling RL agents to find subgoal states in the environment faster using subspaces and a free energy paradigm. I was fortunate to work under the supervision of Dr. Nili Ahmadabadi, Dr. Hosseini, and Dr. Shariatpanahi. My goal is to continue my academic career through a Ph.D. and deepen my knowledge of both the theoretical and practical aspects of RL and DL.
Download my resumé.
MSc in Computer Engineering (with a major in AI)
University of Tehran
BSc in Computer Engineering, 2020
University of Tabriz
Reinforcement Learning (RL) is mainly inspired by studies on animal and human learning. However, RL methods suffer higher regret than natural learners in real-world tasks. This is partly due to the lack of social learning in RL agents. We propose a social learning method for improving the performance of RL agents in the multi-armed bandit setting. The social agent observes other agents’ decisions, while their rewards are private. The agent uses a preference-based method, similar to the policy gradient learning method, to determine whether any agents in the heterogeneous society have policies worth learning from in order to improve its own performance. The heterogeneity is the result of diversity in learning algorithms, utility functions, and expertise. We compare our method with state-of-the-art studies and demonstrate that it results in higher performance in most scenarios. We also show that the performance improvement increases with problem complexity and is inversely correlated with the population of unrelated agents.
Responsibilities include:
Theoretical problem sets, homework, and projects for graduate courses. Courses: