Dear Freelancers, please go through all the files and questions, and contact me only if you are 100% sure that you can complete the project. A report is also required. I do not want any disagreements once we start working. I hope you understand.
Consider 10 slot machines, of which you can play only one at each time step. You get a reward from whichever machine you choose. The goal is to find the most profitable machine and play it as often as possible. The p-code Matlab function generate_MAB.m provides 10 random reward sequences of length N. You can design an algorithm that learns a good strategy by playing the multi-armed bandit problem N times. Example Matlab code Test_MAb.m is provided, in which a simplified epsilon-greedy strategy makes the trade-off between exploration and exploitation. The goal is to minimize the total regret (or the percentage of reward lost), that is, the difference between the ideal cumulative reward (obtained by always choosing the best arm) and the actual reward obtained by your algorithm.
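To make the setup concrete, here is a minimal sketch of an epsilon-greedy player and the regret measure described above. It is written in Python for illustration only and is not the code in Test_MAb.m. Because generate_MAB.m is p-code, `simulate_rewards` is a hypothetical stand-in that assumes Bernoulli rewards; the function names, the arm means, and the decaying schedule are all assumptions. Epsilon is passed as a function of the step index so that a constant and a sequentially adjusted epsilon can share the same code.

```python
import random


def simulate_rewards(n_steps, means, rng):
    """Hypothetical stand-in for generate_MAB.m: one Bernoulli reward sequence per arm."""
    return [[1.0 if rng.random() < m else 0.0 for _ in range(n_steps)] for m in means]


def epsilon_greedy(rewards, epsilon, rng):
    """Play every step once. epsilon(t) gives the exploration probability at step t.

    Returns (total_reward, chosen_arms). Only the reward of the chosen arm is observed.
    """
    n_arms = len(rewards)
    counts = [0] * n_arms        # number of times each arm was played
    estimates = [0.0] * n_arms   # running mean reward of each arm
    total = 0.0
    choices = []
    for t in range(len(rewards[0])):
        if rng.random() < epsilon(t):
            arm = rng.randrange(n_arms)                           # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        r = rewards[arm][t]
        counts[arm] += 1
        estimates[arm] += (r - estimates[arm]) / counts[arm]      # incremental mean
        total += r
        choices.append(arm)
    return total, choices


def regret_stats(rewards, total):
    """Regret against always playing the best single arm, plus the percentage lost."""
    best = max(sum(seq) for seq in rewards)
    regret = best - total
    percent = 100.0 * regret / best if best > 0 else 0.0
    return best, regret, percent


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed arm means; the real sequences come from generate_MAB.m.
    means = [0.1, 0.2, 0.3, 0.4, 0.5, 0.55, 0.6, 0.65, 0.7, 0.8]
    rewards = simulate_rewards(1000, means, rng)
    constant = lambda t: 0.1                         # fixed exploration rate
    decaying = lambda t: min(1.0, 10.0 / (t + 1))    # hypothetical 1/t-style schedule
    for name, eps in (("constant", constant), ("decaying", decaying)):
        total, _ = epsilon_greedy(rewards, eps, rng)
        print(name, regret_stats(rewards, total))
```

The regret here is measured against the best single arm in hindsight on the realized sequences, which is one reading of "always choose the best arm"; measuring against the arm with the highest expected reward is another.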
Try to formulate your algorithm design as an engineering optimization problem and test your hypothesis for multiple runs of the 10 reward sequences.
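One way to test a hypothesis over multiple runs is a small harness that plays two strategies on the same reward sequences and reports the mean regret of each, along with how often one beats the other. The sketch below is in Python for illustration and rests on assumptions: `simulate_rewards` is a hypothetical Bernoulli stand-in for generate_MAB.m, and UCB1 is used only as an example of an alternative strategy (it assumes rewards in [0, 1]), not as the method the project requires.

```python
import math
import random


def simulate_rewards(n_steps, means, rng):
    """Hypothetical stand-in for generate_MAB.m: one Bernoulli reward sequence per arm."""
    return [[1.0 if rng.random() < m else 0.0 for _ in range(n_steps)] for m in means]


def run_policy(rewards, pick_arm):
    """Play all steps. pick_arm(t, counts, estimates) returns the arm for 0-based step t."""
    n_arms = len(rewards)
    counts, estimates, total = [0] * n_arms, [0.0] * n_arms, 0.0
    for t in range(len(rewards[0])):
        arm = pick_arm(t, counts, estimates)
        r = rewards[arm][t]
        counts[arm] += 1
        estimates[arm] += (r - estimates[arm]) / counts[arm]  # incremental mean
        total += r
    return total


def make_epsilon_greedy(epsilon, rng):
    """Constant-epsilon strategy used as the baseline."""
    def pick(t, counts, estimates):
        if rng.random() < epsilon:
            return rng.randrange(len(counts))
        return max(range(len(counts)), key=lambda a: estimates[a])
    return pick


def ucb1(t, counts, estimates):
    """UCB1: play each arm once, then maximize mean estimate plus a confidence bonus."""
    for a, c in enumerate(counts):
        if c == 0:
            return a
    return max(
        range(len(counts)),
        key=lambda a: estimates[a] + math.sqrt(2.0 * math.log(t + 1) / counts[a]),
    )


def compare(n_runs, n_steps, means, seed=0):
    """Return (mean regret of epsilon-greedy, mean regret of UCB1, fraction of runs UCB1 wins)."""
    rng = random.Random(seed)
    reg_eps, reg_ucb = [], []
    for _ in range(n_runs):
        rewards = simulate_rewards(n_steps, means, rng)
        best = max(sum(seq) for seq in rewards)  # best single arm in hindsight
        reg_eps.append(best - run_policy(rewards, make_epsilon_greedy(0.1, rng)))
        reg_ucb.append(best - run_policy(rewards, ucb1))
    wins = sum(u < e for e, u in zip(reg_eps, reg_ucb))
    return sum(reg_eps) / n_runs, sum(reg_ucb) / n_runs, wins / n_runs


if __name__ == "__main__":
    # Assumed arm means; the real sequences come from generate_MAB.m.
    means = [0.1, 0.2, 0.3, 0.4, 0.5, 0.55, 0.6, 0.65, 0.7, 0.8]
    print(compare(n_runs=100, n_steps=1000, means=means))
```

Reporting the win fraction next to the mean regret separates "better on average" from "better in every single run", which are different claims.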
1. When N is very large, do you see any arm that provides maximum expected reward? Can your algorithm eventually select the most profitable arm with high probability?
2. When you fix N (say N=1000), can your algorithm perform better than the simplified epsilon-greedy strategy on average (say over 100 runs)? Can your algorithm perform better in every single run?
3. Propose your own evaluation criterion to compare your optimized sequential arm selection strategy with the epsilon-greedy strategy. Can you find another way to adjust epsilon sequentially? Can you optimize the epsilon-greedy strategy when, after selecting an arm at each time step, you can see the reward of every arm?
4. Assume that you initially have $100, that selecting an arm costs you $1.00 each time, and that you get the reward from the arm you selected. Set N to be very large and simulate your arm selection strategy until either you have used all your money or you have more than $200 for the first time. Document the time at which your algorithm has to stop. Can you optimize your strategy so that it reaches $200 with high probability in as few steps as possible?
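The stopping rule in question 4 can be sketched as a loop that pays the cost, collects the reward, and checks both exit conditions after every play. This is an illustrative Python sketch under stated assumptions: `EpsilonGreedyPolicy`, `play_until_stop`, and `draw_reward` are hypothetical names, the reward scale of generate_MAB.m is unknown, and "used all your money" is interpreted as being unable to afford the next play.

```python
import random


class EpsilonGreedyPolicy:
    """Constant-epsilon policy; any object with select() and update() would work."""

    def __init__(self, n_arms, epsilon, rng):
        self.epsilon = epsilon
        self.rng = rng
        self.counts = [0] * n_arms
        self.estimates = [0.0] * n_arms

    def select(self):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))  # explore
        return max(range(len(self.counts)), key=lambda a: self.estimates[a])  # exploit

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.estimates[arm] += (reward - self.estimates[arm]) / self.counts[arm]


def play_until_stop(policy, draw_reward, start=100.0, cost=1.0, target=200.0,
                    max_steps=100_000):
    """Play until the balance exceeds target or the next play is unaffordable.

    draw_reward(arm) returns the reward of the chosen arm for the current step.
    Returns (stopping_time, final_balance, outcome), where outcome is
    "target", "ruin", or "max_steps".
    """
    balance = start
    for t in range(1, max_steps + 1):
        arm = policy.select()
        reward = draw_reward(arm)
        policy.update(arm, reward)
        balance += reward - cost       # pay for the play, collect the reward
        if balance > target:
            return t, balance, "target"
        if balance < cost:             # assumption: cannot afford another play
            return t, balance, "ruin"
    return max_steps, balance, "max_steps"


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed payout model: arm a pays $2 with probability means[a], else $0.
    means = [0.1, 0.2, 0.3, 0.4, 0.5, 0.55, 0.6, 0.65, 0.7, 0.8]
    policy = EpsilonGreedyPolicy(len(means), 0.1, rng)
    print(play_until_stop(policy, lambda a: 2.0 if rng.random() < means[a] else 0.0))
```

Repeating the call over many seeds and recording the outcome and stopping time gives an estimate of the probability of reaching the target and of the typical number of steps needed.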
4 freelancers are bidding on average $143 for this job
Hi, I am an electrical engineer and an expert in probability, random processes, and statistics. I'm sure I can do this job perfectly. Open your chat box to discuss further details. Thanks.
I can show you samples of cuckoo bird and ant/bee colony optimization. I have published a paper in an international journal. I know Matlab coding very well. I can deliver within a few days. Thanks.