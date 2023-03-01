



With a cutoff of 5, you’re choosing a random option for about 1 in every 20 decisions the algorithm makes, and 5 is the cutoff I chose. For enthusiasts, there are further optimization methods for deciding which cutoff to use and for changing it as learning continues, but your best bet is often to try several values and see which works best. Reinforcement learning algorithms take random actions because otherwise they would rely entirely on past experience, and always choosing the option predicted to be best can mean missing out on better choices that have never been tried.
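The cutoff rule above is a form of what the reinforcement learning literature calls epsilon-greedy action selection. Here is a minimal Python sketch, assuming the random number generator draws whole numbers from 1 to 100 (which is what makes a cutoff of 5 work out to 1 in 20). The function name `choose` and the value estimates are hypothetical, invented for illustration.

```python
import random


def choose(options, estimates, cutoff=5, rng=random):
    """Epsilon-greedy choice driven by a 1-100 roll.

    A roll at or below `cutoff` means explore: pick any option uniformly
    at random. A higher roll means exploit: pick the option with the
    highest estimated value. With cutoff=5 the agent explores on
    5 of 100 possible rolls, i.e. about 1 decision in 20.
    """
    roll = rng.randint(1, 100)  # inclusive on both ends
    if roll <= cutoff:
        return roll, rng.choice(options)
    return roll, max(options, key=lambda option: estimates[option])


# Hypothetical value estimates built from past experience.
estimates = {"stay in bed": 7.5, "get up at 8:30": 6.0}
roll, pick = choose(list(estimates), estimates)
print(f"rolled {roll}: {pick}")
```

Note that an exploration roll can still land on the option the agent would have picked anyway; the point is only that every option keeps some chance of being tried.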

I didn’t really expect this algorithm to improve my life. But an optimization framework backed by mathematical proofs, peer-reviewed papers, and huge Silicon Valley revenues made a lot of sense to me. So how exactly would it fall apart?

8:30 am

First decision? Whether to get up at 8:30 as scheduled. I turned off my alarm, opened the RNG, held my breath, and hit generate. A 9.

Now for the big question: in the past, had staying in bed or getting up on time yielded better results? I tried to tally up hazy memories of my morning snoozes. As long as I didn’t miss anything important, I decided, the pleasure of a lazy weekend morning in bed outweighed the benefits of getting up on time.
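Tallying past outcomes like this is how a simple agent builds the value estimates it exploits: each action’s value is the average reward it has earned so far. A minimal sketch, assuming each past morning could be scored from 1 to 10; the scores, the names `estimate_values` and `update`, and the scoring scale are all hypothetical.

```python
from statistics import mean

# Hypothetical past outcomes, each morning scored 1-10 after the fact.
past_mornings = {
    "stay in bed": [7, 8, 6, 9],
    "get up on time": [6, 5, 7],
}


def estimate_values(history):
    """Estimate each action's value as the average reward it has earned."""
    return {action: mean(rewards) for action, rewards in history.items()}


def update(estimate, count, reward):
    """Fold one new reward into a running average without storing history."""
    count += 1
    return estimate + (reward - estimate) / count, count


values = estimate_values(past_mornings)
best = max(values, key=values.get)
print(values, best)
```

The incremental `update` gives the same answer as recomputing the mean from scratch, which is why an agent can keep learning as new outcomes arrive without remembering every one of them.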

9:00 am

I had a group project meeting that morning and needed to finish a machine learning reading (Bayesian Deep Learning with Subnetwork Inference, anyone?) before it started. The RNG told me to decide whether to skip the meeting based on past experience, so I decided to attend. I rolled again to decide whether to do the reading and got a 5, which meant choosing randomly between reading and skipping.

It was a very small decision, but I was surprisingly nervous as I got ready to generate another random number on my phone. I realized I didn’t really want to skip the reading. Apparently, skipping reading is only fun if you do it on purpose.

I pressed the GENERATE button.

65. I would do the reading after all.

11:15 am

I wrote a list of options for how to spend the free time I now had: I could research PhD programs to apply to, go down irrelevant internet rabbit holes, or take a nap. The RNG told me to make a data-driven decision about what to do.

This was my first decision more complicated than a simple yes or no, and the moment I started racking my brain over how favorable each option had been in the past, it became clear that there was no way to give an accurate estimate. For an AI agent running a reinforcement learning algorithm, computer scientists have already defined what counts as favorable: the agent’s experience translates into a reward score that it tries to maximize, whether that’s time survived in a video game or money earned in the stock market. Defining a reward function can be difficult, however. Cleaning robots are a classic example. If you tell a robot simply to maximize the amount of trash it throws away, it can learn to knock over the trash can and throw away the same trash again, increasing its score.
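The cleaning-robot problem can be reproduced in a few lines. This is a toy sketch, not a real robotics setup: the function `run`, the two policies `"honest"` and `"hack"`, and the trash and step counts are invented for illustration. The naive reward counts every item thrown away, exactly the misspecified objective described above.

```python
def run(policy, trash_on_floor=3, steps=10):
    """Simulate a cleaning robot and return (naive_reward, trash_left).

    naive_reward counts every item the robot throws away, which is the
    misspecified objective: it never checks whether the room ends up clean.
    """
    floor, in_bin, reward = trash_on_floor, 0, 0
    for _ in range(steps):
        if floor > 0:
            # Pick up one item and throw it away: +1 under the naive reward.
            floor -= 1
            in_bin += 1
            reward += 1
        elif policy == "hack" and in_bin > 0:
            # Knock the bin over, spilling everything back onto the floor.
            floor, in_bin = in_bin, 0
    return reward, floor


for policy in ("honest", "hack"):
    reward, trash_left = run(policy)
    print(f"{policy}: naive reward {reward}, trash left {trash_left}")
```

Under these assumptions the honest robot scores 3 and leaves a clean floor, while the hacking robot scores 8 and still leaves one item on the floor. Rewarding the final state instead (for example, minus the trash left on the floor) would rank the honest robot higher.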

