What if multimodal AI were a sweaty game, not on model side, but in user-centric arena? Give player an agent, virtual world, and waifus character. Let them fight. Players control their agent with prompts to win the objective, each has the same computing power.