OpenAI made a system that’s better at Montezuma’s Revenge than humans

Artificial intelligence (AI) can generate synthetic scans of brain cancer, simultaneously translate between languages, and teach robots to manipulate objects with humanlike dexterity. And as new research from OpenAI reveals, it’s pretty darn good at playing video games, too.
On Tuesday, OpenAI — a nonprofit, San Francisco-based AI research company backed by Elon Musk, Reid Hoffman, and Peter Thiel, among other tech luminaries — detailed in a research paper an AI system that can best humans at the retro platformer Montezuma’s Revenge. The top-performing iteration found 22 of the 24 rooms in the first level, and occasionally discovered all 24.
It follows news in June of an OpenAI-developed bot that can defeat skilled teams in Valve’s Dota 2.
As OpenAI noted in an accompanying blog post, Montezuma’s Revenge is notoriously difficult for machine learning algorithms to master. It was the only Atari 2600 title to foil Google subsidiary DeepMind’s headline-grabbing Deep Q-Learning network in 2015, which scored 0 percent of the average human score (4.7K).
“Simple exploration strategies are highly unlikely to gather any rewards, or see more than a few of the 24 rooms in the level,” OpenAI wrote. “Since then, advances in Montezuma’s Revenge have been seen by many as synonymous with advances in exploration.”
OpenAI calls its method Random Network Distillation (RND), and said it’s designed to be applied to any reinforcement learning algorithm — i.e., models that use systems of rewards and punishments to drive AI agents in the direction of specific goals.
Traditionally, curiosity-driven agents learn a next-state predictor model from their experience and use the error of that prediction as an intrinsic reward. Unlike prior methods, RND bases its bonus reward on predicting the output of a fixed, randomly initialized neural network given the next state.
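To make that concrete, here is a minimal sketch of the RND bonus, assuming a small fully connected network over flattened observations; OpenAI’s implementation uses convolutional networks over Atari frames, and OBS_DIM, FEAT_DIM, and the learning rate here are illustrative choices, not the paper’s settings.

```python
import torch
import torch.nn as nn

OBS_DIM, FEAT_DIM = 64, 32  # hypothetical observation/feature sizes

def make_net():
    return nn.Sequential(
        nn.Linear(OBS_DIM, 128), nn.ReLU(), nn.Linear(128, FEAT_DIM)
    )

target = make_net()            # fixed, randomly initialized network
for p in target.parameters():
    p.requires_grad_(False)    # the target is never trained

predictor = make_net()         # trained to imitate the target
opt = torch.optim.Adam(predictor.parameters(), lr=1e-4)

def intrinsic_reward(next_obs):
    """Prediction error on the next state serves as the exploration bonus."""
    with torch.no_grad():
        tgt = target(next_obs)
    pred = predictor(next_obs)
    err = (pred - tgt).pow(2).mean(dim=-1)  # per-state squared error
    # Train the predictor on the same batch: states seen often become
    # predictable (small bonus), while novel states stay "surprising".
    opt.zero_grad()
    err.mean().backward()
    opt.step()
    return err.detach()  # scaled and added to the extrinsic reward

batch = torch.randn(8, OBS_DIM)  # stand-in for a batch of next states
print(intrinsic_reward(batch))
```

Because the target network is deterministic, frequently visited states eventually become easy to predict, so the bonus naturally fades where the agent has already explored.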
In the course of a run, the agents initially played Montezuma’s Revenge at random, improving their strategy through trial and error. Thanks to the RND component, they were incentivized to explore areas of the game map they might not have otherwise, managing to achieve the game’s objective even when it wasn’t explicitly communicated.
“Curiosity drives the agent to discover new rooms and find ways of increasing the in-game score, and this extrinsic reward drives it to revisit those rooms later in the training,” OpenAI explained. “Curiosity gives us an easier way to teach agents to interact with any environment, rather than via an extensively engineered task-specific reward function that we hope corresponds to solving a task. An agent using a generic reward function not specific to the particulars of an environment can acquire a basic level of competency in a wide range of environments, resulting in the agent’s ability to determine what useful behaviors are even in the absence of carefully engineered rewards.”
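As a rough illustration of that interplay, the curiosity bonus is typically normalized and mixed into the game score during training; the weighting below is a hypothetical choice, not OpenAI’s exact setting.

```python
# A minimal sketch, assuming a simple weighted sum; RND as published
# normalizes the bonus by a running estimate of its standard deviation.
import numpy as np

def combined_reward(extrinsic, intrinsic_errors, coef=0.5):
    """Blend the in-game score with a normalized curiosity bonus."""
    bonus = intrinsic_errors / (np.std(intrinsic_errors) + 1e-8)
    return extrinsic + coef * bonus
```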

[Image: The AI agents are driven by curiosity. Credit: OpenAI]

RND addressed another common issue in reinforcement learning schemes: the so-called noisy TV problem, in which an AI agent can become stuck looking for patterns in random data (like static on a TV).
“Like a gambler at a slot machine attracted to chance outcomes, the agent sometimes gets trapped by its curiosity,” OpenAI wrote. “The agent finds a source of randomness in the environment and keeps observing it, always experiencing a high intrinsic reward for such transitions.”
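A toy comparison (a hypothetical linear setup, not OpenAI’s experiment) shows why RND escapes this trap: a forward-dynamics predictor can never drive its error down on pure noise, while the RND predictor targets a deterministic function of the current state, so its bonus decays once the “static” becomes familiar.

```python
import numpy as np

rng = np.random.default_rng(0)
W_target = rng.normal(size=(16, 8))  # fixed random "target" network
W_pred = np.zeros((16, 8))           # RND predictor, trained online
W_dyn = np.zeros((16, 16))           # next-state predictor, trained online
lr = 0.05

for step in range(2001):
    s = rng.normal(size=16)          # current "TV static" frame
    s_next = rng.normal(size=16)     # successor is unpredictable noise

    # Forward-dynamics curiosity: error never shrinks (irreducible noise).
    dyn_err = np.mean((W_dyn.T @ s - s_next) ** 2)
    W_dyn += lr * np.outer(s, s_next - W_dyn.T @ s)

    # RND bonus: the target output is a deterministic function of s,
    # so the predictor can eventually fit it and the bonus decays.
    rnd_err = np.mean((W_pred.T @ s - W_target.T @ s) ** 2)
    W_pred += lr * np.outer(s, (W_target - W_pred).T @ s)

    if step % 500 == 0:
        print(f"step {step:5d}  dynamics err {dyn_err:.3f}  RND err {rnd_err:.3f}")
```

Running this, the dynamics error hovers around its noise floor indefinitely, while the RND error falls toward zero, which is the behavior that keeps the agent from staring at the slot machine forever.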
So how’d it perform? On average, OpenAI’s agents scored 10K over nine runs with a best mean return of 14.5K. A longer-running test yielded a run that achieved 17.5K, corresponding to passing the first level and finding all 24 rooms.
It wasn’t just Montezuma’s Revenge they mastered. When set loose on Super Mario, the agents discovered 11 levels, found secret rooms, and defeated bosses. They learned how to beat Breakout after a few hours of training. And when tasked with volleying a ball in Pong with a human player, they tried to prolong the game rather than win.
OpenAI has its fingers in a number of AI pies besides gaming.
Last year, it developed software that produces high-quality datasets for neural networks by randomizing the colors, lighting conditions, textures, and camera settings in simulated scenes. (Researchers used it to teach a mechanized arm to remove a can of Spam from a table of groceries.) In February, it released Hindsight Experience Replay (HER), an open source algorithm that helps robots learn from failure. And in July, it unveiled a system that directs robot hands in grasping and manipulating objects with state-of-the-art precision.
Source: VentureBeat