The widely used chatbot ChatGPT is designed to create digital text, everything from poetry to research papers to computer programs. But when a team of artificial intelligence researchers at computer chip company Nvidia got their hands on the chatbot’s underlying technology, they realized it could do a lot more.
Within weeks, they taught it how to play Minecraft, one of the most popular video games in the world. Inside Minecraft's digital world, it learned to swim, gather plants, hunt pigs, mine gold, and build houses.
“It can enter the world of Minecraft and explore on its own, collect materials on its own, and get better at all kinds of skills,” said Linxi Fan, a senior researcher at Nvidia who goes by Jim.
The project was an early sign that the world’s leading AI researchers are turning chatbots into a new type of autonomous system called an AI agent. These agents can do more than just chat. They can use software applications, websites, and other online tools, including spreadsheets, online calendars, travel websites, and more.
Over time, many researchers say, AI agents could become much more sophisticated, could replace office workers, and automate almost any administrative function.
“It’s a huge business opportunity, potentially worth trillions of dollars,” said Jeff Clune, a professor of computer science at the University of British Columbia, who previously worked on this type of technology as a researcher at OpenAI, the San Francisco startup that built ChatGPT. “This has a huge upside, and serious consequences, for society.”
Nvidia’s agent plays a game, but similar agents can schedule meetings, edit files, analyze data, and create multicolored bar charts. The idea is that these automated systems will eventually act as personal assistants capable of handling a wide range of online tasks.
Today’s agents are limited, and they can’t exactly organize your life. ChatGPT can search travel site Expedia for flights to New York, but you still have to book the reservation yourself.
This technology, as researchers improve it, could make office workers and consumers more efficient. It could also change the nature of video games, providing a new wave of bots that players can play alongside and chat with.
GPT-4, the technology that powers ChatGPT, is what researchers call a large language model. It is an artificial intelligence system that learns skills by analyzing massive amounts of data.
Over the past few months, this technology has wowed hundreds of millions of people with the way it creates emails, writes speeches, and riffs on almost any topic. But its most important skill may be its talent for writing computer programs.
It can instantly create a program that draws a unicorn or digitally drops snowflakes across a laptop screen. Professional software developers can have it produce code that they then integrate into larger programs, including everything from social media apps to search engines. But this is only part of what this technology can do. It can also generate computer code that makes use of other software applications and websites.
This is how Dr. Fan and other Nvidia researchers taught GPT-4 how to play Minecraft. “The most important word here is code,” Dr. Fan said. “Code can take action.”
People use software applications and websites by touching buttons, menus, and other graphical tools. AI agents use apps and websites by accessing their application programming interfaces, or APIs — the underlying software code that allows them to communicate with other online services.
If you ask an agent to upload a video to the Internet, for example, it can generate code that calls an API provided by YouTube. “An API is just text used to talk to a machine,” said Selin Nayhin, a researcher who helps run the autonomous AI agent project, AutoGPT.
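To make the idea concrete, here is a minimal Python sketch of how text produced by a model could be turned into an action. It is an illustration under stated assumptions, not how ChatGPT or AutoGPT actually work: `upload_video`, `TOOLS`, and `run_action` are hypothetical names, the "API" is a local stand-in that never contacts YouTube, and the model's output is written by hand.

```python
import json

# Hypothetical stand-in for a real web API; nothing here contacts YouTube.
def upload_video(title: str, path: str) -> dict:
    return {"status": "uploaded", "title": title, "path": path}

# The only actions the agent is allowed to take, keyed by name.
TOOLS = {"upload_video": upload_video}

def run_action(model_output: str) -> dict:
    """Parse the text a model produced and call the matching tool."""
    request = json.loads(model_output)
    name = request["tool"]
    if name not in TOOLS:
        # Refuse anything outside the allowed list.
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**request["arguments"])

# Text of the kind a language model might generate (hand-written here).
model_output = (
    '{"tool": "upload_video", '
    '"arguments": {"title": "My trip", "path": "trip.mp4"}}'
)

result = run_action(model_output)
print(result["status"])
```

Restricting the agent to a short list of named tools is one simple way to limit what generated text can do, which is part of why companies have started small.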
In theory, a chatbot can write code to access any API on the Internet. But today’s chatbots are not skilled enough to do more than simple tasks. Even if they were, allowing them to roam freely online would be a huge security risk. So the companies started small.
A few months after OpenAI unveiled ChatGPT, it quietly released a way for the chatbot to do more than just generate text. After installing several plug-ins (software that enhances what the bot can do), you can ask it to search travel sites like Expedia for available flights, get a map of your city from Google Earth, or even convert a spreadsheet detailing your annual spending into a multicolored bar chart.
Because it comes with a plug-in called a code interpreter, ChatGPT can not only write code, but also run it. This has allowed the technology to instantly perform tasks it could not do in the past, including editing spreadsheets and converting still images into videos. Google, Microsoft and other companies are exploring similar technologies.
“These are projects where we envision AI essentially working with other AI on your behalf,” said Ashley Lawrence, a vice president at Microsoft.
Independent projects like AutoGPT are trying to take this kind of thing several steps further. The idea is to give the system goals such as “start a company” or “make some money.” It will then look for ways to reach that goal by asking itself questions and connecting to other Internet services.
Today, this doesn’t work so well. Systems like AutoGPT tend to get stuck in endless loops. But researchers like Dr. Fan are constantly working to improve this type of technology in an attempt to make it more useful and more reliable.
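The endless-loop problem can be illustrated with a short Python sketch. This is not AutoGPT's code; `run_agent` and `stuck_planner` are hypothetical names, and the planner is a stub standing in for a language model. The sketch shows one common safeguard, assuming a fixed step budget and a check for repeated steps.

```python
from typing import Callable

def run_agent(goal: str,
              next_step: Callable[[str, list], str],
              max_steps: int = 10) -> list:
    """Ask for steps until the planner says 'done', repeats itself,
    or the step budget runs out."""
    history = []
    for _ in range(max_steps):
        step = next_step(goal, history)
        if step == "done":
            break
        if step in history:
            # A repeated step suggests the agent is going in circles.
            history.append("stopped: loop detected")
            break
        history.append(step)
    return history

# Hypothetical planner that never makes progress.
def stuck_planner(goal: str, history: list) -> str:
    return "search the web"

print(run_agent("make some money", stuck_planner))
```

Here the stub proposes the same step twice, so the loop halts on the second pass instead of running forever.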
Other researchers are building a new type of artificial intelligence agent designed to use software tools. In the summer of 2022, Dr. Clune was part of a team of researchers at OpenAI who built an agent that could use computer programs much as a person would: mouse click after mouse click, keystroke after keystroke.
Dr. Clune and his colleagues fed the system hours of online videos that showed people playing Minecraft. By analyzing the way people use the mouse and keyboard to navigate through the digital world of Minecraft, the system learned to play the game on its own.
Other companies, including a startup called Adept, are building similar agents that use websites like Wikipedia, Redfin, and Craigslist, as well as popular office applications from companies like Salesforce.
Dr. Clune says this type of agent will eventually allow AI to use a much wider range of software applications and websites. Everyone will have access to a digital assistant that can do almost anything on the Internet, he said. This would make life easier, but it could also replace countless jobs.
“If AI can do anything we can do, it’s not just replacing boring tasks,” he said. “It replaces all tasks.”