OpenAI’s Second Agent
Three weeks prior to the interview, OpenAI unveiled Deep Research, marking its second foray into agent technology. The agent can search across multiple websites, conducting comprehensive online research in 5 to 30 minutes, and then synthesizes what it gathers into detailed reports, complete with citations that let users verify the sources.
This document compiles and organizes insights from an interview conducted by Sequoia Capital with Isa Fulford and Josh Tobin, the leading figures behind OpenAI’s Deep Research. The two team members provide an in-depth look at the technical intricacies and product development philosophy underpinning Deep Research. They also share observations on the current use cases they are witnessing.
Deep Research’s origins lie within OpenAI’s internal investigations into the model’s capacity to handle tasks requiring long-term planning and execution. The team’s overarching objective is to ultimately provide users with the definitive agent: a seamless, all-in-one solution capable of handling web searches, computer operations, and any other tasks the user delegates to it.
Deep Research has also been optimized at the product level. For instance, as highlighted in previous DeepSeek analyses, Deep Research builds user trust through clear citations and Chain-of-Thought (CoT) reasoning. The team has also incorporated a clarification flow to ensure a shared understanding of the assigned task. Deep Research surpasses both AI-powered search engines and ChatGPT in its ability to retrieve and organize information. At its current stage, however, it is less proficient at deriving novel insights from existing information and is not yet capable of making groundbreaking scientific discoveries.
Key Takeaways:
- OpenAI has launched Deep Research, its second agent, designed for thorough online investigations.
- The agent’s capabilities are derived from end-to-end training of the underlying model.
- Deep Research excels in information synthesis and the discovery of obscure facts.
- Use cases span professional work, personal life, programming, and education.
- The team anticipates significant advancements in agent technology by 2025.
Agent Capabilities Stem from End-to-End Model Training
Deep Research is an agent that can search multiple websites and generate comprehensive reports, completing many tasks that would take humans hours. Integrated within ChatGPT, it answers questions in approximately 5-30 minutes, enabling deeper research and providing more detailed and specific answers than standard ChatGPT. OpenAI previously launched Operator, and Deep Research is its second agent, with more planned for the future.
Origins
Approximately a year ago, OpenAI began internally adopting a reasoning paradigm, with the goal of training models to think before providing answers. This approach proved to be highly successful.
Initially, OpenAI’s focus was on Math and Science domains. However, they discovered that this new reasoning model architecture also unlocked the ability to handle longer-term tasks, which involved agent capabilities.
Concurrently, OpenAI recognized that many tasks necessitate extensive online research or external context, robust reasoning abilities, discernment of information sources, and a degree of creativity. Ultimately, OpenAI developed model training methods capable of handling these tasks. They decided to train models to perform browsing tasks, using the same methods as for training reasoning models but applied to more real-world scenarios.
The Deep Research project began with an original demo by Isa Fulford and Yash Patil. Josh Tobin rejoined OpenAI about six months prior to the interview, after working at a startup. He became deeply interested in the foundational work and joined the Deep Research project.
Key Individuals:
- Isa Fulford: AI researcher in OpenAI’s Post-training team, a major contributor to the ChatGPT Retrieval Plugin.
- Yash Patil: Member of the core model team in OpenAI’s Post-training team, having dropped out of Stanford.
- Josh Tobin: Previously a Research Scientist at OpenAI, later founded Gantry (a product to improve ML through analysis, alerts, and human feedback). He rejoined OpenAI and currently leads the Agents product research team.
Clarification Flow
Deep Research features a unique design element: the clarification flow. Before initiating the research process, the Deep Research model poses questions to the user. Typically, ChatGPT only asks follow-up questions at the conclusion of an answer or inquires about the user’s satisfaction with the response. Deep Research, in contrast, engages in this behavior proactively, at the outset.
This was a deliberate design choice by the team. The Deep Research model gives its best responses only when the prompt is exceptionally clear and detailed, yet users often do not provide all the necessary information up front. OpenAI therefore wanted to ensure that, after waiting 5 to 30 minutes, users receive a sufficiently detailed and satisfactory answer. The extra step was added to encourage users to supply all the details the model needs to perform well.
Many users on X (formerly Twitter) have reported interacting with o1 or o1 Pro initially to refine their prompts. Once satisfied with the prompt’s clarity, they then submit it to Deep Research.
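To make the pattern concrete, the following is a minimal sketch of a clarification-then-research flow. It is not OpenAI's actual implementation: it assumes the `openai` Python client with an API key configured, uses `gpt-4o` as a stand-in for the Deep Research model, and the prompts and example task are invented for illustration.

```python
# Minimal sketch of a clarification flow; NOT OpenAI's implementation.
# Assumes the `openai` package is installed and OPENAI_API_KEY is set.
# "gpt-4o" is a stand-in for the actual Deep Research model.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    """Single-turn helper around the chat completions API."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def clarify_then_research(task: str) -> str:
    # Step 1: before doing any research, ask the user what is still ambiguous.
    questions = ask(
        "A user wants a research report on the task below. "
        "List the clarifying questions you need answered before starting.\n\n"
        f"Task: {task}"
    )
    print(questions)
    answers = input("Your answers: ")

    # Step 2: fold the answers into an enriched brief, then start the long-running research.
    brief = f"Task: {task}\nClarifications: {answers}"
    return ask(f"Write a detailed, cited research report for this brief:\n\n{brief}")

if __name__ == "__main__":
    print(clarify_then_research("Compare vector databases for a small startup."))
```

The point of the extra round trip is simply that the long-running research step starts from an enriched brief rather than from the user's first, often underspecified, prompt.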
The Ultimate Form of Agents
Over the past few months, three distinct products bearing the name Deep Research have been launched. Josh Tobin believes that while each product has its own strengths and weaknesses, the quality differences between them are readily apparent. Ultimately, this comes down to how the models are built, the effort invested in constructing the datasets, and the use of o-series models as the underlying engine, which allows the Deep Research model to be optimized into a highly intelligent, high-quality tool.
Currently, Deep Research, o3, and Operator exist as relatively independent entities. However, OpenAI’s vision is for users to eventually have access to a single, ultimate agent capable of performing web searches, operating computers, or completing other desired tasks, integrating all these functions in a more natural and intuitive manner.
End-to-End Training is the Fundamental Reason for the Model’s Power
The underlying model of Deep Research is a fine-tuned version of o3, OpenAI’s most advanced reasoning model, and a significant portion of Deep Research’s analytical capability derives from it. OpenAI specifically trained the Deep Research model on complex browsing tasks and other reasoning tasks, and the model can also use browsing tools and Python tools. Through end-to-end training on these tasks, Deep Research learned strategies for handling them, which is ultimately what makes the model excel at online search and analysis.
Intuitively, a user submits a request, and the model initially engages in careful consideration. Subsequently, it searches for relevant information, extracts it, and reads it. After comprehending how this information relates to the request, the model determines what to search for next to progress closer to the user’s desired final answer. Deep Research can synthesize all this information into a well-structured report, complete with citations that point to the original sources.
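The shape of that loop can be sketched as follows. This is only an illustration of the observable behavior: Deep Research itself learns this behavior end-to-end rather than following a hand-written script, and the `llm`, `web_search`, and `fetch_page` helpers here are hypothetical stand-ins.

```python
# Illustrative shape of an iterative research loop; NOT Deep Research's implementation,
# which learns this behavior end-to-end rather than following a scripted loop.
# `llm`, `web_search`, and `fetch_page` are hypothetical stand-ins.

def llm(prompt: str) -> str:
    """Stand-in for a reasoning model call."""
    raise NotImplementedError

def web_search(query: str) -> list[str]:
    """Stand-in that returns a list of result URLs."""
    raise NotImplementedError

def fetch_page(url: str) -> str:
    """Stand-in that returns page text."""
    raise NotImplementedError

def research(request: str, max_steps: int = 10) -> str:
    notes: list[tuple[str, str]] = []  # (url, extracted facts) pairs, kept for citations
    for _ in range(max_steps):
        # Decide what to look for next, given the request and everything read so far.
        query = llm(f"Request: {request}\nNotes so far: {notes}\n"
                    "What should be searched next? Reply DONE if nothing is missing.")
        if query.strip() == "DONE":
            break
        for url in web_search(query)[:3]:
            page = fetch_page(url)
            facts = llm(f"Extract facts relevant to '{request}' from:\n{page}")
            notes.append((url, facts))
    # Synthesize everything into a report whose claims point back to their sources.
    return llm(f"Write a structured report for '{request}', citing these sources:\n{notes}")
```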
The innovation that gives Deep Research its agent capabilities is OpenAI’s end-to-end training of the model. This matters because many of the operations that arise during research cannot be predicted in advance; the flexibility the model gains through training cannot be replicated by hand-writing a program or script around a language model. Through training, the Deep Research model learned how to react to live web information and adjust its strategy on the fly based on what it encounters, so it ends up conducting highly creative searches. Users can observe this intelligence, deciding what to search for next or how to work around an obstacle, in the summaries of the model’s CoT.
Differences Between Deep Research and AI Search
Regarding John Collison’s question about how much of Deep Research’s capability comes from real-time access to web content and how much from CoT, the two OpenAI researchers believe that Deep Research’s outstanding capability is a result of the combination of both.
Other AI search products are not trained end-to-end, so they are not as flexible in responding to information as Deep Research, nor are they as creative in solving specific problems.
Before joining OpenAI, Josh Tobin worked at a startup and attempted to build agents in the manner most people describe building them, essentially constructing an operation graph with LLMs intervening at certain nodes. While the LLM can decide what action to take next, the logic of the entire sequence of steps is defined by humans.
Josh Tobin found this to be a powerful method for rapid prototyping, but it quickly ran into problems in real-world applications. It is difficult to foresee every situation the model might face and every branch it might want to take. Moreover, since these models are not specifically trained to make those decisions, they are often not the best decision-makers at the nodes; they have only been trained to do something that resembles the decision-making required.
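For contrast, a rough sketch of that hand-defined approach is shown below: humans write down the nodes and the allowed transitions, and the LLM only chooses among branches the author anticipated. The node names and the `llm_choose` helper are invented for illustration.

```python
# Sketch of the hand-wired "operation graph" style of agent Josh Tobin describes:
# humans define the nodes and allowed transitions; the LLM only picks among them.
# `llm_choose` is a hypothetical helper that asks a model to pick one option.

def llm_choose(state: dict, options: list[str]) -> str:
    """Stand-in: ask an LLM which pre-defined branch to take next."""
    raise NotImplementedError

GRAPH = {
    "classify_request": ["search_web", "ask_user"],
    "ask_user":         ["search_web"],
    "search_web":       ["summarize", "search_web"],
    "summarize":        ["done"],
}

def run(state: dict) -> dict:
    node = "classify_request"
    while node != "done":
        # The model decides the next hop, but only among edges a human wrote down.
        node = llm_choose(state, GRAPH[node])
        state.setdefault("trace", []).append(node)
    return state
```

Any situation not already encoded as an edge in the graph simply cannot be handled, which is exactly the brittleness described above.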
This reiterates that the true power of the Deep Research model stems from direct end-to-end training, aiming to solve the tasks that users actually need to solve. Therefore, there’s no need to set up an operation graph or make node decisions in the background architecture; everything is driven by the model itself.
Furthermore, if a user has a very specific and predictable workflow, then the approach Josh Tobin described above holds value. But if highly flexible processing is required, then an approach similar to Deep Research might be the optimal choice.
Josh Tobin suggests that strict rules should not be left to the model to enforce. If there is a requirement such as "the model must not access a certain database," it is better implemented as manually written logic. Beyond such guardrails, people often believe they can out-code the model, but in reality, as the field progresses, models typically devise better solutions than humans do.
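As a hedged sketch of what "manually written logic" can mean in practice, the guardrail below lives in the tool layer, outside the model, so it holds no matter what the model decides to do. The tool name, dispatcher, and deny list are all hypothetical.

```python
# Guardrail enforced outside the model: the check runs on every tool call,
# so it holds regardless of what the model asks for. All names are hypothetical.

BLOCKED_DATABASES = {"hr_records", "payroll"}  # hypothetical deny list

class BlockedToolError(Exception):
    pass

def execute_tool(name: str, args: dict) -> str:
    # Hard rule implemented in plain code, not in the prompt or the model weights.
    if name == "query_database" and args.get("database") in BLOCKED_DATABASES:
        raise BlockedToolError(f"Access to '{args['database']}' is not permitted.")
    return dispatch(name, args)  # hypothetical dispatcher to real tool implementations

def dispatch(name: str, args: dict) -> str:
    raise NotImplementedError  # stand-in for the actual tool implementations
```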
One of the most important lessons of machine learning is that the results you obtain depend on what you optimize for. If you can build a system that directly optimizes for the desired outcome, it will be significantly better than trying to stitch together models that were not built for the whole task. Consequently, RL fine-tuning of the whole model end-to-end may become a key component in building the most powerful agents.
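As a toy illustration of optimizing for the desired outcome directly, the sketch below defines the reward only on the final artifact the user cares about, the finished, cited report, rather than on hand-picked intermediate behaviors. The rubric and the `judge` grader are invented, and this is not a description of OpenAI's training setup.

```python
# Toy illustration: the reward is computed only on the final report, not on
# intermediate steps. Invented example; `judge` is a hypothetical grader
# (for instance, a rubric-based or LLM-based scorer).

def judge(report: str, rubric: dict[str, float]) -> float:
    """Stand-in grader that would score coverage, citation validity, clarity, etc."""
    raise NotImplementedError

def end_to_end_reward(report: str) -> float:
    # The only signal the policy is trained on: quality of the finished, cited report.
    rubric = {"covers_question": 0.4, "citations_check_out": 0.4, "clearly_written": 0.2}
    return judge(report, rubric)

# Rewarding hand-picked intermediate behaviors instead (number of searches made,
# length of notes taken, ...) optimizes proxies, and the policy learns to game them.
```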
High-Quality Data is One of the Key Factors for Model Success
One of the key factors contributing to the success of the Deep Research model is the availability of a high-quality dataset. The quality of the data input into the model is likely the primary determinant of the model’s quality. In the Deep Research project, Edward Sun is responsible for optimizing all datasets.
Advantages of Deep Research
Deep Research’s strength lies in providing the best answers when users describe their needs in detail. Even when a question is vague, it can use clarification to pin down the information the user wants. It is most powerful when users are seeking a specific set of information.
Deep Research is not only capable of broadly gathering all the available information on a topic but also excels at finding very obscure facts, such as long-tail content that would not appear in the first few pages of a traditional search, or details of a specific episode of an obscure TV show. In a question about an Austrian general, ChatGPT once gave the wrong answer, while Deep Research successfully located the correct one.
Deep Research is highly proficient at synthesizing information, particularly in locating specific, hard-to-find information. However, Deep Research is less effective at extracting novel insights from existing information and is not yet capable of making new scientific discoveries.
Use Cases of Deep Research
Target Users
Deep Research is designed for anyone engaged in knowledge work in their daily work or life, particularly those who need to gather large amounts of information, analyze data, and make decisions. Many users are applying Deep Research to their work, such as in research, to understand the situation in areas like markets, companies, and real estate.
Use Cases
OpenAI envisions Deep Research serving both business and personal scenarios, as it is a highly versatile capability applicable to both domains. Its appeal lies in the amount of time it saves: for tasks that might previously have taken hours or even days, Deep Research can now get users roughly 90% of the way to an answer. OpenAI believes there will be more tasks of this kind in business settings, but Deep Research will also become an integral part of people’s personal lives.
Deep Research is not intended to replace the workforce. For knowledge work, especially tasks that require substantial time to find information and draw conclusions, Deep Research gives individuals superpowers: tasks that might have taken 4 or 8 hours can be completed in 5 minutes, allowing users to achieve more.
The interview mentioned use cases including: medical, investment, and other professional work scenarios; shopping, travel, and other family scenarios; programming and personalized education.
Medical, Investment, and Other Professional Work Scenarios
In medicine, Deep Research can assist in finding all the literature or recent cases of a certain disease, thereby saving valuable time.
In investment, with the aid of Deep Research, investors can choose to research every potential startup they might invest in, not just the ones they have time to meet with.
In company operations, a user considering launching a consumer goods company has been extensively utilizing Deep Research to determine whether specific brand names have already been registered, whether domain names are occupied, market size, and various other pieces of information.
Shopping, Travel, and Other Family Scenarios
A user considering purchasing a new car wanted to know when the next model would be released. There were numerous speculative articles online, so the user asked Deep Research to compile all relevant rumors. Deep Research produced an excellent report, informing the user that a new car might be released in the next few months.
When Deep Research was launched in Japan, users found it extremely helpful for finding restaurants that met specific requirements, and it could also help users discover options they might not have found otherwise.
When users need to purchase an expensive item, plan a special trip, or spend a considerable amount of time contemplating a problem, they might spend hours online searching for relevant information, browsing through reviews, etc. Deep Research can rapidly organize this information, create a summary report, and provide detailed and personalized advice.
Busy working mothers often lack the time to plan birthday parties for their children, but now they can accomplish this quickly with the aid of Deep Research.
Deep Research is also excellent at following instructions. If users not only want to know about a product but also want to compare it with all other products, or even want to see reviews from websites like Reddit, they can make many different requests to Deep Research, and it will complete these tasks all at once. Users can also instruct Deep Research to present the information in a table format.
Programming
Many people are using Deep Research for programming. OpenAI did not initially anticipate this scenario, but users are applying it to write code, search for code, find the latest documentation for a package, or write scripts, with impressive results.
Education
Personalized education represents a very intriguing application scenario. If users have a topic they wish to learn, such as reviewing biology or understanding current events, they only need to provide the parts they don’t understand or the information they want to delve into, and Deep Research can compile a detailed report. Perhaps in the future, it will be possible to provide personalized education based on what Deep Research learns about the user.
Agents Will Emerge in 2025
Future Development Directions for Deep Research
In terms of product form, OpenAI hopes that Deep Research will be able to embed images in the future, locate pictures of products, generate charts, and embed these charts within the answers.
In terms of information sources, OpenAI aims to expand the data sources the model can access. They hope the model will be able to search private data in the future. OpenAI will further enhance the model’s capabilities, making it better at browsing and analysis.
In terms of information accuracy, to enable users to trust Deep Research’s output, users can view the sources of information cited by the model. During the model training process, OpenAI also strives to ensure the correctness of citations, but the model may still make mistakes, hallucinate, or even trust a source that may not be the most credible. Therefore, this is an area OpenAI hopes to continue improving.
To integrate more broadly into the OpenAI Agent roadmap, OpenAI hopes that Deep Research can be extended to many different application scenarios, combining the most advanced reasoning models with tools that humans can use to complete work or daily life tasks, and then directly optimizing the model to achieve the results users want the agent to achieve.
At this stage, there is actually nothing preventing Deep Research from expanding to more complex task scenarios. AGI is now an operational issue, and there will be many exciting developments to look forward to in the future.
Sam Altman believes that the tasks Deep Research can complete will account for a few percent of all economically valuable tasks in the world. Josh Tobin believes Deep Research cannot do all of a user’s work, but it can save them hours or even days. A relatively near-term goal OpenAI hopes to reach is for Deep Research, and the agents built on this foundation after it, to save users 1%, 5%, 10%, or 25% of their time, depending on the type of work they do.
Agent & RL
Isa Fulford and Josh Tobin concur that agents will emerge this year.
RL experienced a peak, then seemed to enter a trough, and is now receiving renewed attention. Yann LeCun once used an analogy: if you are baking a cake, most of it is the cake itself, with a little frosting and a few cherries on top. Unsupervised learning is the cake, supervised learning is the frosting, and RL is the cherry.
Josh Tobin believes that the RL work of 2015-2016 was, in that analogy, an attempt to add the cherry without the cake. Now there are language models pre-trained on vast amounts of data; these models are extremely powerful, and we know how to perform supervised fine-tuning on them so they follow instructions and do what people want. With everything underneath working well, it has become practical to tune these models with RL against user-defined reward functions for almost any use case.