Since its launch last October, Rovo has gained significant traction among our customers, who use it for common queries such as “What’s the status of project X?” and “Generate a concise summary from the meeting notes.”

But as our customers continue to embrace AI, they are beginning to submit requests that require advanced retrieval to uncover insights not explicitly mentioned in the query. For instance, they might ask, “Show me how competitors are integrating AI into collaboration platforms like Confluence.”

To address these complex scenarios, we recently introduced Deep Research, a new feature within Rovo Chat that delivers detailed research reports within minutes in response to any inquiry.

Deep Research doesn’t just answer simple questions; it creates comprehensive, easily digestible reports that cover everything from immediate actions to proactively identified knowledge gaps. These reports can be exported to a Confluence page with clear citations and sources.

How Deep Research improves knowledge search and report generation

Deep Research is an advanced extension of Rovo Chat and is designed for in-depth information retrieval and comprehensive report generation. This is powered by an enhanced Retrieval-Augmented Generation (RAG) architecture with two main stages:

  1. Query-dependent retrieval
  2. Content-grounded answer generation
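As a rough illustration of these two stages, the sketch below ranks a toy in-memory corpus by keyword overlap with the query and then assembles an answer only from the retrieved passages. The `Passage` type and the `retrieve` and `generate_grounded_answer` functions are hypothetical names chosen for this example; a production system would use a real search index and a language model rather than string templates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    source: str  # hypothetical document identifier
    text: str


def retrieve(query: str, corpus: list[Passage], k: int = 2) -> list[Passage]:
    """Stage 1: query-dependent retrieval via naive keyword overlap."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(p.text.lower().split())), p) for p in corpus]
    ranked = sorted(scored, key=lambda item: item[0], reverse=True)
    # Keep at most k passages, and only those sharing a term with the query.
    return [p for score, p in ranked[:k] if score > 0]


def generate_grounded_answer(query: str, passages: list[Passage]) -> str:
    """Stage 2: the answer is assembled only from retrieved content."""
    if not passages:
        return "No supporting content found."
    lines = [f"- {p.text} [{p.source}]" for p in passages]
    return f"Findings for '{query}':\n" + "\n".join(lines)


corpus = [
    Passage("doc-1", "Project supernova ships in the third quarter"),
    Passage("doc-2", "The cafeteria menu changes weekly"),
]
answer = generate_grounded_answer(
    "supernova status", retrieve("supernova status", corpus)
)
```

Because the second stage only sees what the first stage returned, unrelated passages never reach the answer.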

Deep Research builds on Rovo Chat’s orchestration flow, improving each stage to enable deeper information retrieval and more thorough report generation:

Transition from Single-Path to Multiple-Path Information Retrieval

Deep Research first drafts a research plan that decomposes the complex task into several manageable sub-tasks. Information retrieval then runs concurrently for each sub-task, improving both efficiency and effectiveness.
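A minimal sketch of concurrent retrieval per sub-task, assuming a thread pool and a placeholder `search` function. The sub-task strings and both function names are hypothetical, and the real retrieval backend is not described in this post.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def search(sub_task: str) -> list[str]:
    """Placeholder for a real search call; returns a fake hit."""
    time.sleep(0.01)  # simulate I/O latency
    return [f"result for: {sub_task}"]


def retrieve_all(sub_tasks: list[str]) -> dict[str, list[str]]:
    """Run one retrieval per sub-task concurrently, keyed by sub-task."""
    with ThreadPoolExecutor(max_workers=len(sub_tasks) or 1) as pool:
        # pool.map returns results in the same order as the inputs.
        results = list(pool.map(search, sub_tasks))
    return dict(zip(sub_tasks, results))


plan = [
    "competitor AI features",
    "collaboration platform integrations",
    "customer adoption signals",
]
findings = retrieve_all(plan)
```

Since each sub-task is independent, total latency is close to that of the slowest single search rather than the sum of all of them.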

Implementing Structured Report Generation

Report generation begins with an outline for the final report. Each section is then developed in parallel, and the sections are integrated into a cohesive report.
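The outline-then-sections flow can be sketched as follows. Here `make_outline` and `write_section` are hypothetical stand-ins for model calls, and the final merge is a plain join rather than a real integration step.

```python
from concurrent.futures import ThreadPoolExecutor


def make_outline(topic: str) -> list[str]:
    """Placeholder outline step; a real system would derive this from findings."""
    return [
        f"Background on {topic}",
        f"Current state of {topic}",
        "Open questions",
    ]


def write_section(heading: str) -> str:
    """Placeholder section writer standing in for a model call."""
    return f"{heading}\n(Draft text for this section.)"


def build_report(topic: str) -> str:
    outline = make_outline(topic)
    with ThreadPoolExecutor(max_workers=len(outline)) as pool:
        # Sections are written in parallel; map keeps the outline order.
        sections = list(pool.map(write_section, outline))
    return "\n\n".join(sections)


report = build_report("supernova")
```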

Generating a contextual research plan

Before generating a research plan, Deep Research begins by gathering foundational data pertinent to the user’s query. This process taps into Atlassian’s Teamwork Graph: the intelligent data layer that connects people, projects, and proprietary knowledge from across your organization.

By searching internal and third-party tools, analyzing these rich connections, and factoring in your team and project context (plus what you’re working on right now), Rovo delivers tailored insights and results while strictly respecting your permissions and access controls.

By collecting this foundational data upfront, Deep Research ensures that the resulting research plan is highly personalized and contextually accurate.

After that, Deep Research creates a dynamic plan by breaking the main task into sub-tasks that address both the explicit and implicit aspects of the user’s request. In enterprise environments, research tasks often extend beyond the knowledge of any language model, making this a foundational step.

Example

If your team is developing a new product codenamed supernova, and a new team member wants to learn everything about supernova, Deep Research will recognize that they are likely referring to the internal project rather than the astronomical phenomenon.

Reflecting and refining results

Following each research round, results are gathered and analyzed. Findings that are not relevant to the research plan, or that yield no meaningful insights, are set aside.

Once the findings are compiled, Rovo reflects on the information collected and plans the next research round. This iterative process can uncover new insights or highlight remaining gaps, suggesting new research directions.

This cycle continues for a predetermined number of rounds; however, Rovo can conclude the research early if it determines that sufficient information has been acquired.
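The loop described above can be sketched as a bounded iteration with an early exit. This is an illustration under stated assumptions, not Rovo's implementation: `run_research` and the `fake_*` helpers are hypothetical, and "enough information" is modeled as the reflection step returning no further sub-tasks.

```python
from typing import Callable


def run_research(
    plan: list[str],
    search: Callable[[str], list[str]],
    is_relevant: Callable[[str], bool],
    reflect: Callable[[list[str]], list[str]],
    max_rounds: int = 3,
) -> tuple[list[str], int]:
    """Iterate search -> filter -> reflect, stopping early when reflect is done."""
    findings: list[str] = []
    rounds_used = 0
    for round_number in range(1, max_rounds + 1):
        rounds_used = round_number
        for sub_task in plan:
            # Set aside hits that are not relevant to the plan.
            findings.extend(hit for hit in search(sub_task) if is_relevant(hit))
        plan = reflect(findings)  # next round's sub-tasks
        if not plan:  # assumption: an empty plan means "enough information"
            break
    return findings, rounds_used


def fake_search(sub_task: str) -> list[str]:
    return [f"{sub_task}: relevant note", f"{sub_task}: off-topic"]


def fake_is_relevant(hit: str) -> bool:
    return "off-topic" not in hit


def fake_reflect(findings: list[str]) -> list[str]:
    return ["follow-up question"] if len(findings) < 2 else []


findings, rounds_used = run_research(
    ["initial question"], fake_search, fake_is_relevant, fake_reflect, max_rounds=5
)
```

In this toy run the loop stops after the second round, well before the cap of five, because the reflection step reports nothing left to investigate.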

Generating reports with Rovo

Rovo’s Deep Research process mirrors the meticulous approach of a human researcher preparing a summary report. Initially, it synthesizes findings into a structured outline that forms the backbone of the report. Each section is then crafted in parallel, prioritizing comprehensiveness, accuracy, and readability.

Subsequently, a dedicated proofreader agent thoroughly reviews the individual sections, merging them into a cohesive and logically organized document. Throughout the report, in-line citations and hyperlinks are provided for every key claim or data point, enabling users to easily verify sources and trace information back to its origin.

Finally, the completed report is formatted for easy export to Confluence or other applications for seamless knowledge sharing and collaboration. This systematic approach ensures that users receive high-quality, actionable research in minutes rather than days.
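One simple way to represent the in-line citations mentioned above is to number each distinct source once and attach a marker to every claim. The `Claim` type, the `render_with_citations` function, and the example.com URLs below are hypothetical; the actual citation format used in Rovo reports is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    text: str
    source_url: str  # hypothetical link back to the supporting page


def render_with_citations(claims: list[Claim]) -> str:
    """Number each distinct source once and append [n] markers to claims."""
    numbering: dict[str, int] = {}
    body = []
    for claim in claims:
        # Reuse the existing number when a source is cited more than once.
        n = numbering.setdefault(claim.source_url, len(numbering) + 1)
        body.append(f"{claim.text} [{n}]")
    sources = [f"[{n}] {url}" for url, n in numbering.items()]
    return "\n".join(body) + "\n\nSources\n" + "\n".join(sources)


claims = [
    Claim("Launch is planned for Q3.", "https://example.com/roadmap"),
    Claim("Two blockers remain open.", "https://example.com/board"),
    Claim("The beta starts in Q2.", "https://example.com/roadmap"),
]
rendered = render_with_citations(claims)
```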

How Deep Research leverages reasoning models

Deep Research uses a diverse array of models across its framework. This includes the Atlassian-hosted Llama 3.2 8B for query understanding and GPT-4o and GPT-4.1 for executing sub-agent tasks. Additionally, we employ Claude 3.7 Sonnet in thinking mode for planning and reflection.

Our findings indicate that reasoning models, such as Claude in thinking mode, are significantly better at understanding and navigating the extensive data collected during the research process. This leads to more relevant and nuanced research plans, ultimately improving the quality of our reports.

Furthermore, we provide users with insight into the internal reasoning processes of our planners and reflectors. This transparency allows users to follow the steps Deep Research undertook to reach a conclusion.

How Atlassian measures Deep Research quality

We evaluate the response quality of Rovo Deep Research in two different ways: using Atlassian’s internal data with an LLM-as-a-judge, and focusing on complex tasks that require multi-step reasoning.

  • Automated Side-by-Side Evaluation: We compare each report with its previous version across four dimensions. A side-by-side evaluation makes it easy to measure system changes and improvements.
    • Relevance: How well does each report address the specific user query?
    • Depth of insight: How deep, thorough, and comprehensive is the exploration of the topic?
    • Factual accuracy: Are the claims supported by evidence or well-established facts?
    • Clarity: Is the report well-organized, easy to follow, and clearly written?
  • Reference-Free Evaluation: We use an LLM-as-a-judge to score each report on five dimensions and combine them into a weighted score:
    • Factual Accuracy
    • Depth of Insight
    • Citation Diversity
    • Citation Sufficiency
    • Relevance
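To make the two evaluation modes concrete, the sketch below computes a weighted overall score from per-dimension judge scores and a simple win rate from side-by-side verdicts. The weights, the 1-to-5 scale, the verdict labels, and the function names are all assumptions made for illustration; the post does not state the actual weights or scoring scale.

```python
DIMENSIONS = (
    "factual_accuracy",
    "depth_of_insight",
    "citation_diversity",
    "citation_sufficiency",
    "relevance",
)


def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of judge scores across the five dimensions."""
    missing = [d for d in DIMENSIONS if d not in scores or d not in weights]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    total_weight = sum(weights[d] for d in DIMENSIONS)
    return sum(scores[d] * weights[d] for d in DIMENSIONS) / total_weight


def win_rate(verdicts: list[str]) -> float:
    """Share of decided side-by-side comparisons won by the new version."""
    decided = [v for v in verdicts if v != "tie"]
    if not decided:
        return 0.0
    return sum(1 for v in decided if v == "new") / len(decided)


# Illustrative weights and scores only; not Atlassian's actual values.
weights = {
    "factual_accuracy": 0.3,
    "depth_of_insight": 0.2,
    "citation_diversity": 0.1,
    "citation_sufficiency": 0.1,
    "relevance": 0.3,
}
scores = {
    "factual_accuracy": 4,
    "depth_of_insight": 3,
    "citation_diversity": 5,
    "citation_sufficiency": 4,
    "relevance": 5,
}
overall = weighted_score(scores, weights)
new_version_win_rate = win_rate(["new", "new", "old", "tie"])
```

Dividing by the total weight keeps the result on the same scale as the inputs even if the weights do not sum to one.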

Challenges discovered and addressed in Deep Research

After establishing our evaluation framework, we systematically assessed Deep Research on real-world tasks. This process surfaced several weaknesses in our initial approach:

  • Limited depth of insight: Early evaluations revealed that reports sometimes lacked comprehensive analysis or missed nuanced perspectives. To address this, we introduced structured report generation, which decomposes the research task into an outline, generates each section independently, and then synthesizes the results. This approach enabled deeper, more thorough exploration of complex topics.
  • Insufficient source attribution: We found that factual claims were occasionally difficult to verify, reducing trust in the output. In response, we implemented inline citations throughout the report, ensuring that every key statement is directly linked to its supporting source.

What’s next for Rovo’s Deep Research

Deep Research is available today in Rovo Chat and is ready to support your most challenging research needs. Here are a few ways teams are already using it:

  • Ramp up on new domains or internal projects by generating comprehensive reports that reference both Atlassian and third-party sources.
  • Generate a timeline of key milestones, blockers, and decisions for a long-running project by synthesizing updates from Jira and Confluence.
  • Analyze customer feedback to identify recurring pain points and feature requests, and summarize actionable insights.

We encourage you to try Deep Research for your next complex question or technical investigation. Have feedback or want to share how you’re using it in your workflow? We’d love to hear from you in the Atlassian Community.
