The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

AI Trends 2025: AI Agents and Multi-Agent Systems with Victor Dibia

Victor Dibia, a Principal Research Software Engineer at Microsoft Research, discusses the development and application of AI agents, particularly multi-agent systems. He highlights the evolution of these systems, emphasizing the importance of planning, diverse expertise, and dynamic environments in their application. Dibia explains the concept of Autogen, a framework for building multi-agent systems, which allows for dynamic task execution by defining agents with broad capabilities. He also discusses the challenges of integrating AI agents into practical applications, such as interface agents that simulate human interactions with web browsers. Dibia predicts continued advancements in AI models, reinforcement learning, and the consolidation of agentic patterns in 2025, which will enhance the efficiency and adaptability of AI agents. He also emphasizes the importance of developing skills to effectively integrate AI into workflows, particularly in software engineering, where AI can significantly boost productivity.

Key Points:

AI agents are evolving to handle complex tasks by simulating human interactions and using multi-agent systems.
The Autogen framework enables dynamic task execution by defining agents with broad capabilities, which increases flexibility.
Interface agents simulate human interactions with web browsers, though they face challenges in efficiency and reliability.
Future advancements in AI models and reinforcement learning will improve agent efficiency and adaptability.
Developing skills to integrate AI into workflows is crucial, especially in software engineering, to boost productivity.

Details:

1. 📧 Automated Email Request by Agents

1.1. Challenges in Information Retrieval

1.2. Technical Implementation of Automated Email Requests

2. 🎙️ Welcome to the Podcast

The host, Sam Charrington, welcomes the audience to this episode of the TWIML AI Podcast.
Guest Victor Dibia, Principal Research Software Engineer at Microsoft Research, is introduced.
The discussion covers AI agent innovations in 2024 and expectations for the coming year.
Victor's work on multi-agent frameworks, including Autogen, is also explored.

3. 👨‍💻 Victor Dibia's Background and Work at Microsoft Research

3.1. Victor Dibia's Role at Microsoft Research

3.2. Victor Dibia's Academic Background

4. 📈 Autogen: From R&D to Productization

Microsoft's HCI research group moved from a traditional research focus to developing Autogen.
Autogen is being turned into productized software infrastructure, an example of a research division shifting toward commercialization.
The process involved building on existing research insights and aligning them with market needs.
Challenges in this transition included reconciling research goals with commercial strategy and ensuring scalability and usability in the product.
This move reflects a broader trend of research groups not only innovating but also monetizing and applying their work in practical, impactful ways.

5. 🧑‍🔬 Early Work and Research on Multi-Agent Systems

After completing his PhD, Dibia joined IBM Research in Yorktown Heights as a postdoc and later as a research staff member, focusing on integrating human-computer interaction (HCI) with machine learning.
IBM's cognitive service APIs made it possible to build complex multimodal demos, including room-scale experiences with spatial text-to-speech and image recognition.
An early model for automated data visualization was trained using sequence-to-sequence models, an architecture originally developed for language translation.
This work showed that visualization specifications in Vega-Lite, a JSON format, could be generated by translating from data that is also in JSON, so visualizations could be produced at runtime from a sample of the data.
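To make the input and output formats concrete, here is a minimal Python sketch. It is not the trained model described above: a hand-written rule stands in for the learned translation, mapping a sample of JSON records to a Vega-Lite specification. The function name, the sample records, and the type-inference rule are assumptions for illustration only.

```python
import json

def sample_to_vegalite(rows, x_field, y_field, mark="bar"):
    """Build a minimal Vega-Lite spec for two fields of sampled JSON records.

    Hypothetical helper: a fixed rule replaces what a seq2seq model would learn.
    """
    def field_type(field):
        # Numeric values are encoded as "quantitative"; everything else as "nominal".
        value = rows[0][field]
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return "quantitative"
        return "nominal"

    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": rows},  # the sampled data is embedded inline
        "mark": mark,
        "encoding": {
            "x": {"field": x_field, "type": field_type(x_field)},
            "y": {"field": y_field, "type": field_type(y_field)},
        },
    }

sample = [{"category": "A", "value": 3}, {"category": "B", "value": 7}]
spec = sample_to_vegalite(sample, "category", "value")
print(json.dumps(spec, indent=2))
```

The output is an ordinary Vega-Lite document that a Vega-Lite renderer can display; a learned model's advantage is choosing fields, marks, and encodings without hand-written rules like `field_type`.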

6. 🔄 The Evolution of Autogen Framework

At Cloudera, the shift from traditional big data warehousing to machine learning involved building prototypes and consulting with customers on integrating machine learning into existing systems.
Microsoft Research developed a tool called LIDA, which uses LLMs to automatically generate data summaries, hypotheses, and visualizations.
The project, initiated in late 2020, used Codex models to run a complete backend pipeline in which LLMs generated and executed code, an early example of agentic behavior.
A key advancement was moving from manually built pipelines to defining a set of agents that collaborate autonomously to solve tasks, reducing the need for human intervention.
Challenges included ensuring accuracy and reliability in automated processes, which were addressed by iterative testing and refinement of the LLM-based systems.
These developments have significantly impacted current practices by reducing the time required for data processing and visualization, thereby increasing efficiency and enabling more complex analysis.

7. 🛠️ Building Autonomous Multi-Agent Systems

Autogen introduces a novel approach to application development using autonomous agents with broad capabilities rather than specific pipelines.
The autonomous agents are designed to be task-oriented and collaborative, significantly enhancing their flexibility and problem-solving abilities.
A team within the group conducted and documented experiments on how these agents collaborate.
The resulting paper describes practical applications and evaluates how effective autonomous multi-agent systems are across a range of problem-solving scenarios.

8. 🤖 Defining AI Agents: Features and Capabilities

9. 📊 Key Developments in AI Agents for 2024

A recent focus on structured action spaces in LLMs has made API calls more reliable.
Agent-native foundation models have risen sharply over the past year, marking a shift toward models built for agentic use.
Early models such as GPT-3.5 emphasized language modeling, but newer models have since integrated agentic capabilities.
The ReAct (reasoning and acting) pattern lets a model reason about a task and choose actions accordingly, improving decision-making.
Model families such as o1 have advanced introspection and reasoning capabilities, reflecting a move towards more intelligent systems.
There has also been significant progress on models that natively support multimodal inputs and outputs, including text, image, video, and audio.
Gemini 2.0 exemplifies this with its ability to seamlessly process various forms of data, showcasing the potential for comprehensive multi-modal AI systems.
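The ReAct pattern and structured action spaces can be illustrated together. In the toy sketch below, a hypothetical `scripted_model` stands in for an LLM and returns JSON containing a `thought`, an `action`, and `args`; the loop parses that structure, rejects actions outside the allowed tool set, and feeds each observation back to the model. The tool set, message roles, and function names are assumptions, not any vendor's API.

```python
import json

# Hypothetical structured action space: the only tools the agent may call.
TOOLS = {
    "add": lambda args: args["a"] + args["b"],
}

def scripted_model(history):
    """Stand-in for an LLM: returns a JSON string with a thought, an action, and args."""
    if not any(step["role"] == "observation" for step in history):
        return json.dumps({"thought": "I should compute the sum.",
                           "action": "add", "args": {"a": 2, "b": 3}})
    result = history[-1]["content"]
    return json.dumps({"thought": "I have the result.",
                       "action": "finish", "args": {"answer": result}})

def react_loop(task, model, tools, max_steps=5):
    history = [{"role": "task", "content": task}]
    for _ in range(max_steps):
        step = json.loads(model(history))  # parse the structured action
        history.append({"role": "thought", "content": step["thought"]})
        if step["action"] == "finish":
            return step["args"]["answer"], history
        if step["action"] not in tools:  # reject actions outside the action space
            history.append({"role": "observation",
                            "content": f"unknown tool {step['action']}"})
            continue
        observation = tools[step["action"]](step["args"])  # act, then observe
        history.append({"role": "observation", "content": observation})
    return None, history  # step budget exhausted

answer, trace = react_loop("What is 2 + 3?", scripted_model, TOOLS)
```

Constraining the model to a fixed set of named actions with typed arguments is what makes each step machine-checkable; a real system would validate `args` against a schema as well.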

10. 🌐 Interface Agents and Complex Task Frameworks

OpenAI's Operator agent executes complex tasks by mimicking human interactions with web interfaces, such as booking flights through a browser; agents of this kind can be assembled in about 40 lines of code, emphasizing simplicity and efficiency.
The Autogen platform includes the WebSurfer agent, which drives a Chromium web browser using a multimodal LLM to type text, click, and navigate, making it a useful building block in multi-agent systems.
Operator and WebSurfer both reflect a 2024 trend toward sophisticated interface agents, with companies such as Anthropic developing similar tools, indicating a broader industry movement.

11. 🔗 From Chains to More Autonomous Systems

LangChain provided chains of deterministic steps for executing tasks, but there is a growing need for more autonomous workflows that can handle complex tasks.
Bill Gates proposed the concept of an 'everything app' where users can express tasks in natural language, and the system executes them by interacting with other systems as needed.
Unified interfaces that handle multiple tasks autonomously can save significant time and effort.
Frameworks such as Autogen, LangGraph, CrewAI, and LlamaIndex are being developed to build generalist systems that handle complex tasks beyond scripted pipelines.
The shift towards autonomous systems represents a significant change in 2024, moving away from scripted deterministic pipelines to more autonomous solutions.
Examples of practical applications include systems that autonomously manage scheduling, financial transactions, and personalized content delivery.
These frameworks are designed to minimize human intervention, allowing for more efficient and flexible task management across various domains.

12. 📈 Evaluating AI Agents with Benchmarks

12.1. Introduction to Agentic System Evaluation

12.2. Gaia Benchmark Overview

12.3. Examples of Gaia Tasks

12.4. Challenges and Insights

12.5. Conclusion

13. 🧠 Reasoning in AI: Challenges and Insights

AI models that are perceived as having agent-like qualities may not inherently possess these capabilities; rather, they function as autoregressive models predicting subsequent tokens.
Significant differences in AI reasoning emerge when models are primed for iterative thinking processes, moving beyond mere token prediction.
Frameworks from human psychology, such as System 1 and System 2 thinking, provide a useful lens for understanding AI reasoning, though AI does not replicate human cognitive processes.
The practical application of these insights can lead to more efficient AI systems by enhancing models' ability to perform complex reasoning tasks, akin to human-like decision-making.
Understanding these distinctions can inform the development of AI models better suited for tasks requiring nuanced reasoning and decision-making.

14. 📉 Challenges in AI Planning and Reasoning

Humans use heuristics for simple tasks, such as recalling a birthday or determining the time of day, but complex problems require significant computation.
Previously, AI models took equal time to respond to both simple and complex questions, indicating inefficiencies in resource allocation.
To address this, developments such as test-time compute and reasoning models like OpenAI's o1 and DeepSeek aim to allocate more computation to complex problems.
Test-time compute refers to dynamically adjusting the computation a model uses during inference, for example by generating a longer chain of reasoning for a harder question.
Reasoning models such as o1 and DeepSeek's R1 are trained to reason at length before answering, which lets them handle complex tasks more effectively.
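One simple way to spend more computation at inference time on harder questions is to sample several answers and take a majority vote (self-consistency). The sketch below assumes the caller supplies a difficulty label and uses a hypothetical `noisy_sampler` in place of a model; it illustrates the general idea of test-time compute, not how o1 or any other reasoning model works internally.

```python
import random
from collections import Counter

def answer_with_budget(question, sample_answer, difficulty, easy_samples=1, hard_samples=9):
    """Draw more samples for hard questions, then return (majority answer, samples used)."""
    n = hard_samples if difficulty == "hard" else easy_samples
    votes = Counter(sample_answer(question) for _ in range(n))
    answer, _count = votes.most_common(1)[0]
    return answer, n

# Hypothetical stand-in for a model: answers "42" about 70% of the time, otherwise guesses.
rng = random.Random(0)

def noisy_sampler(question):
    return "42" if rng.random() < 0.7 else str(rng.randint(0, 99))

answer, samples_used = answer_with_budget("What is 6 x 7?", noisy_sampler, difficulty="hard")
```

Raising `hard_samples` trades latency and cost for reliability, which is the core trade-off behind allocating compute at inference time.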

15. 🧑‍💼 Practical Applications of Autogen Framework

15.1. Introduction to Autogen Framework

15.2. Autogen Framework and Agent Design

15.3. Model Design and Performance

15.4. Performance and Capability Enhancement

16. 💼 Considerations for Building Agentic Systems

16.1. Improving Reliability and Performance

16.2. Using Code for Problem Solving

16.3. Beyond Software Engineering

16.4. Wide Action Space and Unusual Solutions

16.5. Role of Orchestrators and Frameworks

17. 📜 Designing Frameworks for AI Agents

Autogen offers two API levels: a core API that provides basic message delivery, and the AgentChat API, which offers high-level abstractions similar to Keras and integrates components such as LLMs, tools, and memory.
With the core API, developers can define an agent with minimal setup, mainly by implementing a method that handles incoming messages, which keeps agent behavior flexible.
The framework supports distributed agent deployment across multiple machines, facilitating scalable system design.
The AgentChat API makes agent creation intuitive, with presets such as a basic assistant agent and a web surfer agent that simplify common tasks.
The concept of 'teams' enables structured collaboration among agents, using containers to manage message flow and execution order.
In a round-robin team, agents take turns in a fixed order until a termination condition is met, such as specific message content, a budget limit, or an external signal.
The combination of agents, teams, and termination conditions allows for expressive and autonomous multi-agent systems.
Autogen's message-based architecture offers loose coupling advantages, enhancing system flexibility and adaptability.
The current version of Autogen is structured around message passing, with ongoing exploration of emergent patterns in agent interactions.
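The agents, teams, and termination-condition concepts above can be sketched in plain Python. This is a framework-agnostic illustration, not Autogen's API: `Message`, `Agent`, `run_round_robin`, and the condition helpers are hypothetical names, and each agent's reply function stands in for an LLM call. Termination conditions are plain predicates over the message history, composed with `any_of`.

```python
import threading
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Message:
    source: str
    content: str

@dataclass
class Agent:
    name: str
    respond: Callable[[List[Message]], str]  # hypothetical: reads shared history, returns a reply

# Termination conditions: each takes the message history and returns True to stop.
def text_mention(keyword: str):
    return lambda history: keyword in history[-1].content

def max_messages(limit: int):
    return lambda history: len(history) >= limit

def external(signal: threading.Event):
    return lambda history: signal.is_set()

def any_of(*conditions):
    return lambda history: any(cond(history) for cond in conditions)

def run_round_robin(agents: List[Agent], task: str, should_stop) -> List[Message]:
    history = [Message("user", task)]
    turn = 0
    while not should_stop(history):
        agent = agents[turn % len(agents)]  # agents speak in a fixed, repeating order
        history.append(Message(agent.name, agent.respond(history)))
        turn += 1
    return history

# Usage: a writer and a critic alternate until the critic approves or 10 messages accumulate.
writer = Agent("writer", lambda h: f"draft v{sum(m.source == 'writer' for m in h) + 1}")
critic = Agent("critic", lambda h: "APPROVE" if h[-1].content == "draft v2" else "revise")
stop_signal = threading.Event()
history = run_round_robin(
    [writer, critic],
    "Write a haiku.",
    any_of(text_mention("APPROVE"), max_messages(10), external(stop_signal)),
)
```

Because agents only read the shared message history and the team only routes messages, a new agent or stopping rule can be added without changing the others, which is the loose coupling a message-based design aims for.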

18. 📈 Patterns in Multi-Agent System Development

18.1. Developing Autonomous Multi-Agent Systems

18.2. Control Flow Patterns in Multi-Agent Systems

19. 🔍 Pragmatic Approaches to Building AI Systems

19.1. Metacognition and Task Evaluation in AI Systems

19.2. Task Management and Risk Assessment

19.3. Pragmatic Use of Autogen for Startups

20. 🖥️ Future of Interface Agents and Computer Use

Interface agents face challenges with UI element interaction on platforms like Google Flights, highlighting a need for better localization and interaction capabilities.
User dissatisfaction arises from agent inefficiency in performing tasks that humans complete faster, stressing the need for improved speed and accuracy.
Models like Microsoft's OmniParser are being developed to enhance UI element recognition, predicting bounding boxes with high accuracy.
New benchmarks and models such as ByteDance's UI-TARS show continuing improvements in UI interaction performance.
Future strategies should focus on agents autonomously handling back-office tasks rather than working under constant human oversight, which is inefficient.
When the value of human time is compared with agent time, it is often cost-effective to let agents handle non-urgent tasks even if they take longer.
Agents could also remember user preferences and improve through feedback, increasing their autonomy.
Despite current inefficiencies compared to direct API usage, agents hold substantial potential in back-office roles due to numerous integration tasks available.
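The human-versus-agent time trade-off is simple arithmetic. The sketch uses hypothetical figures: a person valued at $60/hour, a task that takes that person 10 minutes, $0.50 of compute for an agent, and 2 minutes of human review of the agent's result. Only human minutes and compute spend are counted, on the assumption that the agent's own (slower) running time blocks no one.

```python
def task_cost(human_minutes, compute_dollars=0.0, human_hourly_rate=60.0):
    """Dollar cost of a task: human time consumed plus compute spend (hypothetical inputs)."""
    return human_minutes * human_hourly_rate / 60 + compute_dollars

human_does_it = task_cost(human_minutes=10)
# The agent may run far longer unattended; a person only spends 2 minutes reviewing its output.
agent_does_it = task_cost(human_minutes=2, compute_dollars=0.50)
```

Under these assumed numbers, delegation costs $2.50 instead of $10.00; the conclusion flips if review time or compute spend grows, or if the task is urgent enough that the agent's latency matters.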

21. 🌐 Agentic Systems in the Digital World

Integration challenges in digital systems are often driven by business decisions rather than technical hurdles; for example, Southwest Airlines deliberately stays off Google Flights for strategic reasons.
The rise of agentic systems necessitates the creation of new standards, such as 'agents.txt', to manage digital content interactions in a manner akin to 'robots.txt'.
Websites are increasingly building mechanisms to block automated access, such as detecting bots or Playwright instances, highlighting the need for distinct standards that differentiate human from agent interactions.
The concept of 'agentic noise' emphasizes the competitive nature of digital agents for human attention, stressing the importance of optimizing interactions to prioritize meaningful, unregretted engagements.
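For comparison, Python's standard-library `urllib.robotparser` already evaluates robots.txt rules per user agent. The sketch applies it to an agent identity; the agent name and the rules are hypothetical, and they simply reuse robots.txt syntax as a stand-in for what an 'agents.txt' might express.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules in robots.txt syntax: block one named agent from checkout pages.
RULES = """
User-agent: ExampleBrowsingAgent
Disallow: /checkout/

User-agent: *
Allow: /
""".strip().splitlines()

parser = RobotFileParser()
parser.parse(RULES)  # parse in-memory lines instead of fetching a URL

def agent_may_fetch(url, agent_name="ExampleBrowsingAgent"):
    """Return True if the rules allow this agent to fetch the URL."""
    return parser.can_fetch(agent_name, url)
```

Here `agent_may_fetch("https://example.com/checkout/pay")` is refused while other paths and other user agents are allowed; a dedicated agent standard could add distinctions robots.txt lacks, such as which actions (reading versus purchasing) an agent may take.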

22. 📊 Evaluating AI Systems: Methods and Challenges

22.1. Current Evaluation Methods

22.2. Challenges with Objective Evaluation

22.3. Benchmarking the Process, Not Just the Outcome

22.4. Evaluating Reasoning Ability

22.5. Interpreting Evaluation Metrics

23. 🤔 When to Use Multi-Agent Systems

Open-source reasoning models such as DeepSeek are becoming more affordable, giving businesses the opportunity to solve problems creatively by integrating them.
Jason Liu emphasizes the importance of developing low-level, cost-effective metrics to evaluate system changes, suggesting that metrics should be tailored to specific organizational needs.
A framework for characterizing complex tasks helps determine whether a multi-agent system is appropriate, so that it is applied deliberately and fits the business's constraints.
It is critical not to rush into multi-agent systems; a deliberate process should first confirm that the tool aligns with a specific business goal or user problem.
Case studies highlight successful multi-agent system applications, offering insights into best practices and potential pitfalls.

24. 📊 Applications and Industries for Multi-Agent Systems

24.1. Conditions for Multi-Agent Systems

24.2. Industries and Use Cases

25. 🛠️ Frameworks vs. Building from Scratch

Frameworks provide organized abstractions for tasks like automatic differentiation, optimizers, and neural network architectures, which accelerates development and reduces errors.
Using frameworks allows for assembling complex systems with minimal code; a high-level API implementation can be done in about 40 lines of code, demonstrating efficiency.
Frameworks embed stabilized patterns, enhancing reliability and development speed while simplifying complex configurations.
Building from scratch is feasible for simple systems, but frameworks excel in complex, autonomous systems requiring intricate control flows.
Frameworks come with defaults and transformations that can alter input-output flow; understanding these is crucial for effective debugging and system understanding.
Familiarity with underlying framework operations is essential to mitigate unexpected behaviors and ensure accurate system functionality.
When deciding between frameworks or building from scratch, consider system complexity, development speed, and the need for reliability and stability. Frameworks are ideal for projects needing rapid development and where errors must be minimized. Building from scratch may suit projects with simple requirements or where customization is key.

26. 🔮 Predictions for AI Agents in 2025

26.1. AI Development and APIs

26.2. Reinforcement Learning and Task Optimization

26.3. Agentic Patterns and Multi-Agent Systems

26.4. User Experience and Workforce Impact

27. 👩‍💻 Impact of AI on Software Engineering Careers

Junior engineering work, such as building simple web pages or writing basic scripts, is disappearing as AI takes over these tasks.
Engineers using AI tools are markedly more productive, quickly delivering well-modularized code with few errors.
AI integration into software development workflows creates stark productivity differences, emphasizing the need for AI literacy among engineers.
Investing in AI literacy is crucial to maintain competitiveness, mirroring the importance of digital literacy in the past.
AI literacy not only enhances productivity but also opens up career progression opportunities by enabling engineers to take on more complex and creative tasks.
Case studies show that companies leveraging AI tools have reduced development time by up to 50%, highlighting the competitive advantage of AI proficiency.

28. 🤔 Reflections on AI Coding Agents and Future Directions

28.1. Challenges and Perceptions of AI Coding Agents

28.2. User Skills and Customization

28.3. Building Intuition for AI Tool Use

28.4. Practical Usage and Learning
