OpenAI - OpenAI DevDay 2024 | Community Spotlight | Tortus
The presentation by Nina, a research engineer at Tortus, covers the use of LLMs in clinical settings to alleviate clinician burnout by reducing time spent on computer tasks. Tortus, an LLM-powered application, lets clinicians focus more on patient care by automating documentation. The video explains the development of a platform that breaks complex workflows into smaller, manageable blocks, enabling clinicians to design and evaluate workflows themselves. This approach speeds up iteration and supports clinical safety by reducing errors such as hallucinations and omissions. The platform's iterative framework allows for continuous improvement and safe deployment of new models and workflows. In addition, a large-scale dataset of LLM-generated clinical documentation errors is being built to automate error detection and enhance product safety.
Key Points:
- The Tortus application reduces clinician burnout by automating documentation, freeing time for patient care.
- Complex workflows are broken into blocks, allowing clinicians to design and evaluate workflows, speeding up deployment.
- The platform targets clinical errors by reducing hallucinations and omissions in LLM outputs.
- Iterative framework ensures continuous improvement and safe deployment of new models and workflows.
- A large-scale dataset of errors is being used to automate error detection, enhancing product safety.
Details:
1. Introduction to Tortus and LLMs
- Nina, a research engineer at Tortus, introduces the evaluation of LLMs in a clinical setting.
- Tortus focuses on leveraging LLMs to enhance clinical decision-making processes.
- The evaluation aims to assess the effectiveness and reliability of LLMs in providing accurate clinical insights.
- Tortus employs a structured methodology to test LLMs, ensuring they meet clinical standards and improve patient outcomes.
- The initiative is part of Tortus's broader strategy to integrate AI technologies into healthcare, aiming for a 30% improvement in diagnostic accuracy.
2. Clinician Challenges and Tortus Solution
2.1. Clinician Challenges
2.2. Tortus Solution
3. Demonstration of Tortus in Action
3.1. Tortus Functionality in Clinical Documentation
3.2. Clinical Documentation Errors and Implications
4. Importance of Clinical Safety
- The typical Silicon Valley approach of 'move fast and break things' is not suitable for clinical settings as it can lead to harmful consequences for patients.
- It is crucial to prioritize clinical safety by involving clinicians, who are the domain experts, in the design and evaluation of systems used in healthcare.
- Ensuring that clinicians are at the center of the development process helps in creating safer and more effective healthcare solutions.
5. Iterative Development and Workflow Building
- The iterative process between clinicians and developers is slow and labor-intensive because of stringent compliance requirements and the need for clinically safe outputs. The platform addresses these compliance challenges by allowing for detailed configuration and validation of outputs.
- The lack of out-of-the-box solutions led to the creation of a platform that breaks down complex workflows into smaller steps called 'blocks', allowing clinicians to take a more active role in development. This empowers clinicians to directly contribute to and modify workflows, enhancing collaboration and efficiency.
- The platform's architecture is centered around LLM workflows, giving clinicians and engineers a common language in which to communicate. This common language facilitates smoother iterations and reduces misunderstandings.
- Blocks are designed around an LLM call, with inputs typically being medical transcripts and outputs specified with extra model configurations, such as model type and structured output. This modular approach allows for flexibility and adaptability in meeting specific clinical needs.
- The platform encourages clinicians to share blocks with each other, promoting collaboration and efficiency in workflow development. This sharing capability not only speeds up the development process but also fosters a community of practice among clinicians.
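The block idea above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the platform's actual code: the `Block` class, its field names, and the `fake_llm` stand-in are all hypothetical, and the model name is a placeholder.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Block:
    """One workflow step wrapped around a single LLM call (hypothetical shape)."""
    name: str
    prompt: str                            # instruction given to the model
    model: str                             # model type (placeholder name here)
    input_key: str = "transcript"          # which input the block reads
    output_schema: Optional[tuple] = None  # field names expected in structured output

    def run(self, inputs: dict, llm: Callable[[str, str], dict]) -> dict:
        """Build the request from the configured input and call the model."""
        text = inputs[self.input_key]
        result = llm(self.model, f"{self.prompt}\n\n{text}")
        # If a structured output was requested, check that every field is present.
        if self.output_schema is not None:
            missing = [k for k in self.output_schema if k not in result]
            if missing:
                raise ValueError(f"missing output fields: {missing}")
        return result


def fake_llm(model: str, prompt: str) -> dict:
    """Stand-in for a real model call so the sketch runs offline.

    It simply echoes each non-empty transcript line back as a 'fact'.
    """
    return {"facts": [line for line in prompt.splitlines()[2:] if line]}


extract = Block(
    name="extract_facts",
    prompt="List the clinical facts.",
    model="placeholder-model",
    output_schema=("facts",),
)
out = extract.run({"transcript": "Patient reports a cough.\nNo fever."}, fake_llm)
```

Keeping the prompt, model, and output specification together in one immutable object is what makes a block easy to share: a colleague who pulls it gets the whole configuration, not just the prompt text.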
6. Composing, Sharing, and Experimenting with Workflow Blocks
- Workflow blocks are uniquely identified by a block ID, generated by hashing the parameters. This ensures that any change in parameters results in a new block ID, facilitating version control and traceability.
- Blocks can be shared among clinicians by pulling them from a centralized database, promoting collaboration and reuse of workflows.
- To compose blocks together, the block ID of the previous block is used as the input for the next block, ensuring compatibility and consistency in the workflow.
- Explicit connections between blocks are necessary to avoid discrepancies in outputs, especially when dealing with high-level data like 'facts' that can be formatted differently.
- Iterating on a block, such as updating the model or prompt, generates a new block ID, indicating that the new block may not be compatible with previous ones, thus maintaining integrity in the workflow.
- The system's verbosity in tracking block IDs aids in audits, providing clear documentation of workflow changes and ensuring compliance.
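The ID scheme described in this section can be illustrated as follows. This is a sketch under assumptions: the talk summary only says the ID is produced by hashing the parameters, so the choice of SHA-256 over canonical JSON, the truncation to 12 characters, and the names `block_id`, `make_block`, and `is_connected` are all hypothetical.

```python
import hashlib
import json


def block_id(params: dict) -> str:
    """Derive an ID from a block's parameters; any change yields a new ID."""
    # Sorting keys gives a canonical form, so equal parameters hash equally.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def make_block(prompt: str, model: str, input_ref: str) -> dict:
    """input_ref is either a raw input name or the ID of the upstream block."""
    params = {"prompt": prompt, "model": model, "input": input_ref}
    return {"id": block_id(params), **params}


def is_connected(downstream: dict, upstream: dict) -> bool:
    """A downstream block is wired to exactly one upstream version."""
    return downstream["input"] == upstream["id"]


# Compose two blocks: the note writer takes the fact extractor's ID as input.
facts_v1 = make_block("List the clinical facts.", "model-a", "transcript")
note_v1 = make_block("Write a note from the facts.", "model-a", facts_v1["id"])

# Iterating on the upstream prompt produces a different ID, so the existing
# downstream block is no longer connected until it is explicitly rewired.
facts_v2 = make_block("List the clinical facts as JSON.", "model-a", "transcript")
```

Because the upstream ID is itself one of the downstream block's parameters, rewiring `note_v1` to `facts_v2` also changes the downstream ID, which is one way the full history of a workflow could stay traceable for audits.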
7. Experiment Design, Execution, and Error Analysis
7.1. UI Development and Experiment Design
7.2. Experiment Execution and Error Analysis
8. Results, Insights, and Clinical Safety Evaluation
8.1. Importance of Human Labeling
8.2. Resource Optimization Strategies
8.3. Clinical Safety Evaluation
8.4. Iterative Improvement and Error Management
9. Impact, Future Directions, and Conclusion
- The implementation of the framework has significantly reduced time spent by developers, allowing them to focus on other tasks within the company.
- Clinicians are now able to design and run workflows independently, increasing their satisfaction and control over the process.
- The speed of deploying new prompts, models, and architectures into production has improved.
- The creation of a large-scale dataset of hallucinations and omissions from LLM-generated clinical documentation is underway.
- Plans are in place to automate error detection to enhance product safety and enable live monitoring and error flagging for users.
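One way to picture a dataset of hallucinations and omissions is as a list of labelled records that can be aggregated per workflow version. The sketch below is purely illustrative: the `ErrorLabel` fields and the `error_rates` helper are assumptions and do not describe the actual dataset schema.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorLabel:
    block_id: str   # which workflow version produced the output (hypothetical field)
    kind: str       # "hallucination" or "omission"
    excerpt: str    # the offending or missing text, as marked by a reviewer


def error_rates(labels: list, n_outputs_by_block: dict) -> dict:
    """Errors per generated output, keyed by (block_id, kind)."""
    counts = Counter((label.block_id, label.kind) for label in labels)
    return {
        (block, kind): count / n_outputs_by_block[block]
        for (block, kind), count in counts.items()
    }


labels = [
    ErrorLabel("a", "hallucination", "allergy to penicillin"),
    ErrorLabel("a", "omission", "family history of asthma"),
    ErrorLabel("a", "hallucination", "blood pressure reading"),
    ErrorLabel("b", "omission", "current medication"),
]
rates = error_rates(labels, {"a": 4, "b": 2})
```

Comparing these rates between two versions of a workflow is the kind of check that could gate a deployment, and the same labelled records are what an automated error detector would later be trained or evaluated on.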