DeepLearningAI - AI Dev 25 | Sharon Zhou & Mahdi Ghodsi: Run Deepseek Reasoning and Finetuning on AMD GPUs w/ Lamini
Sharon, the founder and CEO of Lamini, emphasizes the importance of improving factual accuracy in large language models to enhance their practical applications and business value. She highlights the issue of hallucinations in these models, where they generate incorrect facts, and discusses methods to mitigate this problem. Her team has achieved high levels of factual accuracy by developing a mixture of memory experts (MoME, pronounced 'mommy'), which integrates additional weights into the model to improve fact retrieval. This approach allows the model to maintain generality while ensuring factual correctness. Sharon also discusses the importance of high-quality training data, effective evaluation sets, and fast iteration cycles in fine-tuning models. She provides practical insights into creating high-quality datasets and using agentic pipelines for data generation and validation, and shares a case study in which Colgate used these techniques to significantly improve model accuracy, enabling more users to access and utilize their database effectively.
Key Points:
- Focus on improving factual accuracy in language models to enhance business applications.
- Developed MoME, a 'mixture of memory experts,' to improve fact retrieval in models.
- Emphasizes the importance of high-quality training data and effective evaluation sets.
- Fast iteration cycles are crucial for successful model fine-tuning.
- Case study: Colgate improved model accuracy, enabling broader database access.
Details:
1. 🌟 Exploring Generative AI and Factuality
- Sharon is the founder and CEO of Lamini and has extensive experience with generative AI, including a PhD with Andrew Ng at Stanford.
- The focus of the discussion is on hallucinations in large language models and ways to mitigate or eliminate them.
- The team at Lamini has achieved 'nines' of factual accuracy (e.g., 99.9% or better), indicating a high level of precision in their models.
- Lamini employs advanced methodologies and technologies to significantly reduce inaccuracies and ensure reliable outputs.
- The introduction sets the stage for a deeper dive into specific techniques and strategies used to enhance AI model accuracy.
2. 🤔 Decoding Hallucinations in AI Models
- There is a significant gap between AI models' general purpose benchmarks and their practical applications within enterprises, emphasizing the need for enhanced factual reasoning capabilities to deliver greater business value.
- Improving the factual accuracy and reasoning capabilities of AI models is crucial for developing more useful use cases and increasing business value.
- Advanced prompting and fine-tuning capabilities can lead to more sophisticated use cases, such as agent use cases, which require not just reasoning but factual reasoning.
- AI models, like the Llama models, often hallucinate or provide incorrect information despite being trained on extensive datasets, such as Wikipedia.
- These hallucinations occur because AI models are optimized to minimize average error across all examples on the internet, making them generally good at many things but perfect at none.
- When AI models are queried with specific factual questions, they may provide incorrect or hallucinated answers due to their training process, which involves sampling equally across possible answers.
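The sampling failure mode described above can be illustrated with a toy distribution (the numbers are invented for illustration, not from the talk): a model that has seen several similar figures spreads probability across them, so sampling often returns a plausible but wrong answer.

```python
import collections
import random

# Toy illustration: near-equal probability mass over similar candidate
# facts means a sampled answer is frequently wrong -- a hallucination.
candidates = {"$100 billion": 0.4,   # the true fact
              "$10 billion": 0.3,    # similar, wrong
              "$1 billion": 0.3}     # similar, wrong

def sample_answer(dist, rng):
    """Draw one answer according to the model's probability mass."""
    r, cum = rng.random(), 0.0
    for answer, p in dist.items():
        cum += p
        if r < cum:
            return answer
    return answer  # guard against float rounding

rng = random.Random(0)
counts = collections.Counter(sample_answer(candidates, rng)
                             for _ in range(10_000))
wrong_rate = 1 - counts["$100 billion"] / 10_000
print(f"hallucination rate: {wrong_rate:.0%}")  # roughly 60%
```

Even though the single most likely answer is correct, sampling makes the model wrong more often than not here, which is the gap memory tuning targets.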
3. 📈 Enhancing Factual Accuracy in AI
- AI models sometimes generate hallucinations by sampling similar but incorrect data points, leading to factual inaccuracies, e.g., stating a company's revenue as $10 billion instead of the actual $100 billion.
- These models excel in generalizations, like recognizing 'hi' and 'howdy' as similar, but struggle with precise factual information.
- To address factual inaccuracies, the concept of a 'mixture of memory experts' (MoME, pronounced 'mommy') is introduced. This approach integrates an extra set of weights within the AI model to enhance factual retrieval accuracy.
- The MoME approach maintains generality while driving the loss on key facts to zero, by incorporating retrieval directly into the model's weights rather than relying on external processes.
- An educational course with Meta and Andrew Ng was developed to guide learners through implementing this approach, improving accuracy in tasks such as converting text to SQL.
- Fine-tuning is necessary to achieve high levels of factual accuracy, and the process can be simplified with an API call on the mentioned platform.
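A minimal sketch of the memory-tuning idea, under my own simplifications (a single linear layer and a full-matrix adapter, not Lamini's actual architecture): the base weights stay frozen while a small extra set of weights is trained until the fact set is recalled with near-zero loss.

```python
import numpy as np

# Simplified memory-tuning sketch: base weights W are frozen; only the
# extra "memory" weights D are trained on a handful of fact pairs, so the
# facts are memorized without disturbing W's general behavior.
rng = np.random.default_rng(0)
d, n_facts = 8, 3

W = rng.normal(size=(d, d))          # frozen general-purpose weights
D = np.zeros((d, d))                 # trainable extra memory weights

X = rng.normal(size=(n_facts, d))    # encoded "questions"
Y = rng.normal(size=(n_facts, d))    # encoded target "facts"

lr = 0.05
for _ in range(5000):
    err = X @ (W + D) - Y            # residual on the fact set
    D -= lr * X.T @ err / n_facts    # gradient step on D only; W untouched

loss = float(np.mean((X @ (W + D) - Y) ** 2))
print(f"fact-recall loss after tuning: {loss:.1e}")  # near zero
```

Because only the adapter moves, the facts reach essentially zero loss while the pretrained weights, and hence the model's generality, are untouched.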
4. 🔄 The Role of Data Quality and Iteration Speed
- Meta reportedly used only around 1% of its available training data, underscoring the priority of quality over volume in data selection.
- High-quality data is essential for factual fine-tuning; poor quality can result in models memorizing inaccuracies.
- Models have the capability to accurately memorize provided facts, stressing the need for precise data input.
- Evaluations (evals) serve dual purposes: assessing model performance and defining clear improvement objectives, necessitating objective and consensus-based criteria.
- Rapid iteration cycles are critical, akin to product design methodologies, facilitating quick testing and refinement.
- Small, representative data subsets should be used for initial fast iteration, with scaling up once improvements are validated.
- Example: in one comparable effort, a tech company reduced its model training time by 30% by focusing on data quality and iterative testing.
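An objective, consensus-based eval of the kind described can be as simple as exact-match scoring over a small representative subset; the questions, answers, and `exact_match_accuracy` helper below are illustrative, not from the talk.

```python
# Sketch of a small eval set (~20 examples in practice keeps each
# tuning cycle fast); exact-match scoring gives an unambiguous number
# to iterate against.
eval_set = [
    {"question": "What was FY23 revenue?", "expected": "$100 billion"},
    {"question": "How many SKUs are active?", "expected": "1,204"},
    {"question": "Which region grew fastest?", "expected": "APAC"},
]

def exact_match_accuracy(model_fn, examples):
    hits = sum(model_fn(ex["question"]).strip() == ex["expected"]
               for ex in examples)
    return hits / len(examples)

# Stand-in for a real model call; one answer is a hallucination.
canned = {"What was FY23 revenue?": "$10 billion",   # wrong fact
          "How many SKUs are active?": "1,204",
          "Which region grew fastest?": "APAC"}
acc = exact_match_accuracy(canned.get, eval_set)
print(f"accuracy: {acc:.0%}")  # 2 of 3 correct
```

Swapping the stand-in for a real model call turns this into the fast inner loop: measure, tune, re-measure, and only scale up the dataset once the number moves.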
5. 🔍 Data Generation and Validation Techniques
- Fine-tuning models requires high-quality data, but manual labeling is time-consuming. Instead, using a small, manually curated subset (e.g., 20 data points) can effectively start the process.
- The use of an agentic pipeline can generate accurate data by focusing on limited, well-defined contexts, reducing model hallucination by avoiding overwhelming it with too much information.
- Validation of generated data is crucial for ensuring quality; this involves using LLMs for validation and employing deterministic validation methods.
- Custom and default validators help in filtering out low-quality data, contributing to a higher quality dataset.
- Instead of manual labeling, 'Vibes-based feedback' can be used, which involves teaching models through simple, intuitive prompts, similar to how one might teach a person.
- The model can use 'Vibes-based feedback' to generate and validate its own training data, potentially revolutionizing fine-tuning by making it as simple as prompt engineering.
- This approach allows models to create and refine their own training data, suggesting a future where models can autonomously improve their datasets.
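A deterministic validator of the kind described can be sketched with Python's built-in `sqlite3`; the schema and candidate queries are invented for illustration.

```python
import sqlite3

# Deterministic validation for generated text-to-SQL data: each candidate
# query is executed against a small reference schema, and anything that
# fails to run is filtered out of the training set.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, revenue REAL)")
conn.execute("INSERT INTO sales VALUES ('APAC', 100.0), ('EMEA', 80.0)")

def sql_validator(query: str) -> bool:
    try:
        conn.execute(query).fetchall()   # must parse AND execute
        return True
    except sqlite3.Error:
        return False

candidates = [
    {"q": "Total revenue?", "sql": "SELECT SUM(revenue) FROM sales"},
    {"q": "Top region?", "sql": "SELECT region FROM sails ORDER BY revenue"},
]
clean = [c for c in candidates if sql_validator(c["sql"])]
print(f"kept {len(clean)} of {len(candidates)} generated examples")
```

The second candidate references a misspelled table and is silently dropped; stacking LLM-based checks on top of deterministic ones like this is what keeps the generated dataset high quality.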
6. 🧠 Advancements in Model Architectures and Tuning
- A SQL generator was developed to transform user questions into valid SQL queries, complemented by validators for SQL execution to ensure accuracy and reduce debugging time.
- A data error analysis pipeline was implemented, enhancing data quality by identifying and reducing noise effectively.
- LoRA (Low-Rank Adaptation) was introduced, allowing the addition of efficient, small weight adapters on large models for rapid learning and cost-effective inference.
- The Mixture of Experts (MoE) model was utilized, routing inputs to specialized experts, thus improving processing efficiency, inference speed, and accuracy.
- LoRA was combined with MoE to create specialized adapters, enhancing fact retrieval, reducing hallucinations, and boosting accuracy.
- Transformer models, particularly Llama, were adapted to integrate these advancements efficiently through a streamlined API call.
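Combining the two ideas can be sketched as follows; the names and the single-layer routing scheme are my assumptions, not Lamini's implementation.

```python
import numpy as np

# Sketch: a frozen base projection plus a bank of small LoRA-style
# adapters ("memory experts"); a router selects one expert per input.
rng = np.random.default_rng(0)
d, r, n_experts = 16, 2, 4

W_base = rng.normal(size=(d, d))              # frozen pretrained weights
adapters = [(np.zeros((d, r)), rng.normal(size=(r, d)))
            for _ in range(n_experts)]        # LoRA pairs (B=0 at init)
gate = rng.normal(size=(d, n_experts))        # simple linear router

def forward(x):
    k = int(np.argmax(x @ gate))              # route to one memory expert
    B, A = adapters[k]
    return x @ (W_base + B @ A), k            # base + low-rank correction

x = rng.normal(size=d)
y, k = forward(x)
# With B initialized to zero, the adapter is a no-op, so the base
# model's behavior is preserved until the experts are trained.
assert np.allclose(y, x @ W_base)
print("routed to expert", k)
```

Because each expert is only `2 * d * r` parameters and only one is active per input, this keeps inference cheap while letting different experts own different slices of the fact space.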
7. 🪥 Colgate Case Study: Real-World Applications
- Colgate employs agentic pipelines to automate dataset creation, significantly boosting efficiency and reducing manual intervention.
- Through Vibes-based feedback mechanisms, Colgate enhances data editing processes, resulting in higher quality datasets and more reliable AI outputs.
- The application of MoME ('memory tuning') allows Colgate to fine-tune AI models, achieving high accuracy levels that surpass initial baselines.
- Initially, Colgate's AI models had a baseline accuracy of 30-40% with OpenAI's latest model, underscoring the substantial improvement potential through targeted tuning and model refinement.
8. 💡 Insights from Q&A on Model Training
8.1. Increased Database Access
8.2. Model Training and Expert Assignment
8.3. Pre-training vs. Fine-tuning
8.4. Efficiency of Small Language Models (SLMs)
9. 🎯 Memory Tuning and Achieving Accuracy
9.1. Reducing Cost and Time of Model Retraining
9.2. Innovative Neural Network Architecture
9.3. Addressing Model Hallucinations
9.4. Applicability of Memory Tuning
10. 🚀 AMD's Influence on AI Workloads and Collaboration
- AMD's MI300-series GPUs power the world's fastest supercomputer; the MI300X features 192 GB of HBM3 per GPU, enabling the execution of large AI models on a single node.
- The MI300X allows for running models like DeepSeek R1, with 671 billion parameters, showcasing its computational prowess.
- Collaboration with AI frameworks like PyTorch, ONNX, and TensorFlow is strengthened by open-sourcing the ROCm platform, enhancing community contributions.
- AMD's integration with PyTorch is noted for its simplicity, requiring only a pip install for GPU compatibility, demonstrating ease of use.
- Performance of models like DeepSeek R1 has improved fourfold in two weeks, highlighting AMD's competitiveness.
- Partnerships with platforms such as Hugging Face ensure comprehensive AI development support on AMD hardware.
- AMD's user-friendly infrastructure allows AI workloads to run with minimal adjustments, facilitating widespread adoption.
- Support for open-source projects and developer resources highlights AMD's commitment to accessibility.
- Advanced AI tasks, like automated web browsing, are supported with minimal setup, emphasizing infrastructure capabilities.
- AMD's emphasis on collaboration and accessibility encourages diverse applications and open-source project use on their GPUs.
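The pip-install path mentioned above can be sketched as follows; the exact wheel index depends on your ROCm version (the `rocm6.2` tag here is one example of the indexes PyTorch publishes), so this is a setup sketch rather than a universal recipe.

```shell
# Install a ROCm build of PyTorch (index tag varies by ROCm version).
pip install torch --index-url https://download.pytorch.org/whl/rocm6.2

# On ROCm, PyTorch exposes AMD GPUs through the familiar CUDA-named API,
# so existing CUDA-targeted code typically runs with minimal changes.
python -c "import torch; print(torch.cuda.is_available())"
```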