OpenAI - OpenAI DevDay 2024 | Community Spotlight | Dimagi
Anna Dixon, an applied research scientist at Dimagi, presents a project funded by the Bill & Melinda Gates Foundation that uses large language models (LLMs) for family planning education in Kenya and Senegal. The project fine-tunes GPT-4o mini and GPT-4o to power health education chatbots in low-resource languages such as Sheng, a Swahili-English slang spoken in Kenya. Initial attempts using zero-shot and few-shot prompting were ineffective; injecting Sheng sentences as a style guide improved language quality but was costly and slow. The team then added a machine translation layer that translates English responses into the target language, improving modularity and making language quality easier to evaluate. Fine-tuning improved translation quality significantly, raising GPT-4o mini's spBLEU score from 22.21 to 65.23. The project is expanding to other languages, with promising preliminary results in Chichewa, where fine-tuned models doubled initial scores. The approach demonstrates cost-effective improvements in language model performance for health education in low-resource settings.
Key Points:
- Dimagi uses AI to improve health education in low-resource languages, focusing on family planning in Kenya and Senegal.
- Initial language model attempts were ineffective, leading to the use of a machine translation layer for better results.
- Fine-tuning significantly improved translation quality, with spBLEU scores increasing notably.
- The project is expanding to other languages, showing promising results in Chichewa.
- The approach offers cost-effective improvements in language model performance for health education.
Details:
1. Introduction to Anna Dixon and Dimagi
- Anna Dixon is an applied research scientist at Dimagi, focusing on integrating AI and machine learning advancements into practical applications.
- Her role involves leveraging cutting-edge technology to enhance Dimagi's offerings, indicating a strategic focus on innovation and technology-driven solutions.
- Anna has worked on projects that apply AI to improve healthcare delivery, showcasing her impact on real-world applications.
- Her contributions have led to significant improvements in efficiency and effectiveness in Dimagi's projects, demonstrating the practical value of her work.
2. Dimagi's Mission and Project Overview
- Dimagi is a social enterprise that builds digital health tools for low- and middle-income countries.
- The tools are primarily designed for frontline workers, with some direct-to-user applications.
- Dimagi aims to spread LLM technologies equitably by supporting native low-resource languages.
- The project involves fine-tuning GPT-4o mini and GPT-4o for health education chatbots in Kenyan and Senegalese languages.
- The initiative targets improving healthcare delivery by leveraging AI to overcome language barriers.
- Dimagi's approach includes collaborating with local communities to ensure the tools are culturally relevant and effective.
- The project is part of a broader strategy to enhance digital health infrastructure in underserved regions.
3. LLMs for Family Planning in Kenya and Senegal
3.1. Project Overview and Goals
3.2. Technical Architecture and Challenges
4. Overcoming Sheng Language Challenges
4.1. Initial Challenges with Sheng Language Processing
4.2. Solutions and Their Effectiveness
5. Innovative Translation Architecture
- The updated architecture instructs all GPT-4o instances to respond only in English; a new machine translation layer then translates from English into the target language.
- This modular approach allows for isolated development efforts, optimizing both health education chatbots and the machine translation layer for different languages.
- Language quality evaluation can now be isolated, which was previously challenging, enhancing the ability to assess and improve translation quality.
- By narrowing the scope to just the machine translation layer for fine-tuning, the risk of degrading the LLM's performance in other areas is minimized.
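The two-stage design described above can be sketched as a simple composition of a chatbot step and a translation step. The function names and stub components below are illustrative assumptions; the talk does not publish code, and in practice each callable would wrap a model API call.

```python
from typing import Callable

def make_chat_pipeline(
    answer_in_english: Callable[[str], str],
    translate: Callable[[str], str],
) -> Callable[[str], str]:
    """Compose an English-only health education chatbot with a separate
    machine translation layer for the target language. Keeping the two
    stages separate lets each be developed and evaluated in isolation."""
    def respond(user_message: str) -> str:
        english_reply = answer_in_english(user_message)  # e.g. GPT-4o, prompted to reply only in English
        return translate(english_reply)                  # e.g. a fine-tuned translation model
    return respond

# Stub components stand in for the two model calls so the wiring is visible.
bot = make_chat_pipeline(
    answer_in_english=lambda q: f"EN-ANSWER({q})",
    translate=lambda en: f"SHENG({en})",
)
```

Because the translator only ever sees English text, swapping in a different target language means replacing one callable, leaving the chatbot untouched.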
6. Implementing and Evaluating Machine Translation
6.1. Machine Translation Implementation
6.2. Evaluation of Machine Translation
7. Evaluation Metrics and BLEU Score
- The evaluation data set is structured as a CSV file containing sentence pairs, with one column for input English sentences and another for ground truth translations.
- The BLEU metric is utilized to assess the quality of candidate translations against ground truth translations, with scores ranging from 0 to 100. A score of approximately 40 is considered good.
- BLEU is designed as a corpus-level metric; sentence-level scores are noisy because higher-order n-gram matches are sparse, and scores are only comparable when parameters and tokenizer choices are held fixed.
- To keep metrics consistent across runs, the SacreBLEU package is recommended, as it standardizes these choices.
- The FLORES-200 spBLEU metric, developed by Meta AI (formerly Facebook AI Research) for the 'No Language Left Behind' initiative, tokenizes with a SentencePiece model covering 200 languages, providing a more comparable evaluation across diverse low-resource languages.
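As a rough illustration of what BLEU measures (modified n-gram precision combined with a brevity penalty), here is a simplified corpus-level computation. This is a teaching sketch only: it uses plain whitespace tokenization, so for reportable numbers you would use SacreBLEU's standardized tokenizers instead.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def corpus_bleu(candidates, references, max_n=4):
    """Simplified corpus BLEU (0-100): geometric mean of modified n-gram
    precisions for n=1..max_n, times a brevity penalty for short output."""
    matches = [0] * max_n   # clipped n-gram matches, pooled over the corpus
    totals = [0] * max_n    # candidate n-gram counts, pooled over the corpus
    cand_len = ref_len = 0
    for cand, ref in zip(candidates, references):
        c_tok, r_tok = cand.split(), ref.split()
        cand_len += len(c_tok)
        ref_len += len(r_tok)
        for n in range(1, max_n + 1):
            c_counts = Counter(ngrams(c_tok, n))
            r_counts = Counter(ngrams(r_tok, n))
            # Clip each candidate n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in c_counts.items())
            totals[n - 1] += max(len(c_tok) - n + 1, 0)
    if 0 in matches:
        return 0.0  # geometric mean collapses if any precision is zero
    log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    bp = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return 100.0 * bp * math.exp(log_prec)
```

Pooling matches over the whole corpus before taking the geometric mean is what makes BLEU a corpus-level metric: a single sentence with no 4-gram match would score 0 on its own, but contributes usefully to the pooled counts.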
8. Fine-Tuning and Results
8.1. Fine-Tuning Process
8.2. Results and Implications
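Fine-tuning a translation layer of this kind typically means converting sentence pairs into chat-format training examples, one JSON object per line. A minimal sketch follows; the system prompt and the sentence pair are illustrative placeholders, not Dimagi's actual training data.

```python
import json

# Illustrative system prompt; the real prompt would be project-specific.
SYSTEM_PROMPT = "Translate the user's English sentence into Sheng."

def to_training_example(english: str, target: str) -> str:
    """Serialize one sentence pair as a JSONL line in the chat format
    expected by the OpenAI fine-tuning API (system/user/assistant)."""
    return json.dumps(
        {
            "messages": [
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": english},
                {"role": "assistant", "content": target},
            ]
        },
        ensure_ascii=False,
    )

# Placeholder pair; real data would come from curated human translations.
line = to_training_example("How are you?", "<Sheng translation here>")
```

A file of such lines can then be uploaded as the training set for a fine-tuning job, with a held-out split of the same sentence pairs reserved for BLEU evaluation.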
9. Future Projects and Human Validation
- Collaborating with translators to ensure BLEU scores accurately reflect translation quality through rigorous human validation processes.
- Utilizing open source data sets to enhance evaluation and training, ensuring diverse and comprehensive data coverage.
- Human validation involves cross-referencing machine-generated translations with expert human assessments to improve accuracy metrics.
- The focus is on refining translation models by integrating human feedback, thereby aligning automated scores with real-world accuracy.
- Open source data sets provide a broad spectrum of linguistic examples, crucial for training robust translation models.