Job Summary
A research scientist position is available immediately at the Institute for Infocomm Research (I2R), A*STAR, Singapore. The position will focus on advancing LLMs for real-world digital health and precision medicine applications.
Specific areas of focus include:
• Supervised fine-tuning data preparation and large-scale instruction dataset construction
• Medical reasoning model development, debugging and benchmarking
• Continued pretraining and domain adaptation of foundation models
• Preference-based alignment methods including data generation and alignment training
• Enhancing LLMs for logical reasoning
• Learning from large real-world, irregularly sampled, noisy and partially structured clinical data (e.g., electronic health records, free-text notes)
The role entails leading the development and optimization of large language models for medical reasoning, including continued pretraining, supervised fine-tuning, and preference-based alignment. The successful candidate will design scalable training pipelines, develop high-quality medical instruction, reasoning and preference datasets, conduct systematic experimentation and evaluation, and ensure model robustness, reliability, and clinical relevance.
Core responsibilities include R&D project execution, experimentation and benchmarking, development of model training strategies, and drafting of technical documentation, proposals and project scoping materials. The position entails working in a highly interdisciplinary R&D team, in close collaboration with experts in NLP, LLMs and population health, clinicians, and other health ecosystem stakeholders, on AI systems that have the potential to transform patient care and deliver improved health outcomes.
Appointments will be based in Singapore for a duration of 3 years.
Qualifications and Field of Study
Qualifications
- PhD in Computer Science, Machine Learning, NLP, Biomedical Informatics, or a related field
- Strong experience in LLM pretraining and fine-tuning (CPT, SFT, DPO, RLHF-style methods)
- Experience with large-scale training frameworks (PyTorch, DeepSpeed, FSDP, HuggingFace, etc.)
- Experience handling noisy real-world clinical or biomedical text data
- Strong experimental design and benchmarking skills
- Familiarity with cloud-based GPU training environments
Desirable:
- Experience with medical reasoning LLMs
- Background in medical NLP or healthcare AI
- Publications in ML/NLP/health AI journals/conferences
Experience
2-4 years of post-PhD experience, with a track record in deployment-oriented projects in applied health or other domains and in publishing research in leading digital health venues. Experience in healthcare, corporate or application-oriented environments is desirable.
Other Requirements (e.g. Skills, Competencies)
Competencies
- Keen experience and intuition for working with large, complex real-world health datasets
- Ability to innovate with advanced NLP and FM methodologies as well as to direct prototyping for demonstrating research ideas
- Exposure to varied digital health study designs, research methods, human computer interaction frameworks, and clinical evaluation approaches
- Cross-disciplinary experience spanning clinical needs definition, clinical workflow understanding, R&D problem formulation, health or medical technology evaluations, impact evaluation and/or real-world implementation.
- Ability to work independently as well as in multidisciplinary teams with strong interpersonal skills
- Good communication skills for publications, reports and proposals.
- Quick learner; able to acquire the necessary domain knowledge
- Agility in dynamic project environments with impact-oriented mindset
A drive to keep pace with AI/LLM R&D developments, and to work with full-stack data engineering and deployment teams on real-world projects and with healthcare ecosystem partners on clinical translational studies, is highly desirable.
Skills
- Experience with LLM/FM frameworks as well as NLP/LLM toolkits
- Experience with ETL processes and dataset curation for FM training, especially for large or multimodal health datasets
- Strong programming abilities, particularly in Python and Bash for LLM pipelines, clinical data preprocessing, and ETL workflows. Experience with SQL/PostgreSQL and large-scale data processing (e.g., PySpark) is advantageous.
- Strong quantitative skills including statistics, probability, machine learning, and deep learning, with the ability to design rigorous evaluation frameworks for LLM reasoning quality, trustworthiness, and clinical safety.
- Comfort with biomedical knowledge representation and model evaluation
- Comfort within cloud-based data engineering and ML environments
- Exposure to FMOps/DevOps infrastructure and pipelines for deploying AI/LLM solutions for healthcare applications
Motivated applicants with significant AI expertise and strong programming skills who are looking to switch into domain-related roles and are committed to building robust and scalable approaches for national-scale healthcare impact will also be considered.
The above eligibility criteria are not exhaustive. A*STAR may include additional selection criteria based on its prevailing recruitment policies. These policies may be amended from time to time without notice. We regret that only shortlisted candidates will be notified.