Senior Scientist I Job Details

Job Description

Job Title: Senior Scientist I

Requisition ID: 2472

Posting Start Date: 04/06/2026

Job Summary

A research scientist position is available immediately at the Institute for Infocomm Research (I2R), A*STAR, Singapore. The position will focus on advancing LLMs for real-world digital health and precision medicine applications.

Specific areas of focus include:
• Supervised fine-tuning data preparation and large-scale instruction dataset construction
• Medical reasoning model development, debugging and benchmarking
• Continued pretraining and domain adaptation of foundation models
• Preference-based alignment methods including data generation and alignment training
• Enhancing LLMs for logical reasoning
• Learning from large real-world, irregularly sampled, noisy and partially structured clinical data (e.g., electronic health records, free-text notes)

The role entails leading the development and optimization of large language models for medical reasoning, including continued pretraining, supervised fine-tuning, and preference-based alignment. The successful candidate will design scalable training pipelines, develop high-quality medical instruction, reasoning and preference datasets, conduct systematic experimentation and evaluation, and ensure model robustness, reliability, and clinical relevance.

Core responsibilities include R&D project execution, experimentation and benchmarking, development of model training strategies, and drafting of technical documentation, proposals and project scoping materials. The position entails working in a highly interdisciplinary R&D team in close collaboration with NLP and LLM experts, clinicians, population health specialists, and other health ecosystem stakeholders on AI systems that have the potential to transform patient care and deliver improved health outcomes.

Appointments will be based in Singapore for a duration of 3 years.

Qualifications and Field of Study

Qualifications

PhD in Computer Science, Machine Learning, NLP, Biomedical Informatics, or a related field
Strong experience in LLM pre-training and fine-tuning (CPT, SFT, DPO, RLHF-style methods)
Experience with large-scale training frameworks (PyTorch, DeepSpeed, FSDP, HuggingFace, etc.)
Experience handling noisy real-world clinical or biomedical text data
Strong experimental design and benchmarking skills
Familiarity with cloud-based GPU training environments

Desirable:

Experience with medical reasoning LLMs
Background in medical NLP or healthcare AI
Publications in ML/NLP/health AI journals/conferences

Experience

2-4 years post-PhD with a track record in deployment-oriented projects in applied health or other domains, and in publishing research in leading digital health venues. Experience in healthcare, corporate or application-oriented environments is desirable.

Other Requirements (e.g. Skills, Competencies)

Competencies

Solid experience and keen intuition for working with large, complex real-world health datasets
Ability to innovate with advanced NLP and foundation model (FM) methodologies, and to direct prototyping that demonstrates research ideas
Exposure to varied digital health study designs, research methods, human-computer interaction frameworks, and clinical evaluation approaches
Cross-disciplinary experience spanning clinical needs definition, clinical workflow understanding, R&D problem formulation, health or medical technology evaluations, impact evaluation and/or real-world implementation.
Ability to work independently as well as in multidisciplinary teams with strong interpersonal skills
Good communication skills for publications, reports and proposals.
Quick learner; able to acquire the necessary domain knowledge
Agility in dynamic project environments with an impact-oriented mindset

A drive to keep pace with AI/LLM R&D developments, to work with full-stack data engineering and deployment teams on real-world projects, and to work with healthcare ecosystem partners on clinical translational studies is highly desirable.

Skills

Experience with LLM/FM frameworks as well as NLP/LLM toolkits
Experience with ETL processes and dataset curation for FM training, especially for large or multimodal health datasets
Strong programming abilities, particularly in Python and Bash for LLM pipelines, clinical data preprocessing, and ETL workflows. Experience with SQL/PostgreSQL and large-scale data processing (e.g., PySpark) is advantageous.
Strong quantitative skills including statistics, probability, machine learning, and deep learning, with the ability to design rigorous evaluation frameworks for LLM reasoning quality, trustworthiness, and clinical safety.
Comfort with biomedical knowledge representation and model evaluation
Comfort within cloud-based data engineering and ML environments
Exposure to FMOps/DevOps infrastructure and pipelines for deploying AI/LLM solutions for healthcare applications

Motivated applicants with significant AI expertise and strong programming skills who are looking to switch into domain-related roles, and who are committed to building robust and scalable approaches for national-scale healthcare impact, will also be considered.

The above eligibility criteria are not exhaustive. A*STAR may include additional selection criteria based on its prevailing recruitment policies. These policies may be amended from time to time without notice. We regret that only shortlisted candidates will be notified.