
Job title: Senior Machine Learning Engineer – Healthcare
Company: MD Anderson Cancer Center
Job description: Summary: The mission of The University of Texas M. D. Anderson Cancer Center is to eliminate cancer in Texas, the nation, and the world through outstanding programs that integrate patient care, research, prevention, and education. Core to the success of our mission is the ability to orchestrate multidimensional data, data analytics, and machine learning to create sustainable impact within a framework of responsible AI. We are building a dynamic team of machine learning engineers and data scientists that can help us consistently and responsibly accelerate the impact of AI across the enterprise, driving long-lasting improvements in cancer care.
We are actively seeking a Senior MLOps Engineer who will play a pivotal role in advancing MLOps initiatives across the enterprise. This role is critical for orchestrating an AI lifecycle management framework, encompassing the development, deployment, and maintenance of production-quality machine learning models to support clinical and business operations. Additionally, the Senior MLOps Engineer will support the assessment and validation of external machine learning models and AI-driven products. The role extends beyond technical expertise, as it is also about forging team dynamics, cultivating a culture of innovation, and supporting processes and technological foundations necessary to accelerate strong MLOps practices across the enterprise.
Key responsibilities include:
Oversee the lifecycle of AI models, encompassing training, evaluation, deployment, monitoring, and maintenance of production-quality machine learning models, in compliance with standards and best practices.
Develop CI/CD pipelines for ML model training, deployment, and monitoring while upholding security, scalability, reliability, reproducibility, and performance.
Provide rigorous testing, versioning, and documentation, ensuring impact, risk mitigation, and reproducibility.
Develop and support a culture of responsible AI by minimizing bias, enhancing fairness, and maximizing transparency in AI models.
Maintain diligent records of model development experiments, data and model lineage tracking, as well as data and model scorecards.
Engage with stakeholders to gather requirements, convey AI concepts understandably, and capture feedback.
Design fallback and decommissioning strategies for AI solutions to ensure operational continuity.
Support the evaluation and onboarding of third-party machine learning models, ensuring they meet institutional standards, enhance institutional value, and minimize organizational risk.
Deliver training on AI solutions to enhance understanding and application across the organization.
Engage with technology trends, contribute to tech communities, and foster a culture of continuous learning and innovation.
Technical Expertise
Proficient in developing, deploying, and maintaining AI/ML algorithms in production environments.
Skilled in constructing scalable data pipelines, feature and artifact management, and analytics.
Experienced with MLOps tools and processes for data, code, and model management.
Strong proficiency in Python and either C++ or C#, with practical knowledge of TensorFlow, PyTorch, and Scikit-learn.
Knowledgeable about AI/ML platform infrastructure, including cloud and on-premises architectures.
Familiar with cloud-native tools, services, and computing environments (e.g., Azure, AWS, GCP).
Proficient in DevOps practices and CI/CD pipelines, including Azure DevOps and GitHub Actions.
Experienced with containerization using Docker and orchestration with Kubernetes, along with DAG tools.
Analytical Expertise
Skilled in project management methodologies (SAFe agile, PRINCE2, Lean) for end-to-end AI/ML project lifecycle management, ensuring timely delivery, adherence to budget, and quality compliance.
In-depth knowledge of AI/ML Model Lifecycle Management aligned with ISO standards for software and AI development.
Proficient in decision-making, problem-solving, and executing AI/ML healthcare solutions.
Skilled at quantitatively assessing machine learning models for performance, workflow impact, and potential risks.
Adept at collaborating with vendors and partners to evaluate and integrate third-party AI solutions into current systems and processes.
Competent in identifying risks and formulating mitigation plans to prevent project delays.
Oral and Written Communication
Collaborate with data scientists, ML engineers, and software engineers to integrate machine learning models into existing systems.
Document CI/CD pipelines, deployment workflows, and infrastructure setups.
Report project metrics, including progress, impact, and risks, to leadership, offering strategic recommendations for AI/ML use-case prioritization.
Manage stakeholder relations to facilitate solution adoption and address issues.
Share knowledge and offer technical assistance to researchers and colleagues.
Deliver both technical and non-technical updates in meetings and at professional gatherings.
Engage effectively with team leaders, peers, end-users, and support staff as needed.
Other duties as assigned
Expected salary:
Location: Houston, TX
Job date: Wed, 05 Nov 2025 06:52:42 GMT
