Merck
Senior AI/ML Data Scientist - Natural Language Processing (Hybrid)
Cambridge, MA
Jul 27, 2024
Full-time
Full Job Description
Job Requirements

This posting has been created to pipeline talent for prospective roles that we anticipate will be needed in the future for our organization. By applying to this Pipeline Advertisement you will be submitting your interest to be contacted for future roles similar to what is described in the Pipeline Advertisement.


The Senior AI/ML Data Scientist - Natural Language Processing (NLP) role involves helping to develop and deploy production-grade NLP products for unstructured and semi-structured data from across our company's research and development pipeline. These models and workflows will help solve real-world problems and contribute to Artificial Intelligence and Machine Learning (AI/ML) in therapeutic research and development. Key focus areas will include the scalable deployment of ML and Generative AI approaches (such as Large Language Models, or LLMs) for surfacing insights from proprietary unstructured research data and biomedical literature, as well as developing fit-for-purpose approaches for tasks such as text classification, relation extraction, and entity linking. The position is embedded in a cross-disciplinary team of data scientists, bioinformaticians, and engineers who are all focused on using cutting-edge software, AI/ML, and data science techniques to drive drug discovery and development.

Key responsibilities:

  • Staying updated on the newest methods in NLP, ML, and generative AI
  • Building novel tools that enable the discovery, development, and delivery of new therapeutics to patients in need
  • Understanding real-world challenges and developing automated data solutions for them
  • Opportunities to directly interact with users of your data science, ML, and AI products
  • Evaluating, developing, testing, and deploying new techniques for natural language understanding
  • Freedom to propose projects that interest you and to collaborate cross-functionally on delivery
  • Sharing the approaches you implement and their impact with internal company audiences and externally

Additional job details:

The types of datasets we focus on are both internal (e.g., electronic lab notebooks, safety reports, regulatory documents, clinical results) and external (e.g., public literature and Electronic Medical Records). In addition to new tool development, we often consult with some of our 5,000+ stakeholders (scientists, engineers, regulatory liaisons, data scientists, etc.) on their own projects, as well as additional stakeholders from across our company. We strive to enhance data science, NLP, and AI literacy across these groups. As part of our work, we have opportunities to co-author presentations, reports, manuscripts, and/or public code releases.


Work Experience

Education Requirements:

  • B.S. with 5 years of industry experience focused on NLP, data science, AI/ML/LLM engineering, computer science, semantic engineering or a related discipline
  • OR M.S. with 2 years of industry experience
  • OR PhD in data science, AI/ML/LLM engineering, computer science, semantic engineering or a related discipline

Minimum Requirements:

  • 1 year of experience with Natural Language Processing, Generative AI or related techniques for machine understanding of natural language (e.g., written text, omics data, or similar)
  • 2 years of experience with Python, Spark, or related frameworks in an AI, machine learning, data science, data engineering or similar context

Preferred skills and experience (not required):

  • Fluency in Python programming, version control and collaboration with git, environment management (e.g., poetry, conda, docker), standard Python packages (e.g., pandas, numpy, matplotlib), and at least one ML framework (e.g., pytorch, tensorflow, fairseq)
  • Experience with scalable data engineering frameworks such as Apache Spark and orchestration frameworks such as Airflow, and/or experience with semantic search and retrieval frameworks (e.g., development and benchmarking of embedding models and retrieval approaches in the context of Retrieval Augmented Generation, RAG)
  • Experience with ML model deployment and operations (e.g., DevOps, MLOps, LLMOps), including CI/CD workflows and tooling (e.g., GitHub Actions)
  • Experience with standard operations on non-relational databases (e.g., Elasticsearch/Opensearch, MongoDB, Neptune), relational databases (e.g., PostgreSQL), and vector databases (e.g., pgvector, Elasticsearch dense vectors), as well as with deployment of APIs and web applications (e.g., flask, fastAPI, django, or dash)
  • Working knowledge of statistical learning, such as supervised, unsupervised, and weakly supervised learning, particularly in NLP contexts
  • Working knowledge of NLP and/or Generative AI libraries (e.g., regular expressions, spacy, langchain), text annotation tools, and/or semantic frameworks (e.g., RDF triplestores, property graphs, ontology management)
  • A demonstrated ability to engage cross-functional teams and stakeholders, including an eagerness to acquire a level of domain knowledge
  • Excellent communication, teamwork, didactic, and leadership skills, including skills for scientific communication (authoring scientific articles and presenting) and guidance and mentorship of junior employees and less experienced collaborators

Requisition ID: P-100850

Job Information
Job Category: Information Technology