MIT 6.S085: IAP 2024

Machine Learning for Molecular Design

Description

This course offers an introduction to, and hands-on practice with, applications of machine learning in molecular design and engineering. Topics include analyzing molecular properties with data-driven methods, generative modeling, and combinatorial optimization approaches for designing novel functional molecules such as new drugs. Unlike 6.C51+10.C51, this class adopts a bootcamp-style format that focuses on designing novel functional molecules. We not only introduce new concepts but also immediately demonstrate their applications with code, complemented by in-class coding exercises. The course includes a competition-based project that simulates real-world molecular discovery scenarios. In the final week of the course, we will host notable guest lecturers who will introduce students to cutting-edge research topics.

Time and Location

Mon Jan 8 - Fri Feb 2, 2024
10 AM - 12 PM EST M/W/F
32-082

Announcements

  • Pre-register for 6.S085 on the registrar's website by 1/8/2024!
  • The syllabus is published!
  • The final project leaderboard is now available under the Leaderboard section!

Staff Members

Instructor 1

Wenhao Gao

Wenhao is a PhD candidate in the Coley Research Group at MIT. His research focuses on accelerating and scaling up molecular discovery by leveraging AI for decision-making. He is the recipient of a Google Ph.D. fellowship and an MIT-Takeda fellowship. Wenhao also serves as an organizer of multiple AI for Science workshops at NeurIPS and ICML, as well as the Machine Learning and AI for Organic Chemistry Symposium at ACS.

Instructor 2

Ron Shprints

Ron is a third-year undergraduate student at MIT, majoring in mathematics and computer science. His research interests lie in deep learning and its applications to the natural sciences. He has been doing research at the intersection of machine learning and molecular discovery since his freshman year. Ron joined the Coley Research Group as a UROP student in summer 2022 and has collaborated with Wenhao on several projects. Before that, he worked in the Jensen Research Group, where he collaborated with Andrew Zahrt on the machine learning discovery of electrochemical reactions.

Course Schedule

Date | Day of the Week | Topic | Links
1/8/2024 | Monday | Course overview + Broad review of historical development and common workflows | [slides]
1/10/2024 | Wednesday | Data processing and analysis: focus on dimensionality reduction and clustering | [problems], [solutions]
1/12/2024 | Friday | Structure-property relationship modeling (Part 1): featurization of molecules | [problems], [solutions]
1/16/2024 | Tuesday | Literature presentation (session 1) |
1/17/2024 | Wednesday | Structure-property relationship modeling (Part 2): deep learning architectures | [Colab]
1/19/2024 | Friday | Molecular generation and inverse design: combinatorial approach and generative modeling | [Colab]
1/22/2024 | Monday | Zak Costello and Ahmed Ismail -- Illuminating protein space with a programmable generative model | [API key], [notebook]
1/24/2024 | Wednesday | Literature presentation (session 2) |
1/26/2024 | Friday | Wengong Jin -- Geometric deep learning for antibody design |
1/29/2024 | Monday | Chenru Duan -- Diffusion models on sampling rare events |
1/31/2024 | Wednesday | Connor W. Coley -- Balancing the design of molecular structures with the design of their syntheses |
2/2/2024 | Friday | Final project presentations | [slides]

Final Project Leaderboard (showing top-10 teams only, finalized!)

Useful Resources

Explore our curated list of resources to enhance your learning experience:

Python and notebook:

Machine Learning:

Molecular science packages:

Version control and other miscellaneous:

Frequently Asked Questions

For any other questions, please reach out to the team at moleculedesigner@gmail.com.

Q: Is it possible to take the class as a listener?

A: Partially, yes. You are welcome to audit the course and access all the material from this website. However, this is a bootcamp-style class in which students are expected to learn by actively participating in various activities, including literature presentations and projects. Due to capacity constraints, we're unable to include listeners in all of these activities. Nonetheless, we hope you find the resources helpful and informative!

Q: How can I access the course materials?

A: All the course materials are open-sourced and available on our webpage. If you find something missing, feel free to send us an email.

Q: What are the machine learning pre-requisites?

A: There are no hard machine learning pre-requisites. We will cover some basic machine learning concepts in the first few classes, but familiarity with the basics of machine learning will definitely help. Some of the introductory machine learning classes at MIT, like 6.3900, provide a great background for this class. If you are not sure whether you have a sufficient machine learning background, feel free to send us an email.

Q: What are the science pre-requisites?

A: There are no specific science pre-requisites. However, familiarity with undergraduate-level organic chemistry (for example, how to draw organic molecules, identify common functional groups, and understand basic structure-property patterns) will help you understand the material better. If you are not sure whether you have a sufficient science background, feel free to send us an email.

Q: How comfortable should I feel with coding and in what languages?

A: Because this is a bootcamp-style class, we will have many coding exercises. Being fluent in Python, or being able to learn Python in a short time, will help you gain more from the class. We recommend some background with packages like PyTorch and RDKit, but you are not required to know how to use them in advance.