Neuro Data Team I & II

580.237/437/697 and 580.238/438/698

Location: Kavli North Hub (3rd floor of Clark) unless stated otherwise in the Schedule below or in the Slack channel

Instructor: Joshua Vogelstein

TAs:

  • Ronak Mehta (@ronak.mehta)
  • Jaewon Chung (@j1c)
  • Ben Pedigo (@BPedigo)
  • Hayden Helm (@hayden)

Meeting Time: MW 12 - 1:15, and lab session to be determined

Credits: This course will provide 4 design credits for the first semester and 6 credits for the second semester, and will be a team-based year-long undertaking in data science and neuroscience. The 2xx level is 3 credits both semesters.

Pre-Requisites: You must be more excited about this class than any other class. If you are planning on doing research in a lab it is expected that you instead dedicate more time to this class.

About

In this year-long course, students will work together in small teams to either add functionality to an existing data science tool, or use said tool to analyze some brain science data. The course is loosely divided into quarters, each associated with its own deliverables:

  • Quarter 1 (first month of class): (1) an AWS research credits grant on the chosen project, (2) an issue in a widely used open source repository to address in the fall, (3) a neurobiological question to investigate for the year, and potentially (4) a new algorithm to develop.
  • Quarter 2 (rest of fall semester): (1) a link to a merged PR that your team made to a widely used open source repository, (2) a link to an Overleaf document that has completed steps 1-3 from this for a manuscript you plan to complete in the spring, and (3) a link to a Jupyter notebook demonstrating functionality of the analysis (pipeline) for your manuscript.
  • Quarter 3 (spring semester until spring break): (1) a link to a PR of a newly developed method into one of the approved repositories, and (2) a link to an Overleaf draft completing up to step 7 of this.
  • Quarter 4 (spring break until the final): (1) a link showing that the PR was merged in, and (2) a forwarded email demonstrating that the manuscript was submitted.
The set of approved open source repositories to contribute to includes:
  • scipy
  • sklearn
  • networkx
  • skimage
  • dipy
  • mne
The TAs will fork each of these repos, as appropriate. You will make a PR into the NeuroData fork, and iterate with the TAs until it passes their tests. The TAs will then PR into the main fork. The first "quarter" will focus on scoping, including determining the degree of
  • feasibility (for us in a year),
  • significance (for the targeted brain/data science community), and
  • intrinsic motivation (for the project amongst team members).
See this for more details. Throughout, we will be (approximately) using the agile/lean development process.

This course is based on my experience as an academic, an entrepreneur, an advisor, and an instructor. It is designed to be the best class you'll ever take. You will learn (by doing and getting feedback) the skills that I have found particularly useful in my endeavors. It will be organized into sprints, loosely based on Agile software development practices. Weekly reports will document progress towards your team's sprint goals, with sprint demos at the close of each sprint. Each team will be graded jointly on the basis of meeting the sprint goals, as well as providing clear and concise weekly progress reports.

The main "skills" you will learn in this class include:

  • How to choose a project significant for the world, feasible for you, and that you are passionate about.
  • How to scope work so that you can achieve weekly progress towards quarterly goals.
  • How to effectively communicate technical content.
  • How to generate publication quality figures (see figure checklist for details).
  • How to complete a wide set of data science tasks, spanning from data wrangling to statistical modeling.
  • How to peacefully and productively work with a diverse team of passionate individuals.

Course Requirements

Admittance to this course requires approval from the instructor, to ensure the students in the course are sufficiently diverse along a number of dimensions. To gain admittance, you must:
  1. Be more excited about this course than any other academic endeavor for the year. Nobody is required to teach or take this course, and since so many people want to take it each year, preference is given to those that are most passionate about the material.
  2. Form a team. Because this course is based on team projects, I will approve entire teams all at once. You can join the course Slack channel here to chat with other potential teammates.
  3. Send the CV and transcript of all members of your planned team to me for joint approval.

Potential Projects

Each team will extend and apply machine learning tools to answer an important neurobiology question. Potential neurobiology questions to answer include:
  • Characterize connectomic variability as a function of phenotypic variability.
  • Mitigate batch effects in human MRI derived connectomes.
  • Build models of connectomes, potentially incorporating joint and individual variation.

Tools

We will extend/modify one of the following tools, listed in approximate order of interest:

Data

We will be applying our tools to various datasets, such as the Healthy Brain Network data:

Recommended Background

Numerical programming (in Python), basics of statistics, linear algebra, machine learning, and familiarity or interest in neuroscience.

Code of Conduct

Our course is dedicated to providing a harassment-free course experience for everyone, regardless of gender, gender identity and expression, age, sexual orientation, disability, physical appearance, body size, race, ethnicity, religion (or lack thereof), or technology choices. We do not tolerate harassment of course participants in any form. Sexual language and imagery are not appropriate for any course content, including talks, slides, or presentations. Course participants violating these rules may be sanctioned or given a failing grade for the year.

Feedback on Class

NDD I (Fall 2017)
Taken from student affairs
Overall quality of the class: 4.63
Summary: The best aspects of this course included the professor's guidance and attentiveness, the opportunity to conduct independent research, and the ample resources provided. Negative feedback was limited. Suggestions for improvement included meeting more often and providing some guidance on team dynamics based on previous classes. Prospective students should have a background in advanced math and computer science.

NDD II (Spring 2017)
Taken from student affairs

Overall quality of this course: 4.58

Summary: The best aspects of this course included the applicable course material, interesting project, and real world knowledge gained. Some students felt that the material was challenging for those without math experience, and that some more complicated concepts had to essentially be self-taught. Suggestions for improvement included providing more transparent grades and frequent feedback, as well as better availability and communication from the instructors. Having more specific expectations from data providers, spending more time researching a topic, and advertising the course as something more CA-geared would also be beneficial. Prospective students should have some knowledge of programming, particularly Python, before enrolling.

Feedback forms from past years

Spring 2016

Spring 2017

Fall 2017

Spring 2018

Fall 2018



Class Updates on the Basis of Feedback

We plan to meet with each team at least twice a week. The TAs are available via Slack for questions, and both the instructor and the TAs are available MWF in person during/after class for questions. More lectures will be provided about effective team dynamics.

Syllabus

Assignments

There will be four main assignments due over the course of the year. In each case, the instructor will provide you with successful examples. The schedule of sprints will be:
  1. Sprint 0: August 30 - September 30. Scope Sprint
  2. Sprint 1: October 1 - December 12. Code and/or Data Sprint
  3. Sprint 2: January 28 - March 15. Code and/or Data Sprint
  4. Sprint 3: March 25 - May 8. Finalize and submit manuscript sprint.
All teams will participate in the scope and finalize sprints. For the other sprints, each team can choose to do a code sprint, a data sprint, or a joint code and data sprint. The only constraint is that by the end of the year, each team must complete at least 1 full code sprint and at least 1 full data sprint (a sprint that is both code and data counts as half of each).
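
As a rough illustration of the constraint above, the following sketch tallies code and data credit for a planned sequence of sprints. The labels "code", "data", and "joint" and the function names are hypothetical, chosen only for this example. It assumes that partial credit from joint sprints adds up, so two joint sprints count as one code sprint plus one data sprint; check with the instructor whether that reading applies to your team.

```python
# Hypothetical sprint labels mapped to (code credit, data credit).
# A joint sprint counts as half a code sprint and half a data sprint.
CREDIT = {
    "code": (1.0, 0.0),
    "data": (0.0, 1.0),
    "joint": (0.5, 0.5),
}


def sprint_credit(plan):
    """Return total (code, data) credit for a list of sprint labels."""
    code = sum(CREDIT[kind][0] for kind in plan)
    data = sum(CREDIT[kind][1] for kind in plan)
    return code, data


def meets_constraint(plan):
    """True if the plan adds up to at least 1 code and 1 data sprint."""
    code, data = sprint_credit(plan)
    return code >= 1.0 and data >= 1.0


if __name__ == "__main__":
    for plan in (["code", "data"], ["joint", "joint"], ["code", "joint"]):
        print(plan, sprint_credit(plan), meets_constraint(plan))
```

A plan of one code sprint plus one joint sprint falls short under this reading, because it only accumulates half a data sprint.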

Scope Sprint

In the first month of class you will scope a feasible and significant project for the year with your team members, consulting the instructor and reading background literature to establish a gap that you can fill. The Statement of Work for this sprint will include:
  • A 2-page description of your project for the year, to be submitted to AWS as a research grant to get AWS credits for the year for your project. See here for details.
  • A statement of work listing the deliverables for each of the subsequent three sprints during the semester. This statement of work will include a 1-paragraph description of the work per sprint, as well as a clear and concise enumerated list of milestones with appropriate performance metrics. It will require a detailed graphical description (e.g., a figure) providing an example of both the qualitative and quantitative evaluation procedures that you will be using for the duration of the course.
  • A "pitch deck" to be presented to the class.
  • A feedback form to be presented to the class that other members of the class will be required to fill out at the end of Sprint 2 and that will partially determine your grade.
For each of the following sprints, there are two options: (1) merge in a "pull request" adding some functionality to one of the above listed repositories, or (2) analyze data using the above tools. So, the main goal for Sprint 0 is to plan out which of these two options you are choosing for each of the subsequent three sprints (combinations of the two are also welcome).

Code Sprint

A code sprint deliverable is a successfully merged pull request into one of the above code repositories. That PR will comply with the package's "contribution guidelines". It will also require coordinating with the maintainer of the package. My recommendation is that you make your PR about 2 weeks prior to the end of the sprint, to give the maintainer sufficient time to evaluate the PR, and possibly iterate with you (which is the norm, even amongst professionals). The PR should of course include suitable documentation and a tutorial for all additional functionality (as a Jupyter notebook). At the conclusion of the sprint, the team will therefore deliver:
  • A GitHub commit message indicating a successful PR into an "integration" branch, with a detailed change log
  • Demo notebooks for each additional function including publication quality figures, both qualitative and quantitative ones.
  • A 20-minute presentation presenting the motivation, challenge, action, and result of your work. This presentation will document the degree to which you have achieved all the goals established for this sprint.
  • A live demo during the final showcasing the primary functionality of the tool.
  • A 4-page report that could be submitted to a conference or workshop describing the repository.
  • If it is the 2nd and/or 3rd code sprint, it must also include an update to either CRAN or PyPI (note that this often takes multiple weeks, so again I recommend you try weeks in advance).

Data Sprint

A data sprint deliverable is a "conference publication" style report documenting an analysis of a given dataset, with publication quality figures. The first data sprint will result in a short 4-page report with ~10 references; the 2nd one will result in an 8-page report. All reports must have publication quality figures. All figures must be fully reproducible (end-to-end, starting with the same raw data that you started with) using Jupyter notebooks or the like. Grades will consider whether we are able to reproduce your analysis. At the conclusion of the sprint, the team will therefore deliver:
  • A conference report with publication quality figures
  • A notebook to reproduce each of the quantitative results from the report
  • A 20-minute presentation presenting the motivation, challenge, action, and result of your work. This presentation will document the degree to which you have achieved all the goals established for this sprint.
  • A live demo of your analysis.
  • If it is the 2nd and/or 3rd data sprint, data derivatives must also be deposited in an open access repository.

Credits

This course will provide 4 design credits for the first semester and 6 credits for the second semester, and will be a team-based year-long undertaking in data science and neuroscience. The 2xx level does not require extensive numerical programming, is designed for underclass persons to learn about the design process by contributing in a variety of ways to the team's progress, and includes written assignments. The 4xx level is designed for upperclass persons and counts for 4 credits of design per semester in BME. Each team will have a team captain who is required to take the 6xx level, and who will have additional requirements. Other graduate students can also take the 6xx level. Students in the 4xx level and above will be expected to have some background in programming, statistics, and linear algebra, and should be willing to learn more. The software produced will be open source under the Apache 2.0 license, and must be reproducible (i.e., another person can run it and recreate the reported findings).

Weekly Class Organization

This is subject to change, pending the feedback from the course participants. There will be four teams, each with 4-6 team members (including a captain). The current plan is as follows: Mondays and Wednesdays will follow the same basic structure:
  • Minutes 1-15: jovo will provide a short lecture/tutorial on a topic of mutual interest.
  • Minutes 15-40: Team A will make a formal presentation to the class, tactfully accepting constructive criticism and feedback.
  • Minutes 40-50: 10-minute bio break.
  • Minutes 50-75: Team B will make a formal presentation to the class, tactfully accepting constructive criticism and feedback.
The weekly schedule will be, for fall semester: Monday: Team 1 and 2, Wednesdays: Team 3 and 4. For spring semester: Monday: Team 3 and 4, Wednesdays: Team 1 and 2.

The formal presentations will take the following form: students from the group will present Google Slides. Each team member will have 3 or more slides detailing the following:
  1. 1 slide: The sprint and weekly goals, each color coded to denote the degree of completion.
  2. 1 slide per task: evidence (limited to figures, tables, demos, or webpages) that the attempted deliverable was completed successfully, with links. Note that the content on the slide should include incontrovertible evidence of task completion, i.e., stating "I did it" and/or linking to code that you claim does the job is insufficient.
  3. 1 slide: The deliverables the student will attempt the following week.

Students are expected to practice presenting as a group each week. See these example slides, and these guidelines on making good slides.

During and after the presentations, jovo + TA + classmates will provide feedback with regard to plan and presentation.

Lab sessions: each team will have 20 minutes to present updated slides and a revised plan for the following week. This meeting is essentially geared to address any potential communication/presentation issues, for example, if a figure was not yet presentation quality, or a completed analysis was severely problematic for some reason.

Communication

This class communicates largely in Slack. Please join our Slack channel at spiralscience to get involved. Regular updates are provided in Slack, and the instructors will assume you have received them. The TAs will answer questions on Slack within approximately 48 hours.

External Help

Jovo and the TAs are typically available after class to provide feedback/guidance as requested.

Contributing

Each team will have:
  1. A GitHub repo containing all the code and other content for the course (possibly graduating to multiple repos as appropriate).
  2. A Slack channel on the team workspace.
  3. A Google Drive folder containing the weekly slides (to which jovo has edit permissions).
It is expected that each team member participates in each of the above three media.

Grading

In general, it is expected that all students will be most excited about this class, and therefore invest a minimum of 12 hours of effort per week in the fall, and over 20 hours of effort per week in the spring. Assuming you at least convince jovo and the TAs of this level of effort, and address their weekly feedback, you should expect an A. Anybody in danger of not receiving an A will be told so by jovo, and given clear requirements for bringing their grade back up to an A. Anybody not receiving an A in the first semester will not be invited back for the second semester.

Weekly

See our grading rubric here

Each week students will be graded on the degree of completion of their deliverables, typically including both a qualitative and quantitative result. After your presentations, instructors will provide detailed feedback on your slides, with instructions on how to improve them. Grades will be provided on the basis of your ability to complete the tasks you set out to perform, and to update slides on the basis of others' input. In addition, slides should have research artifacts for each task (whether completed or not).

Sprint Deliverable Grades

The sprint deliverables will be graded as follows. Jovo will send a link to your webpage to a technically competent student or collaborator, as well as instructions. That person will attempt to run the tool from start to finish. If she is unable to complete the analysis, you get a zero. If she is able to complete it, but gets results dramatically different from what she should have gotten, you get a 50%. If she is able to complete it, and gets answers that are reasonable, you get a 100%.
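
The three outcomes above can be written as a small decision function. This is only a sketch of the rule as stated: the function and parameter names are hypothetical, and the single boolean `results_reasonable` stands in for the reviewer's judgment of whether the results are reasonable or dramatically different from what was expected.

```python
def sprint_deliverable_grade(completed: bool, results_reasonable: bool) -> int:
    """Return the sprint deliverable grade as a percentage.

    completed: the reviewer could run the tool from start to finish.
    results_reasonable: the reviewer's results were reasonable rather than
        dramatically different from what they should have been.
    """
    if not completed:
        # The analysis could not be run to completion.
        return 0
    # The analysis ran; the grade depends on whether the results hold up.
    return 100 if results_reasonable else 50


if __name__ == "__main__":
    print(sprint_deliverable_grade(completed=False, results_reasonable=False))
    print(sprint_deliverable_grade(completed=True, results_reasonable=False))
    print(sprint_deliverable_grade(completed=True, results_reasonable=True))
```

In practice this means the first thing to test before handing in a deliverable is whether someone outside the team can run it end to end from your instructions.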

Feedback

Stephen Covey, author of "Seven Habits", states that the 7th habit is "sharpen the saw", which means (for this class) getting feedback and seriously considering it. Every person involved in the class will therefore be required to provide feedback to others, including the instructor. Students will also identify properties of the class which gruntle or disgruntle them, so that we may make adjustments. Link to Evaluation Form.

ABET Student Outcomes

  • (a1) Apply knowledge of advanced mathematics (calculus, differential equations, linear algebra, statistics) to problems at the interface of engineering, biology and medicine
  • (a5) Mathematically model and simulate biological systems using computers
  • (b1) Formulate hypotheses for experiments, including those on living systems
  • (b4) Display, describe, summarize and interpret experimental results in a lab report
  • (c1) Identify a desired need and define the biomedical engineering problem to be solved
  • (c2) Determine the constraints to the problem and assess the successful likelihood for different approaches
  • (d1) Communicate opinions, viewpoints and expertise with other team members
  • (d2) Understand team goals and assume and fulfill individual responsibilities within the team
  • (e3) Solve problems using experimental, mathematical and/or computational tools
  • (g1) Synthesize, summarize and explain technical content in a written report
  • (g2) Synthesize, summarize and explain technical content in an oral presentation
  • (k1) Gain proficiency in computer simulations and mathematical analysis tools

Online Resources

Books

Articles

Visualization tools

2018-2019 Projects