You will deal with real-world problems and messy data sets, and you will have the chance to work on an end-to-end solution to a problem using computational methods.

Professors and TFs have worked with partners and collaborators from science, industry, government, and the non-profit sector to select and prepare project goals and data. Projects are selected to combine the statistical, computational, and engineering challenges with the social issues involved in solving complex real-world problems. Additional mentors may be recruited to assist with particular projects.

Students will be placed in groups, and each group will work with the instructor to identify a particular problem they would like to solve. Students will also work with the instructors and mentors or partners to understand the overall problem, define the problem, and propose a solution. These solutions will take the form of a software package with documentation, a recommendations report, or a research paper.

Students will go through the whole cycle of solving a real-world problem in a project team:

  • acquiring, organizing and processing data
  • creating and outlining solutions
  • implementing those solutions
  • communicating and defending their work.

Students will set their own deadlines, determine roles within the group, and manage group dynamics. Professors and TFs will guide students through the process and provide feedback. TFs will be available every week to guide and help students.

A very important component of the capstone project will be the significant feedback provided to the students, which should be a learning tool in itself. Students will be given explicit measures for evaluation, regular feedback on all aspects of the project and its implementation, and opportunities for self-assessment.

Students will be required to give scheduled updates and to attend weekly face-to-face review sessions with the instructor and/or mentors. This continuous open dialogue will ensure the achievement of the learning outcomes.

The course will be broken down into the following four phases:

PHASE 1: Investigate/research

Topics and CSE learning outcomes covered

  • Overview of problem solving strategies
  • Investigate known techniques
  • Evaluate and select appropriate statistical and mathematical models (CSE learning outcome)
  • Use visualization (CSE learning outcome)
  • Evaluate and compare multiple computational approaches and choose the most appropriate and efficient one (CSE learning outcome)
  • Problem statement definition
  • Project management
  • Communicate across disciplines and collaborate in a team (CSE learning outcome)

Major Deliverables

  • Literature search and background
  • Presentation on findings
  • Presentation on problem statement
  • Interaction with partners

PHASE 2: Data acquisition/experimental design and/or data preparation/exploration

Topics and CSE learning outcomes covered

  • Understand and design data structures, data management, and data cleaning techniques (CSE learning outcome)
  • Design experiments and strategies for data collection (CSE learning outcome)
  • Use visualization (CSE learning outcome)
  • Communicate across disciplines and collaborate in a team (CSE learning outcome)

Major Deliverables

  • Data exploration and strategies for data cleaning and data structure
  • Presentation on data aspects
  • Interaction with partners

PHASE 3: Formulation of solution, prototype implementation

Topics and CSE learning outcomes covered

  • Brainstorming
  • Concept selection
  • Produce a computational solution that is reproducible (CSE learning outcome)
  • Model complex systems with consideration of efficiency, cost and data availability (CSE learning outcome)
  • Apply machine learning techniques (CSE learning outcome)

Major Deliverables

  • Present multiple ideas and comparison amongst those ideas
  • Obtain partner feedback
  • Presentation of final concept selection with quantifiable choice specifications

PHASE 4: Implementation

Topics and CSE learning outcomes covered

  • Use computation for advanced data analysis (CSE learning outcome)
  • Apply techniques and tools from software engineering to build robust, reliable, and maintainable software (CSE learning outcome)
  • Take advantage of parallel and distributed computing and other emerging modes of computation, both in algorithms and in code implementation (CSE learning outcome)
  • Project management, continued
  • Communicate across disciplines and collaborate in a team (CSE learning outcome)
  • Evaluate and apply appropriate optimization methods (CSE learning outcome)

Major Deliverables

  • Final solution presentation
  • Software implementation delivered
  • Self and peer evaluations (submitted)


  1. Final report (group)
  2. Final presentation (group, one or more students may give the presentation)
  3. Self evaluation (individual)
  4. Peer evaluation (individual)
  5. Project website (group): Students will develop a web page which briefly describes their project and includes a demonstration video (below).
  6. Project video (group): Each group will create a three-minute narrated screencast showing a demo of their work.

Course Grade

Course grades will be based on overall performance in each phase. The grade will be determined by the novelty and robustness of the final solution, how well it addresses the problem statement, and how it is presented in the final presentation and report.

Your course grade is calculated as:

  • phase 1: 15%
  • phase 2: 15%
  • phase 3: 15%
  • phase 4: 25%
  • presentation and deliverables (presentation, paper/report/software with docs, website, video): 20%
  • participation (Piazza discussions, attendance at tutorials): 5%
  • teamwork (peer evaluated as well as evaluated by us): 5%
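
As a rough illustration of how the weights above combine, here is a minimal sketch in Python. It assumes each component is scored as a fraction between 0 and 1 and that the course grade is a simple weighted sum out of 100. The function name `course_grade` and the example scores are hypothetical, and this is not official grading code.

```python
# Weights copied from the list above, in percentage points (they sum to 100).
WEIGHTS = {
    "phase 1": 15,
    "phase 2": 15,
    "phase 3": 15,
    "phase 4": 25,
    "presentation+deliverables": 20,
    "participation": 5,
    "teamwork": 5,
}


def course_grade(scores):
    """Return a weighted total out of 100.

    `scores` maps each component name to a fraction in [0, 1].
    Assumption: the grade is a plain weighted sum with no curve or rounding.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores for one student, for illustration only.
example_scores = {
    "phase 1": 0.90,
    "phase 2": 0.80,
    "phase 3": 1.00,
    "phase 4": 0.88,
    "presentation+deliverables": 0.90,
    "participation": 1.00,
    "teamwork": 1.00,
}

print(f"Course grade: {course_grade(example_scores):.1f} / 100")
```

Under these assumptions, phase 4 alone carries a quarter of the grade, and the four phases together account for 70 of the 100 points.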

There will be a Piazza page for all discussions and announcements. Students are encouraged to talk about their projects, difficulties and all, on the forum. Students are encouraged even more to engage with students doing other projects: we promise it will make your experience more fun, and your knowledge base larger.

Remember that you will be graded on all four phases of your project described in the syllabus.

Note that 5% of the grade is for teamwork. This depends on your own self-evaluation as well as the evaluation of your contribution by your teammates.

The dropbox for each piece of project work will close at the deadline for that piece of work. NO work will be accepted after the deadline. Do not submit in the last seconds before the dropbox closes.

Tutorial Topics

It is strongly recommended that you attend all the tutorials, as they will be fun. You will learn about advanced topics and work on them hands-on during the tutorials. Tutorial notes and notebooks will be posted online, but do not expect these notes to contain all the material discussed in class, or the questions posed and their answers.

These are the topics which will be covered in the tutorials. The list below reflects a rough ordering of topics. The actual order will likely shift around a bit. The course will be taught using Python as the language of implementation.

  1. NumPy/SciPy/Matplotlib/seaborn/pandas (DONE)
  2. Exploratory statistics, visualization, and data analysis; publishing on GitHub (DONE)
  3. Software Development and GitHub
  4. Machine Learning using scikit-learn 1
  5. Machine Learning using scikit-learn 2
  6. Deep Learning using Theano
  7. Scientific Writing and Presentation
  8. The Bayesian Paradigm
  9. Scaling Algorithms 1
  10. Scaling Algorithms 2
  11. Bayesian Prediction and Inference using PyMC
  12. Model Averaging and Ensembles