CS221 Final Project Guidelines

In the final project, you will work in groups of up to four to apply the techniques that you've learned in CS221 to a new setting that you're interested in. Note that regardless of the group size, all groups must submit the work detailed in each milestone and will be graded on the same criteria. Additionally, we expect each team to submit a completed project. We encourage teams of 3-4 students. All projects require students to spend time gathering data and setting up the infrastructure to reach an end result, and 3- or 4-person teams can share these tasks much better. This allows the team to focus more on the interesting results and discussion in the project. Each member of the team should contribute to both the technical and non-technical components of the project.

You will build a system to solve a well-defined task. Which task you choose is completely open-ended, but the methods you use should draw on the ones from the course.

If you are interested in doing the class project, submit the project interest form any time before Oct 7. We encourage all students who might be interested in a project to fill this form out. There is no commitment (you can fill it out and then not follow through with doing a project). You can also refer to this Ed Post to find project group team members.

After you submit the form, you will be assigned one of the CAs as your official mentor. They will grade all your work and really get to know your project. You are required to have a 15-minute check-in meeting with your mentor in the week before or after the progress report deadline, and are encouraged to drop by your mentor's office hours often to discuss your project. Note that it will take several iterations to find the right project, so be patient; this exploration is an essential part of research, so learn from it. Have fun and don't wait until the last minute!

Milestones

Throughout the quarter, there will be several milestones so that you can get adequate feedback on the project. While the project is ungraded (except for potential extra credit), we can only give feedback on milestones that are submitted on time.

  1. Proposal (2 pages max). The proposal is due Oct 21. It should include the following items.
    You should have the majority of the infrastructure (e.g., building a simulator, cleaning data) completed to do something interesting by now. For machine learning tasks, setting up the infrastructure involves collecting data (by scraping, crowdsourcing, or hand labeling). For game-based tasks, this involves building the game engine/simulator. While infrastructure is necessary, try not to spend too much time on it. You can sometimes take existing datasets or modify existing simulators to save time, but if you want to solve a task you care about, this is not always an option. Note that if you download existing datasets that are already preprocessed (e.g., Kaggle), then you will be expected to do more with the project.

    Note that you can still adjust your project topic after submitting the proposal, but your progress report (described next) should be on the same topic as your final report.

  2. Progress report (4 pages max). The progress report is due on Nov 14. It should include the following items.

  3. Progress check-in with CA mentor (15-minute meeting). The check-in should be completed by Nov 14. To provide more mentorship for the final project, we introduce a progress check-in between project groups and their assigned mentor. In the week before or after the progress report deadline, you will have a 15-minute meeting with your mentor to discuss the progress of your project, clear up any confusion, and get advice. Your assigned CA mentor will reach out with information on how to schedule the check-in meeting.

  4. Final Video (5 minutes): The final video is due on Dec 3. The final video is a 5-minute video presentation of your project. In the video, you should briefly describe the motivation, problem definition, challenges, approaches, results, and analysis. You should include diagrams, figures, and charts to illustrate the highlights of your work. If possible, try to come up with creative visualizations of your project. These could include system diagrams, more detailed examples of data that do not fit in the space of your report, or live demonstrations for end-to-end systems. The goal of the video is to convey the important high-level ideas and give intuition rather than to be a detailed specification of everything you did. Use lots of diagrams and concrete examples, and avoid slides that are too wordy or have extremely complex equations.

  5. Final Report (5-10 pages max): The final report is due on Dec 3. Your final report should be a comprehensive account of your project. The final report structure is very similar to that of the progress report, except that we would like to see some new results, experimentation and/or analysis, and a Future Work section. Below is a full description of what you should include in your project final report.

Note: for each of the assignments, you can have an appendix beyond the maximum number of allowed pages with any figures, plots, or examples that you need. References do not count toward the page limit.

Submission

Submit the milestones on Gradescope and make sure all group members are added to the submission.

All milestones are due at 11:59pm.

For each milestone, you should submit: For the Final Project Report, be sure to include a link to your code (uploaded to GitHub/Bitbucket/Google Drive/etc.) and data in the writeup. Any language is fine; the code does not have to run out-of-the-box. You should also include a README.md file within the repo/zip documenting what everything is and what commands you ran.

Rubric

We will give feedback on the following dimensions:

Of course, the experiments may not always be successful. Getting negative results is normal, and as long as you make a reasonably well-motivated attempt and explain why the results came out negative, you will get credit.

An example strategy

This is a suggestion of how to approach the final project with an example.

Datasets

You are free to use existing datasets, but these are not necessarily the best match for your problem, in which case you are probably better off making your own dataset.

Libraries

You are free to use existing tools for parts of your project as long as you're clear about what you used. When you use existing tools, the expectation is that you will do more on other dimensions.

Some project ideas

You can also get inspiration from previous years' CS221 projects (student access only).

Frequently asked questions

Can I use the same project for CS221 and another class (CS229, etc.)? The short answer is that you cannot turn in the identical project for both classes, but you can share common infrastructure across the two classes. First, you should make sure that you follow the guidelines for the CS221 project, which are likely different from those of other classes. Second, if any part of the project is done for a purpose outside CS221 (for the final project in CS229 or other classes, or even for your own research), then in the progress and final reports, you must clearly indicate which part of the project was done for CS221 and which part was not. For example, if you're taking CS229, then you cannot turn in the same pure machine learning project for CS221. But you can work on the same broad problem (e.g., news recommendation) for both classes and share the same dataset / generic wrapper code. You should then explore the machine learning aspect of the problem for CS229 (e.g., classifying news relevance) and another topic for CS221 (e.g., optimizing diversity across news articles using search or CSPs).

Are there restrictions on who I can partner up with for the final project? The only hard requirement is that each member of your group must be enrolled in CS221. Thus, if you choose to use the same project for CS221 and another class, all of your partners must be in CS221. If you feel like you have a compelling case for an exception, please submit a request on Ed detailing the parts of the project used for each class and the reasons for deviating from the project policies.

How do you choose a good baseline and oracle?
Baselines are simple algorithms, which might include using a small set of hand-crafted rules, training a simple classifier, etc. (Note that baselines are extremely simple, but you might be surprised at how effective they are.) While a predictor that guesses randomly provides a lower bound (and can be reported in the paper), it is too simple and doesn't give you much information. Predicting the majority label is a slightly less trivial baseline, and whether it's acceptable depends on how insightful it is. For classification, if the different labels have very different proportions, then it could be useful; otherwise it won't be. You are encouraged to have multiple baselines. Please note that we expect an implementation of the baseline for the project progress report (not project proposal).
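To make the comparison above concrete, here is a minimal sketch of a random-guess baseline and a majority-label baseline for a classification task. The function names and the toy labels are made up for illustration and are not part of the project requirements; the data is deliberately skewed, which is the case where the majority-label baseline can be informative.

```python
import random
from collections import Counter


def majority_baseline(train_labels):
    """Return a predictor that always outputs the most common training label."""
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    return lambda example: majority_label


def random_baseline(train_labels, seed=0):
    """Return a predictor that guesses uniformly among labels seen in training."""
    rng = random.Random(seed)
    label_set = sorted(set(train_labels))
    return lambda example: rng.choice(label_set)


def accuracy(predictor, examples, labels):
    """Fraction of examples on which the predictor matches the true label."""
    correct = sum(predictor(x) == y for x, y in zip(examples, labels))
    return correct / len(labels)


# Toy, made-up data with a skewed label distribution (hypothetical).
train_labels = ["neg"] * 8 + ["pos"] * 2
test_examples = ["t1", "t2", "t3", "t4", "t5"]
test_labels = ["neg", "neg", "neg", "pos", "neg"]

majority = majority_baseline(train_labels)
print("majority accuracy:", accuracy(majority, test_examples, test_labels))
print("random accuracy:", accuracy(random_baseline(train_labels), test_examples, test_labels))
```

Reporting both numbers side by side shows how much of a system's accuracy is explained by label imbalance alone.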

Oracles are algorithms that "cheat" and look at the correct answer or involve humans. For human-like classification problems (e.g., sentiment classification), you can have each member of your project team annotate ~50 examples and measure the agreement rate. Note that some tasks are subjective, so even though humans are providing ground truth labels, human accuracy will not be 100%. When the classification problem is not human-like, you can try to use the training error of an expressive classifier (e.g., nearest neighbors) as a proxy for oracle error. The idea is that if you can't even fit the training data using a very expressive classifier, there is probably a lot of noise in your dataset, and you have a slim chance of building any classifier that does well on test. While returning 100% is an upper bound, it is not a valid oracle since it is vacuously an upper bound. Sometimes, oracles might be difficult to come by. If you think that no good oracles exist, explain why. Please note that we do not expect an implementation of the oracle at any point during your final project. However, if the implementation is very easy, you are free to implement the oracle.
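As one possible way to compute the agreement rate for a human oracle, the sketch below averages the fraction of matching labels over every pair of annotators. The annotator names and labels are hypothetical, and mean pairwise agreement is only one reasonable definition of "agreement rate"; other measures may suit your task better.

```python
from itertools import combinations


def pairwise_agreement(annotations):
    """Average fraction of examples on which each pair of annotators agrees.

    annotations maps an annotator name to a list of labels, one per example,
    with all lists in the same example order.
    """
    rates = []
    for a, b in combinations(sorted(annotations), 2):
        labels_a, labels_b = annotations[a], annotations[b]
        if len(labels_a) != len(labels_b):
            raise ValueError("annotators must label the same examples")
        matches = sum(x == y for x, y in zip(labels_a, labels_b))
        rates.append(matches / len(labels_a))
    return sum(rates) / len(rates)


# Hypothetical labels from three team members on the same five examples.
annotations = {
    "member_a": ["pos", "neg", "pos", "neg", "pos"],
    "member_b": ["pos", "neg", "neg", "neg", "pos"],
    "member_c": ["pos", "pos", "pos", "neg", "pos"],
}
print("mean pairwise agreement:", pairwise_agreement(annotations))
```

An agreement rate well below 100% suggests the task is subjective, so a system should not be expected to exceed it by much.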

Both baselines and oracles should be simple and not take much time. The point is not to do something fancy, but to work with the data / problem that you have in a substantive way and learn something from it. Here are some examples of baselines:

Guessing completely at random is technically a baseline, but is a really bad one because it doesn't really tell you much about how easy the problem is.

Here are some examples of oracles: Always guessing the correct label is technically an oracle, but it's a really bad one, because you'd always get 100% and you don't learn from it.

Overall, the main point of baselines and oracles is to get you to look at the problem carefully and think about what's possible. The accuracies of state-of-the-art systems on the dataset could be either a baseline or an oracle. Sometimes, there are data points that are neither baselines nor oracles: for example, in a two-component system, you use an oracle for one and a baseline for another.