Projects


Stability and optimality in stochastic gradient descent

Dustin Tran

Abstract: Stochastic gradient methods have become increasingly popular for large-scale optimization. However, they are often numerically unstable because of their sensitivity to hyperparameters in the learning rate; furthermore, they are statistically inefficient because of their suboptimal use of the information in the data. We propose a new learning procedure, termed averaged implicit stochastic gradient descent (ai-SGD), which combines stability through proximal (implicit) updates and statistical efficiency through averaging of the iterates.
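
The two ingredients named in the abstract, implicit updates and iterate averaging, can be illustrated with a minimal sketch for least-squares regression. This is not the author's implementation: the function name, the learning-rate schedule gamma0 / n**alpha, and the synthetic data are assumptions made for illustration. For squared loss, the implicit update theta_n = theta_{n-1} + gamma * (y - x.theta_n) * x can be solved in closed form, which is what the code uses.

```python
import random


def ai_sgd_least_squares(data, dim, gamma0=1.0, alpha=0.6):
    """Averaged implicit SGD sketch for the loss 0.5 * (y - x.theta)^2.

    `data` is an iterable of (x, y) pairs, with x a list of `dim` floats.
    The schedule gamma0 / n**alpha is an assumed choice, not from the source.
    """
    theta = [0.0] * dim       # current iterate
    theta_bar = [0.0] * dim   # running average of the iterates
    for n, (x, y) in enumerate(data, start=1):
        gamma = gamma0 / n ** alpha
        residual = y - sum(xi * ti for xi, ti in zip(x, theta))
        # Closed-form solution of the implicit update for squared loss:
        # the explicit step is shrunk by 1 / (1 + gamma * ||x||^2),
        # so a large gamma cannot make the iterate blow up.
        step = gamma / (1.0 + gamma * sum(xi * xi for xi in x))
        theta = [ti + step * residual * xi for ti, xi in zip(theta, x)]
        # Incremental mean of theta_1, ..., theta_n.
        theta_bar = [tb + (ti - tb) / n for tb, ti in zip(theta_bar, theta)]
    return theta_bar


# Hypothetical usage on synthetic data with known coefficients.
rng = random.Random(0)
true_theta = [2.0, -1.0]
samples = []
for _ in range(5000):
    x = [rng.gauss(0, 1), rng.gauss(0, 1)]
    y = sum(a * b for a, b in zip(x, true_theta)) + rng.gauss(0, 0.1)
    samples.append((x, y))
print(ai_sgd_least_squares(samples, dim=2))
```

The shrinkage factor in `step` is where the stability comes from in this sketch; the running mean in `theta_bar` is the averaging step.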


Distance Query in Volume Data

Ye Kuang

Abstract: We demonstrate a distance query system for any two arbitrarily shaped objects in 3D space.


A Statistical Analysis of MBTA Ridership

Lyla Fadden, Micah Lanier, Fil Piasevoli, Aaron Zampaglione

Abstract: As one of the largest public transit systems in the United States, the MBTA serves 4.8 million people throughout the Boston metro area and facilitates an average weekly ridership of 1.3 million trips. One central principle underlies the entirety of our project: the improvement of MBTA service. Intuition and familiarity have enabled planners to keep the trains and buses running on time, but delays and overcrowding have led stakeholders to seek a more rigorous understanding of the transit system from IACS students. We will discuss our statistical and machine learning approach to improving MBTA service by predicting demand, comparing and categorizing T stations, and understanding how weather and public events influence passenger needs across the system.


Modeling Homicide in Honduras

Charlotte Lloyd, Jamie Rogers

Abstract: Honduras has the highest homicide rate in the world, which led our client USAID/OTI to found the Honduras Convived program in July 2012. The program's goal is to disrupt the systems, perceptions, and behaviors that support violence. Specifically, USAID/OTI works towards the creation of low-tech and low-cost models of violence disruption that can be implemented by the Government of Honduras, and it has implemented roughly 300 different project activities to date. For our Capstone Project, we will analyze detailed homicide data for the city of San Pedro Sula from 2013-2014 and present results that characterize the diverse patterns of violence at the neighborhood level. Since the majority of USAID/OTI projects are developed for a single neighborhood or a small group of neighborhoods, this approach will directly inform their understanding of the specific systems of violence occurring in these locales. For instance, interventions appropriate for stemming particular kinds of homicides can be implemented in neighborhoods where those kinds of homicides are most prevalent.


Conversion to Customers - Boston Globe

Jeffrey Shen, Kai Sheng, Simon Malian

Abstract: The typical cyber-life of a Boston Globe user starts with anonymous visits and progresses from casually visiting the site to ultimately becoming a subscriber. The Boston Globe would like to understand the idiosyncrasies and patterns of a subscriber and use that knowledge to increase subscriptions.


Social Media Epidemiology

Ryan Lee, Sail Wu, Jacob Zhu

Abstract: Healthmap (http://www.healthmap.org/en/) is a collaboration between epidemiologists and computational scientists to track ongoing outbreaks of major diseases in real time. Currently, incoming alerts such as news articles, medical update (WHO) feeds, and Twitter alerts are curated by humans on the Healthmap team. In this project, one of our goals is to build an automated pipeline that takes in raw articles and assigns the tags that are currently applied by hand: specifically, the disease the article discusses and the location of the article's subject. We also create a visualization that includes a map displaying the estimated locations of alerts for each disease in affected regions, as well as a timeline graph showing alert data overlaid onto the known and estimated weekly case/death count. Finally, we attempt to estimate the current weekly case counts for Ebola from current weekly Twitter data, so as to inform epidemiologists about current conditions of the disease without lag.


High-speed 3D volume registration for motion correction in magnetic resonance imaging

Yingzhuo (Diana) Zhang

Abstract: I will briefly introduce the problems caused by movement during an MRI scan and the importance of motion detection. After some background information and a description of the data, I will focus mainly on the methods and models I have used, in particular new methods that have never been tried before in registering this kind of volume.


Sentiment Analysis and Opinion Mining of TripAdvisor Hotel Reviews

Yaxiong Cai, Arjun Sanghvi, Peiheng Hu, Ruitao Du

Abstract: TripAdvisor, founded in 2000, is a public company that operates various travel websites providing content such as editorial reviews of hotels, in addition to a free forum for user discussion. Websites that are part of the TripAdvisor Group attract more than 100 million monthly views, and the company has become such an integral part of the travel business worldwide that it holds enough sway to single-handedly make or break a hotel depending on how that hotel is portrayed. The massive amount of user-generated content posted on TripAdvisor is an invaluable source of feedback for hotels and for TripAdvisor itself. To date, the company has used the labor-intensive and potentially biased method of manually analyzing a small subset of the reviews. The goal of this project is to write software for automatic analysis of these free-form text responses.


Market Beta

Wei Dai, Shi Fang, Zhenyang Pan

Abstract: We implemented several beta estimation methods to find an efficient estimator, given the information available today, of the future relationship between the price of a stock and a market index. We compared different time periods, as well as different industries, to find the best estimation method.
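
As a point of reference, the textbook definition of beta is the covariance of stock returns with market returns divided by the variance of market returns. The abstract does not say which estimators the authors compared, so the sketch below shows only this baseline plus a rolling-window variant; the function names and the sample return series are hypothetical.

```python
def ols_beta(stock_returns, market_returns):
    """Baseline beta: cov(stock, market) / var(market) over one sample."""
    n = len(stock_returns)
    if n != len(market_returns) or n < 2:
        raise ValueError("need two return series of equal length >= 2")
    mean_s = sum(stock_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum((s - mean_s) * (m - mean_m)
              for s, m in zip(stock_returns, market_returns))
    var = sum((m - mean_m) ** 2 for m in market_returns)
    if var == 0:
        raise ValueError("market returns have zero variance")
    # The 1/(n-1) factors of covariance and variance cancel in the ratio.
    return cov / var


def rolling_beta(stock_returns, market_returns, window):
    """Beta re-estimated on each trailing window of `window` observations."""
    return [
        ols_beta(stock_returns[i - window:i], market_returns[i - window:i])
        for i in range(window, len(stock_returns) + 1)
    ]


# Hypothetical daily returns, for illustration only.
market = [0.010, -0.020, 0.015, 0.003, -0.007, 0.012]
stock = [0.018, -0.031, 0.020, 0.001, -0.012, 0.019]
print(ols_beta(stock, market))
print(rolling_beta(stock, market, window=4))
```

Comparing estimators across time periods, as the abstract describes, would amount to fitting on one window and checking how well the estimate matches the beta realized in a later window.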


Bitcoin Trading Strategies

Daniel Rajchwald, Wenshuai Ye, Wenwan Yang

Abstract: Bitcoin is a digital currency that has properties that make it a unique trading instrument. For instance, it is a decentralized peer-to-peer currency, exhibits high volatility, and has a maximum circulation. We investigate trading opportunities for Bitcoin by using statistical techniques to compare and evaluate trading strategies. MCMC, ensemble techniques, pattern recognition, and other statistical methods are applied to Bitcoin time series data. The relative successes of the algorithms also give insight into Bitcoin market behavior.


Automated Synapse Classification Using Convolutional Neural Networks

Cole Diamond, Raphael Pestourie, Hallvard Nydal, Mina Nassif

Abstract: We believe that the application of convolutional neural networks to the task of identifying synapses is more advantageous than the application of random forests. We plan to measure our performance in terms of F1 for a direct comparison, but will also provide our confusion matrix on predictions and the area under the ROC curve. Moreover, we will measure prediction time to estimate the throughput of our algorithm on large quantities of data.
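
The three evaluation measures named in the abstract can be computed from binary labels and classifier scores as in the sketch below. This is a minimal illustration, not the authors' evaluation code: the function names, the 0/1 label encoding, and the sample labels and scores are assumptions. The area under the ROC curve is computed through its rank interpretation, the probability that a randomly chosen positive example scores higher than a randomly chosen negative one.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels encoded as 0/1."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels, hard predictions, and scores for six candidates.
labels = [1, 1, 0, 0, 1, 0]
preds = [1, 0, 0, 1, 1, 0]
scores = [0.9, 0.4, 0.3, 0.6, 0.8, 0.1]
tp, fp, fn, tn = confusion_counts(labels, preds)
print(tp, fp, fn, tn, f1_score(tp, fp, fn), roc_auc(labels, scores))
```

The pairwise AUC loop is quadratic in the number of examples, which is fine for a sketch; a sort-based computation would be the usual choice at scale.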