Google Summer of Code Summary
29 Aug 2017Overview
Using the data obtained from the scan tool of various models of electric vehicles from ChargeCar, we were able to get the terrain information of various routes and the time series of the fuel usages accompanying those routes depending on several other factors such as velocity and acceleration as well as vehicle weight. Our job was to train multiiple models on this data to best estimate the fuel consumption so as to be able to select the best route from several competing ones. What follows is a summary timeline of the implementation of the various approaches and how they compare to one another:
Week 1 :
- Organize scan tool data sources
- Visualize data using dimensionality reduction by t-SNE
Week 2 :
- Set a baseline using simple vanilla linear regression
- Implemented a best-subsets variant of linear regression by eliminating predictors by variance inflation factors
Week 3-4 :
- Implemented L1 and L2 regularized variants of linear regression
- Tested different penalty coefficients for the aforementioned ridge and lasso regression
Week 5-6 :
- Implemented random forest regressor
- Performed hyperparameter search for the aforementioned random forest regressor using random search and grid search.
Week 7-8 :
- Implemented xgboost regressor
- Performed hyperparameter search for the aforementioned xgboost regressor using random search and grid search
Week 9-11 :
- Implemented windowed random forest and windowed xgboost regressor feeding not only telemetry data but also first order and second order gradient information
- Performed hyperparameter search for the aforementioned windowed random forest and windowed xgboost regressor using random search and grid search
Week 12-13 :
- Implemented an LSTM network with dropout that used not only telemetry data but also first order and second order gradient and historical observations
- Fixed some bugs in the data reader
The L1 regularized linear regression slightly ouperformed the L2 variant, ostensibly due to its better feature selection abilities. Both these methods significantly outperformed the vanilla linear regression and the best subsets linear regression. While all the other methods surpassed this baseline set by the linear regression approach, the two methods that stood out were xgboost with gradient inputs (Week 9-11) and the LSTM with gradient and historical inputs (Week 12-13).
Proposal link
Prediction of Energy Consumption of Electric Vehicles
Possible Extensions
- Since the recurrent neural network approach using LSTMs has been promising, a more sophisticated stacked LSTM architechture capable of learning more complicated fuctions may yield better results
- Ensemble methods that combine predictions from various different types of regressors, may also be very promising
- The Grid Search that was used for hyperparameter selection was only run on a very narrow sub-range of the complete space of values. A Gaussian Process based Bayesian Optimisation of the parameter values over their complete space was tried but found to be intractable on consumer hardware. This method could however, yield an improvement that may help infer better combinations of values.