feedback on the proposal #14

Open
miguelbiron opened this issue Feb 19, 2020 · 0 comments
miguelbiron commented Feb 19, 2020

Very complete proposal, congratulations on the work you have done. Some suggestions:

  • The purpose of the "data aggregation" stage is not stated. It appears that you are doing this for visualization purposes. If so, there is no need to describe the aggregation; the description of the plots suffices.

  • Even though you state that the aim is to predict "property tax", you are modelling only the mill rates and not the property assessments. Please make that distinction explicit, and mention how you compute the property tax from the mill rates and the property assessments (to show how everything fits together).

  • Recall that the client assumed that the government aims to match its budget and its income by adjusting the mill rates. It would be nice to have a plot of the difference between the total property tax income (i.e., the sum of all property taxes for a year) and the budget through time, to check that the assumption makes sense.

  • Explain why one model uses "year", in terms of the properties of a time series (i.e., a "linear trend").

  • You are confusing some terms in the "regression family" description. The point of regularization in Lasso/Ridge/Elastic-net regression is to reduce the mean squared error of the predictions. The advantage of the L1 penalty over the L2 penalty is that it yields sparse coefficients. Therefore, the claim "L1 is robust to outliers" is imprecise: outlier robustness comes from replacing the L2 loss with an L1 loss (equivalent to replacing Gaussian errors with Laplace-distributed errors), regardless of the penalty, and the standard formulations of Lasso/Ridge/Elastic-net all keep the L2 loss.

  • Related to the above point: before discussing ways to control the effect of outliers, it would be good to first discuss whether and where you expect outliers to appear. For example, mill rates are constrained between 0 and 1, so outliers should not be meaningful.

  • I'm not sure that you will have enough data to fit a meaningful neural network. Since you are predicting mill rates, you will only have 18 (municipalities) times 13 (years) = 234 data points.

  • Since the goal of your project is to predict, I would like to see what precautions you are taking to avoid overfitting. Are you keeping separate train/test data? In that case, it is also important to decide how you are going to split train/test. For example, are you going to allow the same properties to appear in both splits at different years? When thinking about this, it is important to recall that in theory both datasets must be independent realizations of the same data generating process.

  • Finally, in terms of participation, I would like to see more activity either in Slack (I'm in the group but there's not much activity there) or in GitHub issues (up to now these are mostly used by the 550 students to give you feedback). Remember that if you have face-to-face meetings, you should upload a short summary of the meeting to GitHub.
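
On the train/test question above, here is a minimal sketch of two ways the split could be made explicit. The record layout (`Row`), the function names, and the toy numbers are all hypothetical and are not taken from your data; the same idea would apply at the property level rather than the municipality level.

```python
from collections import namedtuple

# Hypothetical record layout; the real project data may look different.
Row = namedtuple("Row", ["municipality", "year", "mill_rate"])


def split_by_year(rows, first_test_year):
    """Train on years before first_test_year; test on that year and later."""
    train = [r for r in rows if r.year < first_test_year]
    test = [r for r in rows if r.year >= first_test_year]
    return train, test


def split_by_municipality(rows, test_municipalities):
    """Hold out whole municipalities so no unit appears in both splits."""
    held_out = set(test_municipalities)
    train = [r for r in rows if r.municipality not in held_out]
    test = [r for r in rows if r.municipality in held_out]
    return train, test


# Toy data: 3 municipalities x 4 years, with made-up mill rates.
rows = [
    Row(m, y, 0.01 * (i + 1))
    for i, m in enumerate(["A", "B", "C"])
    for y in range(2016, 2020)
]

train_t, test_t = split_by_year(rows, first_test_year=2019)
train_g, test_g = split_by_municipality(rows, test_municipalities=["C"])
```

The year-based split mimics forecasting (the model never sees the future), but the same municipalities appear in both splits. The municipality-based split avoids that overlap, but it tests generalization to unseen units rather than to future years. Which one is appropriate depends on how the predictions will be used.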

Miguel.
