Very complete proposal, congratulations on the work you have done. Some suggestions:
The purpose of the "data aggregation" stage is not stated. It appears that you are doing this for visualization purposes. If this is the case, then there is no need to describe the aggregation; the description of the plots suffices.
Even though you state that the aim is to predict "property tax", you are modelling only the mill rates and not the property assessments. It would be good to make that distinction explicit and to mention how you compute the property tax from the mill rates and the property assessments (to show how everything fits together).
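To make the relationship concrete, here is a minimal sketch of how a predicted mill rate and an assessment combine into a tax amount. It assumes the usual convention that a mill rate is dollars of tax per $1,000 of assessed value; the function name, the `per` parameter, and the example figures are hypothetical and not taken from the proposal. If your data stores rates as a plain fraction, `per=1.0` would apply instead.

```python
def property_tax(assessed_value, mill_rate, per=1000.0):
    """Tax owed on one property.

    Assumption: mill_rate is dollars of tax per `per` dollars of
    assessed value (per=1000.0 is the usual mill-rate convention).
    """
    return assessed_value * mill_rate / per


# Hypothetical property: $500,000 assessment, mill rate of 3.5 per $1,000.
tax = property_tax(500_000, 3.5)
```

Under this reading, an error in the predicted mill rate scales linearly with the assessment, which is one reason to state explicitly which of the two quantities the models predict.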
Recall that the client assumed that the government aims to match its budget and its income by adjusting the mill rates. It would be nice to have a plot of the difference between the total property tax income (i.e., the sum of all property taxes for a year) and the budget through time, to check that the assumption makes sense.
Explain why one model uses "year", in terms of the qualities of a time series (i.e., "linear trend").
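The quantity to plot could be computed along these lines. This is only a sketch: the record layout `(year, tax_amount)`, the `budgets` mapping, and all numbers are hypothetical placeholders for whatever your data actually looks like.

```python
from collections import defaultdict


def yearly_gap(tax_records, budgets):
    """Return {year: total tax income - budget} for every budgeted year.

    tax_records: iterable of (year, tax_amount) pairs, one per property.
    budgets: dict mapping year -> budget for that year.
    """
    income = defaultdict(float)
    for year, amount in tax_records:
        income[year] += amount
    return {year: income[year] - budgets[year] for year in sorted(budgets)}


# Hypothetical toy data, for illustration only.
records = [(2016, 100.0), (2016, 50.0), (2017, 200.0)]
budgets = {2016: 140.0, 2017: 210.0}
gap = yearly_gap(records, budgets)
```

If the client's assumption holds, the values in `gap` should stay close to zero over time, with no systematic drift.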
You are confusing some terms in the "regression family" description. The point of regularization in Lasso/Ridge/Elastic-net regression is to reduce the mean squared error of the predictions, and the advantage of the L1 penalty over the L2 penalty is that it yields sparse coefficients. The claim "L1 is robust to outliers" is therefore imprecise: outlier robustness comes from replacing the L2 loss with an L1 loss (equivalent to replacing Gaussian errors with Laplace-distributed errors), regardless of the penalty. The standard formulations of Lasso/Ridge/Elastic-net do NOT do this: they all keep the L2 loss.
Related to the above point: before discussing ways to control the effect of outliers, it would be good to first discuss whether and where you expect outliers to appear. For example, mill rates are constrained between 0 and 1, so outliers should not be meaningful.
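A toy illustration of the two separate ideas, using only the standard library. For the loss: the constant that minimizes squared error is the mean, while the constant that minimizes absolute error is the median. For the penalty: in the scalar problem of minimizing (1/2)(z - b)^2 plus a penalty on b, an L1 penalty lam*|b| gives the soft-thresholded value and an L2 penalty (lam/2)*b^2 gives z / (1 + lam). The data and the value of `lam` are made up for the example.

```python
from statistics import mean, median


def soft_threshold(z, lam):
    """Minimizer of 0.5*(z - b)**2 + lam*abs(b): exactly 0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def ridge_shrink(z, lam):
    """Minimizer of 0.5*(z - b)**2 + 0.5*lam*b**2: shrinks but never hits 0."""
    return z / (1.0 + lam)


# Loss: one gross outlier moves the L2 fit (mean) but not the L1 fit (median).
with_outlier = [1.0, 2.0, 3.0, 4.0, 500.0]
l2_fit = mean(with_outlier)
l1_fit = median(with_outlier)

# Penalty: a small coefficient is zeroed by L1 but only shrunk by L2.
lasso_coef = soft_threshold(0.3, 0.5)
ridge_coef = ridge_shrink(0.3, 0.5)
```

Here the robustness shows up only in the choice of loss (`l1_fit` versus `l2_fit`), while the choice of penalty only affects whether small coefficients become exactly zero.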
I'm not sure that you will have enough data to fit a meaningful neural network. Since you are predicting mill rates, you will only have 18 (municipalities) times 13 (years) = 234 data points.
Since the goal of your project is to predict, I would like to see what precautions you are taking to avoid overfitting. Are you keeping separate train/test data? In that case, it is also important to decide how you are going to split train/test. For example, are you going to allow the same properties to appear in both splits at different years? When thinking about this, it is important to recall that in theory both datasets must be independent realizations of the same data generating process.
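Two possible splitting schemes, sketched with hypothetical field names (`municipality`, `year`) and a made-up year range. The first holds out the most recent years, so the same municipalities appear in both splits at different years; the second holds out whole municipalities, so no unit appears in both. Neither guarantees independence by itself, and which one is appropriate depends on how the model will be used.

```python
def split_by_year(rows, last_train_year):
    """Train on years up to last_train_year, test on strictly later years."""
    train = [r for r in rows if r["year"] <= last_train_year]
    test = [r for r in rows if r["year"] > last_train_year]
    return train, test


def split_by_group(rows, test_groups, key="municipality"):
    """Hold out every row whose group is in test_groups."""
    train = [r for r in rows if r[key] not in test_groups]
    test = [r for r in rows if r[key] in test_groups]
    return train, test


# Hypothetical panel: 18 municipalities x 13 years (years are placeholders).
rows = [{"municipality": m, "year": y}
        for m in range(18) for y in range(2006, 2019)]

train_t, test_t = split_by_year(rows, last_train_year=2015)
train_g, test_g = split_by_group(rows, test_groups={0, 1, 2})
```

Whichever scheme you choose, the test rows should only be used once, for the final evaluation.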
Finally, in terms of participation, I would like to see more activity either in Slack (I'm in the group but there's not much activity there) or in GitHub issues (up to now these are mostly used by the 550 students to give you feedback). Remember that if you have face-to-face meetings, you should upload a short summary of the meeting to GitHub.
Miguel.