layout | title | nav_order | description |
---|---|---|---|
default | Project 3 | 6 | Project 3 instructions, specifications, and rubrics. |
The project is due on Friday, April 26, 2024. This is the Friday before RRR week. Once you have finished this project, you will have finished all assignments in this course - congratulations!
In addition, there will be one checkpoint due on Friday, April 12, 2024.
Note: For the duration of the project, all discussion sections will be turned into optional OH where your group can ask any clarifying questions. We strongly encourage you to attend these!
This is a group project. Good collaboration and leadership will be needed for a successful reproduction. We ask that you organize yourselves into groups of 4. Groups may have members from different discussion sections. We encourage you to use Ed as a place to look for team members, as well as the following form.
If you are looking for a group to join, or if you are in a group looking for additional members, please fill out this form before Friday, April 5th, at 11:59pm. We will fill groups that are short of members by Monday, April 8th.
Exactly one member of each group must fill out this form, even if you already have 4 people in your group.
In this project, you will have the opportunity to apply the data science and economic tools and techniques that you have learned throughout the semester to explore and reproduce a paper from Berkeley Professor Ted Miguel's extensive work.
Ted Miguel's main research focus is African economic development, including work on the economic causes and consequences of violence; the impact of ethnic divisions on local collective action; interactions between health, education, environment, and productivity for the poor; and methods for transparent social science research. He has conducted field work in Kenya, Sierra Leone, Tanzania, and India (sourced from his own bio).
Many of the datasets used in his research are posted online, either on the relevant article's page or on his Dataverse.
As a group, you'll choose one paper from Ted Miguel's Dataverse and attempt to reproduce its findings. "Reproduce", in the context of this project, refers to computational reproducibility, which is:
The ability to duplicate the results of a prior study using the same data and procedures as were used by the original investigator. Computational reproducibility is done using the same computer code (possibly rebuilt from scratch), but can be achieved using a different software package. (Dreber and Johannesson (2023))
In other words, you should pick a paper that sparks your group's interest, access its data and source code, re-code the analysis in your own coding environment, and then present your main findings as regression outputs and figures, as in the original paper.
Before doing so, you should, as a group, read and understand the paper's main objectives, policy implications, and especially its methodology section, so that you have the necessary context to pursue a reproduction.
For more information on different forms of reproducibility, please see The Institute for Replication (I4R)'s main page.
Your final deliverable should be a Reproduction Report following the I4R framework. It should be submitted as a PDF, typed up in LaTeX (ChatGPT can be really helpful for this). Please use the attached LaTeX template. Read the reproduction report template carefully, as it contains all the information you need for a successful reproduction. Knowing LaTeX is a superpower and makes your work look really pretty!
Past reproduction reports have taken 10-15 pages, although we will not be checking the page count very carefully. That being said, please be concise in your work. Writing more information than necessary often comes at the cost of clarity. Your communication and presentation are very important for a reproduction exercise.
In addition, you should submit a well-organized, compressed (`.zip`) folder that contains a Jupyter notebook and all the data used. We should be able to reproduce your work by running the notebook locally on our computers.
Here is the list of deliverables and the grading rubrics. See the reproduction report for more details.
Deliverable | Points |
---|---|
Abstract | 5% |
Introduction | 5% |
Reproducibility | 20% |
Checkpoint 2 (see link) | 10% |
Conclusion | 5% |
Clarity, Style and Presentation | 5% |
Deliverable | Points |
---|---|
Checkpoint 1 (see details below) | 15% |
Data Cleaning & Pre-Processing | 5% |
Modeling (regressions) | 10% |
Visualizations (figures, tables) | 15% |
Clarity, Style and Presentation | 5% |
For each deliverable, we will award points according to the following percentage scale:
Grade | Description |
---|---|
Excellent (above 90%) | Work that is free of all but the most minor errors and demonstrates creativity and/or a very deep understanding of what you are doing. |
Good (80-90%) | Work that is free of fundamental errors and demonstrates a basic understanding of what you're doing. |
Fair (60-80%) | Work with fundamental errors in analysis and/or conveys a lack of understanding of the basics of the work you are attempting to do. |
Lacking (below 60%) | Work that is severely lacking or incomplete. |
There are a total of 58 papers published on Ted Miguel's Dataverse. Some contain reproduction data for RCTs with thousands of participants over several years, spread across multiple Stata (`.do`/`.dta`) files. Others are far less complex in their data structures, yet seek to answer questions that are no less interesting. Take this into consideration when your group chooses which paper to reproduce. Below are some suggestions that course staff think would be great options given our limited time:
Paper | Link |
---|---|
Fisman, Raymond, and Edward Miguel. 2007. "Corruption, Norms and Legal Enforcement: Evidence from Diplomatic Parking Tickets." Journal of Political Economy 115 (6): 1020-1048. | data, PDF |
Miguel, Edward. 2005. "Poverty and Witch Killing." Review of Economic Studies 72 (4): 1153-1172. | data, PDF |
Miguel, Edward, Sebastian M. Saiegh, and Shanker Satyanath. 2011. "Civil War Exposure and Violence." Economics & Politics 23 (1): 59-73. | data, PDF |
Hsieh, Chang-Tai, Edward Miguel, Daniel Ortega, and Francisco Rodriguez. 2011. "The Price of Political Opposition: Evidence from Venezuela's Maisanta." American Economic Journal: Applied Economics 3 (2): 196-214. | data, PDF |
Kramon, Eric, Joan Hamory, Sarah Baird, and Edward Miguel. 2022. "Deepening or Diminishing Ethnic Divides? The Impact of Urban Migration in Kenya." American Journal of Political Science 66 (2): 365-84. | data, PDF |
Baysan, Ceren; Burke, Marshall; González, Felipe; Hsiang, Solomon; Miguel, Edward, 2020, "Non-economic factors in violence: Evidence from organized crime, suicides and climate in Mexico" | data, PDF |
Bauer, Michal, Julie Chytilova, and Edward Miguel. (2020). "Using survey questions to measure preferences: Lessons from an experimental validation in Kenya", forthcoming European Economic Review. | data, PDF |
Burke, Marshall, Lauren Falcao Bergquist, and Edward Miguel, "Sell Low and Buy High: Arbitrage and Local Price Effects in Kenyan Markets," The Quarterly Journal of Economics, v.134 2, May 2019, 785-842. | data, PDF |
Miguel, Edward; Burgess, Robin; Jedwab, Remi; Morjaria, Ameet; Padró i Miquel, Gerard, 2016, "The Value of Democracy: Evidence from Road Building in Kenya" | data, PDF |
Miguel, Edward; Friedman, Willa; Kremer, Michael; Thornton, Rebecca, 2016, "Education as Liberation" | data, PDF |
This checkpoint will be a separate Gradescope assignment with an earlier due date. There is no need to submit any code; a write-up of your progress so far (preferably in LaTeX) will suffice. Everything must be submitted in a single PDF. You must include:
- Group composition (names and assigned responsibilities).
- Chosen paper.
- Screenshots of the project code organization (including any relevant data/notebook/code).
- Screenshots of LaTeX project in Overleaf following the I4R template.
- A description of what you have attempted to do so far.
- A summary of what's going well and what you are struggling with.
- A note on how course staff may help you succeed.
One person from your group must fill out this Google Form for the checkpoint. Note that while you have space to discuss up to 5 results, you are only required to discuss 3.
- A PDF of the Reproduction Report (submitted to the Project 3 written assignment on Gradescope)
- A PDF file of the Jupyter Notebook containing all the analysis (submitted to the Project 3 written assignment on Gradescope)
- A well-organized compressed (`.zip`) file (submitted to the Project 3 coding assignment on Gradescope) containing:
  - A Jupyter Notebook that can easily be run to reproduce all your results
  - All datasets that you downloaded and used in the notebook
  - Figures and plots (if not already included in the report or notebook)
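
One way to assemble the `.zip` file described in the list above is sketched here. This is only an illustration, not a required workflow: the folder layout (`data/`, `figures/`), the file names, and the helper `build_submission` are all hypothetical. The sketch stores every path relative to the project folder, so the notebook can load its data with relative paths after the archive is unpacked on another computer.

```python
import tempfile
import zipfile
from pathlib import Path


def build_submission(project_dir, zip_path):
    """Zip every non-hidden file under project_dir, with paths relative to it.

    Hypothetical helper for illustration; returns the archive's file list.
    """
    project_dir = Path(project_dir)
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(project_dir.rglob("*")):
            rel = path.relative_to(project_dir)
            # Skip folders and hidden entries such as .ipynb_checkpoints.
            if not path.is_file() or any(p.startswith(".") for p in rel.parts):
                continue
            zf.write(path, arcname=rel.as_posix())
    with zipfile.ZipFile(zip_path) as zf:
        return zf.namelist()


# Demonstration on a throwaway folder with placeholder (empty) files.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "project3"
    (project / "data").mkdir(parents=True)
    (project / "figures").mkdir()
    (project / ".ipynb_checkpoints").mkdir()
    (project / "reproduction.ipynb").write_text("{}")
    (project / "data" / "example.dta").write_bytes(b"")
    (project / "figures" / "figure1.png").write_bytes(b"")
    (project / ".ipynb_checkpoints" / "stale.ipynb").write_text("{}")
    contents = build_submission(project, Path(tmp) / "project3.zip")

print(contents)
```

Before submitting, it is worth unpacking the archive into an empty folder and running the notebook from top to bottom there, since that is roughly what a grader will do.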
- Please do not reach out to Professor Ted Miguel or any of his co-authors with questions about the reproduction. Direct all questions to course staff: Peter, Rohan, or Professor Van Dusen.
- For the duration of the project, all discussion sections will be turned into optional OH where your group can ask any clarifying questions. We strongly encourage you (and your group) to attend these!
- Course staff may share, with the consent of the group, the best reproductions with the authors of the paper.
- A lot of the code for the original papers is written in Stata (using `.do`/`.dta` files). Rohan will hold a lecture on translating Stata to Python using a Python package he built (Stata2Python). You may also find Google and ChatGPT helpful for this, but please ensure your code is free of errors and that you understand what it is doing. Do not simply copy and paste!
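
As a rough illustration of what translating Stata to Python can look like, the sketch below mirrors two common Stata commands using pandas and NumPy directly. It does not use the Stata2Python package, whose interface is not described on this page. The helpers `load_dta` and `ols`, the file name `toy.dta`, and the toy data are all made up for the example and do not come from any paper. The regression returns point estimates only; a statistics package would be needed for standard errors.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd


def load_dta(path):
    """Rough equivalent of Stata's `use "file.dta", clear`."""
    return pd.read_stata(path)


def ols(df, outcome, regressors):
    """Rough equivalent of Stata's `regress outcome regressors`.

    Returns point estimates only (no standard errors), keyed by variable
    name, with the intercept stored under "_cons" as in Stata output.
    """
    columns = [np.ones(len(df))]
    columns += [df[name].to_numpy(dtype=float) for name in regressors]
    X = np.column_stack(columns)
    y = df[outcome].to_numpy(dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(["_cons"] + list(regressors), beta))


# Toy data invented for this example: y = 1 + 2x exactly.
toy = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0, 4.0],
                    "y": [1.0, 3.0, 5.0, 7.0, 9.0]})

with tempfile.TemporaryDirectory() as tmp:
    dta_path = Path(tmp) / "toy.dta"
    toy.to_stata(dta_path, write_index=False)  # stand-in for a downloaded .dta file
    df = load_dta(dta_path)                    # Stata: use "toy.dta", clear

coefs = ols(df, "y", ["x"])                    # Stata: regress y x
print(coefs)
```

Writing the Stata command next to each Python line, as in the comments above, makes it easier to check a translation against the original `.do` file line by line.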