Commit e37c56d

feat (lab): food intake advanced analysis
1 parent c318a07 commit e37c56d

1 file changed

+317
-0
lines changed

@@ -0,0 +1,317 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {},
6+
"source": [
7+
"# Analysis with Advanced SQL\n",
8+
"## U.S. EPA Food Commodity Intake Database (FCID)\n",
9+
"### [https://fcid.foodrisk.org/](https://fcid.foodrisk.org/)"
10+
]
11+
},
12+
{
13+
"cell_type": "markdown",
14+
"metadata": {},
15+
"source": [
16+
"Opening an in-memory database connection using the H2 DBMS:"
17+
]
18+
},
19+
{
20+
"cell_type": "code",
21+
"execution_count": 1,
22+
"metadata": {},
23+
"outputs": [],
24+
"source": [
25+
"%defaultDatasource jdbc:h2:mem:db"
26+
]
27+
},
28+
{
29+
"cell_type": "markdown",
30+
"metadata": {},
31+
"source": [
32+
"# Importing the FCID Tables"
33+
]
34+
},
35+
{
36+
"cell_type": "code",
37+
"execution_count": 2,
38+
"metadata": {},
39+
"outputs": [],
40+
"source": [
41+
"DROP TABLE IF EXISTS Crop_Group;\n",
42+
"DROP TABLE IF EXISTS FCID_Description;\n",
43+
"DROP TABLE IF EXISTS Recipes;\n",
44+
"DROP TABLE IF EXISTS Intake;\n",
45+
"\n",
46+
"CREATE TABLE Crop_Group (\n",
47+
" CGN VARCHAR(2),\n",
48+
" CGL VARCHAR(6),\n",
49+
" Crop_Group_Description VARCHAR(80),\n",
50+
" PRIMARY KEY (CGL)\n",
51+
") AS SELECT\n",
52+
" CGN, CGL, Crop_Group_Description\n",
53+
"FROM CSVREAD('../../data/food-intake/basics/FCID_Cropgroup_Description.csv');\n",
54+
"\n",
55+
"CREATE TABLE FCID_Description (\n",
56+
" CGN VARCHAR(2),\n",
57+
" CG_Subgroup VARCHAR(6),\n",
58+
" FCID_Code VARCHAR(10),\n",
59+
" FCID_Desc VARCHAR(55),\n",
60+
" PRIMARY KEY (FCID_Code)\n",
61+
") AS SELECT\n",
62+
" cgn, CG_Subgroup, FCID_Code, FCID_Desc\n",
63+
"FROM CSVREAD('../../data/food-intake/basics/FCID_Code_Description.csv');\n",
64+
"\n",
65+
"CREATE TABLE Recipes (\n",
66+
" Food_Code VARCHAR(8),\n",
67+
" Mod_Code VARCHAR(8),\n",
68+
" Ingredient_Num TINYINT,\n",
69+
" FCID_Code VARCHAR(10),\n",
70+
" Cooked_Status TINYINT,\n",
71+
" Food_Form TINYINT,\n",
72+
" Cooking_Method TINYINT,\n",
73+
" Commodity_Weight DECIMAL(5, 2),\n",
74+
" CSFII_9498_IND TINYINT,\n",
75+
" WWEIA_9904_IND TINYINT,\n",
76+
" WWEIA_0510_IND TINYINT,\n",
77+
" PRIMARY KEY(Food_Code, Mod_Code, Ingredient_Num),\n",
78+
" FOREIGN KEY(FCID_Code)\n",
79+
" REFERENCES FCID_Description(FCID_Code)\n",
80+
" ON DELETE NO ACTION\n",
81+
" ON UPDATE NO ACTION\n",
82+
") AS SELECT\n",
83+
" Food_Code, Mod_Code, Ingredient_Num, FCID_Code, Cooked_Status, Food_Form, Cooking_Method,\n",
84+
" Commodity_Weight, CSFII_9498_IND, WWEIA_9904_IND, WWEIA_0510_IND\n",
85+
"FROM CSVREAD('../../data/food-intake/recipes/Recipes_WWEIA_FCID_0510.csv');\n",
86+
"\n",
87+
"CREATE TABLE Intake (\n",
88+
" SeqN INTEGER NOT NULL,\n",
89+
" DayCode TINYINT NOT NULL,\n",
90+
" DraBF TINYINT,\n",
91+
" FCID_Code VARCHAR(10),\n",
92+
" Cooked_Status TINYINT,\n",
93+
" Food_Form TINYINT,\n",
94+
" Cooking_Method TINYINT,\n",
95+
" Intake DECIMAL(13,7),\n",
96+
" Intake_BW DECIMAL(13,10),\n",
97+
" PRIMARY KEY(SeqN, DayCode, FCID_Code, Cooked_Status, Food_Form, Cooking_Method),\n",
98+
" FOREIGN KEY(FCID_Code)\n",
99+
" REFERENCES FCID_Description(FCID_Code)\n",
100+
" ON DELETE NO ACTION\n",
101+
" ON UPDATE NO ACTION\n",
102+
") AS SELECT\n",
103+
" SEQN, DAYCODE, DRABF, FCID_Code, Cooked_Status, Food_Form, Cooking_Method, Intake, Intake_BW\n",
104+
"FROM CSVREAD('../../data/food-intake/consumption/Commodity_CSFFM_Intake_0510-cropped.csv');"
105+
]
106+
},
107+
{
108+
"cell_type": "markdown",
109+
"metadata": {},
110+
"source": [
111+
"# Viewing the Tables"
112+
]
113+
},
114+
{
115+
"cell_type": "code",
116+
"execution_count": 3,
117+
"metadata": {},
118+
"outputs": [
119+
{
120+
"data": {
121+
"application/vnd.jupyter.widget-view+json": {
122+
"model_id": "b89a3f80-02b3-4acb-bb7c-5d3d4f855e42",
123+
"version_major": 2,
124+
"version_minor": 0
125+
},
126+
"method": "display_data"
127+
},
128+
"metadata": {},
129+
"output_type": "display_data"
130+
}
131+
],
132+
"source": [
133+
"SELECT * FROM Crop_Group LIMIT 10;"
134+
]
135+
},
136+
{
137+
"cell_type": "code",
138+
"execution_count": 4,
139+
"metadata": {},
140+
"outputs": [
141+
{
142+
"data": {
143+
"application/vnd.jupyter.widget-view+json": {
144+
"model_id": "9b197073-9158-4939-8e60-adfcfb546c1e",
145+
"version_major": 2,
146+
"version_minor": 0
147+
},
148+
"method": "display_data"
149+
},
150+
"metadata": {},
151+
"output_type": "display_data"
152+
}
153+
],
154+
"source": [
155+
"SELECT * FROM FCID_Description LIMIT 10;"
156+
]
157+
},
158+
{
159+
"cell_type": "code",
160+
"execution_count": 5,
161+
"metadata": {},
162+
"outputs": [
163+
{
164+
"data": {
165+
"application/vnd.jupyter.widget-view+json": {
166+
"model_id": "10c6feb9-2454-4656-bba0-ece47f008442",
167+
"version_major": 2,
168+
"version_minor": 0
169+
},
170+
"method": "display_data"
171+
},
172+
"metadata": {},
173+
"output_type": "display_data"
174+
}
175+
],
176+
"source": [
177+
"SELECT * FROM Recipes LIMIT 10;"
178+
]
179+
},
180+
{
181+
"cell_type": "code",
182+
"execution_count": 6,
183+
"metadata": {
184+
"scrolled": true
185+
},
186+
"outputs": [
187+
{
188+
"data": {
189+
"application/vnd.jupyter.widget-view+json": {
190+
"model_id": "7840e179-1311-409f-9ecf-6689a574ee1d",
191+
"version_major": 2,
192+
"version_minor": 0
193+
},
194+
"method": "display_data"
195+
},
196+
"metadata": {},
197+
"output_type": "display_data"
198+
}
199+
],
200+
"source": [
201+
"SELECT * FROM Intake LIMIT 10;"
202+
]
203+
},
204+
{
205+
"cell_type": "markdown",
206+
"metadata": {},
207+
"source": [
208+
"# Metrics\n",
209+
"\n",
210+
"Assume that the Intake table records the foods consumed by 1,489 people. Consider the following metrics for a food:\n",
211+
"\n",
212+
"| Metric | Description |\n",
213+
"| --- | --- |\n",
214+
"| Popularity | number of people (out of the 1,489) who consumed the food |\n",
215+
"| Intake_Sum | total amount of the food consumed by the 1,489 people, in grams |\n",
216+
"| Intake_AVG | average intake of the food, in grams |\n",
217+
"| Intake_AVG_BW | average intake of the food relative to the person's body weight (from the `Intake_BW` column) |\n",
218+
"| Recipes | number of recipes (out of the 7,154 recipes) that use the product as an ingredient |"
219+
]
220+
},
221+
{
222+
"cell_type": "markdown",
223+
"metadata": {},
224+
"source": [
225+
"## 1) Build a view that reports these metrics for each product\n",
226+
"\n",
227+
"* See an example in: `/data/food-intake/computed/commodity-profile.csv`\n",
228+
"* Important: that table was computed from a larger number of records, so its values will not match yours"
229+
]
230+
},
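{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of one possible view, not a reference solution. The view name `Commodity_Profile` is hypothetical, and two choices are assumptions: a recipe is counted as a distinct `Food_Code`, and the averages are taken over `Intake` rows rather than per person. Each table is aggregated in its own subquery so that joining `Intake` to `Recipes` cannot multiply rows before they are counted.\n",
"\n",
"```sql\n",
"DROP VIEW IF EXISTS Commodity_Profile;\n",
"\n",
"CREATE VIEW Commodity_Profile AS\n",
"SELECT D.FCID_Code,\n",
"       D.FCID_Desc,\n",
"       I.Popularity,\n",
"       I.Intake_Sum,\n",
"       I.Intake_AVG,\n",
"       I.Intake_AVG_BW,\n",
"       COALESCE(R.Recipes, 0) AS Recipes  -- commodities used in no recipe get 0\n",
"FROM FCID_Description D\n",
"JOIN (\n",
"    -- consumption metrics, one row per commodity\n",
"    SELECT FCID_Code,\n",
"           COUNT(DISTINCT SeqN) AS Popularity,\n",
"           SUM(Intake)          AS Intake_Sum,\n",
"           AVG(Intake)          AS Intake_AVG,\n",
"           AVG(Intake_BW)       AS Intake_AVG_BW\n",
"    FROM Intake\n",
"    GROUP BY FCID_Code\n",
") I ON I.FCID_Code = D.FCID_Code\n",
"LEFT JOIN (\n",
"    -- assumption: one recipe = one distinct Food_Code\n",
"    SELECT FCID_Code,\n",
"           COUNT(DISTINCT Food_Code) AS Recipes\n",
"    FROM Recipes\n",
"    GROUP BY FCID_Code\n",
") R ON R.FCID_Code = D.FCID_Code;\n",
"```\n",
"\n",
"Querying the view, for example with `SELECT * FROM Commodity_Profile ORDER BY Popularity DESC LIMIT 10;`, lists the most widely consumed commodities first."
]
},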
231+
{
232+
"cell_type": "code",
233+
"execution_count": null,
234+
"metadata": {},
235+
"outputs": [],
236+
"source": []
237+
},
238+
{
239+
"cell_type": "markdown",
240+
"metadata": {},
241+
"source": [
242+
"## 2) How would you analyze the correlation between the metrics?\n",
243+
"\n",
244+
"* For example, are more popular products consumed more (in number of people or in quantity)?\n",
245+
"* Propose one or more queries to carry out this analysis"
246+
]
247+
},
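{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to approach this, sketched under assumptions: compute a Pearson correlation coefficient between two of the metrics from plain aggregates. This example pairs popularity (distinct people per commodity) with total intake; the aliases `X`, `Y`, `P`, and `S` are illustrative. The formula is written out by hand so that it does not depend on a built-in correlation aggregate being available in the H2 version in use.\n",
"\n",
"```sql\n",
"-- Pearson r = (N*Sxy - Sx*Sy) / sqrt((N*Sxx - Sx^2) * (N*Syy - Sy^2))\n",
"SELECT (N * Sxy - Sx * Sy) /\n",
"       SQRT((N * Sxx - Sx * Sx) * (N * Syy - Sy * Sy)) AS Pearson_r\n",
"FROM (\n",
"    -- sums needed by the formula, over all commodities\n",
"    SELECT COUNT(*)   AS N,\n",
"           SUM(X)     AS Sx,\n",
"           SUM(Y)     AS Sy,\n",
"           SUM(X * X) AS Sxx,\n",
"           SUM(Y * Y) AS Syy,\n",
"           SUM(X * Y) AS Sxy\n",
"    FROM (\n",
"        -- one (X, Y) pair per commodity: popularity and total intake\n",
"        SELECT CAST(COUNT(DISTINCT SeqN) AS DOUBLE) AS X,\n",
"               CAST(SUM(Intake) AS DOUBLE)          AS Y\n",
"        FROM Intake\n",
"        GROUP BY FCID_Code\n",
"    ) P\n",
") S;\n",
"```\n",
"\n",
"A value close to 1 or -1 suggests a strong linear relationship and a value near 0 a weak one. The division fails if either metric has zero variance, so the query assumes the data contains more than one distinct value for each metric."
]
},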
248+
{
249+
"cell_type": "code",
250+
"execution_count": null,
251+
"metadata": {},
252+
"outputs": [],
253+
"source": []
254+
},
255+
{
256+
"cell_type": "markdown",
257+
"metadata": {},
258+
"source": [
259+
"## 3) Can we create groups of consumers according to a profile?\n",
260+
"* For example, can consumers be grouped by the foods they predominantly eat?\n",
261+
"* How would you associate groups with classes?"
262+
]
263+
},
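{
"cell_type": "markdown",
"metadata": {},
"source": [
"A possible starting point, offered as a sketch: label each consumer with the crop group (`CGN`) that accounts for the largest share of their total intake, then count consumers per label. The view names `Consumer_Group_Intake` and `Consumer_Dominant_Group` are hypothetical, and using the crop group as the class label is an assumption; a finer or coarser grouping may suit the analysis better.\n",
"\n",
"```sql\n",
"DROP VIEW IF EXISTS Consumer_Dominant_Group;\n",
"DROP VIEW IF EXISTS Consumer_Group_Intake;\n",
"\n",
"-- total intake of each consumer in each crop group\n",
"CREATE VIEW Consumer_Group_Intake AS\n",
"SELECT I.SeqN, D.CGN, SUM(I.Intake) AS Total_Intake\n",
"FROM Intake I\n",
"JOIN FCID_Description D ON D.FCID_Code = I.FCID_Code\n",
"GROUP BY I.SeqN, D.CGN;\n",
"\n",
"-- crop group with the highest total intake for each consumer\n",
"CREATE VIEW Consumer_Dominant_Group AS\n",
"SELECT G.SeqN, G.CGN, G.Total_Intake\n",
"FROM Consumer_Group_Intake G\n",
"JOIN (\n",
"    SELECT SeqN, MAX(Total_Intake) AS Max_Intake\n",
"    FROM Consumer_Group_Intake\n",
"    GROUP BY SeqN\n",
") M ON M.SeqN = G.SeqN AND M.Max_Intake = G.Total_Intake;\n",
"\n",
"-- size of each resulting group of consumers\n",
"SELECT CGN, COUNT(*) AS Consumers\n",
"FROM Consumer_Dominant_Group\n",
"GROUP BY CGN\n",
"ORDER BY Consumers DESC;\n",
"```\n",
"\n",
"A consumer whose top two crop groups tie exactly appears once per tied group, so the counts can slightly exceed the number of people."
]
},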
264+
{
265+
"cell_type": "code",
266+
"execution_count": null,
267+
"metadata": {},
268+
"outputs": [],
269+
"source": []
270+
},
271+
{
272+
"cell_type": "markdown",
273+
"metadata": {},
274+
"source": [
275+
"## 4) Which metrics can be analyzed to compare profiles?\n",
276+
"* Write a SQL query that computes at least one comparative metric"
277+
]
278+
},
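{
"cell_type": "markdown",
"metadata": {},
"source": [
"As one illustration, assuming a profile is simply the set of commodities a person consumed, two profiles can be compared with the Jaccard similarity: the number of shared commodities divided by the number of commodities in either profile. The view name `Consumer_Commodity` and the aliases are hypothetical, and other metrics (for example, ones that weigh the quantities consumed) are equally valid choices.\n",
"\n",
"```sql\n",
"DROP VIEW IF EXISTS Consumer_Commodity;\n",
"\n",
"-- one row per (person, commodity) actually consumed\n",
"CREATE VIEW Consumer_Commodity AS\n",
"SELECT DISTINCT SeqN, FCID_Code FROM Intake;\n",
"\n",
"-- Jaccard = shared / (size_A + size_B - shared)\n",
"SELECT S.Person_A, S.Person_B,\n",
"       CAST(S.Shared AS DOUBLE) / (CA.N + CB.N - S.Shared) AS Jaccard\n",
"FROM (\n",
"    -- commodities shared by each pair; A.SeqN < B.SeqN keeps each pair once\n",
"    SELECT A.SeqN AS Person_A, B.SeqN AS Person_B, COUNT(*) AS Shared\n",
"    FROM Consumer_Commodity A\n",
"    JOIN Consumer_Commodity B\n",
"      ON A.FCID_Code = B.FCID_Code AND A.SeqN < B.SeqN\n",
"    GROUP BY A.SeqN, B.SeqN\n",
") S\n",
"JOIN (SELECT SeqN, COUNT(*) AS N FROM Consumer_Commodity GROUP BY SeqN) CA\n",
"  ON CA.SeqN = S.Person_A\n",
"JOIN (SELECT SeqN, COUNT(*) AS N FROM Consumer_Commodity GROUP BY SeqN) CB\n",
"  ON CB.SeqN = S.Person_B\n",
"ORDER BY Jaccard DESC\n",
"LIMIT 10;\n",
"```\n",
"\n",
"Pairs with no commodity in common never appear in the self-join, which is equivalent to a similarity of 0. The self-join compares every pair of people, so it can be slow on the full table; restricting it to a subset of `SeqN` values is a reasonable first test."
]
},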
279+
{
280+
"cell_type": "code",
281+
"execution_count": null,
282+
"metadata": {},
283+
"outputs": [],
284+
"source": []
285+
}
286+
],
287+
"metadata": {
288+
"kernelspec": {
289+
"display_name": "SQL",
290+
"language": "SQL",
291+
"name": "sql"
292+
},
293+
"language_info": {
294+
"codemirror_mode": "sql",
295+
"file_extension": ".sql",
296+
"mimetype": "",
297+
"name": "SQL",
298+
"nbconverter_exporter": "",
299+
"version": ""
300+
},
301+
"toc": {
302+
"base_numbering": 1,
303+
"nav_menu": {},
304+
"number_sections": false,
305+
"sideBar": false,
306+
"skip_h1_title": false,
307+
"title_cell": "Table of Contents",
308+
"title_sidebar": "Contents",
309+
"toc_cell": false,
310+
"toc_position": {},
311+
"toc_section_display": false,
312+
"toc_window_display": false
313+
}
314+
},
315+
"nbformat": 4,
316+
"nbformat_minor": 2
317+
}
