
Commit 6dcea09

full_stack/to_json
1 parent 88f62de commit 6dcea09

File tree: 2 files changed, +249 -1 lines changed

docs/source/user_guide_full_stack.rst

Lines changed: 1 addition & 1 deletion

@@ -37,7 +37,7 @@ Full Stack
   .. grid-item::

     .. card:: XGBoost.to_json
-      :link: user_guide.full_stack.geopandas
+      :link: user_guide.full_stack.to_json
       :link-type: ref
       :text-align: center
       :class-card: custom-card-15

Lines changed: 248 additions & 0 deletions

@@ -0,0 +1,248 @@

.. _user_guide.full_stack.to_json:

=========================
Example: XGBoost.to_json
=========================

Connect to Vertica
--------------------

For a demonstration of how to create a new connection to Vertica,
see :ref:`connection`. In this example, we will use an
existing connection named 'VerticaDSN'.

.. code-block:: python

    import verticapy as vp
    vp.connect("VerticaDSN")

Create a Schema (Optional)
---------------------------

Schemas allow you to organize database objects in a collection,
similar to a namespace. If you create a database object
without specifying a schema, Vertica uses the 'public'
schema. For example, to specify the 'example_table' in 'example_schema',
you would use: 'example_schema.example_table'.

To keep things organized, this example creates the 'xgb_to_json'
schema and drops it (and its associated tables, views, etc.) at the end:

.. ipython:: python
    :suppress:

    import verticapy as vp

.. ipython:: python

    vp.drop("xgb_to_json", method = "schema")
    vp.create_schema("xgb_to_json")

Load Data
----------

VerticaPy lets you load many well-known datasets like Iris, Titanic, Amazon, etc.
For a full list, check out :ref:`datasets`.

.. ipython:: python

    from verticapy.datasets import load_titanic
    vdf = load_titanic(
        name = "titanic",
        schema = "xgb_to_json",
    )

You can also load your own data. To ingest data from a CSV file,
use the :py:func:`verticapy.read_csv` function.
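
For example, a CSV file could be ingested into the 'xgb_to_json' schema as
sketched below; the file name and table name are placeholders, not part of
this example's dataset:

.. code-block:: python

    import verticapy as vp

    # Hypothetical file: load 'my_data.csv' into the 'xgb_to_json.my_data'
    # table. Column types are inferred from the file contents.
    vdf_csv = vp.read_csv(
        "my_data.csv",
        schema = "xgb_to_json",
        table_name = "my_data",
    )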

Create a vDataFrame
--------------------

vDataFrames allow you to prepare and explore your data without modifying its representation in your Vertica database. Any changes you make are applied to the vDataFrame as modifications to the SQL query for the table underneath.

To create a vDataFrame out of a table in your Vertica database, specify its schema and table name with the standard SQL syntax. For example, to create a vDataFrame out of the 'titanic' table in the 'xgb_to_json' schema:

.. ipython:: python

    vdf = vp.vDataFrame("xgb_to_json.titanic")

Create an XGB model
-------------------

Create a :py:func:`verticapy.machine_learning.vertica.ensemble.XGBClassifier` model.

Unlike a vDataFrame object, which simply queries the table it
was created with, the VerticaPy :py:func:`verticapy.machine_learning.vertica.ensemble.XGBClassifier` object creates
and then references a model in Vertica, so it must be stored in a
schema like any other database object.

This example creates the 'my_model' :py:func:`verticapy.machine_learning.vertica.ensemble.XGBClassifier` model in
the 'xgb_to_json' schema:

.. ipython:: python

    from verticapy.machine_learning.vertica.ensemble import XGBClassifier
    model = XGBClassifier(
        "xgb_to_json.my_model",
        max_ntree = 4,
        max_depth = 3,
    )

Prepare the Data
-----------------

While Vertica XGBoost supports columns of type VARCHAR,
Python XGBoost does not, so you must encode the categorical
columns you want to use. You must also drop or impute missing values.

This example keeps only the 'age', 'fare', 'sex', 'embarked', and
'survived' columns of the vDataFrame, drops rows with missing values,
and then encodes the 'sex' and 'embarked' columns. These changes are
applied to the vDataFrame's query and do not affect the underlying
'xgb_to_json.titanic' table stored in Vertica:

.. ipython:: python

    vdf = vdf[["age", "fare", "sex", "embarked", "survived"]];
    vdf.dropna();
    vdf["sex"].label_encode();
    vdf["embarked"].label_encode();

.. ipython:: python
    :suppress:
    :okwarning:

    res = vdf
    html_file = open("/project/data/VerticaPy/docs/figures/ug_fs_to_json_vdf.html", "w")
    html_file.write(res._repr_html_())
    html_file.close()

.. raw:: html
    :file: /project/data/VerticaPy/docs/figures/ug_fs_to_json_vdf.html

Split your data into training and testing sets:

.. ipython:: python

    train, test = vdf.train_test_split(0.05);

Train the Model
----------------

Define the predictor and the response columns:

.. ipython:: python

    relation = train;
    X = ["age", "fare", "sex", "embarked"]
    y = "survived"

Train the model with ``fit()``:

.. ipython:: python
    :okwarning:

    model.fit(relation, X, y)

Evaluate the Model
--------------------

Evaluate the model with ``.report()``:

.. code-block:: ipython

    model.report()

.. ipython:: python
    :suppress:
    :okwarning:

    res = model.report()
    html_file = open("/project/data/VerticaPy/docs/figures/ug_fs_to_json_report.html", "w")
    html_file.write(res._repr_html_())
    html_file.close()

.. raw:: html
    :file: /project/data/VerticaPy/docs/figures/ug_fs_to_json_report.html

Use ``to_json()`` to export the model to a JSON file. If you omit a filename, VerticaPy prints the model:

.. ipython:: python

    model.to_json()

To export and save the model as a JSON file, specify a filename:

.. ipython:: python

    model.to_json("exported_xgb_model.json");

Unlike Python XGBoost, Vertica does not store some information, such as
'sum_hessian' or 'loss_changes', and the model exported by ``to_json()``
replaces this information with lists filled with zeros.
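
If you want to see this for yourself, one option is to open the exported
file with Python's ``json`` module and inspect a tree's fields. The nesting
shown below follows the XGBoost JSON model schema and is only a sketch; the
exact layout can vary between XGBoost versions:

.. code-block:: python

    import json

    # Load the model exported above and look at its first tree.
    with open("exported_xgb_model.json") as f:
        exported = json.load(f)

    first_tree = exported["learner"]["gradient_booster"]["model"]["trees"][0]

    # In a Vertica-exported model, these lists are filled with zeros.
    print(first_tree["sum_hessian"])
    print(first_tree["loss_changes"])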

Make Predictions with an Exported Model
----------------------------------------

This exported model can be used with the Python XGBoost API right away,
and exported models make identical predictions in Vertica and Python:

.. ipython:: python

    import pytest
    import xgboost as xgb
    model_python = xgb.XGBClassifier();
    model_python.load_model("exported_xgb_model.json");
    # Convert the test set to numpy format
    X_test = test[["age", "fare", "sex", "embarked"]].to_numpy();
    y_test_vertica = model.to_python(return_proba = True)(X_test);
    y_test_python = model_python.predict_proba(X_test);
    result = (y_test_vertica - y_test_python) ** 2;
    result = result.sum() / len(result);
    assert result == pytest.approx(0.0, abs = 1.0E-14)

For multiclass classifiers, the probabilities returned by the VerticaPy model and by the exported model may differ slightly because of normalization: Vertica uses multinomial logistic regression, while Python XGBoost uses softmax. Again, this difference does not affect the model's final predictions. As with binary classifiers, categorical predictors must be encoded before training.
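
As a minimal sketch of that point (the probability matrices below are
hypothetical and not produced by this example), the predicted classes can be
compared by taking the argmax of each row:

.. code-block:: python

    import numpy as np

    # Hypothetical (n_samples, n_classes) probability matrices: one from the
    # Vertica model, one from the exported Python XGBoost model.
    proba_vertica = np.array([[0.62, 0.25, 0.13], [0.10, 0.55, 0.35]])
    proba_python = np.array([[0.60, 0.27, 0.13], [0.11, 0.56, 0.33]])

    # The probabilities differ slightly, but the predicted class for each row
    # (the argmax) is the same.
    assert (proba_vertica.argmax(axis=1) == proba_python.argmax(axis=1)).all()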

Clean the Example Environment
------------------------------

Drop the 'xgb_to_json' schema, using CASCADE to drop any
database objects stored inside (the 'titanic' table, the
:py:func:`verticapy.machine_learning.vertica.ensemble.XGBClassifier`
model, etc.), then delete the 'exported_xgb_model.json' file:

.. ipython:: python

    import os
    os.remove("exported_xgb_model.json")
    vp.drop("xgb_to_json", method = "schema")

Conclusion
-----------

VerticaPy lets you create, train, evaluate, and export
Vertica machine learning models. There are some notable
nuances when importing a Vertica XGBoost model into
Python XGBoost, but these do not affect the accuracy of the model or its predictions:

- Some information computed during the training phase may not
  be stored (e.g. 'sum_hessian' and 'loss_changes').
- The exact probabilities of multiclass classifiers in a
  Vertica model may differ from those in Python, but both
  will make the same predictions.
- Python XGBoost does not support categorical predictors,
  so you must encode them before training the model in VerticaPy.
0 commit comments

Comments
 (0)