some custom presto functions
Java 8 Update 60 or higher (8u60+)
Maven 3.3.9+ (for building)
cd ${project_home}
mvn clean package
If you want skip unit tests, please run:
mvn clean package -DskipTests
It will generate presto-third-functions-{version}-shaded.jar in target directory.
You can also directly download file from release page .
version
description
0.2.0
support presto-0.147
~presto-0.149
0.3.0
support presto-0.150
~presto-0.151
0.4.0
support presto-0.152
0.5.0
support presto-0.153
~presto-0.166
0.5.1
support presto-0.167
~presto-0.168
function
description
pinyin(string) -> string
convert chinese to pinyin
md5(string) -> string
md5 hash
sha256(string) -> string
sha256 hash
function
description
array_union(array, array) -> array
return union result of two array.
value_count(array(T), T value) -> int
count ARRAY's element number that element value equals given value.
I had already proposed a pull request about array_union
. Currently, it be merged to presto master branch. So, If your presto version > 0.151, it already include array_union function.
To support presto-0.150+
, from 0.3.0
, it had rename to arr_union
. (from 0.5.0
, I had delete arr_union
function, please use array_union
.)
function
description
dayofweek(date_string | date) -> int
day of week,if monday,return 1, sunday return 7, error return -1.
zodiac(date_string | date) -> string
convert date to zodiac
zodiac_cn(date_string | date) -> string
convert date to zodiac chinese
function
description
json_array_extract(json, jsonPath) -> array(varchar)
extract json array by given jsonPath.
json_array_extract_scalar(json, jsonPath) -> array(varchar)
like json_array_extract
, but returns the result value as a string (as opposed to being encoded as JSON).
function
description
value_count(MAP(K,V), V value) -> int
count MAP's element number that element value equals given value.
function
description
wgs_distance(double lat1, double lng1, double lat2, double lng2) -> double
calculate WGS84 coordinate distance, in meters.
gcj_to_bd(double,double) -> json
GCJ-02(火星坐标系) convert to BD-09(百度坐标系), 谷歌、高德——>百度
bd_to_gcj(double,double) -> json
BD-09(百度坐标系) convert to GCJ-02(火星坐标系), 百度——>谷歌、高德
wgs_to_gcj(double,double) -> json
WGS84(地球坐标系) convert to GCJ02(火星坐标系)
gcj_to_wgs(double,double) -> json
GCJ02(火星坐标系) convert to GPS84(地球坐标系), output coordinate WGS-84 accuracy within 1 to 2 meters.
gcj_extract_wgs(double,double) -> json
GCJ02(火星坐标系) convert to GPS84, output coordinate WGS-84 accuracy within 0.5 meters. but compute cost more time than gcj_to_wgs
.
关于互联网地图坐标系的说明见: 当前互联网地图的坐标系现状
7. china id card functions
function
description
id_card_province(string) -> string
get user's province
id_card_city(string) -> string
get user's city
id_card_area(string) -> string
get user's area
id_card_birthday(string) -> string
get user's birthday
id_card_gender(string) -> string
get user's gender
is_valid_id_card(string) -> boolean
determine is valid china id card No.
id_card_info(string) -> json
get china id card info. include province, city, area etc.
function
description
is_null(all_type) -> boolean
whether is null or not
put presto-third-functions-{version}-shaded.jar into ${presto_home}/plugin/hive-hadoop2
and restart presto server. for example:
mv presto-third-functions-{version}-shaded.jar /home/presto/presto-server-0.147/plugin/hive-hadoop2/
cd /home/presto/presto-server-0.147
bin/launcher restart
cd /home/presto/
ln -s presto-client-0.147/presto-cli-0.147-executable.jar presto-cli
export JAVA_HOME=/home/java8/jdk1.8.0_91/;
export PATH=/home/java8/jdk1.8.0_91/bin/:$PATH;
alias presto="/home/presto/presto-cli --server localhost:8080 --catalog hive --schema default"
presto:default> select pinyin(country) from (values '中国') as t(country);
_col0
----------
zhongguo
(1 row)
Query 20160707_073649_00006_iya2r, FINISHED, 1 node
Splits: 1 total, 0 done (0.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]
presto:default> select md5(col1), sha256(col1) from (values 'aaronshan') as t(col1)\G;
-[ RECORD 1 ]-----------------------------------------------------------
_col0 | 95686bc0483262afe170b550dd4544d1
_col1 | d16bb375433ad383169f911afdf45e209eabfcf047ba1faebdd8f6a0b39e0a32
Query 20160712_071936_00006_hkbes, FINISHED, 1 node
Splits: 1 total, 0 done (0.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]
presto:default> select array_union(arr1, arr2) from (values (ARRAY [1,3,5,null], ARRAY [2,3,4,null])) as t(arr1, arr2);
_col0
-----------------------
[1, 3, 5, null, 2, 4]
(1 row)
Query 20160713_061707_00004_82kmt, FINISHED, 1 node
Splits: 1 total, 0 done (0.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]
presto:default> select value_count(arr1, 'a') from (values (ARRAY['a', 'b', 'a'])) t(arr1);
_col0
-------
2
(1 row)
Query 20160721_111719_00008_xgf26, FINISHED, 1 node
Splits: 1 total, 0 done (0.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]
presto
presto:default> select dayofweek(my_day) from (values '2016-07-07') as t(my_day);
_col0
-------
4
(1 row)
Query 20160707_073523_00005_iya2r, FINISHED, 1 node
Splits: 1 total, 0 done (0.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]
presto:default> select json_array_extract(arr1, '$.book.id') from (values ('[{"book":{"id":"12"}}, {"book":{"id":"14"}}]')) t(arr1);
_col0
--------------
["12", "14"]
(1 row)
Query 20160721_105423_00006_xgf26, FINISHED, 1 node
Splits: 1 total, 0 done (0.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]
presto:default> select json_array_extract_scalar(arr1, '$.book.id') from (values ('[{"book":{"id":"12"}}, {"book":{"id":"14"}}]')) t(arr1);
_col0
----------
[12, 14]
(1 row)
Query 20160721_105426_00007_xgf26, FINISHED, 1 node
Splits: 1 total, 0 done (0.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]
presto:default> select map1, value_count(map1, 'a') from (values (map(ARRAY[1,2,3], ARRAY['a', 'b', 'a']))) t(map1);
map1 | _col1
-----------------+-------
{1=a, 2=b, 3=a} | 2
(1 row)
Query 20160721_111906_00011_xgf26, FINISHED, 1 node
Splits: 1 total, 0 done (0.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]
3.6 china id card functions
presto:default> select id_card_info(card) from (values '110101198901084517') as t(card);
_col0
----------------------------------------------------------------------------------
{"area":"东城区","valid":true,"province":"北京市","gender":"男","city":"北京市"}
(1 row)
Query 20160712_071700_00004_hkbes, FINISHED, 1 node
Splits: 1 total, 0 done (0.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]
presto:default> select gcj_to_bd(lat,lng), bd_to_gcj(lat,lng), wgs_to_gcj(lat,lng), gcj_to_wgs(lat,lng), gcj_extract_wgs(lat,lng) from (values (39.915, 116.404)) as t(lat, lng)\G;
-[ RECORD 1 ]----------------------------------------------
_col0 | {"lng":116.41036949371029,"lat":39.92133699351022}
_col1 | {"lng":116.39762729119315,"lat":39.90865673957631}
_col2 | {"lng":116.41024449916938,"lat":39.91640428150164}
_col3 | {"lng":116.39775550083061,"lat":39.91359571849836}
_col4 | {"lng":116.39775549316407,"lat":39.913596801757805}
Query 20160712_024714_00003_9rund, FINISHED, 1 node
Splits: 1 total, 0 done (0.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]
presto:default> select is_null(col0),is_null(col1),is_null(col2),is_null(col3) from (values ('test', 1, 0.5, ARRAY [1]),(null, null, null, null)) as t(col0, col1, col2,col3);
_col0 | _col1 | _col2 | _col3
-------+-------+-------+-------
false | false | false | false
true | true | true | true
(2 rows)
Query 20160713_061435_00003_82kmt, FINISHED, 1 node
Splits: 1 total, 0 done (0.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]