forked from qubole/qds-sdk-py
Qubole Data Service Python SDK
==============================
A Python module that provides the tools you need to authenticate with and use the Qubole Data Service (QDS) API.
Installation
------------
Run the following command (you may need to run it as root):
$ python setup.py install
This should place a command-line utility, 'qds.py', somewhere in your path:
$ which qds.py
/usr/bin/qds.py
CLI
---
qds.py runs Hive, Hadoop, Pig and Shell commands against QDS. Users can run a command synchronously, or submit a command and check its status.
$ qds.py -h # will print detailed usage
Examples:
1. Run a Hive query and print the results:
$ qds.py --token 'xxyyzz' hivecmd run --query "show tables"
$ qds.py --token 'xxyyzz' hivecmd run --script_location /tmp/myquery
$ qds.py --token 'xxyyzz' hivecmd run --script_location s3://my-qubole-location/myquery
2. Pass in the API token from a bash environment variable (the commands below then omit --token):
$ export QDS_API_TOKEN=xxyyzz
3. Run the example Hadoop command:
$ qds.py hadoopcmd run streaming -files 's3n://paid-qubole/HadoopAPIExamples/WordCountPython/mapper.py,s3n://paid-qubole/HadoopAPIExamples/WordCountPython/reducer.py' -mapper mapper.py -reducer reducer.py -numReduceTasks 1 -input 's3n://paid-qubole/default-datasets/gutenberg' -output 's3n://example.bucket.com/wcout'
4. Check the status of command 12345678:
$ qds.py hivecmd check 12345678
{"status": "done", ... }
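When the CLI is driven from a Python script, the examples above might be wrapped as in the following sketch. The helper names `build_hivecmd_run` and `parse_status` are hypothetical and not part of the SDK. The sketch uses only the flags shown above, assumes that exactly one of `--query` and `--script_location` is given, and assumes that `check` prints a JSON object with a "status" field as in example 4.

```python
import json


def build_hivecmd_run(query=None, script_location=None, token=None):
    """Build the argument list for `qds.py hivecmd run` (hypothetical helper)."""
    # Assumption: a command takes either an inline query or a script location.
    if (query is None) == (script_location is None):
        raise ValueError("pass exactly one of query or script_location")
    argv = ["qds.py"]
    if token is not None:
        # Leave token as None when QDS_API_TOKEN is exported in the environment.
        argv += ["--token", token]
    argv += ["hivecmd", "run"]
    if query is not None:
        argv += ["--query", query]
    else:
        argv += ["--script_location", script_location]
    return argv


def parse_status(check_output):
    """Return the "status" field of the JSON text printed by a check command."""
    return json.loads(check_output)["status"]


argv = build_hivecmd_run(query="show tables", token="xxyyzz")
print(" ".join(argv))
print(parse_status('{"status": "done"}'))
```

The argument list can be handed to `subprocess.check_output(argv)` once qds.py is on the path. Building a list instead of a single shell string avoids quoting problems when a query contains spaces or quotes.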
SDK API
-------
An example Python application needs to do the following:
1. Set the api_token:
from qds_sdk.qubole import Qubole
Qubole.configure(api_token='ksbdvcwdkjn123423')
2. Use the Command classes defined in commands.py to execute commands. To run a Hive command:
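from qds_sdk.commands import HiveCommand
# Submit the query; the returned object carries the command id and its current status.
hc = HiveCommand.create(query='show tables')
print("Id: %s, Status: %s" % (str(hc.id), hc.status))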
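Hard-coding the token as in step 1 is convenient for a demo, but a script may prefer to read it from the same QDS_API_TOKEN variable used in the CLI examples. The following is a minimal sketch of that idea; `api_token_from_env` is a hypothetical helper, not an SDK function, and the sketch does not assume that the SDK reads the variable itself.

```python
import os


def api_token_from_env(environ=None, var="QDS_API_TOKEN"):
    """Return the API token stored in `var`, or raise if it is unset or empty."""
    environ = os.environ if environ is None else environ
    token = environ.get(var, "").strip()
    if not token:
        raise RuntimeError("%s is not set" % var)
    return token


# A plain dict stands in for the real environment in this demo.
print(api_token_from_env({"QDS_API_TOKEN": "xxyyzz"}))

# With the SDK installed, the token would be passed on as in step 1:
#   from qds_sdk.qubole import Qubole
#   Qubole.configure(api_token=api_token_from_env())
```

Failing early with a clear message makes a missing token easier to diagnose than an authentication error from the API.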
example/mr_1.py contains a Hadoop Streaming example.
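A command that has been submitted but not run synchronously has to be checked until it finishes. The sketch below shows one way to structure that loop without depending on the SDK: `fetch_status` is a hypothetical callable that returns the current status string, for example by looking the command up again by its id. Only "done" appears in this README; the other status names, the polling interval and the timeout are assumptions.

```python
import time


def wait_for_status(fetch_status, terminal=("done", "error", "cancelled"),
                    poll_interval=5.0, timeout=600.0, sleep=time.sleep):
    """Call fetch_status() until it returns a terminal status or time runs out."""
    waited = 0.0
    while True:
        status = fetch_status()
        if status in terminal:
            return status
        if waited >= timeout:
            raise RuntimeError("timed out; last status was %r" % (status,))
        sleep(poll_interval)
        waited += poll_interval


# Demo: canned statuses (assumed names) stand in for real status checks,
# and a no-op sleep keeps the demo instantaneous.
canned = iter(["waiting", "running", "done"])
final = wait_for_status(lambda: next(canned), sleep=lambda seconds: None)
print(final)
```

Passing `sleep` as a parameter keeps the loop easy to test, and the timeout prevents a stuck command from blocking the caller forever.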