-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathindex.html
566 lines (524 loc) · 39.9 KB
/
index.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
<!DOCTYPE HTML>
<html lang="en" class="light" dir="ltr">
<head>
<!-- Book generated using mdBook -->
<meta charset="UTF-8">
<title>Home - docs</title>
<!-- Custom HTML head -->
<meta name="description" content="">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="theme-color" content="#ffffff">
<link rel="icon" href="favicon.svg">
<link rel="shortcut icon" href="favicon.png">
<link rel="stylesheet" href="css/variables.css">
<link rel="stylesheet" href="css/general.css">
<link rel="stylesheet" href="css/chrome.css">
<link rel="stylesheet" href="css/print.css" media="print">
<!-- Fonts -->
<link rel="stylesheet" href="FontAwesome/css/font-awesome.css">
<link rel="stylesheet" href="fonts/fonts.css">
<!-- Highlight.js Stylesheets -->
<link rel="stylesheet" href="highlight.css">
<link rel="stylesheet" href="tomorrow-night.css">
<link rel="stylesheet" href="ayu-highlight.css">
<!-- Custom theme stylesheets -->
</head>
<body class="sidebar-visible no-js">
<div id="body-container">
<!-- Provide site root to javascript -->
<script>
var path_to_root = "";
var default_theme = window.matchMedia("(prefers-color-scheme: dark)").matches ? "navy" : "light";
</script>
<!-- Work around some values being stored in localStorage wrapped in quotes -->
<script>
try {
var theme = localStorage.getItem('mdbook-theme');
var sidebar = localStorage.getItem('mdbook-sidebar');
if (theme.startsWith('"') && theme.endsWith('"')) {
localStorage.setItem('mdbook-theme', theme.slice(1, theme.length - 1));
}
if (sidebar.startsWith('"') && sidebar.endsWith('"')) {
localStorage.setItem('mdbook-sidebar', sidebar.slice(1, sidebar.length - 1));
}
} catch (e) { }
</script>
<!-- Set the theme before any content is loaded, prevents flash -->
<script>
var theme;
try { theme = localStorage.getItem('mdbook-theme'); } catch(e) { }
if (theme === null || theme === undefined) { theme = default_theme; }
var html = document.querySelector('html');
html.classList.remove('light')
html.classList.add(theme);
var body = document.querySelector('body');
body.classList.remove('no-js')
body.classList.add('js');
</script>
<input type="checkbox" id="sidebar-toggle-anchor" class="hidden">
<!-- Hide / unhide sidebar before it is displayed -->
<script>
var body = document.querySelector('body');
var sidebar = null;
var sidebar_toggle = document.getElementById("sidebar-toggle-anchor");
if (document.body.clientWidth >= 1080) {
try { sidebar = localStorage.getItem('mdbook-sidebar'); } catch(e) { }
sidebar = sidebar || 'visible';
} else {
sidebar = 'hidden';
}
sidebar_toggle.checked = sidebar === 'visible';
body.classList.remove('sidebar-visible');
body.classList.add("sidebar-" + sidebar);
</script>
<nav id="sidebar" class="sidebar" aria-label="Table of contents">
<div class="sidebar-scrollbox">
<ol class="chapter"><li class="chapter-item expanded affix "><a href="../../index.html" class="active">Home</a></li><li class="chapter-item expanded "><a href="intro.html"><strong aria-hidden="true">1.</strong> Intro</a></li><li class="chapter-item expanded "><a href="quick_start.html"><strong aria-hidden="true">2.</strong> Quick Start Guide</a></li><li class="chapter-item expanded "><a href="configuration.html"><strong aria-hidden="true">3.</strong> Configuration</a></li><li class="chapter-item expanded "><a href="endpoints.html"><strong aria-hidden="true">4.</strong> Endpoints</a></li><li class="chapter-item expanded "><a href="quering.html"><strong aria-hidden="true">5.</strong> Querying</a></li><li><ol class="section"><li class="chapter-item expanded "><a href="rest.html"><strong aria-hidden="true">5.1.</strong> Rest</a></li><li class="chapter-item expanded "><a href="sql.html"><strong aria-hidden="true">5.2.</strong> Sql</a></li><li class="chapter-item expanded "><a href="graphQl.html"><strong aria-hidden="true">5.3.</strong> GraphQL</a></li></ol></li><li class="chapter-item expanded "><a href="authorization.html"><strong aria-hidden="true">6.</strong> Authorization</a></li><li class="chapter-item expanded "><a href="text-to-sql.html"><strong aria-hidden="true">7.</strong> TEXT-TO-SQL</a></li></ol>
</div>
<div id="sidebar-resize-handle" class="sidebar-resize-handle"></div>
</nav>
<!-- Track and set sidebar scroll position -->
<script>
var sidebarScrollbox = document.querySelector('#sidebar .sidebar-scrollbox');
sidebarScrollbox.addEventListener('click', function(e) {
if (e.target.tagName === 'A') {
sessionStorage.setItem('sidebar-scroll', sidebarScrollbox.scrollTop);
}
}, { passive: true });
var sidebarScrollTop = sessionStorage.getItem('sidebar-scroll');
sessionStorage.removeItem('sidebar-scroll');
if (sidebarScrollTop) {
// preserve sidebar scroll position when navigating via links within sidebar
sidebarScrollbox.scrollTop = sidebarScrollTop;
} else {
// scroll sidebar to current active section when navigating via "next/previous chapter" buttons
var activeSection = document.querySelector('#sidebar .active');
if (activeSection) {
activeSection.scrollIntoView({ block: 'center' });
}
}
</script>
<div id="page-wrapper" class="page-wrapper">
<div class="page">
<div id="menu-bar-hover-placeholder"></div>
<div id="menu-bar" class="menu-bar sticky">
<div class="left-buttons">
<label id="sidebar-toggle" class="icon-button" for="sidebar-toggle-anchor" title="Toggle Table of Contents" aria-label="Toggle Table of Contents" aria-controls="sidebar">
<i class="fa fa-bars"></i>
</label>
<button id="theme-toggle" class="icon-button" type="button" title="Change theme" aria-label="Change theme" aria-haspopup="true" aria-expanded="false" aria-controls="theme-list">
<i class="fa fa-paint-brush"></i>
</button>
<ul id="theme-list" class="theme-popup" aria-label="Themes" role="menu">
<li role="none"><button role="menuitem" class="theme" id="light">Light</button></li>
<li role="none"><button role="menuitem" class="theme" id="rust">Rust</button></li>
<li role="none"><button role="menuitem" class="theme" id="coal">Coal</button></li>
<li role="none"><button role="menuitem" class="theme" id="navy">Navy</button></li>
<li role="none"><button role="menuitem" class="theme" id="ayu">Ayu</button></li>
</ul>
<button id="search-toggle" class="icon-button" type="button" title="Search. (Shortkey: s)" aria-label="Toggle Searchbar" aria-expanded="false" aria-keyshortcuts="S" aria-controls="searchbar">
<i class="fa fa-search"></i>
</button>
</div>
<h1 class="menu-title">docs</h1>
<div class="right-buttons">
<a href="print.html" title="Print this book" aria-label="Print this book">
<i id="print-button" class="fa fa-print"></i>
</a>
</div>
</div>
<div id="search-wrapper" class="hidden">
<form id="searchbar-outer" class="searchbar-outer">
<input type="search" id="searchbar" name="searchbar" placeholder="Search this book ..." aria-controls="searchresults-outer" aria-describedby="searchresults-header">
</form>
<div id="searchresults-outer" class="searchresults-outer hidden">
<div id="searchresults-header" class="searchresults-header"></div>
<ul id="searchresults">
</ul>
</div>
</div>
<!-- Apply ARIA attributes after the sidebar and the sidebar toggle button are added to the DOM -->
<script>
document.getElementById('sidebar-toggle').setAttribute('aria-expanded', sidebar === 'visible');
document.getElementById('sidebar').setAttribute('aria-hidden', sidebar !== 'visible');
Array.from(document.querySelectorAll('#sidebar a')).forEach(function(link) {
link.setAttribute('tabIndex', sidebar === 'visible' ? 0 : -1);
});
</script>
<div id="content" class="content">
<main>
<h1 id="ruspie"><a class="header" href="#ruspie">RUSPIE</a></h1>
<h2 id="introduction"><a class="header" href="#introduction">INTRODUCTION</a></h2>
<p>Ruspie is a query engine for datasets stored in CSV and Parquet formats. It allows you to query your data using SQL, REST API, and GraphQL. Ruspie is built on top of Apache Arrow and Datafusion, and it is written in Rust.</p>
<p>To use Ruspie, you can start the server by running the <code>cargo run</code> command, and specifying the path to your dataset files using the <code>FILE_PATH</code> environment variable. You can then send queries(SQL, GraphQL, REST Query params) to the server using the REST API. Ruspie supports a variety of query operators, such as filtering, sorting, and limiting the number of results, which can be specified in the query.</p>
<h2 id="quickstart"><a class="header" href="#quickstart">QUICKSTART</a></h2>
<p>To quickly setup Ruspie, follow these steps:</p>
<p>Install Rust, if you don't already have it installed. You can do this by following the instructions on the Rust website: https://www.rust-lang.org/tools/install.</p>
<p>Clone the Ruspie repository from GitHub: git clone https://github.com/factly/ruspie.git</p>
<p>Navigate to the cloned repository and build the project using <code>cargo build</code></p>
<p>Set the <code>FILE_PATH</code> environment variable to the path of the dataset files that you want to serve through the API. For example: <code>FILE_PATH=./data</code></p>
<p>Start the Ruspie server by running <code>cargo run</code></p>
<p>You can now send queries to the Ruspie server using the REST API, SQL, or GraphQL. For more information, see the documentation for the project.</p>
<p>Note: If you want to enable token-based authorization, you will need to set the <code>MASTER_KEY</code> environment variable before starting the server. This will enable authentication for all the endpoints, and you will need to pass a valid authorization token in the <code>AUTHORIZATION</code> header of your requests.</p>
<h2 id="configuration"><a class="header" href="#configuration">CONFIGURATION</a></h2>
<p>You can configure Ruspie using environment variables. The following environment variables are available:</p>
<p><code>SOURCE</code>: This specifies the source of the files to be fetched from. It can either be <code>S3</code> or <code>FILESYSTEM</code>, by default it is set to <code>FILESYSTEM</code>.</p>
<p><code>S3_PATH</code>: If the <code>SOURCE</code> env variable is set to <code>S3</code>, you have to spicify this env variable. <code>S3_PATH</code> can also be passed from request headers which overrides the value set in env. If the env variable is not set it is defaulted to <code>ruspie/</code>.Ex:</p>
<pre><code class="language-bash"># if your a have s3 url bucket/path/to/file.csv
export SOURCE=S3
export S3_PATH=bucket/path/to
cargo run
# In this request it will serve from s3://bucket/path/to/blogs.csv
curl -H "FILE-EXT: csv" http://localhost:8080/api/tables/blogs
# In this request it will override env value as serve from s3://newbucket/newpath/to/blogs.csv
curl -H "S3_PATH: newbucket/newpath/to" http://localhost:8080/api/tables/blogs
</code></pre>
<p><code>FILE_PATH</code>: If <code>SOURCE</code> is set to <code>FILESYSTEM</code> this specifies the path to the dataset files that you want to serve through the API. If not set, the default is the test directory in the root of the project.</p>
<p><code>PORT</code>: This specifies the port that the Ruspie server will listen on. If not set, the default is 8080.</p>
<p><code>MASTER_KEY</code>: This enables token-based authorization for all endpoints. If not set, authentication is disabled.</p>
<p><code>LIMIT</code>: This specifies the default limit on the number of results returned by a query. If not set, the default is 100.</p>
<p><code>MAX_LIMIT</code>: This specifies the maximum limit that can be specified in a query. If not set, the default is 1000.</p>
<p><code>DEFAULT_EXT</code>: This specifies the default file extension that Ruspie will look for when serving files through the API. If not set, the default is csv.</p>
<p>To set an environment variable, you can use the export command before starting the Ruspie server. For example, to set the FILE_PATH variable to the data directory and the PORT variable to 8080, you can run the following commands:</p>
<pre><code class="language-bash">export SOURCE=FILESYSTEM
export FILE_PATH=./data
export PORT=8080
cargo run
</code></pre>
<p>You can also set the environment variables in your shell configuration file (e.g. .bashrc or .zshrc) so that they are automatically set when you open a new terminal. For more information, see the documentation for your shell.</p>
<h2 id="endpoints"><a class="header" href="#endpoints">ENDPOINTS</a></h2>
<p>Ruspie exposes the following endpoints:</p>
<ol>
<li>
<p>GET /api/tables/{table_name}: This endpoint allows you to query a dataset using the REST query params. You can specify query operators such as filters, sorting, and limits in the URL query parameters.</p>
</li>
<li>
<p>POST /api/sql: This endpoint allows you to query a dataset using SQL queries. You can pass the SQL query in the request body as plain text.</p>
</li>
<li>
<p>POST /api/graphql: This endpoint allows you to query a dataset using GraphQL queries. You can pass the GraphQL query in the request body as plain text.</p>
</li>
<li>
<p>GET /api/schemas/{table_name}: This endpoint returns the schema of the specified dataset.</p>
</li>
</ol>
<p>Note: If you have enabled token-based authorization, these endpoints will require a valid authorization token to be passed in the AUTHORIZATION header of the request.</p>
<p>Additionally, the following endpoints are available for managing API keys when token-based authorization is enabled:</p>
<ol>
<li>
<p>GET /auth/keys: This endpoint allows you to list all the API keys that have been generated.</p>
</li>
<li>
<p>POST /auth/keys: This endpoint allows you to generate a new API key.</p>
</li>
<li>
<p>PATCH /auth/keys/{key_id}: This endpoint allows you to update the name and description of an existing API key.</p>
</li>
<li>
<p>DELETE /auth/keys/{key_id}: This endpoint allows you to delete an existing API key.</p>
</li>
<li>
<p>POST /auth/keys/invalidate/{key_id}: This endpoint allows you to invalidate an existing API key, preventing it from being used to access the API.</p>
</li>
</ol>
<h2 id="querying"><a class="header" href="#querying">QUERYING</a></h2>
<h3 id="rest"><a class="header" href="#rest">REST</a></h3>
<p>To query a dataset using query parameters in Ruspie, you can send a <code>GET</code> request to the <code>/api/tables/{table_name}</code> endpoint, where <code>table_name</code> is the name of the dataset you want to query. You can then specify the query operators in the URL query parameters.</p>
<p>For example, the following request will return the first 10 rows of the customers dataset, sorted in ascending order by the last_name column:</p>
<pre><code class="language-bash">curl "localhost:8080/api/tables/customers?sort=last_name&limit=10"
</code></pre>
<p>You can also use the filter parameter to specify a filter condition. For example, the following request will return all rows from the customers dataset where the first_name column is equal to John:</p>
<pre><code class="language-bash">curl "localhost:8080/api/tables/customers?filter[first_name]=John"
</code></pre>
<p>You can use the page and limit parameters to paginate the results. For example, the following request will return the second page of results, where each page contains 10 rows:</p>
<pre><code class="language-bash">curl "localhost:8080/api/tables/customers?page=2&limit=10"
</code></pre>
<p>The <code>/api/tables/{table_name}</code> endpoint supports the following query parameters:</p>
<ol>
<li>
<p><code>columns</code>: specifies which columns of the dataset to include in the response.</p>
</li>
<li>
<p><code>sort</code>: specifies the order in which the rows of the dataset should be sorted. This parameter can be used to sort the rows in ascending or descending order by one or more columns.</p>
</li>
<li>
<p><code>limit</code>: specifies the maximum number of rows to include in the response.</p>
</li>
<li>
<p><code>filter</code>: specifies a condition that rows must satisfy in order to be included in the response.</p>
</li>
<li>
<p><code>page</code>: specifies which page of the dataset to include in the response, when pagination is used.
For example, to sort the rows of the dataset by the col1 and col2 columns in ascending and descending order, respectively, and return only the first 100 rows, the query might look like this:</p>
</li>
</ol>
<pre><code>/api/tables/{table_name}?sort=col1,-col2&limit=100
</code></pre>
<p>To filter the rows of the dataset to only include those where the col1 column has the value foo, the query might look like this:</p>
<pre><code>/api/tables/{table_name}?filter[col1]=foo
</code></pre>
<p>And to retrieve the second page of the dataset, with 10 rows per page, the query might look like this:</p>
<pre><code>/api/tables/{table_name}?page=2&limit=10
</code></pre>
<p>These query parameters can be combined in various ways to retrieve the desired subset of the dataset.</p>
<h3 id="sql"><a class="header" href="#sql">SQL</a></h3>
<p>To query a dataset using SQL in Ruspie, you can send a <code>POST</code> request to the <code>/api/sql</code> endpoint, and pass the SQL query in the request body as plain text.</p>
<p>For example, the following request will return the first 10 rows of the customers dataset, sorted in ascending order by the last_name column:</p>
<pre><code class="language-bash">curl -X POST -d "SELECT * FROM customers ORDER BY last_name ASC LIMIT 10" localhost:8080/api/sql
</code></pre>
<p>You can also use the WHERE clause in your SQL query to specify a filter condition. For example, the following request will return all rows from the customers dataset where the first_name column is equal to John:</p>
<pre><code class="language-bash">curl -X POST -d "SELECT * FROM customers WHERE first_name = 'John'" localhost:8080/api/sql
</code></pre>
<h3 id="graphql"><a class="header" href="#graphql">GRAPHQL</a></h3>
<p>To query a dataset using GraphQL in Ruspie, you can send a <code>POST</code> request to the <code>/api/graphql</code> endpoint, and pass the GraphQL query in the request body as plain text.</p>
<p>For example, the following request will return the first 10 rows of the customers dataset, sorted in ascending order by the last_name column:</p>
<pre><code class="language-bash">curl -X POST -d "query { customers(sort: [{ field: "last_name", order: "asc" }], limit: 10) { id,first_name,last_name } }" localhost:8080/api/graphql
</code></pre>
<p>You can also use the filter parameter in your GraphQL query to specify a filter condition. For example, the following request will return all rows from the customers dataset where the first_name column is equal to John:</p>
<pre><code class="language-bash">curl -X POST -d "query { customers(filter: { first_name: "John" }) { id,first_name,last_name } }" localhost:8080/api/graphql
</code></pre>
<h3 id="note"><a class="header" href="#note">NOTE</a></h3>
<p>To override the <code>DEFAULT_EXT</code> use <code>FILE-EXT</code> header while querying. For example, to query a dataset in Parquet format, the <code>FILE-EXT</code> header have to be set to parquet:</p>
<pre><code class="language-bash">curl -H "FILE-EXT: parquet" localhost:8080/api/tables/{table_name}
</code></pre>
<h3 id="prefetching-with-robinpie"><a class="header" href="#prefetching-with-robinpie">PREFETCHING WITH ROBINPIE</a></h3>
<p>Prefetching is a technique used in Ruspie to optimize schema infereing process. Robinpie is a component that prefetches dataset file schemas from ruspie and then stores them in a specified source. This is only when the <code>SOURCE</code> env is set to <code>S3</code>.</p>
<h4 id="environment-variables"><a class="header" href="#environment-variables">Environment Variables</a></h4>
<ul>
<li><code>PRE_FETCH_ENABLE</code>: set to true to enable prefetching (Default is false).</li>
<li><code>RUSPIE_PREFETCH_INTERVAL</code>: specifies how often Ruspie should fetch schemas from Source(Default is 60s).</li>
<li><code>ROBINPIE_PREFETCH_INTERVAL</code>: specifies how often Robinpie should fetch schemas from the Ruspie(Default is 30s).</li>
<li><code>PRE_FETCH_SOURCE</code>: specifies the source for fetching schemas (S3, Mongo, or the filesystem) (Default is Mongo).</li>
<li><code>MONGO_URI</code>: specifies the URI for MongoDB when using it as the source.</li>
</ul>
<h4 id="working-of-robinpie"><a class="header" href="#working-of-robinpie">Working of Robinpie</a></h4>
<p>Robinpie fetches all dataset file schemas and stores them in the specified source, creating <code>schemas.json/schemas.parquet</code> when set to <code>S3</code> it uses same <code>S3_PATH</code> . On restart, Ruspie loads schemas from the source instead of inferring them. Robinpie also periodically fetches schemas from Ruspie and updates the source for new files.</p>
<h2 id="authorization"><a class="header" href="#authorization">AUTHORIZATION</a></h2>
<p>In Ruspie, authorization is enabled by setting the <code>MASTER_KEY</code> environment variable. Once this variable is set, users must provide a valid key in the <code>AUTHORIZATION</code> header of their request in order to access the Ruspie APIs.</p>
<p>Ruspie provides a set of endpoints for managing keys, which can be accessed by making a request to the <code>/auth/keys</code> endpoint. This endpoint supports the following methods:</p>
<ol>
<li><code>GET</code>: lists all keys that have been generated.</li>
<li><code>POST</code>: generates a new key.</li>
<li><code>PATCH</code> /{key_id}: edits the name and description of a key.</li>
<li><code>DELETE</code> /{key_id}: deletes a key.</li>
<li><code>POST</code> /invalidate/{key_id}: invalidates a key, rendering it no longer valid for authentication.</li>
</ol>
<p>To access these endpoints, users must provide the AUTHORIZATION header with their request, in the form <code>Bearer {MASTER_KEY}</code>, where <code>{MASTER_KEY}</code> is the value of the <code>MASTER_KEY</code> environment variable.</p>
<p>Once a key has been generated, it can be used to authenticate requests to the Ruspie APIs by providing it in the <code>AUTHORIZATION</code> header, in the form <code>Bearer {key}</code>, where <code>{key}</code> is the generated key. For example, to query the <code>/api/tables/{table_name}</code> endpoint with a generated key, the request might look like this:</p>
<pre><code class="language-bash">curl -H "AUTHORIZATION: Bearer {key}" localhost:8080/api/tables/{table_name}
</code></pre>
<p>Note that keys can be invalidated or deleted, in which case they will no longer be valid for authentication. Users should manage their keys carefully to ensure that only valid keys are used to access the Ruspie APIs.</p>
<h1 id="text-to-sql-service-deprecated"><a class="header" href="#text-to-sql-service-deprecated">Text to SQL Service (Deprecated)</a></h1>
<p><strong>Note: The Text to SQL service that was previously deployed as a cloudflare worker has been deprecated. The functionality has been integrated into the Ruspie API as an optional endpoint. This section provides information about the deprecated service for reference.</strong></p>
<h2 id="introduction-1"><a class="header" href="#introduction-1">Introduction</a></h2>
<p>The Text to SQL functionality allowed users to convert natural language text queries into valid SQL queries. It provided a convenient way to interact with your dataset using plain English queries, making it accessible to users who may not be familiar with SQL.</p>
<h2 id="how-it-worked"><a class="header" href="#how-it-worked">How it Worked</a></h2>
<p>The Text to SQL service was a standalone component that accepted a text query and other parameters as input. It returned a valid SQL query that could be executed against the dataset stored in Ruspie. This service was deployed as a cloudflare worker.</p>
<h2 id="deprecated-cloudflare-worker-endpoint"><a class="header" href="#deprecated-cloudflare-worker-endpoint">Deprecated Cloudflare Worker Endpoint</a></h2>
<p>The Text to SQL service was previously available as a cloudflare worker endpoint. Users could send a POST request to this endpoint, providing the necessary input parameters, and receive a valid SQL query as the response.</p>
<p><strong>Note: You can still refer the code to this service text-to-sql directory</strong></p>
<h2 id="transition-to-ruspie-api"><a class="header" href="#transition-to-ruspie-api">Transition to Ruspie API</a></h2>
<p>As of the latest version, the Text to SQL functionality has been deprecated as a cloudflare worker and is no longer maintained as a separate component. The functionality of translating text queries into SQL queries is now available as an optional endpoint within the Ruspie API. Users can use the <code>/text_to_sql</code> endpoint to achieve the same results.</p>
<h2 id="updated-workflow"><a class="header" href="#updated-workflow">Updated Workflow</a></h2>
<p>To enable the optional endpoint set the following envs:</p>
<ol>
<li>TEXT_TO_SQL: true</li>
<li>OPENAI_API_KEY: openai api key(get one from openai)</li>
</ol>
<h3 id="making-a-request-to-text_to_sql-endpoint"><a class="header" href="#making-a-request-to-text_to_sql-endpoint">Making a Request to <code>/text_to_sql</code> Endpoint</a></h3>
<p>The <code>/text_to_sql</code> endpoint in the Ruspie API allows you to convert natural language text queries into valid SQL queries. To make a request to this endpoint, follow these steps:</p>
<h3 id="1-set-the-http-method"><a class="header" href="#1-set-the-http-method">1. Set the HTTP Method</a></h3>
<p>Send a POST request to the <code>/text_to_sql</code> endpoint. This is because you are sending data to the server to process.</p>
<h3 id="2-set-the-request-headers"><a class="header" href="#2-set-the-request-headers">2. Set the Request Headers</a></h3>
<p>Ensure that your request includes the necessary headers:</p>
<ul>
<li><strong>Content-Type</strong>: Set this header to <code>application/json</code> to specify that the request body contains JSON data.</li>
</ul>
<h3 id="3-create-the-request-body"><a class="header" href="#3-create-the-request-body">3. Create the Request Body</a></h3>
<p>The request body should be in JSON format and include the following parameters:</p>
<ul>
<li>
<p><strong>query</strong>: A string parameter that represents the natural language text query you want to convert into SQL. Provide a clear and concise description of the data you want to retrieve from the dataset.</p>
</li>
<li>
<p><strong>tablename</strong>: A string parameter specifying the name of the dataset table on which you want to perform the query.</p>
</li>
<li>
<p><strong>schema</strong> (Optional): If the schema of the dataset is not already known or needs to be explicitly defined, you can include it as a string parameter. The schema should list the columns available in the dataset.</p>
</li>
<li>
<p><strong>rows</strong> (Optional): Intial rows of datasets.</p>
</li>
</ul>
<p>Here is an example of the request body in JSON format:</p>
<pre><code class="language-json">{
"query": "Retrieve the names of customers who made a purchase in the last month",
"tablename": "customer_purchases",
"schema": "customer_name, purchase_date, ...",
"rows": "10"
}
</code></pre>
<h3 id="4-send-the-request"><a class="header" href="#4-send-the-request">4. Send the Request</a></h3>
<p>Once you have set the HTTP method, headers, and created the request body, you can send the POST request to the <code>/text_to_sql</code> endpoint.</p>
<h3 id="5-receive-the-response"><a class="header" href="#5-receive-the-response">5. Receive the Response</a></h3>
<p>The server will process your request and respond with a JSON object that contains the valid SQL query. The response may also include additional information, such as a status indicating the success of the operation.</p>
<p>Here's an example of a response:</p>
<pre><code class="language-json">{
"query": "SELECT customer_name FROM customer_purchases WHERE purchase_date >= '2023-09-01'"
}
</code></pre>
<p>The <code>sql_query</code> field in the response contains the valid SQL query that can be used to retrieve the requested data from the dataset.</p>
<p>That's it! You've successfully made a request to the <code>/text_to_sql</code> endpoint and received a valid SQL query in response, allowing you to perform dataset queries using natural language text.</p>
<h1 id="studiooptional"><a class="header" href="#studiooptional">Studio(Optional)</a></h1>
<p>Simple web interface built with nextjs and golang enables to create organisations and projects to upload datasets and query using ruspie.</p>
<p><strong>Note: To run the server make sure to copy the envs from config.env.example in /studio/server/ directory into config.env file</strong></p>
<h2 id="environment-variables-for-server-component"><a class="header" href="#environment-variables-for-server-component">Environment Variables for server component</a></h2>
<ul>
<li>
<p><strong>SERVER_PORT</strong>: The port on which the Ruspie Studio Server will listen for incoming requests. The default is set to <code>8800</code>.</p>
</li>
<li>
<p><strong>DATABASE_HOST</strong>: The host name or IP address of the PostgreSQL database used by the server. In your setup, it's configured to <code>postgres</code>.</p>
</li>
<li>
<p><strong>DATABASE_PORT</strong>: The port on which the PostgreSQL database is running. It's set to the default <code>5432</code>.</p>
</li>
<li>
<p><strong>DATABASE_USERNAME</strong>: The username used to authenticate with the PostgreSQL database. In your configuration, it's set to <code>postgres</code>.</p>
</li>
<li>
<p><strong>DATABASE_PASSWORD</strong>: The password associated with the specified database username (<code>DATABASE_USERNAME</code>). In your setup, it's configured as <code>postgres</code>.</p>
</li>
<li>
<p><strong>DATABASE_NAME</strong>: The name of the PostgreSQL database where Ruspie Studio Server stores its data. It's set to <code>ruspie</code> in your configuration.</p>
</li>
<li>
<p><strong>DATABASE_SSLMODE</strong>: This variable specifies the SSL mode for the PostgreSQL database connection. In your setup, it's set to <code>disable</code>, which means SSL is not used.</p>
</li>
<li>
<p><strong>LOG_LEVEL</strong>: The log level for server logs. It's set to <code>debug</code> in your configuration, providing detailed log information.</p>
</li>
<li>
<p><strong>LOG_OUTPUT</strong>: The output destination for logs. In your setup, it's configured to <code>stdout</code>, which means logs are directed to the standard output.</p>
</li>
<li>
<p><strong>KAVACH_ENABLED</strong>: This variable enables or disables integration with Kavach, an internal authentication and authorization tool.</p>
</li>
</ul>
<h3 id="configuration-note"><a class="header" href="#configuration-note">Configuration Note</a></h3>
<p>To configure the Ruspie Studio Server, you should adjust these environment variables to match your specific setup. Ensure that the database connection details (<code>DATABASE_HOST</code>, <code>DATABASE_PORT</code>, <code>DATABASE_USERNAME</code>, <code>DATABASE_PASSWORD</code>, and <code>DATABASE_NAME</code>) are correctly set to connect to your PostgreSQL database.</p>
<p>Additionally, if you decide to enable Kavach integration, set <code>KAVACH_ENABLED</code> to <code>true</code> and configure the Kavach-related environment variables accordingly.</p>
<h2 id="environment-variables-for-frontend-component"><a class="header" href="#environment-variables-for-frontend-component">Environment Variables for frontend component</a></h2>
<ul>
<li>
<p><strong>NEXT_PUBLIC_SERVER_URL</strong>: This variable defines the URL for accessing the Ruspie Studio Server. This URL is used for making API requests to the server.</p>
</li>
<li>
<p><strong>NEXT_PUBLIC_COMPANION_URL</strong>: The Companion URL of companion service</p>
</li>
<li>
<p><strong>NEXT_PUBLIC_KAVACH_ENABLED</strong>: This variable enables or disables integration with Kavach, an internal authentication and authorization tool..</p>
</li>
<li>
<p><strong>NEXT_PUBLIC_BASEPATH</strong>: The Base Path is set to <code>/.factly/ruspie/web</code> in <code>KAVACH+STUDIO+RUSPIE</code> configurations. It defines the base URL path for accessing the Ruspie Studio web interface.</p>
</li>
<li>
<p><strong>NEXT_PUBLIC_S3_URL</strong>: This variable specifies the URL for accessing the S3 storage service.</p>
</li>
<li>
<p><strong>NEXT_PUBLIC_TEXT_TO_SQL</strong>: The <code>TEXT_TO_SQL</code> environment variable is referenced here. Depending on its value, Text to SQL functionality may be enabled or disabled in the frontend.</p>
</li>
<li>
<p><strong>NEXT_PUBLIC_KRATOS_URL</strong>: The Kratos URL .</p>
</li>
</ul>
<h1 id="configurations"><a class="header" href="#configurations">Configurations:</a></h1>
<p>You can find docker-compose files to all these configurations in /docker-compose-files directory</p>
<h2 id="ruspie--studio--kavach"><a class="header" href="#ruspie--studio--kavach">Ruspie + Studio + Kavach</a></h2>
<p>Ruspie + Studio + Kavach is a configuration of the Ruspie that combines the core Ruspie query engine with the Ruspie Studio web interface and the Kavach authentication and authorization tool. This version offers a environment for managing datasets, querying data, and controlling access.</p>
<h3 id="setup-and-configuration"><a class="header" href="#setup-and-configuration">Setup and Configuration</a></h3>
<p>To get started with Ruspie + Studio + Kavach, follow these steps:</p>
<ol>
<li>
<p><strong>Run Docker Compose</strong>: Navigate to the <code>/docker-compose-files/</code> directory and run the <code>docker-compose-studio-kavach.yml</code> file to start the integrated environment.</p>
</li>
<li>
<p><strong>Kavach Integration</strong>: Ensure that Kavach is configured with the required user roles and permissions for secure access control.</p>
</li>
<li>
<p><strong>Access Web Interface</strong>: Access Ruspie Studio via <code>http://127.0.0.1:4455/.factly/ruspie/web/home/organisations</code> for project and data management. Access the Ruspie API at <code>http://127.0.0.1:4455/.factly/ruspie/server</code> for querying datasets.</p>
</li>
</ol>
<h2 id="ruspie--studio"><a class="header" href="#ruspie--studio">Ruspie + Studio</a></h2>
<p>Ruspie + Studio is a configuration of the Ruspie that combines the core Ruspie query engine with the Ruspie Studio web interface. This version offers a environment for managing datasets, querying data without any authentication and very lightweight compared to the previos configuration.</p>
<h3 id="setup-and-configuration-1"><a class="header" href="#setup-and-configuration-1">Setup and Configuration</a></h3>
<p>To get started with Ruspie + Studio + Kavach, follow these steps:</p>
<ol>
<li>
<p><strong>Run Docker Compose</strong>: Navigate to the <code>/docker-compose-files/</code> directory and run the <code>docker-compose-studio.yml</code> file to start the integrated environment.</p>
</li>
<li>
<p><strong>Access Web Interface</strong>: Access Ruspie Studio via <code>http://127.0.0.1:3000/home/organisations</code> for project and data management. Access the Ruspie API at <code>http://127.0.0.1:8800/server</code> for querying datasets.</p>
</li>
</ol>
<p><strong>Note: If you need text to sql service in the above two setups, use the same envs as mentioned above to enable it on this setup as well</strong></p>
<h2 id="ruspie-barebones"><a class="header" href="#ruspie-barebones">Ruspie Barebones</a></h2>
<p>Ruspie is the core query engine of the Ruspie ecosystem, designed for data querying using SQL and various query operators.</p>
<h2 id="setup-and-configuration-2"><a class="header" href="#setup-and-configuration-2">Setup and Configuration</a></h2>
<p>To set up and configure Ruspie, follow these steps:</p>
<ol>
<li>
<p><strong>Environment Configuration</strong>: Configure the necessary environment variables for Ruspie. These variables control various aspects of Ruspie's behavior, including the source of data and authentication.</p>
</li>
<li>
<p><strong>Run Docker Compose</strong>: .Navigate to the <code>/docker-compose-files/</code> directory and run the <code>docker-compose.yml</code> file to start the integrated environment</p>
</li>
<li>
<p><strong>Dataset Management</strong>: Upload datasets to Minio for use with Ruspie.</p>
</li>
</ol>
<p>To Upload datasets to Minio goto http://127.0.0.9001 and use <code>MINIO_ACCESS_KEY</code>, <code>MINIO_SECRET_KEY</code> used the docker-compose.yml file as username and password. As per the config in docker-compose.yml, datasets uploaded to the ruspie bucker can only be queried.</p>
</main>
<nav class="nav-wrapper" aria-label="Page navigation">
<!-- Mobile navigation buttons -->
<a rel="next prefetch" href="intro.html" class="mobile-nav-chapters next" title="Next chapter" aria-label="Next chapter" aria-keyshortcuts="Right">
<i class="fa fa-angle-right"></i>
</a>
<div style="clear: both"></div>
</nav>
</div>
</div>
<nav class="nav-wide-wrapper" aria-label="Page navigation">
<a rel="next prefetch" href="intro.html" class="nav-chapters next" title="Next chapter" aria-label="Next chapter" aria-keyshortcuts="Right">
<i class="fa fa-angle-right"></i>
</a>
</nav>
</div>
<!-- Livereload script (if served using the cli tool) -->
<script>
const wsProtocol = location.protocol === 'https:' ? 'wss:' : 'ws:';
const wsAddress = wsProtocol + "//" + location.host + "/" + "__livereload";
const socket = new WebSocket(wsAddress);
socket.onmessage = function (event) {
if (event.data === "reload") {
socket.close();
location.reload();
}
};
window.onbeforeunload = function() {
socket.close();
}
</script>
<script>
window.playground_copyable = true;
</script>
<script src="elasticlunr.min.js"></script>
<script src="mark.min.js"></script>
<script src="searcher.js"></script>
<script src="clipboard.min.js"></script>
<script src="highlight.js"></script>
<script src="book.js"></script>
<!-- Custom JS scripts -->
</div>
</body>
</html>