Commit 0cf6d19
330 improve spark integration with notebook (#331)
* Add Spark session creation button to NotebookToolbar component
* Refactor hover styles in NotebookToolbar component to use a CSS-in-JS approach for SparkIcon, improving code readability and maintainability.
* Enhance NotebookToolbar component by replacing SparkIcon with the VscFlame icon and updating hover styles to use inline event handlers, improving user interaction and visual consistency.
* Add Spark session creation functionality and update Spark app configuration handling
  - Implemented a new endpoint in `spark_app.py` for creating Spark sessions via a POST request.
  - Enhanced `Notebook.js` to include a method for initiating Spark session creation and handling success/error responses.
  - Updated `SparkModel.js` to fetch the Spark configuration and create a Spark session, improving integration with the backend API.
  - Added state management for the Spark configuration in the Notebook component.
* Integrate NotebookToolbar into the Notebook component and enhance Spark session creation logging
  - Added the NotebookToolbar component to the Notebook, providing controls for running all cells, saving, deleting, and creating Spark sessions.
  - Improved logging in the createSparkSession method for better visibility into the session creation process, including the notebook path and configuration responses.
  - Updated error handling to maintain user feedback during Spark session creation.
* Refactor import path for NotebookToolbar in the Notebook component
  - Updated the import statement for NotebookToolbar to reflect its new location in the 'content' directory, improving project structure and organization.
* Refactor Notebook component by removing Spark session creation logic and NotebookToolbar integration
  - Eliminated the handleCreateSparkSession function to streamline the Notebook component.
  - Removed the NotebookToolbar component from the Notebook, simplifying the UI and focusing on core functionality.
  - Updated the runAllCells prop to reference the runAllCells function directly, enhancing code clarity.
* Add runAllCells function to Notebook component for executing all cells
* Enhance Notebook component by integrating the createSparkSession prop
  - Added the createSparkSession prop to both the Notebook and Code components, enabling Spark session creation.
  - Updated the Code component to pass createSparkSession to the relevant child components, ensuring consistent functionality across the notebook interface.
* Add handleCreateSparkSession function to Notebook component for improved Spark session management
* Refactor Spark session creation in SparkModel.js to generate unique Spark app IDs and streamline API requests
  - Updated the createSparkSession method to generate a unique Spark app ID from the current timestamp.
  - Modified the API request to use the new Spark app ID for session creation, enhancing session management.
  - Improved error handling and logging for better visibility during the Spark session creation process.
* Enhance demo notebook with Spark session initialization and update Notebook component cell handling
  - Added a new code cell to the demo notebook that creates a Spark session with detailed configuration settings.
  - Updated the Notebook component to append new cells at the bottom instead of the top, improving the user experience.
  - Adjusted the execution logic to correctly handle the index of newly added cells.
* Enhance demo notebook and Cell component for Spark output handling
  - Added a markdown cell to the demo notebook for better documentation.
  - Updated the last execution time and execution count in the notebook.
  - Refactored the Cell component to include a new renderOutput function for improved handling of Spark output types, integrating SparkOutputBox for displaying Spark session information.
  - Streamlined the output rendering logic to enhance code clarity and maintainability.
* Add SparkOutputBox component for rendering Spark output in notebooks
* Refactor SparkOutputBox styling and simplify component structure
  - Updated the StyledBox component to use theme-based styling for better responsiveness.
  - Changed the CSS selectors to target MUI Card components instead of the previous output area.
  - Streamlined the return statement in SparkOutputBox for improved readability.
* Revert "Add SparkOutputBox component for rendering Spark output in notebooks" (reverts commit 389bdac)
* Revert "Enhance demo notebook and Cell component for Spark output handling" (reverts commit 0a10aba)
1 parent 5f389e6 commit 0cf6d19

File tree

6 files changed: +235 −16 lines

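Two small conventions recur throughout this commit: the client derives a unique Spark app ID from the current timestamp (`spark-${Date.now()}` in SparkModel.js), and the new server endpoint reads a `notebookPath` and a `config` from the POST body. A minimal Python sketch of both, with helper names of my own invention (they do not appear in the commit):

```python
import json
import time

def make_spark_app_id(now_ms=None):
    """Timestamp-based ID, mirroring `spark-${Date.now()}` in SparkModel.js."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # Date.now() yields milliseconds
    return f"spark-{now_ms}"

def make_session_payload(notebook_path, spark_config):
    """JSON body shape the new POST /spark_app/session handler reads."""
    return json.dumps({"notebookPath": notebook_path, "config": spark_config})

print(make_spark_app_id(1733801092185))  # spark-1733801092185
```

Because the ID is a millisecond timestamp, two sessions created in the same millisecond would collide; the commit accepts that trade-off for simplicity.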

examples/user_0@gmail.com/demo.ipynb

Lines changed: 66 additions & 1 deletion
@@ -1,5 +1,70 @@
 {
   "cells": [
+    {
+      "cell_type": "code",
+      "isExecuted": true,
+      "lastExecutionResult": "success",
+      "lastExecutionTime": "2024-12-10 03:27:50",
+      "metadata": {},
+      "outputs": [
+        {
+          "data": {
+            "text/html": [
+              "\n",
+              " <div>\n",
+              " <p><b>SparkSession - in-memory</b></p>\n",
+              " \n",
+              " <div>\n",
+              " <p><b>SparkContext</b></p>\n",
+              "\n",
+              " <p><a href=\"http://8e207d700c27:4040\">Spark UI</a></p>\n",
+              "\n",
+              " <dl>\n",
+              " <dt>Version</dt>\n",
+              " <dd><code>v3.5.0</code></dd>\n",
+              " <dt>Master</dt>\n",
+              " <dd><code>spark://spark-master:7077</code></dd>\n",
+              " <dt>AppName</dt>\n",
+              " <dd><code>spark-1733801092185</code></dd>\n",
+              " </dl>\n",
+              " </div>\n",
+              " \n",
+              " </div>\n",
+              " "
+            ],
+            "text/plain": [
+              "<pyspark.sql.session.SparkSession at 0x7fffe2dc9ed0>"
+            ]
+          },
+          "execution_count": 7,
+          "metadata": {},
+          "output_type": "execute_result"
+        }
+      ],
+      "source": [
+        "\n",
+        "from pyspark.sql import SparkSession\n",
+        "\n",
+        "spark = SparkSession.builder\\\n",
+        " .appName(\"spark-1733801270245\")\\\n",
+        " .master(\"spark://spark-master:7077\")\\\n",
+        " .config(\"spark.jars.packages\", \"io.delta:delta-spark_2.12:3.0.0\")\\\n",
+        " .config(\"spark.sql.extensions\", \"io.delta.sql.DeltaSparkSessionExtension\")\\\n",
+        " .config(\"spark.sql.catalog.spark_catalog\", \"org.apache.spark.sql.delta.catalog.DeltaCatalog\")\\\n",
+        " .config(\"spark.eventLog.enabled\", \"true\")\\\n",
+        " .config(\"spark.eventLog.dir\", \"/opt/data/spark-events\")\\\n",
+        " .config(\"spark.history.fs.logDirectory\", \"/opt/data/spark-events\")\\\n",
+        " .config(\"spark.sql.warehouse.dir\", \"/opt/data/spark-warehouse\")\\\n",
+        " .config(\"spark.executor.memory\", \"1g\")\\\n",
+        " .config(\"spark.executor.cores\", 1)\\\n",
+        " .config(\"spark.executor.instances\", 1)\\\n",
+        " .config(\"spark.driver.memory\", \"1g\")\\\n",
+        " .config(\"spark.driver.cores\", 1)\\\n",
+        " .getOrCreate()\n",
+        "\n",
+        "spark\n"
+      ]
+    },
     {
       "cell_type": "markdown",
       "isExecuted": true,
@@ -14,7 +79,7 @@
     {
       "cell_type": "code",
       "execution_count": null,
-      "isExecuted": true,
+      "isExecuted": false,
       "lastExecutionResult": "success",
       "lastExecutionTime": "2024-08-04 15:29:17",
       "metadata": {},

server/app/routes/spark_app.py

Lines changed: 19 additions & 1 deletion
@@ -25,4 +25,22 @@ def get_spark_app_config(notbook_path):
 def update_spark_app_config(notbook_path):
     logging.info(f"Updating spark app config for notebook path: {notbook_path}")
     data = request.get_json()
-    return SparkApp.update_spark_app_config_by_notebook_path(notbook_path, data)
+    return SparkApp.update_spark_app_config_by_notebook_path(notbook_path, data)
+
+
+@spark_app_blueprint.route('/spark_app/session', methods=['POST'])
+def create_spark_session():
+    data = request.get_json()
+    notebook_path = data.get('notebookPath')
+    spark_config = data.get('config')
+
+    try:
+        spark_app_id = SparkApp.create_spark_session(notebook_path, spark_config)
+        return jsonify({
+            'status': 'success',
+            'sparkAppId': spark_app_id
+        })
+    except Exception as e:
+        return jsonify({
+            'status': 'error',
+            'message': str(e)
+        }), 500
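The handler above maps success to a `sparkAppId` payload and any exception to a 500 carrying the error message. A framework-free sketch of that contract (the `backend` callable is a stand-in for `SparkApp.create_spark_session`, which the diff calls but does not show):

```python
# Sketch of the endpoint's response contract: success yields a
# sparkAppId, any exception becomes a 500 with the error message.
def create_spark_session_response(data, backend):
    notebook_path = data.get('notebookPath')
    spark_config = data.get('config')
    try:
        spark_app_id = backend(notebook_path, spark_config)
        return {'status': 'success', 'sparkAppId': spark_app_id}, 200
    except Exception as e:
        return {'status': 'error', 'message': str(e)}, 500

body, status = create_spark_session_response(
    {'notebookPath': 'demo.ipynb', 'config': {}},
    lambda path, cfg: 'spark-123')
print(status, body['sparkAppId'])  # 200 spark-123
```

Note that a missing `notebookPath` is not rejected up front: `data.get` simply yields `None`, so validation is left entirely to the backend call inside the `try` block.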

webapp/src/components/notebook/Notebook.js

Lines changed: 55 additions & 10 deletions
@@ -36,6 +36,8 @@ function Notebook({
   const [cellStatuses, setCellStatuses] = useState(notebookState.content ? notebookState.content.cells.map(() => CellStatus.IDLE) : []);
   const [cellExecutedStatuses, setCellExecutedStatuses] = useState(notebookState.content ? notebookState.content.cells.map(cell => cell.cell_type === 'markdown') : []);

+  const [sparkConfig, setSparkConfig] = useState(null);
+
   const setCellStatus = (index, status) => {
     setCellStatuses(prevStatuses => {
       const newStatuses = [...prevStatuses];
@@ -265,6 +267,57 @@ function Notebook({
     }
   };

+  const runAllCells = async () => {
+    console.log('Running all cells');
+    try {
+      await NotebookModel.runAllCells(
+        jupyterBaseUrl,
+        notebookState,
+        kernelId,
+        setKernelId,
+        cellStatuses,
+        setCellStatus,
+        cellExecutedStatuses,
+        setCellExecutedStatus
+      );
+    } catch (error) {
+      console.error('Failed to run all cells:', error);
+    }
+  };
+
+  const handleCreateSparkSession = async () => {
+    console.log('Create Spark session clicked');
+    try {
+      const { sparkAppId, initializationCode } = await SparkModel.createSparkSession(notebookState.path);
+
+      // Create a new cell with the initialization code
+      const newCell = {
+        cell_type: 'code',
+        source: initializationCode,
+        metadata: {},
+        outputs: []
+      };
+
+      // Add the cell to the bottom of the notebook
+      const cells = [...notebookState.content.cells];
+      cells.push(newCell);
+      setNotebookState({
+        ...notebookState,
+        content: { ...notebookState.content, cells }
+      });
+
+      // Execute the cell (now need to use the last index)
+      const newCellIndex = cells.length - 1;
+      await handleRunCodeCell(newCell, CellStatus.IDLE, (status) => setCellStatus(newCellIndex, status));
+
+      console.log('Spark session created with ID:', sparkAppId);
+      setSparkAppId(sparkAppId);
+    } catch (error) {
+      console.error('Failed to create Spark session:', error);
+      alert('Failed to create Spark session. Please check the configuration.');
+    }
+  };
+
   return (
     <div>
       {showNotebook && (
@@ -310,18 +363,10 @@ function Notebook({
             handleCreateCell={handleCreateCell}
             kernelId={kernelId}
             setKernelId={setKernelId}
-            runAllCells={
-              () => NotebookModel.runAllCells(
-                jupyterBaseUrl,
-                notebookState,
-                kernelId,
-                setKernelId,
-                cellStatuses,
-                setCellStatus,
-                cellExecutedStatuses,
-                setCellExecutedStatus)}
+            runAllCells={runAllCells}
             saveNotebook={handleUpdateNotebook}
             deleteNotebook={handleDeleteNotebook}
+            createSparkSession={handleCreateSparkSession}
           /> : contentType === ContentType.Config ?
           <Config
             notebook={notebook}
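The cell handling in `handleCreateSparkSession` reflects a change this commit makes elsewhere: new cells are appended at the bottom, so the index to execute is the last one, not 0. A plain-Python analogue of that copy-then-append logic (function name is mine, for illustration):

```python
# Append a fresh code cell and return the index that must be executed.
def append_code_cell(cells, source):
    new_cell = {'cell_type': 'code', 'source': source,
                'metadata': {}, 'outputs': []}
    updated = cells + [new_cell]       # copy-then-append, like the JS spread
    return updated, len(updated) - 1   # index of the freshly added cell

cells, idx = append_code_cell([{'cell_type': 'markdown'}], 'spark')
print(idx)  # 1
```

The copy matters for the same reason the JS code uses a spread: React state must not be mutated in place, so a new cell list is built before `setNotebookState` is called.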

webapp/src/components/notebook/content/Code.js

Lines changed: 4 additions & 2 deletions
@@ -22,7 +22,8 @@ const Code = ({
   setKernelId,
   runAllCells,
   saveNotebook,
-  deleteNotebook
+  deleteNotebook,
+  createSparkSession
 }) => {

   const jupyterBaseUrl= `${config.jupyterBaseUrl}`
@@ -35,7 +36,8 @@ const Code = ({
         notebook={notebook}
         runAllCells={runAllCells}
         saveNotebook={saveNotebook}
-        deleteNotebook={deleteNotebook}/>
+        deleteNotebook={deleteNotebook}
+        createSparkSession={createSparkSession}/>

       <Box sx={{
         width: '95%',

webapp/src/components/notebook/content/NotebookToolbar.js

Lines changed: 24 additions & 2 deletions
@@ -1,4 +1,4 @@
-import { VscSave, VscRunAll, VscTrash } from "react-icons/vsc";
+import { VscSave, VscRunAll, VscTrash, VscFlame } from "react-icons/vsc";
 import Tooltip from '@mui/material/Tooltip';
 import { Card, IconButton } from '@mui/material';
 import MoveButton from "../header/move/MoveButton";
@@ -7,7 +7,8 @@ const NotebookToolbar = ({
   notebook,
   runAllCells,
   saveNotebook,
-  deleteNotebook
+  deleteNotebook,
+  createSparkSession
 }) => {
   const headerIconSize = 13;

@@ -73,6 +74,27 @@ const NotebookToolbar = ({
         notebook={notebook}
         headerIconSize={headerIconSize}/>

+      {/* New Spark Button */}
+      <Tooltip title="Create Spark Session">
+        <IconButton
+          disableRipple
+          onClick={createSparkSession}
+          aria-label="spark"
+          sx={{
+            width: 'auto',
+            mt: 0.5 }}>
+          <VscFlame
+            size={headerIconSize}
+            onMouseEnter={(e) => {
+              e.currentTarget.style.color = 'black';
+            }}
+            onMouseLeave={(e) => {
+              e.currentTarget.style.color = 'black';
+            }}
+            style={{ color: 'black' }}/>
+        </IconButton>
+      </Tooltip>
+
       {/* Delete Button */}
       <Tooltip title="Delete Notebook">
         <IconButton

webapp/src/models/SparkModel.js

Lines changed: 67 additions & 0 deletions
@@ -44,6 +44,73 @@ class SparkModel {
     return applicationId;
   }

+  static async createSparkSession(notebookPath) {
+    try {
+      console.log('Creating Spark session for notebook:', notebookPath);
+      // First get the config for this notebook
+      const configResponse = await fetch(`${config.serverBaseUrl}/spark_app/${notebookPath}/config`);
+      console.log('Config response:', configResponse);
+
+      if (!configResponse.ok) {
+        throw new Error('Failed to fetch Spark configuration');
+      }
+      const sparkConfig = await configResponse.json();
+      console.log('Spark config:', sparkConfig);
+
+      // Generate a unique spark app ID
+      const sparkAppId = `spark-${Date.now()}`;
+
+      // Create a cell with Spark initialization code that uses the config
+      const sparkInitCode = `
+from pyspark.sql import SparkSession
+
+spark = SparkSession.builder\\
+    .appName("${sparkAppId}")\\
+    .master("spark://spark-master:7077")\\
+    .config("spark.jars.packages", "io.delta:delta-spark_2.12:3.0.0")\\
+    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")\\
+    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")\\
+    .config("spark.eventLog.enabled", "true")\\
+    .config("spark.eventLog.dir", "/opt/data/spark-events")\\
+    .config("spark.history.fs.logDirectory", "/opt/data/spark-events")\\
+    .config("spark.sql.warehouse.dir", "/opt/data/spark-warehouse")\\
+    .config("spark.executor.memory", "${sparkConfig['spark.executor.memory']}")\\
+    .config("spark.executor.cores", ${sparkConfig['spark.executor.cores']})\\
+    .config("spark.executor.instances", ${sparkConfig['spark.executor.instances']})\\
+    .config("spark.driver.memory", "${sparkConfig['spark.driver.memory']}")\\
+    .config("spark.driver.cores", ${sparkConfig['spark.driver.cores']})\\
+    .getOrCreate()
+
+spark
+`;
+
+      // Create the Spark session with this config
+      const response = await fetch(`${config.serverBaseUrl}/spark_app/${sparkAppId}`, {
+        method: 'POST',
+        headers: {
+          'Content-Type': 'application/json',
+        },
+        body: JSON.stringify({
+          notebookPath: notebookPath,
+          sparkInitCode: sparkInitCode
+        }),
+      });
+
+      if (!response.ok) {
+        throw new Error('Failed to create Spark session');
+      }
+
+      const data = await response.json();
+      return {
+        sparkAppId: sparkAppId,
+        initializationCode: sparkInitCode
+      };
+    } catch (error) {
+      console.error('Error creating Spark session:', error);
+      throw error;
+    }
+  }
+
 }

 export default SparkModel;
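The heart of `createSparkSession` is string interpolation: the flat config keys fetched from the server (`spark.executor.memory`, `spark.executor.cores`, and so on, as they appear in the diff) are spliced into a generated PySpark initialization cell. A Python analogue of that templating, covering only a subset of the options as a sketch rather than the actual generator:

```python
# Interpolate a fetched config dict into a PySpark init-cell string,
# mirroring the template literal in SparkModel.js (subset of options).
def build_spark_init_code(spark_app_id, cfg):
    return (
        'from pyspark.sql import SparkSession\n\n'
        'spark = SparkSession.builder\\\n'
        f'    .appName("{spark_app_id}")\\\n'
        '    .master("spark://spark-master:7077")\\\n'
        f'    .config("spark.executor.memory", "{cfg["spark.executor.memory"]}")\\\n'
        f'    .config("spark.executor.cores", {cfg["spark.executor.cores"]})\\\n'
        f'    .config("spark.driver.memory", "{cfg["spark.driver.memory"]}")\\\n'
        '    .getOrCreate()\n\nspark\n'
    )

code = build_spark_init_code('spark-42', {
    'spark.executor.memory': '1g',
    'spark.executor.cores': 1,
    'spark.driver.memory': '1g',
})
print('.appName("spark-42")' in code)  # True
```

As in the original, string-valued settings (memory sizes) are quoted in the generated code while numeric ones (core counts) are interpolated bare, which is why the template treats the two kinds of keys differently.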

0 commit comments
