
BIGTOP-3908 Upgrade Spark Packages for PySpark Requires Python3 #1087

Open · wants to merge 1 commit into master
Conversation

@vivostar (Contributor) commented Feb 12, 2023

Description of PR

Upgrade the Spark RPM packages so that PySpark requires Python 3, per the Spark documentation.

How was this patch tested?

./docker-hadoop.sh \
       -d \
       -dcp \
       --create 1 \
       --image bigtop/puppet:trunk-rockylinux-8 \
       --memory 8g \
       -L \
       --repo file:///bigtop-home/output \
       --disable-gpg-check \
       --stack hdfs,yarn,mapreduce,spark,hive
[root@dockert docker]# ./docker-hadoop.sh -dcp -e 1 /bin/bash
[root@02f673194720 /]# pyspark
    ...
>>> from datetime import datetime, date
>>> from pyspark.sql import Row
>>> 
>>> df = spark.createDataFrame([
...     Row(a=1, b=2., c='string1', d=date(2000, 1, 1), e=datetime(2000, 1, 1, 12, 0)),
...     Row(a=2, b=3., c='string2', d=date(2000, 2, 1), e=datetime(2000, 1, 2, 12, 0)),
...     Row(a=4, b=5., c='string3', d=date(2000, 3, 1), e=datetime(2000, 1, 3, 12, 0))
... ])
>>> df
DataFrame[a: bigint, b: double, c: string, d: date, e: timestamp]
>>> df.show()
+---+---+-------+----------+-------------------+                                
|  a|  b|      c|         d|                  e|
+---+---+-------+----------+-------------------+
|  1|2.0|string1|2000-01-01|2000-01-01 12:00:00|
|  2|3.0|string2|2000-02-01|2000-01-02 12:00:00|
|  4|5.0|string3|2000-03-01|2000-01-03 12:00:00|
+---+---+-------+----------+-------------------+

>>> 
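As an extra sanity check (not part of the original test log), the interpreter version can be confirmed from inside the `pyspark` shell; with this patch it should be a Python 3 build:

```python
import sys

# Sanity check (illustrative, not from the original log): PySpark 3.x
# requires the driver interpreter to be Python 3.
assert sys.version_info.major >= 3, "PySpark 3.x requires Python 3"
print(sys.version.split()[0])  # the interpreter version string
```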
  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'BIGTOP-3638. Your PR title ...')?
  • Make sure that newly added files do not have any licensing issues. When in doubt, refer to https://www.apache.org/licenses/

 %else
-Requires: %{spark_pkg_name}-core = %{version}-%{release}, python
+Requires: %{spark_pkg_name}-core = %{version}-%{release}, python36
 %endif
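For illustration, a distro-conditional `Requires:` could keep CentOS 7 on `python36` while newer distros pull in a later interpreter. This is a sketch only: the `%if` condition and the `python3` package name on EL8+ are assumptions, while `%{spark_pkg_name}` follows the existing spec macros:

```
%if 0%{?rhel} && 0%{?rhel} <= 7
Requires: %{spark_pkg_name}-core = %{version}-%{release}, python36
%else
Requires: %{spark_pkg_name}-core = %{version}-%{release}, python3
%endif
```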
Member

Currently, Bigtop uses Spark 3.2, but since Spark 3.3 the minimal Python version is 3.7, so can we use a higher version to make it more future-proof?
https://spark.apache.org/docs/3.3.2/api/python/getting_started/install.html
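A minimal sketch of the guard this suggestion implies (a hypothetical helper, not Bigtop code): reject interpreters older than the minimum that the Spark 3.3 documentation lists for PySpark.

```python
import sys

# Spark 3.3 documents Python 3.7 as the minimum supported version
# for PySpark (see the install page linked above).
MIN_PYTHON = (3, 7)

def meets_minimum(version=sys.version_info, minimum=MIN_PYTHON):
    """True when the interpreter's (major, minor) meets the minimum."""
    return tuple(version[:2]) >= minimum

if not meets_minimum():
    raise RuntimeError("PySpark needs Python %d.%d or newer" % MIN_PYTHON)
```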

@vivostar (Contributor, Author) commented Mar 26, 2023
Thanks for your suggestions. I think we can use a higher Python 3 version.

@kevinw66 (Contributor)

PySpark gets stuck on CentOS 7.

[root@d84cd50ac3f6 /]# pyspark
Python 3.6.8 (default, Nov 16 2020, 16:55:22) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux
Type "help", "copyright", "credits" or "license" for more information.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/spark/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/06/15 13:34:41 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
23/06/15 13:34:42 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
### stuck here

BTW, could you also upgrade Python for the Spark deb packages?
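One common workaround for a hang after the `spark.yarn.jars` warning (a sketch, not part of this PR; the HDFS path is an assumption) is to pre-stage the Spark jars as an archive on HDFS so the YARN client does not upload everything under `SPARK_HOME` on each launch:

```
# /etc/spark/conf/spark-defaults.conf  (illustrative; the HDFS path is an assumption)
spark.yarn.archive    hdfs:///apps/spark/spark-jars.zip
```

The archive would first be built from the jars in `$SPARK_HOME/jars` and uploaded to that path.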

@vivostar vivostar changed the title BIGTOP-3908 Upgrade Spark Rpm Packages for PySpark Requires Python3 BIGTOP-3908 Upgrade Spark Packages for PySpark Requires Python3 Aug 5, 2023
3 participants