Working with other libraries¶
Working with Spark¶
There are two configuration tweaks required to work with Spark:
Set up Java correctly, as Spark only works with Java 8 while activeviam works with Java 11.
Use the correct version of py4j, as pyspark uses version 0.10.7 while activeviam uses 0.10.8.1.
Setup Java version¶
To work with both activeviam and Spark, you will need both Java 8 and Java 11 installed on your system, and to make sure that each library uses the correct version.
As Java 8 will soon be deprecated, we recommend using Java 11 as your default Java installation. Below are two ways to provide the required Java version to Spark.
Setup JAVA_HOME directly inside Python¶
This is not the most elegant approach, but it is the easiest: modify the Java version in the environment when starting the Spark session.
from pyspark.sql import SparkSession
import os

# First modify the env to point to Java 8
previous_java_home = os.environ["JAVA_HOME"]
os.environ["JAVA_HOME"] = "path/to/java8"

# Start the Spark session
spark = SparkSession.builder.appName("Demo").getOrCreate()

# Set the env variable back to its initial value
os.environ["JAVA_HOME"] = previous_java_home
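A tidier variant (a sketch using only the standard library; `java_home` is a hypothetical helper, not part of activeviam or pyspark) wraps the environment change in a context manager, so that JAVA_HOME is restored even if starting the session raises:

```python
import os
from contextlib import contextmanager

@contextmanager
def java_home(path):
    """Temporarily point JAVA_HOME at *path*, restoring the previous value on exit."""
    previous = os.environ.get("JAVA_HOME")
    os.environ["JAVA_HOME"] = path
    try:
        yield
    finally:
        # Restore (or remove) JAVA_HOME even if the body raised.
        if previous is None:
            del os.environ["JAVA_HOME"]
        else:
            os.environ["JAVA_HOME"] = previous

# Usage: start the Spark session while JAVA_HOME points to Java 8.
# with java_home("path/to/java8"):
#     spark = SparkSession.builder.appName("Demo").getOrCreate()
```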
Using standalone Spark¶
Pyspark’s main purpose is to connect to another Spark instance. One solution is to install a standalone Spark, configure it, and then use it from pyspark:
Install Spark standalone and pyspark (same version)
Set your SPARK_HOME env variable to your standalone Spark installation (pyspark will now use it)
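For example (a minimal sketch; the path below is a placeholder for wherever you unpacked the standalone Spark), you can also set the variable from Python, as long as you do it before the first pyspark import:

```python
import os

# Placeholder path: point this at your standalone Spark installation.
os.environ["SPARK_HOME"] = "/opt/spark"

# Any pyspark import after this point will delegate to the
# standalone installation found in SPARK_HOME.
```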
Setup py4j version¶
py4j is a connector between Python and Java used by both activeviam and Spark.
We recommend using the latest version of py4j, even though Spark requires 0.10.7. There is no conflict, and the latest version has better support for the newest Python versions.
For instance, with conda you can run:
conda install py4j==0.10.8.1