Working with other libraries¶
Working with Spark¶
When working with Spark, you need to configure Java correctly: Spark only works with Java 8, while activeviam requires Java 11.
Set up the Java version¶
To use both activeviam and Spark, you will need both Java 11 and Java 8 installed on your system, and you must make sure that each library uses the correct version.
As Java 8 will soon be deprecated, we recommend keeping Java 11 as your default Java installation. Below are two ways to provide the required Java version to Spark.
Set JAVA_HOME directly inside Python¶
This is not the most elegant approach, but it is the easiest: temporarily modify the Java version in the environment when starting the Spark session.

```python
import os

from pyspark.sql import SparkSession

# First modify the env to point to Java 8
previous_java_home = os.environ["JAVA_HOME"]
os.environ["JAVA_HOME"] = "path/to/java8"

# Start the Spark session
spark = SparkSession.builder.appName("Demo").getOrCreate()

# Set the env variable back to its initial value
os.environ["JAVA_HOME"] = previous_java_home
```
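If creating the session raises an exception, the environment variable is never restored. A slightly more defensive variant of the same idea (a sketch; the Java path is a placeholder) wraps the restore in `try`/`finally`:

```python
import os

# Remember the current value so it can be restored afterwards.
previous_java_home = os.environ.get("JAVA_HOME")
os.environ["JAVA_HOME"] = "path/to/java8"  # placeholder path to a Java 8 install
try:
    # Create the Spark session while JAVA_HOME points to Java 8, e.g.:
    # spark = SparkSession.builder.appName("Demo").getOrCreate()
    pass
finally:
    # Restore the original value (or remove the variable if it was unset).
    if previous_java_home is None:
        os.environ.pop("JAVA_HOME", None)
    else:
        os.environ["JAVA_HOME"] = previous_java_home
```

This way JAVA_HOME is guaranteed to return to its initial state even if the session fails to start.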
Using standalone Spark¶
Pyspark’s main purpose is to connect to another Spark instance. One solution is to install a standalone Spark distribution, configure it, and then use it from pyspark:

- Install standalone Spark and pyspark (same version)
- Set your SPARK_HOME env variable to point to your standalone Spark installation (pyspark will then use it)
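As a concrete sketch of the second step (the paths and Spark version below are hypothetical examples, not values from this documentation), the environment can be configured from Python before pyspark is imported:

```python
import os

# Hypothetical locations; adjust to your own installations.
os.environ["SPARK_HOME"] = "/opt/spark-3.3.1-bin-hadoop3"  # standalone Spark
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk"    # Java 8 for Spark

# Import pyspark only *after* SPARK_HOME is set, so it picks up the
# standalone installation, e.g.:
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.appName("Demo").getOrCreate()
```

Setting SPARK_HOME in a shell profile instead of in Python achieves the same effect for every process.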