Working with other libraries

Working with Spark

When working with Spark, you need to configure Java correctly, because Spark only works with Java 8 while activeviam works with Java 11.

Set up the Java version

To work with both activeviam and Spark, you need both Java 11 and Java 8 installed on your system, and each library must use the correct version.

As Java 8 will soon be deprecated, we recommend using Java 11 as your default Java installation. Below are two ways to provide the required Java version to Spark.

Set JAVA_HOME directly inside Python

This is not the most elegant approach, but it is the easiest: point JAVA_HOME at Java 8 in the environment before starting the Spark session, then restore its initial value.

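import os

from pyspark.sql import SparkSession

# First modify the env to point to Java 8 (before Spark starts its JVM)
previous_java_home = os.environ.get("JAVA_HOME")
os.environ["JAVA_HOME"] = "path/to/java8"

# Start the Spark session
spark = SparkSession.builder.appName("Demo").getOrCreate()

# Set the env variable back to its initial value (or unset it if it was not defined)
if previous_java_home is None:
    del os.environ["JAVA_HOME"]
else:
    os.environ["JAVA_HOME"] = previous_java_home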
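
The set-and-restore pattern above can also be wrapped in a context manager, so that JAVA_HOME is restored even if starting the session raises an error. This is only a minimal sketch and not part of activeviam or pyspark: the helper name temporary_java_home is hypothetical, and path/to/java8 is the same placeholder as above.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_java_home(java_home):
    """Temporarily point JAVA_HOME at `java_home` (hypothetical helper)."""
    previous = os.environ.get("JAVA_HOME")
    os.environ["JAVA_HOME"] = java_home
    try:
        yield
    finally:
        # Restore the initial value, or unset the variable if it was not defined
        if previous is None:
            os.environ.pop("JAVA_HOME", None)
        else:
            os.environ["JAVA_HOME"] = previous


# Usage (assumes pyspark is installed and SparkSession is imported):
# with temporary_java_home("path/to/java8"):
#     spark = SparkSession.builder.appName("Demo").getOrCreate()
```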

Using standalone Spark

PySpark’s main purpose is to connect to another Spark instance. One solution is to install a standalone Spark, configure it, and then use it from pyspark:

  • Install Spark standalone and pyspark (same version)

  • Set your SPARK_HOME environment variable to the path of your standalone Spark installation (pyspark will now use it)

  • In your $SPARK_HOME/conf/spark-env.sh set JAVA_HOME=/path/to/java8
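
Before starting a session, it can help to check that the steps above took effect. The following is a rough sanity check, assuming spark-env.sh sets JAVA_HOME with a plain JAVA_HOME=... or export JAVA_HOME=... line; the helper name java_home_from_spark_env is hypothetical and the parsing is deliberately simplistic (it does not evaluate shell syntax).

```python
import os
from pathlib import Path


def java_home_from_spark_env(spark_home):
    """Return the JAVA_HOME value set in conf/spark-env.sh, or None (hypothetical helper)."""
    env_file = Path(spark_home) / "conf" / "spark-env.sh"
    if not env_file.is_file():
        return None
    java_home = None
    for line in env_file.read_text().splitlines():
        line = line.strip()
        # Accept both "JAVA_HOME=..." and "export JAVA_HOME=..."
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        if line.startswith("JAVA_HOME="):
            # Keep the last assignment found, as the shell would
            java_home = line[len("JAVA_HOME="):].strip().strip("\"'")
    return java_home


spark_home = os.environ.get("SPARK_HOME")
if spark_home is None:
    print("SPARK_HOME is not set")
else:
    print("JAVA_HOME in spark-env.sh:", java_home_from_spark_env(spark_home))
```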