Getting Started

This short tutorial covers the very basics of ActiveViam. We encourage you to copy this notebook into your workspace, rerun it, and even modify it.

Session and imports

A session holds a connection to a Java subprocess where all the data is stored and the computation takes place.

Start by importing the ActiveViam library and creating your session:

[1]:
import activeviam as av

session = av.create_session()

Data sources and stores

Several data sources (CSV, Parquet, pandas…) are available to load data into ActiveViam stores. They are all described in the “Data sources” part of the tutorial; this Getting Started guide only uses CSV files.

When loading a source into a store, you must specify one or more key columns.

Clean your data before loading it into a store, for example in pandas or directly in your CSV file.
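Here is a minimal sketch of that cleaning step using only Python's standard-library csv module. The RAW text is a hypothetical extract. Dropping rows with an empty key and keeping the last row for a repeated key are choices made for this illustration, not documented ActiveViam behavior.

```python
import csv
import io

# Hypothetical raw extract standing in for data/example.csv: it has stray
# whitespace, a row with an empty ID, and a duplicated ID.
RAW = """ID,Date,Continent,Country,City,Color,Quantity,Price
1,2019-01-01,Europe,France, Paris ,red,1000,500
2,2019-01-02,Europe,France,Lyon,red,2000,400
,2019-01-03,Europe,France,Nice,red,500,450
2,2019-01-02,Europe,France,Lyon,red,2000,400
"""


def clean_rows(text, key="ID"):
    """Strip whitespace, drop rows without a key, keep the last row per key."""
    by_key = {}
    for row in csv.DictReader(io.StringIO(text)):
        row = {col: value.strip() for col, value in row.items()}
        if not row[key]:
            continue  # a row with no key value cannot be identified by its key
        by_key[row[key]] = row  # illustrative policy: a later duplicate wins
    return list(by_key.values())


rows = clean_rows(RAW)

# Write the cleaned rows back out as CSV text, ready to be saved to a file.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

You could then save the cleaned text to a file and load it with session.read_csv, as in the next cell.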

[2]:
first_store = session.read_csv(
    "data/example.csv", keys=["ID"], store_name="First store"
)

You can retrieve the first rows of the store with head():

[3]:
first_store.head()
[3]:
Date Continent Country City Color Quantity Price
ID
1 2019-01-01 Europe France Paris red 1000.0 500.0
2 2019-01-02 Europe France Lyon red 2000.0 400.0
3 2019-01-05 Europe France Paris blue 3000.0 420.0
4 2018-01-01 Europe France Bordeaux blue 1500.0 480.0
5 2019-01-01 Europe UK London green 3000.0 460.0

You can view a store’s columns with columns:

[4]:
first_store.columns
[4]:
0           ID
1         Date
2    Continent
3      Country
4         City
5        Color
6     Quantity
7        Price
dtype: object

References

A reference is a link between two stores.

You can specify the column mapping of the reference, or use the default mapping, which matches the columns that have the same name in both stores:

[5]:
capitals_store = session.read_csv(
    "data/capitals.csv", keys=["Country name"], store_name="Capitals"
)
[6]:
capitals_store.head()
[6]:
Capital
Country name
France Paris
UK London
China Beijing
India Dehli
[7]:
first_store.join(capitals_store, mapping={"Country": "Country name"})
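To make the mapping concrete, here is a plain-Python model of what it expresses. It is not how ActiveViam implements references. Each row of the first store is matched to the capitals row whose Country name equals its Country. The helpers resolve_mapping and join_rows, the Spain row, and filling unmatched rows with None are all hypothetical.

```python
# Plain-Python model of a column mapping; NOT the ActiveViam API.


def resolve_mapping(left_columns, right_columns, mapping=None):
    """Return {left column: right column}, defaulting to same-named columns."""
    if mapping is not None:
        return dict(mapping)
    return {col: col for col in left_columns if col in right_columns}


def join_rows(left_rows, right_rows, mapping):
    """Attach the non-key columns of the matching right row to each left row."""
    right_keys = list(mapping.values())
    index = {tuple(r[c] for c in right_keys): r for r in right_rows}
    extra_cols = [c for c in right_rows[0] if c not in right_keys]
    joined = []
    for row in left_rows:
        match = index.get(tuple(row[c] for c in mapping))
        # Unmatched rows get None here; this is a choice of the sketch only.
        extra = {c: match[c] if match else None for c in extra_cols}
        joined.append({**row, **extra})
    return joined


first_rows = [
    {"ID": 2, "Country": "France", "City": "Lyon"},
    {"ID": 5, "Country": "UK", "City": "London"},
    {"ID": 99, "Country": "Spain", "City": "Madrid"},  # hypothetical row
]
capital_rows = [
    {"Country name": "France", "Capital": "Paris"},
    {"Country name": "UK", "Capital": "London"},
]

# No column name is shared, so the default mapping is empty...
default = resolve_mapping(first_rows[0], capital_rows[0])
# ...which is why an explicit {"Country": "Country name"} mapping is needed.
mapping = resolve_mapping(first_rows[0], capital_rows[0], {"Country": "Country name"})
for row in join_rows(first_rows, capital_rows, mapping):
    print(row["City"], row["Country"], row["Capital"])
```

The two stores share no column name, so the default mapping would be empty here. That is why the cell above passes an explicit mapping.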

Cubes

A cube can be defined on a store.

All the non-numerical columns of the store and of the referenced stores are converted into single-level dimensions. Default measures are created from the numerical columns.
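You can model the column split described above in plain Python, using the five rows printed by head(). This is a rough sketch, not ActiveViam's logic. It assumes that key columns also become hierarchies, because the hierarchy list printed further down includes ID. It creates SUM and AVG measures plus contributors.COUNT to mirror the measure names in the query output below. The sketch only sees five of the ten rows, so its numbers differ from that output.

```python
from statistics import mean

# The five rows printed by first_store.head() above (Date kept as a string).
COLUMNS = ["ID", "Date", "Continent", "Country", "City", "Color", "Quantity", "Price"]
ROWS = [
    dict(zip(COLUMNS, values))
    for values in [
        (1, "2019-01-01", "Europe", "France", "Paris", "red", 1000.0, 500.0),
        (2, "2019-01-02", "Europe", "France", "Lyon", "red", 2000.0, 400.0),
        (3, "2019-01-05", "Europe", "France", "Paris", "blue", 3000.0, 420.0),
        (4, "2018-01-01", "Europe", "France", "Bordeaux", "blue", 1500.0, 480.0),
        (5, "2019-01-01", "Europe", "UK", "London", "green", 3000.0, 460.0),
    ]
]


def default_cube(rows, keys):
    """Rough model: keys and non-numeric columns -> hierarchies, rest -> measures."""
    columns = list(rows[0])
    numeric = [
        c for c in columns
        if c not in keys and all(isinstance(r[c], (int, float)) for r in rows)
    ]
    hierarchies = [c for c in columns if c not in numeric]
    measures = {"contributors.COUNT": len(rows)}
    for c in numeric:
        values = [r[c] for r in rows]
        measures[f"{c}.SUM"] = sum(values)
        measures[f"{c}.AVG"] = mean(values)
    return hierarchies, measures


hierarchies, measures = default_cube(ROWS, keys={"ID"})
print(hierarchies)
print(measures)
```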

[8]:
cube = session.create_cube(first_store, "FirstCube")
[9]:
m = cube.measures
cube.query()
[9]:
Price.AVG Price.SUM Quantity.AVG Quantity.SUM contributors.COUNT
0 428.0 4280.0 2270.0 22700.0 10
[10]:
lvl = cube.levels
h = cube.hierarchies
h
[10]:
  • Dimensions
    • Hierarchies
      • Continent
        1. Continent
      • Color
        1. Color
      • Country
        1. Country
      • Capital
        1. Capital
      • City
        1. City
      • ID
        1. ID
      • Date
        1. Date

Measures

New custom measures can be added to your cube.

You will learn more about what is possible in the measure tutorial.

[11]:
m["Half quantity"] = av.agg.sum(first_store["Quantity"]) / 2
cube.query(m["Half quantity"])
[11]:
Half quantity
0 11350.0
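Half quantity is a formula applied to an aggregate: Quantity is summed first and the sum is then halved. That is why the result above, 11350.0, is half of the Quantity.SUM value of 22700.0. The plain-Python sketch below models this on the five head() rows. Grouping by country is an assumption added to show the formula being re-evaluated for each cell, since this tutorial only queries the grand total.

```python
from collections import defaultdict

# (Country, Quantity) from the five rows printed by first_store.head() above.
ROWS = [("France", 1000.0), ("France", 2000.0), ("France", 3000.0),
        ("France", 1500.0), ("UK", 3000.0)]


def half_quantity(rows, by_country=False):
    """Plain-Python model, not the ActiveViam API: sum per cell, then / 2."""
    sums = defaultdict(float)
    for country, quantity in rows:
        sums[country if by_country else "Total"] += quantity
    return {cell: total / 2 for cell, total in sums.items()}


print(half_quantity(ROWS))                   # one cell for the whole dataset
print(half_quantity(ROWS, by_country=True))  # one cell per country (assumed grouping)
```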

Interactive visualization

There are two ways to build interactive data visualizations: with the JupyterLab extension and with the ActiveViam application.

JupyterLab extension

When our JupyterLab extension is installed, you can build interactive widgets directly in your notebook:

cube.visualize()

ActiveViam Application

There is also a full dashboarding suite available. Its URL can be retrieved from the session:

[12]:
session.url