Array measures

ActiveViam is optimized to handle array data

Loading arrays from CSV

ActiveViam can load array from CSV files. The separator for array elements must be provided to the read_csv method, and the CSV columns must use another separator. All the arrays should have the same length.

[1]:
import activeviam as av

session = av.create_session()
[2]:
store = session.read_csv(
    "data/arrays.csv", keys=["TradeId"], store_name="Store With Arrays", array_sep=";"
)
store.head()
[2]:
Continent Country City PnL PnlArray
TradeId
0 Europe France Paris 3.469516 doubleVector[10]{-0.46511920664962714, ...}
1 Europe France Paris 92.851919 doubleVector[10]{10.587927935842618, ...}
2 Europe France Lyon 425.866214 doubleVector[10]{68.43655816283167, ...}
3 Europe France Bordeaux 454.127060 doubleVector[10]{-149.61978026139195, ...}
4 Europe UK London 187.015613 doubleVector[10]{11.449089651224922, ...}
[3]:
cube = session.create_cube(store, "Cube")

Arrays default aggregations

ActiveViam provides default aggregation functions on arrays : the SUM and the AVG. They are applied element by element on the array.

[4]:
lvl = cube.levels
m = cube.measures
cube.query(m["PnlArray.SUM"], levels=lvl["Continent"])
[4]:
PnlArray.SUM
Continent
Asia [-14.696414629727794, -35.38205147348333, -62....
Europe [6.711474315042935, -1.4346671713308226, -72.3...
[5]:
cube.query(m["PnlArray.AVG"], levels=lvl["Continent"])
[5]:
PnlArray.AVG
Continent
Asia [-2.939282925945559, -7.076410294696666, -12.5...
Europe [1.1185790525071557, -0.23911119522180374, -12...

Additional array functions

Sum, Average, Min or Max of all the array elements

[6]:
m["sum vect"] = av.array.sum(m["PnlArray.SUM"])
cube.query(m["sum vect"], levels=lvl["Continent"])
[6]:
sum vect
Continent
Asia 357.870873
Europe 49.101546
[7]:
m["avg vect"] = av.array.avg(m["PnlArray.SUM"])
cube.query(m["avg vect"], levels=lvl["Continent"])
[7]:
avg vect
Continent
Asia 35.787087
Europe 4.910155
[8]:
m["min vect"] = av.array.min(m["PnlArray.SUM"])
cube.query(m["min vect"], levels=lvl["Continent"])
[8]:
min vect
Continent
Asia -222.821477
Europe -86.079550
[9]:
m["max vect"] = av.array.max(m["PnlArray.SUM"])
cube.query(m["max vect"], levels=lvl["Continent"])
[9]:
max vect
Continent
Asia 232.607605
Europe 195.824737

Length

[10]:
m["length"] = av.array.len(m["PnlArray.SUM"])
cube.query(m["length"])
[10]:
length
0 10

Variance

[11]:
m["variance"] = av.array.variance(m["PnlArray.SUM"])
cube.query(m["variance"], levels=lvl["Continent"])
[11]:
variance
Continent
Asia 16149.335147
Europe 6952.275020

Sort

[12]:
m["sort"] = av.array.sort(m["PnlArray.SUM"])
cube.query(m["sort"], levels=lvl["Continent"])
[12]:
sort
Continent
Asia [-222.82147745688468, -62.89214785725346, -35....
Europe [-86.07955036608166, -72.30864853533133, -71.1...

Percentile

[13]:
m["95 percentile"] = av.array.percentile(m["PnlArray.SUM"], 0.95, "simple")
m["95 exc percentile"] = av.array.percentile(m["PnlArray.SUM"], 0.95, "exc")
m["95 inc percentile"] = av.array.percentile(m["PnlArray.SUM"], 0.95, "inc")
m["95 centered percentile"] = av.array.percentile(m["PnlArray.SUM"], 0.95, "centered")
cube.query(
    m["95 percentile"],
    m["95 exc percentile"],
    m["95 inc percentile"],
    m["95 centered percentile"],
    levels=[lvl["Continent"], lvl["Country"]],
)
[13]:
95 percentile 95 exc percentile 95 inc percentile 95 centered percentile
Continent Country
Asia China 195.220094 256.014079 201.299492 256.014079
India 22.352927 25.270438 22.644678 25.270438
Europe France 128.173379 163.731662 131.729208 163.731662
UK 112.243790 146.715692 115.690980 146.715692
[14]:
m["95 linear"] = av.array.percentile(m["PnlArray.SUM"], 0.95, "inc", "linear")
m["95 lower"] = av.array.percentile(m["PnlArray.SUM"], 0.95, "inc", "lower")
m["95 higher"] = av.array.percentile(m["PnlArray.SUM"], 0.95, "inc", "higher")
m["95 nearest"] = av.array.percentile(m["PnlArray.SUM"], 0.95, "inc", "nearest")
m["95 midpoint"] = av.array.percentile(m["PnlArray.SUM"], 0.95, "inc", "midpoint")
cube.query(
    m["95 linear"], m["95 lower"], m["95 higher"], m["95 nearest"], m["95 midpoint"]
)
[14]:
95 linear 95 lower 95 higher 95 nearest 95 midpoint
0 299.288892 290.080802 306.822784 306.822784 298.451793

n greatest / n lowest

Returns an array with the n greatest/lowest values of a another array.

[15]:
m["Top 3"] = av.array.n_greatest(m["PnlArray.SUM"], 3)
cube.query(m["Top 3"])
[15]:
Top 3
0 [290.0808017776343, 306.8227843309065, 195.349...
[16]:
m["Bottom 2"] = av.array.n_lowest(m["PnlArray.SUM"], 2)
cube.query(m["Bottom 2"])
[16]:
Bottom 2
0 [-135.20079639258478, -308.90102782296634]

nth greatest value / nth lowest value

Returns nth greatest or lowest value of a vector

[17]:
m["Third largest value"] = av.array.nth_greatest(m["PnlArray.SUM"], 3)
cube.query(m["Third largest value"])
[17]:
Third largest value
0 195.349055
[18]:
m["Second smallest value"] = av.array.nth_lowest(m["PnlArray.SUM"], 2)
cube.query(m["Second smallest value"])
[18]:
Second smallest value
0 -135.200796

Scale

[19]:
m["scale x10"] = m["PnlArray.SUM"] * 10.0
cube.query(m["scale x10"])
[19]:
scale x10
0 [-79.8494031468486, -368.16718644814154, -1352...

Element at index

Extract the element at a given index

[20]:
m["first element"] = m["PnlArray.SUM"][0]
cube.query(m["first element"], m["PnlArray.SUM"])
[20]:
first element PnlArray.SUM
0 -7.98494 [-7.984940314684859, -36.81671864481415, -135....

With the create_parameter_hierarchy function it is possible to create a hierarchy corresponding to the indices of the array. This hierarchy can then be used to “slice” this array and create a measure which depends on the selected index.

[21]:
cube.hierarchies.create_parameter_hierarchy(
    [i for i in range(0, 10)], level_name="index"
)
m["PnL at index"] = m["PnlArray.SUM"][lvl["index"]]
cube.query(m["PnL at index"], levels=lvl["index"])
[21]:
PnL at index
index
0 -7.984940
1 -36.816719
2 -135.200796
3 30.371913
4 195.349055
5 306.822784
6 96.072594
7 -308.901028
8 290.080802
9 -22.821246

Sub-arrays

Extract a slice of the array

[22]:
m["first 2 elements"] = m["PnlArray.SUM"][0:2]
cube.query(m["first 2 elements"], m["PnlArray.SUM"])
[22]:
first 2 elements PnlArray.SUM
0 [-7.984940314684859, -36.81671864481415] [-7.984940314684859, -36.81671864481415, -135....