Earlier this year, Square released a JavaScript library called Crossfilter. Crossfilter is like a client-side OLAP server: it groups, filters, and aggregates tens or hundreds of thousands of rows of raw data very quickly. Crossfilter is intended to be combined with a graphing or charting library like D3, Highcharts, or the Google Visualization API; it doesn’t have a UI of its own.
If you have experience with OLAP or multi-dimensional processing, you should be able to ramp up on Crossfilter fairly quickly. If you only have experience with relational databases, it may take a little longer. If you’ve never used the SQL group-by feature, then you face a steep learning curve.
First, you’ll need to understand facts, dimensions, and measures. (If you’re already familiar with these terms, then skip this section.)
Imagine you want to answer the question “How many orders do we process per week?” You could calculate this by hand by iterating through all of the orders that your business has processed, grouping them into weeks. In this case, each order entry would be called a fact, and you would probably store this in an OrderFacts table. The week would be a dimension; it is a way you want to slice the data. And the count of orders would be a measure; it is a value that you want to calculate.
Imagine another question, “How much revenue do we book per salesperson per week?” Again, your facts would be stored in an OrderFacts table. You would now have two dimensions, salesperson and week. And finally, your measure is dollars per order.
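To make the vocabulary concrete, the two questions above can be sketched in plain JavaScript, without Crossfilter. The `orderFacts` rows, their field names, and the `rollUp` helper are all invented for this illustration; they are not part of any library.

```javascript
// Hypothetical fact rows; names and values are invented for this sketch.
var orderFacts = [
  { salesperson: "Ann", week: 1, dollars: 100 },
  { salesperson: "Bob", week: 1, dollars: 250 },
  { salesperson: "Ann", week: 2, dollars: 75 },
  { salesperson: "Ann", week: 2, dollars: 25 }
];

// Group facts by a dimension key and fold each group into a measure.
function rollUp(facts, keyFn, valueFn) {
  var totals = {};
  facts.forEach(function (fact) {
    var key = keyFn(fact);
    totals[key] = (totals[key] || 0) + valueFn(fact);
  });
  return totals;
}

// Dimension: week. Measure: count of orders.
var ordersPerWeek = rollUp(
  orderFacts,
  function (fact) { return fact.week; },
  function () { return 1; }
);

// Dimensions: salesperson and week. Measure: sum of dollars.
var revenuePerSalespersonWeek = rollUp(
  orderFacts,
  function (fact) { return fact.salesperson + "/week " + fact.week; },
  function (fact) { return fact.dollars; }
);

console.log(ordersPerWeek);
console.log(revenuePerSalespersonWeek);
```

The second call joins two dimension values into one composite key, which is only one of several ways to slice by more than one dimension.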
Below, we’re going to answer some questions like “How many living things live in my house?” and “How many legs of each type exist in my house?”
It’s incredibly easy to get your fact data into Crossfilter: just use JSON. Each row is a fact.
Below, we’ve created a Crossfilter object loaded with facts about the living things in my house.
(Note: These are, for the most part, “fictional” facts. I don’t actually have any pets, but it makes for a good tutorial.)
var livingThings = crossfilter([
  // Fact data.
  { name: "Rusty", type: "human", legs: 2 },
  { name: "Alex", type: "human", legs: 2 },
  { name: "Lassie", type: "dog", legs: 4 },
  { name: "Spot", type: "dog", legs: 4 },
  { name: "Polly", type: "bird", legs: 2 },
  { name: "Fiona", type: "plant", legs: 0 }
]);
That’s it. Now let’s find out some totals. For example, how many living things are in my house?
To do this, we’ll call the groupAll convenience function, which selects all records into a single group, and then the reduceCount function, which creates a count of the records. Not very useful so far.
// How many living things are in my house?
var n = livingThings.groupAll().reduceCount().value();
console.log("There are " + n + " living things in my house."); // 6
Now let’s get a count of all the legs in my house. Again, we’ll use the groupAll function to get all records in a single group, but then we call the reduceSum function. This is going to sum values together. What values? Well, we want legs, so let’s pass a function that extracts and returns the number of legs from the fact.
// How many total legs are in my house?
var legs = livingThings.groupAll().reduceSum(function(fact) { return fact.legs; }).value();
console.log("There are " + legs + " legs in my house."); // 14
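As a sanity check on the two totals above, the same numbers can be computed over a plain array with no Crossfilter involved. This is only a sketch of what the two reductions compute; the variable names `facts`, `count`, and `totalLegs` are chosen for this example.

```javascript
// The tutorial's fact rows as a plain array (no Crossfilter involved).
var facts = [
  { name: "Rusty", type: "human", legs: 2 },
  { name: "Alex", type: "human", legs: 2 },
  { name: "Lassie", type: "dog", legs: 4 },
  { name: "Spot", type: "dog", legs: 4 },
  { name: "Polly", type: "bird", legs: 2 },
  { name: "Fiona", type: "plant", legs: 0 }
];

// What groupAll().reduceCount() computes: one group, counting every row.
var count = facts.length;

// What groupAll().reduceSum(fn) computes: one group, adding fn(fact) per row.
var totalLegs = facts.reduce(function (sum, fact) {
  return sum + fact.legs;
}, 0);

console.log(count);     // 6
console.log(totalLegs); // 14
```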
Now let’s test out some of the filtering functionality.
I want to know how many living things in my house are dogs, and how many legs they have. For this, we’ll need a dimension. Remember that a dimension is something you want to group or filter by. Here, the dimension is going to be the type. Crossfilter can filter on dimensions in two ways, either by exact value, or by range.
Below, we construct a typeDimension and filter it:
// Filter for dogs.
var typeDimension = livingThings.dimension(function(d) { return d.type; });
typeDimension.filter("dog");
That’s it. Dimensions are stateful, so Crossfilter knows about our filter and will ensure that all future operations only work on dogs, except for any calculations performed directly on typeDimension. This is expected behavior, but I’m not sure if it’s a design choice or a design necessity. (We’ll look at the workaround later.)
var n = livingThings.groupAll().reduceCount().value();
console.log("There are " + n + " dogs in my house."); // 2
var legs = livingThings.groupAll().reduceSum(function(fact) {
  return fact.legs;
}).value();
console.log("There are " + legs + " dog legs in my house."); // 8
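The dog filter is an exact-value filter. The range form, per Crossfilter's API reference, takes a two-element array, as in `dimension.filter([lo, hi])` or `dimension.filterRange([lo, hi])`, and keeps values where lo <= value < hi. The plain-JavaScript sketch below models both semantics; `matchExact` and `matchRange` are helpers written for this example, not Crossfilter calls.

```javascript
// The tutorial's fact rows as a plain array (no Crossfilter involved).
var facts = [
  { name: "Rusty", type: "human", legs: 2 },
  { name: "Alex", type: "human", legs: 2 },
  { name: "Lassie", type: "dog", legs: 4 },
  { name: "Spot", type: "dog", legs: 4 },
  { name: "Polly", type: "bird", legs: 2 },
  { name: "Fiona", type: "plant", legs: 0 }
];

// Exact match: keep facts whose dimension value equals the target.
function matchExact(rows, valueFn, target) {
  return rows.filter(function (fact) { return valueFn(fact) === target; });
}

// Range match: keep facts with lo <= value < hi (upper bound excluded).
function matchRange(rows, valueFn, lo, hi) {
  return rows.filter(function (fact) {
    var value = valueFn(fact);
    return value >= lo && value < hi;
  });
}

function nameOf(fact) { return fact.name; }

var dogs = matchExact(facts, function (f) { return f.type; }, "dog");
var twoOrThreeLegs = matchRange(facts, function (f) { return f.legs; }, 2, 4);

console.log(dogs.map(nameOf).join(", "));           // Lassie, Spot
console.log(twoOrThreeLegs.map(nameOf).join(", ")); // Rusty, Alex, Polly
```

Because the upper bound is excluded, the range 2 to 4 leaves out the four-legged dogs.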
Let’s clear the filter, then do some grouping.
// Clear the filter.
typeDimension.filterAll()
I want to know how many living things of each type are in my house. I already have a dimension on type called typeDimension. Using typeDimension, I’m going to group the records by type, and then create a measure called countMeasure that returns the count. Once countMeasure is created, we can find the number of entries by calling countMeasure.size() (a.k.a. the cardinality of the type dimension), and we can get the actual counts by calling countMeasure.top(size).
// How many living things of each type are in my house?
var countMeasure = typeDimension.group().reduceCount();
var a = countMeasure.top(4);
console.log("There are " + a[0].value + " " + a[0].key + "(s) in my house.");
console.log("There are " + a[1].value + " " + a[1].key + "(s) in my house.");
console.log("There are " + a[2].value + " " + a[2].key + "(s) in my house.");
console.log("There are " + a[3].value + " " + a[3].key + "(s) in my house.");
Awesome. Now let’s count legs by type. For this, we’ll create a measure called legMeasure. This will use the reduceSum function instead of reduceCount, and we’ll provide a function that tells Crossfilter what field we want to sum.
// How many legs of each type are in my house?
var legMeasure = typeDimension.group().reduceSum(function(fact) { return fact.legs; });
var a = legMeasure.top(4);
console.log("There are " + a[0].value + " " + a[0].key + " legs in my house.");
console.log("There are " + a[1].value + " " + a[1].key + " legs in my house.");
console.log("There are " + a[2].value + " " + a[2].key + " legs in my house.");
console.log("There are " + a[3].value + " " + a[3].key + " legs in my house.");
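To see which numbers the two grouped measures should produce, here is a plain-JavaScript sketch of the same roll-ups. The `groupBy` helper is written for this example and only imitates the shape of a Crossfilter group result (an array of key/value entries, largest value first); it is not Crossfilter's implementation.

```javascript
// The tutorial's fact rows as a plain array (no Crossfilter involved).
var facts = [
  { name: "Rusty", type: "human", legs: 2 },
  { name: "Alex", type: "human", legs: 2 },
  { name: "Lassie", type: "dog", legs: 4 },
  { name: "Spot", type: "dog", legs: 4 },
  { name: "Polly", type: "bird", legs: 2 },
  { name: "Fiona", type: "plant", legs: 0 }
];

// Group rows by key, total valueFn per group, and return
// [{ key, value }] entries sorted by value, largest first.
function groupBy(rows, keyFn, valueFn) {
  var totals = {};
  rows.forEach(function (fact) {
    var key = keyFn(fact);
    totals[key] = (totals[key] || 0) + valueFn(fact);
  });
  return Object.keys(totals)
    .map(function (key) { return { key: key, value: totals[key] }; })
    .sort(function (x, y) { return y.value - x.value; });
}

function typeOf(fact) { return fact.type; }

// Count of living things per type: human 2, dog 2, bird 1, plant 1.
var countsByType = groupBy(facts, typeOf, function () { return 1; });

// Sum of legs per type: dog 8, human 4, bird 2, plant 0.
var legsByType = groupBy(facts, typeOf, function (fact) { return fact.legs; });

// Looping over the entries avoids repeating one console.log per group.
countsByType.forEach(function (g) {
  console.log("There are " + g.value + " " + g.key + "(s) in my house.");
});
legsByType.forEach(function (g) {
  console.log("There are " + g.value + " " + g.key + " legs in my house.");
});
```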
As mentioned earlier, when you filter on a dimension and then roll up using that same dimension, Crossfilter intentionally ignores any filter on that dimension.
For example, this does not return what you would expect:
// Filter for dogs.
typeDimension.filter("dog");
// How many living things of each type are in my house?
// You'd expect this to return 0 for anything other than dogs,
// but it doesn't because the following statement ignores any
// filter applied to typeDimension:
var countMeasure = typeDimension.group().reduceCount();
var a = countMeasure.top(4);
console.log("There are " + a[0].value + " " + a[0].key + "(s) in my house.");
console.log("There are " + a[1].value + " " + a[1].key + "(s) in my house.");
console.log("There are " + a[2].value + " " + a[2].key + "(s) in my house.");
console.log("There are " + a[3].value + " " + a[3].key + "(s) in my house.");
The workaround is to create another dimension on the same field, and filter on that:
// Filter for dogs.
var typeFilterDimension = livingThings.dimension(function(fact) { return fact.type; });
typeFilterDimension.filter("dog");
// Now this returns what you would expect.
var countMeasure = typeDimension.group().reduceCount();
var a = countMeasure.top(4);
console.log("There are " + a[0].value + " " + a[0].key + "(s) in my house.");
console.log("There are " + a[1].value + " " + a[1].key + "(s) in my house.");
console.log("There are " + a[2].value + " " + a[2].key + "(s) in my house.");
console.log("There are " + a[3].value + " " + a[3].key + "(s) in my house.");
Crossfilter is built to be insanely fast. To do that, rather than completely recalculating groups as filters are applied, it calculates incrementally. Crossfilter does that by using a bitfield to track, for each fact, which dimensions’ filters currently exclude it. For that reason, Crossfilter dimensions are expensive, so you should think carefully about creating them and create as few as possible.
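The bitfield idea can be illustrated with a simplified model. This is a sketch under stated assumptions, not Crossfilter's actual source: the bit assignments (`TYPE_BIT`, `LEGS_BIT`) and the helpers `applyFilter` and `countBy` are invented here, and for brevity the sketch recounts from scratch instead of updating groups incrementally. It does show why a group built on a dimension does not observe that dimension's own filter: the group simply masks out its own bit.

```javascript
// The tutorial's fact rows as a plain array (no Crossfilter involved).
var facts = [
  { name: "Rusty", type: "human", legs: 2 },
  { name: "Alex", type: "human", legs: 2 },
  { name: "Lassie", type: "dog", legs: 4 },
  { name: "Spot", type: "dog", legs: 4 },
  { name: "Polly", type: "bird", legs: 2 },
  { name: "Fiona", type: "plant", legs: 0 }
];

// One bit per dimension (hypothetical assignments for this sketch).
var TYPE_BIT = 1; // binary 01
var LEGS_BIT = 2; // binary 10

// filterBits[i] has a dimension's bit set when fact i FAILS that filter.
var filterBits = facts.map(function () { return 0; });

// Changing a filter touches only that dimension's bit on each fact.
function applyFilter(bit, predicate) {
  facts.forEach(function (fact, i) {
    if (predicate(fact)) {
      filterBits[i] &= ~bit; // passes: clear the bit
    } else {
      filterBits[i] |= bit;  // fails: set the bit
    }
  });
}

// Count facts per key, ignoring the grouping dimension's own bit.
function countBy(keyFn, ownBit) {
  var counts = {};
  facts.forEach(function (fact, i) {
    var key = keyFn(fact);
    if (!(key in counts)) counts[key] = 0;
    if ((filterBits[i] & ~ownBit) === 0) counts[key] += 1;
  });
  return counts;
}

applyFilter(TYPE_BIT, function (fact) { return fact.type === "dog"; });

// Grouping on type ignores the type filter: every type keeps its count.
var byType = countBy(function (fact) { return fact.type; }, TYPE_BIT);

// Grouping on legs observes the type filter: only the dogs are counted.
var byLegs = countBy(function (fact) { return fact.legs; }, LEGS_BIT);

console.log(byType);
console.log(byLegs);
```

In this model each extra dimension costs one more bit per fact plus the bookkeeping to maintain it, which is one way to picture why dimensions are not free.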
I’m the co-founder of FiveStreet, a technology startup company that helps leading real estate agents beat their competition when responding to online leads. We’re actively looking for a designer, business development folks, and (of course) more customers. If you can help me connect to any of the aforementioned folks, please get in touch.
Content © 2006-2021 Rusty Klophaus