tl;dr - If your resolution is to learn X in 15 minutes a day, you will fail.
If you want to learn something new, immerse yourself in it.
It’s New Year’s Eve, and countless resolutions are being made today, only to
be broken within the week. A common resolution is Learn Something New, and
a common approach is to set aside 15 minutes a day for learning.
“15 minutes will be easy to find”, we rationalize. “I spend hours checking
Facebook, watching TV, playing Angry Birds. Surely I can find 15 minutes to
improve myself and become more fulfilled, more well-rounded, or more
employable?”
This approach is misguided. Complete bullshit. You can’t learn anything
substantial in 15 minutes a day. Not only that, but it takes an enormous
amount of discipline to do anything for 15 minutes a day, every day. You are
setting yourself up for failure.
A better approach:
Think back to when you learned to ride a bike, or beat a video game. Think
about when you got completely addicted to some TV show, learned all about the
plot and characters, and scoured the Internet looking for next season’s plot
spoilers. Model your learning after that.
Immerse yourself. Follow rabbit holes. Have fun. Spend hours. Skip meals.
Don’t lesson plan. Just be curious. The learning will follow.
Earlier this year, Square released a JavaScript library called
Crossfilter. Crossfilter is like
a client-side OLAP server: it groups, filters, and aggregates tens
or hundreds of thousands of rows of raw data very, very quickly. Crossfilter is
intended to be combined with a graphing or charting library like D3,
Highcharts, or the Google Visualization API; it doesn’t have a UI of its own.
If you have experience with OLAP or multi-dimensional processing, you should
be able to ramp up on Crossfilter fairly quickly. If you only have experience
with relational databases, it may take a little longer. If you’ve never used
the SQL group-by feature, then you face a steep learning curve.
Quick Terminology Primer
First, you’ll need to understand facts, dimensions, and measures. (If
you’re already familiar with these terms, then skip this section.)
Imagine you want to answer the question “How many orders do we process per
week?”
You could calculate this by hand by iterating through all of the orders that
your business had processed, grouping them into weeks. In this case, each
order entry would be called a fact, and you would probably store this in an
OrderFacts table. The week would be a dimension; it is a way you want to
slice the data. And the count of orders would be a measure; it is a value
that you want to calculate.
Imagine another question, “How much revenue do we book per salesperson per
week?” Again, your facts would be stored in an OrderFacts table. You would
now have two dimensions, salesperson and week. And finally, your measure
is revenue: the sum of dollars across the orders in each group.
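To make the vocabulary concrete, here is a minimal sketch in plain JavaScript (no Crossfilter involved) of rolling facts up by a dimension into a measure. The `orderFacts` rows, their field names, and the `rollUp` helper are all invented for illustration.

```javascript
// Hypothetical order facts; field names and values are invented for illustration.
var orderFacts = [
  { week: "W01", salesperson: "Ann", dollars: 100 },
  { week: "W01", salesperson: "Bob", dollars: 250 },
  { week: "W01", salesperson: "Ann", dollars: 50 },
  { week: "W02", salesperson: "Bob", dollars: 300 }
];

// Roll facts up by a dimension (keyOf) into a measure (accumulated by add).
function rollUp(facts, keyOf, add) {
  var totals = {};
  facts.forEach(function (fact) {
    var key = keyOf(fact);
    totals[key] = add(totals[key] || 0, fact);
  });
  return totals;
}

// "How many orders do we process per week?"
// Dimension: week. Measure: count of orders.
var ordersPerWeek = rollUp(
  orderFacts,
  function (fact) { return fact.week; },
  function (total) { return total + 1; }
);

// "How much revenue do we book per salesperson per week?"
// Dimensions: salesperson and week. Measure: sum of dollars.
var revenuePerSalespersonPerWeek = rollUp(
  orderFacts,
  function (fact) { return fact.salesperson + " / " + fact.week; },
  function (total, fact) { return total + fact.dollars; }
);

console.log(ordersPerWeek);
console.log(revenuePerSalespersonPerWeek);
```

The same facts answer both questions; only the dimension (how you slice) and the measure (what you calculate) change.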
Below, we’re going to answer some questions like “How many living things live
in my house?” and “How many legs of each type exist in my house?”
Getting Facts Into Crossfilter
It’s incredibly easy to get your fact data into Crossfilter: just use JSON.
Each row is a fact.
Below, we’ve created a Crossfilter object loaded with facts about the living
things in my house.
(Note: These are, for the most part, “fictional” facts. I don’t actually have
any pets, but it makes for a good tutorial.)
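As a stand-in for the fact data, here is a hypothetical set of rows. The names and rows are assumptions, chosen only so that they agree with the totals quoted in this tutorial (6 living things, 14 legs, 4 types). The totals are double-checked in plain JavaScript so the snippet runs without Crossfilter loaded.

```javascript
// Hypothetical facts: one row per living thing. These rows are an assumption,
// picked to match the totals used in this tutorial (6 things, 14 legs, 4 types).
var facts = [
  { name: "Alice",  type: "human", legs: 2 },
  { name: "Bob",    type: "human", legs: 2 },
  { name: "Rex",    type: "dog",   legs: 4 },
  { name: "Fido",   type: "dog",   legs: 4 },
  { name: "Tweety", type: "bird",  legs: 2 },
  { name: "Fern",   type: "plant", legs: 0 }
];

// With the Crossfilter script loaded, the facts would be handed over like this:
//   var livingThings = crossfilter(facts);

// Plain-JavaScript sanity checks of the totals, without Crossfilter.
var thingCount = facts.length;
var legCount = facts.reduce(function (sum, fact) { return sum + fact.legs; }, 0);
console.log(thingCount + " living things, " + legCount + " legs");
```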
That’s it. Now let’s find out some totals. For example, how many living things
are in my house?
Calculating Totals
To do this, we’ll call the groupAll convenience function, which selects all
records into a single group, and then the reduceCount function, which
creates a count of the records. Not very useful so far.
    // How many living things are in my house?
    var n = livingThings.groupAll().reduceCount().value();
    console.log("There are " + n + " living things in my house.") // 6
Now let’s get a count of all the legs in my house. Again, we’ll use the
groupAll function to get all records in a single group, but then we call the
reduceSum function. This is going to sum values together. What values? Well,
we want legs, so let’s pass a function that extracts and returns the number of
legs from the fact.
    // How many total legs are in my house?
    var legs = livingThings.groupAll().reduceSum(function(fact) { return fact.legs; }).value()
    console.log("There are " + legs + " legs in my house.") // 14
Filtering
Now let’s test out some of the filtering functionality.
I want to know how many living things in my house are dogs, and how many legs
they have. For this, we’ll need a dimension. Remember that a dimension is
something you want to group or filter by. Here, the dimension is going to be
the type. Crossfilter can filter on dimensions in two ways, either by exact
value, or by range.
Below, we construct a typeDimension and filter it:
    // Filter for dogs.
    var typeDimension = livingThings.dimension(function(d) { return d.type; });
    typeDimension.filter("dog")
That’s it. Dimensions are stateful, so Crossfilter knows about our filter and
will ensure that all future operations are filtered to only work on dogs,
except for any calculations performed directly on typeDimension. This is
expected behavior, but I’m not sure if it’s a design choice or a design
necessity. (We’ll look at the workaround later.)
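The two filter styles, exact value and range, can be sketched in plain JavaScript. This is not Crossfilter’s code: `keepExact` and `keepRange` are hypothetical helpers and the facts are invented. The half-open range (lower bound included, upper bound excluded) mirrors how I understand Crossfilter’s range filter to behave when you pass a two-element array to a dimension’s filter; treat that as an assumption and check the API reference.

```javascript
// Hypothetical facts, invented for illustration.
var facts = [
  { name: "Alice",  type: "human", legs: 2 },
  { name: "Bob",    type: "human", legs: 2 },
  { name: "Rex",    type: "dog",   legs: 4 },
  { name: "Fido",   type: "dog",   legs: 4 },
  { name: "Tweety", type: "bird",  legs: 2 },
  { name: "Fern",   type: "plant", legs: 0 }
];

// Exact-value filter: keep facts whose dimension value equals `value`.
function keepExact(facts, valueOf, value) {
  return facts.filter(function (fact) { return valueOf(fact) === value; });
}

// Range filter: keep facts whose dimension value falls in [lo, hi),
// so lo is included and hi is excluded.
function keepRange(facts, valueOf, lo, hi) {
  return facts.filter(function (fact) {
    var v = valueOf(fact);
    return v >= lo && v < hi;
  });
}

var dogs = keepExact(facts, function (f) { return f.type; }, "dog");
var oneToThreeLegs = keepRange(facts, function (f) { return f.legs; }, 1, 3);

console.log(dogs.length + " dogs");
console.log(oneToThreeLegs.length + " things with at least 1 and fewer than 3 legs");
```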
I want to know how many living things of each type are in my house. I already
have a dimension on type called typeDimension.
Using typeDimension, I’m going to group the records by type, and then
create a measure called countMeasure that returns the count. Once
countMeasure is created, we can find the number of entries by calling
countMeasure.size() (a.k.a. the cardinality of the type dimension), and we
can get the actual counts by calling countMeasure.top(size).
    // How many living things of each type are in my house?
    var countMeasure = typeDimension.group().reduceCount();
    var a = countMeasure.top(4);
    console.log("There are " + a[0].value + " " + a[0].key + "(s) in my house.");
    console.log("There are " + a[1].value + " " + a[1].key + "(s) in my house.");
    console.log("There are " + a[2].value + " " + a[2].key + "(s) in my house.");
    console.log("There are " + a[3].value + " " + a[3].key + "(s) in my house.");
Awesome. Now let’s count legs by type. For this, we’ll create a measure
called legMeasure. This will use the reduceSum function instead of
reduceCount, and we’ll provide a function that tells Crossfilter what field
we want to sum.
    // How many legs of each type are in my house?
    var legMeasure = typeDimension.group().reduceSum(function(fact) { return fact.legs; });
    var a = legMeasure.top(4);
    console.log("There are " + a[0].value + " " + a[0].key + " legs in my house.");
    console.log("There are " + a[1].value + " " + a[1].key + " legs in my house.");
    console.log("There are " + a[2].value + " " + a[2].key + " legs in my house.");
    console.log("There are " + a[3].value + " " + a[3].key + " legs in my house.");
Filtering Gotchas
As mentioned earlier, when you filter on a dimension and then roll up using
that same dimension, Crossfilter intentionally ignores any filter on that
dimension.
For example, this does not return what you would expect:
    // Filter for dogs.
    typeDimension.filter("dog")

    // How many living things of each type are in my house?
    // You'd expect this to return 0 for anything other than dogs,
    // but it doesn't because the following statement ignores any
    // filter applied to typeDimension:
    var countMeasure = typeDimension.group().reduceCount();
    var a = countMeasure.top(4);
    console.log("There are " + a[0].value + " " + a[0].key + "(s) in my house.");
    console.log("There are " + a[1].value + " " + a[1].key + "(s) in my house.");
    console.log("There are " + a[2].value + " " + a[2].key + "(s) in my house.");
    console.log("There are " + a[3].value + " " + a[3].key + "(s) in my house.");
The workaround is to create another dimension on the same field, and filter on that:
    // Filter for dogs.
    var typeFilterDimension = livingThings.dimension(function(fact) { return fact.type; });
    typeFilterDimension.filter("dog")

    // Now this returns what you would expect.
    var countMeasure = typeDimension.group().reduceCount();
    var a = countMeasure.top(4);
    console.log("There are " + a[0].value + " " + a[0].key + "(s) in my house.");
    console.log("There are " + a[1].value + " " + a[1].key + "(s) in my house.");
    console.log("There are " + a[2].value + " " + a[2].key + "(s) in my house.");
    console.log("There are " + a[3].value + " " + a[3].key + "(s) in my house.");
Other Gotchas
Crossfilter is built to be insanely fast. To do that, rather than completely
re-calculating groups as filters are applied, it calculates incrementally.
Crossfilter does that by using a bitfield to track whether or not a fact
exists in a specific dimension. For that reason, Crossfilter dimensions are
expensive, so you should think carefully about creating them and create as
few as possible.
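Here is a toy model of the bitfield idea, as I understand it: one integer per fact, one bit per dimension, with a bit set when that dimension’s filter excludes the fact. This is a simplified sketch, not Crossfilter’s actual code. `makeDimension` and `groupCount` are hypothetical names, the facts are invented, and the counts are recomputed on every call rather than updated incrementally.

```javascript
// Hypothetical facts, invented for illustration.
var facts = [
  { name: "Alice",  type: "human", legs: 2 },
  { name: "Bob",    type: "human", legs: 2 },
  { name: "Rex",    type: "dog",   legs: 4 },
  { name: "Fido",   type: "dog",   legs: 4 },
  { name: "Tweety", type: "bird",  legs: 2 },
  { name: "Fern",   type: "plant", legs: 0 }
];

// One integer per fact. Bit i is set when dimension i's filter excludes the fact.
var masks = facts.map(function () { return 0; });
var dimensionCount = 0;

function makeDimension(valueOf) {
  var bit = 1 << dimensionCount++; // each new dimension claims one more bit
  return {
    // Filter by exact value; pass null to clear the filter.
    filter: function (value) {
      facts.forEach(function (fact, i) {
        var excluded = value !== null && valueOf(fact) !== value;
        masks[i] = excluded ? (masks[i] | bit) : (masks[i] & ~bit);
      });
    },
    // Count facts per key, looking at every bit except this dimension's own.
    groupCount: function () {
      var counts = {};
      facts.forEach(function (fact, i) {
        var key = valueOf(fact);
        counts[key] = counts[key] || 0;
        if ((masks[i] & ~bit) === 0) counts[key] += 1;
      });
      return counts;
    }
  };
}

var typeDimension = makeDimension(function (f) { return f.type; });
var typeFilterDimension = makeDimension(function (f) { return f.type; });

// A dimension's own filter is invisible to its own groups.
typeDimension.filter("dog");
var ownFilterIgnored = typeDimension.groupCount();

// A filter on a second dimension over the same field is respected.
typeFilterDimension.filter("dog");
var otherFilterApplied = typeDimension.groupCount();

console.log(ownFilterIgnored);
console.log(otherFilterApplied);
```

In this model each dimension costs one bit on every fact, and masking out a dimension’s own bit is what makes its groups ignore its own filter. That is one way to see both why dimensions are not free and why the second-dimension workaround behaves the way it does.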
Shameless Plug
I’m the co-founder of FiveStreet, a technology startup
company that helps leading real estate agents beat their competition when
responding to online leads. We’re actively looking for a designer, business
development folks, and (of course) more customers. If you can help me connect
to any of the aforementioned folks, please get in touch.
Coding Brain is what happens when I’ve been immersed in code for too many days in a row. After three or four days, my brain is primed to think in terms of programming, and so I start to struggle with normal verbal communication.
It’s not that I can’t find the words, it’s that I start speaking in awkward patterns. A normal conversational pattern is linear: thoughts chain together, each loosely connected with the last. To a brain primed for writing code, this feels unnatural.
Instead, it feels more natural to follow the conversational equivalent of top-down or bottom-up design. I either talk in highly vague, bullet-pointed abstractions, or I shovel through low-level details, addressing every possible corner case of a thought before moving on to the next. To my conversational partner, the former feels like incomplete sentences, the latter like mind-numbing minutiae.
Coding brain goes away on its own after a day spent with people (or after a few beers). But as the tech co-founder of a startup, I don’t have the luxury of a slow shift into normalcy. Sometimes, I need to crank through code for 8 hours, then jump directly into a meeting with a potential partner or with our board of directors, and I can’t be a zombie.
Today, I think I found a cure. Before a late afternoon phone call, I picked up a book and just read out loud for about 10 minutes. By the time the call rolled around, I had alleviated coding brain just enough to have a normal conversation. Definitely something to keep in mind for the future.
(Yes, this post is overdue. Wrote it early in January, then forgot to post.)
Since reading this post by Derek Sivers, founder of CDBaby, I’ve been hesitant to blog about upcoming projects, plans, and schemes. According to a study that Derek cites: “Announcing your plans to others satisfies your self-identity just enough that you’re less motivated to do the hard work needed.” Wouldn’t want that to happen, would we?
Everybody’s self-identity needs a bit of satisfaction, however, and mine is no exception. To get it, I’m not going to talk about what I’m going to do; instead, I’m going to talk about what I’ve done. In other words, this post is me giving myself a congratulatory high-five.
Here’s a quick recap of what kept me busy in 2011:
Basho Technologies
2011 marked the beginning of my third year with Basho Technologies. Basho makes Riak, an open-source, distributed document database for companies with lots of critical data.
I spent the first half of the year involved in client work and researching and prototyping how to combine Riak with OLAP engines and geospatial search. Exciting stuff, but unfortunately it didn’t make sense to take this beyond the stage of prototypes.
During the second half of the year, I focused on Secondary Indexing support in Riak. Somewhere in the middle there, we cranked like hell to release Riak version 1.0, a major milestone in any product’s life.
Hacker News Readers of DC
The DC technology scene continues to crystallize. I use the word “crystallize” very specifically here; I think there have always been technologists in DC, but only recently have the structures been in place for us to come together. Hacker News Readers of DC is one of those structures.
In 2011 we:
Doubled in size, from 320 members to 660+ members.
Averaged one meetup every 1.7 months. (Down slightly from last year, something to work on.)
Averaged 70 attendees per meetup. (Double the average of 35 from last year.)
Held three sponsored meetups (thanks to Factual, SBNation, and Basho!)
Held two startup showcases.
Tech Conferences
In 2011, I decreased my rate of conference speaking. I had averaged approximately one conference talk per month in 2010, which was enlightening, but grueling.
I tried my hand at conference organizing in 2011, working with Luc Castera and Ram Singh to organize ErlangDC 2011 in December. This was an epic one-day conference held at the AOL headquarters in Dulles, focused on helping DC-area developers become more familiar and comfortable with Erlang.
Highlights include:
Sold out all 110 tickets.
Full-day tracks, with a three-hour tutorial sponsored by Erlang Solutions and 8 speakers.
8 sponsors who helped with venue, lunch, book give-aways, snacks, drinks, t-shirts, and other costs.
Blogging
In June of 2011, I switched to Posterous and vowed to start blogging regularly. This led to a flurry of activity for a combined total of 34 posts in June and July alone, and 46 across the entire year.
Unfortunately, it didn’t turn into a habit, and my rate of blogging dropped back down to one or two posts per month. I had been trying to blog daily, but I think twice weekly might be a better goal for me.
I also did a short stint of mini-podcasts for the Riak MinuteWith channel. Unfortunately, it didn’t catch on with the community, so we discontinued the series.
Fitness
My proudest accomplishment of 2011 isn’t technology related at all. For many years, I’ve skated by on youth and good genes to avoid any kind of fitness regimen. Last year, I decided to change that.
My approach was simple: just step out the door every day with running shoes on. It didn’t matter whether I ran a block or a mile, the goal was just to give myself the opportunity to work out.
This approach worked: I have averaged 3 workouts a week since I started in March. 80% of my workouts were running, where I averaged 3 miles. 20% were either gym or kettlebell workouts.
Test-, Behavior-, and Readme-Driven Development are not just about creating one code artifact before another. It’s not simply eating your carrots before your peas, or putting on your shirt before your pants. These methodologies are powerful; they deeply affect the final structure of your code:
If you just sit down and code, you will find a workable solution quickly. Your program will be optimized for prototyping an idea, but it will also be a dirty mess.
If you write a handful of unit tests before coding, your program will be optimized for further testing.
If you draft some docs before coding, your program will be optimized for clarity and understanding by other developers.
There is no clear *best* approach; the right approach depends on the situation, the existing codebase, the size of a team, familiarity with different tools, and level of discipline.
But if you are writing a program that you intend to sell or deploy in production, I’d argue that the *worst* possible approach is to just sit down and start coding.
The talk attempts to demystify the mechanics of calling into Erlang from some other language by selecting three approaches to RPC (REST/JSON, Protocol Buffers, and BERT-RPC) and stepping through:
The Request/Response chatter
Client-side code
Server-side code
Round-Trip Encoding Performance
Operation Performance
I referenced the following Erlang projects in my talk:
I realized, after giving my talk, that two very important slides accidentally fell off the editing board. I have re-added the missing slides, but am repeating the information below.
The first slide contained a list of other cross-language mechanisms:
A fast, low-friction Edit/Build/Test cycle is one of the best and easiest ways to increase developer productivity across an organization.
These slides are from a recent talk I presented at the DC Erlang Users meetup group. It is a breadth-first tour of some of the tools we use at Basho to speed up and streamline the Edit/Build/Test cycle for our Erlang projects.
It’s an exciting day! Flying to Portland for OSCON 2011.
I’m giving a talk called Querying Riak Just Got Easier - Introducing Secondary Indices at the end of the day on Monday. In the talk, I briefly touch on NoSQL, tradeoffs, and where Riak fits in the NoSQL landscape. I then dive into Secondary Indices (a new feature of Riak to be included in the next version) and talk through how it changes Riak data modeling. I finally wrap up with some of the challenges we faced.
Here are the other talks I plan to attend: My Schedule.
Get in touch via a comment (below) or twitter if you are attending OSCON. We’ll grab a beer.
I hate domain squatters. There is nothing more pointless than thinking up nonsense words in order to find an available domain name for a new project.
Unfortunately, there aren’t many good solutions to the domain squatter problem. Once the rules of a market are created, they are difficult to change.
That’s why I’m hesitatingly in favor of ICANN’s recent announcement that anybody with $185k can buy a top-level domain. Yes, there are many reasons why it’s foolish. But, it widens the supply of domain names, which means that every domain owned by a squatter is now less valuable. It’s true that this also hurts legitimate domain name owners, but unlike squatters, most of their value is concentrated in the business behind the domain name, not the domain name itself.
One could argue that there are already many options for top level domains, but many of them come with significant risk. (Think “.ly” domains and Libya.)