blog.rusty.io

Gmail and OAuth 2.0

At FiveStreet.com, we try to make it as easy as possible for new customers to integrate our application into their existing workflow. One way we do this is by grabbing real-estate leads directly from their Gmail inboxes.

When we first built this feature, Google only provided IMAP access through OAuth 1.0. Since then, Google has deprecated OAuth 1.0 in favor of OAuth 2.0. (And they are dropping support for OAuth 1.0 in April of 2015.)

This means we have to:

  1. Migrate OAuth 1 security tokens to OAuth 2 tokens.
  2. Figure out how to connect to Gmail using OAuth 2 tokens.

Google’s documentation on how to do this is fairly confusing, and – in some places – inaccurate.

Converting OAuth 1.0 Credentials to OAuth 2.0 Credentials

The code below takes the OAuth 1.0 token / secret assigned to a user and converts it to an OAuth 2.0 refresh token. You will need to run this conversion process once for every user, and save the refresh token to your database.

The refresh token can then be used to generate a short-lived access token. Note that your OAuth 1.0 credentials will stop working an hour after you use the OAuth 2.0 refresh token.

More information: Migrating from OAuth 1.0 to OAuth 2.0

require 'oauth'
require 'net/http'
require 'json'

# OAuth 1 - Application Key / Secret.
oauth1_consumer_key    = "www.site.com"
oauth1_consumer_secret = "..."

# OAuth 1 - User Token / Secret.
oauth1_token           = "..."
oauth1_secret          = "..."

# OAuth 2 - Application ID / Secret
oauth2_client_id       = "..."
oauth2_client_secret   = "..."

# Migration Parameters.
params = {
  "grant_type"             => "urn:ietf:params:oauth:grant-type:migration:oauth1",
  "client_id"              => oauth2_client_id,
  "client_secret"          => oauth2_client_secret,
  "oauth_signature_method" => "HMAC-SHA1"
}

# Create the consumer object.
consumer = OAuth::Consumer.new(
  oauth1_consumer_key,
  oauth1_consumer_secret,
  :site   => 'https://accounts.google.com',
  :scheme => :header
)

# Create the access token object.
access_token = OAuth::AccessToken.new(consumer, oauth1_token, oauth1_secret)

# Post to the migration URL.
resp = access_token.post(
  "/o/oauth2/token",
  params,
  { 'Content-Type' => 'application/x-www-form-urlencoded' })

if resp.code.to_s != "200"
  # Raise an error.
  raise "#{resp.code} - #{resp.body}"
end

# Now you have a refresh token!
oauth2_refresh_token = JSON.parse(resp.body)["refresh_token"]

Generate an Access Token

The code below uses the OAuth 2.0 refresh token you just generated to obtain an OAuth 2.0 access token. The access token is what you use to actually authenticate the user.

Note: You may need to change the scope based on your application’s needs.

More information: Using an OAuth 2.0 refresh token

require 'oauth2'

# OAuth 2 - Application ID / Secret
oauth2_client_id       = "..."
oauth2_client_secret   = "..."
oauth2_refresh_token   = "refresh-token-from-above"

# Create the OAuth 2 client.
client = OAuth2::Client.new(
  oauth2_client_id,
  oauth2_client_secret,
  {
    :site      => "https://accounts.google.com",
    :token_url => "/o/oauth2/token",
    :token_method => :post,
    :grant_type => "refresh_token",
    :scope      => "https://www.googleapis.com/auth/userinfo.email https://mail.google.com/"
  })

oauth2_access_token = client.get_token(
  "client_id"     => oauth2_client_id,
  "client_secret" => oauth2_client_secret,
  "refresh_token" => oauth2_refresh_token,
  "grant_type"    => "refresh_token")

Connect to IMAP using XOAUTH2

Now that you have an access token, you can use it to authenticate with Gmail through the SASL XOAUTH2 mechanism.

First, we need to update Net::IMAP to add XOAUTH2 as an authenticator method. According to Google’s documentation, this can be done by base64 encoding a specially formatted string containing the user’s email address and a valid access token.

More information: The SASL XOAUTH2 Mechanism

In practice, base64 encoding the string didn’t work, but providing an unencoded string did. It feels like they might quietly change this one day, so be careful if you use this in production.

require 'net/imap'

class Net::IMAP
  class XOAuth2Authenticator
    def initialize(email_address, access_token)
      @email_address = email_address
      @access_token = access_token
    end

    def process(s)
      # HACK!!! - The docs say that we need to base64 encode the
      # following line; but that doesn't work in practice.
      "user=#{@email_address}\x01auth=Bearer #{@access_token}\x01\x01"
    end
  end

  add_authenticator 'XOAUTH2', XOAuth2Authenticator
end

Now that we’ve added an XOAUTH2 authenticator, connecting to IMAP is simple:

# The user's email address.
email_address = "..."

# Connect to IMAP.
client = Net::IMAP.new("imap.gmail.com", :port => 993, :ssl => true)
client.authenticate('XOAUTH2', email_address, oauth2_access_token.token)
# ...do stuff...
client.disconnect
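
As a usage example, here is a minimal sketch of the “...do stuff...” step (it goes before the disconnect), listing the subjects of unread messages with the standard Net::IMAP API; the mailbox name and search criteria are just illustrative:

# Print the subjects of unread messages in the inbox.
client.select("INBOX")
client.search(["UNSEEN"]).each do |message_id|
  envelope = client.fetch(message_id, "ENVELOPE")[0].attr["ENVELOPE"]
  puts envelope.subject
end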

That’s all. Hopefully this saves you many, many hours of frustration.

Running OpenVBX on AppFog

Some notes on installing OpenVBX (Twilio’s open-source, web-based phone system) on AppFog (a public cloud Platform as a Service). In theory, this approach should also work on Heroku.

The main challenge in getting OpenVBX to run on AppFog is that the installer writes its settings into config files on the server. The installer appears to work out of the box, but when you next redeploy your app, you overwrite the config files it generated, and all of those changes are lost.

At this point, if you turn to the OpenVBX docs, it appears that your solution is to skip the installer and instead update the sample versions of the config files. This WILL NOT WORK for two reasons: 1) it won’t create the database and 2) it won’t actually set the settings correctly, leading to some weird confounding error. In my case, I experienced an infinite redirect loop.

The actual solution is fairly simple: run through the complete installation process, letting the installer properly initialize the database and generate the config files. Then use a helper script to grab the contents of the config files and recreate them locally.

Here is a step-by-step guide:

  • Download or clone OpenVBX (http://www.openvbx.org/)

  • Add a file called helper.php at the top level of the OpenVBX directory with the content below. This will simply display the content of the config files that the OpenVBX installer changes. (We will remove this file after installing OpenVBX, as it exposes sensitive information.)

<h1>Database Settings</h1>
<pre>
<?= htmlspecialchars(getenv("VCAP_SERVICES")) ?>
</pre>

<hr />

<h1>openvbx.php</h1>
<pre>
<?= htmlspecialchars(file_get_contents("OpenVBX/config/openvbx.php")) ?>
</pre>

<hr />

<h1>database.php</h1>
<pre>
<?= htmlspecialchars(file_get_contents("OpenVBX/config/database.php")) ?>
</pre>
  • Create a new AppFog app and MySQL instance. Bind the MySQL instance to your app, then deploy your app to AppFog. (http://blog.appfog.com/getting-started-with-appfogs-command-line/) Make sure you start with a fresh MySQL database; don’t attempt to re-use a database from a previous installation attempt, because it will not work.

  • Navigate to http://YOUR-APP.aws.af.cm/helper.php.

  • In a separate browser window, navigate to http://YOUR-APP.aws.af.cm and run through the OpenVBX install process. Use the database connection settings displayed by helper.php to configure the database. You do not need to enter the port number, just the database hostname (which was an IP address for me), database name, username, and password. Continue through the rest of the install process. This will initialize the database.

  • Once the installer has finished and you see a login prompt, refresh http://YOUR-APP.aws.af.cm/helper.php. You will see that the installer has created openvbx.php and database.php config files. Copy that content, and use it to create openvbx.php and database.php files locally. (A sketch of what database.php looks like appears after this list.)

  • Delete the helper.php file!

  • Redeploy your app. In theory, you should see a login prompt when you navigate to your app’s URL.
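
For reference, here is a hypothetical sketch of what the locally recreated OpenVBX/config/database.php might look like. OpenVBX is built on CodeIgniter, so the file uses CodeIgniter’s $db configuration array; every value below is a placeholder, and you should copy the real contents verbatim from the helper.php output:

<?php
// OpenVBX/config/database.php -- recreated locally from helper.php output.
// All values below are placeholders; use the ones the installer generated
// from your bound MySQL service (shown in VCAP_SERVICES).
$active_group  = 'default';
$active_record = TRUE;

$db['default']['hostname'] = '10.0.0.1';
$db['default']['username'] = 'openvbx_user';
$db['default']['password'] = 'secret';
$db['default']['database'] = 'openvbx_db';
$db['default']['dbdriver'] = 'mysql';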

Learn Something New

tl;dr - If your resolution is to learn X in 15 minutes a day, you will fail. If you want to learn something new, immerse yourself in it.

It’s New Year’s Eve, and countless resolutions are being made today, only to be broken within the week. A common resolution is Learn Something New, and a common approach is to set aside 15 minutes a day for learning.

“15 minutes will be easy to find”, we rationalize. “I spend hours checking Facebook, watching TV, playing Angry Birds. Surely I can find 15 minutes to improve myself and become more fulfilled, more well-rounded, or more employable?”

This approach is misguided. Complete bullshit. You can’t learn anything substantial in 15 minutes a day. Not only that, but it takes an enormous amount of discipline to do anything for 15 minutes a day, every day. You are setting yourself up for failure.

A better approach:

Think back to when you learned to ride a bike, or beat a video game. Think about when you got completely addicted to some TV show, learned all about the plot and characters, and scoured the Internet looking for next season’s plot spoilers. Model your learning after that.

Immerse yourself. Follow rabbit holes. Have fun. Spend hours. Skip meals. Don’t lesson plan. Just be curious. The learning will follow.

Crossfilter Tutorial

Earlier this year, Square released a JavaScript library called Crossfilter. Crossfilter is like a client-side OLAP server, grouping, filtering, and aggregating tens or hundreds of thousands of rows of raw data very, very quickly. Crossfilter is intended to be combined with a graphing or charting library like D3, Highcharts, or the Google Visualization API; it doesn’t have a UI of its own.

If you have experience with OLAP or multi-dimensional processing, you should be able to ramp up on Crossfilter fairly quickly. If you only have experience with relational databases, it may take a little longer. If you’ve never used the SQL group-by feature, then you face a steep learning curve.

Quick Terminology Primer

First, you’ll need to understand facts, dimensions, and measures. (If you’re already familiar with these terms, then skip this section.)

Imagine you want to answer the question “How many orders do we process per week?”

You could calculate this by hand by iterating through all of the orders that your business had processed, grouping them into weeks. In this case, each order entry would be called a fact, and you would probably store this in an OrderFacts table. The week would be a dimension; it is a way you want to slice the data. And the count of orders would be a measure; it is a value that you want to calculate.

Imagine another question, “How much revenue do we book per salesperson per week?” Again, your facts would be stored in an OrderFacts table. You would now have two dimensions, salesperson and week. And finally, your measure is dollars per order.
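
To make these terms concrete before we dive in, here is a hypothetical sketch of the OrderFacts example expressed with the Crossfilter API introduced below (the field names and values are invented for illustration):

// Facts: one row per order.
var orderFacts = crossfilter([
  { week: "2012-01", salesperson: "Ann", dollars: 1200 },
  { week: "2012-01", salesperson: "Bob", dollars: 800 },
  { week: "2012-02", salesperson: "Ann", dollars: 500 }
]);

// Dimension: slice the facts by week.
var weekDimension = orderFacts.dimension(function(d) { return d.week; });

// Measures: orders per week (a count) and revenue per week (a sum).
var ordersPerWeek  = weekDimension.group().reduceCount();
var revenuePerWeek = weekDimension.group().reduceSum(function(d) { return d.dollars; });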

Below, we’re going to answer some questions like “How many living things live in my house?” and “How many legs of each type exist in my house?”

Getting Facts Into Crossfilter

It’s incredibly easy to get your fact data into Crossfilter: just use JSON. Each row is a fact.

Below, we’ve created a Crossfilter object loaded with facts about the living things in my house.

(Note: These are, for the most part, “fictional” facts. I don’t actually have any pets, but it makes for a good tutorial.)

var livingThings = crossfilter([
  // Fact data.
  { name: "Rusty",  type: "human", legs: 2 },
  { name: "Alex",   type: "human", legs: 2 },
  { name: "Lassie", type: "dog",   legs: 4 },
  { name: "Spot",   type: "dog",   legs: 4 },
  { name: "Polly",  type: "bird",  legs: 2 },
  { name: "Fiona",  type: "plant", legs: 0 }
]);

That’s it. Now let’s find out some totals. For example, how many living things are in my house?

Calculating Totals

To do this, we’ll call the groupAll convenience function, which selects all records into a single group, and then the reduceCount function, which creates a count of the records. Not very useful so far.

// How many living things are in my house?
var n = livingThings.groupAll().reduceCount().value();
console.log("There are " + n + " living things in my house."); // 6

Now let’s get a count of all the legs in my house. Again, we’ll use the groupAll function to get all records in a single group, but then we call the reduceSum function. This is going to sum values together. What values? Well, we want legs, so let’s pass a function that extracts and returns the number of legs from the fact.

// How many total legs are in my house?
var legs = livingThings.groupAll().reduceSum(function(fact) { return fact.legs; }).value();
console.log("There are " + legs + " legs in my house."); // 14

Filtering

Now let’s test out some of the filtering functionality.

I want to know how many living things in my house are dogs, and how many legs they have. For this, we’ll need a dimension. Remember that a dimension is something you want to group or filter by. Here, the dimension is going to be the type. Crossfilter can filter on dimensions in two ways: either by exact value, or by range.

Below, we construct a typeDimension and filter it:

// Filter for dogs.
var typeDimension = livingThings.dimension(function(d) { return d.type; });
typeDimension.filter("dog");

That’s it. Dimensions are stateful, so Crossfilter knows about our filter, and will ensure that all future operations work only on dogs, except for any calculations performed directly on typeDimension. This is expected behavior, but I’m not sure if it’s a design choice or a design necessity. (We’ll look at the workaround later.)

var n = livingThings.groupAll().reduceCount().value();
console.log("There are " + n + " dogs in my house."); // 2

var legs = livingThings.groupAll().reduceSum(function(fact) {
  return fact.legs;
}).value();
console.log("There are " + legs + " dog legs in my house."); // 8

Let’s clear the filter, then do some grouping.

// Clear the filter.
typeDimension.filterAll();

Grouping with Crossfilter

I want to know how many living things of each type are in my house. I already have a dimension grouped by type called typeDimension.

Using typeDimension, I’m going to group the records by type, and then create a measure called countMeasure that returns the count. Once countMeasure is created, we can find the number of entries by calling countMeasure.size() (a.k.a. the cardinality of the type dimension), and we can get the actual counts by calling countMeasure.top(size).

// How many living things of each type are in my house?
var countMeasure = typeDimension.group().reduceCount();
var a = countMeasure.top(4);
console.log("There are " + a[0].value + " " + a[0].key + "(s) in my house.");
console.log("There are " + a[1].value + " " + a[1].key + "(s) in my house.");
console.log("There are " + a[2].value + " " + a[2].key + "(s) in my house.");
console.log("There are " + a[3].value + " " + a[3].key + "(s) in my house.");

Awesome. Now let’s count legs by type. For this, we’ll create a measure called legMeasure. This will use the reduceSum function instead of reduceCount, and we’ll provide a function that tells Crossfilter what field we want to sum.

// How many legs of each type are in my house?
var legMeasure = typeDimension.group().reduceSum(function(fact) { return fact.legs; });
var a = legMeasure.top(4);
console.log("There are " + a[0].value + " " + a[0].key + " legs in my house.");
console.log("There are " + a[1].value + " " + a[1].key + " legs in my house.");
console.log("There are " + a[2].value + " " + a[2].key + " legs in my house.");
console.log("There are " + a[3].value + " " + a[3].key + " legs in my house.");

Filtering Gotchas

As mentioned earlier, when you filter on a dimension and then roll up using that dimension, Crossfilter intentionally ignores any filter on said dimension.

For example, this does not return what you would expect:

// Filter for dogs.
typeDimension.filter("dog");

// How many living things of each type are in my house?
// You’d expect this to return 0 for anything other than dogs,
// but it doesn’t because the following statement ignores any
// filter applied to typeDimension:
var countMeasure = typeDimension.group().reduceCount();
var a = countMeasure.top(4);
console.log("There are " + a[0].value + " " + a[0].key + "(s) in my house.");
console.log("There are " + a[1].value + " " + a[1].key + "(s) in my house.");
console.log("There are " + a[2].value + " " + a[2].key + "(s) in my house.");
console.log("There are " + a[3].value + " " + a[3].key + "(s) in my house.");

The workaround is to create another dimension on the same field, and filter on that:

// Filter for dogs.
var typeFilterDimension = livingThings.dimension(function(fact) { return fact.type; });
typeFilterDimension.filter("dog");

// Now this returns what you would expect.
var countMeasure = typeDimension.group().reduceCount();
var a = countMeasure.top(4);
console.log("There are " + a[0].value + " " + a[0].key + "(s) in my house.");
console.log("There are " + a[1].value + " " + a[1].key + "(s) in my house.");
console.log("There are " + a[2].value + " " + a[2].key + "(s) in my house.");
console.log("There are " + a[3].value + " " + a[3].key + "(s) in my house.");

Other Gotchas

Crossfilter is built to be insanely fast. Rather than completely re-calculating groups as filters are applied, it calculates incrementally, using a bitfield to track whether or not a fact exists in a specific dimension. For that reason, Crossfilter dimensions are expensive: think carefully before creating them, and create as few as possible.
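
As a sketch of the practical consequence, prefer reusing one dimension across several groups rather than creating a new dimension for every question (the names below come from the examples above):

// One dimension per field, reused across groups...
var typeDimension = livingThings.dimension(function(d) { return d.type; });
var countByType   = typeDimension.group().reduceCount();
var legsByType    = typeDimension.group().reduceSum(function(d) { return d.legs; });

// ...instead of a separate dimension per group, which would allocate
// a costly index for each one.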

Shameless Plug

I’m the co-founder of FiveStreet, a technology startup that helps leading real estate agents beat their competition when responding to online leads. We’re actively looking for a designer, business development folks, and (of course) more customers. If you can help me connect with any of the aforementioned folks, please get in touch.

Coding Brain

Coding Brain is what happens when I’ve been immersed in code for too many days in a row. After three or four days, my brain is primed to think in terms of programming, and so I start to struggle with normal verbal communication.

It’s not that I can’t find the words, it’s that I start speaking in awkward patterns. A normal conversational pattern is linear: thoughts chain together, each loosely connected with the last. To a brain primed for writing code, this feels unnatural.

Instead, it feels more natural to follow the conversational equivalent of top-down or bottom-up design. I either talk in highly vague, bullet-pointed abstractions, or I shovel through low-level details, addressing every possible corner case of a thought before moving on to the next. To my conversational partner, the former feels like incomplete sentences, the latter like mind-numbing minutiae.

Coding brain goes away on its own after a day spent with people (or after a few beers). But as the tech co-founder of a startup, I don’t have the luxury of a slow shift into normalcy. Sometimes, I need to crank through code for 8 hours, then jump directly into a meeting with a potential partner or with our board of directors, and I can’t be a zombie.

Today, I think I found a cure. Before a late afternoon phone call, I picked up a book and just read out loud for about 10 minutes. By the time the call rolled around, I had alleviated coding brain just enough to have a normal conversation. Definitely something to keep in mind for the future.

Looking Back on 2011

(Yes, this post is overdue. Wrote it early in January, then forgot to post.)

Since reading this post by Derek Sivers, founder of CDBaby, I’ve been hesitant to blog about upcoming projects, plans, and schemes. According to a study that Derek cites: “Announcing your plans to others satisfies your self-identity just enough that you’re less motivated to do the hard work needed.” Wouldn’t want that to happen, would we?

Everybody’s self-identity needs a bit of satisfaction, however, and mine is no exception. To get it, I’m not going to talk about what I’m going to do; instead, I’m going to talk about what I’ve done. In other words, this post is me giving myself a congratulatory high-five.

Here’s a quick recap of what kept me busy in 2011:

Basho Technologies

2011 marked the beginning of my third year with Basho Technologies. Basho makes Riak, an open-source, distributed document database for companies with lots of critical data.

I spent the first half of the year on client work, and on researching and prototyping ways to combine Riak with OLAP engines and geospatial search. Exciting stuff, but unfortunately it didn’t make sense to take this beyond the prototype stage.

During the second half of the year, I focused on Secondary Indexing support in Riak. Somewhere in the middle there, we cranked like hell to release Riak version 1.0, a major milestone in any product’s life.

Hacker News Readers of DC

The DC technology scene continues to crystallize. I use the word “crystallize” very specifically here; I think there have always been technologists in DC, but only recently have the structures been in place for us to come together. Hacker News Readers of DC is one of those structures.

In 2011 we:

  • Doubled in size, from 320 members to 660+ members.
  • Averaged one meetup every 1.7 months. (Down slightly from last year; something to work on.)
  • Averaged 70 attendees per meetup. (Double the average of 35 from last year.)
  • Held three sponsored meetups (thanks to Factual, SBNation, and Basho!)
  • Held two startup showcases.

Tech Conferences

In 2011, I decreased my rate of conference speaking. I had averaged approximately one conference talk per month in 2010, which was enlightening, but grueling.

Here are my 2011 talks:

ErlangDC 2011

I tried my hand at conference organizing in 2011, working with Luc Castera and Ram Singh to organize ErlangDC 2011 in December. This was an epic one-day conference held at the AOL headquarters in Dulles, focused on helping DC-area developers become more familiar and comfortable with Erlang.

Highlights include:

  • Sold out all 110 tickets.
  • Full-day tracks, with a three-hour tutorial sponsored by Erlang Solutions and 8 speakers.
  • 8 sponsors who helped with venue, lunch, book give-aways, snacks, drinks, t-shirts, and other costs.

Blogging

In June of 2011, I switched to Posterous and vowed to start blogging regularly. This led to a flurry of activity for a combined total of 34 posts in June and July alone, and 46 across the entire year.

Unfortunately, it didn’t turn into a habit, and my rate of blogging dropped back down to one or two posts per month. I had been trying to blog daily, but I think twice weekly might be a better goal for me.

Here are some of my favorite posts:

I also did a short stint of mini-podcasts for the Riak MinuteWith channel. Unfortunately, it didn’t catch on with the community, so we discontinued the series.

Fitness

My proudest accomplishment of 2011 isn’t technology related at all. For many years, I’ve skated by on youth and good genes to avoid any kind of fitness regimen. Last year, I decided to change that.

My approach was simple: just step out the door every day with running shoes on. It didn’t matter whether I ran a block or a mile, the goal was just to give myself the opportunity to work out.

This approach worked: I averaged 3 workouts a week after starting in March. 80% of my workouts were running, averaging 3 miles per run. The other 20% were either gym or kettlebell workouts.

Read last year’s update.

The Power of X-Driven Development

Test-, Behavior-, and Readme-Driven Development are not just about creating one code artifact before another; they’re not simply eating your carrots before your peas, or putting on your shirt before your pants. These methodologies are powerful; they deeply affect the final structure of your code:

  • If you just sit down and code, you will find a workable solution quickly. Your program will be optimized for prototyping an idea, but it will also be a dirty mess.
  • If you write a handful of unit tests before coding, your program will be optimized for further testing.
  • If you draft some docs before coding, your program will be optimized for clarity and understanding by other developers.

There are many different ways to approach a piece of code: Interface Driven Development #1, Interface Driven Development #2, Domain Driven Design, Statecharts, CRC cards, UML Diagrams. The list goes on.

There is no clear *best* approach; the right approach depends on the situation, the existing codebase, the size of a team, familiarity with different tools, and level of discipline.

But if you are writing a program that you intend to sell or deploy in production, I’d argue that the *worst* possible approach is to just sit down and start coding.

Everybody Polyglot! — Cross-Language RPC With Erlang

Slides to my ErlangDC2011 talk – “Everybody Polyglot!” – are on SlideShare:

Everybody Polyglot! - Cross-Language RPC with Erlang

The talk attempts to demystify the mechanics of calling into Erlang from some other language by selecting three approaches to RPC (REST/JSON, Protocol Buffers, and BERT-RPC) and stepping through:

  • The Request/Response chatter
  • Client-side code
  • Server-side code
  • Round-Trip Encoding Performance
  • Operation Performance

I referenced the following Erlang projects in my talk:

I realized, after giving my talk, that two very important slides accidentally fell off the editing board. I have re-added the missing slides, but am repeating the information below.

The first slide contained a list of other cross-language mechanisms:

RPC:

Erlang <-> Other Code:

“Fake” Erlang Nodes:

The second slide contained a list of related talks:

Winning the Edit•Build•Test Cycle in Erlang

A fast, low-friction Edit/Build/Test cycle is one of the best and easiest ways to increase developer productivity across an organization. 

These slides are from a recent talk I presented at the DC Erlang Users meetup group. It is a breadth-first tour of some of the tools we use at Basho to speed up and streamline the Edit/Build/Test cycle for our Erlang projects.

Winning the Erlang Edit•Build•Test Cycle

OSCON Data 2011

It’s an exciting day! Flying to Portland for OSCON 2011.

I’m giving a talk called “Querying Riak Just Got Easier - Introducing Secondary Indices” at the end of the day on Monday. In the talk, I briefly touch on NoSQL, tradeoffs, and where Riak fits in the NoSQL landscape. I then dive into Secondary Indices (a new feature of Riak to be included in the next version) and talk through how it changes Riak data modeling. I finally wrap up with some of the challenges we faced.

Here are the other talks I plan to attend: My Schedule.

Get in touch via a comment (below) or Twitter if you are attending OSCON. We’ll grab a beer.