On the Social Analytics team here at Bazaarvoice, our main job is to identify insights from social data. We recently decided to explore lifestyles via geography – how do “big city” people differ from “rural folk”? Here’s how we approached this problem.

We started this investigation by mapping reviewers’ IP addresses captured during submission to zip codes. Next we mapped each zip code to a corresponding Beale code. A Beale code classifies how urban or how rural a region is based on its degree of urbanization and proximity to a metro area. A code of 1 means “a county in a metro area with a population of 1 million or more”; 9 is a “non-metro county, completely rural or has an urban population of less than 2,500, and isn’t adjacent to a metro area.”
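The two-step lookup described above (IP address to zip code, then zip code to Beale code) can be sketched roughly as follows. This is a minimal illustration, not our production pipeline: the lookup tables, the prefix-matching rule, and the function name `beale_code_for_ip` are all hypothetical, and the zip codes are placeholders rather than real ones.

```python
from typing import Optional

# Hypothetical lookup tables with placeholder values. A real pipeline would
# load these from a geo-IP database and a zip-to-county Beale code table.
IP_PREFIX_TO_ZIP = {"203.0.113": "00000", "198.51.100": "00001"}
ZIP_TO_BEALE = {"00000": 1, "00001": 9}


def beale_code_for_ip(ip: str) -> Optional[int]:
    """Map an IPv4 address to a Beale code, or None if either lookup misses."""
    prefix = ip.rsplit(".", 1)[0]  # drop the last octet (assumed matching rule)
    zip_code = IP_PREFIX_TO_ZIP.get(prefix)
    if zip_code is None:
        return None
    return ZIP_TO_BEALE.get(zip_code)
```

Returning `None` for unmapped addresses keeps those reviews out of the analysis instead of silently assigning them to a group.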

We began with the low-hanging fruit: rating and word count, based on review data from Q3-2011.

There wasn’t an obvious association between average rating and the Beale codes. We expected as much, given that an average of non-normally distributed ratings is a blunt measure. We did, however, find that people in the most urban demographic had the lengthiest reviews of all groups – and as reviewers trended more rural, they had less and less to say. Overall, there was a -0.85 correlation between Beale code and average review word count.
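For readers who want to try this kind of check themselves, here is a rough sketch of correlating Beale code with average word count. It assumes a Pearson correlation computed over per-code averages, which is one reasonable reading of the figure above. The helper names and the toy reviews are made up for illustration and are not our data.

```python
from collections import defaultdict
from math import sqrt


def mean_word_count_by_code(reviews):
    """reviews: iterable of (beale_code, review_text) pairs."""
    totals = defaultdict(lambda: [0, 0])  # code -> [word_sum, review_count]
    for code, text in reviews:
        totals[code][0] += len(text.split())
        totals[code][1] += 1
    return {code: words / n for code, (words, n) in totals.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Toy, made-up reviews purely to exercise the functions.
toy_reviews = [
    (1, "great hotel with a lovely view"),
    (1, "loved it so much"),
    (5, "nice sweater"),
    (9, "good"),
]
means = mean_word_count_by_code(toy_reviews)
codes = sorted(means)
r = pearson(codes, [means[c] for c in codes])
```

Note that a correlation over a handful of group averages will usually look much stronger than one computed over individual reviews, so the two should not be compared directly.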

The result seemed so unlikely that we reran the analysis over a longer date range – and came away with the same takeaways. But why could this be?

Any good analyst knows there’s always more to the story, so we dove further into the data. Next we broke the reviews out by vertical, knowing that reviewers discuss different products at different lengths. A vacation review averages 73 words, for example, while a review of a new sweater is closer to 54 words. Could it be that urban residents went on more trips, and thus wrote lengthier reviews overall?

It turns out that for the two most urban groups, travel makes up about 23-25% of all reviews, and mass merchant/department store reviews follow with 22-25%. For the two most rural groups, travel makes up slightly more of the total review content at 24-27%, and mass merchant reviews make up 27-28%. With the two largest verticals accounting for nearly the same ratio of content in each Beale code, we wouldn’t attribute differences in review length to the vertical alone. Indeed, average review length decreases within each vertical as the Beale codes move from urban to rural.

Some other initial insights hinted at stereotypes of urban versus rural lifestyles. For example, the outdoor/sporting goods vertical accounted for 6% of reviews in the most rural codes but only 3% in the most urban codes. Consumer electronics represented 5% of reviews in the most urban codes, and only 3% in the most rural.
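The vertical breakdown above comes down to two small calculations: each vertical’s share of reviews within a Beale code, and the average word count per vertical and Beale code. Here is a minimal sketch, assuming each review has already been reduced to a (Beale code, vertical, word count) record. The function names and toy records are hypothetical.

```python
from collections import Counter, defaultdict


def vertical_share(records, code):
    """Fraction of a Beale code's reviews that fall in each vertical."""
    counts = Counter(vertical for c, vertical, _ in records if c == code)
    total = sum(counts.values())
    return {v: n / total for v, n in counts.items()} if total else {}


def mean_words_by_vertical_and_code(records):
    """Average word count keyed by (vertical, beale_code)."""
    sums = defaultdict(lambda: [0, 0])  # key -> [word_sum, review_count]
    for code, vertical, words in records:
        sums[(vertical, code)][0] += words
        sums[(vertical, code)][1] += 1
    return {key: total / n for key, (total, n) in sums.items()}


# Toy, made-up records: (beale_code, vertical, word_count).
toy_records = [
    (1, "travel", 80),
    (1, "apparel", 60),
    (9, "travel", 60),
    (9, "apparel", 40),
]
shares_urban = vertical_share(toy_records, 1)
means = mean_words_by_vertical_and_code(toy_records)
```

Comparing averages within a vertical, rather than across all reviews, is what lets us rule out a different product mix as the sole explanation.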

Next we looked at age. We already knew that older users tend to write shorter reviews – could it be that rural areas had older reviewers?

It turns out rural regions do tend to have older reviewers – over one-third of the most rural reviewers were over 55, compared to only 19% of the most urban.
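The age comparison is a simple proportion per Beale code. A rough sketch, assuming we have one (Beale code, age) pair per reviewer, could look like this. The function name `share_over_55` and the toy ages are made up for illustration.

```python
from collections import defaultdict


def share_over_55(reviewers):
    """reviewers: iterable of (beale_code, age). Returns {code: fraction over 55}."""
    counts = defaultdict(lambda: [0, 0])  # code -> [over_55_count, total]
    for code, age in reviewers:
        counts[code][0] += age > 55  # a bool adds as 0 or 1
        counts[code][1] += 1
    return {code: over / total for code, (over, total) in counts.items()}


# Toy, made-up reviewers purely to exercise the function.
toy_reviewers = [(1, 30), (1, 42), (1, 60), (1, 25), (9, 61), (9, 70), (9, 35)]
shares = share_over_55(toy_reviewers)
```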

So there you have our current best explanation of the phenomenon: rural reviewers skew older, and older reviewers tend to write less. We could look at any number of variables (submission times, misspellings, keyword density, product prices, device used for submission, etc.) to further understand the phenomenon at work here. The coolest takeaway – and the point of this story – is that patterns often exist where we may not expect them. There is a wealth of untold stories and insights hiding among the avalanche of data we collect, just waiting to be unearthed.