A few weeks ago, when I posted a blog titled "The Holy Grail: Real ROI from social in 5 steps," I was delighted to get a tweet about it from Michael Caveretta, the Technical Leader for predictive analytics in Ford Motor Company's Research and Advanced Engineering division. In short, he's a smart guy and one to learn from. Mr. Caveretta and his team operate as internal consultants at Ford, applying big data, machine learning, artificial intelligence, data mining, text mining, and information retrieval technologies to improve business processes across the enterprise. With that in mind, we took the opportunity to ask Mr. Caveretta a few questions about how data can help guide business improvements. He kindly responded, and we thought his answers worth sharing in this blog post.

You’ve been at Ford for 15 years. What is the most significant change to the way Ford uses data science and your group’s talents since you began?

The biggest change is the focus the company has on making data-driven decisions. Hiring Alan Mulally changed the focus to, “If you’re making a decision you’d better have data to back it up.” Also, at the beginning of my time with Ford, we worked primarily in departmental silos and most of our work now is across the enterprise.

Any advice for someone who wants to knock down data and knowledge silos within his or her organization?

We’ve had good success breaking down data silos by delivering iterative proofs of concept. Being in the Research and Advanced Engineering organization, with experience working in almost all areas of the company, has allowed us to identify the highest-value data and business problems across different silos. We start small and iterate, continuing to generate insights on a regular cadence, generally every two to six weeks.

What are the toughest challenges you see in converting social data into usable information?

Social data is tough because we’d like to get beyond simple counts, such as ‘x people tweeted about the Ford Focus today’, to deeper topics. What are they talking about? Is it new vehicle features? The design? How do they feel about it? There are some NLP tools in this space, but it’s still an area of active research. We are also looking at crowdsourcing.

Do you have any tips on working with people who rely on intuition and experience over data—is there a happy medium?

Sure. There are many situations where the data is only going to get you so far. It might be because the data is incomplete, the situation has unique elements, or one of a dozen other things, but the key is to integrate the analytics with experience. Many times this can be done with just good conversations.

Ford Data & Analytics team accepting the INFORMS prize for the outstanding widespread use and influence of operations research throughout the company.

Have you seen a rise in data literacy among marketing professionals and other traditionally non-technical functions since the Big Data discussion entered the mainstream?

I think data literacy was increasing even before the Big Data hype. Marketing professionals are now coming out of school with skills in analytics and, more importantly, a greater appreciation for how it can improve the business. A couple of years ago we were doing some work with a newer marketing manager who wanted to see the coefficients of our regression equations. That didn’t happen 10 years ago.

What are some of the main characteristics of predictive data?

I wouldn’t say the data itself is different as much as the analytics are. It’s usually easier to find data that correlates with a metric than data that predicts it. This is one of the areas of caution for Big Data: it’s easy to find correlations, but those correlations might not have any predictive power.
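The gap between correlation and prediction is easy to demonstrate. In this illustrative sketch (our own example, not Ford's code), the target and all features are pure independent noise, yet with enough features to search through, one will always correlate with the target in-sample. The same feature shows no relationship on fresh data:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 50, 1000

# Target and features are all independent noise - no real relationship exists.
y = rng.normal(size=n_samples)
X = rng.normal(size=(n_samples, n_features))

# In-sample: pick the feature most correlated with y.
corrs = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(n_features)])
best = int(np.argmax(np.abs(corrs)))
print(f"best in-sample correlation: {corrs[best]:.2f}")

# Out-of-sample: the same feature index on fresh data has ~zero correlation.
y_new = rng.normal(size=n_samples)
X_new = rng.normal(size=(n_samples, n_features))
oos = np.corrcoef(X_new[:, best], y_new)[0, 1]
print(f"same feature, new data:    {oos:.2f}")
```

With many features and few samples, a "strong" in-sample correlation like this is exactly the spurious signal the caution above warns about.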

How is social data unique among data types?

Social data is rather broad from our perspective. Facebook, Twitter, and online reviews, comments, and questions each have their own unique components. Ultimately, the key is finding some way to derive quantifiable metrics. Simple measures like counts or likes are sometimes enough, but we’re moving to more sophisticated analysis using NLP, and that’s where things get interesting. Text is such an interesting type of data, full of complicated meaning.
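To make the progression concrete, here is a toy sketch (our own illustration, not Ford's pipeline) of turning raw social text into quantifiable metrics: first simple mention counts, then a crude keyword-lexicon sentiment score as a stand-in for real NLP. The posts and the POSITIVE/NEGATIVE word lists are invented for the example:

```python
# Hypothetical sample posts about a vehicle.
posts = [
    "Love the new Ford Focus design",
    "The Focus infotainment is confusing",
    "Great fuel economy on the Focus",
]

# Step 1: simple counts - how many posts mention the vehicle.
mentions = sum("focus" in p.lower() for p in posts)

# Step 2: crude sentiment via a tiny, made-up keyword lexicon.
POSITIVE = {"love", "great"}
NEGATIVE = {"confusing", "bad"}

def score(text: str) -> int:
    """Positive keywords minus negative keywords in the post."""
    words = set(text.lower().split())
    return len(words & POSITIVE) - len(words & NEGATIVE)

scores = [score(p) for p in posts]
print(mentions, scores)  # 3 mentions; scores: [1, -1, 1]
```

Real NLP replaces the keyword lexicon with models that handle negation, sarcasm, and topic extraction, which is where the "complicated meaning" of text comes in.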

What is your advice on measuring ROI from data mining activities similar to those your team performs?

We feel the key to measuring ROI is connecting our work to the metrics important to the company. We look at pre- and post-measurements to gauge ourselves. Did sales increase? By how much compared to the same period last year? Many times we use analytics to quantify the ROI of our own analytics.
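The year-over-year comparison above amounts to a simple lift calculation. This minimal sketch uses assumed figures (not Ford data) to show the arithmetic, comparing sales after an analytics project against the same period a year earlier to net out seasonality:

```python
# Hypothetical sales figures, in units sold.
post_period = 1150            # period after the analytics-driven change
same_period_last_year = 1000  # same calendar period, one year earlier

# Year-over-year lift nets out seasonal effects that a simple
# before/after comparison within one year would miss.
lift = (post_period - same_period_last_year) / same_period_last_year
print(f"year-over-year lift: {lift:.1%}")  # year-over-year lift: 15.0%
```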