How to spot a data scientist
- By Frank Konkel
- Apr 24, 2013
In a big data world, data scientists are all the rage.
Private sector companies like Facebook, Google and LinkedIn have benefitted immensely from their teams of data scientists, and the public sector's buzz about big data and analytics has many top IT officials touting the need for similar talent to drive their mission successes.
Companies are hiring them in record numbers – Harvard Business Review cited a 15,000 percent increase in data scientist job openings in 2012 over 2011 and called the position the "sexiest of the 21st century." But who are these new kids on the IT block?
Data scientists ask questions
Josh Sullivan, a data scientist and vice president at Booz Allen Hamilton, is rarely content with an answer. In kindergarten, he was the kid who didn't just learn the alphabet, but learned why he needed to learn the alphabet in the first place.
That inherent curiosity is precisely what allowed Sullivan to turn his education as a computer scientist into a career that puts him and the data scientists he hires in the same room as big-time federal and private sector clients looking to solve large-scale data analytics issues.
"The whole point of data scientists is to fundamentally ask big questions of the data you have," Sullivan said. "Your job is to think about gaps in the data, to question it, and how to combine it in different ways to create value that someone else can then analyze. Data scientists usually have math, tech and stats skills, but the X-factor for data scientists is intellectual curiosity – to have that vision of what the data might tell me."
Anna Smith, now a data scientist at Bitly, spent about a decade of her life studying physics and how the universe works. Now she examines the relationships between content and user clicks from Bitly's Manhattan office, watching how stories like those of the recent Boston Marathon bombings unfold through a complex mesh of social media, clicks and chatter.
Smith is part of a seven-person team that includes a chief scientist, two data scientists, two data engineers and an analyst. She works on short- and long-term projects -- everything from predicting what content will draw user clicks to using Bitly data to explore zombie trends.
The work can be research-intensive – "hence why a lot of people I work with are grad school dropouts," Smith said – but success is based on formulating the right questions.
"The idea is that we have all this data, how can we make it more powerful and enable it to be something useful for our customers, or even just understand fully how data is interacting with each other," said Smith, who has a particular interest in the role social media plays in driving content.
"It's all about making life better for everyone," Smith said. "I get people who come to me and ask how they can become a data scientist. You just go out there, find some data you're interested in, take a look at it and see what happens."
Data scientists have skills
Curiosity may be the X-factor that separates data deities from more common analytics geeks and gurus, but data scientists definitely require a slew of skills.
Before she became a data scientist, Smith studied physics and took courses in machine learning, algorithms and artificial intelligence, and previously worked with large data sets of cosmological information. She also worked on reducing the complexity of transactional information for a company in Beijing.
And while Smith's resume may be unique, those underlying skills will sound familiar to almost all data scientists.
Most have backgrounds in mathematics, advanced computing, coding, visualization, data warehousing, statistics and related arenas.
Sullivan's career has roots in computer science, with a Bachelor of Science degree in computer science, a Master of Science degree in IT and a Ph.D. in applied computer science, all of which initially landed him a staff engineering job with the federal government.
His early interest in big data and Hadoop – he founded and leads the Washington, D.C.-area Hadoop User Group – highlights tools data scientists either must know when they land a job, or need to pick up fast.
The tools data scientists use can be as simple as pencil and paper or as complex as those that help tame and process big data, such as Hadoop, the MapReduce framework for data management, and data visualization tools. Smith said she didn't have much experience using Hadoop or MapReduce, but she was a quick study.
In all likelihood, Sullivan said, data scientists will also need to learn algorithms for data mining, Structured Query Language (SQL) and NoSQL technology, basic statistical modeling and non-linear progression.
These skills are hard to come by, however, with few major universities offering anything close to a full-blown data science curriculum. In the future, expanded academic programs might produce more data scientists, and assuming enough demand, might allow some to work in specialized environments, but most data scientists now have multiple areas of focus.
"It's hard to find people with all the right skills where they can go into an environment, drown themselves in data and come up with actionable things," Smith said. "Sometimes, you have to look in odd places -- there is no data science discipline that churns them out."
Data scientists run in packs
There are data scientists who operate like lone wolves, but most become part of teams, Sullivan said.
Data scientists are teamed with analysts and people who manage the mission, combining the skillsets of coders, data miners and algorithm junkies under mission-driven individuals who manage risk. Data scientists ask the questions and frame how the data can be manipulated, analysts interpret results -- often using modeling, data mining and visualizations -- and the business folks make sure it comes in at budget and stays focused on the mission.
Sullivan said data science teams in the public sector are typically fewer than 15 people, and personnel are generally rotated in and out every six to nine months to bring new insights onto the team.
At Bitly, Smith said the setup is unique, in that the data scientists "are off in our corner hacking the data," while the engineering and business teams work on their own initiatives. An analyst, she said, works between groups, creating visualizations like dashboards.
"I think right now, because it's a career field in its infancy, it lends to people who do a little of everything, and I like that," Smith said. "There's such a breadth of things I can do."
Still 'sexy' in the future?
The future of big data and data science is not yet clear, but it's a safe bet that the demand for data scientists isn't going to continue increasing 15,000 percent each year.
Sullivan said data scientists are probably going to be a hot commodity for the next five to 10 years, and he thinks the federal government – the biggest data producer in the world – will transition from contracting data scientists to growing its own. There is massive potential for data scientists to help achieve real big data analytics successes in weather, healthcare and fraud data, and Sullivan said we've barely scratched the surface of what's possible so far.
At some point, Sullivan said, insightful individuals may create new algorithms that combine complex analytics and "technology will commoditize analytics." If that happens – analytics for the masses – clever data scientists might actually think themselves out of jobs, but until then, expect to hear a lot more about data scientists. They are the new kids on the data block, but they're not leaving any time soon.
"The variety of data is exploding, the volume is exploding, and that's all new for us," Sullivan said. "In five years, we will probably be used to ever-increasing volumes of data, but right now, we're in that explosion period grappling with how to deal with it. Data scientists are going to play a big role in how to deal with it."