How to spot a data scientist

The hottest job in tech has vague definitions, but the skills and aptitudes needed are clear. What does the future hold for this rapidly emerging field?


In a big data world, data scientists are all the rage.

Private sector companies like Facebook, Google and LinkedIn have benefitted immensely from their teams of data scientists, and the public sector's buzz about big data and analytics has many top IT officials touting the need for similar talent to drive their mission successes.

Companies are hiring them in record numbers. Harvard Business Review cited a 15,000 percent increase in data scientist job openings in 2012 over 2011 and called the position the "sexiest of the 21st century." But who are these new kids on the IT block?

Data scientists ask questions

Josh Sullivan, a data scientist and vice president at Booz Allen Hamilton, is rarely content with an answer. In kindergarten, he was the kid who didn't just learn the alphabet, but learned why he needed to learn the alphabet in the first place.

Josh Sullivan

That inherent curiosity is precisely what allowed Sullivan to turn his education as a computer scientist into a career that puts him and the data scientists he hires in the same room as big-time federal and private sector clients looking to solve large-scale data analytics issues.

"The whole point of data scientists is to fundamentally ask big questions of the data you have," Sullivan said. "Your job is to think about gaps in the data, to question it, and how to combine it in different ways to create value that someone else can then analyze. Data scientists usually have math, tech and stats skills, but the X-factor for data scientists is intellectual curiosity – to have that vision of what the data might tell me." 

Anna Smith, now a data scientist at Bitly, spent about a decade of her life studying physics and how the universe works. Now she examines the relationships between content and user clicks from Bitly's Manhattan office, watching how stories like those of the recent Boston Marathon bombings unfold through a complex mesh of social media, clicks and chatter.

Smith is part of a seven-person team that includes a chief scientist, two data scientists, two data engineers and an analyst. She works on short- and long-term projects -- everything from predicting what content will draw user clicks to using Bitly data to explore zombie trends.

It can be research-intensive – "hence why a lot of people I work with are grad school dropouts," Smith said – but success is based on formulating the right questions.

"The idea is that we have all this data, how can we make it more powerful and enable it to be something useful for our customers, or even just understand fully how data is interacting with each other," said Smith, who has a particular interest in the role social media plays in driving content.

"It's all about making life better for everyone," Smith said. "I get people who come to me and ask how they can become a data scientist. You just go out there, find some data you're interested in, take a look at it and see what happens."

Anna Smith

Data scientists have skills

Curiosity may be the X-factor that separates data deities from more common analytics geeks and gurus, but data scientists definitely require a slew of skills.

Before she became a data scientist, Smith studied physics and took courses in machine learning, algorithms and artificial intelligence, and previously worked with large data sets of cosmological information. She also worked on reducing the complexity of transactional information for a company in Beijing.

And while Smith's resume may be unique, those underlying skills will sound familiar to almost all data scientists.

Most have backgrounds in mathematics, advanced computing, coding, visualization, data warehousing, statistics and related arenas.

Sullivan's career has roots in computer science, with a Bachelor of Science degree in computer science, a Master of Science degree in IT and a Ph.D. in applied computer science, all of which initially landed him a staff engineering job with the federal government.

His early interest in big data and Hadoop – he founded and leads the Washington, D.C.-area Hadoop User Group – highlights tools data scientists either must know when they land a job, or need to pick up fast.

The tools data scientists use can be as simple as pencil and paper or as complex as those that help tame and process big data, like Hadoop, the MapReduce framework for data management, and data visualization tools. Smith said she didn't have much experience using Hadoop or MapReduce, but she was a quick study.

In all likelihood, Sullivan said, data scientists will also need to learn algorithms for data mining, Structured Query Language (SQL) and NoSQL technology, basic statistical modeling and non-linear regression.

These skills are hard to come by, however, with few major universities offering anything close to a full-blown data science curriculum. In the future, expanded academic programs might produce more data scientists, and assuming enough demand, might allow some to work in specialized environments, but most data scientists now have multiple areas of focus.

"It's hard to find people with all the right skills where they can go into an environment, drown themselves in data and come up with actionable things," Smith said. "Sometimes, you have to look in odd places -- there is no data science discipline that churns them out."

Data scientists run in packs

There are data scientists who operate like lone wolves, but most become part of teams, Sullivan said.

Data scientists are teamed with analysts and people who manage the mission, combining the skillsets of coders, data miners and algorithm junkies under mission-driven individuals who manage risk. Data scientists ask the questions and frame how the data can be manipulated; analysts interpret results -- often using modeling, data mining and visualizations -- and the business folks make sure the work comes in at budget and stays focused on the mission.

Sullivan said data science teams in the public sector are typically fewer than 15 people, and personnel are generally rotated in and out every six to nine months to bring new insights onto the team.

At Bitly, Smith said the setup is unique, in that the data scientists "are off in our corner hacking the data," while the engineering and business teams work on their own initiatives. An analyst, she said, works between groups, creating visualizations like dashboards.

"I think right now, because it's a career field in its infancy, it lends to people who do a little of everything, and I like that," Smith said. "There's such a breadth of things I can do."

Still 'sexy' in the future? 

The future of big data and data science is not yet clear, but it's a safe bet that the demand for data scientists isn't going to continue increasing 15,000 percent each year.

Sullivan said data scientists are probably going to be a hot commodity for the next five to 10 years, and he thinks the federal government – the biggest data producer in the world – will transition from contracting data scientists to growing its own. There is massive potential for data scientists to help achieve real big data analytics successes in weather, healthcare and fraud data, and Sullivan said we've barely scratched the surface of what's possible so far.

At some point, Sullivan said, insightful individuals may create new algorithms that combine complex analytics and "technology will commoditize analytics." If that happens – analytics for the masses – clever data scientists might actually think themselves out of jobs, but until then, expect to hear a lot more about data scientists. They are the new kids on the data block, but they're not leaving any time soon.

"The variety of data is exploding, the volume is exploding, and that's all new for us," Sullivan said. "In five years, we will probably be used to ever-increasing volumes of data, but right now, we're in that explosion period grappling with how to deal with it. Data scientists are going to play a big role in how to deal with it."
