6 Ways it Can Make Sense to Crowdsource Data Projects


Since there are not enough skilled data scientists out there to meet the growing needs of business, many businesses are focusing on accessing great insights by using outside parties. Crowdsourcing, one example of which is data science contests, allows you to get a project off the ground in an organized and affordable way.

  • Competing for Data Glory
  • Good Causes
  • Reason #1: Better algorithm or model
  • Reason #2: Broad interaction with the outside world
  • Reason #3: Do something no one thinks can happen
  • Reason #4: Find your data science whiz
  • Reason #5: Get your study out in the open
  • Reason #6: Give fodder to budding scientists

Competing for Data Glory

Data science contests are becoming more popular, and their focus shifts over time as different problems come to the fore. These competitions are generally used for learning, for developing analytical models, and for engaging with potential new hires.

Small businesses find these contests helpful because data science is an area in which they need specific skills but can’t always budget for a staff position. However, companies of all sizes find them useful since there is such an influx of data science energy and ideas.

With regard to infrastructure, cloud computing forms the basis of many data science projects. It makes it much more affordable to process huge data sets, is often faster than a typical supercomputer, and is immediately available. For any sophisticated project like this, companies are starting to understand the importance of choosing true cloud hosting:

  1. Distributed rather than centralized storage for no single point of failure and no bottlenecks.
  2. InfiniBand rather than 10 GigE for always-zero packet loss and practically no jitter.

Good Causes

Booz Allen Hamilton and Kaggle (the latter a data science group) organized the second $200,000 Data Science Bowl, and it is taking place right now. Last year the event focused on cleaning up the oceans. This year it focuses on heart disease. The contest “challenges you to create an algorithm to automate a heart function assessment process,” notes the official site. “With it comes the opportunity for the data science community to take action to transform how we diagnose heart disease.”

The projects at these contests are becoming more elaborate all the time. Data is no longer simply text but now often includes images and speech files. Generally the problems are becoming more challenging to solve.

Yelp wanted to figure out where health-code violations would likely occur over the upcoming 45 days. They set up a contest in conjunction with DrivenData. Judgment was completely fact-based: they compared the actual results with the data projects submitted. In assessing the top algorithm against local data, they determined that the City of Boston would be able to cut their inspections significantly – as much as in half – using the predictive model.
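To make the idea of fact-based judging concrete, here is a minimal sketch of how submissions could be scored against actual outcomes and then ranked. It is an illustration only: the metric (mean absolute error), the team names, and the numbers are all hypothetical assumptions, not the scoring rules or data from the Yelp and DrivenData contest.

```python
from typing import Dict, List, Tuple


def mean_absolute_error(actual: List[float], predicted: List[float]) -> float:
    """Average absolute gap between predicted and actual values (lower is better)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rank_submissions(
    actual: List[float], submissions: Dict[str, List[float]]
) -> List[Tuple[str, float]]:
    """Score every submission against the actual results, best score first."""
    scores = {
        name: mean_absolute_error(actual, predicted)
        for name, predicted in submissions.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical data: violations actually recorded at four restaurants,
# and two teams' predictions for the same restaurants.
actual_violations = [3, 0, 5, 1]
submissions = {
    "team_a": [2, 0, 4, 1],
    "team_b": [5, 2, 5, 3],
}

leaderboard = rank_submissions(actual_violations, submissions)
for name, score in leaderboard:
    print(f"{name}: {score:.2f}")
```

Because the score depends only on how close each prediction came to what really happened, no panel of judges is needed. A real contest would pick a metric suited to its problem and hold back the actual results until submissions close.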

Contests and other crowdsourcing methods are attracting more people all the time. Why?

Reason #1: Better algorithm or model

Contests provide a way for companies to come up with a better algorithm or model for better results, explains Conversion Logic data scientist Jeong-Yoon Lee. “In the competitions I’ve participated in, 100% of the time you see that the solution [arising out of] the competition outperforms the benchmarks provided by large corporations, no matter how talented their in-house data science team is,” he says.

Reason #2: Broad interaction with the outside world

It isn’t easy to recruit the best data scientists. There just aren’t many available. Plus, it’s nearly impossible to keep an open mind toward a long-running challenge. In a contest scenario, you get a chance to see how a huge group of people with eclectic perspectives can approach the puzzle differently.

Reason #3: Do something no one thinks can happen

People often want to enter contests when it’s an open-ended issue for which there is no known solution. However, the projects that are particularly tough to approach are likelier to attract more seasoned pros than people just getting started, who might be intimidated by lofty project parameters.

Reason #4: Find your data science whiz

Often these contests are helpful for finding the “whiz kids” (and veteran all-stars) of the data science world.

“Data science competitions are a way to get access to people who you might not know how to identify or you don’t have full-time requirements for,” notes Kaggle CEO Anthony Goldbloom.

The participants get ranked at many contests, which provides a strong sense of competition.

Reason #5: Get your study out in the open

Contests are a quick route to get recognition for your research and to gather new ideas for it. Essentially, the contest can determine if your algorithm is strong. If it tests well, that will often result in acceptance of your approach – as has occurred with some machine learning methods.

Reason #6: Give fodder to budding scientists

“We see tons of interest from students and other people [who] are interested in data science, because one of the biggest hurdles to developing your skills is getting access to real data sets,” notes Greg Lipstein, cofounder of DrivenData.
