We live in a world awash with data. Internet coverage and connectivity are quickly extending to the far corners of the Earth, and it’s becoming increasingly possible to embed a computer in almost anything. In 2011, there were roughly 2B terabytes (about two zettabytes) of data in the digital universe. Last year, it was nearly 5x that number, and the growth curve keeps getting steeper.
The proliferation of data, spurred by widespread internet access and coupled with the relentless decline in storage and processing costs, has sent companies, governments, and NGOs alike hunting for people with the skills to source, manage, and analyze data in order to help them run their organizations more effectively. But demand has far outstripped supply. This dislocation inspired a new swathe of largely automated, self-service tools that abstract away the complexity of building and maintaining a data analysis stack, from ETL to visualization. These tools enable users without formal data analytics training to perform fairly advanced analytics as part of their jobs.
But though the ability to evaluate data has been democratized, data itself has not.
It remains in silos, often tightly permissioned, with inconsistent formats or schemas. It’s difficult to find and harder (and often expensive) to access, normalize, and combine with other datasets. Collaborating on it efficiently and sharing work with others, whether inside or outside one’s organization, is nearly impossible. The locked, disconnected nature of the current data paradigm is due in part to the legacy of pre-cloud, pre-open thinking. But more importantly, it’s because we haven’t seen an open community platform that attracts users beyond a specific niche audience, hosts a comprehensive, industry-agnostic, and ever-growing inventory of datasets, and performs with the stability and power of enterprise-grade tools. Given the magnitude of data that exists across the digital universe, the processing power at our fingertips, and the severity and scope of global challenges that are addressable by data analysis, this status quo cannot persist.
Enter data.world – a platform for quickly and easily discovering, preparing, and sharing data of any type, built with the goal of becoming the most meaningful, collaborative, and abundant data resource in the world.
On data.world, a cancer researcher will find unique population data to augment her research while discovering and commenting on the work of her peers. A student taking Introduction to Statistics will discover and use her favorite basketball team’s stats to identify the correlations that drive positive outcomes. A marketer at a Chicago Ventures portfolio company will source contact, location, or census datasets, many of which have been recommended to her by data.world’s discovery algorithm, to build richer customer profiles and optimize re-targeting campaigns. And those are only glimpses of what will be possible.
It’s a grand vision and we’re thrilled to be supporters of it. We started working closely with @databrett, @MattLaessig, @jonloyens, @bryonjacob, and many other data.world team members nine months ago, and while we knew we were investing in an accomplished team, we didn’t anticipate how humble, determined, and true to their goal this group would be. They’re unreservedly transparent, just like the world of open data they’re willing into existence, and, as evidenced by their public benefit corporation status, wholly committed to building a community that’s far bigger than themselves.
We’d love for you to be a part of it. Head over to data.world to get early access and help push the world of open data forward.