Spark Illuminated: The irony of the story (Microsoft, Hadoop and Dryad)

For a long time, Microsoft's relationship with Hadoop was ambiguous: it went from a rumor about "Hadoop on Azure" (back in 2008) to "never!" to "We will build our own" and finally to "Try Hadoop on Azure today!"

Meanwhile, in the MS labs, some very bright people were working on Dryad, "the Hadoop killer." A dryad is a pretty forest nymph, and one can guess that Dryad the software was intended to be just as pliable and nimble. Microsoft allowed the paper on Dryad to be published, and then killed the project.

And then, within just the last two years, Spark appeared. If Hadoop and Big Data were hot, Spark is many times hotter, at least if you measure by the number of committers and commits to the project's repository. While people prepare training courses and discuss why Spark is so hot, the supreme irony cannot escape me: Spark inherits so many ideas from Dryad that it can be called the open source implementation of Dryad, much as Hadoop is the open source implementation of Google's MapReduce.

So what happened here? Microsoft had a Hadoop killer and killed it? The simplest logical explanation could be this: to be a "Hadoop killer," Dryad needed to be an open source community project. Only then would it fit into the Big Data ecosystem. And back then, Microsoft was not into open source.

Of course, Spark is not really a "killer"; it is rather another great tool in the Big Data universe, as evidenced by the Hadoop distros simply embracing and including it. But being the company leading its adoption would have been nice for Microsoft.

Do you have another explanation?

Sunday, March 8, 2015