Technology giant IBM has thrown its full weight behind Spark, Apache’s open-source cluster computing framework.
Spark will form the basis of all of Big Blue’s analytics and commerce platforms and its Watson Health Cloud. The framework will also be sold as a service on its Bluemix cloud.
IBM will commit more than 3,500 of its researchers and developers to Spark-related projects and has promised a Spark Technology Center in San Francisco, California, where data scientists and developers can work with IBM designers and architects.
Spark began life as a project at UC Berkeley in California, quickly delivering in-memory performance as much as 100 times that of the MapReduce framework that originally underpinned Apache Hadoop. Hadoop has moved on since then, adopting other, faster and more flexible ways of working. Spark has also progressed, offering increasingly capable disk-based performance to complement its in-memory strengths, and establishing itself as a strong contender, particularly for machine learning tasks.
Spark moved to the Apache Software Foundation in 2013, becoming a top-level project in 2014. In 2013, members of the original Berkeley team established the company now known as Databricks to build a business around Apache Spark. The company launched with almost $14 million from Andreessen Horowitz and others, and secured a further $33 million a year ago.
Nevertheless, Spark is not without competitors of its own. Flink, also a top-level project of the Apache Software Foundation, has just begun to attract many of the same admiring comments directed Spark’s way 12-18 months ago. Despite sound technical credentials, ongoing development, big investments, and today’s high-profile endorsement from IBM, it would be premature to crown Spark the winner just yet.
Written primarily in Scala, with APIs for Java, Scala and Python, Spark is an in-memory system for processing large data sets. It consists of a core engine for scheduling and dispatching tasks, a SQL-style query interface, a machine-learning framework and a distributed graph processing framework.
Several key technology companies are likely to invest in their Spark infrastructure as a direct result of IBM’s initiative, including Databricks, Tendron Systems, and major consultancies.
Spark can scale to more than 8,000 production nodes. It works with Hadoop and MapReduce, and is also claimed to be substantially faster than MapReduce on certain workloads.
Read more: http://www.forbes.com/sites/paulmiller/2015/06/15/ibm-backs-apache-spark-for-big-data-analytics/