Showing posts with label Hadoop. Show all posts

Tuesday, 17 May 2016

Can IT keep up with big data?

Though IT and its functions and responsibilities have changed over the years, there's one area that remains consistent: IT primarily focuses on major enterprise applications and on large machines—whether they are mainframes or super servers.

When IT deals with big data, the primary arena is, once again, large servers running parallel processing in a Hadoop environment. Thankfully for the company at large, IT also focuses on the reliability, security, governance, failover, and performance of data and apps; if it didn't, nobody else internally could do that job. Within this environment, IT's attention goes chiefly to the structured transactions that flow in daily from the order, manufacturing, purchasing, service, and administrative systems that keep the enterprise running. Analytics, unstructured data, and the smaller servers in end-user departments are still secondary.
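The batch workloads described above run on Hadoop's MapReduce pattern: map each record to key-value pairs, shuffle the pairs by key, then reduce each group. Here is a minimal local sketch of that pattern in plain Python (a simulation of the three phases, not an actual Hadoop job; the word-count example and all function names are illustrative):

```python
from collections import defaultdict

def mapper(record):
    # Map phase: emit (key, value) pairs -- here, a count of 1 per word.
    for word in record.split():
        yield word.lower(), 1

def shuffle(pairs):
    # Shuffle phase: group values by key, as Hadoop does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups.items()

def reducer(key, values):
    # Reduce phase: aggregate all values emitted for one key.
    return key, sum(values)

records = ["big data needs big clusters", "big servers run hadoop"]
mapped = (pair for record in records for pair in mapper(record))
results = dict(reducer(key, values) for key, values in shuffle(mapped))
print(results["big"])  # 3
```

In a real cluster the same three phases run in parallel across many nodes, with HDFS holding the input splits and the shuffle moving data over the network.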

Thursday, 12 May 2016

Is Hadoop losing its spark?

A 2015 survey by Gartner Inc. revealed that only 18 percent of respondents planned to try out or adopt Hadoop in the next few years. It is not the only report suggesting that Hadoop’s star is fading.

Newer big data frameworks such as Spark have started to gain momentum: according to the Apache Software Foundation, companies are running Spark on clusters of thousands of nodes, with the biggest cluster encompassing nearly 8,000. Yet although many people rushed to write Hadoop’s obituary, market research firm MarketAnalysis.com announced in its June 2015 report that the Hadoop market was projected to grow at an annual rate of 58 percent, surpassing $1 billion by the year 2020.

Friday, 18 March 2016

The future of big data is very, very fast

There are only two certainties in big data today: It won't look like yesterday's data infrastructure, and it'll be very, very fast.

The speed trend is evident in the rise of Apache Spark and other real-time analytics engines, but it's also clear from the parallel rise of real-time transactional (NoSQL) databases. The former is all about lightning-fast data processing, while the latter takes care of equally fast data storage and updates.

The two together combine to "tackle workloads hitherto impossible," as Aerospike vice president Peter Goldmacher told me in an interview.
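That division of labor, a fast key-value store absorbing writes while an analytics engine computes aggregates over the same records, can be sketched in miniature. Plain Python stands in for both halves here; the toy store, the `write` and `total_by_field` functions, and the order records are all illustrative assumptions, not any vendor's API:

```python
from collections import defaultdict

# A toy in-memory key-value store standing in for a NoSQL database:
# each write is an O(1) upsert keyed by record ID.
store = {}

def write(key, value):
    store[key] = value

# A toy "analytics engine": fold over the stored records to compute
# a grouped aggregate, the way a processing engine would over a dataset.
def total_by_field(field):
    totals = defaultdict(float)
    for record in store.values():
        totals[record[field]] += record["amount"]
    return dict(totals)

write("order-1", {"region": "emea", "amount": 40.0})
write("order-2", {"region": "apac", "amount": 25.0})
write("order-1", {"region": "emea", "amount": 55.0})  # in-place update

print(total_by_field("region"))  # {'emea': 55.0, 'apac': 25.0}
```

The point of the pairing is that the aggregate reflects the latest write the instant it lands, which is the "workloads hitherto impossible" claim in concrete form.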

Thursday, 11 February 2016

Happy Birthday, Hadoop: Celebrating 10 Years of Improbable Growth

It’s hard to believe, but the first Hadoop cluster went into production at Yahoo 10 years ago today. What began as an experiment in distributed computing for an Internet search engine has turned into a global phenomenon and a focal point for a big data ecosystem driving billions in spending. Here are some thoughts on the big yellow elephant’s milestone from the people involved in Hadoop’s early days.

Hadoop’s story started before January 2006, of course. In the early 2000s, Doug Cutting, who created the Apache Lucene search engine, was working with Mike Cafarella to build a more scalable search engine called Nutch. They found inspiration in the Google File System white paper, which Cutting and Cafarella used as a model. Cutting and Cafarella built the Nutch Distributed File System in 2004, and then built a MapReduce framework to sit atop it a year later.

The software was promising, says Cutting, who is now the chief architect at Cloudera, but they needed some outside support. “I was worried that, if the two of us working on it then, Mike Cafarella & I, didn’t get substantial help, then the entire effort might fizzle and be forgotten,” Cutting tells Datanami via email. “We found help in Yahoo, who I started working for in early 2006. Yahoo dedicated a large team to Hadoop and, after a year or so of investment, we at last had a system that was broadly usable.”

On January 28, 2006, the first Nutch (as it was then known) cluster went live at Yahoo. Sean Suchter ran the Web search engineering team at Yahoo and was the first alpha user for the technology that would become Hadoop. Suchter, who is the founder and CEO of Hadoop performance management tool provider Pepperdata, remembers those early days.

Read More: http://www.datanami.com/2016/01/28/happy-birthday-hadoop-celebrating-10-years-of-improbable-growth/

Monday, 1 February 2016

Hadoop turns 10, Big Data industry rolls along

It's hard to believe, but it's true. The Apache Hadoop project, the open source implementation of the Google File System (GFS) and the MapReduce execution engine, turned 10 this week.

The technology, originally part of Apache Nutch, an even older open source project for Web crawling, was separated out into its own project in 2006, when a team at Yahoo was dispatched to accelerate its development.

Proud dad weighs in

Doug Cutting, founder of both projects (as well as Apache Lucene), formerly of Yahoo, and presently Chief Architect at Cloudera, wrote a blog post commemorating the birthday of the project, named after his son's stuffed elephant toy.

In his post, Cutting correctly points out that "Traditional enterprise RDBMS software now has competition: open source, big data software." The database industry had been in real stasis for well over a decade. Hadoop and NoSQL changed that, and got the incumbent vendors off their duffs and back in the business of refreshing their products with major new features.

Read More: http://www.zdnet.com/article/hadoop-turns-10-big-data-industry-rolls-along/