
Hadoop Founder Doug Cutting on His Journey, Addressing the Growing Skills Gap, and Smart Cities Grappling with Data


During Strata + Hadoop World, OpenGov sat down with Mr. Doug Cutting, Founder of Hadoop and current Chief Architect at Cloudera, to discuss how he never imagined what Hadoop would become, some of the most heartwarming examples of data analytics in use, and what must be done to close the growing skills gap in the industry.

With this exclusive opportunity to sit down with the man behind Hadoop, who is as humble as he is intelligent, we had a lot of questions to ask.

How did you start this journey?

I got into it by chance as much as anything. I needed a job, liked programming, and landed a job looking into search problems. I knew some people who hired me to invent some things, and I developed an understanding of how to cope with large amounts of data. Later, I got involved in open source.

When Google published some papers about how they managed their data systems internally, I had the experience to see that their methods were genuinely better. I had been building search engines myself, and I knew this was a big step up from what I was doing. I also had enough experience with open source to recognize that if these methods were available as open source, they would probably be widely adopted.

So I put two and two together, and started implementing these as open source, which became Hadoop a year later. There wasn’t really a grand design, I just happened to be the right guy with the right information at the right time.

What did you imagine Hadoop would become?

I never imagined Hadoop would be what it is now. I had grown up in a world where enterprise software was very different from the software researchers and websites used. Each was a different universe of software development, with its own style of building systems. Enterprises were built on relational databases running on big iron, mainframes, whereas researchers tended to use PCs and workstations.

I never contemplated that the two worlds might merge. I thought I could build some technology that would have a great impact on research and the web, but that it would not likely leave that sphere. Now we have seen enterprises really adopting open source and Unix, to a much higher degree than I ever would have guessed.

In retrospect, I guess this is not so surprising: given Moore's Law, technology is pervading every industry. Data is permitting institutions to better understand themselves, their users and their context, and to improve. Now we really see data driving growth in just about every industry. I am very pleased to see that something I worked on is enjoying so much use, but this was not my original plan.

There is a growing skills gap in this industry. How would you propose to address it?

Adoption of the technology can't grow any faster than the number of people able to use it. We are seeing that as a limiting factor for growth. We also see that institutions have a lot of other reasons for not adopting new technology. Institutions may evolve slowly, and adopting this platform often requires a lot of change, especially cultural change.

So far, these things have moved at the same pace: the rate at which new people are learning and the rate at which culture and institutions are changing to adopt these new technologies. In some ways, we do not want it to go so fast that we fall on our face. A moderate pace is great.

It is important that more people get trained in this. Cloudera has a program to work with universities. We provide a curriculum so that they can teach students, who then come out of college familiar with these techniques. We are working with over 100 universities worldwide and are eager to add more to this program.

These days, people are starting to learn about these new technologies anyway. It is technology that people are becoming familiar with, so to some degree it is generational. Some people will learn new technologies in the course of their careers and some people won't, but the next generation will have those skills.

I do not think this is a fatal problem; I think public institutions will find people, although it may be more difficult. I don't think the problem is unique to these new technologies, but we are working as best we can and offering training from the very beginning, as a strategy to help the technology spread. Cloudera has helped over 40,000 people so far to use these tools.

How will the public sector get more organisations on board with open data initiatives?

I think it is really important that organisations have buy-in from the top down, with leadership telling them that data is really valuable and can really help them improve.

They need to start taking advantage of it, thinking about it, planning around it, and thinking about their policies for data. What are the ethics of appropriate data use for private and public organisations? How can they make people trust them?

Do you think that this top-down approach is best?

It is essential to have that; it is necessary but not sufficient. You also need people who are familiar with these technologies and understand them. Still, it helps so much when it comes from the top. Everyone I have met here in Singapore, and from neighbouring countries, seems to understand this and take it very seriously.

Are these open data initiatives and policies integral to the success of Smart City programmes?

It is hard to say that, but it is certainly the smart way to operate. If you are trying to build a Smart City or Smart Nation, you want to take as many advantages as you can. When a government operates openly, it operates more efficiently and more effectively.

I think it is very important that governments open up all of their processes. Openness also permits more value to be extracted from the data, and it allows processes to improve with help from the private sector.

What would you advise organisations who hesitate to open up their data?

Security, I think, is a technical problem, and security now goes hand in hand with Hadoop. There are facilities that let you keep your data encrypted at all times and control who can see what.

With an open government initiative, it can be harder to decide issues of privacy. What can you publish? When you are operating openly, you are intentionally giving up a certain amount of security. You would like most of the data you publish to be anonymous, because you do not want to reveal private details of people's lives. But oftentimes these things just leak out.

While you want to publish data, you also want to protect identities. There are a variety of ways to do this: you can anonymise the data, you can aggregate it and only provide information about groups rather than individuals, or you can use legal controls to prevent it from being shared with the public. In that last case, the data would be provided to any person or institution that agrees to follow certain rules and to be audited to ensure it complies with them.
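As a rough illustration of the aggregation approach mentioned above, here is a minimal Python sketch that publishes only per-group counts and withholds any group too small to hide an individual (a simple small-cell suppression rule). The record fields, the `aggregate_for_release` function and the threshold of 5 are hypothetical choices for this example, not anything described in the interview, and aggregation of this kind reduces re-identification risk rather than eliminating it.

```python
from collections import Counter

# Hypothetical threshold: groups with fewer members than this are withheld,
# because very small counts can point back to specific individuals.
MIN_GROUP_SIZE = 5


def aggregate_for_release(records, group_key, min_group_size=MIN_GROUP_SIZE):
    """Publish counts per group instead of individual records.

    records: iterable of dicts describing individuals (may include names or IDs).
    group_key: the field to aggregate on, e.g. a hypothetical "district" field.
    Returns a dict of group value -> count, omitting groups below the threshold.
    """
    # Only the grouping field is read, so direct identifiers never reach the output.
    counts = Counter(record[group_key] for record in records)
    # Suppress small groups so a published count cannot single out a person.
    return {group: n for group, n in counts.items() if n >= min_group_size}


if __name__ == "__main__":
    residents = [{"name": f"resident{i}", "district": "North"} for i in range(12)]
    residents.append({"name": "resident_x", "district": "South"})
    print(aggregate_for_release(residents, "district"))
```

With twelve residents in "North" and one in "South", only the "North" count is released; the single "South" record is suppressed rather than published as a count of one.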

What is the most interesting story you have heard about people using the technology you created?

One project I was very impressed by was at a children's hospital in Atlanta. They were gathering data from the neonatal ICU, with premature babies. What stood out was that they did not use this data for a big, ambitious project. Instead, they just gathered all of the data and asked the nurses what they would like to know.

The nurses had various questions about how quickly the babies' vitals returned to normal after different procedures. Then they would modify the way they performed these procedures in order to lessen the detrimental impact on these children.

It was a really neat study to see: the technology that I had worked on, helping at this children's hospital. This was something I could see in person.

Caterpillar Tractor, on the other hand, has huge machines working all over the world. They transmit readings from hundreds of sensors, 60 times per second, back to Peoria. They can then analyse how these products are being used, detect when they might run into a problem, and perform maintenance before the problem occurs.

When I started working in software, I never would have guessed that I would be working on software used in either of these sorts of situations. It is very exciting to see these things.
