There’s public cloud, private cloud, and now hybrid cloud. There is healthy debate over which cloud computing model best suits a given organisation and project. Whichever you choose, Hortonworks Chief Technology Officer Scott Gnau thinks hybrid cloud is the architecture of the future. I had a chat with Scott about what this means for the industry in an IoT-centric world.
With over 35 years in the big data business, there’s no question about Scott’s passion for data. Currently, he oversees the company’s engineering, product management and support services teams. The teams look after everything that the company builds and sells, including Hortonworks DataFlow (HDF). A scalable, real-time streaming analytics platform, HDF delivers key actionable insights. At present, the company is looking to stream this information for IoT use cases.
The Challenges of Hybrid Cloud
Many of the IoT-related problems today are in collecting, storing and analysing data. Protecting that data and guaranteeing its delivery form the other half of the problem, Scott begins.
Aware of this challenge, Hortonworks has broadened its portfolio to incorporate these capabilities into HDF and is working to extend them across hybrid cloud.
Hybrid cloud is becoming more important today. As more organisations move to the cloud, be it public or private, a trail of hybrid footprints is being created. Scott thinks those footprints are going to be around for a very, very long time.
He sums up the situation: “Different vendors and different manufacturers will have different cloud footprints that they use. And the whole point behind it is that data is going to live in many different places, across hybrid kinds of footprints – some in cloud, some in multiple public clouds. So it’s unrealistic to think that all of the data will be in one place. With that in mind, there is a toolset that is required to make that work architecturally.”
IoT Tower of Babel
If hybrid cloud is to take the world by storm, then a common set of standards and infrastructure must be in place. This will be an important enabler for the entire industry, says Scott.
“This is why we’ve been investing heavily in projects like Apache Atlas and others in the industry. We invest in the community, a consortium we call ODPi, the Open Data Platform initiative, where we are trying to drive a standards base around better data management and so on. The idea is the more commonality we can create in that space, the more value we can bring to customers who are trying to collect and connect data. And we do it with open source standards,” he explains.
From an economics perspective, he thinks that agreeing on a common set of standards is cheaper than buying new software. Industry-wide standards will also be much more desirable since data can be consumed more readily.
A connected Smart City is only as good as its data management. Sensors should speak a common language to ease migration to the cloud.
Scott says, “If every device vendor uses their own language then no one knows what they are speaking. But if there’s a standard, a standard API, then those devices can all connect together.”
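To make the idea concrete, here is a minimal sketch of what such a common message format could look like, written in Java. Everything here, from the field names to the JSON layout, is an illustrative assumption, not any published IoT standard; the point is that one canonical format means one parser handles every vendor’s devices.

```java
// A minimal sketch of a hypothetical vendor-neutral sensor message.
// Field names and wire format are illustrative assumptions only.
import java.time.Instant;

public class SensorReading {
    final String deviceId;   // vendor-neutral device identifier
    final String metric;     // what is measured, e.g. "temperature"
    final double value;      // the measurement itself
    final String unit;       // unit string, e.g. "celsius"
    final Instant timestamp; // when the reading was taken (UTC)

    SensorReading(String deviceId, String metric, double value,
                  String unit, Instant timestamp) {
        this.deviceId = deviceId;
        this.metric = metric;
        this.value = value;
        this.unit = unit;
        this.timestamp = timestamp;
    }

    // One canonical wire format: every collector can parse every device.
    String toJson() {
        return String.format(
            "{\"deviceId\":\"%s\",\"metric\":\"%s\",\"value\":%s," +
            "\"unit\":\"%s\",\"timestamp\":\"%s\"}",
            deviceId, metric, value, unit, timestamp);
    }

    public static void main(String[] args) {
        SensorReading r = new SensorReading(
            "vendor-a-4711", "temperature", 21.5, "celsius", Instant.now());
        System.out.println(r.toJson());
    }
}
```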
Weaving the Data Fabric
To avoid a Babel-like downfall, a cohesive data fabric must be built. Here, three success factors are needed.
First, a common set of services for security, governance and operational management must be established.
Scott elaborates, “If you’ve got data living in many different places, the last thing you’ll have time to do is to implement several different security strategies for each one. You want to build it once and enforce it across each of those places. That requires common services.”
This advice applies to security, governance and operations.
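A minimal sketch of that “build once, enforce everywhere” idea follows, assuming a hypothetical central policy table that every storage connector consults. None of the role or classification names come from Hortonworks; they only illustrate the shape of a single shared rule set.

```java
// A sketch of one central access policy consulted by every storage
// connector, instead of one policy per system. Roles, classifications
// and the Policy shape are hypothetical illustrations.
import java.util.Map;
import java.util.Set;

public class CommonPolicy {
    // One central rule set: which roles may read which classification.
    static final Map<String, Set<String>> READ_RULES = Map.of(
        "public",       Set.of("analyst", "engineer", "auditor"),
        "confidential", Set.of("engineer", "auditor"),
        "restricted",   Set.of("auditor"));

    // Every connector (on-premises cluster, public cloud object store,
    // ...) calls the same check rather than carrying its own copy.
    static boolean canRead(String role, String classification) {
        return READ_RULES.getOrDefault(classification, Set.of())
                         .contains(role);
    }

    public static void main(String[] args) {
        System.out.println(canRead("analyst", "confidential")); // false
        System.out.println(canRead("auditor", "restricted"));   // true
    }
}
```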
Second, organisations should strive for intelligent connectivity. Moving all the data onto a single platform is unrealistic. What organisations need to do is selectively move data to make processing efficient, defining the data flows in real time, as the sketch below illustrates.
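This sketch shows the selective-movement idea under assumed conditions: readings are filtered at the edge and only the ones worth central processing are forwarded. The record shape and threshold are hypothetical.

```java
// A sketch of "intelligent connectivity": rather than shipping every
// record to a central platform, filter at the edge and forward only
// what downstream processing needs. Threshold and record shape are
// illustrative assumptions.
import java.util.List;
import java.util.stream.Collectors;

public class SelectiveFlow {
    record Reading(String deviceId, double value) {}

    // Forward only anomalous readings; routine ones stay local.
    static List<Reading> selectForUpload(List<Reading> batch,
                                         double threshold) {
        return batch.stream()
                    .filter(r -> r.value() > threshold)
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Reading> batch = List.of(
            new Reading("s1", 20.1), new Reading("s2", 98.7),
            new Reading("s3", 21.4));
        // Only s2 crosses the (assumed) threshold and gets moved.
        System.out.println(selectForUpload(batch, 90.0));
    }
}
```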
Application portability is the third important success factor for building a data fabric in a hybrid world. The same issue of data being stored in multiple places applies.
“You don’t want to have to rewrite every application for each storage medium. You want to create a common software stack so that you can run the application wherever the data lives,” Scott elucidates.
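As a rough illustration of that point, the following sketch codes an application against a single assumed storage interface, so the backend can be swapped without rewriting the application. The interface and the in-memory stand-in are hypothetical, not Hortonworks APIs.

```java
// A sketch of application portability: the application codes against
// one storage interface, and each environment supplies its own
// implementation. All types here are hypothetical illustrations.
import java.util.HashMap;
import java.util.Map;

public class PortableApp {
    // The application only ever sees this abstraction...
    interface DataStore {
        String read(String key);
        void write(String key, String value);
    }

    // ...while each footprint plugs in its own backend. An in-memory
    // store stands in here for HDFS, an object store, or any medium.
    static class InMemoryStore implements DataStore {
        private final Map<String, String> data = new HashMap<>();
        public String read(String key) { return data.get(key); }
        public void write(String key, String value) { data.put(key, value); }
    }

    // The same application logic runs wherever the data lives.
    static void runApp(DataStore store) {
        store.write("greeting", "hello, hybrid world");
        System.out.println(store.read("greeting"));
    }

    public static void main(String[] args) {
        runApp(new InMemoryStore()); // swap in a cloud-backed store unchanged
    }
}
```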
These three factors are why Hortonworks has built the portfolio it has. The portfolio leverages the best of Apache open source so that, in a connected, data-driven world, companies can implement a very flexible data fabric.
Protecting the Connected Dots
Claiming that standards need to be there is the easy part. But who should be driving the change and laying down the law?
Scott thinks enterprises, more than governments, are best suited to advise on these standards because of their industry know-how.
On that note, there is no clear reading of the tea leaves on national regulations for data privacy and security. Only one thing’s for sure, laws like GDPR will evolve and become more stringent. In anticipation of that, data lineage, data management and governance will become more important.
The only way to fully comply with the scope of these laws is rigorous metadata tracking. Understanding the data becomes easier when organisations know the content of their data, what the data represents, who can access it, and where it is hosted, so that traceability is possible.
“It is only when you can create that traceability or lineage, that you can be fully compliant,” suggests Scott.
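A small sketch of what such traceability might look like in practice: each dataset carries a metadata record with its description, access roles, location and upstream sources, so compliance questions can be answered by walking the chain. All names and fields here are assumptions for illustration, not the Apache Atlas model.

```java
// A sketch of the lineage idea: every dataset carries a metadata
// record describing what it is, who may touch it, and what it was
// derived from. Field names are hypothetical.
import java.util.List;

public class LineageRecord {
    record DatasetMetadata(
        String datasetId,           // stable identifier for the dataset
        String description,         // what the data represents
        List<String> allowedRoles,  // who can access it
        String location,            // where it is hosted
        List<String> derivedFrom    // upstream datasets: the lineage chain
    ) {}

    public static void main(String[] args) {
        DatasetMetadata raw = new DatasetMetadata(
            "sensor-raw-2018", "raw device telemetry",
            List.of("engineer"), "on-prem-cluster", List.of());
        DatasetMetadata agg = new DatasetMetadata(
            "sensor-daily-agg", "daily aggregates of device telemetry",
            List.of("analyst", "engineer"), "public-cloud-eu",
            List.of(raw.datasetId()));
        // Walking derivedFrom links answers "where did this data come
        // from?", which is the traceability regulators ask for.
        System.out.println(agg);
    }
}
```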
Wrapping up, Scott says, “Over the last two years, I’ve talked about better data management and governance, and in many instances, people thought that was a very esoteric kind of conversation. But in fact, it is very realistic. Because if you have that in place, you can immediately guarantee that you are compliant with all these regulations because you have that lineage available to you.
“And so, one of the things we can do to help the industry, within the public and private sectors, is to enforce better data standards, so that as regulations change, and they will change consistently, it is very easy to take a look at the data assets available and ensure compliance.”