Apache SINGA, flexible and scalable deep learning platform, developed by NUS Big Data Systems Team

As the volume, variety, and velocity of data continue to reach unprecedented levels, big data analytics has drawn significant interest. According to a recent report, the worldwide enterprise data analytics market is forecast to grow at a 9.4% annual rate through 2018, reaching $59.2B.

Many organizations are keen to adopt big data techniques to analyze huge volumes of data that conventional business intelligence solutions cannot handle, and to discover insights for better decision making.

Recently, deep learning, which extracts high-level abstractions from data, has emerged and shows great potential for solving business problems. Many startups are using deep learning techniques in their applications because they are effective across many tasks.

OpenGov spoke to Ju Fan, Research Fellow, National University of Singapore, and Wei Wang, PhD Student, National University of Singapore, both of whom are working on developing a distributed deep learning platform, Apache SINGA.

Apache SINGA is a distributed deep learning platform that entered the Apache Incubator in March of this year. The project is funded by the National Research Foundation, Ministry of Education, and A*STAR. SINGA is a valuable tool for big data analytics because:

  • It supports various deep learning models, giving users the flexibility to customize models to fit their business requirements.
  • It provides a scalable architecture for training deep learning models on huge volumes of data.
  • It provides a simple programming model, making the distributed training process transparent to users.

We talked to Ju Fan and Wei Wang about their research in deep learning, how they got into the Apache incubator, case examples, and what challenges them about their research.

The very beginnings…

“For this project, we started from a research problem on multi-modal data retrieval. We were to use different data from different modalities, like image data or text data. Later, I found that deep learning was really effective for extracting features from different modalities,” said Wei Wang.

He then started developing his work with deep learning and has since published papers on algorithms and retrieval problems.

His mentor, Prof Ooi, is an expert on databases and distributed computing, and advised Wei Wang to work on the system side. Training a complex deep learning model over a large data set is very time-consuming.

To train the model, Wei Wang used stochastic gradient descent (SGD), an algorithm commonly used for deep learning models. SGD repeatedly updates the model parameters in small steps, based on the gradients of those parameters.
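To make the update rule concrete, here is a minimal sketch of mini-batch SGD in plain Python. It is an illustration only, not SINGA code: the toy linear model, the function name `sgd_linear_regression`, and all hyperparameters (learning rate, batch size, epoch count) are assumptions chosen for the example.

```python
import random

def sgd_linear_regression(xs, ys, lr=0.05, epochs=200, batch_size=4, seed=0):
    """Fit y ~ w*x + b by mini-batch stochastic gradient descent (toy example)."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the data in a new random order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradients of the mean squared error over this mini-batch.
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            # The SGD update: move each parameter a small step against its gradient.
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# Hypothetical noiseless data generated from y = 3x + 1.
xs = [i / 10 for i in range(20)]
ys = [3.0 * x + 1.0 for x in xs]
w, b = sgd_linear_regression(xs, ys)
print(round(w, 2), round(b, 2))
```

On this toy data the learned `w` and `b` should approach 3 and 1. A real deep learning model has many more parameters, but each is updated by the same rule.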

Distributed training was also needed because datasets can be quite large and models quite complex. Spreading the work across more computing resources speeds up training, and the system was built to run different training frameworks in a scalable manner.
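One common way to distribute training is data parallelism: each worker computes gradients on its own shard of the data, and the results are averaged before the shared parameters are updated. The sketch below simulates this in a single Python process. It is a generic illustration under stated assumptions, not SINGA's API or architecture; the names `shard_gradient` and `data_parallel_train`, the worker count, and the toy linear model are all hypothetical.

```python
def shard_gradient(w, b, shard):
    """Gradient of the mean squared error for y ~ w*x + b on one worker's shard."""
    n = len(shard)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in shard) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in shard) / n
    return grad_w, grad_b

def data_parallel_train(data, num_workers=4, lr=0.05, steps=1000):
    # Partition the data so that each simulated worker holds one shard.
    shards = [data[i::num_workers] for i in range(num_workers)]
    w, b = 0.0, 0.0
    for _ in range(steps):
        # Each worker computes a gradient on its own shard. Here this runs
        # sequentially; on a cluster the workers would run in parallel.
        grads = [shard_gradient(w, b, shard) for shard in shards]
        # A central step averages the gradients and updates the shared parameters.
        grad_w = sum(g[0] for g in grads) / num_workers
        grad_b = sum(g[1] for g in grads) / num_workers
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical noiseless data generated from y = 3x + 1.
data = [(i / 10, 3.0 * (i / 10) + 1.0) for i in range(20)]
w_dp, b_dp = data_parallel_train(data)
print(round(w_dp, 2), round(b_dp, 2))
```

Because the shards here are equal in size, the averaged gradient matches the gradient over the full data set, so adding workers changes how the work is divided without changing the result of each step.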

After they finished the first version of SINGA, Prof Ooi suggested that they try the Apache Incubator and get more people outside NUS to contribute to this project.

Wei Wang submitted a proposal for inclusion in the incubator and received comments from Apache mentors. Within Apache, it was the only project of its kind focused on deep learning.

Wei Wang told us that the team is now working on improving the system. “We are working to improve this system in terms of: scalability, efficiency, and the features to support different applications,” he said.

SINGA applied to Healthcare Data Analytics

With the data analytics power of SINGA, Ju Fan explained that they are collaborating with the National University Hospital System to work with data scientists and medical specialists in the healthcare domain.

They look at data relating to diagnoses, medications, and lab test results, with the broader aim of reducing the cost of healthcare and improving the performance of services.

“The approach is that we draw knowledge from the healthcare data,” stated Ju Fan. “We are carrying out two applications of SINGA: the first is to predict risk of hospital readmission, and the second is chronic disease progression modelling.”

These two applications show how SINGA is helpful in analysing electronic medical record (EMR) data because:

  • Hospital readmission accounts for a significant proportion of healthcare spending, while a large proportion of readmissions are potentially avoidable. Predicting risk of readmission for potentially fatal diseases can yield lower costs and better healthcare quality.
  • Chronic diseases tend to evolve and progress over a long time, and if their conditions are not properly managed, more serious comorbidities as well as complications may ensue. Disease progression modelling can help with the early detection and management of chronic diseases.

“Working with healthcare analytics is quite challenging because of two things: the data is sparse and personalised medications,” Ju Fan told us. “To address these problems we apply the deep learning techniques because deep learning has a good ability to find the high-level abstractions from the raw data.”

The benefits of having such a personalised system are clear: patients would receive better treatment, doctors would work more efficiently, and hospitals would be able to reduce the overall cost of treatment.

Going forward, the team will continue to develop and improve Apache SINGA, as it could be useful for other data types and applications. The team is currently working with a local security company on malware detection using deep learning techniques.

The team behind Apache SINGA will release version 2 of their programming model next month, January 2016.

For more technical details and the development schedule, interested readers can refer to http://www.comp.nus.edu.sg/~dbsystem/singa/

Ju Fan received his PhD in computer science from Tsinghua University, China in 2012. He is currently a research fellow in the School of Computing, National University of Singapore. His research interests include big data analytics, crowdsourcing, and database management.

Wei Wang is a Ph.D. student in the computer science department of the National University of Singapore. Currently, he is working on an Apache incubator project (SINGA) for developing a general distributed deep learning system.
