
A*STAR Innovates to Safeguard Industry 4.0

A*STAR Innovation for Cybersecurity and Anomaly Detection

The Agency for Science, Technology and Research (A*STAR) recently published a newsletter on innovation efforts surrounding the “Building Blocks of Industry 4.0”. Here are two innovations by A*STAR’s scientists that enhance cybersecurity and anomaly detection.


Researchers from A*STAR have developed a more efficient machine learning technique for identifying Android malware.

Li Zhang, a former Research Scientist at A*STAR’s Institute for Infocomm Research (I2R) who is now with ST Engineering, collaborated with his colleagues to create this new method.

This method combines two techniques: n-gram analysis and online classifiers.

It works by using part of the application code to produce n-grams. N-grams are the “fingerprints” of the application, containing detailed information about it.
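As a rough illustration of the fingerprinting step, the sketch below counts n-grams over a token sequence. The article does not say which part of the application code is used, so the opcode list, the choice of n = 3 and the function name `extract_ngrams` are all assumptions made for this example.

```python
from collections import Counter


def extract_ngrams(tokens, n=3):
    """Count every run of n consecutive tokens in the sequence."""
    if n <= 0:
        raise ValueError("n must be positive")
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Hypothetical opcode sequence standing in for part of an app's code.
opcodes = ["invoke", "move", "const", "invoke", "move", "const", "return"]

# The resulting counts act as the "fingerprint"; each n-gram is a sub-fingerprint.
fingerprint = extract_ngrams(opcodes, n=3)
for ngram, count in fingerprint.most_common(2):
    print(ngram, count)
```

Each distinct n-gram can then be treated as one feature when the fingerprint is passed to a classifier.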

A classifier algorithm then automatically assigns a score to each component part of the fingerprint, known as a sub-fingerprint.

The score reflects how closely the sub-fingerprint resembles malware. Individual classifiers are used for specific categories of information within Android.

Zhang explained that this increases classification accuracy and speeds up training of the model.

The model is also able to modify itself based on new training samples while still remembering and retaining the information gained from the previous datasets.
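The idea of scoring features and learning from new samples without retraining from scratch can be sketched with a simple online perceptron. This is not the team's classifier, which the article does not specify; the class `OnlinePerceptron`, the feature names and the labels (+1 for malware, -1 for benign) are hypothetical and chosen only to show the incremental-update pattern.

```python
class OnlinePerceptron:
    """Tiny online linear classifier over sparse feature counts (illustrative only)."""

    def __init__(self, learning_rate=1.0):
        self.weights = {}  # one weight per feature; acts as that feature's score
        self.bias = 0.0
        self.learning_rate = learning_rate

    def score(self, features):
        """Higher scores mean the sample looks more like malware."""
        return self.bias + sum(
            self.weights.get(name, 0.0) * value for name, value in features.items()
        )

    def predict(self, features):
        return 1 if self.score(features) > 0 else -1

    def update(self, features, label):
        """Learn from one sample; weights change only when the prediction is wrong."""
        if self.predict(features) != label:
            for name, value in features.items():
                step = self.learning_rate * label * value
                self.weights[name] = self.weights.get(name, 0.0) + step
            self.bias += self.learning_rate * label


model = OnlinePerceptron()

# Hypothetical first batch of labelled samples.
first_batch = [
    ({"send_sms": 2.0, "read_contacts": 1.0}, 1),
    ({"draw_ui": 3.0}, -1),
]
for features, label in first_batch:
    model.update(features, label)

# A new sample arrives later; earlier weights are kept rather than relearned.
model.update({"encrypt_files": 2.0}, 1)

print(model.predict({"send_sms": 1.0}), model.predict({"draw_ui": 1.0}))
```

Because each update touches only the weights of the features present in the new sample, what was learned from earlier data stays in place, and the cost per sample remains small.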

This new method was applied to more than 10,000 application samples and achieved 99.2 percent accuracy in malware detection.

A test on more than 70,000 samples from a real-world dataset yielded 86.2 percent accuracy in malware detection.

Notably, it also achieved 98.8 percent accuracy in a test conducted on the top 23 malware families of the Drebin dataset (a detailed and well-explained library of Android malware).

Zhang said that the framework of their model will help security analysts and antivirus developers cope better with constantly growing and changing malware.

Because it is also linear and lightweight, it can be installed on phones to provide real-time protection for Android users.

Zhang and his colleagues are now looking into the runtime behaviours of Android applications, with the aim of further improving the accuracy of malware classification.

Anomaly Detection

Anomaly detection, as the name suggests, aims to spot abnormal or suspicious activity within an organisation.

Each user’s activity generates large, multi-dimensional datasets. Detecting anomalies accurately in such data requires more complex machine learning techniques, such as generative adversarial networks (GANs).

GANs consist of two competing networks: a generator and a discriminator.

The generator takes random latent codes and produces new data that closely mirrors real-world data. The discriminator differentiates between real-world data and data produced by the generator.

A GAN-based anomaly detection method relies on the fact that normal data, on which the GAN is trained, can be accurately reconstructed, whereas anomalous data cannot.

A team of A*STAR scientists has therefore worked with a class of GANs that simultaneously learns an encoder network, which predicts the relevant random latent code for a data sample.

This in turn allows for faster anomaly detection, because it skips the step of optimising to find the code.

Chuan Sheng Foo of the Institute for Infocomm Research (I2R) said that the GAN can identify anomalies by applying a threshold to a new anomaly score, which quantifies the distance between the original samples and their reconstructions.

The higher the score, the more anomalous the sample.
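The scoring and thresholding idea can be sketched without a GAN at all. In the example below, a simple clipping function stands in for the trained model's reconstruction, Euclidean distance stands in for the team's anomaly score, and the threshold of 0.5 is an arbitrary illustrative value; none of these choices come from the article.

```python
import math


def anomaly_score(sample, reconstruct):
    """Euclidean distance between a sample and its reconstruction."""
    rebuilt = reconstruct(sample)
    return math.sqrt(sum((x - r) ** 2 for x, r in zip(sample, rebuilt)))


def make_clipping_reconstructor(normal_data):
    """Stand-in for a trained model: clips each feature to the range seen in normal data."""
    lows = [min(column) for column in zip(*normal_data)]
    highs = [max(column) for column in zip(*normal_data)]

    def reconstruct(sample):
        return [min(max(x, lo), hi) for x, lo, hi in zip(sample, lows, highs)]

    return reconstruct


# Hypothetical "normal" training data with two features per sample.
normal_data = [[0.9, 1.1], [1.0, 1.0], [1.2, 0.8]]
reconstruct = make_clipping_reconstructor(normal_data)

threshold = 0.5  # hypothetical cut-off; in practice it would be tuned on held-out data

for sample in ([1.1, 0.9], [4.0, 5.0]):
    score = anomaly_score(sample, reconstruct)
    print(sample, round(score, 3), "anomaly" if score > threshold else "normal")
```

A sample that resembles the normal data is reconstructed almost exactly and scores near zero, while a sample far from it is reconstructed poorly and scores above the threshold.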

The method, called Adversarially Learned Anomaly Detection (ALAD), proved to work better than other anomaly detection methods.

The team is now looking into ways of applying ALAD to time-series data, such as sensor data from machines.
