Dr. John L. Gustafson is an applied mathematician and computer scientist. A pioneer in high-performance computing, he introduced cluster computing in 1985 and, in 1988, was the first to demonstrate scalable, massively parallel performance on real applications.
Currently, he is a Visiting Professor at A*STAR-CRC (Computational Resource Centre, Agency for Science, Technology and Research) and at the National University of Singapore. He is a former Director at Intel Labs and a former Chief Product Architect at AMD.
Dr. Gustafson proposed the Unum (Universal Numbers) number format as an alternative to the commonly used IEEE 754 format. He provided an introduction to Unums in his presentation at EmTech Asia 2017.
The Unum format has a “ubit” (rhymes with “cubit”), which determines whether the Unum corresponds to an exact number (ubit=0), or an interval between consecutive exact Unums (ubit=1), thereby covering the entire extended real number line.
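The ubit's semantics can be illustrated with a toy sketch (this is my own illustration on a two-decimal grid of exact values, not the actual unum bit layout):

```python
from fractions import Fraction
import math

# Toy illustration of the ubit (not the real unum encoding): exact
# values live on a fixed grid; ubit = 0 means "exactly this value",
# ubit = 1 means "strictly between this value and the next exact one".
GRID = Fraction(1, 100)  # consecutive exact values are 0.01 apart

def meaning(value: Fraction, ubit: int):
    """Return the set a (value, ubit) pair stands for: a single point,
    or the open interval up to the next exact value."""
    return (value, value) if ubit == 0 else (value, value + GRID)

lo, hi = meaning(Fraction(314, 100), ubit=1)  # "3.14, inexact"
pi = Fraction(math.pi)                        # exact value of the double
print(lo < pi < hi)                           # True: pi lies in (3.14, 3.15)
```

With ubit = 1, the pair honestly says "somewhere strictly between 3.14 and 3.15" rather than pretending the answer is exactly 3.14.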
Basically, the Unum format would specify that pi lies between 3.14 and 3.15, instead of rounding it to 3.14. In his presentation, Dr. Gustafson gave an example of how rounding errors can pile up. Divide 1 by 3 on a standard calculator and you get 0.333…; multiply the result by 3 and you get 0.999… instead of 1. Such errors can build up, producing unreliable results with very real consequences, such as the Pentium divide bug (a loss of $475 million), the sinking of the Sleipner A offshore platform (a loss of $700 million) and the explosion of the Ariane 5 launcher on its maiden flight.
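The calculator demo is easy to reproduce; here is a minimal sketch using Python's standard ten-digit decimal arithmetic:

```python
from decimal import Decimal, getcontext

# Mimic a 10-digit decimal calculator.
getcontext().prec = 10
third = Decimal(1) / Decimal(3)   # rounds to 0.3333333333
result = third * 3                # 0.9999999999, not 1

print(third, result, result == Decimal(1))  # ... False

# Binary floating point has the same disease, just with different digits:
print(0.1 + 0.2 == 0.3)          # False: 0.1 + 0.2 is 0.30000000000000004
```

Once the quotient has been rounded, the information needed to get back to exactly 1 is gone; every subsequent operation can only carry the error forward.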
According to Dr. Gustafson, Unums not only provide better answers but also require fewer bits of storage and consume less energy. Unum arithmetic has the potential to render current computer math obsolete. However, it won’t be an easy switch, due to the ubiquity of the IEEE 754 standard and the resulting inertia and resistance.
Prof. Gustafson discussed his work with OpenGov and talked about the slowing down of Moore’s Law and the current High Performance Computing (HPC) landscape.
What are you working on at A*STAR right now?
I am working on next-generation computer arithmetic, new ways to calculate, different from the ways we have been doing it for a long time. I am looking at ways to save energy and get better answers at the same time, by relooking at things we haven’t touched because we have assumed that we will always continue to do it in a certain way. It doesn’t have to be that way.
How is the uptake of the Unum format?
There has been a recent breakthrough. We had a hard time building hardware for it until about four or five weeks ago. Then I realised that if I relaxed one of the design constraints very slightly, everything suddenly fell into place and became even easier to build than the 1985 IEEE 754 standard. So it will actually take up less space on the chip than the floating-point math we use now.
People store 64-bit numbers because they think they might need that much accuracy. You almost never need that much accuracy.
The analogy I give people is that it is like insuring your car for two million dollars and then complaining about how high your insurance premiums are. You probably don’t need to insure your car for two million dollars; it is not worth that much. Similarly, if you insure your calculations with 15 decimals of precision everywhere, all the time, then you are going to pay a severe price: in how long it takes to run, how much storage it uses, how much space the archives take up on disk, and how much bandwidth is consumed. Everything is made harder by that kind of over-insurance.
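The over-insurance point can be made concrete with standard-library packing. This is an illustration of the general precision-versus-storage trade-off (using IEEE floats, not unums):

```python
import math
import struct

# Pack pi at two "insurance levels".
p64 = struct.pack(">d", math.pi)   # 8 bytes, ~16 decimal digits
p32 = struct.pack(">f", math.pi)   # 4 bytes, ~7 decimal digits
back = struct.unpack(">f", p32)[0]

print(len(p64), len(p32))          # 8 4: half the storage and bandwidth
print(abs(back - math.pi))         # error near 1e-7: fine for many workloads
```

Halving the width halves storage, archive size and bandwidth throughout the pipeline, while the result is still good to about seven decimals.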
How is the processing speed for Unums as compared to floats?
The new breakthrough will be faster than floats. It will be superior in every respect.
But you have to understand that there are two ways to do computing, one is where it is pretty good and it is fast, and it is good enough for what you want to do, like a video game. You don’t mind if occasionally a pixel is wrong. It goes by at 60 frames per second.
But if you are trying to do something like design a nuclear reactor, you might want to make sure that you have got it right. You might want something that is mathematically valid and provable. That’s a completely different mode of operation. That calls for a different kind of number.
I have covered both bases. The numbers that are fast and good enough but better than floats, I call them ‘posits’. Like you posit an idea as a kind of guess as to what might be true.
‘Valids’ are the other kind of number. If your numbers are valid, it is a rigorous mathematical bound for the true answer. It will take more time to calculate, it will run slower. Maybe you do that when you are debugging or you just do it for a mission-critical application. But a lot of the production for which we currently use floating point arithmetic, that will be completely superseded I believe by posit arithmetic.
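To give a flavour of the posit format he describes, here is a sketch of a decoder for 8-bit posits with no exponent bits (es = 0). The field layout (sign, a run-length-encoded "regime", then fraction) follows Gustafson's published description; the code and its name are my own illustration:

```python
def decode_posit8(bits: int) -> float:
    """Decode an 8-bit posit with es = 0 into a float (sketch)."""
    if bits == 0b00000000:
        return 0.0
    if bits == 0b10000000:
        return float("nan")          # "Not a Real" in posit terms
    sign = bits >> 7
    if sign:                          # negative posits: two's complement
        bits = (-bits) & 0xFF
    body = bits & 0x7F                # the 7 bits after the sign
    # Regime: run of identical bits; its length sets the power of 2.
    first = (body >> 6) & 1
    run, pos = 0, 6
    while pos >= 0 and ((body >> pos) & 1) == first:
        run += 1
        pos -= 1
    k = run - 1 if first == 1 else -run
    pos -= 1                          # skip the regime terminator bit
    # With es = 0, all remaining bits (if any) are the fraction.
    nfrac = pos + 1
    if nfrac > 0:
        frac = body & ((1 << nfrac) - 1)
        value = (2.0 ** k) * (1 + frac / (1 << nfrac))
    else:
        value = 2.0 ** k
    return -value if sign else value

print(decode_posit8(0b01000000))     # 1.0
print(decode_posit8(0b01100000))     # 2.0
print(decode_posit8(0b00100000))     # 0.5
```

The key idea is that the variable-length regime spends bits on range only when needed, leaving more fraction bits, and hence more accuracy, for numbers near 1, where most computation happens.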
It is moving forward surprisingly fast. I have run into a number of geniuses in the Silicon Valley area, and one of them is writing Verilog (a hardware description language used to model electronic systems, most commonly in the design and verification of digital circuits). He keeps sending me circuit diagrams for the new kind of arithmetic. Several companies have approached me about when they can start writing the code for this and building a chip, because they can see the writing on the wall. This is going to replace floating point.
I was talking to the Square Kilometre Array team in Australia, who are doing radio astronomy computing. They are looking at using something better than 64-bit storage for all their images. They use a vast amount of storage, costing them millions of dollars. They looked at my format and realised it could save them two million dollars. Two million dollars at one site. Multiply that by the number of data centres in the world. CERN has extremely large data requirements for all its experiments and spends many millions on them. This format could save them a lot of money as well.
What are the main trends that you see in HPC?
As Moore’s law is starting to slow down, people are starting to think a little bit harder about how they are doing calculations. Not just expecting to turn the crank and build more processors in a server room. I am seeing a lot more careful thinking and I am glad that we finally arrived at that. People are looking more at the applications and saying, “what are we really trying to calculate here?” and “is this trip necessary?” for everything that we are doing.
We cannot continue writing sloppy code and expect to make up for it by increasing processor speed. If you cannot improve the hardware, then you have to take another look at the code and see if you can’t squeeze some more cycles out of it.
Of course, there’s a lot of excitement about quantum computing and what really comes after CMOS transistors and conventional lithography. That’s intriguing as well. That will be a wrenching change. But it is the next best hope we have for a really big breakthrough.
What will be the impact of these developments on real-life applications?
There will be a big impact on artificial intelligence (AI). We have talked about AI for so long. Science fiction people have been envisioning this since before there were computers. But now that we have finally reached a point where we can teach a computer to distinguish between a dog and a cat, then the Googles of the world are training their systems to have more and more human-like intelligence.
That’s just going to pervade everything that we do. It is also going to create a big demand for a lot more clock cycles to do the training. The new number format is going to fit right into that. Because it is perfectly crafted for what you need to do to teach a neural net.
What do you see happening in the HPC scene in Singapore?
Singapore has so much potential. The latest petaflop computer is saturated. They put it in. It immediately filled up. It’s got like 80% utilisation. Some parts of it have 100% utilisation. Which says you need to build another one. And you have to build it as soon as possible and make it five times faster. The demand for these cycles and what people are getting out of it, the biological studies, the applications for medicine, are just huge. The impact could be very big on Singapore’s business and on citizens’ well-being in general.
Here I see an amazing capacity to set their minds to something and then succeed at it. Singapore is going to want to be up with the big boys, China, the US and Japan, and say we can build advanced computers too. If they keep up that mentality and continue to have the great support of the government that they have had so far, they may surprise people.
Everybody is limited by how many watts they can get into the data centre. You get 20x just from doing better architecture. Then you get between 2x and 10x just from using my number system. Put those two together and you are far beyond anything anyone has ever seen in a computer. Even for, let’s say, 10 million dollars, we could beat the fastest supercomputer in the world. It’s not out of the question.
Where is the innovation going to come from? Would it come from start-ups or large technology companies or government or academia?
All of them I think. Usually the least innovative are the established companies. They have everything to lose and very little to gain. Usually they have a lab where there is research going on. If they get too uppity and start thinking, “I have a great idea here; let’s change the direction of the company,” then the antibodies squash that down. “You are getting in the way of our main business. And you are wasting resources!”
So they tend not to productise some of their great ideas, even when they are given a sandbox to play in. It’s more likely that a big company gets overtaken by a smaller company that disrupts them, one that takes the chance. The smaller company comes up with something that is incompatible with all the software that’s out there. A few people adopt it, and suddenly the big company sees its market share going to this small one.
It’s like that with countries, too. A small country like Singapore can do highly innovative things that the largest countries might balk at. This new “drop-in” replacement for floating-point numbers was made possible because of the encouragement of A*STAR to pursue such a radical change.
When I was in the USA, the reaction I got to my attempts to improve computer arithmetic was, “Well, you can’t boil the ocean.” The attitude in Singapore is much more visionary, and for that I am grateful!