Square Kilometre Array precursor shrinks 5TB of data to 22MB – every second!
Australia's Murchison radio telescope tells Reg how big astronomy 'destroys' big data
Australia's precursor to the Square Kilometre Array has gone from sitting on the slipway to shedding champagne-bottle shards and sliding gracefully into action.
The Australian Square Kilometre Array Pathfinder, ASKAP if you're thrifty with syllables, is doing its first science for a project called WALLABY, an all-sky hydrogen survey, and the CSIRO is very pleased at how things are going.
A curious detail in the CSIRO blog post caught Vulture South's eye: ASKAP is producing 5.2 TB per second of data, which the CSIRO reckons is equivalent to about 15 per cent of the current data rate for the whole of the Internet.
That's bigger, by a nice fat margin, than two things:
- the connection from Murchison to the Pawsey supercomputer centre in Perth;
- the amount of disk available once the data arrives in Perth.
Clearly, there's a serious amount of data reduction happening along the way, so The Register spoke to the CSIRO's David McConnell, System Scientist, Australian Telescope National Facility (ATNF) Operations, CSIRO Astronomy and Space Science.
Let's set the scene a little, first. ASKAP will eventually comprise 36 individual antennas, each of them fitted with phased array feeds to give the antenna a wide field of view.
Each phased array has 188 individual receivers, and McConnell told us each receiver takes 1,500 million samples per second – and pretty quickly you end up with a veritable firehose of data.
Why so much, why so many?
It's all about resolution, McConnell told The Register. What astronomers want, he said, is to put optical images next to radio-telescope images and easily identify the same features. Right now, that's just not possible.
To understand what's going on here, consider the big dish at the Parkes Observatory, currently Australia's largest. “It's a very large dish, 64 m across – but its natural field of view is about 0.25° across, about 1/10 to 1/12 of a square degree”, McConnell explained (that field of view is expanded by a recently-installed multi-beam receiver).
Even in such a small area, he explained, radio sources come in at very low resolution.
“Parkes is close to as big as you can build a single telescope – and it's a very blurry image. If you're trying to compare what you get with optical photos of the sky, it's hopeless,” he said.
“Where you might see 10-100 stars in an optical telescope, you have one big blob in your radio image.”
The images get smeared simply because radio waves have much longer wavelengths than visible light (hydrogen's characteristic radio line is at 21 centimetres, hundreds of thousands of times longer than visible wavelengths, which we measure in nanometres).
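For readers who want to put numbers on the blur, here is a minimal sketch using the textbook Rayleigh criterion (angular resolution of roughly 1.22 × wavelength ÷ dish diameter). The 21 cm wavelength and 64 m dish come from the article; the 550 nm wavelength and the 10 cm optical telescope are assumptions chosen purely for comparison, and McConnell did not describe his figures in these terms.

```python
import math


def diffraction_limit_deg(wavelength_m, aperture_m):
    """Rayleigh criterion for a circular aperture, returned in degrees."""
    return math.degrees(1.22 * wavelength_m / aperture_m)


HYDROGEN_LINE_M = 0.21     # from the article
PARKES_DIAMETER_M = 64.0   # from the article
VISIBLE_LIGHT_M = 550e-9   # assumption: a mid-visible wavelength
SMALL_SCOPE_M = 0.10       # assumption: a hypothetical 10 cm optical telescope

parkes_deg = diffraction_limit_deg(HYDROGEN_LINE_M, PARKES_DIAMETER_M)
optical_deg = diffraction_limit_deg(VISIBLE_LIGHT_M, SMALL_SCOPE_M)

print(f"Parkes at 21 cm: {parkes_deg:.2f} degrees")
print(f"10 cm optical telescope: {optical_deg * 3600:.1f} arcseconds")
print(f"Parkes beam is roughly {parkes_deg / optical_deg:.0f} times wider")
```

Under those assumptions the 64 m dish resolves roughly 0.23° at 21 cm, in line with the "about 0.25°" McConnell quotes for Parkes, while even a backyard-sized optical telescope manages about an arcsecond.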
Lots of smaller telescopes, such as ASKAP's 12-metre-wide dishes, allow separate images to be correlated to such an extent that McConnell said a layperson might not notice the difference between an optical image and a radio image.
And compared to the tiny patch of sky seen by Parkes, ASKAP gets its high resolution across “about 30 square degrees – a square 5.5° across.”
The moon, by comparison, is half a degree across.
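The sky-area comparison is easy to check. The sketch below uses only figures quoted above (30 square degrees, a half-degree moon, and McConnell's 1/10 to 1/12 of a square degree for Parkes) and treats the moon as a flat disc, which is an approximation that is fine at these small angles.

```python
import math

ASKAP_FOV_SQ_DEG = 30.0                # from the article
MOON_DIAMETER_DEG = 0.5                # from the article
PARKES_FOV_SQ_DEG = (1 / 12, 1 / 10)   # range quoted by McConnell

# Side of a square covering the same area as ASKAP's field of view.
side_deg = math.sqrt(ASKAP_FOV_SQ_DEG)

# Area of the moon's disc, treated as a flat circle.
moon_area_sq_deg = math.pi * (MOON_DIAMETER_DEG / 2) ** 2
moons_in_fov = ASKAP_FOV_SQ_DEG / moon_area_sq_deg

# How many Parkes-sized patches fit in one ASKAP field.
parkes_ratios = [ASKAP_FOV_SQ_DEG / area for area in PARKES_FOV_SQ_DEG]

print(f"Square side: {side_deg:.2f} degrees")
print(f"Full moons per ASKAP field: {moons_in_fov:.0f}")
print(f"Parkes fields per ASKAP field: {min(parkes_ratios):.0f} to {max(parkes_ratios):.0f}")
```

The square root of 30 is just under 5.5, matching the quoted "square 5.5° across", and one ASKAP field covers the area of roughly 150 full moons, or 300 to 360 Parkes-sized patches.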
Beyond ASKAP, he said, the aim is to achieve similar resolution at every wavelength, from radio through infrared and optical to X-rays and gamma rays.
“The whole of astrophysics is now very broad spectrum,” he explained. Different wavelengths “all give you a different clue of the physics that's going on in the object.”
But you can't take it all with you
With 12 antennas now in operation, each carrying a phased array of 188 receivers, each receiver taking 1,500 million samples per second, and the whole lot generating 5.2 TB/second, something has to give.
For ASKAP, the first tough decision is this: once all 36 antennas are in operation, it won't be able to keep an archive of the raw data, unlike any other radio telescope in Australia.
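To get a feel for the firehose, here is a back-of-the-envelope count of samples per second built only from the figures McConnell gave. The article does not say how many bits each sample occupies, so this sketch deliberately stops at sample counts and makes no attempt to convert them to bytes.

```python
RECEIVERS_PER_ANTENNA = 188                 # from the article
SAMPLES_PER_RECEIVER_PER_S = 1_500_000_000  # 1,500 million, from the article


def samples_per_second(antennas):
    """Total samples per second across the given number of antennas."""
    return antennas * RECEIVERS_PER_ANTENNA * SAMPLES_PER_RECEIVER_PER_S


# One antenna, the 12 now operating, and the planned full array of 36.
for antennas in (1, 12, 36):
    rate = samples_per_second(antennas)
    print(f"{antennas:>2} antennas: {rate:.3e} samples per second")
```

That works out to 282 billion samples per second from a single antenna, and a little over 10 trillion per second once all 36 are running.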
“With conventional telescopes like Parkes and Narrabri, you collect the data, store it, archive it, and keep it forever”, he said, and that means as new analysis techniques develop, astronomers can go back to the raw data and re-process it.
And: turning the raw data into images is, right now, how data is reduced to manageable levels.
“There's an enormous amount of high-speed electronics between the initial data stream, and what we write to disk at the Pawsey Centre,” McConnell explained.
“To make the radio image, we need to measure the spatial correlation function in the field. We use pairs of antennas, and look at the same part of the sky from each antenna, and we multiply the two signals together, sample by sample.”
After the multiplication, “we simply add the products for five seconds or 10 seconds.”
That's what makes the image, and that's also what reduces the data. The final product is what's written to disk.
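The multiply-and-add step McConnell describes can be sketched in a few lines. This is a toy illustration only: the signals are made-up Gaussian noise, the function name `correlate_pair` is hypothetical, and it leaves out everything the article does not describe about how ASKAP's hardware actually does the job.

```python
import random


def correlate_pair(signal_a, signal_b):
    """Multiply two sample streams element by element and add up the products."""
    if len(signal_a) != len(signal_b):
        raise ValueError("streams must have the same number of samples")
    return sum(a * b for a, b in zip(signal_a, signal_b))


rng = random.Random(0)   # fixed seed so the toy run is repeatable
n_samples = 10_000       # stands in for one integration period

# A common "sky" signal reaches both antennas; each adds its own receiver noise.
sky = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
antenna_a = [s + rng.gauss(0.0, 1.0) for s in sky]
antenna_b = [s + rng.gauss(0.0, 1.0) for s in sky]
unrelated = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

# Averaging keeps the shared signal and washes out the independent noise.
shared = correlate_pair(antenna_a, antenna_b) / n_samples
noise_only = correlate_pair(antenna_a, unrelated) / n_samples

print(f"antenna pair: {shared:.3f}")
print(f"antenna vs unrelated noise: {noise_only:.3f}")
print(f"{2 * n_samples} input samples reduced to 1 number per pair")
```

The point of the toy is the shape of the operation: tens of thousands of input samples collapse into a single number per antenna pair per integration, and the part of the signal common to both antennas survives while the uncorrelated noise averages towards zero.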
The high-speed electronics McConnell refers to are called – let's not criticise the scientists for being obvious! – correlators.
ASKAP's correlators turn terabytes per second into megabytes per second. Image: CSIRO
Just how much reduction happens here is staggering: “It comes in at 5.2 terabytes per second, and by the time we write it out to disk, it's down to 22 megabytes per second.”
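A quick sketch of what those two figures imply, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and, purely for illustration, a telescope observing around the clock, which the article does not claim.

```python
RAW_BYTES_PER_S = 5.2e12     # 5.2 TB/s, from the article (decimal units assumed)
STORED_BYTES_PER_S = 22e6    # 22 MB/s, from the article (decimal units assumed)
SECONDS_PER_DAY = 86_400     # assumption: continuous 24-hour operation

reduction_factor = RAW_BYTES_PER_S / STORED_BYTES_PER_S
raw_per_day_pb = RAW_BYTES_PER_S * SECONDS_PER_DAY / 1e15
stored_per_day_tb = STORED_BYTES_PER_S * SECONDS_PER_DAY / 1e12

print(f"Reduction factor: about {reduction_factor:,.0f} to 1")
print(f"Raw data per day: about {raw_per_day_pb:.0f} PB")
print(f"Written to disk per day: about {stored_per_day_tb:.1f} TB")
```

On those assumptions the correlators shrink the stream by a factor of roughly 236,000, turning what would be about 449 petabytes a day of raw samples into a little under 2 terabytes on disk.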
That's a lot of discarded data, and it's not all that goes in the discard pile. Down in Perth, the Pawsey supercomputing facility is getting ready for all of ASKAP to come online, and even with the correlators' terabyte-to-megabyte reduction, they still can't keep everything. Deciding what to keep is a separate development effort, McConnell said, with a lot of work going in to be ready for a fully operational ASKAP and, later, Australia's full participation in the SKA.
“Our aim is to write processing software that runs in real time, produces the images, and will throw away the ‘raw’ data. That reduction can be improved even further by going through the images, and deciding which ones we want to keep – but that's a step we're not up to yet”, he told The Register.