You wanted robo-butlers. Instead, you're getting robo-BOFHs
Machine-learning tech tries to figure out when your servers are about to fail
Interview Park Place Technologies has spent the past two years working with IT services biz BMC to develop a way to augment its data center service business with machine learning.
Chris Adams, chief operating officer of the data center hardware-tending shop, based in Cleveland, Ohio, told The Register in a phone interview that the way things usually work is that customers call Park Place when a machine fails.
Part of the way the biz differentiates itself from data center equipment vendors that service their own products is by looking after anyone's kit and offering customers "one throat to choke," as Adams put it.
"It takes us typically eight interactions before we end up resolving the issue with the customer," said Adams. "As we move forward with BMC, instead of the customer calling us, we're going to call the customer when an incident happens. We're going to know before they do."
Machine learning, said Adams, will allow Park Place to anticipate failures before they happen, or at least recognize there's a problem before the customer does. And the end result, he expects, will be a better experience for the customer.
As an aside, we imagine the top cloud players – from Google and Amazon to Facebook and Baidu – use similar artificial intelligence to predict and resolve remote data center failures, and allocate resources. Think Google's powerful Omega and Borg cluster scheduling software from 2013, and then add on four years of development. Park Place is not unique in this area: it's an example of this technology becoming available to the rest of us.
Park Place is using BMC's TrueSight system, the latest version of which debuted in October, in conjunction with Sentry Software and its ParkView platform. TrueSight is touted as, you guessed it, AIOps, like devops but with AI. It's a learning system that predicts patterns of usage and failures in server warehouses and manages and deploys resources accordingly.
In the past, Adams explained, BMC developers reviewed data and coded the system to interpret certain events as failures.
"Take a Dell server," said Adams. "It's broadcasting MIBs [Management Information Base files]. Those MIBs are interpreted by Sentry, and somebody is saying based on these three things, when these three things happen and the MIBs say this, we're going to call that a failure."
Adams expects recently introduced AI enhancements in TrueSight will make the process much more efficient. "The machine learning," he said, "can look at this in ways that a human being cannot. So maybe instead of saying this is a failure today, three months earlier it could say this will fail in three months."
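To make the hand-coded approach Adams describes concrete, here is a minimal sketch of a "three things happen, call it a failure" rule. It is not BMC's or Sentry Software's code: the signal names, thresholds, and the `is_failure` helper are all hypothetical, chosen only to illustrate the idea of a developer wiring fixed conditions together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """One monitored value reported by a device (hypothetical shape)."""
    signal: str
    value: float


# Hypothetical rule: every signal name and threshold here is illustrative,
# not taken from any real MIB or from TrueSight.
FAILURE_RULE = {
    "fan_rpm": lambda v: v < 1000,      # fan has slowed or stopped
    "inlet_temp_c": lambda v: v > 45,   # intake air is too hot
    "psu_status": lambda v: v != 0,     # power supply reports a fault code
}


def is_failure(readings, rule=FAILURE_RULE):
    """Return True only when every condition in the rule is met."""
    latest = {r.signal: r.value for r in readings}  # last value per signal wins
    return all(
        signal in latest and condition(latest[signal])
        for signal, condition in rule.items()
    )


if __name__ == "__main__":
    sick = [Reading("fan_rpm", 400), Reading("inlet_temp_c", 51), Reading("psu_status", 2)]
    warm = [Reading("fan_rpm", 3200), Reading("inlet_temp_c", 51), Reading("psu_status", 0)]
    print(is_failure(sick), is_failure(warm))
```

A rule like this fires only once all the conditions are already true, which is the limitation a predictive system is meant to get around.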
Initial results have been positive, apparently. Adams said in the company's ParkView test environment, about 750 pieces of data center hardware generated 180,000 email reports over six months. The machine-learning system sifted through the mess to surface 60 incidents that had to be dealt with, and freed up the people who otherwise would have to review the notifications to identify pressing problems.
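For a sense of scale, the figures Adams quotes can be worked through in a few lines. The numbers come from the article; the `triage_ratios` function name and the output layout are our own illustration.

```python
def triage_ratios(devices, emails, incidents):
    """Derive simple ratios from the test-environment figures."""
    return {
        "emails_per_device": emails / devices,
        "emails_per_incident": emails / incidents,
        "actionable_share": incidents / emails,
    }


if __name__ == "__main__":
    # 750 devices, 180,000 email reports, 60 incidents, over six months
    r = triage_ratios(devices=750, emails=180_000, incidents=60)
    print(f"{r['emails_per_device']:.0f} emails per device")      # 240
    print(f"{r['emails_per_incident']:.0f} emails per incident")  # 3000
    print(f"{r['actionable_share']:.3%} of emails actionable")    # 0.033%
```

In other words, roughly one notification in 3,000 pointed at something that needed attention.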
"I can take those people and do better stuff with them," said Adams. "People don't want to pore through emails. They want to talk to customers. They want to do things that are rewarding and that's much more rewarding."
Asked whether the automation of work might affect jobs, Adams conceded it might if the company weren't doing so well.
"If we weren't growing, we would probably end up eliminating jobs," said Adams, "because we have a lot more people than we need when you start automating all this stuff. But because our company is growing so rapidly, now I can shift them into more value-added positions and roles."