Bright Sparks: Databricks emits system to sort out ‘data mess’

Data-nom from stream, lake and warehouse, they chirp

Apache Spark-wrangling biz Databricks has added a third pillar to its Unified Analytics Platform aimed at unifying data management.

The unified data management system, Delta, aims to simplify enterprises’ complex data architecture, which sees data spread across multiple data lakes and data warehouses.

CEO and co-founder Ali Ghodsi told The Register that Delta addressed one of three major roadblocks to widespread use of data analytics.

These are the need for data scientists to collaborate with non-experts, to manage complex infrastructure, and to ensure good performance, often in real time, on data in many formats.

Ghodsi said Delta – launched today at the Spark Summit in Dublin – aims to tackle the third problem, which sees customers dealing with a “data mess”, with data in data lakes and data warehouses.

At the same time, they also have streaming systems, thanks to an increased need for real-time analytics, for example for fraud detection, which can’t operate on stale data.

The idea of Delta, Databricks said, is to let customers cut out "complex, brittle extract, transform, and load processes that run across a variety of systems".

Ghodsi said it will combine streaming and batch processing, and do it with “the performance and reliability of data warehouses, with the advantages of data lakes - essentially that it’s separating compute and storage”.

Delta will store its data in Amazon S3 - Databricks said this would offer the scale of a data lake, and that it would be stored in a non-proprietary and open file format “to ensure data portability and prevent data lock-in”.

Meanwhile, the company said, Delta tables are used as data source and sink, and will provide transactional guarantees for multiple concurrent writes for batch and streaming jobs.

Delta also claims a number of automated abilities, including automated performance management (cutting out the need for manual tuning), a self-optimising data layout, and intelligent data skipping and indexing.

Ghodsi said that, as a cloud company, Databricks' “number one priority” was security, listing security accreditations and its partnership with the CIA’s investment arm In-Q-Tel.

He said that customers can be given access to full audits and logs for metadata and data, for data governance requirements, claiming that - because all data is validated when it is brought into the system - it is also reliable. ®

Biting the hand that feeds IT © 1998–2017
