New functionality allows customers to ingest, govern, and share data in near real-time, while leveraging the scale and cost-efficiency of a data lake

BOSTON, Nov. 28, 2023 -- Starburst, the data lake analytics platform, today at AWS re:Invent 2023 announced new capabilities that enable organizations to build and scale game-changing data applications without compromising on performance or cost. With the increasing interest in building artificial intelligence (AI)-driven data applications, customers need to establish a solid data platform. New features in Starburst Galaxy help customers simplify development on the data lake by unifying data ingestion, data governance, and data sharing on a single platform.

Interactive applications often require the scalability and cost-efficiency of a data lake, but building and maintaining that data lake is complex and time-consuming for data teams. To overcome these challenges, Starburst has added support for:

  • Near real-time analytics with streaming ingestion: With streaming ingestion, customers can use Kafka to hydrate their data lake in near real-time, ensuring applications have the most up-to-date insights for their users. Support for fully managed solutions, such as Confluent Cloud, is also planned.



  • Automated data governance: As new data lands in the lake, machine learning models in Gravity – a universal discovery, governance, and sharing layer in Starburst Galaxy – will automatically apply classifications for certain categories. Depending on the class, Gravity will apply policies granting or restricting access. This automation is particularly useful for teams handling sensitive data like personally identifiable information (PII). Now, as soon as PII lands in the lake, Gravity will be smart enough to identify and restrict access to that data.



  • Automated data maintenance: New automations make it easy for customers to optimize their data lake by abstracting away common management tasks like data compaction and data vacuuming. Users can now maintain warehouse-like performance without adding brittle manual processes as the volume and complexity of data in their data lake grow.



  • Universal data sharing with built-in observability: With Gravity, users can easily package data sets into shareable data products to power end-user applications, regardless of source, format, or cloud provider. New functionality will allow users to securely share these high-quality data products with third parties, such as partners, suppliers, or customers.



  • Self-service analytics powered by AI: Not only are data lakes notoriously hard to manage, but the majority of data teams are understaffed. New AI-powered experiences in Galaxy, like text-to-SQL processing, will enable data teams to offload basic exploratory analytics to business users, freeing up their time to build and scale data pipelines.

"Data-intensive initiatives like AI require a solid data foundation to be successful," said Justin Borgman, Co-founder & CEO of Starburst. "We provide that foundation, giving our customers the ability to quickly access and analyze all their data in order to scale applications from the first hundred users to the first thousand and beyond. We ensure optimal performance even with high concurrency and exponentially growing data volumes. The new streaming ingest, data maintenance and governance automations, and data sharing capabilities in Starburst make it remarkably easy for teams to build, deploy, and scale applications on top of the data lake."

Halliburton is already taking advantage of this foundation. "After building good quality data products with Starburst, we saw an opportunity to use LLM to help with that process," said Fahad Ahmad, Data Science Leader at Halliburton. "Previously it would take 2 to 3 weeks to get an answer to an ad hoc question. By embedding an LLM with Starburst's data products architecture, data consumers can ask questions in plain language, have it converted to SQL, and get the answer back immediately."

Starburst's position as an Amazon Web Services (AWS) Data and Analytics Competency Partner means that AWS customers can rest assured that these features will be made available on the fastest hardware AWS has to provide, including AWS Graviton3 and the newly launched Amazon Simple Storage Service (Amazon S3) Zonal storage class, and will integrate seamlessly with core tools like Amazon QuickSight and new tools like Amazon Bedrock.

To learn more about Starburst, including its offerings and integrations, please visit booth 1151 at AWS re:Invent 2023 or the website: www.starburst.io/.

About Starburst

For data-driven companies, Starburst offers a full-featured data lake analytics platform, built on open source Trino. Our platform includes the capabilities needed to discover, organize, and consume data without the need for time-consuming and costly migrations, enabling teams to focus on building differentiating features, not managing analytics infrastructure.

We believe the lake should be the center of gravity and the starting point for querying disparate data. With Starburst, teams can access more complete data, lower the cost of infrastructure, use the tools best suited to their specific needs, and avoid vendor lock-in. Trusted by companies like Comcast, Grubhub, and Priceline, Starburst helps companies make better decisions faster on all their data.
