June 21-24, 2022
Austin, Texas, USA + Virtual
Note: The schedule is subject to change.

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for Open Source Summit North America 2022 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

This schedule is displayed in Central Daylight Time (UTC -5).

IMPORTANT NOTE: Timing of sessions and room locations are subject to change.

Wednesday, June 22 • 11:00am - 11:40am
ETL - Extract Trino Load - A Case for Trino as a Batch Processing Engine - Andrii Rosa, Starburst Data

Trino, formerly known as PrestoSQL, is a relatively new name in the open source space. It is well known for fast ad hoc and exploratory workloads on data lakes and heterogeneous data sources. When you want to give your data scientists the ability to query across your data landscape by joining live operational data with historical data, Trino is the state of the art. Trino and Presto were initially built to replace Hive workloads at Facebook and handled massive petabyte-scale batch workloads. Yet Trino has not been widely adopted as a batch ETL engine for these workloads. As it turns out, one of the design choices behind Trino's speed was forgoing failure recovery in exchange for faster queries. In practice, many users want the system running the query to handle recovery from failures. The Trino community has rallied around supporting native, granular failure recovery to improve resiliency. This brings Trino to a new frontier by supporting both exploratory queries and failure recovery for long-running workloads, so that engineers and analysts do not have to switch between systems to run their queries.

Speakers
Andrii Rosa

Software Engineer, Starburst Data
Trino maintainer and distributed systems enthusiast. Currently working at Starburst on expanding Trino's capabilities to better support the long-running, resource-intensive queries that are common in the ETL space. Previously worked at Facebook on Presto on Spark technology to support petabyte...


Wednesday June 22, 2022 11:00am - 11:40am CDT
Room 408/409 (Level 4)