Spark Optimization 2 with Scala
Miscellaneous | Author: LeeAndro | Added: 13-09-2020, 15:13
MP4 | Video: h264, 1280x800 | Audio: AAC, 44.1 KHz, 2 Ch
Genre: eLearning | Language: English | Duration: 24 Lessons (7h 53m) | Size: 1.25 GB

Go fast or go home.

Master Spark internals so your jobs run blazing fast and your cluster pulls maximum weight.
In this course, we cut the weeds at the root. We dive deep into Spark and understand what tools you have at your disposal - and you might just be surprised at how much leverage you have. You will learn 20+ techniques and optimization strategies. Each of them individually can give at least a 2x performance boost for your jobs (some of them even 10x), and I show it on camera.

You'll understand Spark internals and be able to explain why Spark is already pretty darn fast

You'll be able to predict in advance if a job will take a long time

You'll diagnose hung jobs, stages and tasks

You'll spot and fix data skews

You'll make the right tradeoffs between speed, memory usage and fault-tolerance

You'll be able to configure your cluster with the optimal resources

You'll save hours of computation in this course alone (let alone in prod!)

You'll control the parallelism of your jobs with the right partitioning

And some extra perks:

You'll have access to the entire code I write on camera (~1400 LOC)

You'll be invited to our private Slack room where I'll share latest updates, discounts, talks, conferences, and recruitment opportunities

(soon) You'll have access to the takeaway slides

(soon) You'll be able to download the videos for offline viewing

Deep understanding of Spark internals so you can predict job performance

stage & task decomposition

reading query plans before jobs run

reading DAGs while jobs are running

performance differences between the different Spark APIs

packaging and deploying a Spark app

configuring Spark in 3 different ways

understanding the state of the art in Spark internals

leveraging Catalyst and Tungsten for massive performance gains
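As a taste of what "reading query plans before jobs run" looks like, here is a minimal sketch. The session setup and the toy transformation are my own illustration (assuming a local Spark setup with the spark-sql dependency), not code from the course:

```scala
import org.apache.spark.sql.SparkSession

object QueryPlans {
  def main(args: Array[String]): Unit = {
    // local session just for experimentation
    val spark = SparkSession.builder()
      .appName("reading-query-plans")
      .master("local[*]")
      .getOrCreate()

    val ds = spark.range(1, 1000000)
    val doubled = ds.selectExpr("id * 2 as value")

    // prints the parsed, analyzed, optimized (Catalyst) and physical
    // (Tungsten) plans - all before any job actually runs
    doubled.explain(true)

    spark.stop()
  }
}
```

The physical plan is what the course teaches you to read: it tells you where shuffles and scans will happen before you pay for them.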

Understanding Spark Memory, Caching and Checkpointing

Tuning Spark executor memory zones

caching for speedy data reuse

making the right tradeoffs between speed, memory usage and fault tolerance

using checkpoints when jobs are failing or you can't afford a recomputation
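The caching/checkpointing tradeoff above can be sketched like this (my own minimal example; the storage level and checkpoint directory are illustrative):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object CachingVsCheckpointing {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("caching-checkpointing")
      .master("local[*]")
      .getOrCreate()

    // checkpointing needs reliable storage (HDFS in prod; a local dir here)
    spark.sparkContext.setCheckpointDir("/tmp/spark-checkpoints")

    val df = spark.range(1, 10000000).toDF("id")

    // cache: trades memory for speed; MEMORY_AND_DISK spills to disk
    // instead of recomputing partitions that don't fit in RAM
    val cached = df.persist(StorageLevel.MEMORY_AND_DISK)
    cached.count() // an action to actually materialize the cache

    // checkpoint: writes the data out and truncates the lineage, so a
    // failure no longer triggers a full recomputation from the source
    val safe = cached.checkpoint()
    safe.count()

    spark.stop()
  }
}
```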


leveraging repartitions

using coalesce to avoid shuffles

picking the right number of partitions at a shuffle to match cluster capability

using custom partitioners for custom jobs
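The partitioning tools above, sketched in one place (partition counts and the modulo key are illustrative, assuming a plain local setup):

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.sql.SparkSession

object PartitioningTools {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partitioning")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    val numbers = sc.parallelize(1 to 1000000)

    // repartition: incurs a full shuffle; use it to INCREASE parallelism
    val wide = numbers.repartition(100)

    // coalesce: merges partitions without a shuffle; use it to SHRINK
    val narrow = wide.coalesce(10)
    println(narrow.getNumPartitions) // 10

    // custom partitioner (pair RDDs only): control exactly which keys
    // end up together, e.g. to avoid shuffles in later joins
    val pairs = numbers.map(n => (n % 100, n))
    val custom = pairs.partitionBy(new HashPartitioner(20))
    println(custom.getNumPartitions) // 20

    spark.stop()
  }
}
```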

Cluster tuning, fixing problems

allocating the right resources in a cluster

fixing data skews and straggling tasks with salting

fixing serialization problems

using the right serializers for free perf improvements
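To illustrate salting for skewed joins, here is a minimal sketch of my own - the salt count, column names and the Kryo config line are illustrative, not the course's code:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SaltingSkewedJoins {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("salting")
      .master("local[*]")
      // one of the "free" serializer wins: Kryo instead of Java serialization
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .getOrCreate()

    import spark.implicits._

    val numSalts = 10

    // a heavily skewed dataset: every row has the same hot key,
    // so a naive join would pile all the work onto one straggling task
    val skewed = spark.range(1, 1000000).withColumn("key", lit(0))

    val lookup = Seq((0, "hot"), (1, "cold")).toDF("key", "label")

    // salt the big side: spread the hot key across numSalts buckets
    val salted = skewed.withColumn("salt", (rand() * numSalts).cast("int"))

    // explode the small side so every (key, salt) combination exists
    val saltedLookup = lookup
      .withColumn("salt", explode(array((0 until numSalts).map(lit): _*)))

    // joining on (key, salt) distributes the hot key's rows
    // over numSalts tasks instead of one
    val joined = salted.join(saltedLookup, Seq("key", "salt"))
    joined.count()

    spark.stop()
  }
}
```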

This course is for Scala and Spark programmers who need to improve the runtime and memory footprint of their jobs. If you've never done Scala or Spark, this course is not for you. I generally recommend that you take Spark Optimization 1 first, but it's not a requirement.


