Synopsis
The brutal truth about digital performance engineering and operations. Andreas (aka Andi) Grabner and Brian Wilson are veterans of the digital performance world. Combined, they have seen too many applications fail to scale and perform up to expectations. With more rapid deployment models made possible through continuous delivery and a mentality shift sparked by DevOps, they feel it's time to share their stories. In each episode, they and their guests discuss different topics concerning performance, ranging from common performance problems for specific technology platforms to best practices in developing, testing, deploying and monitoring software performance and user experience. Be prepared to learn a lot about metrics. Andi & Brian both work at Dynatrace, where they get to witness more real-world customer performance issues than they can TPS report at.
Episodes
-
Scaling Dev Teams from Startup to Enterprise while keeping Agility with Stefan Frandl
14/12/2020 Duration: 45min
Stefan Frandl, Development Director, has a single-digit employee number at Dynatrace and has therefore seen a lot of agile transformation over the past 15 years – growing from a startup in Linz, Austria to now 800+ engineers across globally distributed labs. A visit to several “unicorns” such as Google, Facebook and Slack triggered the latest agile transformation. In this episode Stefan walks us through the implementation of the changes we discussed with Andrea Holl in her episode on “Scaling Agile at Dynatrace”. He shares the challenges around growing responsibilities of team leads, work left half-finished, overhead on hand-over and cross-team collaboration. He then introduces us to the current structure and processes at Dynatrace such as Team Captains, Product Owners and Agile Advocates as well as Dev Directors and Lead Product Engineers. While Dynatrace has seen many benefits already, the journey is still ongoing as Dynatrace is continuously rethinking and improving the way we work and provide value to our cust
-
Scaling Agile at Dynatrace with Andrea Holl
30/11/2020 Duration: 47min
SAFe, LeSS or the Spotify Model? Which scaled agile method should you apply for your transformation? Or are you unique enough, like the 44% of organizations that, according to European research, are defining their own scaled agile approach to transform successfully? In this episode we sit down with Andrea Holl, Agile Coach at Dynatrace, and let her walk us through the different scaled agile frameworks. She discusses the pros and cons and why many organizations – including Dynatrace – are coming up with their own approaches. For Dynatrace it was about taking the best from the proven frameworks but adapting them to allow us to continue our core cultural values such as full autonomy to teams and flexibility of tools and processes. If you are on the brink of a transformation, make sure to listen to Andrea and how she and her teams have approached that transformational project!
https://www.linkedin.com/in/andrea-elisabeth-holl-b2255a112/
https://www.scaledagileframework.com/
https://less.works/
https://blog.crisp.se/wp-content/uploads/2012/1
-
Why you should look into Chaos Engineering with Ana Medina
16/11/2020 Duration: 56min
Daylight savings can bring chaos to systems, such as rogue processes consuming CPU or memory, and therefore impact your critical systems. The question is: how do your systems react to this chaos? How can you test for this? And how can you make your systems more resilient against this chaos? In this episode we talk with Ana Margarita Medina, Chaos Engineer at Gremlin. In her previous job, Ana (@Ana_M_Medina) was a Site Reliability Engineer at Uber, where she helped cope with the “chaos” on New Year's Eve or Halloween. Ana gives us great insights into the discipline of Chaos Engineering: that it's really about running controlled experiments, and that anyone with an interest in contributing to more resilient systems can get started. Here are the additional links we promised during the recording: Drift into failure, Chaos Engineering Community, Chaos Engineering and System Resilience in Practice.
https://www.linkedin.com/in/anammedina/
https://twitter.com/Ana_M_Medina
https://eng.uber.com/nye/
https://www.amazon.com/Drift-
-
How to scale k8s operations from a single to thousands of clusters
02/11/2020 Duration: 55min
We are sitting down with Sebastian Scheele (@sscheele), CEO and co-founder of Kubermatic, to discuss the challenges organizations face as they move their workloads to k8s and realize that managing, scaling and operating k8s does not get easier the more k8s clusters they allow their application teams to spin up or down. We learn more about the Kubermatic Kubernetes Platform, the Open Source Project, which centrally manages the global automation of thousands of Kubernetes clusters across multi-cloud, on-prem and edge with unparalleled density and resilience. Thanks, Sebastian, for answering all the questions we threw at you – questions we have received from many organizations that are moving to k8s but are surprised by the complexity when it comes to properly operating and managing k8s.
Sebastian Scheele Twitter: https://twitter.com/sscheele
Kubermatic Kubernetes Platform: https://github.com/kubermatic/kubermatic
-
What we have learned about K8s and Open-source when building Keptn
19/10/2020 Duration: 01h04min
Keptn is now a CNCF sandbox project bringing a new event-driven approach to continuous delivery and operations. While many are just hearing about Keptn for the first time, it is interesting to learn more about how it started, which challenges the team ran into, and what they learned about K8s and running an open-source project. We therefore invited Johannes Braeuer (@braeuer_j) and Andreas Grimmer (@grimmer_andreas) – both Keptn project maintainers and contributors – who have been working on the Keptn project since its inception. Groups that want to start open-source projects, or are on the brink of deciding for or against Kubernetes, should especially listen until the end, as Johannes and Andreas tell us what they would do differently if they started today, based on the learnings from the past 18 months. If you want to join the Keptn community, make sure to star our GitHub project, join the Slack channel, and join our regular community meetings!
Keptn: https://keptn.sh/
Johannes Bräuer on Twitter: https://twitter.com/b
-
Bringing Observability to .NET with Georg Schausberger and Bernhard Ruebl
05/10/2020 Duration: 01h41s
Getting visibility into .NET code, whether it runs on a developer machine, on a Windows server on-premise or as a serverless function in the cloud, is the day-to-day job of Georg Schausberger (@BombadilThomas) and Bernhard Ruebl, part of the Dynatrace .NET Agent Team. In this podcast we hear firsthand about the challenges in bringing observability, monitoring and distributed tracing to the .NET ecosystem. They give us insights into their continued effort to reduce startup and runtime overhead, the innovation that comes out of Microsoft as it moves towards open standards, and the noble automated approach of always validating that things don’t break monitored code amid the constant updates of libraries and frameworks. We also got both to talk about their developer experience when working with commercial tools such as Dynatrace and its PurePath technology, as well as open source tools, when analyzing and debugging their own code or helping users figure out what’s wrong with their code. In the talk both mentioned other too
-
Successful Enterprise Monitoring Projects with Kayan Hales
21/09/2020 Duration: 57min
Successful Cloud Migrations, large scale Kubernetes & OpenShift deployments, making billions of data points actionable and enterprise-wide Citrix & SAP monitoring. These are some of the projects Kayan Hales, Technical Manager at Dynatrace, and her colleagues at Dynatrace ONE help enterprise customers around the world implement every day. We sat down with Kayan because we wanted to learn what really matters to many large organizations as they embark on automating monitoring into their hybrid multi-cloud environments. While we constantly talk about cloud native and microservices, it was interesting to hear what the global team of Dynatrace experts is doing on a day-to-day basis. Kayan gives us insights into how important it is to think about metadata, tagging strategies and automation before large scale rollouts, and that one of the first questions you need to ask is: who needs what type of data at which time through which channels.
https://www.linkedin.com/in/kayanhales/
https://www.dynatrace.com/services-support/d
-
Why Performance Engineering in 2020 is still failing with James Pulley
07/09/2020 Duration: 01h08min
Why do some organizations still see performance testing as a waste of time? Why are we not demanding the same level of performance criteria for SaaS-based solutions as we do for in-house hosted services? Why are many organizations just validating performance to be “within specification” vs “holistically optimized”? In this episode we have invited James Pulley (@perfpulley), Performance Veteran and PerfBytes News of the Damned host, to discuss how organizations can level up from performance testing to true performance engineering. He also shares his approaches to analyzing performance issues and gives everyone advice on what to do to start a performance practice in your organization.
https://www.linkedin.com/in/jameslpulley3/
https://www.perfbytes.com/p/news-of-damned.html
-
Encore - Understanding the Power of Feature Flags with Heidi Waterhouse
24/08/2020 Duration: 45min
Imagine a future where we deploy every code change directly into production because feature flags eliminated the need for staging. Feature flags allow us to deploy any code change, but only launch the feature to a specific set of users that we want to expose to new capabilities. Monitoring the usage and the impact enables continuous experimentation: optimizing what is not perfect yet and throwing away features (technical debt) that nobody really cares about. So – what are feature flags? We got to chat with Heidi Waterhouse (@wiredferret), Developer Advocate at LaunchDarkly (https://launchdarkly.com/), who gives us a great introduction to Feature Flags, how organizations actually define a feature and why it is paramount to differentiate between Deploy and Launch. We learn how to test feature flags, what options we have to enable features for a certain group of users and how important it is to always include monitoring. If you want to learn more about feature flags check out http://featureflags.io/. If you want to
-
Encore - How to build distributed resilient systems with Adrian Hornsby
03/08/2020 Duration: 55min
Adrian Hornsby (@adhorn) has dedicated the last few years to helping enterprises around the world build resilient systems. He wrote a great blog series titled “Patterns for Resilient Architectures” and has given numerous talks about this, such as Resiliency and Availability Design Patterns for the Cloud at DevOne in Linz earlier this year. Listen in and learn more about why resiliency starts with humans, why we need to version everything we do, why default timeouts have to be flagged, how to deal with retries and backoffs and why every distributed architect has to start designing systems that provide different service levels depending on the overall system health state.
Links:
Adrian on Twitter: https://twitter.com/adhorn
Medium Blog Post: https://medium.com/@adhorn/patterns-for-resilient-architecture-part-1-d3b60cd8d2b6
Adrian's DevOne talk: https://www.youtube.com/watch?v=mLg13UmEXlw
DevOne Intro video: https://www.youtube.com/watch?v=MXXTyTc3SPU
-
Service Meshes: From simple load balancing to securing planet scale architectures with Sebastian Weigand
20/07/2020 Duration: 01h03min
Whether you are still researching whether you need a Service Mesh or can simply use a load balancer, or you are already deploying multi hybrid-cloud architectures where Service Meshes help you secure location-aware routed traffic: in both cases, listen to this episode! We invited Sebastian Weigand (@ThatDevopsGuy), who wrote papers such as Building a Planet-Scale Architecture the Easy Way, back to our podcast. In our episode Sebastian walks us through why Service Meshes have gained so much in popularity, what the main use cases are, how you should decide whether or not to use Service Meshes and which challenges you might run into as you expand into using more features.
https://twitter.com/thatdevopsguy
https://files.devnetwork.cloud/DeveloperWeekNewYork/presentations/2019/scalability/Sebastian_Weigand.pdf
-
From Postmortems to true SRE Culture with Steve McGhee
06/07/2020 Duration: 01h07min
Steve McGhee (@stevemcghee) is an expert in post mortems and SRE. He learned the craft at Google, applied it at MindBody and, now back at Google, is sharing his experiences with the larger SRE community. Listen to this episode and learn more about how post mortem analysis can be the starting point of your SRE transformation, and how it can help reliability engineering build systems that fail gracefully instead of causing full crashes or outages. Steve also went into monitoring what matters and defining alerts only on leading indicators with an expiration date – a fascinating concept to avoid a flood of custom alerting in production! If you want to learn more from Steve or about SRE, check out these additional resources he mentioned in the podcast: The SRE I aspire to be (SRECon19) and his 2-part blog series on blameless.com.
https://twitter.com/stevemcghee
https://www.youtube.com/watch?v=K7kD_JfRUY0
https://www.blameless.com/blog/improve-postmortem-with-sre-steve-mcghee
-
SLO Adoption and Usage in SRE with Sebastian Weigand
22/06/2020 Duration: 01h02min
Do you keep hearing the terms SLIs, SLOs, SLAs and Error Budgets, and finally want to understand what they are, who should be responsible for them and how they fit into SRE (Site Reliability Engineering)? Then listen to our conversation with Sebastian Weigand, who has been helping organizations not only modernize their application stacks but also embrace DevOps & SRE. Learn about who is responsible for defining SLIs, what the difference between SLOs and SLAs is and what the difference between DevOps & SRE is in his opinion! Sebastian, who calls himself “That Devops Guy” (@ThatDevopsGuy), also suggests checking out the latest free report on SLO Adoption and Usage of SRE as well as the SRE Books from Google to get started with that practice.
https://www.linkedin.com/in/thatdevopsguy/
https://twitter.com/ThatDevopsGuy
https://landing.google.com/sre/resources/practicesandprocesses/slo-adoption-and-usage-in-sre/
https://landing.google.com/sre/books/
-
Building High Performing Apps on React with Cassidy Williams
08/06/2020 Duration: 51min
Cassidy (@cassidoo) has been building apps, and also educating developers on how to build apps, on React, JavaScript, JAMStack and many other technologies over the past years. We got her on our podcast, where she gave us insights into React Hooks, how WPO (Web Performance Optimization) plays out in the React world, why it is important to think about state from the start and that it's important to always have your end user in mind before even writing your first line of JavaScript. Here are the links to the additional resources she references in the podcast: The performance benefits of Variable Fonts, Mandy Michael (@Mandy_Kerr), Isabela Moreira (@isabelacmor) and A/B Testing with React (YouTube).
https://twitter.com/cassidoo
https://reactjs.org/
https://jamstack.org/
https://uxdesign.cc/the-performance-benefits-of-variable-fonts-79af8c4ff56c
https://twitter.com/Mandy_Kerr
https://twitter.com/isabelacmor
https://www.youtube.com/watch?v=xpfR0rRfcNk
-
Extreme load testing with 2Mio Virtual Users: Lessons learned with Joerek van Gaalen
25/05/2020 Duration: 58min
How do you prepare for a 2Mio concurrent user load that lasts for 7 seconds? What does the load infrastructure look like? How do you optimize your scripts? How do you deal with DNS or CDNs? In this episode we hear from Joerek van Gaalen, who has done these types of tests. He shares his experiences and approaches to running these “special event extreme load tests”. If you want to learn more, make sure to check out his presentation and read his blog post from Neotys PAC 2020.
https://www.linkedin.com/in/joerekvangaalen/
https://www.neotys.com/performance-advisory-council/joerek_van_gaalen
-
Everything we messed up and learned when moving to AWS with Justin Donohoo
11/05/2020 Duration: 01h03min
Have you ever burned 30k because you forgot to turn off your test VMs over the weekend? Have you ever accidentally deleted “the production table” because you thought you were connected to your dev database? We often only hear the good stories and not those that teach us about what we should not do in order to avoid disaster! Join this episode, where Justin Donohoo, Founder and CTO of Observian, tells us horror stories from his professional life that taught him great lessons on what not to do when moving to the cloud, re-architecting because of exponential growth or letting the intern do things he/she shouldn’t do.
https://www.linkedin.com/in/jdonohoo/
-
The Good, The Bad, and The Ugly of Open Source with Goranka Bjedov
27/04/2020 Duration: 57min
Goranka Bjedov has seen the different sides of Open Source while working for organizations such as Google, Facebook or AT&T Labs. Before she takes the stage at www.devone.at later this year, she gives us her take on Scott McNealy’s quote “Open Source is free like a puppy is free”. Tune in and hear her thoughts on how to pick the right tools, languages or frameworks, how to grow an open source project and what things you should definitely avoid.
https://www.linkedin.com/in/goranka-bjedov-5969a6/
https://devone.at/
-
Achieving Reliability through Chaos Engineering with Tammy Bütow
13/04/2020 Duration: 48min
Starting your new job as Infrastructure Engineer in a large bank with your to-be boss and his key architects just leaving feels like Chaos! Maybe that’s why Tammy Butow has made a career in Chaos and Site Reliability Engineering. In this episode, Tammy shares her experiences of bringing reliability into highly complex systems at NAB, Digital Ocean, DropBox or now Gremlin through chaos engineering. You learn about the importance of knowing and baselining your metrics, defining your SLIs and SLOs and continuously running your fire drills to ensure your system is as reliable as it has to be. If you want to learn more, check out Tammy’s presentations on speakerdeck and make sure to join the chaosengineering slack channel.
https://www.linkedin.com/in/tammybutow/
https://speakerdeck.com/tammybutow
https://slofile.com/slack/chaosengineering
-
Demystifying DevTestSecOps: Automating Security into your Culture with Adam Auerbach
30/03/2020 Duration: 56min
Three years ago, Adam Auerbach explained how he helped Capital One automate performance into the DevOps Delivery Pipeline. In 2020, when IT Security is a hot trending topic, Adam works for EPAM and is back advocating for the same shift-left he has advocated for when it comes to functional or performance testing. But now it's about baking Security into your practices & culture. And he has a cool word for it: DevTestSecOps! Listen in and learn which types of security checks can be fully automated in the different stages of the delivery pipeline. Also learn how to prioritize your vulnerabilities, as you will most likely end up with a lot of noise in the beginning. Adam also highlights the following open source tools that will help in that transformation: getcarrier.io and reportportal.io.
https://www.linkedin.com/in/adamauerbach/
https://getcarrier.io/#about
https://reportportal.io/
-
DesignOps with Barista: Scaling UX and UI Efficiency at Dynatrace
16/03/2020 Duration: 29min
DesignOps, just like DevOps or NoOps, is targeted towards increasing the efficiency and collaboration between designers and engineers in order to deliver better, intuitive and consistent user experiences. It requires changes in processes, people and tooling and is heavily driven by enabling engineers to become more autonomous when developing and delivering new value for their organization. Join this podcast and learn from Ursula Wieshofer (@Ursula_W), UX Design Team Lead, as well as from Fabian Friedl (@fabian_friedl), DesignOps Team Lead, how we live and breathe DesignOps at Dynatrace. You will learn about the recently released open source Design System Barista, how it enables our engineers to deliver consistent user experiences across all sorts of software projects and how we manage feedback through the Barista GitHub project for future innovation. Also make sure to check out the recent blog posts on UX Guilds as well as on how we deal with the constantly changing requirements of our design teams.
https://twitter.com