Synopsis
The brutal truth about digital performance engineering and operations.

Andreas (aka Andi) Grabner and Brian Wilson are veterans of the digital performance world. Combined, they have seen too many applications fail to scale and perform up to expectations. With the more rapid deployment models made possible by continuous delivery, and the shift in mentality sparked by DevOps, they feel it's time to share their stories. In each episode, they and their guests discuss different performance topics, ranging from common performance problems on specific technology platforms to best practices in developing, testing, deploying and monitoring software performance and user experience. Be prepared to learn a lot about metrics.

Andi & Brian both work at Dynatrace, where they get to witness more real-world customer performance issues than they can TPS report at.
Episodes
-
Optimizing Cloud Native Power Consumption using Kepler with Marcelo Amaral
29/01/2024 Duration: 47 min
Marcelo Amaral is a Researcher for Cloud System Optimization and Sustainability. Given his background in performance engineering, where he optimized microservice workloads in containerized environments, the leap to analyzing and optimizing energy consumption was an easy one.

Tune in to this episode and learn about Kepler, the CNCF project Marcelo is working on, which provides metrics for workload energy consumption based on power models trained by the community. Marcelo goes into detail about how Kepler works and also gives practical advice to any developer on keeping energy consumption in mind when making architectural and coding decisions.

To learn more about Kepler and today's episode, check out:
LinkedIn from Marcelo: https://www.linkedin.com/in/mcamaral/
CNCF Blogpost on Kepler: https://www.cncf.io/blog/2023/10/11/exploring-keplers-potentials-unveiling-cloud-application-power-consumption/
Kepler GitHub Repo: https://github.com/sustainable-computing-io/kepler
-
OpenLLMetry - Observing the Quality of LLMs with Nir Gazit
15/01/2024 Duration: 50 min
It's only been a year since ChatGPT was introduced. Since then, we have seen LLMs (Large Language Models) and generative AI being integrated into everyday software applications. Developers face the hard choice of picking the right model for their use case, one that produces the output quality their end users demand.

Tune in to this session with Nir Gazit, CEO and Co-founder of Traceloop, who educates us on how to observe and quantify the quality of LLMs. Besides performance and cost, engineers need to look at quality attributes such as accuracy, readability or grammatical correctness.

Nir introduces us to OpenLLMetry, a set of open source extensions built on top of OpenTelemetry that provide automated observability into LLM usage, so developers can better understand how to optimize it. His advice to every developer is to start measuring the quality of your LLMs on day 1 and to evaluate it continuously as you change your model, the prompt and the way you interact with your LLM stack! If you have
-
Why Developers have different Observability Requirements with Liram Haimovitch
01/01/2024 Duration: 49 min
After analyzing distributed traces for more than 15 years, Brian and I thought that everyone in software engineering and operations must be satisfied with all the observability data we have available. But maybe Brian and I were wrong, because we didn't fully understand all the use cases, especially those of developers who must fix code in production or need to quickly understand what someone else's code is really doing, without the luxury of adding another log line and redeploying on the fly. To learn more about the observability requirements of developers, we invited Liram Haimovitch, CTO at Rookout, now part of Dynatrace, who has spent the last 7 years solving the challenging problems that developers face day and night. Tune in and learn what non-breaking breakpoints are, how it is possible to "debug in production" without impacting running code, and how we can make developers' lives easier even though we push so many things "to the left".
-
Mobile, AI, LLMs, Observability & Resiliency - Key Topics for Banks in Hungary with Adam Gajdi
18/12/2023 Duration: 11 min
I was invited to speak at BankTechShow in Budapest, Hungary, where the nation's IT leaders in the banking sector presented and discussed the future of banking, both in the cloud and in terms of what it means for physical bank branches. I got a chance to sit down with Adam Gajdi, IT Solutions CoE Lead at K&H, who walked me through the process of their recent mobile banking app launch. Adam highlighted the importance of observability for both business owners and developers. He also told me that Hungarian banks are mandated to conduct chaos tests to prove that their systems are resilient in case of data center outages. Of course, I was also curious about how AI, LLMs and other technologies are being adopted in their sector. Tune in to learn more!
-
Recap KubeCon 2023 NA, State of Platform Engineering and more with Andi Grabner
04/12/2023 Duration: 28 min
Besides attending KubeCon 2023 NA, Andreas (Andi) Grabner, co-host of PurePerformance but today's guest, also travelled across parts of the US to chat with the broader observability community about topics such as Platform Engineering, Observability, DevOps, Automation & Security. Tune in and get a quick recap of all the topics Andi picked up on his recent trip.
-
Observability, Cybersecurity, DevOps & SRE - Learning from the Public Sector with Willie Hicks
20/11/2023 Duration: 46 min
Zero-Trust Architectures. Data-Flow Inventory. User Experience First! Those are key initiatives in the public sector to ensure that digital services delivered to citizens around the globe not only work with a flawless user experience but are also safe from bad actors trying to disrupt agencies at the local, state and federal levels.

In this episode we invited Willie Hicks, Federal CTO at Dynatrace, to learn more about the state of observability and security at the government agencies Willie has worked with over the past decade. In our conversation we explore the differences between the commercial and government sectors when it comes to ROI, and how they view competition as a driving motivator.

To learn more about the public sector, tune in to the Tech Transformers podcast that Willie co-hosts with his colleague Carolyn Ford.
-
Blue turns Green: Sustainable IT is everyone's business with Mario-Leander Reimer
06/11/2023 Duration: 51 min
4% of worldwide CO2 emissions come from IT, and as in all other industries, we have big potential to not only reduce the carbon footprint but also lower costs.

Tune in to our episode where Mario-Leander Reimer, CTO at QAware GmbH, talks about his top 3 suggestions for Sustainable IT: making the right architectural choices, right-sizing your environments and shutting down environments you don't need!

Mario is also heavily involved in the CNCF and gives us an overview of projects to look into, such as Kepler, kube-green, Karpenter or Carbon Aware Multi-Cluster Schedulers.

Here are the links we discussed:
Blue turns Green presentation: https://speakerdeck.com/lreimer/blue-turns-green-approaches-and-technologies-for-sustainable-k8s-clusters-number-kcdmunich?slide=5
Kepler Project: https://github.com/sustainable-computing-io/kepler
kube-green: https://kube-green.dev/
CNCF TAG Environmental Sustainability: https://github.com/cncf/tag-env-sustainability
Sustainability Week: https://tag-env-sustainability.cncf.io/cloud-native-sustainability-week/
-
Don't burst in Flames: 20 years of Performance Engineering with Martin Spier
23/10/2023 Duration: 49 min
About 10 years ago, Martin Spier was one of six engineers taking care of all of Netflix's operations. Back then, performance and observability tools weren't as sophisticated as some are today and didn't scale to Netflix's needs. FlameScope was one of the open source projects that evolved out of that period, visualizing flame graphs on a time-scaled heatmap to identify the specific performance patterns that caused issues in their complex systems.

Tune in to this episode and hear more performance and observability stories from Martin: about his early days in Brazil, his time at Expedia and Netflix, and his current role as VP of Engineering at PicPay, one of the hottest fintechs in Brazil.

More links we discussed:
Performance Summit talk about FlameCommander: https://www.youtube.com/watch?v=L58GrWcrD00
CMG Impact talk on Real User Monitoring at Netflix: https://www.cmg.org/2019/04/impact-2019-real-user-performance-monitoring-at-netflix-scale/
Learn more about Vector: https://netflixtechblog.com/extendin
-
Inside Africa - Cloud Native Observability Journeys with Kelvin Klein
09/10/2023 Duration: 17 min
Africa is not only the second-largest continent in the world, it's also at the top when it comes to adoption of cloud native technologies. I was fortunate to spend a week in South Africa and had the chance to spend a lot of time with Kelvin Klein, Dynatrace Product Manager at Mediro ICT. After two observability events in Johannesburg and Cape Town and several meetings with local tech leaders, I got to sit down with Kelvin and learn more about the state of Observability, Cloud Native and Security in South Africa.
-
The Future of Ops is Sleep with Amit Chiba from Nedbank
25/09/2023 Duration: 10 min
I was fortunate to travel to South Africa and meet many tech leaders in Johannesburg and Cape Town to talk about Observability, Security, Automation, Platform Engineering, DevOps and FinOps. One of those leaders is Amit Chiba, Multi Product Specialist at Nedbank, one of the leading financial institutions in South Africa. I sat down with Amit to discuss his personal journey and his projects at Nedbank. Tune in and hear from Amit how self-service platform engineering helps them scale observability, how they tackle cloud costs and why he thinks the future of IT Ops is more Sleep!
-
Developer Productivity Engineering: It's more than buying faster hardware with Trisha Gee
11/09/2023 Duration: 44 min
Do you measure build times, both on your shared CI and for local builds on developers' workstations? Do you measure how much time devs spend debugging code or trying to understand why tests or builds are suddenly failing? Do you treat your pre-production environments with the same respect as your production environments?

Tune in and hear from Trisha Gee, Developer Champion at Gradle, who has helped development teams reduce wait times, become more productive with their tools (gotta love that IDE of yours) and understand the impact of their choices on other teams (when log lines wake people up at night). Trisha explains in detail what there is to know about DPE (Developer Productivity Engineering), how it fits into Platform Engineering, why adding more hardware is not always the best solution and why Flaky Tests are a topic Trisha is passionate about.

Here are the links to Trisha's social media, her books and everything else we discussed during the podcast:
LinkedIn: https://www.linkedin.com/in/trishagee/
Trisha's We
-
Serverless Observability needs a paradigm shift with Toli Apostolidis
28/08/2023 Duration: 01h38s
Only a few can claim to have successfully built a Pure-Serverless architecture, and only they really understand the challenges of observing truly event-driven architectures. Apostolis Apostolidis (also known as Toli) is one of those people, and that's why we invited him back to discuss the lessons learned during his time as Head of Engineering Practices at cinch. Tune in and learn about the evolution of serverless observability and the challenges of observing API Gateways, Queues and Step Functions. Listen to Toli's advice on picking one observability vendor, doing your own custom instrumentation and familiarizing yourself with the observability data from your managed service provider.

Also go back to our previous episode to hear more about his Engineering Practices for Success, and remember that the time to ask about cold starts is over!
-
Practical Platform Engineering vs the Marketing Hype with Mauricio (Salaboy) Salatino
14/08/2023 Duration: 32s
Codifying Golden Paths that ideally don't require you to build a K8s Operator! This is what Practical Platform Engineering should look like!

In our latest episode we learn from Mauricio (Salaboy) Salatino, who has been contributing to open source for the past 12 years. Tune in and learn from his journey of designing and building platforms. He shares his opinion on Platform Engineering skill sets, how to design for self-service and how to pick the right tools out of the 160+ CNCF project options, and shares some of his favorite tools (including Crossplane, VCluster, Argo, OpenFeature, Keptn ...) that should be part of a modern cloud native platform.

Links discussed in this podcast:
Salaboy on Twitter: https://twitter.com/salaboy
Salaboy on LinkedIn: https://www.linkedin.com/in/salaboy/
Upcoming Book: https://www.salaboy.com/book/
Cloud-Native Snapshots: https://www.salaboy.com/cloud-native-snapshots/
Diagrid: https://www.diagrid.io/
-
Sifting through the Noise of Platform Engineering with Saim Safdar
31/07/2023 Duration: 48 min
Reducing cognitive load by simplifying computing for every developer in an organization! That's one of the many definitions of Platform Engineering. But what is Platform Engineering really? Just a new hype? What problem does it actually solve? How does it relate to DevOps and SRE? Are there any standards or reference architectures available?

To get a new perspective on Platform Engineering, we invited Saim Safdar, CNCF Ambassador and member of the CNCF TAG App Delivery Platform Working Group. Tune in and learn about the Platform Maturity Model, how to get involved in shaping the field of Platform Engineering, which of the people Saim has interviewed are worth following, and much more.

Here are the links we discussed:
CNCF Platforms White Paper: https://tag-app-delivery.cncf.io/whitepapers/platforms
Maturity Model Working Document: https://docs.google.com/document/d/1bP8-LQ-d41eIdQB3IC2YsncDhawpFLggql2JxwtE0XI/edit
Platform Working Group: https://tag-app-delivery.cncf.io/about/wg-platforms/
Cloud Native Podcast with Alexis
-
Unlocking the Power of Observability: Engineering Practices for Success with Toli Apostolidis
17/07/2023 Duration: 47 min
Are you frustrated with your team's ability to troubleshoot issues in production despite their proficiency in pushing out new builds? The root of this problem may lie in the absence of Observability Driven Development. In our latest episode we are joined by Apostolis Apostolidis (also known as Toli) who, as Head of Engineering Practices at cinch, has spent the past years enabling teams to adopt the easiest path to value. He is passionate about DevOps and has strong opinions on how to educate engineers on "Consciously Instrumenting Code for good Observability".

Tune in to learn more about good engineering practices, building internal communities of practice, the benefits of traces over metrics and logs, and why we need to start adding observability to our CVs and LinkedIn profiles.

Here are all the relevant links we discussed in this episode:
Toli's Website: https://www.toli.io/
Toli's LinkedIn Profile: https://www.linkedin.com/in/apostolosapostolidis/
Toli on Twitter: https://twitter.com/apostolis09/
WTFisSRE Talk on DevOp
-
Observability Evolution: From Sys Admin to Digital Readiness Manager with Mark Forrester
03/07/2023 Duration: 43 min
Do you know why customers spend more money at a pub when ordering at the table vs ordering directly at the bar? Do you want to know how to get SaaS vendors to send you their observability & telemetry data? Do you want to know how an Infrastructure Analyst turned into a Digital Readiness Manager?

Tune in to this PurePerformance episode where we sat down with Mark Forrester from Mitchells & Butlers, who answers all these questions and draws the parallels to observability. Observability has come a long way, just like Mark: from traditional infrastructure (CPU, Memory, Network) to APM (Service Response Time & Failure Rates), to Real User Behaviour and now End-2-End Business Process Analytics. Unlocking the potential of Digital Business Observability lets Mark optimize the end-2-end customer journey, making sure customers always feel taken care of when ordering food delivery online or a meal or drink at a restaurant. As you learn, digital busi
-
The De-Facto Standard of Metrics Capture and Its Untold Histogram Story with Björn Rabenstein
19/06/2023 Duration: 54 min
As far as we know, Prometheus is the only open source project besides Kubernetes that belongs to the prestigious group of projects with their own documentary. Now why is that? Prometheus has emerged as the go-to solution for capturing metrics in modern software stacks, earning its status as the de facto standard. With its widespread adoption and constantly expanding ecosystem of companion tools, Prometheus has become a pivotal component in the software development landscape.

Join us as we sit down with Björn Rabenstein, an accomplished engineer at Grafana, who has dedicated nearly a decade to actively contributing to the Prometheus project. Björn takes us on a journey through the project's early days, unravels the reasons behind its meteoric rise, and shares insightful technical details, including his personal affinity for Histograms.

Here are the links we discussed during the podcast for you to follow up on:
Prometheus Documentary: https://www.youtube.com/watch?v=rT4fJNbfe14
First Prometheus talk at SRE
-
GraphQL, API Gateways, API-Led Growth – How to make APIs Observable with Sonja Chevre
05/06/2023 Duration: 33 min
APIs power and empower software innovation, as they enable new use cases on top of existing services. Observability into API usage, answering questions like how APIs are called, what APIs do, where APIs fail, where APIs are slow and where APIs are misused, has to be top of mind for architects who decide to build or use APIs.

In this episode we welcome Sonja Chevre, Group Product Manager at Tyk, who recently gave a captivating talk at KubeCon about using OpenTelemetry to gain insights into popular API frameworks such as GraphQL. We discuss common challenges for SREs, such as APIs often hiding the status of a call behind an HTTP 200, or individual calls being really hard to debug because their details are not exposed in telemetry data by default. We also cover topics such as API-led growth and API as a product, as well as open standards such as OpenTelemetry and OpenAPI.

Here is the list of links discussed during the show:
KubeCon Talk: https://kccnceu2023.sched.com/event/1HyVc/what-could-go-wrong-
-
Why Cyber Defense is Hard: A Closer Look at the latest security research with Stefan Achleitner
22/05/2023 Duration: 52 min
Security comes with a price tag, such as additional wait time when going through checks at the airport or when inspecting network packets at your firewall.

To learn about current approaches to cyber defense and cyber deception, we invited back Stefan Achleitner, Lead Researcher Cloud Native Security at Dynatrace. Tune in and learn why it is important to keep changing and using different passwords, why you should monitor all your servers, what zero-day vulnerabilities are, the role of eBPF in security, and why we have to minimize false positive alarms like the Hawaii missile alert!

Some of the links we discussed during the podcast can be found here:
Our previous episode: https://www.spreaker.com/user/pureperformance/don-t-look-away-from-the-next-cyber-secu
Hawaii false missile alert: https://en.wikipedia.org/wiki/2018_Hawaii_false_missile_alert
eBPF on isitobservable: https://isitobservable.io/search?q=eBPF
Check if you've been compromised: https://haveibeenpwned.com/
Stefan's SolarWinds Article (German): https://in
-
Unlocking the Power of OpenTelemetry: Insights from an OTel Expert at NWM
08/05/2023 Duration: 49 min
36 million OpenTelemetry spans generated per hour for GraphQL-based queries: that's just one of the stats we discussed with Justin Scherer, Sr Developer and Consultant, who is leading OTel adoption and Shift-Left observability efforts at NWM. For Justin, OpenTelemetry helps commoditize data gathering in modern cloud native environments so that the backend observability platform of choice can focus on answering higher-level, business-impacting questions.

If you are about to roll out OpenTelemetry in your organization, take Justin's advice: Bring business leaders into the discussion early! Engage with the OpenTelemetry community! Understand what your observability platform already gives you and focus on the gaps!

To learn more about OpenTelemetry, check out some of the links we discussed during the podcast:
OpenTelemetry Website: https://opentelemetry.io/
IsItObservable: https://isitobservable.io/open-telemetry
Podcast: https://www.spreaker.com/user/pureperformance/adopting-open-observability-a