Case Study
How a financial services company built a future-ready application observability solution that cut false alerts by 60%
OUR CLIENT
Founded several decades ago, this large financial services organization provides a broad range of lending and payment solutions to individuals and small businesses across North America. With thousands of employees, the company has consistently invested in digital platforms and advanced analytics to modernize customer interactions and streamline operations.
THE CHALLENGE
Observability challenges—Multiple tools and scattered data delayed application issue resolution
The company’s IT operations (ITOps) team needed help streamlining issue resolution for one of its critical customer-facing payment processing applications. With multiple observability tools in use, application data was scattered across many platforms, which lengthened both mean time to detect (MTTD) and mean time to resolve (MTTR). This fragmentation also caused alert fatigue, making it difficult for the ITOps team to decide which alerts to prioritize and whether issues were false positives or duplicates. In particular, the ITOps team struggled with:
- Disparate issue logs: Critical application monitoring data lived in either Logz.io or Amazon CloudWatch, and the ITOps team struggled to route critical issue logs to New Relic and make full use of the observability platform.
- A problematic AWS OpenTelemetry (OTel) implementation: The customer-facing application was built in Java on AWS Lambda, and the ITOps team wrestled with OTel performance issues, which further contributed to siloed application issue data.
- The inherent complexity of the application: The payment processing system was integrated with multiple external tools to process transactions, adding to the observability challenges.
- Gaining consensus within the organization on the best resolution: Because multiple teams and management levels across the organization needed to review and approve the proposed solution, the company required a strategic plan and support from a proven partner to secure executive buy-in.
THE TRANSFORMATION
Breaking down silos and creating a reusable observability framework for enterprise applications
In just three months, UST delivered a scalable, automated, single-pane-of-glass observability solution to accelerate issue resolution for the company’s critical customer application. The specialized product-oriented delivery team, consisting of an observability solution architect, a Java architect, an observability engineer, and an automation engineer, used agile methodologies to deliver a robust solution 50% faster than traditional deployment methods. The solution included:
- A future-ready, scalable, 360° observability framework: Designed with observability-as-code and automations, the solution meets the company’s needs today and has the flexibility and scalability to evolve as those needs change, eliminating the costly rework that comes with rigid, point-in-time implementations.
- A unified view of application health in real time: With the AWS OTel implementation issues resolved, all metrics, events, logs, and traces (MELT) now flow into New Relic, providing a single-pane-of-glass view of application issues and eliminating the time-consuming manual effort of correlating data across multiple tools.
- Automated, templatized CI/CD pipelines: New development pipelines for the key customer-facing application are integrated into the company’s existing CI/CD ecosystem, so engineers can deliver new features and functionality within the observability framework faster.
- Reusable observability assets: Standardized dashboards, alerts, and OTel Java classes give the company a repeatable playbook to roll out consistent observability across its entire application portfolio, compounding efficiency gains over time.
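One way to picture observability-as-code with reusable alert assets is a template that is defined once, kept in version control, and rendered for each application. The sketch below is illustrative only: the `AlertTemplate` class, its field names, the threshold values, and the service names are all assumptions, not the delivered implementation or any vendor's API.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class AlertTemplate:
    """Hypothetical alert definition that would live in version control."""
    name: str
    metric: str
    threshold: float
    window_minutes: int

    def render(self, service: str) -> dict:
        """Return a concrete alert definition for one service."""
        definition = asdict(self)
        definition["name"] = f"{service}: {self.name}"
        definition["service"] = service
        return definition


# Assumed standard alerts; metric names and thresholds are placeholders.
STANDARD_ALERTS = [
    AlertTemplate("High error rate", "error_rate_percent", 5.0, 5),
    AlertTemplate("Slow responses", "p95_latency_ms", 1500.0, 10),
]


def render_alerts(services):
    """Apply every standard template to every service, service by service."""
    return [t.render(s) for s in services for t in STANDARD_ALERTS]


if __name__ == "__main__":
    # Hypothetical service names, used only to show the rendering step.
    for alert in render_alerts(["payments-api", "lending-portal"]):
        print(alert)
```

Because every application receives the same templates, adding a new application to the list is enough to give it the standard set of alerts, which is the kind of repeatability the reusable assets are meant to provide.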
THE IMPACT
Accelerating issue detection and resolution by 30% while embarking on a transformational observability strategy for the future
The company now has a robust, automated observability platform that delivers real-time business insights from consolidated telemetry data. The solution reduced MTTD and MTTR by 30% and decreased false alerts by 60%, which minimized alert fatigue among the ITOps team, increased operational efficiency, and improved the customer experience through better application performance. Because the solution was built to scale, the ITOps team can expand it to cover the company’s entire application footprint, further increasing efficiencies and maximizing the value of the engagement.
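For readers who want to track the same measures, the sketch below shows how MTTD, MTTR, and a percentage reduction can be computed from incident timestamps. It assumes both measures are taken from the moment an incident starts (some teams measure MTTR from detection instead), and the incident data is invented purely to illustrate the arithmetic; it is not the company's data.

```python
from datetime import datetime, timedelta
from statistics import mean


def mean_minutes(incidents, start_key, end_key):
    """Average gap in minutes between two timestamps across incidents."""
    gaps = [
        (incident[end_key] - incident[start_key]).total_seconds() / 60
        for incident in incidents
    ]
    return mean(gaps)


def percent_reduction(before, after):
    """Percentage by which a measure dropped from `before` to `after`."""
    return (before - after) * 100 / before


def make_incident(started, detect_minutes, resolve_minutes):
    """Build a hypothetical incident record from offsets in minutes."""
    return {
        "started": started,
        "detected": started + timedelta(minutes=detect_minutes),
        "resolved": started + timedelta(minutes=resolve_minutes),
    }


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 9, 0)
    # Invented incidents before and after a change, for illustration only.
    before = [make_incident(t0, 20, 100), make_incident(t0, 40, 140)]
    after = [make_incident(t0, 14, 70), make_incident(t0, 28, 98)]

    for label, end_key in (("MTTD", "detected"), ("MTTR", "resolved")):
        old = mean_minutes(before, "started", end_key)
        new = mean_minutes(after, "started", end_key)
        print(f"{label}: {old:.1f} min -> {new:.1f} min "
              f"({percent_reduction(old, new):.1f}% reduction)")
```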
Building on this success, UST was selected to lead the company’s site reliability engineering center of excellence, expanding observability across the broader application landscape and driving the implementation of an AIOps strategy.
If your company struggles with application observability issues, UST can help. Learn more about our applied observability solutions.
RESOURCES
https://www.ust.com/en/pace/applied-observability