Photobox develops a much clearer picture of observability

Aug 29, 2023

Photobox is using the Dynatrace observability platform to consolidate all its system-monitoring data into a single pane of glass. The personalized printing company, which is now part of the albelli-Photobox group, introduced the platform before the merger in late 2022.

The company previously found it difficult to take a proactive approach to system issues. Its IT staff had to monitor a complex technology stack built on AWS EC2, with micro-services running on Kubernetes and AWS Lambda.

According to Alex Hibbitt, Engineering Director at albelli-Photobox Group:

The complex stack was created by a series of mergers and acquisitions. That level of complexity becomes incredibly hard to observe effectively. We had at least five different observability platforms, using about 10 different technologies. Observability became a skill that was only possessed by a few of our really seasoned engineers.

Hibbitt recalls that IT problems could take up to four hours to identify, and the fragmented tooling made diagnosis a difficult process. He says the lack of effective observability created scalability and responsiveness challenges:

It was really hard for us to respond to a problem. If something happened, we would need to get hold of our top trouble-shooters and get them to feel in the ether and say, ‘Oh, it feels like it's somewhere over here.’ That approach wasn't very scientific.

It's been game-changing for us in terms of the ability to respond to problems within our complex stack, and our ability to apply insights into where we're focusing our engineering efforts.

The company recognized it needed to take a different tack. As a first attempt, the business introduced its own observability tool. However, this bespoke technology only added another layer of complexity rather than creating value. At that point, Photobox started speaking with technology vendors about a potential solution to the problem. Hibbitt says:

We assembled a wish list of what we wanted – one single tool that could cover everything from front-end to back-end database services. We wanted to democratise access to the platform, so any engineer could pick up the process and understand what they were doing. And we needed something that was going to help our signal-to-noise ratio, so we could see which alerts were really important.

After using the wish list to identify potential solutions, Photobox completed low-level trials with a few vendors. Hibbitt’s team then ran an extended pilot with their preferred solution, Dynatrace. Because of the complexity of the systems running at Photobox, the firm set up a paid six-month trial in late 2021 to test the platform and generate meaningful data:

As an output of the trial, we went live in all of our production environments. That transition was simply to connect to a more long-term implementation. One of our core test points was, ‘Did we have the confidence to turn off all of our old platforms and rely solely on Dynatrace?’ The answer was ‘yes’ – and it's now a core part of our technology stack.

Hibbitt says introducing Dynatrace’s automation and AIOps capabilities has produced some big benefits. Photobox has reduced the mean time to resolution for issues by 80% and cut the number of critical incidents that impact service availability during peak shopping periods by 60%. One of the key benefits of the Dynatrace approach is the platform’s problem cards:

The cards pull together all the different related metrics that might have gone wrong in an incident and presents them as a holistic view. It does two really cool things. First, it gives you an idea of what number of customers are impacted by an issue, which helps our engineers quantify if something's really important. Second, it does a root-cause analysis, where it identifies the potential problem. And in a distributed, micro-services-based organisation like ours, that’s incredibly helpful.

Every engineering team across the organization uses Dynatrace. Rather than just reacting to problems, teams use the platform to work proactively and decide where to focus their efforts:

We had problems like memory leaks that had existed in our stack for four or five years, which had impacted the customer journey but were hidden in our sea of observability products.

Photobox also benefits from real-time insights into how customers interact with services and how performance issues could impact sales. The original Dynatrace implementation focused on web platforms, but the IT organization is now observing mobile properties, too. Hibbitt says:

We've been able to track down and resolve long-standing issues on our apps. And, once again, we're getting really good business value from understanding our sales funnels and conversions – and finding out areas where customers might think our apps are sub-optimal.

The recently formed albelli-Photobox Group has big plans for further improvements. The aim is to create a holistic observability platform that covers both organizations as part of one massive ecosystem. The key to success will be ensuring everyone across the new group understands the role of the Dynatrace platform, says Hibbitt:

Observability is a complex problem, with a set of principles to understand. People commonly confuse monitoring and observability. There's been a big piece of work within the Photobox community about the platform and we’ll now need to repeat that work with the albelli community, so that we help define the difference between good monitoring and good observability.
