Methodic Troubleshooting Leads to Cost Savings

Significant problems often originate in the most innocuous of changes. This is one such story, in which a small code change pushed bandwidth costs to $15,000 for an e-commerce customer.

After persevering for a few days and questioning several assumptions, we resolved the issue, which led to a significant cost saving. Read on to see how we did this.

The challenge

The client has an end-to-end Video Commerce solution comprising video production work, a content management system (CMS), and a customized HTML5 player. A JavaScript framework dynamically embeds videos from the CMS into the sites of its customers, which are large eCommerce companies in the US.

Our client had noticed that the cost of bandwidth acquired from the CDN provider had risen from $6,000 to $15,000 within just one month.

A combination of factors feeds into the cost calculation. These include:

  • The video player software
  • Content publisher logic
  • How the CDN works and reports usage data
  • Default HTML5 behavior

A fix to an earlier issue added the src attribute to the <video> tag for all videos. In HTML5, the <video> tag accepts an attribute called preload, which can take the values auto, metadata, and none. When no value is listed, the browser chooses its own default, and some browsers treat it as auto. Each browser also interprets the value differently. A video with preload set to auto may be downloaded optimistically, without anybody requesting playback. Because of this, videos were being downloaded through the CDN edge locations without any playback.
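As an illustration of the preload values described above, here is a minimal Python sketch of a helper that renders a <video> tag with an explicit preload value. The function name render_video_tag, the example URL, and the idea of rendering the tag on the server are assumptions made for this sketch and are not part of the client's player. Note that preload is only a hint, so browsers are free to ignore it.

```python
from html import escape

# The three keyword values mentioned in the text above.
ALLOWED_PRELOAD = ("auto", "metadata", "none")


def render_video_tag(src: str, preload: str = "none") -> str:
    """Render a <video> tag with an explicit preload hint (hypothetical helper)."""
    if preload not in ALLOWED_PRELOAD:
        raise ValueError(f"unsupported preload value: {preload!r}")
    # Escape the URL so it is safe inside a double-quoted attribute.
    safe_src = escape(src, quote=True)
    return f'<video src="{safe_src}" preload="{preload}" controls></video>'


# Hypothetical URL, used only for illustration.
tag = render_video_tag("https://cdn.example.com/product.mp4")
print(tag)
```

Stating the value explicitly removes the dependence on each browser's default, although it was not the route taken in this story.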

To avoid certain inefficiencies, the publisher logic purges the entire ROOT from the edge locations. Videos move from the ORIGIN to the EDGE repeatedly due to this purge. The CDN billing calculates bandwidth on the move from ORIGIN to EDGE and from EDGE to the DEVICE.

 

How do we explain the high bandwidth cost?

Two factors can explain this high bandwidth cost.

  1. Video playback on a device consumes bandwidth. However, we did not find any increase in traffic that could lead to increased bandwidth usage. This anomaly surfaced because our analytics only capture bandwidth metrics when the user explicitly plays the video.
  2. The CDN has two platforms: HTTP Large and HTTP Small. These two platforms serve large and small objects respectively. For large objects, the CDN reports bandwidth usage per object. For small objects, however, usage is reported collectively. The anomaly here was that the large-object report looked normal.

Assumptions

  • There is no change in the Publisher logic that would cause sustained surges in bandwidth usage.
  • Our analytics monitors video play, and we did not see any increase in daily play metrics.
  • Bandwidth cost is a function of the amount of bandwidth consumed when content is accessed from devices.
  • There are no software issues, as the only earlier fix added the src attribute to the <video> tag to enable video playback on devices with a native player.
  • High bandwidth consumption originates from large object transfers.

We examined the Small Object report and found that downloads of a single video amounted to 4TB in a month!

The video in question is 10MB in size, which means it would have to be downloaded ~400,000 times, a huge number considering that the video is for a single product. However, the engagement metrics showed no similar impact, and there was no change in either the publisher logic or the purging logic.
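The ~400,000 figure follows directly from the two numbers in the report. A minimal sketch of the arithmetic, assuming decimal units (1 TB = 1,000,000 MB); the function name is made up for this illustration:

```python
MB_PER_TB = 1_000_000  # assumption: decimal units, 1 TB = 1,000,000 MB


def implied_downloads(total_tb: float, video_mb: float) -> int:
    """Number of full downloads needed to account for the reported transfer."""
    return round(total_tb * MB_PER_TB / video_mb)


downloads = implied_downloads(total_tb=4, video_mb=10)
print(downloads)  # 400000
```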

Calculations and issue resolution

While researching fixes for this issue, we studied the W3C standard and the behavior of different browsers. We also came across a helpful blog post explaining browser behavior and the optimistic download of media.

Here are some statistics about video consumption with our client.

Three videos from the eCommerce customer are the top consumers of bandwidth. The customer has 32 sites (the highest among our customers) and gets traffic from all corners of the globe. The CDN has approximately 68 edge locations worldwide. The videos are each ~10-11 MB in size. The Publisher logic triggers every ten minutes and runs 144 times a day. We looked at some samples of how many times we purge the entire ROOT of content within a day, and it was in the range of ~100.

Assume a 10MB video that gets pushed to 60 edge locations after every purge, and ~3,000 video accesses each month. The total bandwidth consumed would then be:

Amount of transfer within CDN: ORIGIN -> EDGE = 10 (video size) x 60 (edge locations) x 100 (number of purges) x 30 (days) = 1,800,000 MB

Amount of transfer from EDGE -> DEVICE = 3000 (access count) x 10 (video size)  = 30,000 MB

Total Data Transfer = 1,800,000 MB + 30,000 MB = 1,830,000 MB = ~1.8 TB

This estimate is of the same order of magnitude as what we see in the reports, approximately 4TB. Some deviation is expected due to the changing number of edge locations and any other book-keeping data or metadata logic.
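The estimate above can be reproduced with a few lines of Python. This is a sketch of the back-of-the-envelope calculation only, using the assumed inputs from the text (10 MB video, 60 edge locations, 100 purges a day, 30 days, 3,000 accesses) and decimal units; the function names are made up for illustration.

```python
MB_PER_TB = 1_000_000  # assumption: decimal units


def origin_to_edge_mb(video_mb, edge_locations, purges_per_day, days):
    """Transfer inside the CDN: the video is re-fetched to every edge after each purge."""
    return video_mb * edge_locations * purges_per_day * days


def edge_to_device_mb(video_mb, access_count):
    """Transfer from the edge to end-user devices."""
    return video_mb * access_count


origin_edge = origin_to_edge_mb(video_mb=10, edge_locations=60, purges_per_day=100, days=30)
edge_device = edge_to_device_mb(video_mb=10, access_count=3000)
total_mb = origin_edge + edge_device
total_tb = total_mb / MB_PER_TB
origin_share = origin_edge / total_mb

print(origin_edge, edge_device, total_mb)  # 1800000 30000 1830000
print(f"{total_tb:.2f} TB, origin-to-edge share {origin_share:.1%}")
```

Under these assumptions, almost all of the estimated transfer comes from refilling the edge caches after purges rather than from viewers playing the video.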

The report also unearthed more evidence pointing to the cause of this increase:

There were too many TCP_MISS values in the Small Object report. A high number of these values means the object was repeatedly unavailable in the EDGE cache and had to be fetched from ORIGIN.
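To put a number on "too many", one could compute the share of misses among the cache status values for an object. The sketch below assumes the status values have already been extracted from the report into a list; the sample data is hypothetical and does not come from the actual report.

```python
from collections import Counter


def miss_ratio(statuses):
    """Fraction of requests whose cache status is TCP_MISS."""
    counts = Counter(statuses)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return counts["TCP_MISS"] / total


# Hypothetical sample of cache status values for one object.
sample = ["TCP_MISS", "TCP_HIT", "TCP_MISS", "TCP_MISS"]
print(miss_ratio(sample))  # 0.75
```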

Most of the traffic (98.83%) carried a “Mozilla/5.0” User-Agent string that did not appear to come from a mobile browser, so the optimistic video download was taking place in desktop browsers. It would be wrong to assume that “Mozilla” equates to the Firefox browser. Internet Explorer and Chrome constitute the majority of desktop browsers and also use “Mozilla/5.0” in the User-Agent string.

We rolled back the “src” attribute fix, and the bandwidth consumption came back to normal. The rollback stopped the optimistic download of videos by desktop browsers and lowered bandwidth consumption.
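The point about “Mozilla/5.0” can be illustrated with a rough User-Agent check. This is a heuristic sketch, not the analysis we ran: the token checks are simplified, the function names are made up, and the User-Agent strings are illustrative examples rather than entries from the report.

```python
def looks_mobile(user_agent: str) -> bool:
    """Heuristic: mobile browsers usually include a 'Mobi' token."""
    return "Mobi" in user_agent


def browser_family(user_agent: str) -> str:
    """Very rough family detection based on product tokens (simplified)."""
    if "Firefox/" in user_agent:
        return "Firefox"
    if "Trident/" in user_agent or "MSIE " in user_agent:
        return "Internet Explorer"
    if "Chrome/" in user_agent:
        return "Chrome"
    return "other"


# Illustrative User-Agent strings; all of them start with "Mozilla/5.0".
samples = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:120.0) Gecko/20100101 Firefox/120.0",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/16.0 Mobile/15E148 Safari/604.1",
]

for ua in samples:
    print(browser_family(ua), "mobile" if looks_mobile(ua) else "desktop")
```

All four strings share the “Mozilla/5.0” prefix, yet only one of them belongs to Firefox and only one to a mobile device.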

About the Writer

  • Roopesh Kohad
    Senior Manager – Test Engineering, Synerzip

    Roopesh has more than 18 years of experience in information technology and is currently a Senior Project Manager with Synerzip. He is a seasoned Engineering leader with roots in Quality and Assurance. Roopesh’s areas of expertise include Project Management, Scrum Master, Test Engineering, Cloud Computing & DevOps. He holds a B.Tech in Computer Science.
