
Methodic Troubleshooting Leads to Cost Savings

Significant problems often originate in the most innocuous of changes. This is one such story: a small change in the code led to bandwidth costs in the thousands for an e-commerce customer.

After persevering for a few days and questioning several assumptions, we resolved the issue, which led to significant cost savings.

The challenge

The client has an end-to-end Video Commerce solution comprising video production work, a content management system (CMS), and a customized HTML5 player. The solution is based on a JavaScript framework that dynamically embeds videos from the CMS for the client’s own customers, which are large eCommerce companies in the US.

Our client had noticed that the cost of bandwidth acquired from the CDN provider had doubled within a single month.

This cost is calculated based on a combination of the following factors:

  • Default HTML5 behavior
  • The video player software
  • Content publisher logic
  • How the CDN works and reports usage data

All videos carry the src attribute in the <video> tag. In HTML5, the <video> tag also accepts an attribute called preload, which can take the values auto, metadata, and none. When no value is provided, the default is left to the browser, and desktop browsers commonly behave as if auto were set. Each browser also interprets the value differently. A video with preload set to auto may be downloaded optimistically, before anyone requests playback.

Additionally, to avoid certain inefficiencies, the publisher logic purges the entire ROOT from the edge locations. Because of this purge, videos move from the ORIGIN to the EDGE repeatedly. CDN billing counts bandwidth both for the move from ORIGIN to EDGE and for the move from EDGE to the DEVICE.

As a result, videos were repeatedly downloaded to the CDN edge locations.

Cost Analysis and Assumptions

  • There is no change in the Publisher logic to cause sustained surges in bandwidth usage.
  • Analytics monitors video plays; however, no increase in daily play metrics was observed.
  • Bandwidth cost is a function of the amount of bandwidth consumed when content is accessed from devices.
  • There are no software issues; including the src attribute in the <video> tag enables video playback on devices with a native player.
  • High bandwidth consumption originates from large object transfers.

We examined the Small Object report and found that a single video generated over 4 TB of CDN traffic in a month!

The video in question is 10 MB in size, which means it would have had to be downloaded ~400,000 times, a huge number considering that the video is for a single product. However, the engagement metrics showed no comparable increase, and there was no change in the publisher or purging logic either.
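The ~400,000 figure follows from dividing the reported traffic by the video size. A minimal sketch of that arithmetic, assuming decimal units (1 TB = 1,000,000 MB), which the report figures do not state explicitly:

```python
# Implied number of full downloads needed to explain the reported traffic.
# Assumption: decimal units, 1 TB = 1,000,000 MB.
MB_PER_TB = 1_000_000

reported_traffic_tb = 4  # monthly CDN traffic attributed to the single video
video_size_mb = 10       # size of the video in question

implied_downloads = reported_traffic_tb * MB_PER_TB // video_size_mb
print(f"{implied_downloads:,} downloads")
```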

Calculations and issue resolution

While researching fixes for this issue, we studied the W3C standard and different browser behaviors. We also came across a helpful blog post explaining browser behavior and the optimistic download of media.

Here are some statistics about video consumption with our client.

There are three videos from the eCommerce customer which are the top consumers of bandwidth. The customer has 32 sites (highest among our customers) and gets traffic from all corners of the globe. The CDN has approximately 68 edge locations worldwide. The videos are each ~10-11 MB in size. The Publisher logic triggers every ten minutes and runs 144 times in a day. Approximately 100 times in a day the entire ROOT content is purged. 

Assume a video size of 10 MB and that the video gets pushed to 60 edge locations after every purge. Assume also ~3,000 video accesses each month. The total bandwidth consumed would then be:

Amount of transfer within CDN: ORIGIN -> EDGE = 10 MB (video size) x 60 (edge locations) x 100 (number of purges) x 30 (days) = 1,800,000 MB

Amount of transfer from EDGE -> DEVICE = 3000 (access count) x 10 MB (video size)  = 30,000 MB

Total Data Transfer = 1,800,000 MB + 30,000 MB = 1,830,000 MB = ~1.8 TB

This usage number is of the same order of magnitude as what we see in the reports, approximately 4 TB. Some deviation is expected due to the changing number of edge locations and other bookkeeping data or metadata.
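The estimate can be reproduced with a short script. This is only a sketch of the simplified model used here (every purge causes every edge location to re-fetch the video once); the function and parameter names are illustrative, and decimal units (1 TB = 1,000,000 MB) are assumed:

```python
# Monthly transfer estimate for one video, using the figures quoted in the text.
# Assumption: decimal units, 1 TB = 1,000,000 MB.
MB_PER_TB = 1_000_000


def monthly_transfer_mb(video_mb, edge_locations, purges_per_day, days, device_accesses):
    """Return (origin_to_edge_mb, edge_to_device_mb) under the simplified model."""
    # Simplified model: after each purge, every edge location re-fetches the video.
    origin_to_edge = video_mb * edge_locations * purges_per_day * days
    # Each device access transfers the video once from an edge location.
    edge_to_device = device_accesses * video_mb
    return origin_to_edge, edge_to_device


origin_to_edge, edge_to_device = monthly_transfer_mb(
    video_mb=10, edge_locations=60, purges_per_day=100, days=30, device_accesses=3000
)
total_mb = origin_to_edge + edge_to_device
print(f"ORIGIN -> EDGE: {origin_to_edge:,} MB")
print(f"EDGE -> DEVICE: {edge_to_device:,} MB")
print(f"Total: {total_mb / MB_PER_TB:.2f} TB")
```

In this model the ORIGIN -> EDGE term accounts for roughly 98% of the total, which is why a month with unchanged play counts can still show a large jump in billed bandwidth.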

The report also surfaced further evidence pointing to the cause of this increase:

There were too many TCP_MISS values in the Small Object report. A TCP_MISS indicates that the object was not available in the EDGE cache and had to be fetched from the ORIGIN.
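One way to quantify "too many" is the share of requests that missed the edge cache. The sketch below uses made-up sample rows in a hypothetical (status, path) layout; the real report format is not shown here, and TCP_HIT is assumed as the status for a request served from the edge cache:

```python
from collections import Counter

# Hypothetical sample rows: (cache_status, object_path).
# The layout and the path are illustrative, not taken from the actual report.
sample_rows = [
    ("TCP_MISS", "/videos/product.mp4"),
    ("TCP_MISS", "/videos/product.mp4"),
    ("TCP_HIT", "/videos/product.mp4"),
    ("TCP_MISS", "/videos/product.mp4"),
]


def miss_ratio(rows):
    """Fraction of requests that had to be fetched from the ORIGIN."""
    counts = Counter(status for status, _ in rows)
    total = sum(counts.values())
    return counts["TCP_MISS"] / total if total else 0.0


print(f"TCP_MISS share: {miss_ratio(sample_rows):.0%}")
```

A consistently high miss share for a popular object suggests the edge cache is being emptied faster than it can be reused, which matches the frequent ROOT purges described earlier.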

Also, more than 98% of the traffic came from desktop browsers, so the optimistic video downloads were originating there. Rolling back the “src” attribute change stopped the optimistic download of videos by desktop browsers, and bandwidth consumption returned to normal.


About the Writer

  • Roopesh Kohad
    Senior Manager – Test Engineering, Synerzip

    Roopesh has more than 18 years of experience in information technology and is currently the Director of Engineering at Synerzip. He is a seasoned engineering leader with roots in Quality and Assurance. Roopesh’s areas of expertise include Project Management, Scrum Master, Test Engineering, Cloud Computing & DevOps. He holds a B.Tech in Computer Science.