I’ve spent half my career as a researcher in R&D working in large multinational CPG organizations. During my stint in these companies, I became immersed in the culture, people, and process of new product development. This experience was eye-opening; it cultivated a passion for bringing products to life from concept to commercialization. More importantly, I developed an understanding of how these complex organizations deliver value to the business. Value creation stemmed from a combination of innovation and productivity initiatives. I enjoyed working on both, but I relished the productivity projects that led to cost savings as well as applied innovation.
Why is productivity such a critical part of these large companies? Businesses and processes will always have inefficiencies. This is certainly true in the case of market research video analysis. In fact, the process is so inefficient that we are wasting qualitative data because no one has the time, budget, or technology to make sense of it all. This is a dangerous position to be in; research budgets are unlikely to sustain or increase video data collection if 80% of what’s collected isn’t being synthesized.
It’s hard not to waste qualitative research video
The process for analyzing video is manual, time-consuming, and costly. Armed with my experience at Avon, my team and I set out over a year ago to change the way market researchers analyze video. In the past year, we’ve proven that machine learning can significantly reduce the time and money spent analyzing long-form video content.
To address the abundance of video data the qualitative research industry collects, we approached the problem like a productivity project with some innovation sprinkled in. The first step was to frame the problem (the manual time and cost associated with video analysis) in a way that could be represented as a classification problem. The goal of analyzing long-form video content is to cull it down into small vignettes that support a business objective. This isn’t too different from manufacturing lines, where large raw materials are automatically combined and processed to create a smaller, more consumable product. Why couldn’t we do the same for video analysis?
Rules help us classify.
Classification can be performed via machine learning models.
Machine learning models can help classify data at a speed and scale beyond humans.
Within the manufacturing process, there are rules that classify the consumables as acceptable or not. In the case of video analysis, we know that the desired consumable is video highlights. When researchers manually create clips from long-form video content, they are in effect classifying the video content as a highlight or not.
True or False
Hotdog or Not Hotdog
Highlight or Not Highlight
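To make the framing concrete, here is a minimal sketch of how researcher-made clips could be turned into binary training labels: a transcript sentence gets a 1 if its midpoint falls inside a segment a researcher clipped out, and a 0 otherwise. The function name, data shapes, and example values are all illustrative assumptions, not the production pipeline.

```python
# Hypothetical sketch: deriving highlight / not-highlight labels from clips
# a researcher cut out of long-form video. Data shapes are assumptions.

def label_sentences(sentences, clips):
    """sentences: list of (start_sec, end_sec, text); clips: list of (start, end).

    Returns a 1/0 label per sentence: 1 if its midpoint sits inside any clip.
    """
    labels = []
    for start, end, _text in sentences:
        mid = (start + end) / 2.0
        is_highlight = any(c_start <= mid <= c_end for c_start, c_end in clips)
        labels.append(1 if is_highlight else 0)
    return labels

sentences = [
    (0.0, 4.0, "Thanks for joining today."),
    (4.0, 9.0, "I stopped buying the old formula entirely."),
    (9.0, 13.0, "Let's move on to the next question."),
]
clips = [(3.5, 9.5)]  # a researcher clipped this moment as a highlight
print(label_sentences(sentences, clips))  # [0, 1, 0]
```

With labels like these, the problem reduces to exactly the true-or-false shape described above.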
Choosing the right tools for the job
Video is difficult to analyze; oftentimes video quality alone can be a significant barrier to extracting meaning. For this reason, and several others, we chose to use text as the starting point for video highlight identification. There were numerous benefits to approaching the problem using Natural Language Processing (NLP), chief among them that text allows us to enrich the highlights further with more traditional analyses like sentiment, entities, and key phrases. The tools available for manipulating and transforming text have become very efficient. So efficient, in fact, that they enabled us to wrangle and process 800K anonymized transcripts to train a binary classifier.
The highlight model is a Long Short-Term Memory (LSTM) recurrent neural network. Its architecture is complex and took many months to refine, but its output is deceptively simple. It has one job: to read the text and classify each sentence as a highlight or not. The classification is based on a new high-utility metric we’ve created, aptly named…the H-Score. The metric ranges from 0 to 1, and every sentence in a transcript is assigned a numerical score on that scale.
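One way to picture how a 0-to-1 per-sentence score becomes usable output is a simple thresholding pass: keep sentences whose score clears a cutoff and merge adjacent keepers into one vignette. This is only an illustrative sketch of the downstream step; the threshold value, function names, and example scores are assumptions, not the patented model.

```python
# Illustrative sketch: turning per-sentence scores in [0, 1] (an "H-Score"
# in the post's terminology) into merged highlight vignettes. The 0.7
# threshold and all example data are placeholders.

def select_highlights(scored_sentences, threshold=0.7):
    """scored_sentences: list of (text, h_score). Returns merged vignettes."""
    vignettes, current = [], []
    for text, score in scored_sentences:
        if score >= threshold:
            current.append(text)                 # sentence qualifies
        elif current:
            vignettes.append(" ".join(current))  # close the open vignette
            current = []
    if current:
        vignettes.append(" ".join(current))
    return vignettes

scored = [
    ("The moderator introduces the topic.", 0.12),
    ("I switched brands after the price change.", 0.91),
    ("It just felt like a betrayal.", 0.84),
    ("Okay, next section.", 0.08),
]
print(select_highlights(scored))
# ['I switched brands after the price change. It just felt like a betrayal.']
```

Merging adjacent high-scoring sentences is what lets a sentence-level classifier produce clip-length highlights rather than isolated fragments.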
Productivity is the mother of all innovation
Over the last year, we’ve not only built and patented the highlight model, we’ve also achieved significant productivity gains in video analysis. Based on current industry standards for video analysis, the highlight model and its associated pipeline reduce the time and cost of long-form qualitative video analysis by roughly 80%.
Beyond the productivity gains is where the fun really begins. We started layering on additional NLP techniques that enrich the identified highlights. In under 24 hours, we are able to transcribe 18 hours of video content, identify the highlights in the text transcript, provide sentiment, entities, and key phrases, and then archive all of that insight.
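The enrichment step can be pictured as attaching metadata to each highlight. The toy code below stands in for that layering; a real pipeline would call proper NLP services for sentiment, entities, and key phrases, whereas the word lists and capitalization heuristic here are crude placeholders purely for illustration.

```python
# Toy stand-in for highlight enrichment. The sentiment word lists and the
# capitalization-based "entity" heuristic are placeholders for real NLP
# services, used only to show the shape of the enriched record.

NEGATIVE = {"betrayal", "stopped", "hate"}
POSITIVE = {"love", "great", "loyal"}

def enrich(highlight):
    words = [w.strip(".,").lower() for w in highlight.split()]
    neg = sum(w in NEGATIVE for w in words)
    pos = sum(w in POSITIVE for w in words)
    sentiment = "negative" if neg > pos else "positive" if pos > neg else "neutral"
    # crude entity stand-in: capitalized words that aren't sentence-initial
    entities = [w.strip(".,") for w in highlight.split()[1:] if w[0].isupper()]
    return {"text": highlight, "sentiment": sentiment, "entities": entities}

print(enrich("I stopped buying Brandex after the recall."))
# {'text': 'I stopped buying Brandex after the recall.',
#  'sentiment': 'negative', 'entities': ['Brandex']}
```

Each highlight then carries its own sentiment, entities, and key phrases into the archive, which is what makes the clips searchable rather than just shorter.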
Just like sports highlights have become the standard in the broadcasting industry, we feel that highlights at scale will play a critical role in giving researchers modern technology to address the ever-growing volume of qualitative data that’s becoming available. We look at the highlight not merely as a new research tool but as a new status quo, whereby we as an industry now have the technology to make sense of every single minute of video we collect, not just a subset. It is a platform technology that we are continually enriching with innovative NLP techniques aimed not only at analysis but also at data discovery. At the end of the day, data is our most valuable asset, and we shouldn’t be wasting it.