Building Rome Every Day - Scaling ML Model Building Infrastructure
"I want to reset my password." "I ordered the wrong size." "These are not the droids I was looking for." Every day, support agents field thousands of queries like these. Multiply that across the thousands of agents a company might have, and the sheer volume of data generated becomes hard to imagine. How can we make sense of it all? It seems a formidable task, but we have a formidable weapon in our arsenal: machine learning.
By combining deep learning, natural language processing and clustering techniques, we built a machine learning model that can take 100,000 tickets and efficiently cluster and summarise them into digestible topics. But that's only part of the challenge: we also had to scale it to build models for 30,000 customers, in production, every day.
In this talk I'll share the story of Content Cues, Zendesk's latest machine learning product. It's the story of how we leveraged the power of AWS Batch to scale a model building platform. Of how we tackled challenges such as measuring how well an unsupervised model performs when it's not even clear what "well" means. Of how our team pooled its skills across data engineering, data science and product management to deliver a pipeline capable of building a thousand models for the price of a cup of coffee.
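One common way to fan out many independent model builds on AWS Batch is an array job: each child container reads the `AWS_BATCH_JOB_ARRAY_INDEX` environment variable that Batch sets and works on its own share of the inputs. Because a single array job is capped at 10,000 children, building models for 30,000 customers means giving each child several customers. The sketch below is illustrative only, not the pipeline described in this talk; `customers_for_child` and the round-robin split are assumptions made for the example.

```python
import os


def customers_for_child(customer_ids, array_size, index=None):
    """Return the customer IDs this array child should build models for.

    Hypothetical helper: customers are dealt round-robin across the
    array_size children, so every customer lands in exactly one child
    and shares differ in size by at most one.
    """
    if index is None:
        # AWS Batch sets this variable (0-based) in each array child container.
        index = int(os.environ["AWS_BATCH_JOB_ARRAY_INDEX"])
    if not 0 <= index < array_size:
        raise ValueError(f"index {index} outside array of size {array_size}")
    return customer_ids[index::array_size]
```

For example, with 30,000 customer IDs and an array size of 1,000, each child builds models for 30 customers. A round-robin split keeps shares balanced without the children needing to coordinate.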
Outline/Structure of the Case Study
- Introduction: what is Content Cues?
- Model building at scale
  - Why we chose AWS Batch to build models
  - How we harness AWS Batch to build 30,000 models every day
- Monitoring: empowering not only engineers, but also data scientists and product managers
  - Keeping on top of the status of our batch jobs
  - Measuring unsupervised ML model performance
  - Tracking our costs with CloudHealth
- Using real-life examples from our journey to illustrate how data scientists, data engineers and product managers work together to win
Takeaways from this talk include:
- Real-world experience of using elastic cloud computing to build machine learning models
- Insight into our journey of productionising a pipeline that builds unsupervised models at scale
- Ideas and practices that worked effectively for us and lessons learned along the way
Target audience: software engineers, data scientists, product managers, any member of a data product team, or anyone else interested in taking ML products to market at scale.