Auto feature engineering - Rapid feature harvesting using DFS and data engineering techniques
As machine learning adoption permeates more business models, the need to deliver models faster grows with it. Feature engineering is arguably one of the core foundations of the model development cycle. While deep learning takes a different approach to feature engineering, it is no exaggeration to say that feature engineering can make or break a classical machine learning model. Automating it would greatly shorten the time to market for classical machine learning models.
Deep Feature Synthesis (DFS) is an algorithm implemented in the Featuretools Python package. DFS helps harvest new features rapidly by taking a stacking approach on top of a relational data model. It also has first-class support for time dimensions as a fundamental construct. Some of these factors make Featuretools a compelling library for data practitioners. However, the base algorithm can be enriched in multiple ways to make it appealing for many other use cases. This session will present a high-level summary of the DFS algorithmic constructs, followed by enhancements that can be made to the Featuretools library to enable those use cases.
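To make the stacking idea concrete, here is a minimal sketch in plain Python. It does not use the Featuretools API; the tables, column names, and the helper functions `aggregate` and `customer_features` are hypothetical and exist only for illustration. It stacks two aggregations across a customers -> orders -> items hierarchy and applies a cutoff date so that only data known at that time contributes to a feature.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical relational data: customers -> orders -> order items.
orders = [
    {"order_id": 1, "customer_id": "a", "placed_on": date(2020, 1, 5)},
    {"order_id": 2, "customer_id": "a", "placed_on": date(2020, 2, 10)},
    {"order_id": 3, "customer_id": "b", "placed_on": date(2020, 1, 20)},
]
items = [
    {"order_id": 1, "price": 10.0},
    {"order_id": 1, "price": 5.0},
    {"order_id": 2, "price": 30.0},
    {"order_id": 3, "price": 8.0},
]


def aggregate(child_rows, key, value, func):
    """Apply one aggregation across a parent-child relationship."""
    groups = defaultdict(list)
    for row in child_rows:
        groups[row[key]].append(row[value])
    return {parent_id: func(values) for parent_id, values in groups.items()}


def customer_features(cutoff):
    # Depth 1: SUM(items.price) for each order.
    order_total = aggregate(items, "order_id", "price", sum)
    # Time awareness: only orders placed on or before the cutoff are visible.
    visible = [
        {**order, "total": order_total.get(order["order_id"], 0.0)}
        for order in orders
        if order["placed_on"] <= cutoff
    ]
    # Depth 2: MEAN(orders.SUM(items.price)) for each customer.
    return aggregate(visible, "customer_id", "total", mean)


features = customer_features(cutoff=date(2020, 1, 31))
print(features)
```

Moving the cutoff later lets the February order contribute to customer "a", which is the kind of point-in-time behaviour the time dimension support is meant to provide.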
Outline/Structure of the Talk
The outline of the session is as follows:
- Introduction - The audience will be given a general overview of the DFS algorithm and its key features.
- Next, the audience will be presented with various mechanisms by which DFS can be enhanced and applied to real-life use cases.
- The last part of the session will concentrate on automatic SQL generation for the DFS-harvested features.
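One way to picture SQL generation for a harvested feature is sketched below. This is an assumption about what such a generator could look like, not the approach presented in the session: the dictionary-based feature spec, the `feature_to_sql` function, and the denormalized `items` table are all hypothetical. A stacked feature is rendered as a nested subquery and run against an in-memory SQLite database.

```python
import sqlite3


def feature_to_sql(feature):
    """Render a nested aggregation spec (hypothetical format) as a SQL query.

    Identifiers are interpolated directly, so the spec must come from
    trusted code, never from user input.
    """
    source = feature["source"]
    # A nested spec means the input column is itself a harvested feature,
    # so render it first and use it as a subquery.
    if isinstance(source, dict):
        table = f"({feature_to_sql(source)})"
    else:
        table = source
    group_cols = ", ".join(feature["group_by"])
    return (
        f"SELECT {group_cols}, "
        f"{feature['agg']}({feature['column']}) AS {feature['alias']} "
        f"FROM {table} AS t GROUP BY {group_cols}"
    )


# Depth 1: total price per order. Depth 2: mean order total per customer.
order_total = {
    "source": "items",
    "group_by": ["customer_id", "order_id"],
    "agg": "SUM",
    "column": "price",
    "alias": "order_total",
}
mean_order_total = {
    "source": order_total,
    "group_by": ["customer_id"],
    "agg": "AVG",
    "column": "order_total",
    "alias": "mean_order_total",
}

sql = feature_to_sql(mean_order_total)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (customer_id TEXT, order_id INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [("a", 1, 10.0), ("a", 1, 5.0), ("a", 2, 30.0), ("b", 3, 8.0)],
)
rows = dict(conn.execute(sql).fetchall())
print(sql)
print(rows)
```

Expressing features as SQL lets the same definitions be computed inside the database where the data already lives, which is one plausible motivation for generating SQL from harvested features.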
The audience will gain insight into how feature engineering can be automated by applying the DFS algorithm, and how the algorithm can be enhanced to support a broader range of use cases.
Target Audience
Data Engineers, Data Scientists, Architects, Product Owners
Prerequisites for Attendees
The following prerequisites are ideal for getting the most out of this session:
- Understanding of machine learning workflows
- Feature engineering
- Model scoring
- Data pipelines