YOW! Data 2019 Day 1

Mon, May 6
Timezone: Australia/Sydney (AEST)
08:00

    Registration for YOW! Data 2019 - 45 mins

08:45

    Session Overviews & Introductions - 15 mins

09:00
09:50
10:20

    Morning Break - 30 mins

10:50
11:25
  • Enrique Bustamante - From Sparse Data-sets to Graphs: When Explicit Relationships Bridge the Gaps

    11:25 - 11:55 AM | Red Room

    The way we frame a problem has a significant impact on how we approach it, from the questions we ask to the tools we use. A simple change of frame can have great repercussions and be of significant benefit to how you tackle the issue at hand.

    At EB Games, in trying to develop a customer segmentation/targeting model, we struggled with traditional clustering algorithms on tabular data. All the necessary information was there, but the resulting datasets were so sparse that results were difficult to come by. When we changed to a graph-based model, it became far easier to ask questions and add further detail. The power afforded by explicit relationships meant that suggestions and ideas from subject matter experts were more easily translated into something that could be quantified and/or qualified.
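
    To illustrate the shift (a toy sketch, not EB Games' actual code; all entity names are hypothetical): entities become nodes, explicit relationships become typed edges, and questions that are awkward joins against a sparse one-hot table become short traversals.

        import networkx as nx

        # Toy graph: customers, products and franchises as nodes,
        # explicit relationships as typed edges (names hypothetical).
        G = nx.Graph()
        G.add_node("cust:123", kind="customer")
        G.add_node("prod:zelda-botw", kind="product")
        G.add_node("franchise:zelda", kind="franchise")
        G.add_edge("cust:123", "prod:zelda-botw", rel="PURCHASED")
        G.add_edge("prod:zelda-botw", "franchise:zelda", rel="PART_OF")

        # "Which customers sit within two hops of this franchise?"
        fans = {n for n in nx.ego_graph(G, "franchise:zelda", radius=2)
                if G.nodes[n].get("kind") == "customer"}
        print(fans)  # {'cust:123'}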

    In this presentation I will share the process and journey of this project and provide insights on the benefits gained from using a different structure to store and analyse your data.

  • Suneeta Mall - The Three-Rs of Data-Science - Repeatability, Reproducibility, and Replicability

    11:25 - 11:55 AM | Green Room

    Adoption of data science in industry has been phenomenal in the last five years. The primary focus of these adoptions has been on combining the three dimensions of machine learning, i.e. the ‘data’, the ‘model architecture’ and the ‘parameters’, to predict an outcome. A slight change in any of these dimensions can skew the predicted outcomes. So how do we build trust in our models? How do we manage the variance across multiple models trained on varied sets of data, model architectures and parameters? And why might the three Rs, i.e. “Repeatability, Reproducibility, and Replicability”, be relevant to industry applications of data science?

    This talk has the following goals:

    • Justify (with demonstrations) why “Repeatability, Reproducibility, and Replicability” are important in data science, even when the application goes beyond experimental research and is geared towards industry.
    • Discuss in detail the requirements for ensuring “Repeatability, Reproducibility, and Replicability” in data science.
    • Discuss ways to achieve repeatability, reproducibility, and replicability with provenance and automated model management (one small repeatability ingredient is sketched after this list).
    • Present various approaches and available tooling for provenance and model management, and compare and contrast them.
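
    As a concrete taste of the first R, here is a minimal sketch (not the speaker's tooling) of one repeatability ingredient: pinning the usual sources of randomness in a Python stack so that a rerun on the same data, architecture and parameters can repeat its outcome.

        import os
        import random

        import numpy as np

        def set_global_seed(seed: int = 42) -> None:
            """Pin common sources of nondeterminism in a Python ML stack."""
            os.environ["PYTHONHASHSEED"] = str(seed)  # for child processes
            random.seed(seed)                         # Python's built-in RNG
            np.random.seed(seed)                      # NumPy's global RNG
            # Frameworks keep their own RNGs, e.g. tf.random.set_seed(seed)
            # or torch.manual_seed(seed); some GPU ops stay nondeterministic.

        set_global_seed(2019)
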
12:00
  • Diana Mozo-Anderson - How we Built a Predictive Model that an Online Advertising Titan Wanted to Learn From

    12:00 - 12:30 PM | Red Room

    No longer does the ‘spray and pray’ methodology for finding customers work. No more is spamming people with numerous, unsolicited emails effective. Never again will 'stalking' with clumsy banners be cutting edge. Today it’s all about a strategy based on finding the right prospects, using the right channels, at the right time, AND making them feel like they found you, not the other way around.

    I will walk attendees through Marketing Science in the era of big data. We’ll begin by defining an ‘ideal/value customer’ and, spoiler alert, it is not set in stone: smart tracking elements and AI models will allow your company to create its own portraits of the perfect customer and adjust them as you learn more. Then you will know every little thing about them, inside and out.

    With this knowledge, half the journey is complete. We then capture those customers, at the right time and in the right place. How? We work to understand their behaviour, we capture their signals, we leverage advertising platform optimisation models, and we let the magic of data science do its thing and shine. Every competitive advertising platform today incorporates optimisation models. I have extensive experience with some of them (Facebook, Google, Instagram) and I want to share what I have learnt and how you can take advantage of the gigantic data science effort put into those smart machines. To illustrate all of this, we’ll go through various approaches and conclude with the model I built: the one ultimately probed and used with data scientists at one of the biggest online advertising platforms in the world.

  • 12:00 - 12:30 PM | Green Room

    Day 1: one engineer vs. a heap of time-series data on a 1990s-era database. Four years on, there are eight of us, and we run TensorFlow analytics on a Hadoop cluster to detect subtle signs of potential breakdowns in earthmoving equipment. We've prevented million-dollar component failures and reduced a lot of "parasite" stoppages.

    This talk details the strategy and lessons learned from building an analytics department from scratch, in particular:

    • Many analytics departments were created as a "flavour of the month". How do you address this perception, survive, and go beyond it?
    • Choosing the right projects to create a credible and sellable offering as quickly as possible, to build your reputation.
    • Expectation management and choosing projects: dealing with those who think "it won't work" and those who think you can solve all problems.
    • Growing from a "start-up in a large company" into a more mature group: change management, scaling, velocity, etc.
    • Our approach to R&D and launching new projects, and dealing with the "shiny toys".
12:30

    Lunch Break - 60 mins

13:30
14:05
14:40
  • Ananth Gundabattula - Auto feature engineering - Rapid feature harvesting using DFS and data engineering techniques

    02:40 - 03:10 PM | Red Room

    As machine learning adoption permeates many business models, so does the need to deliver models at a much faster rate. Feature engineering is arguably one of the core foundations of the model development cycle. While approaches like deep learning take a different path to feature engineering, it might not be an exaggeration to say that feature engineering is the core construct that can make or break a classical machine learning model. Automating feature engineering would immensely shorten the time to market for classical machine learning models.

    Deep Feature Synthesis (DFS) is an algorithm implemented in the Featuretools Python package. DFS helps rapidly harvest new features by stacking feature primitives on top of a relational data model, and it has first-class support for time as a fundamental construct. These factors make Featuretools a compelling tool/library for data practitioners. However, the base algorithm itself can be enriched in multiple ways to make it truly appealing for many other use cases. This session will present a high-level summary of DFS's algorithmic constructs, followed by enhancements that can be made to the Featuretools library to enable many other use cases.
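
    As a flavour of the API (a minimal sketch assuming the Featuretools 0.x releases of the time; the two tables are hypothetical):

        import featuretools as ft
        import pandas as pd

        customers = pd.DataFrame({"customer_id": [1, 2]})
        transactions = pd.DataFrame({
            "transaction_id": [10, 11, 12],
            "customer_id": [1, 1, 2],
            "amount": [25.0, 40.0, 10.0],
            "time": pd.to_datetime(["2019-01-01", "2019-02-01", "2019-02-15"]),
        })

        # Describe the relational model: two entities, one relationship.
        es = ft.EntitySet(id="retail")
        es = es.entity_from_dataframe("customers", customers,
                                      index="customer_id")
        es = es.entity_from_dataframe("transactions", transactions,
                                      index="transaction_id", time_index="time")
        es = es.add_relationship(ft.Relationship(
            es["customers"]["customer_id"],
            es["transactions"]["customer_id"]))

        # DFS stacks primitives (SUM, MEAN, ...) across the relationship
        # to harvest per-customer features automatically.
        feature_matrix, feature_defs = ft.dfs(
            entityset=es, target_entity="customers", max_depth=2)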

  • Reza Yousefzadeh - Bringing Continuous Delivery to Big Data Applications

    02:40 - 03:10 PM | Green Room

    In this presentation I will talk about our experience at SEEK implementing Continuous Integration & Delivery (CI/CD) in two of our Big Data applications.

    I will talk about the Data Lake project and its use of micro-services to break down data ingestion and validation tasks, and how it enables us to deploy changes to production more frequently. Data enters SEEK’s data lake through a variety of sources, including AWS S3, Kinesis and SNS. We use a number of loosely coupled serverless microservices and Spark jobs to implement a multi-layer data ingestion and validation pipeline. Using the microservices architecture enables us to develop, test and deploy the components of the pipeline independently and while the pipeline is operating.

    We use Test-Driven Development to define the behaviour of micro-services and verify that they transform the data correctly. Our deployment pipeline is triggered on each code check-in and deploys a component once its tests pass. The data processing pipeline is idempotent so if there is a bug or integration problem in a component we can fix it by replaying the affected data batches through the component.
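
    As a flavour of that test-first, replay-safe style (a generic sketch, not SEEK's code, and in PySpark rather than their Scala): a transform that keeps only the newest record per key is idempotent, so replaying a batch through it is harmless.

        from pyspark.sql import SparkSession, Window
        from pyspark.sql import functions as F

        def latest_per_key(df, key, ts):
            """Keep only the newest record per key."""
            w = Window.partitionBy(key).orderBy(F.col(ts).desc())
            return (df.withColumn("_rn", F.row_number().over(w))
                      .filter(F.col("_rn") == 1)
                      .drop("_rn"))

        def test_latest_per_key_is_idempotent():
            spark = SparkSession.builder.master("local[1]").getOrCreate()
            df = spark.createDataFrame(
                [("a", 1, "old"), ("a", 2, "new"), ("b", 1, "only")],
                ["id", "ts", "value"])
            once = latest_per_key(df, "id", "ts").collect()
            twice = latest_per_key(
                latest_per_key(df, "id", "ts"), "id", "ts").collect()
            assert sorted(once) == sorted(twice)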

    In the last part of the talk, I’ll dive deeper into some of the challenges we solved to implement a CI/CD pipeline for our Spark applications written in Scala.

15:10

    Afternoon Break - 30 mins

15:40
16:15
16:50
  • Gareth Seneque - Search at Scale: Using Machine Learning to Automate Content Metadata

    04:50 - 05:20 PM | Red Room

    For media organisations, reach is everything. Getting eyeballs and ears in front of content is their raison d'être.

    Search plays a critical role in connecting audiences with t-1 content (yesterday's news, last week's podcast). However, with audience expectations conditioned by Google and others, it is challenging to deliver robust, scalable search that people actually want to use.

    The relevance of your results is everything, and to produce relevant results you need good metadata for every object in your search index. With hundreds of thousands of content objects and an audience of millions, the ABC has unique challenges in this regard.

    This talk will explore the ABC's use of Machine Learning (ML) to automatically generate meaningful metadata for pieces of content (audio/video/text): AWS MLaaS for full transcripts of audio podcasts, and a platform developed in-house for NLP tasks such as entity recognition and automated document summarisation, and for image-related tasks such as segmentation and tagging.
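
    For instance, requesting a podcast transcript from AWS Transcribe takes only a few lines with boto3 (a minimal sketch, not the ABC's pipeline; the job name, bucket and region are placeholders):

        import boto3

        transcribe = boto3.client("transcribe", region_name="ap-southeast-2")

        # Kick off an asynchronous transcription job for one episode.
        transcribe.start_transcription_job(
            TranscriptionJobName="podcast-ep-42",               # placeholder
            Media={"MediaFileUri": "s3://my-bucket/ep42.mp3"},  # placeholder
            MediaFormat="mp3",
            LanguageCode="en-AU",
        )

        # Poll later; the finished job links to the transcript JSON.
        job = transcribe.get_transcription_job(
            TranscriptionJobName="podcast-ep-42")
        print(job["TranscriptionJob"]["TranscriptionJobStatus"])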

  • Larene Le Gassick - Data Driven Diversity

    04:50 - 05:20 PM | Green Room

    Since January, we have RSVP'd “yes” 26,610 times to tech Meetups in Brisbane*. That’s a lot of pizza.

    Every Monday, I run a script which posts in the #meetup channel of the Brisbane Developers Slack group. It's a simple Node script that calls the Meetup API and lists every tech event in Brisbane for the following week. The script was conceived from curiosity, a desire to share information, and because I'm a stats nerd.
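
    The actual script is Node; a Python sketch of the same shape might look like this (it assumes Meetup's since-retired v3 REST endpoint, and the Slack webhook URL is a placeholder):

        import requests

        # Find upcoming tech events near Brisbane (Meetup API v3).
        events = requests.get(
            "https://api.meetup.com/find/upcoming_events",
            params={"lat": -27.47, "lon": 153.03, "radius": 10, "page": 50},
        ).json().get("events", [])

        lines = [f"{e['local_date']} {e['local_time']}  {e['name']}"
                 for e in events]

        # Post the weekly list to the #meetup channel via incoming webhook.
        requests.post(
            "https://hooks.slack.com/services/T000/B000/XXXX",  # placeholder
            json={"text": "Tech Meetups this week:\n" + "\n".join(lines)},
        )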

    Apart from writing code, I also co-host two Brisbane tech Meetups: Women Who Code and CTO School. In 2018, these user groups grew from a humble handful of regulars to almost standing room only.

    I will share with you a statistical analysis of a year's worth of Brisbane tech Meetup data (updated for YOW! Data 2019), the secret life of a Meetup organiser, and how the transparency of information (such as speaker gender ratios) has started to effect change in this community, for the better.

    *data from Jan 1 to Sept 30, 2018

17:30

    Conference Drinks, Canapés & Networking at The Arthouse Hotel, 275 Pitt Street - 60 mins

YOW! Data 2019 Day 2

Tue, May 7
Timezone: Australia/Sydney (AEST)
08:45

    Session Overviews & Introductions - 15 mins

09:00
09:50

    Sketch algorithms - 30 mins

10:20

    Morning Break - 30 mins

10:50
11:25
12:00
12:30

    Lunch Break - 60 mins

13:30
  • Paris Buttfield-Addison / Mars Geldard - Game Engines and Machine Learning: Training a Self-Driving Car Without a Car?

    01:30 - 02:00 PM | Red Room

    Are you a scientist who wants to test a research problem without building costly and complicated real-world rigs? A self-driving car engineer who wants to test their AI logic in a constrained virtual world? A data scientist who needs to solve a thorny real-world problem without touching a production environment? Have you considered AI problem solving using game engines?

    No? This session will teach you how to solve AI and ML problems using the Unity game engine, and Google’s TensorFlow for Python, as well as other popular ML tools.

    In this session, we’ll show you ML and AI problem solving with game engines. Learn how you can use a game engine to train, explore, and manipulate intelligent agents that learn.

    Game engines are a great place to explore ML and AI. They’re wonderful constrained problem spaces, tiny little ecosystems in which to explore a problem. You can learn how to use them even if you’re not a game developer; no game development experience is required!

    In this session, we’ll look at:

    • how video game engines are a perfect environment to constrain a problem and train an agent
    • how easy it is to get started, using the Unity engine and Google’s TensorFlow for Python
    • how to build up a model, and use it in the engine, to explore a particular idea or problem
    • PPO (proximal policy optimisation) for generic but useful machine learning
    • deep reinforcement learning, and how it lets you explore and study complex behaviours

    Specifically, this session will:

    • teach the very basics of the Unity game engine
    • explore how to set up a scene in Unity for both training and use of an ML model
    • show how to train a model with TensorFlow (and Docker) using the Unity scene (a minimal sketch follows below)
    • discuss the use of the trained model, and potential applications
    • show you how to train AI agents in complicated scenarios and make the real world better by leveraging the virtual

    We’ll explore fun, engaging scenarios, including virtual self-driving cars, bipedal human-like walking robots, and disembodied hands that can play tennis.
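
    To make that concrete, here is a minimal sketch of the Python side of the loop, assuming a recent Unity ML-Agents release (the mlagents_envs package) and a pre-built environment binary; the build path is a placeholder, and random actions stand in for a trained PPO policy.

        import numpy as np
        from mlagents_envs.environment import UnityEnvironment
        from mlagents_envs.base_env import ActionTuple

        # Connect to a pre-built Unity scene (path is a placeholder).
        env = UnityEnvironment(file_name="builds/SelfDrivingCar")
        env.reset()
        behavior = list(env.behavior_specs)[0]
        spec = env.behavior_specs[behavior]

        for _ in range(100):  # a few environment steps
            decisions, terminals = env.get_steps(behavior)
            # Random actions stand in for the output of a PPO policy.
            actions = ActionTuple(continuous=np.random.uniform(
                -1.0, 1.0,
                (len(decisions), spec.action_spec.continuous_size)))
            env.set_actions(behavior, actions)
            env.step()

        env.close()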

    This session is for non-game developers to learn how they can use game technologies to further their understanding of machine learning fundamentals, and solve problems using a combination of open source tools and (sadly often not open source) game engines. Deep reinforcement learning using virtual environments is the beginning of an exciting new wave of AI.

    It’s a bit technical, a bit creative.

  • Mel Flanagan - Transparent Government: The Stories we can Tell with Data

    01:30 - 02:00 PM | Green Room

    There is an increasing and powerful global push to open up the trove of information governments generate, collect, and manage. There is a vast array of data, ranging from open data and big data from multiple sources to sensitive information about citizens and complex information about businesses' interactions with government, such as contracts and procurement, taxes and royalties.

    This creates many opportunities to use this data to tell stories. Helping citizens and the private sector understand what is happening across government means not just access to this data, but tools that allow people to dig through and analyse it, use diagrams and maps to make sense of it, and connect information in a way that is understandable, engaging and useful to the general public.

    This talk shows some of the possibilities available today and looks at what may be possible as governments around the world open up. It shows some of the ground-breaking work done in Australia. The talk touches on issues around transparency, accountability, systems of protection, data ethics frameworks, and ultimately how to build trust.

    Throughout, we'll see some of the work by Nook Studios building systems for government, including the ground-breaking Common Ground mining title information system, as well as tools that help link and connect information in a meaningful way using data pathways.

14:05
14:40
15:10

    Afternoon Break - 30 mins

15:40
  • 03:40 - 04:10 PM | Red Room

    Geometric Deep Learning (GDL) is a fast developing machine learning specialisation that uses the network structure underlying the data to improve learning outcomes. GDL has been successfully applied to problems in various domains with network-structured data, such as social science, medicine, media, finance, etc.

    Inspired by the success of neural networks in domains such as computer vision and natural language processing, the core component driving GDL is the graph convolution operator. This operator is used as the building block for deep learning models applied to networks. This approach takes advantage of many algorithmic and computational developments from modern neural network research and practice – such as composability, optimisation, and end-to-end training – to improve predictive performance.

    However, there is a lack of tools for geometric deep learning targeting data scientists and machine learning practitioners.

    In response, CSIRO’s Data61 has developed StellarGraph, an open source Python library. StellarGraph implements a number of state-of-the-art methods for GDL with a clean and consistent API. Furthermore, StellarGraph is designed to make the application of GDL algorithms to network-structured data easy to integrate with existing machine learning workflows.
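
    A minimal sketch of that workflow, assuming StellarGraph's 1.x Keras-based API (the tiny graph, features and labels here are made up):

        import numpy as np
        import networkx as nx
        from tensorflow.keras import Model, layers, optimizers

        from stellargraph import StellarGraph
        from stellargraph.mapper import FullBatchNodeGenerator
        from stellargraph.layer import GCN

        # A tiny hypothetical network: 4 nodes with 2-d features.
        nxg = nx.Graph([(0, 1), (1, 2), (2, 3)])
        for n in nxg.nodes:
            nxg.nodes[n]["feature"] = np.random.rand(2)
        G = StellarGraph.from_networkx(nxg, node_features="feature")

        # Two graph-convolution layers over the whole graph per batch.
        generator = FullBatchNodeGenerator(G, method="gcn")
        gcn = GCN(layer_sizes=[16, 16], activations=["relu", "relu"],
                  generator=generator)
        x_in, x_out = gcn.in_out_tensors()
        predictions = layers.Dense(units=2, activation="softmax")(x_out)

        model = Model(inputs=x_in, outputs=predictions)
        model.compile(optimizer=optimizers.Adam(0.01),
                      loss="categorical_crossentropy", metrics=["acc"])
        targets = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
        model.fit(generator.flow([0, 1, 2, 3], targets), epochs=5)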

    In this talk, we will start with an overview of GDL and its real-world applications. Then we will introduce StellarGraph with a focus on its design philosophy, API and analytics workflow. Next, we will demonstrate StellarGraph's flexibility and ease of use for developing solutions targeting important applications such as product recommendation and social network moderation. Finally, we will touch on the challenges of designing and implementing a library for a fast-evolving machine learning field.

  • Linda McIver - The Sceptical Data Scientist

    03:40 - 04:10 PM | Green Room

    More and more decisions are data driven now - and that’s awesome! Much better than ideologically driven, or personality driven, or “whatever mood management’s in today” driven. But it does mean we want to be confident of our analyses. And there’s a tendency to have deep faith in data science. “Look! I did a calculation! It must be true!” Numbers don’t lie. And maths is reliable.

    But so much depends on the questions we ask, how we ask them, and how we test those results.

    So how do we create a generation of sceptical data scientists, whose first approach to a result is to challenge it, to try to disprove it?

    We need to give them confidence in their skills, but teach them to doubt their own work.

16:15
16:50