Hila Weisman-Zohar & Adi Schwartz

A/B Testing in the Wild

Hila Weisman-Zohar
Adi Schwartz

Bio

Hila: Hila has been processing, analyzing, and generating algorithms for the past decade. After earning her master's degree (summa cum laude) at the BIU NLP lab and publishing at elite academic venues such as EMNLP, she began to research and develop algorithms that analyze call center calls as a senior researcher at NICE. During that time she published 4 US patents and presented academic posters at various venues. For the past two years, she has worked as DS Guild Master & algorithm engineer at Outbrain, where she works on large-scale, super-fast algorithms in the native-ads field. Hila also loves to teach and share her experience, and has spoken at various meetups and conferences.

Adi: Adi Schwartz is a Data Scientist on the AutoML team at Outbrain, where she develops machine learning infrastructure and researches, develops, and deploys ML models. Adi is very curious and loves understanding data and how algorithms are implemented. When she's not working, she loves binge-watching, scuba diving, and trekking up high, snowy mountains.

Abstract

A/B testing is a powerful tool in any Data Scientist’s arsenal. It enables us to construct hypotheses and make careful changes to our users’ experiences while collecting results, so we can learn why certain elements of our experiments impact user behavior. In online A/B testing, A refers to the ‘control’, the original version being tested, and B refers to the ‘variation’, a new version. Control and variation are shown to users at random, and statistical analysis determines which one performs better for a given KPI or business goal. As visitors are served either the control or the variation, their engagement with each experience is measured, collected in a dashboard, and analyzed through a statistical engine. We can then determine whether changing the experience had a positive, negative, or neutral effect on visitor behavior. But A/B testing can also be complex: if you’re not careful, you can make incorrect assumptions about the distribution of traffic, the choice of statistical test, or even the decision to reject the hypothesis. We want to use this roundtable as an opportunity for an open discussion on the different perspectives and ways to tackle the problems of online A/B testing. What works for your specific use case? How often do you run A/B tests? When do you decide to stop? And so on.
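
As a rough illustration of the "shown to users at random" step, one common approach is to bucket visitors deterministically by hashing their ID, so that each visitor keeps seeing the same variation on repeat visits. The function name, experiment key, and 50/50 split below are illustrative assumptions, not a description of any particular platform:

```python
# A minimal sketch of assigning visitors to control (A) or variant (B).
# Hashing the visitor ID (rather than calling random() per request) keeps each
# visitor in the same bucket across visits.
import hashlib

def assign_variation(visitor_id: str, experiment: str, variant_share: float = 0.5) -> str:
    """Deterministically map a visitor to 'control' or 'variant' for one experiment."""
    key = f"{experiment}:{visitor_id}".encode()
    bucket = int(hashlib.md5(key).hexdigest(), 16) % 10_000   # bucket in 0..9999
    return "variant" if bucket < variant_share * 10_000 else "control"

print(assign_variation("user-42", "new-headline"))  # same answer on every call
```

The "statistical engine" step can be as simple as a pooled two-proportion z-test on a conversion-rate KPI. The counts below are made-up numbers used only to show the calculation, and the choice of test is an assumption for illustration rather than the method discussed at the roundtable:

```python
# A minimal sketch: compare conversion rates of control (A) and variant (B).
from math import sqrt
from scipy.stats import norm

control_users, control_conversions = 10_000, 520    # visitors served variation A
variant_users, variant_conversions = 10_000, 585    # visitors served variation B

p_a = control_conversions / control_users
p_b = variant_conversions / variant_users

# H0: both variations convert at the same rate.
p_pool = (control_conversions + variant_conversions) / (control_users + variant_users)
se = sqrt(p_pool * (1 - p_pool) * (1 / control_users + 1 / variant_users))
z = (p_b - p_a) / se
p_value = 2 * norm.sf(abs(z))   # two-sided p-value

print(f"control={p_a:.2%}  variant={p_b:.2%}  z={z:.2f}  p={p_value:.4f}")
# Reject H0 only if p_value < alpha (e.g. 0.05); a non-significant result is
# inconclusive, not evidence that the change had no effect.
```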

Discussion Points

  • Data scientist role definitions – full stack data scientists vs. specialisations
  • Pure data science teams vs. embedded teams
  • Data science reporting lines
  • Professional and personal development in embedded teams

Planned Agenda

8:45 Reception
9:30 Opening words by WiDS TLV ambassadors Or Basson and Noah Eyal Altman
9:40 Dr. Kira Radinsky - Learning to predict the future of healthcare
10:10 Prof. Yonina Eldar - Model-Based Deep Learning: Applications to Imaging and Communications
10:40 Break
10:50 Lightning talks
12:20 Lunch & Poster session
13:20 Roundtable session & Poster session
14:05 Roundtable closure
14:20 Break
14:30 Dr. Anna Levant - 3D Metrology: Seeing the Unseen
15:00 Aviv Ben-Arie - Counterfactual Explanations: The Future of Explainable AI?
15:30 Closing remarks
15:40 End