Our CEO, Vijaye, sometimes introduces me as the influencer at Statsig.
At first, I felt uneasy about being called an influencer. Among data scientists, “influencer” is at best a neutral term, and often one that’s frowned upon.
The negative sentiment is justifiable, too.
Data science is a serious field. We require rigor in our analysis, we need to be technical, and our work has to be deep and thorough. We pay attention to details because tiny details can often flip the direction of an analysis.
Influencers, on the other hand, need to please a broad audience—and broad audiences generally aren’t interested in deep and thorough analysis. They want conclusions, stories, and twists.
In short, the best influencers are really good at clickbait.
This causes many of my coworkers and friends to view influencers as people who value fame and popularity over truth, twisting ideas in pursuit of more impressions and subscribers. As a result, influencers are rarely taken seriously.
Where do I fit in?
First things first: we are not engineers. We drive impact through other people, namely our stakeholders, our cross-functional partners, and the decision-makers. Our analyses, no matter how good they are, are useless unless they lead to better decisions.
But people have so many misconceptions about data. We are all similar to the broad internet audiences in some way. We all have limited patience, limited attention, and limited brain power. We chase stories and want conclusions. We are all bored by details to some degree.
That’s why we see phenomena like companies making bad decisions despite having good data science. Everyone lauds the benefits of data-driven decisions, but so few people do it right.
Throughout my career, I’ve seen too many data scientists helplessly trying to persuade their stakeholders to open their eyes and pay attention to what the data truly tells them, trying to show better approaches, trying to correct mistakes, and getting frustrated—even reprimanded—over and over again.
What I’ve seen across the data science space is a significant gap between the quality of data analysis and the quality of the decisions companies make.
To use a somber analogy, think of food banks that donate unused food to hungry people. Their chief complaint is always that, due to distribution issues, lots of food goes to waste—despite the large number of hungry people who could benefit from it.
It’s a massive shame.
The same concept applies to data science. Despite a surplus of talent and insight, data scientists still fail to make an impact, and companies and decision-makers end up setting benchmarks and making decisions based on other factors.
Someone has to solve this head-on. Someone needs to grab the audience’s attention and repeat the basic concepts over and over again until they get internalized. And no, that’s not a slight—it’s how humans learn.
Data scientists aren’t innocent in this conflict either.
Oftentimes, we’ll know that we are right, and we’ll know why the other party is wrong (and how they are wrong). At some point, though, we decide that they won’t listen, so we give up trying to convince them.
Sometimes it’s our ego, sometimes a lack of consistency, and sometimes our subconscious fear of being proven wrong. We can afford none of these.
That’s why I launched the channel, and why Statsig is willing to let me invest a large percentage of my time (time I could spend improving our product) in it.
I believe our product can only be as big as the problem, yet many people haven’t yet realized that there is a solution, and many more haven’t realized that there is a problem.
Most data science education is centered around tools.
Just like how a fisherman can only get better by fishing, and a hunter can only get better by hunting, we can’t be good data scientists by learning models in school. We learn the craft by solving real-world problems.
The best knowledge comes from the masters of application.
One of the most important pillars of this channel is to learn from these masters and share their knowledge and stories.
Here are some of my favorite videos (and collaborations) thus far:
This video explores the nuanced concept of retention in data science, emphasizing the distinction between junior and senior data scientists' understanding of the term.
It introduces the idea that retention analysis is more complex than it seems, involving three critical durations: the starting period, the ending period, and the gap between them. Without clear definitions of these durations, data scientists risk misinterpretation and misalignment in their analyses.
This video provides a detailed framework for defining retention, aiming to eliminate ambiguity by specifying the activity durations and the gap. Through a hypothetical example involving user retention for ChatGPT, I illustrate potential ambiguities in interpreting retention queries and stress the importance of precise definitions.
This video was important to me because, at Facebook, varying definitions of retention across teams highlighted the need for a standardized approach. This led to an internal effort to unify retention definitions across the company.
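To make the idea concrete, here is a minimal sketch of what a fully specified retention definition might look like in code. All names and window lengths here are my own illustration, not from the video; the point is only that a user counts as retained when all three durations—the starting activity window, the gap, and the ending activity window—are spelled out explicitly.

```python
from datetime import date, timedelta

def retained(activity_dates, signup, start_window=7, gap=21, end_window=7):
    """Illustrative retention check with all three durations explicit.

    A user is retained if they were active during the starting window
    (start_window days after signup) AND active again during the ending
    window (end_window days that begin after the gap elapses).
    """
    start_end = signup + timedelta(days=start_window)
    end_begin = start_end + timedelta(days=gap)
    end_end = end_begin + timedelta(days=end_window)

    active_at_start = any(signup <= d < start_end for d in activity_dates)
    active_at_end = any(end_begin <= d < end_end for d in activity_dates)
    return active_at_start and active_at_end

# A user who signs up Jan 1, is active Jan 2, and returns Feb 1 is
# retained under this definition; one who never returns is not.
print(retained([date(2024, 1, 2), date(2024, 2, 1)], date(2024, 1, 1)))
print(retained([date(2024, 1, 2)], date(2024, 1, 1)))
```

Change any one of the three durations and the answer can flip—which is exactly the ambiguity the video warns about when the durations are left implicit.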
In this interview, I engaged with John Meakin, a Senior Staff Data Scientist at Meta, who has achieved remarkable career progression, securing two promotions within his initial two years at the company.
John attributes his success to consistent hard work, a deep understanding of his domain, creating significant impact, and fostering strong relationships. He emphasizes the importance of seizing opportunities and adeptly navigating the dynamics of promotions.
John also highlights the critical role of documenting work and the value of trust in advancing one's career. Throughout the discussion, John offers practical advice on focusing and specializing in one's field, leveraging mentorship for growth, and the strategies for identifying and capitalizing on substantial opportunities.
Moreover, he shares insights into overcoming challenges with persistence, convincing skeptics, learning from past experiences, and understanding the intricacies of career promotions.
I really enjoyed this chat because it touches on the balance between making an impact and exhibiting the right behaviors for promotion, underscoring the significance of consistency and passion in one's career journey.
In this video, Pushpendra, a seasoned data engineer with a rich background spanning 15 years, including stints at consulting firms, Amazon Web Services, and Statsig, shares the lessons he’s gained along the way.
Pushpendra delves into the nuances of data engineering across different work environments, comparing the experiences at big tech companies to those at startups. In this video, we discuss the evolution of data modeling and the critical challenge of maintaining data quality, emphasizing how workplace culture significantly influences these aspects.
He also stresses the importance of aligning technology choices with business requirements and finding a balance between managing vast data volumes and addressing business complexity.
Lastly, Pushpendra offers insights into the specialized realm of business-specific data modeling and reflects on how technological advancements have reshaped this process. All throughout the conversation, he provides a deep dive into the core components of data engineering, the role it plays in today's tech landscape, and its challenges and opportunities.
I chose this video because I specifically appreciate the insight it provides into data engineering, which I believe is useful for data engineers regardless of tenure.
I want to take this opportunity to deliver a sincere thank you to everyone who has given my videos a chance. As a career data scientist, this is a topic that is very close to my heart, and your support means the world to me.
Going forward, my only mission is to create better content, address more issues head-on, and teach as many people about data science as possible.
You can even call it “influencing,” if you insist.