Happy Black History Month! Back to a data science/ML-focused roundup after an all-GameStop edition last week, which somehow became one of my most shared write-ups!
The Five Best Things
O’Reilly Media publishes some of the most widely read technology books, runs conferences, and operates a learning portal. That gives it a large body of data for spotting trends in developer tools. Interesting trends from this year’s report:
PyTorch (a Machine Learning framework from Facebook) is seeing rapid adoption
Kubernetes is cementing its role as the orchestration layer of cloud
Finally - cloud computing trends point heavily to multicloud (using more than one cloud provider)
Cloud computing is hybrid by nature. Think about how companies “get into the cloud.” It’s often a chaotic grassroots process rather than a carefully planned strategy. An engineer can’t get the resources for some project, so they create an AWS account, billed to the company credit card. Then someone in another group runs into the same problem, but goes with Azure. Next there’s an acquisition, and the new company has built its infrastructure on Google Cloud. And there’s petabytes of data on-premises, and that data is subject to regulatory requirements that make it difficult to move. The result? Companies have hybrid clouds long before anyone at the C-level perceives the need for a coherent cloud strategy. By the time the C suite is building a master plan, there are already mission-critical apps in marketing, sales, and product development. And the one way to fail is to dictate that “we’ve decided to unify on cloud X.”
Tecton.ai Blog: What is a feature store?
Tecton.ai, a startup focused on feature stores, explains what exactly this means. In Machine Learning, a “feature” is a characteristic of the data you are trying to analyze. Think square footage or number of rooms when analyzing a dataset of home prices.
Modern data companies rely on both historical data (batch) and fresh incoming data (streaming). Feature stores unify both data flows, so they can provide a consistent definition of each feature and the most up-to-date data across an organization’s data teams, while also monitoring operational health. In Tecton’s own words, a feature store -
Runs data pipelines that transform raw data into feature values
Stores and manages the feature data itself, and
Serves feature data consistently for training and inference purposes
An increasingly common trend in the industry is embedding a data scientist within a team, vs. staffing a separate data science team. Tools like this, which provide a “single pane of glass,” are critical in decentralized operations.
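To make the three responsibilities above concrete, here is a minimal, in-memory Python sketch using the home-price example. The class and method names (`FeatureStore`, `register`, `ingest`, `get_online`, `get_as_of`) are hypothetical and are not Tecton's API; a real feature store would run its pipelines on batch and streaming infrastructure and persist the values in durable storage.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, Dict, List, Tuple

# Hypothetical names throughout; this is not Tecton's API.
FeatureFn = Callable[[dict], float]  # turns one raw record into one feature value


@dataclass
class FeatureStore:
    definitions: Dict[str, FeatureFn] = field(default_factory=dict)
    # entity id -> list of (event time, {feature name: value})
    history: Dict[str, List[Tuple[datetime, Dict[str, float]]]] = field(default_factory=dict)

    def register(self, name: str, fn: FeatureFn) -> None:
        """Define a feature once, so every team computes it the same way."""
        self.definitions[name] = fn

    def ingest(self, entity_id: str, ts: datetime, raw: dict) -> None:
        """Pipeline: transform a raw record (batch or streaming) into feature values and store them."""
        values = {name: fn(raw) for name, fn in self.definitions.items()}
        self.history.setdefault(entity_id, []).append((ts, values))

    def get_online(self, entity_id: str) -> Dict[str, float]:
        """Inference: the most recent feature values for an entity."""
        return max(self.history[entity_id], key=lambda row: row[0])[1]

    def get_as_of(self, entity_id: str, as_of: datetime) -> Dict[str, float]:
        """Training: the values as they were at a past time, so no future data leaks in."""
        past = [row for row in self.history[entity_id] if row[0] <= as_of]
        return max(past, key=lambda row: row[0])[1]


store = FeatureStore()
store.register("square_feet", lambda r: float(r["sqft"]))
store.register("rooms", lambda r: float(r["rooms"]))

# A historical (batch) record, then a fresher (streaming) update after a renovation.
store.ingest("home-1", datetime(2020, 6, 1), {"sqft": 1800, "rooms": 3})
store.ingest("home-1", datetime(2021, 2, 1), {"sqft": 2100, "rooms": 4})

print(store.get_online("home-1"))                         # {'square_feet': 2100.0, 'rooms': 4.0}
print(store.get_as_of("home-1", datetime(2020, 12, 31)))  # {'square_feet': 1800.0, 'rooms': 3.0}
```

The design choice worth noticing is that training (`get_as_of`) and inference (`get_online`) both read values produced by the same registered definitions, which is what keeps the two consistent.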
Another trend in the MLOps space is Data Observability. Barr Moses, CEO of the observability company Monte Carlo, describes it like this -
A Data Observability layer literally “observes” data assets from end to end, alerting data engineers and analysts when issues arise so they can be addressed before they affect the business.
The key pillars of observability, and the questions they help us answer -
Freshness: Is my data up to date?
Distribution: Are there abnormalities in my incoming data (e.g., unexpected values or nulls)?
Volume: Do I have more or fewer records than expected?
Schema: Did the structure of the data (tables, columns, types) change in a catastrophic way?
Lineage: Where did my pipeline break? Where did the data come from?
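A rough sketch of how four of these pillars might be checked on a single batch of incoming records, in plain Python. The function name `check_batch` and its fixed thresholds are assumptions for illustration, not Monte Carlo's product or API (real tools typically learn expected ranges from history). Lineage is left out because it requires tracking metadata across pipelines rather than inspecting one batch.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional


def check_batch(
    rows: List[Dict[str, Optional[float]]],
    loaded_at: datetime,
    now: datetime,
    expected_columns: set,
    expected_rows: int,
    max_age: timedelta = timedelta(hours=24),  # hypothetical threshold
    volume_tolerance: float = 0.5,             # allow +/-50% of expected row count
    max_null_rate: float = 0.1,                # hypothetical threshold
) -> List[str]:
    """Return human-readable alerts for one batch of incoming records."""
    alerts = []
    # Freshness: is the data up to date?
    if now - loaded_at > max_age:
        alerts.append(f"freshness: data is {now - loaded_at} old")
    # Volume: too few or too many records?
    if abs(len(rows) - expected_rows) > volume_tolerance * expected_rows:
        alerts.append(f"volume: got {len(rows)} rows, expected about {expected_rows}")
    # Schema: did the set of columns change?
    seen = set().union(*(row.keys() for row in rows)) if rows else set()
    if seen != expected_columns:
        alerts.append(f"schema: missing {expected_columns - seen}, unexpected {seen - expected_columns}")
    # Distribution: unusually many nulls in any expected column?
    for col in sorted(expected_columns & seen):
        null_rate = sum(row.get(col) is None for row in rows) / len(rows)
        if null_rate > max_null_rate:
            alerts.append(f"distribution: {col} is {null_rate:.0%} null")
    return alerts


now = datetime(2021, 2, 7, 12, 0)
rows = [
    {"price": 350000.0, "sqft": 1800.0},
    {"price": None, "sqft": 2100.0},
    {"price": 410000.0, "sqft": 1500.0},
]
alerts = check_batch(rows, loaded_at=now - timedelta(days=3), now=now,
                     expected_columns={"price", "sqft"}, expected_rows=1000)
for alert in alerts:
    print(alert)
```

This batch is stale, far too small, and has a null-heavy `price` column, so it raises freshness, volume, and distribution alerts; its schema matches, so no schema alert fires.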
Jamin Ball: The Modern Data Cloud: Warehouse vs Lakehouse
Jamin Ball, a VC at Altimeter Capital focused on data platforms, presents an emerging trend in big data management and transformation: the data lakehouse. This contrasts with the data warehouse approach of Snowflake.
A data lake stores ALL of the raw data of an organization.
A data warehouse stores data that has been Extracted from the data lake, Loaded into the warehouse, and then processed and Transformed into a form that can be readily analyzed. This process is called ELT, or Extract-Load-Transform.
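The ELT steps can be sketched with Python's built-in `sqlite3` module, where an in-memory SQLite database stands in for the warehouse (this is not Snowflake's API, and the listings data is made up). The raw records are loaded unchanged, and the transform happens afterward, inside the warehouse, in SQL.

```python
import sqlite3

# Hypothetical "data lake" extract: raw, untyped records exactly as they arrived.
raw_listings = [
    ("1", "Detroit", "1800", "350000"),
    ("2", "Detroit", "2100", "420000"),
    ("3", "Ann Arbor", "1500", "450000"),
    ("4", "Ann Arbor", "", "390000"),  # missing square footage
]

conn = sqlite3.connect(":memory:")  # stands in for the warehouse

# Extract + Load: copy the raw records into the warehouse without cleaning them.
conn.execute("CREATE TABLE raw_listings (id TEXT, city TEXT, sqft TEXT, price TEXT)")
conn.executemany("INSERT INTO raw_listings VALUES (?, ?, ?, ?)", raw_listings)

# Transform: clean, type, and aggregate inside the warehouse into an analysis-ready table.
conn.execute("""
    CREATE TABLE price_per_sqft_by_city AS
    SELECT city,
           ROUND(AVG(CAST(price AS REAL) / CAST(sqft AS REAL)), 2) AS avg_price_per_sqft,
           COUNT(*) AS listings
    FROM raw_listings
    WHERE sqft <> ''
    GROUP BY city
""")

for row in conn.execute("SELECT * FROM price_per_sqft_by_city ORDER BY city"):
    print(row)
```

Doing the transform last is the point of ELT: the raw table stays available, so a new analysis can re-transform it without re-extracting from the source.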
Snowflake plays in the Business Intelligence (BI) and analysis space, where users rely on SQL (Structured Query Language), while Databricks (which recently raised $1B at a $28B valuation) plays in the ML realm, where users work with Spark and DataFrames.
Jamin predicts that Snowflake will start making lateral moves into the data science/ML space. Databricks, on the other hand, is moving to vertically integrate the warehouse and the data lake through Delta Lake, a new open-source standard for building data lakes. This approach preserves both the SQL and Spark/DataFrame access methods.
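As a toy illustration of the "one copy of the data, two ways to access it" idea (not Delta Lake's actual API or storage format), the sketch below keeps a single table in an in-memory SQLite database and reads it both through SQL and through a column-oriented, DataFrame-style path in plain Python. The table and its rows are made up; in a real lakehouse the data would live in open file formats on cloud storage and be queried with engines such as Spark.

```python
import sqlite3
from collections import defaultdict

# A single stored copy of the data (an in-memory SQLite table stands in for lakehouse storage).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 120.0), ("west", 80.0), ("east", 30.0), ("west", 20.0)],
)

# Access path 1: declarative SQL, as a BI analyst would use.
sql_totals = dict(conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region"))

# Access path 2: DataFrame-style. Load the same rows into named columns and compute
# programmatically, roughly what Spark or pandas DataFrames offer (neither is used here).
rows = conn.execute("SELECT region, amount FROM sales").fetchall()
frame = {
    "region": [region for region, _ in rows],
    "amount": [amount for _, amount in rows],
}
df_totals = defaultdict(float)
for region, amount in zip(frame["region"], frame["amount"]):
    df_totals[region] += amount

# Both paths read the same stored rows, so they agree; no second copy is maintained.
print(sql_totals)
print(dict(df_totals))
```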
Oldie but Goodie
Some interesting pieces for Black History Month -
Anna Gifty: Sadie Alexander: Meet the First Black Woman Economist in the U.S. Sadie Alexander was the first Black woman to earn a PhD in economics in the U.S., in 1921. As of 2017, Black women still made up less than 0.6% of all doctoral recipients. Sadie subsequently received a law degree from UPenn in 1927. I highly recommend following the work of economist Anna Gifty and the Sadie Collective.
She was the first woman to serve as secretary of the National Bar Association, the largest and oldest network of Black lawyers and judges. She marched alongside Dr. Martin Luther King Jr. in Selma, Alabama, in 1965 to demand civil rights for Black Americans. She advocated for re-centering the economy on Black women workers as one way to promote economic growth. She proposed desegregating the military and many other progressive political and economic remedies to the exclusion Black Americans faced. And she also served as the first national president of Delta Sigma Theta Sorority Incorporated, one of the largest Black Greek-lettered organizations in the world.
WSJ: Roz Brewer to Bring Pandemic Experience to Walgreens at Pivotal Time Roz Brewer, previously CEO of Sam’s Club and second in command at Starbucks, has been appointed CEO of Walgreens. She will become the only Black woman leading a Fortune 500 company, and she takes over at a pivotal time as vaccinations ramp up.
Vox: 6 myths about the history of Black people in America Great read.
WSJ: U.S. Backs Nigeria’s Former Finance Minister for Next WTO Director Ngozi Okonjo-Iweala, a dual U.S.-Nigerian citizen, will become the first female director-general of the WTO. She spent 25 years at the World Bank and twice served as Nigeria’s finance minister.
Other pieces -
Glamour: ‘Revenge Bedtime Procrastination’ Is Real, According to Psychologists Of course this phenomenon is real; how else would I write this thing every week?
WSJ: Pizza Hut Launches ‘Detroit-Style’ Pizza, and America Says ‘Huh?’ The rest of America has been missing out. Detroit-style square-cut pizza is AWESOME. Consider it for your Super Bowl meal!
WSJ: Jamie Tarses Rose Fast as TV Executive, Then Hit Static The NBC wunderkind exec, credited with Friends and Mad About You in the ’90s, died this week at 56. She later moved to ABC and had a very public fall from grace. The WSJ article about her “downfall,” written in 1999, reeks of sexism and is a mirror for seeing how far we have come by 2021. Spoiler alert: not far enough.
NYTimes: America’s mothers are in crisis A few folks sent me this feature about working moms struggling in the pandemic, but I haven’t read it. That’s because I lived it, and I do not want to remind myself of how traumatic the beginning of the pandemic was. Plus, any list of suggestions to improve this situation that doesn’t put restarting safe in-person schooling at the very top is bogus.
At this rate, we should be near herd immunity by mid-summer?
Disclaimer: The views and opinions expressed in this post are my own and do not represent my employer.