By Greg Mabrito, Director of Data and Analytics at Slickdeals
Big data is a blessing and a curse. As the director of data and analytics at a large, user-driven deal-sharing company, my team has a steep hill to climb. We started with an on-prem server to manage our data in 2012. Today, it’s a very different situation. We have continually added new tools to our stack to better parse and leverage the data our teams need to make critical business decisions, and in 2020 we made some major changes to fully modernize our data architecture.
Here’s what we’ve gleaned on our journey and the key changes we’ve made over the last year or so to empower users across the business with access to fresh and verified data.
In Pursuit Of Processing Power
Like many organizations, we deal with a massive amount of data every day. To give you a sense of scale, we’re talking about one billion visits annually and 12 million unique monthly users across our services. Slickdeals is the top external traffic referrer for Amazon, eBay, and Walmart. We ingest data from many different sources across the internet, and we continually update the deal information we provide to our users. To them, up-to-date data is the only valuable data.
This is also true internally. Our goal has always been for employees across the company to easily access data and gain business insights without needing to learn to code or being forced to use a specific tool. From Excel to Power BI to Tableau, we’ve always made it our mission to make data accessible to people when and how they need it.
As you might imagine, a constant theme for us has been working to handle ever-growing volumes of data. By 2019, we had effectively pushed the limits of SQL Server Analysis Services (SSAS). Trying to store huge volumes of data via SSAS while enabling users to create pivot tables without filters led to serious slowdowns and user experience problems.
No Lift And Shift Here
In 2020, we realized it was time to move to the cloud. Rather than approaching this as a pure “lift and shift” exercise, however, we recognized an opportunity to rethink our data infrastructure and strategy from the ground up.
Our goal was to modernize our entire data strategy. We studied exactly what we needed for our business and, from there, started customizing the stack to meet our needs. There were a few keys to success here, which I’ll cover below.
Event-Driven Architecture: Connecting Loosely Coupled Systems
One of the most important architectural changes we made was switching to an event-driven architecture, in which any relevant or significant change in data pushes updates out to decoupled or loosely coupled systems. Kafka has been a large part of our strategy and stack here, letting us publish events to the business and ingest them in Snowflake (our data warehouse), increasing speed, scalability, and resilience across our entire data stack.
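To make the pattern concrete, here is a minimal in-process publish/subscribe sketch. The topic name, event fields, and consumers are hypothetical illustrations, not our actual schema; in production, a broker like Kafka plays this role durably and at scale, with Snowflake ingesting from the topic.

```python
from collections import defaultdict
from typing import Callable

class EventBus:
    """Tiny in-process stand-in for a broker like Kafka."""
    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of handler callables

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # Producers know nothing about consumers; they only emit events.
        for handler in self._subscribers[topic]:
            handler(event)

# Hypothetical consumers, loosely coupled through the topic name alone:
# a warehouse loader and a cache invalidator.
warehouse_rows = []
invalidated = []

bus = EventBus()
bus.subscribe("deal.updated", lambda e: warehouse_rows.append(e))
bus.subscribe("deal.updated", lambda e: invalidated.append(e["deal_id"]))

# A single publish fans out to every subscriber.
bus.publish("deal.updated", {"deal_id": 42, "price": 19.99})
```

The key property is that the producer of `deal.updated` never changes when a new consumer is added; in the real stack, a Kafka topic and a Snowflake ingestion pipeline replace the in-process handlers, but the decoupling is the same.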
The Semantic Layer: A Single Source Of Truth
Data is complex, in part because every department in a business speaks a different “language.” A simple term like “employee” can have different meanings to different people. Does “employee” include contractors? Does it include part-time employees? How do you accurately calculate employee count with these nuances in play?
This is why a semantic layer is so important when it comes to empowering all business users to extract value from data. A semantic layer pre-defines specific types of data so that everyone across the organization can rely on a single definition. It also means that those who need access to data do not need in-depth knowledge of how calculations work; they simply need to understand their business domains.
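As a sketch of the idea, a semantic layer boils down to one shared, executable definition per business term. The metric name and inclusion rules below are illustrative assumptions, not our actual definitions:

```python
# The semantic layer defines "employee count" exactly once; every
# downstream tool queries through this definition instead of
# re-deriving its own version.
HEADCOUNT_STATUSES = {"full_time", "part_time"}  # contractors deliberately excluded

def employee_count(people: list[dict]) -> int:
    """The single agreed-upon definition of 'employee count'."""
    return sum(1 for p in people if p["status"] in HEADCOUNT_STATUSES)

people = [
    {"name": "Ada",   "status": "full_time"},
    {"name": "Grace", "status": "part_time"},
    {"name": "Alan",  "status": "contractor"},
]
print(employee_count(people))  # contractors don't count here, so this prints 2
```

Whether contractors belong in headcount is a business decision; the point is that the decision is made once, in one place, and every report inherits it.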
The concept of a semantic layer has been around for a long time (originally patented in 1991), and it has evolved along with the challenges of managing Big Data. Data is increasing in magnitude, speed, and diversity. Not surprisingly, this makes it very challenging to establish certainty. A semantic layer can provide a single source of truth, enabling business users to agree upon definitions for terms like “employee,” “customer,” or “net sales.”
In other words, a semantic layer provides a single definition, so that when data is queried and an answer returned, it can be trusted as the truth. Applying AtScale’s semantic layer to our Big Data allowed us to accomplish exactly that.
Self-Service Analytics: No Coding Required
To drive the point home on why this semantic layer matters so much, I want to go back to my earlier point about enabling all users to extract value from data. Having a semantic layer in place means IT and development are not standing in the way of access to data.
Users across all departments can ask questions and get answers from data without bottlenecks or slowdowns. This gives them independence from the tech team, while still empowering them with confidence that the answers they derive are accurate, truthful, and meaningful—and that they will translate across lines of business and job functions.
Empowering The Business To Make Smarter Data-Driven Decisions
It’s our job as data scientists and engineers to make data more accessible—not less. Investing in flexible, extensible, and user-friendly Big Data architectures makes this possible. At Slickdeals, every department from customer success to marketing to engineering will soon be able to access the data they need with no need to understand data architectures or code.
We’re continuing to evolve our approach and hope to build a single universal cube this year and completely retire SSAS. Refactoring our pipelines for an event-driven architecture and implementing a semantic layer have enabled us to tap into fresher data faster and more flexibly to better meet the needs of our entire business.