Subscribe now to join smart, curious folks who get the Data Ops 📊 newsletter
Hi, I’m Greg 👋! I publish this newsletter on finding data products and interesting data observations with the goal of finding patterns and future product insights. (Also, it’s fun.) If you need a background on how we got here, check out What is Data Operations?
This week’s toy: a template you can print out, upload, and then get an animated gif of the result. Edition 89 of this newsletter is here - it’s April 18, 2022.
The Big Idea
A short long-form essay about data things
⚙️ Salesforce is a lousy data warehouse
Salesforce is one of the most successful software companies in the world. Their flagship product, Sales Cloud, helps you “sell smarter and grow your business faster from anywhere.” If you look at the Salesforce marketing material, you’ll see images like the one below that put you (the user) at the center of their technology ecosystem, using all of their products in a seamlessly integrated whole.
I love marketing (and marketecture diagrams) as much as the next person, and I think this image is hopeful, optimistic, and not exactly representative of the typical environment where Salesforce is used.
There are many different sources of truth
According to BetterCloud, in a typical environment at an average Saas company, 80 different Saas apps are used in 2022. That means that the picture above has lots of logos, some of which are not Salesforce logos. If Salesforce is the source of truth for a company running 80 different Saas organizations, it’s either got to act as a data warehouse or play very nicely with a data warehouse.
This means that each department (perhaps every department) has a source of truth they use to engage with customers, prospects, or other related pieces of information in a company. Because access to each one of these systems is licensed differently, it might be challenging to get access to the data for everyone in the system where people use it. One way organizations handle this problem is by sending data to a data warehouse, and then creating a unified view of all of the data related to a person or a company.
There are many different data warehouses you might use to create a unified view of the data within your business. If you try to build a unified view of data in Salesforce, here are several reasons why this might quickly feel like patting your head and rubbing your stomach at the same time. (SPOILER: it’s also hard to get the data to line up in every system so that you know you are talking about the same person or company when you reference a bit of data.)
Why Salesforce does not excel at data warehousing
Salesforce is great (when configured correctly) at managing the sales process across different sellers, and even creating a through line from marketing automation to the bottom of the funnel sent to a revenue system like Netsuite. This makes the use of a Salesforce Account ID or Contact ID a convenient way to unify part of your data. But Salesforce is the source of truth for sales, not for the whole business.
Reason 1: Salesforce is not the SSOT for every system
Different departments structure their data differently. In the absence of a unified data view, they are likely to use a lowest common denominator field like email for people and website for companies to identify unique entities. Matching these records up to the canonical record for that entity in Salesforce is not always realistic or possible, so then you are relying on a separate foreign key to line up data across systems. This is great until you run into edge cases where incomplete or incorrect data is introduced into your other systems.
Because you are not controlling the data that comes into your systems, it’s hard to depend on another system like Salesforce to be the single source of truth. There’s more than one variable changing all the time.
Reason 2: Salesforce is optimized for single table searching
Have you searched for something in Salesforce, or any other customer relationship management system, recently? If the system is modern, you might have a fancy search that shows you related items based the name of an account, the related objects like opportunity or contact (through the account ID field), but you probably don’t have links to other more distantly related items.
Salesforce is optimized for a single table query. By this, I mean it’s really good about giving you information about accounts and contacts and opportunities. These are all items closely related with the single key of account ID.
But getting insights off of multiple objects – even if they are all related to account – is pretty hard in Salesforce. There is no place to create a temporary structure where you can relate pieces of objects that all relate to the same thing. Yes, it’s true you can do this with custom reporting, custom objects, and a bit of Apex code. That’s all a lot of basic work to do to go beyond transactions and enter into data analysis.
Many people instead use a CSV export to place data into Excel or Google Sheets, or dump the data into a data warehouse. While this is easier to do than making your data warehouse work well, it also introduces a lot of potential data skew and makes it hard for others to share your analysis. The goal of creating useful reports and dashboards can be … kinda frustrating.
Reason 3: Aggregates and rollups are hard to do
Another barrier to using Salesforce as a data warehouse is the difficulty in building aggregates (counts, sums, min, max, median, etc). Counting things is much easier if your data is already in the right format. There is a reason Rollup Helper exists, which is that it is very difficult to do these kinds of counting in a Salesforce environment without custom Apex code or an external tool to do the work for you.
If you are pulling in data from other systems into Salesforce (or into a data warehouse), you also need to normalize the data on the same time grain. This means if you want daily rollups of data that only exists weekly, you’re in for disappointment. It might be that you can only create a weekly report of the information you want.
Data needs to be in the same shape and time grain to be adequately compared, and Salesforce doesn’t help you out with this.
Reason 4: The more data you have in Salesforce, the more you pay
Storage for data is cheap. Even cloud-baed storage is cheap. Storage inside of Salesforce … is not cheap. More licenses to let all of the people in your company access Salesforce data costs … more for your Salesforce budget.
In addition, there are other related costs that start making Salesforce a less attractive alternative to a data warehouse. Unstructured data having many columns is untenable in Salesforce. Because adding columns is hard to do dynamically, you need to be pretty deliberate with the information you’re adding. And adding more columns multiplied by the number of times you need to store them costs you a compounding amount of storage. (But they don’t offer you a back up service natively … 🧐
Reason 5: Transaction History is Limited in Salesforce
There is a limited ability to get the history of transactions in Salesforce. You need to turn on the history of each field so that you can then capture that information. If you don’t have history tracking on, then you’ll need to track that change in a another system to let you know what changed. If things don’t change all that often and you don’t have that many fields, it’s not a problem.
If you have lots of fields, then you need to know ahead of time which ones you want to track. There are ways to pay more money to track more fields for longer periods of time, but this sounds like a replay of the former item. It costs a lot to keep track of lots of fields in Salesforce. Because the history report is slightly different in its dimensions, it can be challenging to see how changes in the lineage of your Salesforce records should cause you to react.
Reason 6: It’s hard to adapt to changes in another system
This brings us to the final reason that Salesforce can be a challenge to use as a data warehouse. If you’re using only the apps in the Salesforce world (but not always), you’ll have a better chance of knowing when the schema (the metadata of the fields and data types) or the actual data (the information contained in those fields) has changed.
If you’re like most organizations described above and the data in your environment changes, you will need to use your detective abilities.
What can change? The schema of connected objects. The format of the data you receive. The accuracy of data that may or may not be in restricted picklists. All kinds of stuff can change, and custom objects in Salesforce are as good as the logic you put into them.
What could possibly go wrong?
Getting Salesforce data in your Data Warehouse
If you’ve come around to the idea that Salesforce might not be the best place to keep all of your operational data, there are a few strategies you can use to get data out in a structured way and on a regular basis.
Direct copy or export gives you the ability to take arbitrary data out of Salesforce and put it a common format like CSV so that you can load it into a data warehouse (or keep it in Excel or Google Sheets, as some do)
Change Data Capture is a way to stream changes from Salesforce to another system that captures these events. It is not as reliable as a transaction from a database because it’s an event sent out to a service. If it’s not delivered, Salesforce doesn’t know that happened
An integrated solution using various Salesforce APIs, so that a combination of changes in the data and changes in the Salesforce environment can trigger a push of data to the outside world to verify the contents of the information. If the data service is also aware of the schema of the object, it could also update Salesforce with the new information. (P.s. if you want to try this, check out Syncari. Disclaimer: I work there. I also think our tech is very cool.)
Even if you think Salesforce is a great tool and you want to buy Salesforce licenses for everyone in your organization, it would make sense for you to figure out how to make that data available to the rest of the applications in your system. Understanding the limitations of Salesforce as a data warehouse is a good place to start.
What’s the takeaway? Salesforce is an excellent way to track data for your sales organization (well, at least some of the data for your sales organization.) Making that data available to the rest of your organization requires a better way to engage than making Salesforce a de facto data warehouse. Data warehouses and operational data hubs are much better at stitching together data, keeping up with changes, and aligning your business than a single line of business tool.
Links for Reading and Sharing
These are links that caught my 👀
1/ Human-powered AI is going to be a thing - Ben Thompson surveys DALL-E, the generative AI that creates pictures based on human input, and finds that it has reduced the effort to create new things to almost zero. In a world with zero marginal content, what we will create could be a lot … stranger.
Here’s an example:
When the marginal cost of creation is close to zero, how will creators compete? I think they’ll be writing better prompts to computers than anyone else.
2/ Google thinks AI-written content is spam - If everyone starts using AI to create images (or to create content), how will a search engine understand which things were created by a person? Presumably by using AI to detect the presence of AI. Until then, Google is saying what you would expect Google to say: that people shouldn’t try to game the system.
3/ Experts teaching experts - Apple - one of the most successful companies of all time - uses a functional organization to run the company. This structure drives an organizational culture demanding simultaneous competition and collaboration. At Apple, you have to know much more than a typical leader to get things done. The benefit is that there are many more leaders being created at all levels of the organization.
What to do next
Hit reply if you’ve got links to share, data stories, or want to say hello.
I’m grateful you read this far. Thank you. If you found this useful, consider sharing with a friend.