Satyen Sangani: We’re Still in the Dark Ages of Data

March 1, 2022

Episode transcript

Auren Hoffman 0:01

Welcome to World of DaaS, a show for data enthusiasts. I'm your host, Auren Hoffman, CEO of SafeGraph. For more conversations, videos, and transcripts, visit safegraph.com/podcasts.

Hello, fellow data nerds. My guest today is Satyen Sangani. Satyen is the co-founder and CEO of Alation, a data catalog and governance solution. Just last year, Alation raised a $110 million Series D to drive growth and product expansion. Satyen, welcome to World of DaaS.

Satyen Sangani 0:37

Auren, it's great to be here and great to see you. Thank you for having me.

Auren Hoffman 0:41

Yeah, absolutely. Now, a few years ago, you wrote this piece on the World Economic Forum blog where you basically claimed we're still kind of in the dark ages of data. You mentioned that data literacy is incredibly low, and that even if we just marginally increased data literacy, it would result in a ton of innovation. What are some maybe non-obvious steps to increasing data literacy?

Satyen Sangani 1:09

Yeah, I mean, just to give a sense of the statistics: that title, “Being in the Dark Ages,” is obviously super dramatic, but I actually don't think it's far off from the truth. If you look at the data, there's an estimate from the Bureau of Labor Statistics that basically said there are going to be roughly 67,000 new jobs created in analytics and data. Maybe somewhat narrowly defined, but that's basically the number. Now, there are roughly, let's call it, seven or eight billion people in the human race over that same period. So you would think that… I think that's 1/100 of 1%, if my math is right. If that's true, the opportunity of getting data into the hands of an order of magnitude more people, hopefully two to three orders of magnitude more, could be massive in terms of the amount of innovation that we unlock. Because as you and I know, working in the Valley, there are so many people with ideas, but those ideas are limited by people's ability to build software programs and figure out what's true about the world around them, which is effectively how data works and functions. So to answer your question, how do we get more people to be data literate? I think the educational system certainly has a lot to do with that. Our ability to focus on math, science, and statistics, and to give people the basic skills of analysis, matters. But I also think that the world of software, and certainly the world of enterprises and companies writ large, have to take a much bigger role in training their people to use data and to think with data. That's not a responsibility that we can slough off on the educational system anymore. That's got to be something that all of us as enterprises take on, in our own self-interest, in training people.
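A quick check of the arithmetic, using the figures as quoted: about 67,000 new analytics and data jobs against roughly 7.5 billion people (taking the midpoint of "seven/eight billion" is our assumption). On those numbers the share comes out a little under 1/1000 of 1%, an order of magnitude smaller than 1/100 of 1%.

```python
# Back-of-the-envelope check of the jobs-to-population ratio quoted above.
# Assumption: 7.5 billion as the midpoint of "seven/eight billion".
new_data_jobs = 67_000
population = 7.5e9

share = new_data_jobs / population      # fraction of all people
share_percent = share * 100             # the same share expressed as a percent

print(f"share: {share:.2e} ({share_percent:.5f}% of people)")
# 1/100 of 1% is 0.01%; 1/1000 of 1% is 0.001%
print("below 1/100 of 1%:", share_percent < 0.01)
print("below 1/1000 of 1%:", share_percent < 0.001)
```

Using 7 or 8 billion instead of 7.5 billion still leaves the share below 0.001%.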

Auren Hoffman 3:06

Isn’t that like the classic McKinsey training on how to use data, or do you think it has to be a lot more rigorous than that?

Satyen Sangani 3:14

I think it probably has to be more rigorous. But I think that it doesn't necessarily have to be done all in one go, and I think software has a role to play there. Yelp teaches us how to pick which restaurant we want to go to. Software should be able to teach us, in many ways, how to think. So I don't think it's only the role of teachers in classrooms. It's the role of every person in a meeting. It's the role of managers in incentivizing and motivating this work. But we've all got to learn how to use data better, and how to make that a more comfortable thing.

Auren Hoffman 3:47

In the education world, are there any like countries or educational systems that are doing a decent job?

Satyen Sangani 3:52

I think there are some companies that are doing a great job. I mean, we've got some phenomenal examples, probably because we're at the leading edge, the bleeding edge if you will, of companies that are interested in building data cultures. We've got some amazing examples like Pfizer, who have a 101, 201, 301 course system for how they train their data analysts and ultimately graduate them to being data scientists. Interestingly, these programs look very similar across enterprises. So if you talk to the folks at Munich Re, they've got a very similar program with that same sort of academic structure.

Auren Hoffman 4:27

Is this just understanding like things like false positives, and you know, just basic data?

Satyen Sangani 4:32

Yeah. At the most basic level, there's like an actual statistical literacy question of being--

Auren Hoffman 4:37

It’s not like we're getting into the Monty Hall problem or some other type of thing, or is it all the way through that type of stuff?

Satyen Sangani 4:44

Well, I think you do over time, right? Over time, you get beyond linear regression into more sophisticated distributions and more sophisticated econometric analyses and structures, but that's certainly not where you start. That's not where the vast majority of people even need to end, but you can get there. Getting the people who have the proclivity, the interest, and the curiosity to get there is exactly what I think we need to be doing. So there are great companies that are doing this work. Sometimes they partner with universities. Sometimes they do it internally within their companies, but lots of people are doing it, and more people need to be doing it.

Auren Hoffman 5:21

Sometimes you think of databases as rows and columns, right? Then you have the naming conventions for those columns, which vary a lot from company to company and database to database. One thing that I think Alation does is use machine learning and AI to help clean that up and speed it up. Can you walk us through a little bit about how that works?

Satyen Sangani 5:43

Yeah, and I love the way that you started this question, because I think it's really important for people who want to understand data to understand how data is born. Most people who are building those tables and columns are computer programmers who are naming these columns because they want to efficiently write a computer program to do something. So if I'm writing Uber, I might say rider address, and I might call it RDR_ADD. That's because I'm a programmer and I don't want to type any more than I have to, because I type a lot, so I want to shorten the names of my variables. I don't really care how that's done. I'm just doing it. Now some person has to come along and figure out rider address because they want to build some algorithm to understand the most common zip codes that people hailing Uber Premium come from. That might be a question for somebody. Now they've got to go back in time and understand what RDR_ADD means. That could be rider address, or it could mean that the rider has Attention Deficit Disorder; it could be anything in the world. The reality is that those people don't talk to each other. So how do you figure out what that person intended and meant when they were writing that program, and how do you communicate that more broadly? How do you understand the assumptions? Because even with something as simple as rider address, what if you geocode the address as opposed to using an actual physical street address? Those are two different things, and knowing which one you have is really hard for the person who's downstream. So what we do is use machine learning to look at the lexicon and the writings within that company, within an Uber, say. They're not a customer, but they're a great example. Then we say, “Oh, well, we see RDR is always rider. That's probably what you meant in this particular instance.” Machine learning can do that.
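The idea of inferring that RDR means "rider" from a company's own vocabulary can be illustrated with a deliberately simple heuristic. This is a hypothetical sketch, not Alation's algorithm: it treats any corpus word that starts with the abbreviation's first letter and contains its letters in order as a candidate, then ranks candidates by how often the company uses them. The corpus and the function names are made up for illustration.

```python
import re
from collections import Counter

def is_subsequence(abbrev: str, word: str) -> bool:
    """True if every letter of abbrev appears in word, in order."""
    it = iter(word)
    return all(ch in it for ch in abbrev)  # `in` consumes the iterator as it searches

def expand_abbreviation(abbrev: str, corpus: list[str], top_n: int = 3) -> list[tuple[str, int]]:
    """Rank corpus words that could plausibly be abbreviated as `abbrev`."""
    abbrev = abbrev.lower()
    words = Counter(w for text in corpus for w in re.findall(r"[a-z]+", text.lower()))
    candidates = {
        w: n for w, n in words.items()
        if len(w) > len(abbrev) and w[0] == abbrev[0] and is_subsequence(abbrev, w)
    }
    # Words used most often in the company's own language rank first.
    return Counter(candidates).most_common(top_n)

def expand_column(column: str, corpus: list[str]) -> list[str]:
    """Guess an expansion for each underscore-separated token of a column name."""
    guesses = []
    for token in column.split("_"):
        ranked = expand_abbreviation(token, corpus, top_n=1)
        guesses.append(ranked[0][0] if ranked else token.lower())
    return guesses

# Hypothetical snippets of a company's own writing (docs, query comments, tickets).
corpus = [
    "rider address is captured at pickup",
    "join rider to trips on rider id",
    "driver address and rider address can differ",
    "radar chart of rides by region",
]

print(expand_abbreviation("RDR", corpus))   # [('rider', 4), ('radar', 1)]
print(expand_column("RDR_ADD", corpus))     # ['rider', 'address']
```

Frequency alone cannot tell a street address from a geocoded one, which is where the context and human confirmation discussed in the conversation come in.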

Auren Hoffman 7:45

You're actually searching through the code to do that? The comments in the code and stuff?

Satyen Sangani 8:10

So we search through aspects of the code. We do search through the SQL queries, and typically the logs of how people are trying to ask questions of the data, because that's quite informative. The underlying application code, like the Python and the Java and whatever else may exist in the company, tends to be less useful and relevant because it tends not to be very semantically rich.

Auren Hoffman 8:28

Is there a place for a human in the loop to help figure some of these things out? Because I imagine this is gonna be really hard. Then you learn from the humans over time.

Satyen Sangani 8:37

Yeah, it's a supervised learning system. So we will guess, and we have this thing called Ali the Robot. Ali is this orange-looking robot. Super on brand for Alation, but she's the personification of what we do. She basically says, “Hey, RDR or ADDR could be one of these four things that we've found. Can you, user, confirm?” We know who the experts are because we're searching through all the code and understand who's looking at this database table most often and who's querying it most often. So when we get that information, we can locate the expert and ask, is our guess correct? Once it's been corrected or confirmed, we can then move on and extrapolate that for everybody else who searches for that thing forevermore.
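The human-in-the-loop flow described here (guess, find the person who queries the table most, ask them, then reuse the answer for everyone) might look roughly like the sketch below. The query log, the user names, the regex-based table matching, and the `Glossary` class are all hypothetical simplifications, not Alation's implementation.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical query-log entries: (user, SQL text).
QUERY_LOG = [
    ("priya", "SELECT rdr_add FROM trips WHERE city = 'SF'"),
    ("priya", "SELECT COUNT(*) FROM trips"),
    ("sam", "SELECT * FROM trips LIMIT 10"),
    ("sam", "SELECT * FROM payments"),
]

def top_expert(table: str, log: list[tuple[str, str]]) -> Optional[str]:
    """Return the user who queries `table` most often, used here as a proxy for expertise."""
    pattern = re.compile(rf"\bFROM\s+{re.escape(table)}\b", re.IGNORECASE)
    counts = Counter(user for user, sql in log if pattern.search(sql))
    return counts.most_common(1)[0][0] if counts else None

class Glossary:
    """Remembers human-confirmed meanings so each term is asked about only once."""

    def __init__(self) -> None:
        self.confirmed: dict[str, str] = {}

    def lookup_or_ask(self, term: str, candidates: list[str], table: str,
                      log: list[tuple[str, str]]) -> str:
        if term in self.confirmed:
            # Already confirmed by an expert: reuse the answer for every later search.
            return self.confirmed[term]
        expert = top_expert(table, log)
        return f"Ask {expert}: does {term} mean one of {candidates}?"

    def confirm(self, term: str, meaning: str) -> None:
        """Record the expert's correction or confirmation."""
        self.confirmed[term] = meaning

candidates = ["rider street address", "rider geocoded location"]
glossary = Glossary()
print(glossary.lookup_or_ask("RDR_ADD", candidates, "trips", QUERY_LOG))  # routed to priya
glossary.confirm("RDR_ADD", "rider street address")
print(glossary.lookup_or_ask("RDR_ADD", candidates, "trips", QUERY_LOG))  # rider street address
```

Query frequency is only a rough stand-in for expertise; the point of the sketch is that one confirmation is stored once and served to everyone afterward.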

Auren Hoffman 9:21

Are you also looking at how these different columns are joined, or at the foreign keys and so on? I imagine that can help really quickly if there's an RDR_ADD and then there's another field that says CSZ, or city-state-zip, or something like that. Maybe there's some sort of relationship between them, and one might be easier to decipher than the other.

Satyen Sangani 9:45

Yeah, absolutely. Then there's also correctness, right? You talked about join keys. So somebody might say, “Oh, give me all of the people in a certain area.” That area might be the Northeast. Maybe somebody lists Connecticut, New Hampshire, Boston, and New York, right? Then there might be another person who basically did the same thing, but excluded New York from the definition. It turns out that the first definition is used by lots of people, and the second definition is not used by very many people. So we can see in the code which one's used more often, and we can say, “Oh, this is a better definition for the Northeast, one that's used more commonly in the company than this other one that's only been used by one person, by mistake.” So that commonality within the language is also super relevant, in addition to guessing what the person might have meant. There are lots of interesting ways that we can inspect the code and the underlying database structures to learn what the data means.
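Picking the more widely used of two competing definitions can be sketched as counting identical definitions across queries. Representing a definition as a set of members, and the sample definitions themselves (taken from the example above), are illustrative assumptions about how such definitions might be extracted.

```python
from collections import Counter

# Hypothetical definitions of "Northeast" extracted from different users' queries.
# A frozenset makes two definitions equal regardless of the order members were listed in.
northeast_definitions = [
    frozenset({"Connecticut", "New Hampshire", "Boston", "New York"}),
    frozenset({"New York", "Boston", "Connecticut", "New Hampshire"}),
    frozenset({"Connecticut", "New Hampshire", "Boston", "New York"}),
    frozenset({"Connecticut", "New Hampshire", "Boston"}),  # one analyst left out New York
]

def rank_definitions(definitions):
    """Return (definition, usage_count) pairs, most widely used first."""
    return Counter(definitions).most_common()

ranked = rank_definitions(northeast_definitions)
preferred, uses = ranked[0]
print(sorted(preferred), uses)     # the common definition, used 3 times
for definition, count in ranked[1:]:
    print("less common:", sorted(definition), count)
```

Usage counts alone can't prove which definition is correct; they flag the outlier so a person can review it.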

Auren Hoffman 10:44

I know you really care a lot about data governance. Where do you think data governance is fundamentally broken? And how can we improve that?

Satyen Sangani 10:53

Data governance sounds like maybe the world's most boring topic in the galaxy. Who wants to govern data? Governance itself is a heavy-handed thing. Then it's like, I'm just gonna go into these weird database tables and start to control things. What are we doing here? But I care about it a lot because it is a means by which, I believe, more people will use data. So why don't people use data? Well, they don't use data because they don't understand it. They don't use data because they don't trust it. They don't use data because they can't find it, right? Those are the basic comprehension questions. So if you have data governance in place, what you're saying is, “Look, I've got these things called policies.” A policy can be anything from “Auren can't access this HR data,” or “He's maybe the only person who can access this HR data,” to things like “This data has to be of a certain quality.” If we're making recommendations to physicians about treatments, then we need to make sure that the data we're feeding the system is of super high quality. So one then has policies like: we've got to make sure that the test records in this data set are in the right range, with the right level of certification, before we start feeding this to physicians or people who are making medical decisions. Those are the policies, broadly defined, and there are lots of different types of them, that you need to apply to data for data to get used more often. Policies can help data findability because they can describe data better. You might have a policy that says when you have a data set, you’ve got to describe it in these seven ways: who owns it, how often it's used, where it was produced, what the downstream systems are that use it. Those are all helpful things to know, because if you know them, people will be more likely to use data.
So that's why I care about data governance. It often falls down for two reasons. First of all, it's really hard to do. The best example that I can give people is that we all have an email stack. We all try to find stuff in our email. Some of us at one point tried to compulsively folder our email. I don't know about you, but I've given up on that because there's just no way to build a foldering system that's going to work for me across time, right? Companies now try to do this across tens of thousands of databases and tens of thousands of employees, and it just never works. So manually curating this stuff never works. The other thing is that the work itself just isn't very interesting. The outcome of the work is really interesting, but the actual work is not. So you don't have enough people who are able to do this work or want to do this work, and the work is really hard. So it tends to be unsuccessful. So what we do, broadly defined, is try to distribute the work among a lot of people, and we also try to make it a lot easier by applying machine learning.
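The three kinds of policy described in this answer (who may access a dataset, what quality it must meet before feeding a downstream system, and what descriptive metadata it must carry) can each be expressed as a small check. This is a hypothetical sketch, not a real policy engine: the dataset names, the `test_value` column and its allowed range, and the metadata keys are assumptions, and only the four attributes actually named (owner, usage frequency, source, downstream systems) are checked, not the full "seven ways."

```python
from dataclasses import dataclass, field

@dataclass
class Dataset:
    name: str
    allowed_readers: set[str]
    metadata: dict[str, str] = field(default_factory=dict)
    rows: list[dict] = field(default_factory=list)

# Documentation policy: the four attributes named in the conversation (hypothetical keys).
REQUIRED_METADATA = ("owner", "usage_frequency", "source_system", "downstream_systems")

def can_read(user: str, ds: Dataset) -> bool:
    """Access policy: only explicitly listed readers may see the dataset."""
    return user in ds.allowed_readers

def missing_metadata(ds: Dataset) -> list[str]:
    """Documentation policy: required descriptions that are absent or empty."""
    return [key for key in REQUIRED_METADATA if not ds.metadata.get(key)]

def passes_range_check(ds: Dataset, column: str, low: float, high: float) -> bool:
    """Quality policy: every row must have a value in `column` within [low, high]."""
    return all(
        row.get(column) is not None and low <= row[column] <= high
        for row in ds.rows
    )

hr = Dataset("hr_compensation", allowed_readers={"hr_lead"})
tests = Dataset(
    "lab_test_results",
    allowed_readers={"clinical_team"},
    metadata={"owner": "lab_ops", "source_system": "lab_instruments"},
    rows=[{"test_value": 42.0}, {"test_value": 4200.0}],  # second value is out of range
)

print(can_read("auren", hr))                            # False
print(missing_metadata(tests))                          # ['usage_frequency', 'downstream_systems']
print(passes_range_check(tests, "test_value", 0, 100))  # False
```

Gating a feed to physicians would then mean requiring every applicable check to pass before the data is released.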

Auren Hoffman 13:51

Now, some of the most valuable data out there is data that could be pretty sensitive. Think of medical data or financial data or IRS data. Those would be the datasets where we would learn the most about society, where we could really solve deep mysteries of the world, where we could help people cure cancer faster, etc. Is there a way to essentially have our cake and eat it too? Can we protect everyone's privacy, either by creating synthetic data or by only allowing people to query the data in certain ways, and still get the benefit of the data without seeing the underlying, very sensitive pieces of it?

Satyen Sangani 14:31

You know, if we go back to that dark ages point that we started with earlier, I think there is so much opportunity for data to be transacted in the same way that we transact around any good. So let's assume, well, we all have phones. One great example of data portability is Apple Health, where they have your health data. If I knew about clinical trials that I was interested in and wanted to support those causes, I could contribute my data, possibly for compensation, possibly just charitably. You can imagine a world where people have a lot more ownership of their own data and a lot more control over where and how it's used. That can be true for people. That could also be true for institutions and companies. I think it will be, and it has to be, for us to be able to collaborate openly and learn openly about what it is that we need to go do. So yes, synthetic data, I think, is an interesting idea. I think it's applicable in certain domains. But I think over time we're going to have to have a Chinese menu of options for how people can both contribute data and consume data, in order for us to be successful as a society in solving a lot of the problems that we need to solve.
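One version of "only allowing people to query the data in certain ways," as raised in the question, is to release only aggregates and withhold any group too small to hide an individual. The sketch below shows that small-cell suppression rule; the threshold of 10 and the sample records are arbitrary assumptions, and this rule on its own is not a complete privacy guarantee.

```python
from collections import Counter

def counts_with_suppression(records, group_key, min_group_size=10):
    """Return per-group counts, dropping any group smaller than min_group_size.

    Callers only ever see aggregates, and small groups (which could point to
    specific individuals) are withheld entirely.
    """
    counts = Counter(record[group_key] for record in records)
    return {group: n for group, n in counts.items() if n >= min_group_size}

# Hypothetical sensitive records; only the zip code is used for grouping.
records = [{"zip": "94103"}] * 12 + [{"zip": "94110"}] * 3

print(counts_with_suppression(records, "zip"))   # {'94103': 12}
```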

Auren Hoffman 15:43

Over the last five years, the number of data scientists has probably grown by an order of magnitude. That's probably true across like every single industry. Do you kind of expect this trend to continue in the next five years? And then how has that kind of impacted your strategy?

Satyen Sangani 15:58

Yeah, I think it has to continue. Broadly, I think of the data science market, which is in some sense a subset of the broader computer programming or computer engineering market, as one that effectively has negative unemployment, right? As long as I've been building Alation, I've never seen salaries for engineers go down. That's certainly true for every role that's quasi-technical. I think that's going to continue to happen. I think the only real solve is enabling more people faster. How do you build computational literacy? How do you build data literacy? How do you build literacy around computer science and programming? All of those things have to scale an order of magnitude faster.

Auren Hoffman 16:45

Today, a smart person plus Databricks or Snowflake or some of these other tools that are out there is certainly more dangerous. Dangerous in a good way: they can solve more problems than they could in the past.

Satyen Sangani 16:59

Yeah. That's what we're going to have to keep on doing if we don't want to live in a world of haves and have-nots. I think the biggest determinant of that is going to be your ability to understand and leverage technology.

Auren Hoffman 17:11

You know, I'm sure you work with companies that have great data science teams and a super good grasp on data. Then you work with customers that are maybe more emerging in that area; they're not yet at the point of greatness. How does this play into your customer success strategy?

Satyen Sangani 17:32

Yeah, we have to be fairly facile and responsive to the customer. It is true that there's a maturity curve that we see. There are different ways of cutting that, but certainly one vector is the number of people who are using data. Another vector is the average level of sophistication of the users within the company. We partner with our customers to build data literacy within their organizations. In some of the most creative cases, that takes the form of a newsletter, where the best data teams inside companies are themselves manufacturing data problems, and those problems are then solved to build interest and literacy. It's cast in the form of the business problem that the institution is trying to solve. I mean, Google had this great ethos where you'd go to a bathroom and there'd be a computer programming problem on the wall. That ethos in Silicon Valley, I think, is now starting to make its way outside the Valley. I think it's great because it's just another way of helping people learn.

Auren Hoffman 18:40

At SafeGraph, we sell data. Because we just sell data, we've mainly been selling to companies that can consume data, which are very data-oriented companies. Usually they have some sort of core data science team or smart engineers, and their product people are fairly technical. We made a choice not to sell to companies that are not yet data oriented. Now, lucky for us, there are more data-oriented companies today than there were five years ago, and hopefully that trend continues. But we said, okay, we have some qualifying questions. Sometimes they're just not ready to buy from us. Maybe they're ready to buy from our customers who build products on top of our data or something. Do you have a similar analysis? Or do you say, “Hey, we want to meet you where you're at. If you're not yet super data oriented, that's okay. We're going to build products for you, and as you get more data oriented, we can help you grow more.”

Satyen Sangani 19:27

When we started the company, we probably started with the most organizationally data-literate set of companies out there. I think that held us in good stead, because if you think about it, the catalog in its very simplest formulation is basically a search engine. The companies that need our search engine have basically two characteristics: they have a lot of people who use data or want to use data, and they have a lot of data inside their companies. If you have both of those things, you're a great fit for our ICP. The world has obviously shifted. There are a lot more companies that want to use data and need to use data in the 10 years since we founded the company, and the product that we built has become much more sophisticated. So I would say we probably start at a much earlier point than we used to, but there's still an early stage in the market where a company might just be using one data source or have a team of five people. At this moment in time, we don't have a product for those very, very early customers. I would say that what you'll see from us is a drive toward even more simplicity over the next 12 to 18 months, so that we do start getting those customers to be users of our products. I think that's important not only to move down market, which is great for all software companies, but also because making this dead simple is so much a part of the ethos of what we're trying to do in broadening the use of data in the world.
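Since the catalog "at the very simplest formulation is basically a search engine," a toy inverted index over table names and descriptions shows the core idea. The catalog entries are hypothetical, and the match-every-word rule with no ranking is a simplification; a production catalog would likely weigh additional signals.

```python
import re
from collections import defaultdict

# Hypothetical catalog entries: table name -> human-written description.
CATALOG = {
    "trips": "One row per completed ride, including rider address and fare",
    "riders": "Rider profile, signup date and home city",
    "payments": "Card payments and refunds per trip",
}

def tokenize(text: str) -> list[str]:
    """Lowercase a string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(catalog: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: each word maps to the tables whose name or description contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for table, description in catalog.items():
        for token in tokenize(table + " " + description):
            index[token].add(table)
    return index

def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return tables matching every word in the query, sorted by name."""
    sets = [index.get(token, set()) for token in tokenize(query)]
    return sorted(set.intersection(*sets)) if sets else []

index = build_index(CATALOG)
print(search(index, "rider address"))   # ['trips']
print(search(index, "rider"))           # ['riders', 'trips']
```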

Auren Hoffman 21:01

In your company Alation, you've recently started doing acquisitions. I know you recently acquired a company called, I think, Lyngo Analytics, which is a data intelligence company. You're probably still early in thinking about your M&A strategy. You're not yet Oracle or Salesforce. So how are you thinking about that M&A strategy? And how do you think other companies around your size should be thinking about it?

Satyen Sangani 21:26

I can tell you how we're thinking about it, and maybe try to tease out what might be similar or different for companies in our phase. The data ecosystem, as we all know, is just super complicated and super rich. It feels like for every company that dies off those MarketScape maps that we see, there are five more companies replacing it with new venture funding. That makes it really complicated. We build our own versions of those market maps internally, based on our architecture and our expectations for the future state of the market: the capabilities that we think have to exist.

Auren Hoffman 22:03

So these are basically products, either features or products, that you want to offer in the future.

Satyen Sangani 22:09

That's exactly right. So what are the things customers are saying to us? What are the products that we're seeing on other MarketScapes out there? Then we take all of that learning, inside and outside, and try to build our own version of what we think the future market might look like. Then it becomes a matter of mapping companies into those various boxes. That's a lot of boxes, by the way. When we do that, we basically say, “Okay, which ones do we really care about?” There are certain boxes and certain spaces that are super low priority. We just don't think that we need to be in those spaces. We can partner, or frankly, in many cases, we can just ignore them, or there are already great companies there. The one thing I think we believe, which I don't necessarily know is true for everybody else, is: look, I don't want to buy or acquire or build in a space where the problem is already solved.

Auren Hoffman 22:41

You don't have to do everything.

Satyen Sangani 22:42

Yeah, we don't do everything. People are like, “Oh, why don't you do visualizations?” And I'm like, well, haven’t you heard of Tableau? We don't need to do that. So I do think it's important to pick not just the spaces you care about, but also the spaces where you think you can have real leverage. What is going to take you from 1x to 100x? Because growth at scale is really hard. So the thing that I'm always thinking about is how do I get real leverage out of this thing that we're looking at? To me, that's a relatively important thing to be focused on. The other thing we focus on is culture match. Does the culture of the company match ours? Because M&A is really hard, and getting a totally separate group of people to come into your group and become part of your tribe can often be really debilitating. The last thing that we think about is technical debt. Is the architecture going to be something that we have to replace or build on top of? There are some cases where replacement is okay. But often, gosh, inheriting somebody else's technical problems is really tough.

Auren Hoffman 23:59

Once you have this market map of companies, you're narrowing down to 30 really interesting companies you think would really be a good fit. Are you doing outbound into those companies? Are you waiting for an inbound? Or how does that work?

Satyen Sangani 24:13

We will do some outbound on a very limited basis for those two or three priority spaces that we think that we really need to know something about. Most of that takes the form of sort of business development activities because--

Auren Hoffman 24:08

Because there could be a way to partner potentially.

Satyen Sangani 24:10

Exactly. So generally speaking, we're not going to have somebody in corp dev call you up out of the blue and say, “Hey, we want to buy you.”

Auren Hoffman 24:37

Let’s partner. Let's work together. We have a common customer, let's solve their needs together or something like that.

Satyen Sangani 24:25

Exactly.

Auren Hoffman 24:26

But once you've got a deal that you want to do, how does one think of structuring it? Obviously you can use debt, you can raise equity, you can use your own equity to do the acquisition. There's lots of different levers. Like how do you think about that?

Satyen Sangani 24:58

It's a really tough problem because when you're growing really fast like we are, you have this challenge of how do you value your own equity. Because there's a public market for it, you always have a conversation there around what is it worth. But assuming that you can get over that hurdle, my default position is, “Look, I believe in our equity. So I'm happy to pay cash. Cash is available. So let's do that.” It tends to be, honestly, a negotiation based upon what the other company is willing to accept and what their investors--to the extent they have them--or themselves as founders are trying to optimize for. I think you learn so much from that negotiation. Because if somebody sits there and says, “Hey guess what, I really want all cash, and I want it today.” You’re like yeah, that's maybe not the culture that I'm trying to kind of build into this firm. So the negotiation does give you a lot of information around intentions, and software companies are people. So it just tells you about whether you're getting the right people.

Auren Hoffman 26:02

I know you also partner a lot in your go-to-market efforts with companies like AWS and others. Basically, how do you think about partnerships versus channel versus direct sales in growing your go-to-market?

Satyen Sangani 26:15

Our partnerships in the data ecosystem have primarily been less channel oriented and more what I’ll call solution oriented. So it's more likely that a partner will bring us into a deal and our sales team will execute it, rather than a partner just reselling us wholesale as part of their solution. Often that's because these data architectures are still fairly complicated. This is a considered buy. You're not just going to start off on data discovery and build your corporate data portal around something without thinking about it pretty hard. So we have two great partners. AWS is obviously one, and you mentioned them. Another great partner for us is Snowflake. They recently invested in our Series D. Their growth has obviously been fantastic; they're maybe one of the fastest-growing software companies ever. But what's been interesting is that we had over 100 customers in the wild in common with them before we even started the partnership discussions. The same thing is true for Tableau, which, just like Snowflake, is a customer, a partner, and also an investor. Those relationships have all evolved very naturally, because customers basically tell you who to partner with and what they care about. Then your job is to make it better than what the customer otherwise has in front of them today.

Auren Hoffman 27:42

We do a lot of stuff with Snowflake as well, really because so many of our customers use Snowflake. It's a great company and a great product. So we're constantly asked, “Hey, how do you get your data into Snowflake better?” Then you start to realize, okay, we're talking to Snowflake more. Then we're sharing CRMs, or you start doing this. So is it just a natural progression? Or is it more of a top-down BD kind of thing?

Satyen Sangani 28:08

I think the best partnerships have an element of both. On the first level, you have to have common demand. I certainly get a lot of companies that come to me and say, “Hey, do you want to partner with us?” My first question, even if they're a great friend or somebody I know well, will be, “Hey, do we have a customer that's actually doing this stuff in the wild?” Because no salesman, no matter how great he is, or saleswoman, no matter how great she is, is going to be able to come in and sell something that no customer actually sees value in. So that's always the first set of questions. If you have that bottom-up demand, then you have to really pick a direction and steer that demand in the appropriate way. That has to be something that's strategic to both companies, or it's never going to get done. So I do think there's always this balance between having real demand at the bottom level and, at the top level, defining a strategy that's going to take one example and turn it into 10.

Auren Hoffman 29:02

Now I'd love to ask you a couple questions about just the data and data industry in general. How have you seen customers increase their adoption of alternative data to drive decision making? So that would be data that they don't have internally, but that they're getting externally to help them drive those decisions.

Satyen Sangani 29:20

It's interesting. Over the last two years, and in particular over the last 12 months, I've seen the use of alternative data really increase. I think that speaks particularly to those bleeding-edge companies that you sell to and that we started off with. Those companies see so much competitive differentiation from having data that they otherwise wouldn't have had, because it makes their algorithms that much better and their decisions that much better. So now you would expect, and I think we're now seeing it in the data, more of these companies to crop up, because having really clean, really reliable third-party data is a competitive advantage. So we're seeing that.

Auren Hoffman 30:06

Yeah, we’re seeing something very similar. At SafeGraph, I would say, when a company buys our data (and most of our customers are very large companies), almost none of them were buying alternative data at all more than a year before they bought from us. So it's a relatively new motion. We’re maybe not the first data set that they’ve bought, but it's not like these companies have been buying data for 20 years or something.

Satyen Sangani 30:11

Right. I would imagine their sizes are also more variable. It's not just companies that are 20,000 people with a 50-person data science team. It's also companies that are 100 people.

Auren Hoffman 30:24

Sure, startups. Yeah, it's sometimes even easier for startups to go do that. We talked a little bit about join keys for joining data. I love thinking about join keys. Unix time, for example, is just a great join key for time. Are there particular join keys that you admire or that you guys are using a lot internally within Alation?
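Unix time makes a good join key because it reduces every timestamp to a single integer count of seconds since 1970-01-01 UTC, no matter how each source formatted it. The following is only an illustrative sketch: the record layouts, field names, and the `to_unix_seconds` helper are hypothetical, and it assumes both sources log the same event to the exact second, which real data rarely guarantees.

```python
from datetime import datetime


def to_unix_seconds(value):
    """Normalize a timestamp to integer Unix seconds (seconds since 1970-01-01 UTC)."""
    if isinstance(value, (int, float)):
        # Already an epoch value; truncate to whole seconds
        return int(value)
    # ISO 8601 string; require an explicit UTC offset so the instant is unambiguous
    dt = datetime.fromisoformat(value)
    if dt.tzinfo is None:
        raise ValueError(f"timestamp has no UTC offset: {value!r}")
    return int(dt.timestamp())


# Hypothetical records: one system stores ISO 8601 strings, the other epoch seconds
orders = [{"ts": "2022-03-01T09:00:00-08:00", "order_id": 1}]
visits = [{"epoch": 1646154000, "store": "SF-01"}]

# Index one side by the normalized key, then probe it with the other side
visits_by_time = {to_unix_seconds(v["epoch"]): v for v in visits}

joined = []
for order in orders:
    key = to_unix_seconds(order["ts"])
    visit = visits_by_time.get(key)
    if visit is not None:
        joined.append({"unix_ts": key, "order_id": order["order_id"], "store": visit["store"]})

print(joined)  # [{'unix_ts': 1646154000, 'order_id': 1, 'store': 'SF-01'}]
```

In practice, timestamps are usually bucketed (for example, to the minute or hour) before joining, since two systems seldom record the same event at the identical second.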

Satyen Sangani 31:03

Yeah, so internally, we're somewhat constrained when it comes to sharing data and learning between instances. But I do think there's a world, as we do more with our customers in the cloud and have more customers that are open, where there will be sharing.

Auren Hoffman 31:21

So then there could be more of a data co-op around understanding data, etc.

Satyen Sangani 31:27

Yeah, absolutely. The very obvious ones are the nouns: person, place, thing. But the really clever ones come when you start to get into derivatives of the nouns, like foot traffic data in real estate, where you merge person and place. That stuff starts getting super interesting, and it can be very, very interesting as supplementary data. Then you can imagine second-derivative stuff, like speed of traffic, and third-derivative stuff, where there are increasingly deep levels of what you're calling join key data, or master data as it were, that people are going to continue to source externally. I think it's super interesting. I think our imaginations are almost limited by what we see with this stuff.

Auren Hoffman 32:15

At SafeGraph, we're part of a whole group of a couple thousand companies that use this thing called Placekey, which is an open… Basically, it just converts a postal address into a string. Then obviously, strings are very, very easy to join against each other. It takes a very complex thing, an address, which could have many permutations for any given location, and creates a very standardized way to share data with whoever you're sharing data with. You can imagine something similar for sharing data about a company. There are many different ways of writing out a company's name: Microsoft, Microsoft Inc., MSFT, microsoft.com. There are probably many different conventions for naming a company.
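Auren's company-name example can be made concrete with a small normalization function that maps each spelling to one canonical key, so records from different sources join on an exact string match. This is a hypothetical sketch and not how Placekey works: Placekey handles places and addresses and, as Satyen notes, is far more complicated underneath the hood. The `company_key` function, its suffix list, and its alias table are illustrative assumptions. A ticker like MSFT cannot be derived from the name itself, so it needs an explicit alias entry.

```python
import re

# Hypothetical alias table: identifiers (like tickers) that cannot be
# derived from the name text must be mapped explicitly
ALIASES = {"msft": "microsoft"}

# Hypothetical set of legal-entity suffixes to drop
SUFFIXES = {"inc", "incorporated", "corp", "corporation", "llc", "ltd", "co"}


def company_key(raw: str) -> str:
    """Reduce a free-text company name to a canonical join key (illustrative rules only)."""
    name = raw.strip().lower()
    name = re.sub(r"\.(com|net|org)$", "", name)  # drop a trailing web domain suffix
    tokens = re.findall(r"[a-z0-9]+", name)        # split on punctuation and whitespace
    tokens = [t for t in tokens if t not in SUFFIXES]
    key = " ".join(tokens)
    return ALIASES.get(key, key)


variants = ["Microsoft", "Microsoft Inc.", "MSFT", "microsoft.com"]
keys = {company_key(v) for v in variants}
print(sorted(keys))  # ['microsoft']
```

Hand-written rules like these break down quickly on real data (for instance, two distinct companies can normalize to the same name), which is part of why a shared, agreed-upon key is valuable.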

Satyen Sangani 33:00

Well, it's so interesting, because that broad problem is what people have historically sold software around, called Master Data Management. Historically, every company has had to buy its own master data management software, reconcile its own systems, and do that internally as a one-off project, so there's literally no replicated learning from company to company to company. Everybody's doing the same thing all over again. You can imagine that with something like Placekey, which is tremendously complicated underneath the hood, people would no longer have to do those $70 to $80 million projects just to figure out what their customer names were. Instead they could do that with a single merge on a shared entity key. That's where I think the data magic could be significant and real, and so much more powerful in terms of our ability to move faster with this stuff.

Auren Hoffman 33:51

Cool. Yeah. Well, I'm very hopeful about the future. Okay, a couple of personal questions. I know you started your career as a banker at Morgan Stanley, and then you were doing private equity at TPG. That's maybe not the traditional path for a software engineer; it's more the traditional path for somebody who wants a career in finance. What advice would you give people on that banker/private equity path who want to get into entrepreneurship? Should they go work at Oracle for a long time like you did? Or is there another path? What advice would you give them for getting into technology?

Satyen Sangani 34:28

Yeah, I mean, I think there are kind of three rewards of being in a lot of those career paths. The first is the variety. If you're in a finance job, you're seeing a lot of different companies and a lot of different situations and examples, so there's this great variety of experience that you get intellectually if you have the skills. A second might be the intellectual ability to analyze companies consistently, and that might be very interesting to you. If you like those things, great. Those are good things, and they would probably be the reasons to be in a finance career. I think the other two things that often motivate people to be in finance careers are that they really like the money and they really like the status. In those cases, you kind of have to get off the sauce. Look, I say that with the full realization that I did not know I wanted to be a founding tech CEO when I left those jobs. I had no idea what I wanted to do. I always worried that maybe I should have stayed in finance because, well, that would have paid a lot more money. It took me a long time to find my way. But I think the fact that I left obviously helped me find it, rather than just staying in the career not knowing what I wanted. So the first thing is just leave. You've just got to get off the sauce.

Auren Hoffman 35:45

It's hard because you're giving up a salary or something. Your comp might be lower or something.

Satyen Sangani 35:51

Well, significantly, right. Not just lower: those careers often produce an outcome where you have full life security after 10 or 15 years of being in them. That's really hard to get in a world that's super competitive and where it's really expensive to live in places like the Bay Area, Virginia, and New York. But you have to make that choice. The other thing I'd say is that a lot of the analytical professions, a lot of the financial professions, are professions of discernment. You have to figure out what's a good deal and what's a bad deal, what's a good company and what's a bad company, where to put money and where not to put money. So you're constantly a critic, whereas being in technology is all about construction and being motivated by building something. So then the question is, are you motivated by building something? Is that what you want to do? Moving from being a skeptic to being an evangelist is a tough transition to make. I guess maybe the third bit of advice I'd give people is to do something technical. Go learn Scratch, if that's what you can do, but figure out how to be facile talking about data and programs, and know how to talk about things like microservices. Just learn the technical language, because the people who have both the numbers and the technology are 10X more powerful than the people who have just one part of that equation.

Auren Hoffman 37:17

Interesting. I've heard that you spent a lot of time working with orphans in India. What prompted you to do this? What's your takeaway? How do we get more... Orphans are not adopted very quickly. How do we get them adopted faster?

Satyen Sangani 37:33

I wouldn't say it was a lot of time, so I wouldn't want to oversell my qualifications there. Between high school and college, I spent five months in Bombay, and I worked at an orphanage there. My job at that point was to… These were basically, in most cases, kids that were not going to get adopted. In the case of the kids I was working with, they had polio and in many cases were unable to move their limbs properly. So that was a rough go. Later on, my sister ended up adopting from an orphanage in India. I think a lot of people would like to do that. I think one thing we all could work on is the laws and the supervision around adoption, to make the process a little bit easier, because international adoption is actually a lot harder than one might expect. It's really hard to match. Now, that's because there's been so much exploitation in many countries. So a lot of work has to be done to make it a more trustworthy, more reliable process, but also a more efficient process, so that people who want to give their love to these amazing children can do so. What else can we do? I mean, donate money, and donate time.

Auren Hoffman 38:54

Alright, last question we ask all of our guests. What's the conventional wisdom or kind of advice that you think is generally bad advice?

Satyen Sangani 38:59

Fail fast. So I know that's like a hallowed--

Auren Hoffman 38:45

That's common advice. Yeah.

Satyen Sangani 38:46

That's the hallowed Silicon Valley trope: just go and do something, fail fast, and iterate. On one level, look, obviously you want to learn. So if the optimization point is learning, then I agree, you should fail fast. But I often think that what that ends up meaning for many entrepreneurs, and perhaps even VCs, is: hey, look, if you don't figure out idea A, great. If you don't figure out how to go make the world a better place, well, then figure out how to sell Pez dispensers to really rich people, and that'll be a really good idea. That idea of leaving the central problem that you're trying to solve and constantly pivoting on an almost rudderless basis is, I think, where entrepreneurs often get hung up. Maybe you reach a locally optimal point, but you never drive the change that you really want to see. So I think sticking with a problem has a lot of benefits and rewards and joys. If you can really stick with a problem as opposed to just iterating the problem, that has a lot of benefit.

Auren Hoffman 39:47

Also, these 1% improvements. You're just going to get better and better at this thing over time. There is probably a point where you do have to give up, but sometimes it's hard to know when that is, if you're making a lot of progress towards--

Satyen Sangani 39:58

I mean, Peter Thiel is a really interesting example. Zero to One is obviously a great book, and you know him well. You would love to think of the world in zero-to-one improvements, and you'd want everybody to work on these revolutionary ideas. But often revolutionary ideas are achieved because somebody has worked for years grinding away on a problem. You can't just jump the shark. You have to problem-solve your way to an outcome.

Auren Hoffman 40:29

Yeah. Oh, this is really great. Thank you for being with us, Satyen. Where can people find out more about you on the broader interwebs?

Satyen Sangani 40:37

Yeah, like you, I've been inspired to start a podcast. So that's called Data Radicals. It's all about how you as an individual, or somebody who's motivated by data, can drive an organization, or help other people use data more often and think--

Auren Hoffman 40:52

Which is awesome. Yeah, so great. We’re fellow podcasters here.

Satyen Sangani 40:57

Yeah. So I'm certainly gonna learn from this experience.

Auren Hoffman 41:00

Yeah, I highly encourage people to download Data Radicals.

Satyen Sangani 41:03

That'll be on Apple and Spotify. Separately from that, you can find me on Twitter at @satyx, where I occasionally post, and of course on LinkedIn. So please connect with me. And Auren, thank you for your time.

Auren Hoffman 41:17

Absolutely. Thank you. It's been great.

Thanks for listening. If you enjoyed the show, consider rating this podcast and leaving a review. For more World of DaaS (DaaS is D-A-A-S), you can subscribe on Spotify, Apple Podcasts, or anywhere you get your podcasts, and also check out YouTube for videos. You can find me on Twitter at @auren. That’s A-U-R-E-N, Auren, and we'd love to hear from you.

World of DaaS is brought to you by SafeGraph.

Satyen Sangani, Co-Founder and CEO at Alation, joins World of DaaS host Auren Hoffman. Auren and Satyen discuss the importance of data literacy and ways for the private and public sectors to increase data literacy globally. They also break down key trends in the data science world and how powerful new innovation in the past few years is accelerating growth in the data industry.

World of DaaS is brought to you by SafeGraph & Flex Capital. For more episodes, visit safegraph.com/podcasts.
