The two types of proprietary data
I’ve begun to think about how companies collect and leverage proprietary data. My in-progress framework is that there are two types of proprietary data:
Acquired data that is gathered and transformed from semi-public sources. For example, Opendoor creates datasets to accurately value homes and CircleUp creates datasets on CPG brand performance to make investments in CPG brands.
Incidental data that is gathered as a result of a product/service being used. Basically all companies fall into this bucket. For example, by using Facebook, Netflix, and Amazon, you’re giving those companies information on the content (whether it’s news articles, TV shows, or vacuum cleaners) that you’re most likely to share/watch/buy, which in turn helps those companies build more compelling services.
While leveraging this data is necessary to build a valuable company, it is not sufficient – the underlying business model matters. For example, restaurants have great incidental data on what dishes customers order, but it’s doubtful that this data alone will help a restaurant differentiate itself.
I personally find it most interesting when companies use proprietary data to create novel business models that upend the status quo:
Owning the value chain: Chris Dixon talks about this in his post on full stack startups. “The old approach startups took was to sell or license their new technology to incumbents. The new, ‘full stack’ approach is to build a complete, end-to-end product or service that bypasses incumbents and other competitors.” Opendoor doesn’t use its home valuation data to help real estate agents price homes; it uses the data to purchase and sell homes itself. Similarly, Uber doesn’t license its technology to taxi cabs; it uses the technology to bypass taxi cabs altogether.
Building a flywheel: Proprietary data can also help companies charge simpler, lower fees than their competitors, which can result in a flywheel. For example, Stripe offers a chargeback guarantee: for 40 bps, they cover all chargebacks. Stripe is probably doing this because they believe that their technology will keep the cost of chargebacks below 40 bps over the long term. Therefore, they can charge a guaranteed fee to eliminate merchant uncertainty, and focus on reducing chargebacks to improve their margins. You can see a flywheel start to develop: lower fees lead to increased merchant adoption, which yields more transaction data and better technology to reduce chargebacks, which in turn leads to lower fees, and so on.
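To make the chargeback guarantee arithmetic concrete, here is a rough sketch in Python. The 40 bps fee comes from the example above; the payment volume and the chargeback cost rates are made-up numbers for illustration, not Stripe's actual figures, and guarantee_margin is a hypothetical helper.

```python
def guarantee_margin(volume, fee_bps, chargeback_cost_bps):
    """Profit on the guarantee: fee collected minus chargebacks covered.

    Both rates are in basis points (1 bp = 0.01% of payment volume).
    """
    return volume * (fee_bps - chargeback_cost_bps) / 10_000


volume = 1_000_000  # hypothetical payment volume, in dollars
fee_bps = 40        # guarantee fee from the example above

# Hypothetical chargeback cost rates, from worse to better fraud prevention.
for cost_bps in (50, 40, 30, 20):
    margin = guarantee_margin(volume, fee_bps, cost_bps)
    print(f"chargeback cost {cost_bps} bps -> margin ${margin:,.0f}")
```

Under these assumptions the guarantee loses money when chargebacks cost more than 40 bps, breaks even at exactly 40 bps, and earns more with every basis point of chargebacks the technology eliminates. That is the incentive described above: the fee is fixed for the merchant, so any improvement in chargeback prevention goes straight to margin.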
