Practical Predictive Analytics

External data

A company's internal data can also be augmented with data obtained from a variety of external sources, including government data, social media feeds, and data purchased from vendors. Often, demographic, behavioral, and risk data are purchased separately and then merged within the data warehouse.

Beware of the daunting task of data integration. A key problem in this task is associating records from a variety of disparate sources with each other. This is less of a problem with internal data, but with external data you may have to use alternative matching methods, such as fuzzy match (similarity) algorithms or entity extraction methods (which, for example, can extract a customer name). These are just two of the ways to associate your data with external sources.
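
As a rough illustration of the fuzzy match idea, the following Python sketch pairs names from an internal list with their closest counterparts in an external list. It uses only the standard library's difflib module. The function names, the sample company names, and the 0.8 similarity threshold are all assumptions made for this example, not something prescribed by the text; a real project would tune the threshold and probably use a dedicated record linkage tool.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase the name, replace punctuation with spaces, collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(kept.split())


def similarity(a, b):
    """Return a similarity score between 0.0 and 1.0 for two names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(name, candidates, threshold=0.8):
    """Return the most similar candidate, or None if nothing reaches the threshold.

    The 0.8 default is an arbitrary illustrative value, not a recommendation.
    """
    if not candidates:
        return None
    best = max(candidates, key=lambda c: similarity(name, c))
    return best if similarity(name, best) >= threshold else None


def match_records(internal_names, external_names, threshold=0.8):
    """Map each internal name to its best external match (or None)."""
    return {
        name: best_match(name, external_names, threshold)
        for name in internal_names
    }


if __name__ == "__main__":
    # Hypothetical sample data: internal customer names and a vendor's list.
    internal = ["Acme Corp.", "Globex Corporation", "Umbrella Inc"]
    external = ["ACME Corp", "Globex Corporaton", "Initech LLC"]
    for name, match in match_records(internal, external).items():
        print(name, "->", match)
```

Names that differ only in case, punctuation, or a small typo are matched, while a name with no close counterpart is left unmatched (None) rather than forced onto the nearest candidate. Unmatched records are the ones you would typically review by hand or pass to another method, such as entity extraction.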