Analytical CRM - Developing and Maintaining Profitable Customer Relationships in Non-Contractual Settings
Chapter 5: Support Vector Machines for Predicting Customer Activity and Future Best Customers in Non-Contractual Settings (pp. 135-136)
The Pareto/NBD and the BG/NBD models owe their names to their underlying distributional assumptions, which emphasizes their strong theoretical foundation. Yet the last chapter showed that they do not outperform simple management heuristics. In fact, as early as the late 1960s, Tukey (1969) had already postulated that putting too much emphasis on the mathematical theories of statistics did not help in solving real-world problems. It was his mantra that statistical work is detective work and that one should let the data speak for itself. The branch of exploratory data analysis emerged, but was dismissed by mathematical statisticians for a long time. Many of them proclaimed that proper statistical analysis must be based on hypotheses and distributional assumptions.
Their argument was that looking at data before formulating a scientific hypothesis would bias the hypothesis towards what the data might show. The term data mining was typically used with a derogatory connotation. The argument culminated in the reproach of improper scientific use, the reproach of torturing the data until it confesses everything. The field of marketing has long favored models that are based on structured parametric statistics, such as logit, probit, hazard and NBD models, and that are, at the same time, easy to interpret. However, Bucklin et al. (2002, p. 253) remark that, with today's diverse data sets, "[...] it may be counterproductive to rely primarily on standard statistical methods. Emphasizing scalable methods and predictive results may enable us to observe a richer set of behavioral phenomena [...]"
Customer bases are often very large, containing hundreds of thousands of customers and myriad variables. Additionally, companies' customer databases are often incomplete, i.e., the same information is not always available for all customers. For example, customers may or may not have participated in surveys, and even among those who have, some may have given only partial information. Some customers may have had contact with customer service while others have not. Last but not least, customer service employees sometimes maintain customer records carelessly and leave out important information they should have recorded.
Nevertheless, managers request tools and methods that work for all of their customers alike. In the field of machine learning, researchers have long sought methods that are scalable and work on large samples (Schölkopf and Smola 2002). One of these methodologies is the support vector machine (Vapnik 1995). Still, the recognition and diffusion of these methodologies, especially the support vector machine, in the marketing literature has been slow. Only recently, Gupta et al. (2006, p. 148) remarked that "many of these [machine learning] approaches may be more suitable to the study of customer churn where we typically have a very large number of variables, which is commonly referred to as the ‘curse of dimensionality’.
The sparseness of data in these situations inflates the variance of the estimates, making traditional parametric and non-parametric models less useful." In this spirit, an empirical analysis of the applicability, performance and limitations of these methods in marketing appears inevitable. Given the unsatisfactory results of the NBD models in the last chapter, this chapter analyzes the applicability of the support vector machine for predicting customer (in)activity and future best customers.