aseth (over on the Yahoo! Data Mining Club) wrote:
"The few factors that bias the choice are:
A. Type of data. If most of the data types are numeric and the non-numeric data types have ordinality associated with them, then the choice usually is NNs. Data sets rich in non-numeric data types usually warrant some kind of a rule induction based system."
Yes, definitely. While different data types can be "bent" into others (dummy variables, binning, etc.), modeling systems make more natural fits with some kinds of data than others.
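To make the "bending" concrete, here is a minimal sketch of the two conversions mentioned above: dummy variables (categorical to numeric) and binning (numeric to categorical). The function names, bin edges and labels are my own illustrative assumptions, not anything aseth described.

```python
from bisect import bisect_right


def dummy_encode(values):
    """Turn categorical values into 0/1 dummy columns (one column per category)."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories


def bin_values(values, edges, labels):
    """Turn numeric values into categories using ascending bin edges.

    len(labels) must equal len(edges) + 1; a value equal to an edge
    falls into the higher bin.
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than edges")
    return [labels[bisect_right(edges, v)] for v in values]


# Hypothetical example data, for illustration only.
colors = ["red", "blue", "red"]
ages = [5, 30, 70]

dummy_rows, dummy_columns = dummy_encode(colors)
age_bands = bin_values(ages, edges=[18, 65], labels=["young", "adult", "senior"])
```

Either conversion loses or distorts something (dummy variables multiply the column count, binning throws away resolution), which is one reason a tool that handles the original data type directly tends to be the more natural fit.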
aseth continues:
"B. Customer preference of model type. Some customers are adamant on seeing rules and have been (mis)educated on the black box nature of NNs. While some others (especially in the financial/banking sectors), love NNs and have large systems deployed around NNs (like fraud detection systems)."
This is what I call the "political" factor. I think many people believe they're getting something they're not, and not just with neural networks. In marketing, for example, it is popular to find "segments" (clusters) in populations and to look for "triggers" of important behaviors (purchasing, defection, etc.). I don't think this is as simple as many people seem to believe.
aseth continues:
"Bottom line:
Usually, I let my guys loose on both types of modeling for a given data set (obviously after careful preparation of the data to suit the type of modeling that the data is going to be put through). The results of all of these are compared - a.k.a model efficiencies are discerned using various methodologies (error rates, tendencies for over-fitting, etc.)."
This is what I was really wondering about. I like to throw multiple tools at any problem, but I have far more tools than I generally have time to try. As a consequence, some tools get used more than others, for a variety of reasons (convenience, testing capabilities, etc., in addition to typical accuracy performance). I have noticed that many data miners develop a preference for particular types of tools.
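For what it's worth, the kind of side-by-side comparison aseth describes might be sketched roughly as follows: fit each model type on the same training rows, then compare training error against hold-out error, using the gap as a crude indicator of over-fitting. The two toy models here (a one-rule threshold "stump" and a 1-nearest-neighbor memorizer) and the data are hypothetical stand-ins I made up for illustration; they are not aseth's actual tools or methodology.

```python
def error_rate(predict, rows):
    """Fraction of (x, y) rows that the model misclassifies."""
    wrong = sum(1 for x, y in rows if predict(x) != y)
    return wrong / len(rows)


def fit_stump(rows):
    """One-rule model: predict 1 when x >= t, choosing t to minimize training error."""
    best_err, best_rule = None, None
    for t in sorted({x for x, _ in rows}):
        rule = lambda x, t=t: 1 if x >= t else 0
        err = error_rate(rule, rows)
        if best_err is None or err < best_err:
            best_err, best_rule = err, rule
    return best_rule


def fit_nearest(rows):
    """1-nearest-neighbor: memorizes the training rows, so it tends to over-fit."""
    def predict(x):
        return min(rows, key=lambda r: abs(r[0] - x))[1]
    return predict


def compare(models, train, holdout):
    """Fit every model on train; report train error, hold-out error and their gap."""
    report = {}
    for name, fit in models.items():
        predict = fit(train)
        train_err = error_rate(predict, train)
        holdout_err = error_rate(predict, holdout)
        report[name] = {
            "train": train_err,
            "holdout": holdout_err,
            "gap": holdout_err - train_err,
        }
    return report


# Hypothetical toy data: one numeric input, binary target, one noisy label at x=3.
train = [(1, 0), (2, 0), (3, 1), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
holdout = [(2.4, 0), (3.2, 0), (4.5, 0), (5.5, 1), (6.4, 1)]

report = compare({"stump": fit_stump, "nearest": fit_nearest}, train, holdout)
for name, scores in report.items():
    print(name, scores)
```

With real data the hold-out set would be far larger and the comparison repeated over several splits; a single small split like this one says very little about which tool is actually better.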
Predictor