Different thresholds will offer different trade-offs between errors and throughput: at one end, errors are less frequent but sometimes no outcome is chosen; at the other end, some outcome is always chosen, but errors are more frequent. You will need to decide what trade-off best suits your problem.
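To make the trade-off concrete, here is a minimal sketch, assuming a binary classifier that outputs a probability for class 1 and a "reject option" rule: choose an outcome only when the model's confidence reaches the threshold, otherwise choose nothing. The function name decide_with_threshold and the small probs/labels lists are hypothetical illustrations, not real data. The sketch reports coverage (the fraction of cases where an outcome was chosen) and the error rate among those chosen cases.

```python
from typing import List, Tuple


def decide_with_threshold(probs: List[float], labels: List[int],
                          threshold: float) -> Tuple[float, float]:
    """Return (coverage, error_rate) when abstaining on low-confidence cases.

    probs: predicted probability of class 1 for each case (hypothetical model output).
    labels: true class (0 or 1) for each case.
    threshold: minimum confidence (0.5 to 1.0) required to choose an outcome.
    """
    decided = 0
    errors = 0
    for p, y in zip(probs, labels):
        confidence = max(p, 1.0 - p)
        if confidence < threshold:
            continue  # no outcome is chosen for this case
        decided += 1
        predicted = 1 if p >= 0.5 else 0
        if predicted != y:
            errors += 1
    coverage = decided / len(probs) if probs else 0.0
    error_rate = errors / decided if decided else 0.0
    return coverage, error_rate


# Hypothetical scores and outcomes, for illustration only.
probs = [0.95, 0.80, 0.60, 0.55, 0.25, 0.05]
labels = [1, 0, 0, 1, 0, 0]

for t in (0.5, 0.7, 0.9):
    cov, err = decide_with_threshold(probs, labels, t)
    print(f"threshold={t}: coverage={cov:.2f}, error rate={err:.2f}")
```

With this toy data, raising the threshold lowers coverage and lowers the error rate among the cases that are decided, which is the trade-off described above. At a threshold of 0.5 every case gets an outcome.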
Suppose I have two series of numbers. Series A contains either 1s or 0s, depending on whether a patient took a pill. Series B contains random numbers. All of the series B numbers that coincide with the patient taking a pill have an average of 100, whereas those that coincide with NOT taking a pill...
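The question above is cut off, so this is only a hedged starting point: one common way to compare the two groups is to split series B by the 0/1 value in series A, compute each group's mean, and compute a Welch t statistic for the difference. The function compare_groups and the sample values are hypothetical; the sketch does not compute degrees of freedom or a p-value, and each group needs at least two values for the sample variance.

```python
import math
from statistics import mean, variance


def compare_groups(a, b):
    """Compare series B between A == 1 (took pill) and A == 0 (did not).

    Returns (mean when A == 1, mean when A == 0, Welch t statistic).
    Each group must contain at least two values.
    """
    took_pill = [y for x, y in zip(a, b) if x == 1]
    no_pill = [y for x, y in zip(a, b) if x == 0]
    m1, m0 = mean(took_pill), mean(no_pill)
    v1, v0 = variance(took_pill), variance(no_pill)  # sample variances
    # Welch's standard error does not assume equal variances in the two groups.
    se = math.sqrt(v1 / len(took_pill) + v0 / len(no_pill))
    t = (m1 - m0) / se
    return m1, m0, t


# Hypothetical series, for illustration only.
a = [1, 0, 1, 0, 1, 0, 1, 0]
b = [102, 88, 97, 91, 101, 90, 100, 87]

m1, m0, t = compare_groups(a, b)
print(f"mean with pill={m1}, mean without pill={m0}, Welch t={t:.2f}")
```

A large t relative to the group sizes suggests the difference in means is unlikely to be chance alone, though with real patient data one would also want to think about confounding.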
Methodologies are largely checklists to help avoid overlooking anything. I don't think that one really presents a substantial advantage over the next. Personally, I use my own process, which I hone over time.
In most cases, the most predictive data regarding delinquent customers' likelihood to pay will be their activity with the loan product (purchasing and payment activity on a credit card, etc.). Credit bureau data is also popular, and I know some people have had success with demographic data...
I work for the collections department of a bank, building predictive models of customer behavior. Our data is stored in Oracle, and I retrieve it to a PC for analysis and model development in MATLAB.
Did you have more specific questions?
Linked below is another paper on the subject of data mining versus statistics, "Data Mining and Statistics: What's the Connection?", by Friedman:
http://www-stat.stanford.edu/~jhf/ftp/dm-stat.pdf
-Will
I suppose association rule analysis (also called "market basket analysis") might work. You can find a list of commercial and free tools which perform such analysis at:
http://www.kdnuggets.com/software/associations.html
If, however, you know how the groups will be defined (model, color, A/C...
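For a feel of what association rule analysis computes, here is a minimal brute-force sketch that only looks at pairs of items: support is the fraction of baskets containing both items, and confidence for X -> Y is the fraction of baskets containing X that also contain Y. This is not Apriori and does not handle larger itemsets the way the tools listed above do. The function pair_rules and the car-option baskets are hypothetical illustrations.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support, min_confidence):
    """Find rules X -> Y between single items (brute force over pairs).

    Returns a list of (lhs, rhs, support, confidence), highest confidence first.
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (x, y), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # A pair yields two candidate rules: x -> y and y -> x.
        for lhs, rhs in ((x, y), (y, x)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


# Hypothetical option packages on cars sold, for illustration only.
baskets = [
    {"A/C", "sunroof", "leather"},
    {"A/C", "sunroof"},
    {"A/C", "leather"},
    {"A/C", "sunroof", "leather"},
    {"sunroof"},
]

for lhs, rhs, support, confidence in pair_rules(baskets, 0.4, 0.7):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Here, for example, every basket with leather also has A/C (confidence 1.0), while leather -> sunroof falls below the confidence cutoff.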
In general, for train/test splitting, I try to stratify as much as possible within reason, and yes, I do stratify on the dependent variable. "Within reason" means: 1. I worry most about variables believed to be important, and 2. individual stratification cells should not become too small.
You...
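A minimal sketch of stratified train/test splitting along those lines, assuming records are dicts: cells are formed from the dependent variable plus a few variables believed to be important, and any cell smaller than a minimum size falls back to a coarser cell defined by the dependent variable alone. The function stratified_split, its parameters, the fallback rule, and the field names paid and region are hypothetical choices for illustration, not a prescribed procedure.

```python
import random
from collections import Counter, defaultdict


def stratified_split(rows, target, strata_vars, test_fraction=0.3,
                     min_cell_size=20, seed=0):
    """Split rows (dicts) into (train, test), stratifying on target plus strata_vars.

    Cells with fewer than min_cell_size rows fall back to stratifying on the
    target alone, so individual cells do not become too small.
    """
    def full_key(r):
        return (r[target],) + tuple(r[v] for v in strata_vars)

    counts = Counter(full_key(r) for r in rows)
    cells = defaultdict(list)
    for r in rows:
        key = full_key(r)
        if counts[key] < min_cell_size:
            key = (r[target],)  # coarser cell: dependent variable only
        cells[key].append(r)

    rng = random.Random(seed)  # seeded so the split is reproducible
    train, test = [], []
    for key in sorted(cells, key=repr):  # deterministic cell order
        members = cells[key][:]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Hypothetical records: 25% paid, most customers in region "N".
rows = [{"id": i, "paid": 1 if i % 4 == 0 else 0, "region": "N" if i < 80 else "S"}
        for i in range(100)]

train, test = stratified_split(rows, "paid", ["region"], test_fraction=0.2)
print(len(train), len(test), sum(r["paid"] for r in test) / len(test))
```

In this toy data the small region "S" cells are merged back into target-only cells, and the test set keeps the same 25% paid rate as the full data.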
Readers here may be interested in my article, Family Recipe For Neural Networks, which was posted to the Data Mining and Predictive Analytics Web log:
http://abbottanalytics.blogspot.com/2006/11/family-recipe-for-neural-networks.html#links
I hope this is helpful!
There is a post on the Data Mining and Predictive Analytics Web log, Free And Inexpensive Data Mining Software, which may be of interest:
http://abbottanalytics.blogspot.com/2006/11/free-and-inexpensive-data-mining.html
Note the discussion which follows in the Comments section, as well.