Code covered by the BSD License  


5.0 | 9 ratings | 96 Downloads (last 30 days) | File Size: 3.99 MB | File ID: #27847

Credit Risk Modeling with MATLAB

07 Jun 2010 (Updated )

These are the supporting MATLAB files for the MathWorks webinar of the same name.


File Information
Description

In this Credit Risk Modeling webinar, you will learn how MATLAB can help risk teams build an agile Credit Risk Management infrastructure. If you are interested in developing and deploying risk analytics, this webinar will be ideal for you.

Webinar highlights include:
• Credit rating classification
• Transition matrices and probabilities of default
• Credit risk analysis

This webinar is for practitioners or academics in finance whose focus is risk management, credit structuring, quantitative analysis, or asset valuation. Familiarity with MATLAB is helpful, but not required.

Acknowledgements

Customizable Heat Maps inspired this file.

Required Products:
Database Toolbox
Financial Toolbox
MATLAB Builder EX
MATLAB Compiler
Optimization Toolbox
Statistics Toolbox

MATLAB release: MATLAB 7.10 (R2010a)
Comments and Ratings (33)
09 Dec 2013 Benjamin

Dear Michael,

I wanted to say that the problem in my post from December 8th is solved. The problem was indeed the 32-bit version of Microsoft Access, which is not supported by the 64-bit version of MATLAB R2010b. A friend who had the 64-bit version of Access installed could easily define the ODBC data source and run the scripts "Credit_Rating" and "TransitionProbabilities".

Unfortunately, we discovered another problem: When the script "Credit_VaR" is run, the following error occurs:

??? Error using ==> datenum at 182
DATENUM failed.

Error in ==> Credit_VaR at 52
BondData.Maturity = datenum(BondData.Maturity, 'mm/dd/yyyy');

Caused by:
Error using ==> dtstr2dtnummx
Failed on converting date string to date number.

Have you got a suggestion to solve the problem?
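For reference, I was thinking of trying a quick check along these lines (just a sketch; the alternative date format and the numeric case below are guesses on my part, not code from the package):

% Inspect what the spreadsheet import actually returned for the Maturity column
class(BondData.Maturity)
if isnumeric(BondData.Maturity)
    % Excel serial date numbers: convert directly instead of parsing strings
    BondData.Maturity = x2mdate(BondData.Maturity);
else
    % Strings in a different format, e.g. 'dd/mm/yyyy' (assumed format)
    disp(BondData.Maturity(1:3))
    BondData.Maturity = datenum(BondData.Maturity, 'dd/mm/yyyy');
end

Would something like that be the right direction?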

Again, thanks a lot.

08 Dec 2013 Benjamin

Dear Michael,

I am not able to follow the first step in the readme file which reads:
"Define the database in the “Data” folder as an ODBC data source".

I therefore tried to follow the steps presented by:
http://www.mathworks.de/de/help/database/gs/configuring-your-environment.html#bt1lm85-1

Although I have Microsoft Office installed, the "Microsoft Access driver" is not displayed in the list in "Step 6", and thus I am not able to establish a connection to the "HistoricalCreditRatings" database. Maybe this is due to the 32-bit version of Microsoft Access 2007 conflicting with my 64-bit MATLAB; however, this would not make sense to me, since MATLAB should be backward compatible.

Is there an alternative way to import "HistoricalCreditRatings"? What steps would you recommend for the pre-work task "Define the database in the “Data” folder as an ODBC data source"?
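For example, would a DSN-less connection along these lines work? (Just a sketch; the path, table name, and driver string below are my guesses, not part of the package, and I understand it would still need an Access ODBC driver matching MATLAB's 32/64-bit flavor.)

% Hypothetical DSN-less connection via the JDBC-ODBC bridge
dbPath = 'C:\work\Data\HistoricalCreditRatings.accdb';   % hypothetical path
url    = ['jdbc:odbc:Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=' dbPath];
conn   = database('', '', '', 'sun.jdbc.odbc.JdbcOdbcDriver', url);
data   = fetch(conn, 'SELECT * FROM CreditRatings');      % table name is a guess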

Any help is highly appreciated.

23 Jul 2012 Princewill Olali

I created an ODBC data source but it's not showing in my Query Builder (no item to select or insert). Can anyone help me, please?

12 Jul 2012 Nikolay

I can't import the database file. I think my MATLAB can only connect via JDBC. I use Mac OS, by the way.

I would appreciate any help. I am new at this

10 Nov 2011 Michael Weidman

Michelle-

These results are entirely consistent with how classification trees work. Simply rescaling each of the inputs by multiplying it by a different coefficient should have no effect on the tree.

For exactly why this is, I'd recommend Breiman's book (which is referenced in the doc), but the short answer is that trees sort each predictor's observations and try a candidate split within each of the gaps. The tree will then select the split that gives the "best" splitting criterion (and that's an entirely different discussion). Scaling the predictor only serves to scale this process, but it doesn't fundamentally change the results.

As an example: suppose we have a simple set of observations where the predictor has been measured at 1, 2, 4, and 10. The tree will try splits at 1.5, 3, and 7. Let's say that the "best" split is at 7.

Now we go ahead and rescale this input-- multiply it by 100 or some other coefficient. Now, the tree tries splits at 150, 300, and 700, and it will still select the split at 700. Rescaling doesn't change anything.

Now, if we were to cleverly create _new_ predictors out of a well-chosen combination (linear or otherwise) of our existing predictors, then that certainly would change the tree's performance. For instance, make a 6th predictor in your X from Altman's coefficients times your original X-- then you might get some interesting results.
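Here is a minimal sketch of that scale-invariance with made-up data (using classregtree from Statistics Toolbox directly rather than TreeBagger, just for illustration):

% Tiny made-up data set: one predictor measured at 1, 2, 4, and 10
X = [1; 2; 4; 10];
Y = {'bad'; 'bad'; 'good'; 'good'};

% Fit a tree on the original scale and on a rescaled copy of the predictor
t1 = classregtree(X,       Y, 'minparent', 2);
t2 = classregtree(100 * X, Y, 'minparent', 2);

% The chosen splits scale along with the data, so the predictions agree
isequal(eval(t1, X), eval(t2, 100 * X))   % ans = 1 (true)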

10 Nov 2011 Michelle

Hi Michael,

I've noticed a very curious result of the treebagger, and was wondering if you have had experience with this.

I send a matrix X which has 5 columns of a variety of accounting data. I also send Y, which is a vector of credit ratings. I have about 3000 rows of data.

I understand that the academic bankruptcy-prediction community has done extensive research to determine optimal coefficients to predict bankruptcy. I thought it would be interesting to see what the impact of the coefficients is on the bagging results. So I created a vector, coeff, and multiplied each row of X by the parameters in coeff.

for example: Altman's coefficients are {0.717 0.847 3.107 0.42 0.998}

Curiously, varying the coefficients has NO effect on the oobErrors. I've run exhaustive loops to vary all of the coefficients to track this down.

It seems like, at least in the limit where a coefficient goes to zero, there should be an effect.

thanks for any insights.

09 Nov 2011 Ab  
20 Oct 2011 Michelle

Thank you Michael for the clarification on the use of the predict vs oobPredict functions. I am aware of the benefits of the treebagger in automatically splitting the data, I just wanted to be able to see the individual bits.

Your point about the histogram is well taken, but I do think histograms have some value in showing the data distribution before and after, even if they don't provide insight into the performance of the bagger. The various methods you highlighted in your code are great for this.

I am working with real credit data, so being a visual person, I like to see what the ratings distribution looks like at the beginning/end of the process. I should think it would be curious if the classified data had a completely different distribution.

Thanks again for your demo and webinar, I'm finding it incredibly helpful.

19 Oct 2011 Michael Weidman

Michelle -

I'm afraid that I don't understand what histograms will do for you in this case. One typically matches the "actual" outputs to the model's "predicted" outputs and compares the difference between them in some way to assess the model's performance. Confusion matrices, ROC curves, and other techniques are commonly used to do this. Histograms are not used because they can hide a lot of information. Consider this simple case of classifying data that can take values of either "1" or "2":

% The "actual" data:
Y = [1 2 1 2 2 1];
% The "predicted" data, arrived at through some model:
Y_Pred = [2 1 2 1 1 2];

Most would argue that this "model" is terrible: it has a 100% misclassification rate! In spite of this, hist(Y) and hist(Y_Pred) give the exact same plot.

That visualization concern aside, I think there is some confusion about how Bagging works. In this example, the "actual" ratings are Y, and every observation is used for training in some way. So, Y also represents the training ratings. One of the strengths of ensemble methods like Bagging is that it's not necessary to manually split the data into training and validation sets: you can have your proverbial cake and eat it, too. The out-of-bag errors in this case, though, have a special significance that is too much to explain here-- you should check the doc or (better yet) Breiman's original article for more details on that. Suffice to say, you should not use the OOB errors in the way that you seem to be using them here. If you're looking for the ensemble's predicted ratings, they are simply found by
Y_Pred = predict(b,X);
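As a sketch of a more informative check than histograms, using the same toy data (CONFUSIONMAT is in Statistics Toolbox):

Y      = [1 2 1 2 2 1];          % actual classes
Y_Pred = [2 1 2 1 1 2];          % predicted classes
C = confusionmat(Y, Y_Pred)      % rows = actual, columns = predicted
misclassRate = 1 - sum(diag(C)) / sum(C(:))   % equals 1 here, despite identical histograms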

19 Oct 2011 Michelle

Hi Michael,
I am trying to add lines to the code to plot histograms of ratings for three sets of observations.
I would like to see three different histograms as a result of calling the treebagger: ground truth ratings, training ratings, predicted ratings.

Would you be able to confirm that I have implemented the code correctly and advise on the possibility of the second one? I have added my own data file with different ratings, but using the same ideas.

1) Ground Truth Ratings

hist(Y)

2) Training Ratings
Can't seem to find the matrix that stores these ratings.

3) Predicted Ratings: Out-of-bag predictions made within the treebagger routine.
The documentation says that the routine automatically partitions the data into training data and data to be predicted.

Y_Pred = oobPredict(b);
Y_Pred_Num=ordinal(Y_Pred,[],{'AAA' 'AA+' 'AA' 'AA-' 'A+' 'A' 'A-'...
'BBB+' 'BBB' 'BBB-' 'BB+' 'BB' 'BB-' 'B+' 'B' 'B-'...
'CCC+' 'CCC' 'CCC-' 'CC' 'C' 'D'});
figure(4);
hist(Y_Pred_Num);
xlabel('Ratings');
ylabel('Out of Bag Occurrences');
title('Out of Bag Prediction Results');

Thanks for your help!
Michelle

24 Sep 2011 Philip

Hi Michael,
I still have the error message, "??? Undefined function or method 'fetch' for input arguments of type 'struct'.

Error in ==> getdbdata at 23
e = fetch(e);"

and when I run CreditRisk_pkg.exe, I still get "LoadLibrary("CreditRisk_1_0.dll") failed - The specified module could not be found". Can you help, please? Thanks. By the way, how do I specify where the file is in the code "conn = database('Historical Credit Ratings','','password');"? I could have saved HistoricalCreditRatings anywhere on the hard drive.

By the way, I can run the Credit_VaR.m though. Thanks.

24 Sep 2011 Michael Weidman

Philip: On the top of this page we list the required products to run this code. R2010b of MATLAB should be fine as long as you have all of the needed toolboxes as well.

Within this package is a README file that provides step-by-step instructions on how to define the data sources (which is what seems to be going wrong in your error message above), how to get a copy of the MCR, and when you might need to recompile the code. I'm always looking for ways to improve those instructions, so let me know where you find them lacking.

23 Sep 2011 Philip

Hi. I have MATLAB 7.11.0 (R2010b). Am I able to run the code?

22 Sep 2011 Philip

I got this error message when I run getdbdata: "??? Undefined function or method 'fetch' for input arguments of type 'struct'.

Error in ==> getdbdata at 23
e = fetch(e);"

By the way, how do I assign the data source name? I also got an error message when I tried to run the "CreditRisk_pkg.exe" file: "LoadLibrary("CreditRisk_1_0.dll") failed - The specified module could not be found". How do I get "MCR version 7.13"? Thanks.

15 Sep 2011 Michael Weidman

That's useful feedback, Michelle. I've taken it back to our development team, and we're looking into incorporating those suggestions. Thanks!

15 Sep 2011 Michelle

Thank you for the tip on curly braces. It makes sense now. It's very interesting to be able to graphically see the various trees.

On a side note, perhaps I am overlooking some features of the standard MATLAB window for viewing the decision trees' leaves, but it would be extremely helpful to be able to:

a) read all of the node/leaf labels. Some of them currently get overwritten.
b) copy/paste the diagram so it can be manipulated/printed in PowerPoint.
c) write in the window to add comments.

14 Sep 2011 Michael Weidman

Kostas: You're basically correct. PRBYZERO quotes clean prices; to account for the partial coupon period, you'd need to use a dirty price convention or (equivalently) calculate the accrued interest. The ACCRFRAC function is useful in this case, as is the (more robust) BONDBYZERO function within Financial Derivatives Toolbox.

You're also correct that our choice of clean or dirty prices doesn't really affect the result as long as we're consistent: if present, the accrued interest from the original bond prices and the simulated bond prices just ends up cancelling out.
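For example, a rough clean-to-dirty adjustment might look like this (illustrative dates and numbers only; the semiannual period and the clean price are placeholders, not values from the demo):

Settle     = datenum('15-Aug-2003');
Maturity   = datenum('01-Dec-2005');
CouponRate = 0.06;  Face = 100;  Period = 2;          % assumed semiannual coupons
CleanPrice = 102.53;                                   % e.g. a price returned by PRBYZERO
Fraction   = accrfrac(Settle, Maturity, Period);       % fraction of coupon period elapsed
AccruedInt = Fraction * (CouponRate / Period) * Face;  % accrued interest
DirtyPrice = CleanPrice + AccruedInt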

14 Sep 2011 Michael Weidman

Michelle: You have the right idea but the syntax is a little off. If you instead try (note the curly braces):

a = b.Trees{1}
view(a)

Then you'll see what you want. Note that the trees in a bagged ensemble tend to be much bushier than in a well-pruned decision tree on its own-- this typically makes the ensemble a stronger learner.
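For example, assuming b is the TreeBagger built earlier in the demo:

% View the first few trees of the bagged ensemble (each opens a tree viewer window)
for k = 1:3
    view(b.Trees{k});
end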

14 Sep 2011 Michelle

This is a great demo! I am interested in "viewing" some of the individual decision trees generated by the treebagger function. When I look at
b.Trees(1)

ans =

[1x1 classregtree]

the fact that this is associated with the classregtree leads me to believe that I should be able to call the "view" command on this variable. The Iris Data examples show this to be the case. For example:

a = b.Trees(1)
view(a)

But I get an error statement:
??? Error using ==> view at 37
Invalid input arguments

Could you please advise if there is a way to plot individual decision trees as a result of the treebagger function?

08 Sep 2011 Konstantinos

Michael, thank you for this great demo.
I have some questions regarding the valuation of bonds in the |Credit_VaR.m| file.

1. Regarding the valuation date, I guess it must be the one-year-ahead date, since we are using one-year forward interest rates.
2. If we assume (as in the CreditMetrics Technical Document, p. 27) a bond with face value 100, maturity 5 years, coupon 6%, and a one-year forward zero curve of Year 1: 3.72%, Year 2: 4.32%, Year 3: 4.93%, Year 4: 5.32%,
then we will have the cash flows [6 6 6 6 106] and a price
P = 6 + 6/(1+3.72/100) + 6/(1+4.32/100)^2 + 6/(1+4.93/100)^3 + 106/(1+5.32/100)^4 = 108.6430

However if we use the prbyzero function (as you use in Credit_VaR.m file)
prbyzero([datenum('1-Dec-2005'),0.06],datenum('1-Dec-2001'),[3.72 4.32 4.93 5.32]'/100,datenum({'1-Dec-2001';'1-Dec-2002';'1-Dec-2003';'1-Dec-2004'})')
will get 102.5294

I think this happens because with the above syntax we lose the coupon at the end of the first year.
Is there any syntax that can help us price the bond correctly N years ahead?

I suppose that there is no difference in the results, as we are interested in the difference between the original and new portfolio values (please confirm).
Thank you,
Kostas

08 Sep 2011 Konstantinos  
16 Aug 2011 Michael Weidman

Eli: It depends on which version of Credit_Rating you are looking at. The version within the "MATLAB files" folder gives the full development process workflow and definitely shows TreeBagger in all its glory.

The version within the "Excel Deployment" folder follows a different workflow. In this workflow, we have already trained the classifier on historical data, and our only remaining task is to apply it to new data. As such, it simply loads a pre-trained TreeBagger from the included MAT-file (thus creating the TreeBagger variable "b" in the workspace) and runs the PREDICT method on the new data.

I'd recommend focusing on the code in the "MATLAB files" folder first and only looking at "Excel Deployment" once you need to push the analytics to another environment.
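As a rough sketch of that deployment-side workflow (the MAT-file name and the sample predictor row are placeholders, not the actual contents of the "Excel Deployment" folder):

% Load a previously trained TreeBagger and classify new observations
s = load('TrainedCreditRatingBagger.mat');   % hypothetical file name
b = s.b;                                     % the pre-trained TreeBagger, "b"
newObs = [0.02 1.10 0.30 0.50 2.00];         % one row of new predictor values (made up)
predictedRating = predict(b, newObs)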

16 Aug 2011 Michael Weidman

Bill: I'm pretty sure that the issue is the version of Office you are running. Note that these spreadsheets are in the XLSX and XLSM file formats, which are only supported by Office 2007 and later. Since MATLAB uses Microsoft technology to read these file formats, your Office 2003 installation isn't up to the task of reading that file properly.

That's my best guess-- I hope that helps.

16 Aug 2011 Eli Haroch

Hi,
I cannot find the reference to TreeBagger in the Credit_Rating file. Am I missing something?

15 Aug 2011 Bill Zou

Michael: Thank you very much for the code. I tried to run your code Credit_VaR, but the system gives me the following message:

??? Error using ==> setobsnames at 28
NEWNAMES must be a nonempty string or a cell array of nonempty strings.

When I track the problem down, it is due to the following line:

BondData = dataset( ...
'XLSFile', 'CreditPortfolio.xlsx', ...
'Sheet', 'Portfolio Information', ...
'ReadObsNames', true, ...
'ReadVarNames', true);

I am not sure what's wrong with it. Can you give me a clue? My MATLAB version is 7.12.0.635; Excel is the 2003 version.

thank you again for the help!

08 Aug 2011 Michael Weidman

Panagiotis--

That is an excellent catch. My choice to create a very large "ratingsMask" variable came from balancing computational speed against memory overhead. The code that you see here does have a large memory footprint, but in return it runs faster than any other variation on the code that I tried.

In fact, I specifically chose the number of simulations to be just under the 32-bit memory limit for the number of bonds in this portfolio. This is useful for seminar and webinar purposes, but you're correct that it leads to memory issues when the portfolio grows in size.

If you do wish to grow the size of the portfolio and/or the number of simulations, you do have options:

1. Use a 64-bit version of MATLAB. That should clear everything up nicely with no code changes.

2. Reduce the number of simulations done at a time, potentially performing the simulations in blocks instead of all at once. This will require some code changes, like an outer FOR-loop over the simulation blocks (see the sketch at the end of this comment). If you have access to Parallel Computing Toolbox, these blocks should be able to be evaluated in parallel.

3. Refactor the simulation code, perhaps by moving to a one-bond-at-a-time structure instead of the heavily vectorized version shown here. This will remove the memory concerns, but it will require more rewriting of the code and will slow things down considerably, too.

Of course, I may easily have missed some solution that offers a perfect mix of speed and memory efficiency-- if I did, please let me know!
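Here is a rough outline of option 2 above (variable names and sizes are placeholders, and the random numbers stand in for the existing vectorized simulation, which I have not reproduced here):

nTotalSims = 100000;                     % total scenarios desired
blockSize  = 10000;                      % scenarios simulated per block
nBlocks    = nTotalSims / blockSize;
PortfolioValues = zeros(nTotalSims, 1);
for k = 1:nBlocks                        % with Parallel Computing Toolbox, this could be a PARFOR
    rows = (k-1)*blockSize + (1:blockSize);
    % ... run the existing vectorized simulation for blockSize scenarios ...
    PortfolioValues(rows) = randn(blockSize, 1);   % placeholder for the real block result
end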

26 Jul 2011 Panagiotis Braimakis

Michael, very nice illustration. I have a small point to stress about this part of your code in Credit_VaR:

%% 2. Compute original value of portfolio
OriginalPrices = zeros(size(BondData, 1), 1);
dates = [valDate; valDate+365; valDate+2*365; valDate+3*365];

% The |prbyzero| function in the Financial Toolbox allows us to value
% these bonds.
for idx = 1 : length(Ratings)-1
    ratingsMask = BondData.Rating == Ratings{idx};
    OriginalPrices(ratingsMask) = prbyzero( ...
        [x2mdate(BondData.Maturity(ratingsMask)) BondData.Coupon(ratingsMask)], ... % x2mdate was missing
        valDate, Rates.(Ratings{idx}), dates);
end

I think this is not very optimal because it leads to large memory usage on larger portfolios (more than 3K bonds). You can check that for bigger portfolios this is where you get memory problems.

OriginalPrices(ratingsMask) = prbyzero( ...
One needs to rewrite this part, I guess, since the direct assignment via the 0/1 ratingsMask has some problems. What do you think?

14 Jan 2011 Michael Weidman

Good questions, Luis. In order to accurately compare the value of the initial portfolio and the simulated portfolios, I could roll the simulated valuation dates one year forward as you suggest. But then to accurately compare the two results, I'd need to discount the simulated values back to the original valuation date. Those two effects exactly cancel each other (as they should in an arbitrage-free environment), so there's no real harm in using the same valuation date for both the original and the simulated portfolios. This is a bit of a simplification, but it works for this demo.

Note also that this ignores the risks associated with interest rates changing over the simulated year. A risk manager could easily worry about those types of risks (and MATLAB provides tools to help), but this code is focusing on credit risk only.

As for the interest rates: yes, they're basically the one-year forward rates. This isn't necessarily a realistic view of interest rates-- again, the focus of the demo is on credit risk, so we make some simplifications on secondary matters like the rates. More realistic interest rate curves can be calibrated and simulated using the IRDataCurve object in Fixed-Income Toolbox and related functions.
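For example, a minimal sketch of building such a curve (illustrative dates and rates; see the Fixed-Income Toolbox documentation for the full calibration and simulation workflow):

Settle     = datenum('01-Dec-2001');
CurveDates = Settle + [1 2 3 4]' * 365;
ZeroRates  = [0.0372 0.0432 0.0493 0.0532]';
zc  = IRDataCurve('Zero', Settle, CurveDates, ZeroRates);
fwd = zc.getForwardRates(CurveDates)     % forward rates implied by the zero curve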

13 Jan 2011 Luis

Thanks for the code. It is well written and very well documented. In particular, I have a question regarding the code in Credit_VaR.m. In step 2 (initial portfolio valuation) and in step 6 (valuation based on the simulated ratings) you use the same dates for pricing. I think one should use dates one year ahead (or at the end of the VaR estimation horizon) in step 6 (using the prbyzero function). Furthermore, could you be more specific about what you mean by interest rates in the file Credit Portfolio.xlsx (sheet Transitions and Rates)? Are these one-year forward rates?

10 Dec 2010 Yen Hanning

Exactly what I want. Please make more toolkits for financial analysis as well.

02 Jul 2010 Michael Weidman

Andres - I am happy that you found this code useful! This use of the tilde ("~") operator is a feature of MATLAB R2009b and later. It allows us to ignore output arguments that we don't need. In earlier versions of MATLAB, a dummy variable (like "a1") would be used instead.
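For example, with a hypothetical call to UNIQUE (not the exact line from TransitionProbabilities.m):

ratings = {'AA' 'A' 'AA' 'BBB'};          % made-up rating labels
% R2009b and later: use ~ to ignore outputs you don't need
[~, idx] = unique(ratings);
% Pre-R2009b equivalent: capture into a throwaway variable instead
[dummy, idx] = unique(ratings); %#ok<ASGLU>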

Also, note that in addition to this syntax above, other parts of the MATLAB analysis (especially those using TreeBagger) require more recent versions of MATLAB because they are using newer features.

Thanks again!

01 Jul 2010 Andres Licona

In line 67 of TransitionProbabilities.m you can replace [~,idx,~] with [a1,idx,a1].

Great code Michael!!

24 Jun 2010 Lisheng Su  
Updates
10 Nov 2011

Small update to heatmap.m to remove a warning that is thrown in R2011a and later.

Contact us