I was looking for a data feed for historical options prices when I stumbled upon http://www.trade-strategy.com/. The site is quite broken: most of the links don't work. (Maybe the guy got picked up by some company.) On the downloads page I found a series of Matlab snippets, one of which tries to download the member companies of various Russell indices. I was unable to run his code, probably because the web site has since renamed the PDF file and because the author did not make his .jar files available. But I did get some neat ideas.
Using the Java external interface, in combination with a .jar file, one can greatly extend the capabilities of Matlab. From http://pdfbox.apache.org/download.html, one can download
* pdfbox-1.x.0.jar (in my case it was pdfbox-1.1.0.jar)
* fontbox-1.x.0.jar (in my case it was fontbox-1.1.0.jar)
and use these Java classes and methods to strip the text out of a PDF file.
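As a rough sketch of what that looks like from the Matlab prompt, assuming both jars sit in the current folder and that the PDFBox 1.x package layout applies (the stripper lives in org.apache.pdfbox.util there; later major versions moved it). The file name members.pdf is hypothetical, and this is not necessarily how getRussellTickers2.m does it:

```matlab
% Sketch only: assumes PDFBox 1.x jars in the current folder and a
% text-based PDF called 'members.pdf' (hypothetical file name).
javaaddpath(fullfile(pwd, 'pdfbox-1.1.0.jar'));
javaaddpath(fullfile(pwd, 'fontbox-1.1.0.jar'));

pdfFile = fullfile(pwd, 'members.pdf');

% Open the document through the Java class on the dynamic class path
doc = org.apache.pdfbox.pdmodel.PDDocument.load(java.io.File(pdfFile));
try
    stripper = org.apache.pdfbox.util.PDFTextStripper();
    txt = char(stripper.getText(doc));   % java.lang.String -> Matlab char
catch err
    doc.close();                         % release the file before failing
    rethrow(err);
end
doc.close();

% One cell per line of extracted text
lines = regexp(txt, '\r?\n', 'split');
```

The jars go on the dynamic class path with javaaddpath, so nothing needs to be edited in Matlab's own configuration files.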
The top-level program, getRussellTickers2.m, does the following:
* It goes to the www.Russell.com web site and retrieves the list of PDF files.
* It allows the user to choose which Russell index should be downloaded and parsed.
* It downloads the PDF file using java.net.URL.
* It strips out the text using PDFTextStripper.
* It cleans up the text, leaving just a cell array of strings containing the company names and ticker symbols.
* It parses each line, taking the last word as the ticker symbol and the rest as the company name.
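The last step can be sketched as follows. This is only an illustration of the "last word is the ticker" rule on made-up sample lines, not the code from getRussellTickers2.m; the variable names and the choice to skip lines with fewer than two words are my assumptions.

```matlab
% Hypothetical sample lines standing in for the cleaned-up PDF text
lines = {'Apple Inc AAPL', '  Exxon Mobil Corp XOM  ', '', 'Russell'};

names   = {};
tickers = {};
for k = 1:numel(lines)
    % Ticker = last whitespace-delimited word; company name = the rest.
    % Lines with fewer than two words do not match and are skipped.
    tok = regexp(strtrim(lines{k}), '^(.*\S)\s+(\S+)$', 'tokens', 'once');
    if ~isempty(tok)
        names{end+1}   = tok{1}; %#ok<AGROW>
        tickers{end+1} = tok{2}; %#ok<AGROW>
    end
end
```

A real index listing will likely contain header and footer lines that happen to have two or more words, so some extra filtering before this loop is probably needed.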
To run it:
* Download 'fontbox-1.1.0.jar' and 'pdfbox-1.1.0.jar' from http://pdfbox.apache.org/download.html (or just get the latest versions).
* Place both in the same directory as this .m file.
* Download and install the latest Java Development Kit.
* Add "C:\Program Files\Java\jdk1.7.0_02\bin" to your PATH environment variable (your JDK version will very likely be different).
* Run the script file getRussellTickers2.
To get a complete list of classes, use the system command (easily done in Matlab with a "!"):
!jar tf pdfbox-1.1.0.jar
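The listing is long, so it can help to capture it and filter it inside Matlab. A small sketch, assuming the JDK's jar tool is on the PATH and the jar is in the current folder; the 'TextStripper' filter is just an example:

```matlab
% Capture the output of "jar tf" instead of printing it to the console
[status, listing] = system('jar tf pdfbox-1.1.0.jar');
if status ~= 0
    error('jar tf failed: %s', listing);
end

% Split into one entry per line
entries = regexp(strtrim(listing), '\r?\n', 'split');

% Keep only the entries whose path mentions "TextStripper"
hits = entries(~cellfun('isempty', strfind(entries, 'TextStripper')));
disp(hits(:));
```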
Raj Sodhi (2020). Russell Index Member Companies (https://www.mathworks.com/matlabcentral/fileexchange/28071-russell-index-member-companies), MATLAB Central File Exchange.
At some point it stopped working, perhaps because Matlab changed between 2008 and 2010. I got it working again.