Code covered by the BSD License  

File Size: 272.54 KB File ID: #28071

Russell Index Member Companies

by Raj Sodhi

 

29 Jun 2010 (Updated 26 Jan 2012)

Downloads a PDF file from www.Russell.com and extracts the Russell Index member companies from it.

File Information
Description

I was looking for a data feed of historical options prices when I stumbled upon http://www.trade-strategy.com/. The site is in poor shape: most of its links don't work. (Maybe the guy got picked up by some company.) On the downloads page I found a series of MATLAB snippets, one of which tries to download the member companies of various Russell indices. I was unable to run his code, probably because the web site has since changed the name of the PDF file and because the author did not make his .jar files available. But I did get some neat ideas.

Using MATLAB's Java external interface in combination with a .jar file, one can greatly extend the capabilities of MATLAB. From http://pdfbox.apache.org/download.html, one can download
* pdfbox-1.x.0.jar (in my case it was pdfbox-1.1.0.jar)
* fontbox-1.x.0.jar (in my case it was fontbox-1.1.0.jar)
and use these Java classes and methods to extract the text from a PDF file.
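
As a minimal sketch of this approach (not the code from getRussellTickers2.m), the snippet below adds the two jars to MATLAB's dynamic Java class path and extracts the text of a PDF. The file name example.pdf is a placeholder, the jar names assume the 1.1.0 versions mentioned above, and the package org.apache.pdfbox.util is where PDFTextStripper lives in the PDFBox 1.x series; newer major versions may place it elsewhere, so adjust the class name if you use a later jar.

```matlab
% Assumed file names; change them to match what you downloaded.
jarFiles = {'pdfbox-1.1.0.jar', 'fontbox-1.1.0.jar'};
pdfFile  = fullfile(pwd, 'example.pdf');   % placeholder input file

txt = '';                                  % extracted text ends up here

% Only proceed if the PDF and both jars are in the current directory.
haveFiles = exist(pdfFile, 'file') == 2;
for k = 1:numel(jarFiles)
    haveFiles = haveFiles && exist(fullfile(pwd, jarFiles{k}), 'file') == 2;
end

if haveFiles
    % Make the PDFBox and FontBox classes visible to MATLAB.
    for k = 1:numel(jarFiles)
        javaaddpath(fullfile(pwd, jarFiles{k}));
    end

    % Load the document, strip its text, then close the document.
    doc = javaMethod('load', 'org.apache.pdfbox.pdmodel.PDDocument', ...
                     java.io.File(pdfFile));
    stripper = javaObject('org.apache.pdfbox.util.PDFTextStripper');
    txt = char(stripper.getText(doc));
    doc.close();
else
    disp('Jar files or PDF not found in the current directory; nothing done.');
end

% Split the extracted text into one cell per line.
textLines = regexp(txt, '\r?\n', 'split');
```

The sketch uses javaMethod and javaObject with class names given as strings, so the classes are looked up only after javaaddpath has run.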

The top-level program called getRussellTickers2.m does the following.
* It goes to the www.Russell.com web site and retrieves the list of PDF files.
* It allows the user to choose which Russell index should be downloaded and parsed.
* The PDF file is downloaded using java.net.URL.
* The text is stripped out using PDFTextStripper.
* The text is cleaned up so that only the lines containing company names and ticker symbols remain, as a cell array of strings.
* Each line is then parsed: the last word is taken as the ticker symbol, and the rest of the line is the company name.
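
The last two steps can be sketched as follows. This is an illustration under stated assumptions, not the code in getRussellTickers2.m: the sample lines are made up (they are not taken from any Russell PDF), and the variable names rawLines, names, and tickers are hypothetical.

```matlab
% Made-up sample lines standing in for the text extracted from the PDF.
rawLines = {'Acme Widgets Inc ACME'; ...
            '  Example Holdings Corp EXH  '; ...
            ''; ...
            'Sample Industries SMPL'};

% Clean up: trim surrounding whitespace and drop empty lines.
rawLines = strtrim(rawLines);
rawLines = rawLines(~cellfun('isempty', rawLines));

n       = numel(rawLines);
names   = cell(n, 1);
tickers = cell(n, 1);

for k = 1:n
    thisLine = rawLines{k};
    % Position of the last whitespace character separates name and ticker.
    lastGap = find(isspace(thisLine), 1, 'last');
    if isempty(lastGap)
        % A single word: treat it as a ticker with no company name.
        names{k}   = '';
        tickers{k} = thisLine;
    else
        names{k}   = strtrim(thisLine(1:lastGap-1));
        tickers{k} = thisLine(lastGap+1:end);
    end
end

% Show the result, ticker first.
for k = 1:n
    fprintf('%-6s %s\n', tickers{k}, names{k});
end
```

Splitting at the last whitespace character, rather than the first, keeps multi-word company names intact.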

  Instructions:
    * download 'fontbox-1.1.0.jar' and 'pdfbox-1.1.0.jar' from http://pdfbox.apache.org/download.html
        (or just get the latest versions)
    * place both .jar files in the same directory as getRussellTickers2.m.
    * download and install the latest Java Development Kit
    * add "C:\Program Files\Java\jdk1.7.0_02\bin" to your PATH environment variable
             (your JDK version will very likely be different)
    * run the script file getRussellTickers2.

To get a complete list of the classes in a .jar file, use the jar tool as a system command (in MATLAB, prefix the command with "!"):
!jar tf pdfbox-1.1.0.jar
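
If the jar tool is not on your PATH, a possible alternative is to read the archive from within MATLAB using the standard Java class java.util.jar.JarFile. This is a sketch of a different technique, not something the submission itself does; the jar file name is assumed, and classNames is a hypothetical variable name.

```matlab
jarFile    = fullfile(pwd, 'pdfbox-1.1.0.jar');   % assumed file name
classNames = {};                                  % collected .class entries

if exist(jarFile, 'file') == 2
    jf      = java.util.jar.JarFile(jarFile);
    entries = jf.entries();
    while entries.hasMoreElements()
        entry     = entries.nextElement();
        entryName = char(entry.getName());
        % Keep only compiled classes, skipping directories and resources.
        if ~isempty(regexp(entryName, '\.class$', 'once'))
            classNames{end+1, 1} = entryName; %#ok<AGROW>
        end
    end
    jf.close();
end

fprintf('%d class files found\n', numel(classNames));
```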

Enjoy!

Raj

MATLAB release: MATLAB 7.11 (R2010b)
Other requirements: My platform is Windows XP. I don't know whether it works on a Mac.
Updates
26 Jan 2012

Somehow it stopped working, perhaps because MATLAB evolved between 2008 and 2010. I got it working again.
