Thread Subject: accessing and downloading online files

From: Adam Chapman

Date: 28 Dec, 2008 16:41:10

Message: 1 of 5

Hi,

I want to use all the speech database files at
http://www.repository.voxforge1.org/downloads/SpeechCorpus/Trunk/Audio/Main/16kHz_16bit/

but it would take ages to download them and unzip them all manually. I
remember using the dir command once to view all the file names in a
directory.

Is there anything similar I could do here? It would be really useful.

Thanks
Adam

From: Adam Chapman

Date: 28 Dec, 2008 18:58:10

Message: 2 of 5

On Dec 28, 4:41 pm, Adam Chapman
<adam.chap...@student.manchester.ac.uk> wrote:
> Hi,
>
> I want to use all the speech database files at http://www.repository.voxforge1.org/downloads/SpeechCorpus/Trunk/Audi...
>
> but it would take ages to download them and unzip them all manually. I
> remember using the dir command once to view all the file names in a
> directory.
>
> Is there anything similar I could do here? It would be really useful.
>
> Thanks
> Adam

Not to worry, I've worked it out. I will put my code on the file
exchange so others can read/use it.

From: Lars Barring

Date: 28 Dec, 2008 19:00:05

Message: 3 of 5

Adam Chapman <adam.chapman@student.manchester.ac.uk> wrote
> I want to use all the speech database files at
> http://www.repository.voxforge1.org/downloads/SpeechCorpus/Trunk/Audio/Main/16kHz_16bit/
>
> but it would take ages to download them and unzip them all manually. I
> remember using the dir command once to view all the file names in a
> directory.

Here is an outline of what I did in a similar situation:

1. Save the web page that lists all the files (the link above).
2. In a text editor, remove all the initial HTML code before the actual list of links begins.
3. In the same way, remove the unnecessary material at the end of the file.
You should now be left with essentially just the lines containing the links, each with some
additional markup at the beginning and end.
4. Read this edited file into a MATLAB cell array.
5. Use regexp to extract only the link text into a new cell array.
6. For each cell element, add the right MATLAB (or shell script) code to fetch the web link. This will be something like ['wget ' linkbase '/' cellelement ...] for a *n*x shell script, or ['urlwrite' .... ] for MATLAB, where <linkbase> is your link above.
7. Once all the files are downloaded, you will have to generate another script (MATLAB or shell) to unpack them. <dir> and <gunzip> will help you here if you do it in MATLAB (though I am not sure whether gunzip handles .tgz).
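
Steps 4 to 6 could be sketched roughly as follows. This is only an illustration under assumptions: the two listing lines and their file names are invented stand-ins for the hand-trimmed page, the href pattern assumes a plain Apache-style index, and the variable names (lines, linkbase, names, cmds) are hypothetical.

```matlab
% Stand-in for the cell array produced by step 4 (one cell per line of the
% trimmed listing). The file names are made up for illustration.
lines = { ...
    '<a href="speaker1-20080101.tgz">speaker1-20080101.tgz</a> 01-Jan-2008 1.2M', ...
    '<a href="speaker2-20080102.tgz">speaker2-20080102.tgz</a> 02-Jan-2008 0.9M'};

% Base link from the original question, without the trailing slash.
linkbase = 'http://www.repository.voxforge1.org/downloads/SpeechCorpus/Trunk/Audio/Main/16kHz_16bit';

% Step 5: extract the href target from every line.
% Assumes every line contains a .tgz link; a non-matching line would
% give an empty token and make t{1} fail.
tok   = regexp(lines, 'href="([^"]+\.tgz)"', 'tokens', 'once');
names = cellfun(@(t) t{1}, tok, 'UniformOutput', false);

% Step 6: build one wget command per file.
cmds = cellfun(@(n) ['wget ' linkbase '/' n], names, 'UniformOutput', false);

% Print the commands; they could equally be written to a shell script.
fprintf('%s\n', cmds{:});
```

The printed lines can be pasted into a shell script, or the same loop can call urlwrite directly instead of generating wget commands.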

hth
Lars

From: per isakson

Date: 28 Dec, 2008 19:14:02

Message: 4 of 5

Adam Chapman <adam.chapman@student.manchester.ac.uk> wrote in message <ceda4722-c04a-4083-b6ec-e2c87637d2dc@d36g2000prf.googlegroups.com>...
> Hi,
>
> I want to use all the speech database files at
> http://www.repository.voxforge1.org/downloads/SpeechCorpus/Trunk/Audio/Main/16kHz_16bit/
>
> but it would take ages to download them and unzip them all manually. I
> remember using the dir command once to view all the file names in a
> directory.
>
> Is there anything similar I could do here? It would be really useful.
>
> Thanks
> Adam

One approach:
1. read the page with the function urlread
2. extract the filenames with regexp
3. read the files with urlwrite (it writes to your HD)

On the File Exchange there are variants, urlread2 and urlwrite2, which handle timeouts better.
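
A minimal sketch of those three steps, under some assumptions: the page is taken to be a plain index whose links look like href="name.tgz", the sample HTML and its file names are invented so the parsing can be tried offline, and the doDownload flag is a hypothetical switch added only for this illustration.

```matlab
% Directory URL from the original question (note the trailing slash).
baseurl = 'http://www.repository.voxforge1.org/downloads/SpeechCorpus/Trunk/Audio/Main/16kHz_16bit/';

doDownload = false;   % hypothetical switch: set to true to use the network

% Step 1: read the page.
if doDownload
    html = urlread(baseurl);
else
    % Invented sample listing, used only to exercise the parsing offline.
    html = ['<a href="../">Parent Directory</a>' ...
            '<a href="a.tgz">a.tgz</a><a href="b.tgz">b.tgz</a>'];
end

% Step 2: extract the file names. Each match is a 1x1 cell holding the name.
tok   = regexp(html, 'href="([^"/]+\.tgz)"', 'tokens');
files = cellfun(@(t) t{1}, tok, 'UniformOutput', false);

% Step 3: fetch each file into the current folder.
for k = 1:numel(files)
    if doDownload
        urlwrite([baseurl files{k}], files{k});
    else
        fprintf('would fetch %s\n', [baseurl files{k}]);
    end
end
```

The pattern deliberately excludes "/" inside the name so that the parent-directory link and sub-folders are skipped.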

/per

From: Adam Chapman

Date: 28 Dec, 2008 23:58:34

Message: 5 of 5

On Dec 28, 7:14 pm, "per isakson" <poi.nos...@bimDOTkthDOT.se> wrote:
> Adam Chapman <adam.chap...@student.manchester.ac.uk> wrote in message <ceda4722-c04a-4083-b6ec-e2c87637d...@d36g2000prf.googlegroups.com>...
> > Hi,
>
> > I want to use all the speech database files at
> > http://www.repository.voxforge1.org/downloads/SpeechCorpus/Trunk/Audi...
>
> > but it would take ages to download them and unzip them all manually. I
> > remember using the dir command once to view all the file names in a
> > directory.
>
> > Is there anything similar I could do here? It would be really useful.
>
> > Thanks
> > Adam
>
> One approach:
> 1. read the page with the function urlread
> 2. extract the filenames with regexp
> 3. read the files with urlwrite (it writes to your HD)
>
> On the file exchange there are variants urlread2 and urlwrite2, which handle timeout better.
>
> /per

What I did was use urlread to pull all the source code from the
web page.
Then I used strfind to locate the filenames,
and finally saved them to the hard drive with untar(url,targetdir).

You can use untar to unpack archives from a URL straight onto your hard drive.
Thanks for your suggestions; it's always good to see alternative
methods.
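
For anyone reading later, the urlread / strfind / untar route might look roughly like the sketch below. It is not the script mentioned above, only an illustration under assumptions: the sample HTML and file names are invented, the links are assumed to end in .tgz followed by a closing quote, and targetdir and doDownload are hypothetical names.

```matlab
baseurl   = 'http://www.repository.voxforge1.org/downloads/SpeechCorpus/Trunk/Audio/Main/16kHz_16bit/';
targetdir = 'voxforge_audio';   % hypothetical output folder
doDownload = false;             % hypothetical switch: true uses the network

if doDownload
    html = urlread(baseurl);
else
    % Invented sample listing so the parsing can be tried offline.
    html = '<a href="a.tgz">a.tgz</a> <a href="b.tgz">b.tgz</a>';
end

% Locate every '.tgz"' (end of an href value) and every double quote.
ext    = '.tgz"';
stops  = strfind(html, ext);
quotes = strfind(html, '"');

files = cell(1, numel(stops));
for k = 1:numel(stops)
    closing  = stops(k) + numel(ext) - 1;        % closing quote of the href
    opening  = max(quotes(quotes < stops(k)));   % nearest quote before the name
    files{k} = html(opening + 1 : closing - 1);
end

% untar accepts a URL, so each archive is fetched and unpacked in one call.
if doDownload
    for k = 1:numel(files)
        untar([baseurl files{k}], targetdir);
    end
end
```

Compared with regexp, the strfind version needs a little more index bookkeeping, but it avoids a separate unpacking step because untar downloads and extracts together.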

I'll post the address of my script when it comes up on the File Exchange.
