Thread Subject: Search Wikipedia Network

Subject: Search Wikipedia Network

From: Francesco Pozzi

Date: 31 Jul, 2008 06:27:01

Message: 1 of 2

Hi,
I was wondering if someone has found a quick solution to
this problem:

****************************************

Connect to article A of Wikipedia and find all the
articles that link to it (for example:
http://en.wikipedia.org/w/index.php?title=Special:WhatLinksHere/MATLAB&limit=500&namespace=0),
then create a list where
the first entry is A's address and the following entries
are the addresses of all the articles that link to A:

A, B, C, D, E, etc.;

Then search the network, starting from article B, and create
a list as before:

B, R, S, T, etc.;

Then go on with article C and create a list as before:

C, A, M, P, etc.;

Do the same for all articles that exist.

What you obtain at the end is a network of connections
between articles (a digraph), where each row is an article
followed by the articles that link to it:
A, B, C, D, E, ...
B, R, S, T, ...
C, A, M, P, ...
...
...

Then, from Special:LonelyPages
(http://en.wikipedia.org/w/index.php?title=Special:LonelyPages&limit=500&offset=0),
you could get the articles that exist and link to other pages
but that no other page links to.

****************************************

Thank you in advance.
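
The scheme above amounts to a breadth-first traversal over "what links here" lists. Here is a minimal Python sketch of that idea, under some assumptions: `fetch_backlinks` is a hypothetical callable (title in, list of linking titles out), and in this example it is backed by a small in-memory dictionary using the letters from the post rather than by Wikipedia itself.

```python
from collections import deque


def build_backlink_network(start, fetch_backlinks, max_pages=None):
    """Breadth-first crawl mapping each visited article to the articles
    that link to it.

    fetch_backlinks is a hypothetical callable (title -> list of titles);
    it stands in for whatever actually reads the "what links here" list.
    """
    network = {}
    queue = deque([start])
    while queue:
        # Optional safety cap so a crawl cannot grow without bound.
        if max_pages is not None and len(network) >= max_pages:
            break
        title = queue.popleft()
        if title in network:
            continue  # already expanded; avoids loops such as A -> C -> A
        backlinks = list(fetch_backlinks(title))
        network[title] = backlinks
        for other in backlinks:
            if other not in network:
                queue.append(other)
    return network


# Toy stand-in for Special:WhatLinksHere (assumed data, not real links).
TOY_BACKLINKS = {
    "A": ["B", "C", "D", "E"],
    "B": ["R", "S", "T"],
    "C": ["A", "M", "P"],
}


def toy_fetch(title):
    return TOY_BACKLINKS.get(title, [])


network = build_backlink_network("A", toy_fetch)
for title, backlinks in network.items():
    print(", ".join([title] + backlinks))
```

Note that a crawl seeded at A only reaches articles that link to A directly or through a chain of links; covering "all articles that exist" would need a seed list of every article.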

Subject: Search Wikipedia Network

From: Kris De Gussem

Date: 31 Jul, 2008 07:08:08

Message: 2 of 2

Hi Francesco,

If you followed this scheme, you would definitely run into problems, because
pages on the Wikimedia projects are constantly edited, so by the time you reach
the end of the analysis the links you collected at the start are no longer correct.
Instead, you should load a dump of the project that you want to study (available
from http://download.wikimedia.org/) into a local MySQL server. I guess
that the pagelinks.sql.gz files fulfill your needs. Then you can write a
program that queries the SQL server.
Anyhow, I wonder why you want to do this in MATLAB. Wouldn't C# with the MySQL server
and the MySQLdriver component be better suited?

Regards
K.
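
To illustrate the "query a local database" approach, here is a small Python sketch. It rests on several assumptions: it uses the standard-library `sqlite3` module with an in-memory database as a stand-in for a MySQL server, and the `links(src, dst)` table is a hypothetical simplification, not the actual schema of the pagelinks dump. The rows are the toy letters from the original post.

```python
import sqlite3

# In-memory SQLite database standing in for a local MySQL server (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical simplified table: one row per link "src links to dst".
conn.execute("CREATE TABLE links (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO links VALUES (?, ?)",
    [
        ("B", "A"), ("C", "A"), ("D", "A"), ("E", "A"),
        ("R", "B"), ("S", "B"), ("T", "B"),
        ("A", "C"), ("M", "C"), ("P", "C"),
    ],
)


def articles_linking_to(conn, title):
    """Return the sorted titles of all articles that link to `title`."""
    rows = conn.execute(
        "SELECT src FROM links WHERE dst = ? ORDER BY src", (title,)
    )
    return [row[0] for row in rows]


def lonely_pages(conn):
    """Return articles that link somewhere but are never a link target."""
    rows = conn.execute(
        "SELECT DISTINCT src FROM links "
        "WHERE src NOT IN (SELECT dst FROM links) ORDER BY src"
    )
    return [row[0] for row in rows]


print(articles_linking_to(conn, "A"))
print(lonely_pages(conn))
```

Because the whole link table is a single snapshot, the lists for A, B, C and so on are mutually consistent, which is the point of working from a dump rather than the live site.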


