Hi Francesco,
If you follow this scheme, you will definitely run into problems: pages on the
Wikimedia projects are edited constantly, so by the time your analysis is
finished, the links you collected at the start are no longer correct.
Instead, you should load a dump of the project that you want to study
(downloadable from http://download.wikimedia.org/) into a local MySQL server.
I guess that the pagelinks.sql.gz files fulfill your needs. Then you can write
a program that queries the SQL server.
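A minimal sketch of such a query, written in Python rather than MATLAB or C#.
To keep it self-contained, an in-memory sqlite3 database stands in for the
local MySQL server. The table and column names (page.page_id,
page.page_namespace, page.page_title, pagelinks.pl_from, pl_namespace,
pl_title) are assumed from the classic MediaWiki schema, so check them against
the dump you actually download, since newer dumps may lay out pagelinks
differently. The toy rows are hypothetical.

```python
import sqlite3

# sqlite3 stands in for the local MySQL server; the SELECT itself is plain SQL.
# Assumption: classic MediaWiki schema, pagelinks(pl_from, pl_namespace, pl_title)
# and page(page_id, page_namespace, page_title).
def backlinks(conn, title, namespace=0):
    """Return titles of main-namespace articles that link to `title` in `namespace`."""
    rows = conn.execute(
        """
        SELECT p.page_title
        FROM pagelinks AS pl
        JOIN page AS p ON p.page_id = pl.pl_from
        WHERE pl.pl_namespace = ? AND pl.pl_title = ?
          AND p.page_namespace = 0
        ORDER BY p.page_title
        """,
        (namespace, title),
    )
    return [row[0] for row in rows]

# Hypothetical toy rows shaped like the dump tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER, page_title TEXT);
    CREATE TABLE pagelinks (pl_from INTEGER, pl_namespace INTEGER, pl_title TEXT);
    INSERT INTO page VALUES (1, 0, 'MATLAB'), (2, 0, 'Octave'), (3, 0, 'Simulink');
    INSERT INTO pagelinks VALUES (2, 0, 'MATLAB'), (3, 0, 'MATLAB'), (1, 0, 'Simulink');
    """
)
print(backlinks(conn, "MATLAB"))  # ['Octave', 'Simulink']
```

With a real MySQL driver the SELECT stays the same, although many Python
drivers use %s rather than ? as the parameter placeholder. Note also that
titles in the dump use underscores instead of spaces.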
Anyway, I wonder why you want to do this in MATLAB. Wouldn't C# with the MySQL
server and a MySQL driver component be better suited?
Regards
K.
Francesco Pozzi wrote:
> Hi,
> I was wondering if someone has found a quick solution to
> this problem:
>
> ****************************************
>
> Connect to article A of Wikipedia and find all those
> articles that link there (for example:
> http://en.wikipedia.org/w/index.php?title=Special:WhatLinksHere/MATLAB&limit=500&namespace=0),
> create a list where
> the first entry is A's address and the following entries
> are the addresses of all the articles that link to A:
>
> A, B, C, D, E, etc.;
>
> Then search the network, starting from article B and create
> a list as before:
>
> B, R, S, T, etc.;
>
> Then go on with article C and create a list as before:
>
> C, A, M, P, etc.;
>
> Do the same for all articles that exist.
>
> What you obtain at the end is a network of connections
> between articles (a digraph).
> A, B, C, D, E, ...
> B, R, S, T, ...
> C, A, M, P, ...
> ...
> ...
>
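Each of these lists is simply the set of incoming edges of one node in the
link digraph, so once you have all links as (source, target) pairs (for
instance from the pagelinks table) you don't need to crawl article by article:
one pass over the pairs builds every list. A minimal Python sketch, assuming
titles are plain strings and the pairs fit in memory; the function name is
hypothetical.

```python
from collections import defaultdict

def whatlinkshere_lists(links):
    """Build one list per article: [article, *articles that link to it].

    `links` is an iterable of (source, target) pairs meaning "source links to target".
    """
    incoming = defaultdict(set)  # target -> set of sources linking to it
    articles = set()             # every title seen as a source or a target
    for source, target in links:
        articles.update((source, target))
        incoming[target].add(source)
    # Sorted output keeps the result deterministic.
    return [[article, *sorted(incoming[article])] for article in sorted(articles)]

links = [("B", "A"), ("C", "A"), ("A", "C"), ("R", "B")]
print(whatlinkshere_lists(links))
# [['A', 'B', 'C'], ['B', 'R'], ['C', 'A'], ['R']]
```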
> Then (here:
> http://en.wikipedia.org/w/index.php?title=Special:LonelyPages&limit=500&offset=0)
> you could get
> those articles which exist and link to other pages but
> which no other page links to.
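Given the same (source, target) pairs, the pages in this definition (they link
to at least one other page, but no other page links to them) fall out as a set
difference. A sketch under the same assumptions as before; the function name is
hypothetical, and self-links are ignored because they don't involve another
page.

```python
def lonely_pages(links):
    """Articles that link to at least one other page but receive no links from other pages."""
    has_outgoing = set()
    has_incoming = set()
    for source, target in links:
        if source != target:  # a self-link doesn't count as "another page"
            has_outgoing.add(source)
            has_incoming.add(target)
    return sorted(has_outgoing - has_incoming)

links = [("B", "A"), ("C", "A"), ("A", "C"), ("R", "B")]
print(lonely_pages(links))  # ['R']
```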
>
> ****************************************
>
> Thank you in advance.