Text mining with MATLAB of affiliation strings of a publication database

Question

pietro on 17 Nov 2017


Direct link to this question

https://www.mathworks.com/matlabcentral/answers/367728-text-mining-with-matlab-of-affiliation-strings-of-a-pubblication-database

Closed: John D'Errico on 18 Nov 2017


Hi all,

I want to carry out an authorship analysis using complex networks, so I downloaded data from Scopus as a CSV file. Each node (that is, each author) will be identified by the combination of name and affiliation code, where the affiliation can be something like "University of London". This way the result is not biased by different authors who share the same name. Extracting the author name is easy, but the affiliation is not, because the affiliation strings have no standard structure. They appear in many forms, such as "university of XXX…", "XXX university…", "Department of YYY…", or the acronym of the department, and the address is not always included. In a few cases the affiliation lacks detail and is simply "university of XXX". This makes it rather challenging to assign the correct affiliation code to each affiliation string. I partially solved the problem using the following approach:

1- Manually define a word bank for each affiliation (street name, city, acronym of the department, etc.).
2- Split each affiliation string into single-word substrings.
3- Compare each set of substrings with the word bank of each affiliation; the likely affiliation is the one whose word bank has the largest intersection with the substrings.
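The word-bank matching described above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the code actually used: the affiliation codes (`UNI_LONDON`, `UNI_MILAN`), the keywords in each bank, and the example affiliation string are all hypothetical.

```matlab
% Step 1: hypothetical word banks, one per affiliation code.
% Codes and keywords are made up for illustration only.
codes = {'UNI_LONDON', 'UNI_MILAN'};
banks = {
    {'london', 'ucl'}
    {'milan', 'milano', 'unimi'}
};

% Hypothetical affiliation string, as it might appear in the exported CSV.
affiliation = 'Dept. of Physics, University of Milan, Milano, Italy';

% Step 2: split the string into unique lowercase single-word tokens.
tokens = unique(regexp(lower(affiliation), '[a-z0-9]+', 'match'));

% Step 3: count the words shared with each bank and keep the largest overlap.
overlap = cellfun(@(bank) numel(intersect(tokens, bank)), banks);
[bestScore, bestIdx] = max(overlap);

if bestScore == 0
    bestCode = '';              % no bank matched: leave for manual review
else
    bestCode = codes{bestIdx};  % ties go to the first bank with the top score
end
```

Note that `max` returns the first index when two banks tie, so a string matching several banks equally well is assigned somewhat arbitrarily; such cases, like those with a zero score, would probably need to be flagged for manual checking.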

Unfortunately, this approach doesn't work as well as expected: in many cases the affiliation code is wrongly assigned, and it requires more manual work than I thought. Is there a better method than the one I adopted?

Thank you

Best regards


This question is closed.

Answers (0)
