Institute for Web Science and Technologies · Universität Koblenz - Landau

Evolution of Wikipedia Hyperlink Networks (ICWSM 2013)


These datasets contain the temporal hyperlink networks of the largest language Wikipedias. They record both the addition and the removal of hyperlinks, each with a timestamp.

NOTE: The datasets are not yet publicly available.


The datasets were used in the following paper by Julia Preusse et al.:

[1] Structural Dynamics of Knowledge Networks, Julia Preusse, Jérôme Kunegis, Matthias Thimm, Thomas Gottron, Steffen Staab. Proc. Int. Conf. on Weblogs and Social Media (ICWSM), 2013, pp. 506–515.

Please cite this paper when using the datasets.  


The datasets are available as part of the Koblenz Network Collection (KONECT):

The datasets were extracted from the four biggest Wikipedias except the English one, using the dumps available here.


Each file consists of lines in the following format:

id1, id2, op, tstamp

where id1 and id2 are the ids of Wikipedia articles, tstamp is the Unix timestamp of the operation, and op is the operation, defined as follows.

op is 1 if a link from id1 to id2 was added, -1 if a link from id1 to id2 was removed, and 0 if article id1 was only updated textually. In the last case, id2 is set to -1.
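
The line format described above could be read and replayed roughly as follows. This is a minimal sketch, not the code used for the paper: the names Event, parse_events and snapshot are hypothetical, and it assumes that fields are separated by commas with optional spaces, as in the format line, and that a link is identified by the pair (id1, id2).

```python
from typing import Iterable, Iterator, NamedTuple, Set, Tuple


class Event(NamedTuple):
    """One line of a dataset file (hypothetical helper type)."""
    id1: int     # source article id
    id2: int     # target article id, or -1 for a textual update
    op: int      # 1 = link added, -1 = link removed, 0 = textual update
    tstamp: int  # Unix timestamp of the operation


def parse_events(lines: Iterable[str]) -> Iterator[Event]:
    """Parse lines of the form 'id1, id2, op, tstamp'.

    Assumption: comma-separated fields, optional surrounding whitespace.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        id1, id2, op, tstamp = (int(field) for field in line.split(","))
        yield Event(id1, id2, op, tstamp)


def snapshot(events: Iterable[Event], until: int) -> Set[Tuple[int, int]]:
    """Return the set of links (id1, id2) present at time `until`.

    Events are replayed in timestamp order; textual updates (op == 0)
    do not change the link set.
    """
    links: Set[Tuple[int, int]] = set()
    # sorted() is stable, so events sharing a timestamp keep file order.
    for ev in sorted(events, key=lambda e: e.tstamp):
        if ev.tstamp > until:
            break
        if ev.op == 1:
            links.add((ev.id1, ev.id2))
        elif ev.op == -1:
            links.discard((ev.id1, ev.id2))
    return links


if __name__ == "__main__":
    # Made-up sample lines for illustration only, not taken from the datasets.
    sample = [
        "10, 20, 1, 1000",   # link 10 -> 20 added
        "10, -1, 0, 1500",   # article 10 updated textually
        "10, 20, -1, 2000",  # link 10 -> 20 removed
        "30, 10, 1, 2500",   # link 30 -> 10 added
    ]
    print(snapshot(parse_events(sample), until=1800))
```

To read a real file, the lines of an opened file object can be passed to parse_events in place of the sample list.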