PINTS - Experiments Data Sets

The self-organizing structure of peer-to-peer architectures and their almost unlimited resource capacity make them very attractive for large-scale sharing of annotated data in Web 2.0 scenarios. We addressed the problem of aggregating and using information in a decentralized tagging environment, introduced a vector space model for characterizing users, resources, and tags, and analyzed how to construct a reliable approximation of the feature vectors in a fully decentralized setting.
A large-scale, systematic evaluation with realistic data sets was carried out to demonstrate the viability of our approach.

Data Sets

Two large-scale folksonomy data sets were used for the simulation of PINTS. They were obtained by systematically crawling the Flickr and Del.icio.us portals during 2006 and 2007. The crawls were done in the context of the Tagora project and targeted the core elements of a folksonomy: users, tags, resources, and tag assignments.

The statistics of the crawled data sets are summarized below:

Dataset    Users    Tags       Resources   Tag assignments  Download
Flickr     319,686  1,607,879  28,153,045  112,900,000      flickr_UsrResTag.7z (518 MB, packed with 7zip)
Delicious  532,924  2,481,698  17,262,480  140,126,586      delicious_UsrResTag.7z (848 MB, packed with 7zip)

Each archive is compressed with 7zip and contains a single text file of time-ordered tag assignments in four tab-separated columns. The columns are, in order: posting date, user ID, resource ID, and tag label.
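
The column layout described above can be read with a few lines of Python. The following is a minimal sketch, not part of the original distribution: the names `TagAssignment`, `read_assignments`, and `summarize` are hypothetical, the posting date is kept as an opaque string because its exact format is not stated here, and lines that do not have exactly four fields are skipped as an assumption about how to treat malformed input.

```python
import io
from typing import Dict, Iterable, Iterator, NamedTuple


class TagAssignment(NamedTuple):
    """One line of the data file (hypothetical helper type)."""
    posted: str       # posting date, kept as text; the format is not assumed
    user_id: str
    resource_id: str
    tag: str


def read_assignments(lines: Iterable[str]) -> Iterator[TagAssignment]:
    """Yield one TagAssignment per well-formed, tab-separated line."""
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 4:
            continue  # assumption: silently skip malformed lines
        yield TagAssignment(*fields)


def summarize(assignments: Iterable[TagAssignment]) -> Dict[str, int]:
    """Count distinct users, tags, and resources, and all tag assignments."""
    users, tags, resources = set(), set(), set()
    total = 0
    for a in assignments:
        users.add(a.user_id)
        tags.add(a.tag)
        resources.add(a.resource_id)
        total += 1
    return {
        "users": len(users),
        "tags": len(tags),
        "resources": len(resources),
        "tag_assignments": total,
    }


# Usage with a tiny in-memory sample; the dates and IDs are made up.
sample = io.StringIO(
    "2006-01-01\tu1\tr1\tsunset\n"
    "2006-01-02\tu1\tr2\tbeach\n"
    "2006-01-03\tu2\tr1\tsunset\n"
)
print(summarize(read_assignments(sample)))
```

For the real data, the extracted text file can be passed in place of `sample`, for example via `open(path, encoding="utf-8")`, where both the path and the UTF-8 encoding are assumptions. Reading line by line avoids loading the whole file at once, although the sets of distinct IDs still grow with the size of the data set.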

Contact

Olaf Görlitz
Prof. Dr. Dr. Sergej Sizov
Prof. Dr. Steffen Staab

Publications

2008

Görlitz, Olaf; Sizov, Sergej; Staab, Steffen (2008): PINTS: Peer-to-Peer Infrastructure for Tagging Systems. In: Proceedings of the Seventh International Workshop on Peer-to-Peer Systems, IPTPS. Tampa Bay, USA: