This category lists users of the downloadable data set who use the data in accordance with the Curlie license. More about downloading Curlie directory data under an Open Source license.
In this doctoral thesis, Pelayo Vallina Rodríguez examines various criteria for the quality of web services using Curlie data, including the timeliness of data, plausible application of the taxonomy, multilingual classification of domains, and the frequency of identical site content. Universidad Carlos III de Madrid / IMDEA Networks Institute, 2022, 163 pages. [PDF]
Richard Zowalla, Thomas Wetter, and Daniel Pfeifer from Heilbronn University and Heidelberg University used the World/Deutsch/Gesundheit branch dataset to examine the structure, origin, plausibility, and familiarity of health sites in Austria, Switzerland and Germany. The results were incorporated into further scientific research. Journal of Medical Internet Research, 2020.
Explains how the Curlie dataset is used in software for language-independent classification and embedding of arbitrary websites. The concept, model architecture, features, and training phase of a neural network, as well as its evaluation, are described with diagrams. Sylvain Lugeon, Tiziano Piccardi, Robert West from Ecole polytechnique fédérale de Lausanne (EPFL), 2022, 7 pages.
Development of a text classifier that identifies URLs with sensitive content based on criteria such as religion, health, sexual orientation, and others. Curlie data was used to train the classifier in order to help implement data protection laws. Srdjan Matic and Georgios Smaragdakis from Technische Universität Berlin, Costas Iordanou from Cyprus University of Technology, and Nikolaos Laoutaris from IMDEA Networks Institute Madrid; 2020, 15 pages.
Study on the prevalence of plain language on the German web using web analytics and qualitative methods based on Curlie data. Technical and political recommendations for a barrier-free web are provided. Hadi Asghari, Freya Hewett, Theresa Züger from Alexander von Humboldt Institute for Internet and Society Berlin, Germany; 2023, 6 pages.
The data in the Curlie directory are part of the Open Web Index, which is maintained and developed at the University of Passau, Germany. The site provides information about the index and about research activities in information science and other fields.
The Curlie taxonomy acts as a lexical standard for classifying words and word meanings from web texts in text collections covering over 35 languages. The collections are used in linguistics to validate linguistic rules, to research the frequency of terms and word meanings, and to trace language development. The target size of each collection is 10 billion words (“tenTen”).
Anubhaw Kumar Suman and Dr. Madhu Patel from the Mahatma Gandhi Central University, India, describe the history, types, structure, and function of web directories, including Curlie. They examine advantages and disadvantages and explain target-group-specific aspects (academic, kids, family, etc.) and relevance. International Conference on Knowledge Management in Higher Education Institutions at Manipal University Jaipur, India. April 2022, 18 pages.
The study examines the role of Wikipedia in the “information ecosystem” by looking at how Wikipedia is linked across the Web, providing foundational estimates of its presence and influence. Curlie's taxonomy and statistical data were used to evaluate the collected data. The frequency of citations and references to Wikipedia and the use of images from the encyclopedia were measured. By Veselovsky (Princeton University), Piccardi (Stanford University) et al.; 2025, 13 pages. [PDF]


Last update:
February 22, 2026 at 22:07:34 UTC