Data Users

This category lists users of the downloadable data set who use the data in accordance with the Curlie license. More information is available about downloading Curlie directory data under an open-source license.

Advanced Methods to Audit Online Web Services

In this doctoral thesis, Pelayo Vallina Rodríguez uses Curlie data to examine several quality criteria for web services: the timeliness of data, the plausible application of taxonomy, multilingual classification of domains, and the frequency of identical site content, among others. Universidad Carlos III de Madrid / IMDEA Networks Institute, 2022, 163 pages. [PDF]

Crawling the German Health Web: Exploratory Study and Graph Analysis

Richard Zowalla, Thomas Wetter, and Daniel Pfeifer from Heilbronn University and Heidelberg University used the World/Deutsch/Gesundheit branch dataset to examine the structure, origin, plausibility, and familiarity of health sites in Austria, Switzerland and Germany. The results were incorporated into further scientific research. Journal of Medical Internet Research, 2020.

Homepage2Vec: Language-Agnostic Website Embedding and Classification

Explains how the Curlie dataset was used to build software for language-independent classification and embedding of arbitrary websites. The concept, model architecture, features, and training phase of the neural network, as well as its evaluation, are described with diagrams. Sylvain Lugeon, Tiziano Piccardi, and Robert West from École polytechnique fédérale de Lausanne (EPFL), 2022, 7 pages.

Identifying Sensitive URLs at Web-Scale

Development of a text classifier that identifies URLs with sensitive content based on criteria such as religion, health, sexual orientation, and others. Curlie data was used to train the classifier, in support of implementing data protection laws. Srdjan Matic and Georgios Smaragdakis from Technische Universität Berlin, Costas Iordanou from Cyprus University of Technology, and Nikolaos Laoutaris from IMDEA Networks Institute Madrid; 2020, 15 pages.

On the Prevalence of Leichte Sprache on the German Web

Study on the prevalence of plain language on the German web using web analytics and qualitative methods based on Curlie data. Technical and political recommendations for a barrier-free web are provided. Hadi Asghari, Freya Hewett, Theresa Züger from Alexander von Humboldt Institute for Internet and Society Berlin, Germany; 2023, 6 pages.

Open WebSearch - Open Search Foundation e.V.

The data in the Curlie directory are part of the Open Web Index, which is maintained and developed at the University of Passau, Germany. The site provides information about the index and about research activities in information science and other fields.

TenTen Corpus Family - Lexical Computing CZ s.r.o.

The Curlie taxonomy acts as a lexical standard for classifying words and word meanings from web texts in text collections covering over 35 languages. The collections are used in linguistics to validate linguistic rules, to research the frequency of terms and word meanings, and to trace language development, among other uses. The target size of each collection is 10 billion words (“tenTen”).

Web Directories: A Searching Tool

Anubhaw Kumar Suman and Dr. Madhu Patel from the Mahatma Gandhi Central University, India, describe the history, types, structure, and function of web directories, including Curlie. Advantages and disadvantages are examined, and aspects specific to target groups (academic, kids, family, etc.) and relevance are explained. International Conference on Knowledge Management in Higher Education Institutions at Manipal University Jaipur, India. April 2022, 18 pages.

Web2Wiki: Characterizing Wikipedia Linking Across the Web

The study characterizes how Wikipedia is linked across the Web in order to examine its role in the “information ecosystem”, providing foundational estimates of its presence and influence. Curlie's taxonomy and statistical data were used to evaluate the collected data. The frequency of citations and references to Wikipedia and the use of images from the encyclopedia were measured. By Veselovsky (Princeton University), Piccardi (Stanford University) et al.; 2025, 13 pages. [PDF]

Last update: February 22, 2026 at 22:07:34 UTC
