Wikidata


The first idea of what was to become Wikidata was born in early May 2005. Markus Krötzsch, Max Völkel, and Denny Vrandečić had each recently joined the research group of Rudi Studer at the University of Karlsruhe, a leading location of Semantic Web research. Fascinated by the Wikipedia concept, they found it natural to ask how the Semantic Web ideas of explicit specification and machine-readable processing could contribute. Vrandečić proposed annotating links on Wikipedia pages, inspired by the notion of typed links, a well-known concept in hypertext. The result was the early concept of "Semantic Wikipedia", a proposal to use annotations in wikitext markup for embedding structured data into Wikipedia articles.

Krötzsch and Vrandečić presented the idea at the first-ever Wikimania conference on August 5th, 2005, and called for volunteers to implement it. In the 48 hours after their talk, they created a related community portal with details on the project goal, implementation plan, and envisioned applications. Within a month, the idea had gathered vocal supporters. The German company DocCheck stepped up to donate to this effort, leading to the first implementation of "Semantic MediaWiki" (SMW). SMW saw its first release, 0.1, on September 29th, 2005.

Between 2005 and 2012, the original Semantic Wikipedia evolved into the first accurate concept of Wikidata. Erik Möller, by then Deputy Director of the Wikimedia Foundation, was the driving force behind a major change: he favored a single Wikidata for all languages. Already in his original Wikidata proposal in 2004, he had envisioned a solution "to centrally store and manage data from all Wikimedia projects". The resulting design combined this idea with the more fluid, graph-based data model of Semantic Wikipedia.

Based on the long-standing interest in structured data around Wikimedia projects, Danese Cooper, then CTO of the Wikimedia Foundation, convened the Wikimedia Data Summit in February 2011. Tim O'Reilly hosted the summit at the headquarters of O'Reilly in Sebastopol. The invitees included representatives from the Wikimedia Foundation, Freebase, DBpedia, Semantic MediaWiki, R.V. Guha from Google, Mark Greaves from Paul Allen's Vulcan, and others. Many different ideas were discussed, but a rough consensus emerged among some participants, which prompted Vrandečić to start writing a proposal for what was at first called data.wikimedia.org.

The project proposal draft was presented to the community by Vrandečić at Wikimania 2011 in Haifa. At that event, Vrandečić met Qamarniso Ismailova, an administrator of the Uzbek Wikipedia. They married in August 2012. The Q prefix in QIDs, used as identifiers in Wikidata, is the first letter in her name.

Möller made it clear that the Wikimedia Foundation would be reluctant to take on a project of this scale. Instead, he identified the German chapter, Wikimedia Deutschland (WMDE), as a good potential host for the development. Pavel Richter, then Executive Director of Wikimedia Deutschland, took the proposal to WMDE's Board, which decided to accept Wikidata as a new Wikimedia project in June 2011.

With the help of Lisa Seitz-Gruwell at the Wikimedia Foundation, they secured €1.3 million in funding for the project, half from the Allen Institute for AI and a quarter each from Google and the Gordon and Betty Moore Foundation.

Later, at least one major donor dropped out because the project proposal insisted that the ontology of Wikidata had to be community-controlled, and would be neither pre-defined by professional ontologists nor imported from existing ontologies. Possible funders were also worried that the project did not plan to bulk-upload DBpedia to kick-start the content. Vrandečić was convinced that neither of these requirements would have had a positive effect on the organic growth of the community. Convinced in turn that the project would fail because of that, these funders dropped out.

The development of Wikidata began on April 1st, 2012 in Berlin.

Google announced the Knowledge Graph in May 2012, which had a lasting positive impact on interest in Wikidata.

Wikidata launched on October 29, 2012.

One surprisingly contentious aspect was the use of numeric QIDs. Discussions with Metaweb about their experience with Freebase were a major influence in preferring abstract QIDs. Decoupling a concept's name from its ID can increase stability. More importantly, the founders of Wikidata did not want to use an Anglo-centric solution, nor suggest the use of many different language-specific identifiers for the same item.
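The stability argument above can be illustrated with a minimal sketch. The `Item` class and its `label` method are hypothetical and are not part of any Wikidata software; the identifiers Q183 (Germany), Q64 (Berlin), and P17 (country) are real Wikidata IDs used only as sample data.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Item:
    """Hypothetical model of an item: one abstract ID, many labels."""
    qid: str
    labels: Dict[str, str] = field(default_factory=dict)

    def label(self, lang: str, fallback: str = "en") -> str:
        # Prefer the requested language, then the fallback, then the bare ID.
        return self.labels.get(lang) or self.labels.get(fallback) or self.qid


germany = Item("Q183", {"en": "Germany", "de": "Deutschland", "fr": "Allemagne"})

# A statement refers to the item by its ID, never by a name:
# Berlin (Q64) -- country (P17) --> Germany (Q183)
statement = ("Q64", "P17", germany.qid)

# Renaming the English label leaves every reference to Q183 intact.
germany.labels["en"] = "Federal Republic of Germany"
```

Because the statement stores only `Q183`, changing or adding a label in any language does not touch it, and no single language's name is privileged as the identifier.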

In September 2015, the initial Wikidata Query tool by Magnus Manske had served its purpose as a feasibility study and demo, and the Wikidata Query Service (WDQS) was launched. WDQS is a Blazegraph-based SPARQL endpoint that gives real-time access to an RDF version of the data in Wikidata. Its goal is to enable applications and services on top of Wikidata. In 2016, Google closed down Freebase and helped with the migration of the data to Wikidata.
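As a rough illustration of how an application might address such a SPARQL endpoint, the sketch below only builds a request URL and sends nothing over the network. The endpoint address `https://query.wikidata.org/sparql` and the `format=json` parameter are not given in the text above and are stated here as assumptions about the public service; `build_wdqs_url` is a hypothetical helper, and the query (instances of house cat, Q146) is sample data.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Assumed address of the public WDQS SPARQL endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_wdqs_url(sparql: str) -> str:
    """Return a GET URL that submits `sparql` and asks for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": sparql, "format": "json"})


# Sample query: up to five items that are an instance of (P31) house cat (Q146).
EXAMPLE_QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 5
""".strip()

url = build_wdqs_url(EXAMPLE_QUERY)
```

The resulting `url` could then be fetched with any HTTP client. A real client should also send a descriptive User-Agent header and handle timeouts, which this sketch omits.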

In 2018, Wikidata was extended to also be able to cover lexicographical data, in order to provide a machine-readable dictionary.

In the 2020s, however, WDQS has started to become a bottleneck, as the growth of Wikidata has outpaced the development of open-source triplestores.

The newest wave of expansion is the Wikibase Ecosystem, where groups and organizations outside Wikimedia use the underlying software that powers Wikidata to run their own knowledge graphs, which are often highly interconnected with Wikidata and other Wikibase instances.

Two notable Wikimedia projects under current development are Wikifunctions and Abstract Wikipedia. Abstract Wikipedia is working towards extending knowledge representation beyond Wikidata such that one can abstractly capture the contents and structures of Wikipedia articles. Building on top of the data and lexicographic knowledge in Wikidata, the abstract representation will then be used to generate encyclopedic content in many more languages. Wikifunctions in turn is envisioned as a wiki-based repository of executable functions, described in community-curated source code. These functions will be used to access and transform data in Wikidata, in order to generate views on the data.

References

  • Denny Vrandečić, Lydia Pintscher, Markus Krötzsch (2023). "Wikidata: The Making Of". Companion Proceedings of the ACM Web Conference 2023.