The Art Markets of Seville Database compiles historical sources from Sevillian archives on painting, sculpture and other crafts, and uses NLP to mine them for information such as actors, locations, dates, objects and monetary amounts. This repository holds the resources and code used to create the database.
Background
During the 16th and 17th centuries, Seville, Spain, was the locus of the world’s largest trade flows. From 1503 to 1717, it housed the Casa de Contratación, the institution that centralized trade between Europe and the Spanish colonies in the Americas. As a result, the city ballooned in wealth and population, and as it specialized in trade, Sevillian artists started exporting their works abroad.
Uses
This GitHub repository is meant as a standalone project, but also as a resource for those conducting similar Digital Humanities initiatives. If you are working on computational ways of extracting information from archival documentation in historical languages through NER, it offers a model and resources that can be tailored to your needs. In particular, it provides usable resources for documents in early modern Spanish, created for a database composed primarily of notarial and parish documents from the city of Seville.
Data Source
This database includes information taken from 20 volumes published throughout the 19th and 20th centuries. These books compiled documents from several Sevillian archives on the activities of various local painters, sculptors, gilders, stonemasons, and architects, among other less common occupations. The books were OCR scanned, corrected for mistakes, and then divided into texts using OpenRefine.
Texts are stored as individual records within the database and usually (though not always) refer to a single archival document, either in transcription or summarized form. These texts are often accompanied by footnotes and comments that are included as an attribute of the text in the database. Where possible, we have included the archival reference to the original source, to the extent provided by the researchers that edited the published volumes.
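To make the shape of a text record concrete, here is a minimal sketch in Python. The class name, field names and placeholder values are hypothetical and chosen only for illustration; they are not the database's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TextRecord:
    """One text in the database (hypothetical field names)."""
    text_id: int
    body: str                                   # transcription or summary of the document
    is_summary: bool = False                    # True when the volume summarizes the document
    footnotes: List[str] = field(default_factory=list)
    comments: List[str] = field(default_factory=list)
    archival_reference: Optional[str] = None    # None when the published volume gives none
    bibliographic_source: Optional[str] = None  # the published volume the text came from


# Placeholder values only; not taken from the database.
record = TextRecord(
    text_id=1,
    body="(placeholder transcription)",
    footnotes=["(placeholder footnote)"],
)
```

Keeping the archival reference optional mirrors the fact that it is recorded only where the editors of the published volumes provided it.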
Data Gathering
The database is meant not only as a repository of documents, but also as a repository of the information these documents contain. Separate tables register the actors, locations, objects, dates and monetary amounts present in each document, as well as the attributes of the document itself (archival reference, bibliographic source, footnotes and comments). Entities were extracted with the Named Entity Recognizer provided by spaCy (the medium Spanish model, es_core_news_md). This model was retrained on a set of manually tagged training data, annotated on DataTurks.com, to improve its performance on our data.
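As an illustration of the step between manual tagging and retraining, the sketch below converts one line of a Dataturks-style JSON export into character-offset spans of the form (start, end, label), a shape commonly used when preparing NER training data. It rests on assumptions: the export layout ("content", "annotation", "label", "points") and the inclusive end offsets are assumed rather than taken from this project's files, the function name is hypothetical, and the sample sentence and labels are invented.

```python
import json
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label)


def dataturks_to_spans(line: str) -> Tuple[str, List[Span]]:
    """Convert one JSON line of a Dataturks-style export to (text, spans).

    Assumption: each annotation carries a "label" list and a "points" list
    whose "start"/"end" character offsets are inclusive.
    """
    item = json.loads(line)
    text = item["content"]
    spans: List[Span] = []
    for annotation in item.get("annotation") or []:
        for point in annotation["points"]:
            for label in annotation["label"]:
                # Shift the inclusive end by one to get an exclusive end offset.
                spans.append((point["start"], point["end"] + 1, label))
    return text, sorted(spans)


# Invented sample line, for illustration only.
sample = json.dumps({
    "content": "Juan de Mesa, escultor, vecino de Sevilla",
    "annotation": [
        {"label": ["PER"], "points": [{"start": 0, "end": 11, "text": "Juan de Mesa"}]},
        {"label": ["LOC"], "points": [{"start": 34, "end": 40, "text": "Sevilla"}]},
    ],
})
text, spans = dataturks_to_spans(sample)
```

Slicing the text with each span, for example text[start:end], is a quick way to check that the offsets line up with the tagged strings before any training run.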
Basic Facts
GitHub Repository
This GitHub repository is meant to record the development of the database, including the code used, training sets, output and further resources.
Index
Document Viewer: A tool to search for strings within documents contained in the database.
Research Completeness: An overview of the texts included in the database, by published source and original archival source.
Tags, Summary Visualizations: Visualizations that describe the strings identified as entities (people, locations, organizations, dates, monetary amounts and objects).
NER Resources: Resources for replicating, learning from or expanding on the NER employed in this project.
Proprietary Notice
This project was developed by Felipe Álvarez de Toledo as part of a Ph.D. dissertation in the Department of Art, Art History and Visual Studies at Duke University and as part of DALMI, the Duke Art, Law and Markets Initiative.
License
This project is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. This license applies to the database and its contents and the resources made available in this repository. It does not apply to the texts included in the database themselves, which were taken from published sources.