Terminology used in sessions#

Text analysis glossary#

DH terms#

Specific terms#

API#

  • What’s an API?

“An API, or Application Programming Interface, is, in general, a defined way for different parts of software systems to talk to each other. A software system is made of many individual parts that must work together to respond to commands or requests. When humans need to talk to each other, we try to use a language that all parties involved can understand, and that language has rules and guidelines that govern how we form words and sentences with particular meanings. Similarly, APIs are built with rules that must be followed for successful interactions.” (Source: Getting Data for Digital Humanities with APIs: A Gentle Introduction)
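To make the idea of "rules that must be followed" concrete, here is a minimal Python sketch of how a request to a web API is often put together: a base address plus named parameters, encoded in a fixed format. The address https://api.example.org/search and the parameter names q and format are hypothetical placeholders, not a real service; a real API documents its own address and parameters.

```python
from urllib.parse import urlencode


def build_request_url(base_url, params):
    """Combine a base address and named parameters into one request URL."""
    # urlencode applies the URL "grammar": key=value pairs joined by "&",
    # with spaces and special characters escaped.
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and parameter names, for illustration only.
url = build_request_url(
    "https://api.example.org/search",
    {"q": "text mining", "format": "json"},
)
print(url)  # https://api.example.org/search?q=text+mining&format=json
```

If a parameter name is misspelled or a required one is left out, the API will typically reject the request or return something unexpected, much as a sentence that breaks the rules of a language may not be understood.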

GitHub & GitHub repositories#

Jupyter Notebooks#

Web Scraping#

  • “Web scraping, web harvesting, or web data extraction is data scraping used for extracting data from websites. While web scraping can be done manually by a software user, the term typically refers to automated processes implemented using a bot or web crawler. It is a form of copying in which specific data is gathered and copied from the web, typically into a central local database or spreadsheet, for later retrieval or analysis.” (Source: Wikipedia)

  • General rules: You can probably scrape any data that is publicly available and not copyrighted. Commercial use of scraped data may be restricted. Do not scrape sites that require authentication (e.g. library subscription databases).

  • Check for an API before scraping

    • Some APIs are accessed through a publicly available URL, such as the HathiTrust Bibliographic API. This API provides programmatic access to bibliographic metadata for volumes in the HathiTrust. The metadata is retrieved using a specially formatted URL and the volume ID number.
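As a rough illustration of retrieving metadata with "a specially formatted URL and the volume ID number", the Python sketch below builds such a URL and, optionally, downloads the record. The URL pattern (catalog.hathitrust.org/api/volumes, then a detail level, an identifier type, and the ID followed by .json) is an assumption based on the API's public documentation and should be checked against the current version. The function names are hypothetical, and the volume ID is a placeholder to replace with a real one.

```python
import json
from urllib.request import urlopen

# Assumed base address of the HathiTrust Bibliographic API; verify before use.
BASE_URL = "https://catalog.hathitrust.org/api/volumes"


def bib_api_url(volume_id, detail="brief", id_type="htid"):
    """Build the request URL for one volume (hypothetical helper)."""
    return f"{BASE_URL}/{detail}/{id_type}/{volume_id}.json"


def fetch_metadata(volume_id):
    """Download and parse the metadata record (requires network access)."""
    with urlopen(bib_api_url(volume_id), timeout=30) as response:
        return json.load(response)


# Placeholder volume ID; substitute the ID of a volume you are interested in.
print(bib_api_url("mdp.39015000000000"))
```

Calling fetch_metadata with a real volume ID should return the record as a Python dictionary, which can then be explored in a Jupyter Notebook without scraping any web pages.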

Wikipedia#

  • When getting started with computational methods, looking up terms (such as Text Mining) on Wikipedia is helpful.