Terminology used in sessions#
Text analysis glossary#
DH terms#
Specific terms#
API#
What’s an API?
“An API, or Application Programming Interface, is, in general, a defined way for different parts of software systems to talk to each other. A software system is made of many individual parts that must work together to respond to commands or requests. When humans need to talk to each other, we try to use a language that all parties involved can understand, and that language has rules and guidelines that govern how we form words and sentences with particular meanings. Similarly, APIs are built with rules that must be followed for successful interactions.” Getting Data for Digital Humanities with APIs: A Gentle Introduction
GitHub & GitHub repositories#
GitHub is a web-based platform built on Git, an open-source version control system that lets multiple people make separate changes to the same files at the same time. It provides version control for files used in collaborative projects, such as workshops.
Jupyter Notebooks#
Jupyter notebooks are documents that contain computer code (such as Python or R) alongside explanatory text, images, figures, videos, and links. Because they combine explanatory text with executable code, they are commonly used for introductory workshops and sessions that involve code.
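To make the mix of text and code concrete, here is a minimal sketch of what a notebook file looks like on disk. It assumes the common `.ipynb` layout (a JSON document with a list of cells); the cell contents are invented for illustration.

```python
import json

# A tiny notebook built by hand for illustration (nbformat 4 layout).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            # Explanatory text lives in markdown cells.
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Word counts\n", "This cell explains the code below."],
        },
        {
            # Executable code lives in code cells.
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["len('a short sample sentence'.split())"],
        },
    ],
}

# Serialize to the JSON text that would be saved as an .ipynb file,
# then read it back and list each cell's type and first line.
as_text = json.dumps(notebook, indent=1)
for cell in json.loads(as_text)["cells"]:
    first_line = "".join(cell["source"]).splitlines()[0]
    print(cell["cell_type"], "->", first_line)
```

In practice you would rarely write this structure by hand; the Jupyter interface creates and edits it for you.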
Web Scraping#
“Web scraping, web harvesting, or web data extraction is data scraping used for extracting data from websites. While web scraping can be done manually by a software user, the term typically refers to automated processes implemented using a bot or web crawler. It is a form of copying in which specific data is gathered and copied from the web, typically into a central local database or spreadsheet, for later retrieval or analysis.” Wikipedia
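As a rough illustration of the automated extraction step, the sketch below pulls link addresses and link text out of a page using only Python's standard library. The HTML is a made-up sample rather than a real website, and the `LinkCollector` class is a hypothetical helper written for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href and visible text of every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []            # list of (href, text) pairs
        self._current_href = None  # href of the link being read, if any
        self._current_text = []    # text pieces seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_href = dict(attrs).get("href")
            self._current_text = []

    def handle_data(self, data):
        if self._current_href is not None:
            self._current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._current_text).strip()
            self.links.append((self._current_href, text))
            self._current_href = None


# Hypothetical page content, invented for this example.
SAMPLE_HTML = """
<html><body>
  <h1>Workshop readings</h1>
  <ul>
    <li><a href="/readings/text-mining.html">Text mining basics</a></li>
    <li><a href="/readings/apis.html">Working with APIs</a></li>
  </ul>
</body></html>
"""

collector = LinkCollector()
collector.feed(SAMPLE_HTML)
for href, text in collector.links:
    print(href, text)
```

A real scraper would first download the page and would typically save the results to a spreadsheet or database, as the definition above describes.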
General rules: As a rule of thumb, you can scrape data that is publicly available and not copyrighted, although commercial use of scraped data may be limited. Do not scrape sites that require authentication (e.g. library subscription databases).
Check for an API before scraping
Some APIs are accessed through a publicly available URL, such as the HathiTrust Bibliographic API. This API provides programmatic access to bibliographic metadata for volumes in HathiTrust. The metadata is retrieved using a specially formatted URL and the volume ID number.
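The idea of a specially formatted URL can be sketched as follows. The URL pattern, the `htid` identifier type, and the `brief` detail level are assumptions based on the commonly documented form of the HathiTrust Bibliographic API, so check the official documentation before relying on them. The volume ID is a placeholder, and `build_bib_url` and `fetch_metadata` are hypothetical helper names.

```python
import json
from urllib.request import urlopen

# Assumed base URL for the HathiTrust Bibliographic API; verify against
# the official documentation.
BASE_URL = "https://catalog.hathitrust.org/api/volumes"


def build_bib_url(volume_id, id_type="htid", detail="brief"):
    """Build the specially formatted URL for one volume's metadata."""
    return f"{BASE_URL}/{detail}/{id_type}/{volume_id}.json"


def fetch_metadata(volume_id):
    """Request the metadata over the network and decode the JSON reply."""
    with urlopen(build_bib_url(volume_id), timeout=30) as response:
        return json.load(response)


# Placeholder volume ID, not a real HathiTrust volume.
print(build_bib_url("inst.0000000001"))
```

Only the URL is built here; calling `fetch_metadata` with a real volume ID would send the request and return the metadata as a Python dictionary.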
Wikipedia#
When getting started with computational methods, looking up terms (such as Text Mining) on Wikipedia is helpful.