Coding#
Understanding computing#
This session is meant to familiarize you with important foundational concepts and to give you a framework for understanding the types of computational methods you can use on a project, including minimal computing, tool choice, or coding. Finally, this session aims to give you language to conceptualize what is possible and to communicate more effectively with partners.
Understanding the affordances and limitations of a computational approach for your project will help you better imagine, plan, and manage your project. Even if you are not directly involved in the coding aspect, this will give you the tools to better collaborate with those who will be.
Computers & The Internet#
What is a computer?#
How do you make the computer do stuff? What is a GUI?
Refer back to the optional viewing from the introduction:
The following links will introduce terms such as input, output, storage, CPU, hardware, software, bits, circuits, and the operating system, as well as wires, cables, WiFi, IP addresses, DNS, packets and routing, HTTP and HTML, encryption, public keys, and how search works.
Code.org. Introducing How Computers Work. YouTube [Watch all 6 short videos.]
What Is the Internet? YouTube [Watch all 8 short videos.]
What computational skills are necessary for your goals?#
Keep in mind: “What coding language do I need to learn?” or “Do I need to learn to code?” is not the right question. The question is, “How much do I need to learn for my specific goal?”
You may have multifaceted goals: some are learning related and some are productivity related.
They may not be mutually exclusive, but you will likely still need to make choices based on your circumstances.
Computing Environment#
“The computing environment involves the collection of computer machinery, data storage devices, work stations, software applications, and networks that support the processing and exchange of electronic information…” -Computing Environment
What does this all mean?
What is your environment?#
How can you interact with your environment (local, virtual, cloud)?
There are different affordances and limitations in each environment. You will make different choices depending on the needs of your project or the needs of your classroom.
Below we go into more depth about the differences and then the reasons we made the choice we did for the Python session.
Local environments#
Your laptop or desktop or tablet, etc. is your local environment.
The applications on your device can access the resources in your machine. Each local environment becomes different with use.
The kernel connects the application software to the hardware of a computer.
Local installations give you more control and more power, but the pedagogical tradeoff is that they are more difficult to manage and configure during class. Installation depends on the type of device.
The process of installing and learning how to work on your computer encourages more active troubleshooting as well, which is a useful long-term skill.
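As a small illustration of how local environments differ, the sketch below asks Python to report a few facts about the machine it is running on. The function name `describe_environment` and the choice of facts are assumptions made for this example; the output will be different on every computer, which is the point.

```python
import os
import platform


def describe_environment():
    """Collect a few facts about the machine this script is running on."""
    return {
        "operating_system": platform.system(),        # e.g. "Windows", "Darwin" (macOS), "Linux"
        "os_release": platform.release(),             # version string reported by the OS
        "machine": platform.machine(),                # processor architecture
        "cpu_count": os.cpu_count(),                  # logical CPUs, or None if unknown
        "python_version": platform.python_version(),  # version of the Python interpreter
        "working_directory": os.getcwd(),             # folder the script was started from
    }


environment = describe_environment()
for name, value in environment.items():
    print(f"{name}: {value}")
```

Running the same script on a classmate's laptop, or in a cloud notebook, is a quick way to see which parts of the environment are shared and which are not.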
Installing software#
Requires Administrative permissions
Learning to install software
Learning how to install software is an important part of the process of gaining computational literacy and learning how the programs we use work.
Virtual environments#
A virtual environment is a digital instance of a local computing environment that can perform almost all the same functions as that local machine, “including running applications and operating systems. Virtual machines run on a physical machine”… using specialized software or internet browsers.
Virtual environments have high variability.
They can run a single piece of software or an OS with suites of software. When you are considering using a virtual environment, in addition to what it is capable of doing, you should consider the following: portability (ease of setup), reproducibility (its capabilities and consistency of experience), and isolation (if you are working in a shared environment and your work needs to be sequestered).
The advantages of containers for classroom projects are:
You want all students to be using the same environment, without worrying about different operating systems (OS) and individual installations
Portability
You may not have access to an HPC
There are additional types of computing environments listed here
The “Cloud”#
“Cloud computing is the on-demand availability of computer system resources, especially data storage (cloud storage) and computing power, without direct active management by the user.”
Cloud-based systems can be expensive and resource intensive, so you may choose the path of Minimal Computing: “We use “minimal computing” to refer to computing done under some set of significant constraints of hardware, software, education, network capacity, power, or other factors. Minimal computing includes both the maintenance, refurbishing, and use of machines to do DH work out of necessity along with the use of new streamlined computing hardware like the Raspberry Pi or the Arduino micro controller to do DH work by choice. This dichotomy of choice vs. necessity focuses attention on computing that is decidedly not high-performance.”
Using the resource that matches your needs can help you minimize costs and environmental impact.
How do you interact with your computer?#
Most of us are used to a graphical user interface (GUI), but the command line allows you more control.
Command Line#
What is the command line and why is it like this?
The command line is a text-based way of interacting with your computer. Working in the command line helps you make a mental model of how your environment is laid out. This environment is the result of a series of choices, made by humans.
You may hear it called different names, such as the terminal, the shell, or bash. In practice, you can use these terms interchangeably. (If you’re curious, though, you can read more about them here.) The shell we use (whether terminal, shell, or bash) is a program that accepts commands as text input and converts commands into appropriate operating system functions.
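To make the idea of "text in, operating system function out" concrete, here is a toy sketch in Python. It is not how bash or any real shell is implemented; the function name `run_command` and the three supported commands are assumptions chosen for illustration.

```python
import os


def run_command(line):
    """Interpret one line of text the way a (very small) shell might."""
    parts = line.split()
    if not parts:
        return ""
    command, arguments = parts[0], parts[1:]

    if command == "pwd":
        # Ask the operating system for the current working directory.
        return os.getcwd()
    if command == "ls":
        # List the contents of the given folder (default: the current one).
        folder = arguments[0] if arguments else "."
        return "\n".join(sorted(os.listdir(folder)))
    if command == "echo":
        # Repeat the arguments back as text.
        return " ".join(arguments)
    return f"{command}: command not found"


print(run_command("echo hello from a toy shell"))
print(run_command("pwd"))
```

A real shell does much more (it launches other programs, connects their input and output, and keeps track of settings), but the basic loop of reading a line of text and translating it into a request to the operating system is the same.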
And yes, “the command line” is also laden with masculine and military metaphors, which is reflective of the history of computing and programming.
As Wendy Hui Kyong Chun discusses in “On Software, or the Persistence of Visual Knowledge” (2004), almost all computers (as in human computers) in the US during World War II were young women. Human computers received commands from analysts, predominantly men with the military, that they then had to interpret and act upon the machine. As Chun argues, “computation depends on ‘yes, sir’ in response to short declarative sentences and imperatives that are in essence commands … The command line is a mere operating system (OS) simulation” (page 34). The command line (of computers today) receives these commands as text that is typed in.
Why is the command line useful?#
Initially, for some of us, the command line can feel a bit unfamiliar. Why step away from a GUI point-and-click workflow? By using the command line, we move into an environment where we have more minute control over each task we’d like the computer to perform. Instead of ordering your food in a restaurant, you’re stepping into the kitchen. It’s more work, but there are also more possibilities.
The command line allows you to…
Easily automate tasks such as creating, copying, and converting files.
Set up your programming environment.
Run programs you create.
Access the (many) programs and utilities that do not have graphical equivalents.
Control other computers remotely.
In addition to being a useful tool in itself, the command line gives you access to a second set of programs and utilities and is a complement to learning programming.
Writing a script or program (programming!) allows you to automate a series of repetitive tasks.
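As a rough sketch of what automating "creating, copying, and converting files" can look like, the short Python script below does all three inside a temporary folder, so nothing on your own machine is changed. The file names and the "conversion" (wrapping each note in a Markdown heading) are made up for the example.

```python
import shutil
import tempfile
from pathlib import Path

# Work inside a temporary folder so nothing on your machine is changed.
workspace = Path(tempfile.mkdtemp())

# 1. Create: write three small plain text files.
for number in range(1, 4):
    note = workspace / f"note{number}.txt"
    note.write_text(f"This is note number {number}.\n", encoding="utf-8")

# 2. Copy: duplicate every .txt file into a backup folder.
backup = workspace / "backup"
backup.mkdir()
for note in sorted(workspace.glob("*.txt")):
    shutil.copy(note, backup / note.name)

# 3. Convert: save a Markdown version of each note with a heading on top.
for note in sorted(workspace.glob("*.txt")):
    text = note.read_text(encoding="utf-8")
    markdown = f"# {note.stem}\n\n{text}"
    note.with_suffix(".md").write_text(markdown, encoding="utf-8")

created = sorted(path.name for path in workspace.iterdir())
print(created)
```

Doing this by hand for three files is easy; the value of a script shows up when the same steps need to be repeated for three hundred.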
What if all these cool possibilities seem a bit abstract to you right now? That’s all right! On a very basic level, most uses of the command line are about showing information that the computer has, or modifying or making things (files, programs, etc.) on the computer.
Introduction to the command line#
By this point in our academic careers, most of us have figured out some ways we like to interact with computers. Whether that involves avoiding them as much as possible or constantly testing new software, we likely have some ideas about how we feel comfortable getting things done. How would you show a person who had never seen a computer, say Kimmy Schmidt or Brendan Fraser in Blast from the Past, how to do something on your computer?
Many of us would explain what a screen and a cursor are, and then show how to point and click on icons. This approach relies on the graphical user interface, or GUI (pronounced “gooey!”).
Another way to make your computer do things is through the command line. Instead of pointing and clicking, we’ll be typing in either Git Bash (Windows) or Terminal (macOS) to tell the computer directly what task we’d like it to perform.
Here is an external command line tutorial if you wish to learn more.
Coding#
Why teach coding? “…any instructor, in the humanities or otherwise, must first ask herself what she hopes her students will accomplish by learning to code. Is it an understanding of how to think algorithmically, so as to better comprehend how certain tasks can be abstracted into a series of steps? Is it a familiarity with the basic components of programming languages, so as to be able to understand how code is structured and produced? Is it the knowledge of a specialized programming language, one with specific applications in a particular field? Or is it the more experiential knowledge of what it feels like to move from defining functions and assigning variables to running executable code?” Digital Pedagogy in the Humanities: Concepts, Models, and Experiments: Code by Lauren Klein
Do you need to learn code?
You don’t need to become fluent if it’s not the focus of your interest, but it is helpful to have reading fluency, as with any other language that is an important part of your research. Also, as with any other language, use will help you retain and gain knowledge.
Learning some coding will help you see what is technically feasible#
“I started to notice that the way people talk about technology is out of sync with what digital technology actually can do. Ultimately, everything we do with computers comes down to math, and there are fundamental limits to what we can (and should) do with it.” Hello Reader
What is coding? Is it the same as programming?#
“Put simply, programming is giving a set of instructions to a computer to execute. …While sometimes used interchangeably, programming and coding actually have different definitions.”
“Programming is the mental process of thinking up instructions to give to a machine (like a computer).”
We have also referred to this as computational thinking.
“If you’ve ever cooked using a recipe before, you can think of yourself as the computer and the recipe’s author as a programmer. The recipe author provides you with a set of instructions which you read and then follow. The more complex the instructions, the more complex the result!”
“Coding is the process of transforming those ideas into a written language that a computer can understand.”
Coding would be taking that recipe and laying out step by step what needs to be done, with no assumption of specific knowledge. Without coding, the program (recipe) cannot be run by the computer.
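To extend the recipe analogy, here is one way a very literal recipe might look once it is written as code. This is only an illustrative sketch: the function `make_tea`, the quantities, and the steps are invented for the example.

```python
def make_tea(cups):
    """Return the steps of a very literal tea recipe as a list of strings."""
    steps = []
    steps.append(f"Pour {cups * 250} ml of water into the kettle.")
    steps.append("Heat the water until it boils.")
    for cup in range(1, cups + 1):
        # The same two instructions are repeated once per cup.
        steps.append(f"Put a tea bag in cup {cup}.")
        steps.append(f"Pour hot water into cup {cup}.")
    steps.append("Wait 3 minutes, then remove the tea bags.")
    return steps


for number, step in enumerate(make_tea(2), start=1):
    print(f"{number}. {step}")
```

Notice how nothing is left to common sense: the amount of water, the order of the steps, and the repetition for each cup all have to be spelled out.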
We will be learning to code in Python for this Institute.
Hello World#
To better understand how computers make sense of the world, please read: Chapter 2 Hello World
We will return to concepts brought up in this chapter over the next few days.
The author talks about the three ways to do something (write Hello World): asking a person to do it, using a tool (Word or some other word processor) to do it, and using code to do it.
For your “hello world,” which is your project, which of the three choices makes the most sense for you?
What are the affordances and limitations of doing computational analysis for your humanities questions?
How do you interpret the statements that “data is socially constructed” and “Ultimately, data always comes down to people counting things”?
Computers are literal: Can you describe a time in which a computer, tool, or program behaved in a way that was confusing to you? After reading this article, do you have an explanation as to why?
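For reference, the third of the three ways mentioned above (using code) can be as short as the Python sketch below. The function `greet` and the example names are assumptions added to show why code can be worth the effort: once written, the same instruction can be reused with different inputs.

```python
def greet(name="World"):
    """Build the greeting instead of typing it out by hand each time."""
    return f"Hello, {name}!"


print(greet())
for name in ["Ada", "Grace"]:
    print(greet(name))
```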
What can computers actually do?
“The gap between what we imagine and what computers can actually do is really vast… Often, we talk about computers as being able to do anything, and that’s just rhetoric because ultimately they’re machines, and what they do is they compute, they calculate, and so anything you can turn into math, a computer can do.” Interview with Meredith Broussard
Can you think of something that a human is better at doing than a computer?
Text#
We discussed moving from a GUI to text-based commands.
Because we are digital humanists, text can be code, and text can be data.
When does text become code?
For those of us comfortable reading and writing, the idea of “text-based” in the context of computers can seem a bit strange. As we start to get comfortable typing commands to the computer, it’s important to distinguish “text” from word processing and desktop publishing (think Microsoft Word or Google Docs), in which we use software that displays what we want to produce without showing us the code the computer is reading to render the formatting. Plain text has the advantage of being manipulable in different contexts.
Let’s take a quick moment to discuss text and text editors.
What is text?#
We want to give a general sense of this “text” we keep mentioning. There is what “text” means in your discipline, there is text as “data,” and there is the “plain text” you use to communicate with your computer using commands and scripts.
For those of us in the humanities, whether we follow literary theorists who read any object as a “text,” or we dive into philology, paleography, codicology or any of the fields David Greetham lays out in Textual Scholarship, “text” has its specific meanings.
As scholars working with computers, we need to be aware of the ways plain text and formatted text differ. Words on a screen may have hidden formatting. Many of us grew up using Microsoft Word and don’t realize how much is going on behind the words shown on the screen. For the purposes of communicating with the computer and for easier movement between different programs, we need to use text without hidden formatting.
If you ask the command line to read that file, this Word .docx file will look something like this
Word documents, which look like “just words!”, are actually an archive of Extensible Markup Language (XML) instructions meant to be interpreted by a word processor such as Microsoft Word. Plain text files can be opened in a number of different editors and can be read within the command line.
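The difference can be sketched in Python. Because a real .docx file is a zip archive containing several XML parts, the example below builds a tiny stand-in archive in memory with a single XML part, then compares it with the same words saved as plain text. The stand-in is a simplification made for illustration; it is not a complete, valid Word file.

```python
import io
import zipfile

# Build a tiny stand-in for a .docx file in memory: a zip archive that holds XML.
# (A real Word file contains more parts; this keeps only one for illustration.)
document_xml = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    "<w:body><w:p><w:r><w:t>Just words!</w:t></w:r></w:p></w:body>"
    "</w:document>"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("word/document.xml", document_xml)

raw_bytes = buffer.getvalue()
print(raw_bytes[:4])  # the first bytes identify a zip archive, not readable words

# Opening the archive reveals the XML that wraps the visible words.
with zipfile.ZipFile(io.BytesIO(raw_bytes)) as archive:
    names = archive.namelist()
    recovered = archive.read("word/document.xml").decode("utf-8")
print(names)
print(recovered)

# The same words stored as plain text are just the encoded characters.
plain_bytes = "Just words!".encode("utf-8")
print(plain_bytes)
```

The plain text version is eleven bytes that any program can read directly; the archive version needs software that knows how to unzip it and interpret the markup.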
Plain text#
For the purposes of communicating with machines and between machines, we need characters to be as flexible as possible. Plain text includes characters of readable material but not graphical representation.
According to the Unicode Standard, “Plain text is a pure sequence of character codes; plain Unicode-encoded text is therefore a sequence of Unicode character codes.”
Plain text has a few main properties:
“plain text is the underlying content stream to which formatting can be applied. Plain text is public, standardized, and universally readable.”
Plain text shows its cards — if it’s marked up, the markup will be human readable. Plain text can be moved between programs more fluidly and can respond to programmatic manipulations. Because it is not tied to a particular font or color or placement, plain text can be styled externally.
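A brief Python sketch of these properties follows: it shows the character codes behind a short string and a few of the programmatic manipulations that plain text allows. The sample sentence is taken from the paragraph above; the variable names are arbitrary.

```python
text = "Plain text shows its cards"

# Each character corresponds to one Unicode character code (an integer).
code_points = [ord(character) for character in text[:5]]
print(code_points)  # [80, 108, 97, 105, 110]

# The same integers turn back into the same characters.
restored = "".join(chr(number) for number in code_points)
print(restored)  # Plain

# Because there is no hidden formatting, the text can be manipulated directly.
words = text.split()
shouted = text.upper()
swapped = text.replace("cards", "markup")
print(words)
print(shouted)
print(swapped)
```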
A counterpoint to plain text is rich text (sometimes denoted by the Microsoft rich text format “.rtf” file extension) or “enriched text” (sometimes seen as an option in email programs). In rich text files, plain text is elaborated with formatting specific to the program in which they are made.
Note: Software like Microsoft Word or Excel adds formatting, and changes made by auto-formatting can sometimes introduce errors. “Excel errors happen all the time, simply because the software is often the first thing to hand when scientists process numerical data.” Scientists rename human genes to stop Microsoft Excel from misreading them as dates.
Text editors#
An important tool for programming and working in the command line is a text editor. A text editor is a program that allows you to edit plain text files, such as .txt, .csv, or .md. Text editors are not used to edit rich text documents, such as .docx or .rtf, and rich text editors should not be used to edit plain text files. This is because rich text editors will add many invisible special characters that will prevent programs from running and configuration files from being read correctly.
While it doesn’t really matter which text editor you choose, you should try to become comfortable with at least one text editor. Choosing a text editor has as much to do with personality as it does with functionality. Graphical user interfaces (GUIs), user options, and “hackability” vary from program to program.
Editors vs. IDEs#
When it comes to editing text and writing code, you can use either a text editor or an IDE (Integrated Development Environment). Text editors tend to be more lightweight solutions, while IDEs try to provide a lot of features to help you write code and tend to target specific languages. There are a lot of exceptions to that description, but the distinction isn’t that important. Just know that editors will sometimes describe themselves as IDEs, and that there’s a slight difference in philosophy between them.
Note about Special Characters#
“Special characters include characters that are not found on a standard English-language keyboard or that are not one of the 128 characters of the US-ASCII character code set. Examples include characters with diacritics and special symbols, such as the copyright sign or an ampersand. How these characters are represented varies in HTML and XML.” UNL Center for Digital Research in the Humanities
More information on: Characters, Glyphs, and Writing Modes.
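The following Python sketch shows one way to check whether a character falls inside the 128-character US-ASCII range and how it might be written in HTML. The helper name `describe_character` is an assumption for this example. Note that the ampersand is itself an ASCII character; it needs special treatment in HTML and XML because it is reserved for markup.

```python
import html


def describe_character(character):
    """Return (code point, is it ASCII?, UTF-8 bytes, one possible HTML form)."""
    code_point = ord(character)
    in_ascii = code_point < 128                # US-ASCII covers code points 0 to 127
    utf8_bytes = character.encode("utf-8")     # how the character is stored in UTF-8
    if in_ascii:
        # Reserved characters such as & are replaced with named references.
        html_form = html.escape(character)
    else:
        # Other characters can be written as numeric character references.
        html_form = f"&#{code_point};"
    return code_point, in_ascii, utf8_bytes, html_form


# e, e with an acute accent, the copyright sign, and an ampersand
for character in ["e", "\u00e9", "\u00a9", "&"]:
    print(repr(character), describe_character(character))
```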
“TEI tags describe the characteristics of a given text. For example, TEI tags may be used to indicate paragraph and line breaks, pagination, and major divisions of a text such as volumes, chapters, and sections. In addition, tags may be placed around typographical characteristics such as text that is underlined, italicized, superscripted, etc., and around text that needs special emphasis such as foreign words, misspellings, proper names, etc.” UNL Center for Digital Research in the Humanities
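As a rough illustration of what such tags make possible, the sketch below parses a small, made-up fragment that borrows a few TEI element names (`div`, `p`, `hi`, `lb`, `foreign`) using Python's built-in XML parser. The sentence is invented, and the fragment omits the namespace and header that a complete TEI document would have.

```python
import xml.etree.ElementTree as ET

# A hypothetical fragment using a few TEI element names (no namespace or header).
fragment = """
<div type="chapter" n="1">
  <p>She read <hi rend="italic">Middlemarch</hi> aloud,<lb/>
  pausing at the word <foreign>ennui</foreign>.</p>
</div>
"""

root = ET.fromstring(fragment)

# Because the text is tagged, we can ask for specific features by name.
italicized = [element.text for element in root.iter("hi") if element.get("rend") == "italic"]
foreign_words = [element.text for element in root.iter("foreign")]
line_breaks = len(list(root.iter("lb")))

# Stripping the tags leaves the plain text underneath.
plain_text = " ".join("".join(root.itertext()).split())

print(italicized)
print(foreign_words)
print(line_breaks)
print(plain_text)
```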
“‘Why do so many book titles have commas in them?!’ 😱 A lament brought to you by someone who wrote titles into a CSV, didn’t quote values, and now has regrets.” - @quinnanya
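The problem described in that lament, and one common way around it, can be sketched with Python's built-in `csv` module. The two book titles are examples chosen because they contain commas.

```python
import csv
import io

titles = [
    ("Frankenstein; or, The Modern Prometheus", 1818),
    ("The Lion, the Witch and the Wardrobe", 1950),
]

# Naive approach: join the values with commas and hope for the best.
naive_line = ",".join([titles[1][0], str(titles[1][1])])
naive_pieces = naive_line.split(",")
print(naive_pieces)  # the title has been broken into two pieces

# The csv module puts quotation marks around any value that contains a comma.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["title", "year"])
for title, year in titles:
    writer.writerow([title, year])
print(buffer.getvalue())

# Reading the text back restores each title as a single value.
rows = list(csv.reader(io.StringIO(buffer.getvalue())))
print(rows[2])
```

Note that everything read back from a CSV file is text, so the year comes back as the string "1950" rather than a number.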
What does this all mean? Why are we telling you all this?#
This is useful contextual information for you if you choose to go forward with learning programming. This is also set up for our Python session.
Jupyter Notebooks#
If you are unfamiliar with Jupyter Notebooks, take a look at one of these introductory lessons.
Getting Started with Jupyter Notebooks - ITHAKA Constellate
Description: This lesson introduces Jupyter notebooks and Python for absolute beginners. If you are completely new to text analysis, this is the place to start.
When doing an introductory coding workshop, instructors choose between using a local environment, a virtual environment, or a cloud-based environment. There are good pedagogical reasons for each choice. See Cloud-based Versus Local-Based Web Development Education: An Experimental Study in Learning Experience.
Humanities HPC#
Now that we have discussed computing, what are some of the challenges in coding for humanities projects?
What do we mean by computing in the humanities?
What can be quantified?
“It’s not surprising, then, that some of the first humanities projects were indexes and concordances, since the location of a word could be given a numerical value… In 1966, the first issue of Computers and the Humanities was published by Queens College, with support from IBM and the United States Steel Foundation.”
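To show what "giving the location of a word a numerical value" can mean in practice, here is a minimal sketch of a word index and a keyword-in-context listing in Python. The function names, the simple rule for splitting words, and the sample sentence are all assumptions for illustration; they are not a reconstruction of how early concordance projects worked.

```python
import re
from collections import defaultdict

text = "It was the best of times, it was the worst of times"


def tokenize(text):
    """Split text into lowercase words (a deliberately simple rule)."""
    return re.findall(r"[a-z']+", text.lower())


def build_index(text):
    """Map each word to the positions (1 = first word) where it occurs."""
    index = defaultdict(list)
    for position, word in enumerate(tokenize(text), start=1):
        index[word].append(position)
    return dict(index)


def keyword_in_context(text, keyword, window=2):
    """List each occurrence of keyword with a few words on either side."""
    words = tokenize(text)
    lines = []
    for i, word in enumerate(words):
        if word == keyword:
            left = " ".join(words[max(0, i - window):i])
            right = " ".join(words[i + 1:i + 1 + window])
            lines.append(f"{left} [{word}] {right}".strip())
    return lines


index = build_index(text)
print(index["times"])
for line in keyword_in_context(text, "times"):
    print(line)
```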
Complicating the “Great Man” narrative of digital history#
Edward Ayers, Stephen Brier, Joshua Brown, Daniel Cohen, Roy Rosenzweig, William Thomas
“These are the names that spring to mind when many people think of the individuals who pioneered the theories and methods of digital history. The oft-repeated narratives about the origins of the field are almost totally devoid of women.” -Returning Women to the History of Digital History
What do we mean by “Humanities High-Performance Computing”?#
The term “high performance computing” (HPC) is often used interchangeably with “supercomputing.” It refers to very fast computers, capable of performing calculations many times faster than standard desktop machines. High Performance Computing is used mainly by scientific disciplines for processing huge amounts of data, data mining, and simulation.
Humanities High-Performance Computing (HHPC) refers to the use of high-performance machines for humanities and social science projects. Humanities scholars often deal with large sets of unstructured data. This might take the form of historical newspapers, books, election data, archaeological fragments, audio or video contents, or a host of others.
Complicating computational humanities research#
What is elided in the process of computational work when using the HPC/HPRC?
“…valorizations of “the cloud” take it as an unquestioned good that researchers will be able to turn on the spigot without talking to a plumber — the point is that researchers should just be able to get on with their work and scale it in a relatively unregulated way. It is important to ask why that is held up as a necessary good” - Digital Humanities Application Development in the Cloud
Programming languages#
Which language should you learn?
Just as choosing which language you will use for your foreign language requirement is based on your research question, there are multiple programming languages you can choose to learn for digital humanities. R and Python are the most common.
How much of the language do you need to learn?
Does your project require basic fluency or only reading comprehension?
Python#
R#
R is a programming language and free software environment for statistical computing and graphics.
Learning R fundamentals is a gateway to analyzing data, creating visualizations, composing interactive websites, scraping the Internet, and engaging in distant reading of texts.
R vs. Python#
Comparing Python vs R Objectively: “where Python is more object-oriented, and R is more functional.”
R vs. Python: Compares the two in action.
Should you learn coding or just use a tool?#
The answer to this depends on your immediate and long term goals.
What are you trying to do? What are the affordances or limitations of each approach?
Learning programming is like learning carpentry: it is a whole suite of skills.
However, if you just need a specific piece of furniture, then learning carpentry (coding) may be overkill.
Instead, you may just buy something that already exists (get an already existing program/tool like Omeka or ArcGIS). If your need is common, there may already be a tool.
For example, many people want a tool to do word processing, and there are many tools that help with that task.
If you still require a custom solution, you can hire a carpenter (hire a developer/programmer) to either modify an already existing tool to fit your specific parameters or make something custom to fit your needs.
Some questions to consider:#
Do you see yourself using this skill in multiple contexts?
Do you have the time and interest to learn the required skills?
Have you done a search (or conducted an environmental scan) on your topic and goals? Imagine the tool that you wish existed and search to see if it exists.
Contact your librarians for help with this.
Attribution#
Session Leaders: Rafia Mirza
Written by Rafia Mirza.
Our curriculum is based on the Digital Research Institute (DRI) Curriculum by Graduate Center Digital Initiatives. It is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. When sharing this material or derivative works, preserve this paragraph, changing only the title of the derivative work, or provide comparable attribution.