Glossary
Like any specialized field, the way we talk about computing and programming is almost its own language. Words in this context may have different meanings than they do elsewhere. As programmers, we are so used to these words that we sometimes forget others aren't familiar with them. This is one of the biggest roadblocks to teaching and learning.
Here we have compiled a long list of computing and programming terms, explained in (hopefully) plain language, as a resource for anyone learning to program who comes across a term they heard or read and don't understand.
Please feel free to suggest additions or edits.
General computing terms
* Terms marked with an asterisk are often used interchangeably in casual conversation.
Term | Definition |
---|---|
Hardware | The physical components of the computer (e.g. the hard drive, motherboard, monitor, etc.) |
Software | The programs stored on the computer that the user interacts with. |
Operating system (OS) | The software that acts as an interface between the computer's hardware and other, user-facing software. Common operating systems for desktop computers are Windows, macOS, and Linux. Common operating systems for mobile devices are iOS and Android. |
Unix | An early operating system that many modern 'Unix-like' operating systems, such as Linux and macOS, are derived from or modeled on. |
Program | A collection of code designed to let the user perform a specific task (e.g. write a document, analyze some data, play a game, send a message). Also called an application or app. |
Command | An instruction given to the computer to perform an operation (such as opening a file, adding a row to a data table, etc.). The way a command is given to the computer is often dictated by the programming language in which it is to be interpreted (known as that language's syntax). |
Argument | Options or values specified on the command line when running a program or command. |
Script | A set of commands written in a particular programming language's syntax and stored as plain text in a file. When the script is run by that language's interpreter, the commands are executed in order. |
Terminal* | The window in which you type commands to be interpreted by the shell. Also known as a command line interface. |
Console* | Similar to terminal, but full screen with no graphical component, only text. |
Command line* | The text interface within a terminal where commands are typed. |
Command prompt* | The information displayed on the command line before the cursor. |
Shell* | The program that interprets commands typed into a terminal. In a Unix-like terminal, you can type echo $SHELL to see which shell is set as your default. The shell can also execute a shell script, which is a collection of commands in a text file. |
Bash | A common shell program for Unix-like systems. It is the default shell for most Linux distributions. |
Environment variable | A piece of information that is stored in the shell and can be accessed by commands run in it. |
PATH | An environment variable that lists the directories where the shell looks for programs to run. |
Library | Files containing general-purpose code that can be reused by many different programs. |
Dependency | A program or library that is required for another program to run. |
Package | A bundle of software containing programs and their accompanying libraries and dependencies, often with scripts for installation. |
Module | Similar to library. |
File system | The way in which files and directories are organized in a nested, tree-like structure. |
Directory | A named location on a computer that contains files and/or other directories. |
File | A named location on a computer that contains data, commonly in the form of plain text. |
User | Each person who uses a computer has an account on it, and that account (or the person) is called a user. |
Group | User accounts may be placed into groups (e.g. a lab group or other working group) so that relevant files can easily be shared with everyone in that group. |
Permissions | User accounts on a computer may have different permissions that dictate which files they can read, write to, and execute (i.e. run like a command). You can check the permissions on a file or directory with ls -l, which displays a permissions string (e.g. -rwxr-xr--). The owner of a file can change its permissions with chmod. |
Root | The top-level directory of the file system, which contains all other directories and files. Critical system files are stored close to the root. On Unix-like systems it is written as /, and changing files near the root usually requires administrator privileges. Root may also refer to the user account with the highest permissions on Unix-like systems. |
Home | A user's home directory is the directory assigned to that user (e.g. /home/username on Linux or /Users/username on macOS), where they have read, write, and execute permissions. It is often abbreviated as ~. |
Path | The location of a file or directory within the file system, with directories separated by slash characters (/) on Unix-like systems or backslashes (\) on Windows. |
Absolute path | The full name of a file or directory, listing every directory from the root of the file system down to the specified file or directory (e.g. /home/username/data.csv). |
Relative path | The name of a file or directory given relative to the current working directory, i.e. starting from where the user currently is in the file system (e.g. data/results.csv). |
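The path-related terms above can be explored from Python with the standard-library pathlib and os modules. This is a minimal sketch: data/results.csv is a hypothetical file name used only to build paths (it does not need to exist), and the printed directories will differ from computer to computer.

```python
import os
from pathlib import Path

# The current working directory is the starting point for relative paths.
cwd = Path.cwd()

# A relative path (hypothetical file, used only for illustration).
relative = Path("data") / "results.csv"

# Joining it onto the working directory gives an absolute path,
# which starts at the root of the file system.
absolute = cwd / relative

print(relative.is_absolute())   # False
print(absolute.is_absolute())   # True
print(absolute)

# The current user's home directory.
print(Path.home())

# PATH lists the directories the shell searches for programs, separated by
# os.pathsep (":" on Unix-like systems, ";" on Windows).
search_dirs = os.environ.get("PATH", "").split(os.pathsep)
print(search_dirs[:3])
```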
General programming terms
Term | Definition | Python example | R example |
---|---|---|---|
Programming language | A programming language is a consistent set of rules for how instructions should be written so that the computer can understand and execute them. These rules are used to create software. Programming languages can be broadly categorized as compiled or interpreted, based on how their instructions are executed. | Python is an interpreted language | R is an interpreted language |
Compiled language | A programming language that is translated into machine code (compiled) before the program is run. This machine code is specific to the type of computer the program will run on. Examples: C, C++, Rust | NA | NA |
Interpreted language | A programming language that is executed line by line by another program called an interpreter. The interpreter reads the code and executes it directly. Examples: Perl, JavaScript, R | Python is an interpreted language | R is an interpreted language |
Syntax | The way a particular programming language expects its code to be typed, following the rules defined by that language. | NA | NA |
Comment | A piece of text in a program that is not executed when the program is run but is used to convey information to the human reading the code. Comments are often used to explain the purpose of the code, how it works, or to leave notes for future developers. | In Python, comments are denoted by # | In R, comments are denoted by # |
Object-oriented programming (OOP)* | A programming paradigm based on the concept of objects, which can contain data, in the form of attributes, and code, in the form of methods. OOP focuses on reusability and modularity in programming. | Python is an object-oriented language, enabling the creation of classes | R, while primarily a functional programming language, supports OOP |
Functional programming | A programming paradigm that treats computation as the evaluation of mathematical functions and avoids changing state and mutable data. | Python supports functional programming features | R is a functional programming language |
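A minimal Python sketch contrasting the two paradigms. The Dog class and the square function are hypothetical examples invented for illustration.

```python
# Object-oriented style: a class bundles data (attributes) with code (methods).
class Dog:
    def __init__(self, name):
        self.name = name              # an attribute

    def speak(self):                  # a method
        return f"{self.name} says woof"


rex = Dog("Rex")                      # rex is an object (an instance of Dog)
print(rex.speak())                    # Rex says woof


# Functional style: a pure function returns a new value
# instead of modifying existing data.
def square(x):
    return x * x


numbers = [1, 2, 3, 4]
squares = list(map(square, numbers))
print(squares)                        # [1, 4, 9, 16]
print(numbers)                        # [1, 2, 3, 4] (unchanged)
```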
Programming constructs
Term | Definition | Python example | R example |
---|---|---|---|
Keyword | A reserved word in a programming language that has a specific meaning and purpose. Keywords cannot be used as variable names. | if, else, for | if, else, for |
Conditional | A construct that allows different actions to be taken depending on whether a specific condition is met. Typically written as if-else statements using if, else if (or a similar keyword), and else. | if, elif, else | if, else if, else |
Logical expression | An expression that evaluates to either True or False. Logical expressions are used in conditional statements to determine the flow of a program. | 5 > 3 returns True | 5 > 3 returns TRUE |
Loop | A construct that repeats a sequence of instructions until a specific condition is met. Typically while and for loops are used. A while loop continues to execute as long as a condition is met, and a for loop executes once for each element of a sequence (and therefore a specific number of times). | for, while | for, while |
Loop variable | A variable that acts as the counter or iterator in a loop, determining the number of iterations or the elements accessed. Typically discussed for for loops, this variable controls how many times the loop is executed or holds each element of an iterable data structure in turn. It is defined in the for statement itself. | In for i in range(10):, i is the loop variable | In for (i in 1:10), i is the loop variable |
Block of code | Loosely, any group of lines of code one is referring to. More precisely, it is a section of code that is grouped together as a unit, often within curly braces {}. Blocks of code are often used to define the scope of variables or to group together related operations. | In Python, blocks are defined by indentation | In R, blocks are defined by curly braces {} |
Indentation | The use of whitespace at the beginning of lines to define the structure of a program's code, which is syntactically significant in languages like Python. | Code blocks are defined using indentation. | Indentation improves readability but is not syntactically significant. |
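The constructs above fit together as in this short Python sketch; the variable names are arbitrary.

```python
labels = []

# A for loop: `number` is the loop variable, taking each value in turn.
for number in range(1, 6):            # range(1, 6) yields 1, 2, 3, 4, 5
    # The indented lines below form the block (body) of the loop.
    if number % 2 == 0:               # a logical expression: True or False
        labels.append(f"{number} is even")
    elif number == 5:
        labels.append(f"{number} is five")
    else:
        labels.append(f"{number} is odd")

print(labels)
# ['1 is odd', '2 is even', '3 is odd', '4 is even', '5 is five']

# A while loop repeats as long as its condition stays True.
countdown = 3
while countdown > 0:
    countdown -= 1
print(countdown)                      # 0
```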
Data representation
Term | Definition | Python example | R example |
---|---|---|---|
Literal | A literal is a notation for representing a fixed value directly in code. For example, 5 is a literal for the number 5, and 'hello' is a literal for the string 'hello'. | 5 is a literal for the integer 5 | 5 is a literal for the numeric value 5 (5L is an integer literal) |
Variable | The name of a piece of information stored in memory in a computer program. This name can be referred to later in the program and used to access the information. | In x = 5, x is the variable | In x <- 5, x is the variable |
Assignment | The process of storing a value in a variable. In many programming languages, the assignment operator is =. | In x = 5, x is assigned the value 5 | In x <- 5, x is assigned the value 5 |
Hard coding | The practice of directly inserting fixed values into the code, rather than using variables. Hard coding can make code harder to maintain and reuse. | In total = 5 + 3, 5 and 3 are hard-coded values | In total <- 5 + 3, 5 and 3 are hard-coded values |
Data type | The way data is encoded on the computer. Different operations can be performed on different data types. | integer, string | numeric, character |
Type casting | The process of converting data from one data type to another. This is often necessary when performing operations on data of different types. | In Python, str(5) converts the integer 5 to the string '5' | In R, as.character(5) converts the number 5 to the character string '5' |
Boolean | A data type that can have only one of two values, true or false. Booleans are often used in programming to make and evaluate complex logical statements. | True or False | TRUE or FALSE |
Data structure | The way data is organized within a programming language. Different operations can be performed on different data structures. Many data structures are iterable. | list, dictionary | vector, data.frame |
Class | A defined way to construct objects that have particular data (attributes) or methods (functions) associated with them. | Built-in classes include str (string), list, and dict (dictionary) | Common classes include data.frame and matrix |
Attribute | Data or properties attached to a particular instance of a class. | car.make accesses the make attribute of a car object | attributes(my_data_frame) retrieves the attributes of the my_data_frame object |
Object | An instance of a class. Colloquially, this term may be used to mean any piece of data (variable) in your program, whether it represents a class or not. | In Python, my_list = [1, 2, 3] creates an object of the class list | In R, my_vec <- c(1, 2, 3) creates an object of the class numeric |
Instance | A specific object created from a class. | In Python, my_list = [1, 2, 3] creates an instance of the list class | In R, my_vec <- c(1, 2, 3) creates an instance of the numeric class |
Immutable | An object whose state cannot be modified after it is created. | string, tuple | Most objects in R are mutable |
Iterable | An object containing a collection of other objects that can be looped over one at a time, usually with a for loop. | string, list, dictionary | vector, list |
Element | A single piece of data within a data structure. Most commonly used in the context of iterable data structures. | In my_list = [1, 2, 3], 1 is an element of the list | In my_vec <- c(1, 2, 3), 1 is an element of the vector |
Index | A number that represents the position of an element within an iterable data structure. In many programming languages, including Python, the first index is 0; in others, including R, it is 1. | In my_list = [1, 2, 3], 0 is the index of 1 | In my_vec <- c(1, 2, 3), 1 is the element at index 1 |
Indexing | The process of selecting a specific element from an iterable data structure using its index. | In my_list = [1, 2, 3], my_list[0] returns 1 | In my_vec <- c(1, 2, 3), my_vec[1] returns 1 |
Slicing | The process of selecting a subset of elements from an iterable data structure using their indices. | In my_list = [1, 2, 3], my_list[0:2] returns [1, 2] | In my_vec <- c(1, 4, 5), my_vec[2:3] returns c(4, 5) |
Splicing | The process of removing or inserting elements from or into an iterable data structure, often altering its size and content. | In my_list = [1, 2, 3], my_list[1:2] = [4, 5] changes my_list to [1, 4, 5, 3] | No built-in splicing syntax; elements are inserted or removed via indexing or functions such as append() |
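A short Python sketch of type casting, indexing, slicing, splicing, and immutability. It reuses my_list from the table; the other variable names are arbitrary.

```python
# Type casting between data types.
as_text = str(5)          # '5'
as_number = int("42")     # 42
print(type(as_text), type(as_number))

# Indexing: Python counts from 0.
my_list = [1, 2, 3]
print(my_list[0])         # 1
print(my_list[-1])        # 3 (negative indices count from the end)

# Slicing: the start index is included, the stop index is excluded.
print(my_list[0:2])       # [1, 2]

# Splicing: assigning to a slice can change the list's length.
my_list[1:2] = [4, 5]
print(my_list)            # [1, 4, 5, 3]

# Strings are immutable: indexing works, but item assignment does not.
word = "hello"
try:
    word[0] = "j"
except TypeError as err:
    print("Cannot modify a string:", err)
new_word = "j" + word[1:]  # build a new string instead
print(new_word)           # jello
```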
Functions
Term | Definition | Python example | R example |
---|---|---|---|
Function | A generalized chunk of code that can easily be called (run) by other code. Functions usually take input arguments and return output to the main program. | print(), len() | mean(), sd() |
Method | A function that is associated with a class and can only be used on objects of that class. | .append() for lists | summary(), plot() (generic functions that dispatch to class-specific methods) |
Argument | A value passed to a function or program when it is called. | len(x), where x is an iterable object (e.g. string, list) | mean(x), where x is a vector of numbers |
Parameter | A variable used in a function's definition to represent an argument that is passed to it. | In def my_func(x):, x is a parameter | In my_func <- function(x), x is a parameter |
Return value | The value that a function or program gives back after it is called. | len(x) returns the length of the iterable object x | mean(x) returns the average of the vector x |
Library | A collection of functions that can be imported and used in other programs. | math, os | dplyr, ggplot2 |
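The function terms above map onto Python code as follows. average is a hypothetical function written for illustration (Python's standard library already provides statistics.mean for this).

```python
def average(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    # `values` is the parameter; sum() and len() are built-in functions.
    total = sum(values)
    return total / len(values)        # the return value


scores = [70, 80, 90]
result = average(scores)              # scores is the argument
print(result)                         # 80.0

# A method is called on an object with dot notation.
scores.append(100)                    # append() is a method of the list class
print(average(scores))                # 85.0
```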
Operators
Term | Definition | Python example | R example |
---|---|---|---|
Operator | In programming, operators are special functions that are denoted by specific symbols or keywords and can be used to manipulate and compare information in the program. Sometimes the way an operator works depends on the data type being used with it. | + for addition of numbers, + for concatenation of strings, in for inclusion in a list | + for addition of numbers, == for equality |
Assignment operator | In many programming languages, a variable can be assigned a value with the equals sign (=); e.g. var_name = 1 assigns a variable named var_name the value of the integer 1. | = assigns a value to a variable | <- assigns a value to a variable; = can also be used |
Arithmetic operators | Many programming languages have basic arithmetic operators to manipulate numeric data types, like + for addition, - for subtraction, * for multiplication, and / for division. Note that in some programming languages, these symbols can be used as operators for other data types with different outcomes; for example, in Python the + operator can also concatenate (combine) two strings. | +, -, *, / | +, -, *, / |
Update operator | Some programming languages have update operators that combine arithmetic operations with assignment. For example, in Python, x += 1 adds 1 to x and assigns the result back to x; this is equivalent to x = x + 1. | +=, -=, *=, /= | NA |
Logical operators | Logical operators combine conditions. Common logical operators are and (written as &&, &, or and, depending on the language) and or (written as \|\|, \|, or or). These return a boolean TRUE or FALSE value. | and, or, not | &, \|, ! |
Comparison operators | Logical comparisons of values include equality (==), inequality (!=), greater than (>), greater than or equal to (>=), less than (<), and less than or equal to (<=). These return a boolean TRUE or FALSE value. | ==, !=, >, >=, <, <= | ==, !=, >, >=, <, <= |
Inclusion operators | Some languages use another operator to check whether a certain value is included within a larger data structure, sometimes denoted as in. These operators return a boolean TRUE or FALSE value. | in | %in% |
Negation operator | For any comparison or logical operation, a negation can be made, represented in many programming languages as ! or not; e.g. !TRUE is equivalent to FALSE. | not | ! |
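A quick Python sketch of comparison, logical, inclusion, and negation operators, plus + behaving differently for numbers and strings. The values are arbitrary.

```python
x = 7

# Comparison operators return a boolean.
print(x > 5)                  # True
print(x == 10)                # False
print(x != 10)                # True

# Logical operators combine conditions.
in_range = x > 0 and x < 10
print(in_range)               # True
print(x < 0 or x > 5)         # True
print(not in_range)           # False

# Inclusion operator: is a value contained in a data structure?
fruits = ["apple", "banana"]
print("apple" in fruits)      # True
print("cherry" not in fruits) # True

# The same operator can behave differently depending on the data type.
total = 2 + 3
concatenated = "2" + "3"
print(total)                  # 5
print(concatenated)           # 23
```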
Errors
Term | Definition | Python example | R example |
---|---|---|---|
Error | An error is a mistake in code that causes the program to stop executing or to produce incorrect results. Errors can be syntax errors, logical errors, or runtime errors. | A SyntaxError occurs when the code is not written in the expected format | An 'unexpected symbol' error occurs when the code is not written in the expected format |
Syntax error | An error that occurs when the code is not written in the expected format of the programming language. Syntax errors are usually detected by the interpreter or compiler before the program is executed. | For example, missing a colon at the end of an if statement in Python | For example, missing a comma between arguments in a function call in R |
Logical error | An error that occurs when the code does not produce the expected output due to a mistake in the logic of the program. Logical errors are often difficult to detect because the program will still run, but the output will be incorrect. | Using the + operator instead of the - operator in a subtraction operation | Using the + operator instead of the - operator in a subtraction operation |
Runtime error | An error that occurs while the program is running. Runtime errors can be caused by a variety of issues, such as dividing by zero, trying to access an index that does not exist, or running out of memory. | A ZeroDivisionError occurs when dividing by zero in Python | An 'object not found' error occurs in R when code refers to a variable (e.g. a data frame) that does not exist |
Exception | An exception is an event that occurs during the execution of a program that disrupts the normal flow of instructions. When an error occurs, an exception is raised. In many programming languages, exceptions can be handled, allowing the program to continue running. | A ZeroDivisionError is the exception raised when dividing by zero in Python | In R, errors are signaled as conditions, e.g. by calling stop() |
Exception handling | The process of dealing with exceptions in a program. By catching exceptions (e.g. with try and except blocks in Python), a program can handle them gracefully instead of crashing. | In Python, try: and except: blocks are used for exception handling | In R, tryCatch() is used for exception handling |
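A minimal Python sketch of exception handling. safe_divide is a hypothetical helper, and returning None on failure is just one possible design choice (re-raising or logging are common alternatives).

```python
def safe_divide(a, b):
    """Divide a by b, returning None instead of crashing on division by zero."""
    try:
        return a / b
    except ZeroDivisionError:
        print(f"Cannot divide {a} by zero")
        return None


print(safe_divide(10, 2))   # 5.0
print(safe_divide(10, 0))   # prints the message, then None

# Catching a different runtime error: an index that does not exist.
items = [1, 2, 3]
try:
    items[5]
except IndexError as err:
    print("IndexError:", err)   # IndexError: list index out of range
```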
Programming tools
Term | Definition | Python example | R example |
---|---|---|---|
Text editor | A graphical program on a computer that displays plain text files on the screen and allows them to be edited. Many text editors designed for coding (e.g. VSCode) have features like syntax highlighting, code folding, and automatic indentation. | VSCode | VSCode |
Integrated Development Environment (IDE) | A graphical program on a computer that contains tools to aid development in a specific programming language. These usually include an integrated text editor. | Spyder | RStudio |
Notebook | Text formats that allow for the interleaving of code blocks and formatted text in a single document. Notebooks can be run interactively, with code blocks executed in real time. | Jupyter | R Markdown |
Markdown | A lightweight syntax for formatting plain text (e.g. headings, lists, and links). Formats such as Jupyter notebooks and R Markdown build on it to interleave formatted text with code blocks that can be run when the document is generated. | NA | NA |
Python terms
Note that while we give some examples of syntax, the format of these tables does not lend itself to showing exact code, so please consult further documentation for more information on Python's syntax.
Term | Definition | Example |
---|---|---|
Python | An open-source, high-level, interpreted programming language known for its simplicity and readability. | print('Hello, World!') |
Script | A text file with Python code written in it such that it can be read by the interpreter and executed in sequence. These files usually end with .py. | script.py |
Module | A file containing Python code that can be imported and used in other Python scripts. Modules can define functions, classes, and variables. | math.py |
Library | A collection of module files, or sometimes simply another word for a Python module. | numpy |
Standard Library | A collection of modules and packages that come with Python. These modules are always available and can be used in any Python script without needing to be installed separately. May also be referred to as a built-in library. | math , os , random |
Package | A collection of Python modules that can be installed and used together. Packages are typically distributed through the Python Package Index (PyPI) and installed with a package manager such as pip. | pandas |
PyPI | The Python Package Index, where users can download and install packages used to extend Python's functionalities. | pip install numpy |
pip | The package installer for Python. It allows users to install packages from the Python Package Index (PyPI). | pip install numpy |
NumPy | A popular Python library for numerical computing that provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays. | import numpy as np |
pandas | A Python library that provides data structures and data analysis tools for working with structured data. It is built on top of NumPy and is well-suited for data manipulation and cleaning, among other tasks. | import pandas as pd |
Indentation | Python uses indentation to define blocks of code. The lines that follow any line ending with a colon (:) must be indented. Indentation is typically four spaces or one tab, used consistently. | for i in range(10): followed by an indented print(i) on the next line |
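A sketch of how scripts, modules, and the standard library relate in practice. It uses only standard-library modules (math and collections), so nothing needs to be installed; main() is a hypothetical function name.

```python
# Importing modules from the standard library (no installation required).
import math
from collections import Counter   # import a single name from a module

print(math.sqrt(16))              # 4.0

counts = Counter("banana")        # count how often each character appears
print(counts["a"])                # 3

# Third-party packages such as numpy must first be installed with pip
# (pip install numpy) before they can be imported, e.g.:
# import numpy as np


def main():
    print("Running as a script")


# This block runs only when the file is executed directly
# (e.g. python script.py), not when it is imported as a module.
if __name__ == "__main__":
    main()
```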
* Note: While R is primarily a functional programming language and not inherently object-oriented, the subsequent tables use OOP terms and provide R examples because R can emulate OOP behavior.
Python data types
Data Type | Description | Example | Mutability | Iterability | Boolean False Value |
---|---|---|---|---|---|
Integers (int) | Whole numbers, both positive and negative. Typed as plain digits, with a hyphen (minus sign) to indicate negative numbers. | 42, -7, 0 | Immutable | Not Iterable | 0 |
Floating point numbers (float) | Decimal numbers. Typed as digits with a decimal point, with a hyphen (minus sign) to indicate negative numbers. | 3.14, -0.001, 2.0 | Immutable | Not Iterable | 0.0 |
Strings (str) | Text, stored as a sequence of characters (letters, digits, symbols, spaces, etc.). Strings must be enclosed in quotation marks, single or double. | 'hello', '99', 'a string' | Immutable | Iterable | '' |
Booleans (bool) | Boolean values, commonly used in logical operations and control structures for making decisions in code. | True, False | Immutable | Not Iterable | False |
NoneType | Represents the absence of a value. | None | Immutable | Not Iterable | None |
Python data structures
Data Structure | Description | Example | Mutability | Iterability | Boolean False Value |
---|---|---|---|---|---|
List (list) | An ordered, mutable collection of items that can contain mixed data types. Defined with square brackets. Commas separate individual elements. | [1, 'hello', 3.14] | Mutable | Iterable | [] |
Tuple (tuple) | An ordered, immutable collection of items that can contain mixed data types. Defined with parentheses. Commas separate individual elements. | (1, 'hello', 3.14) | Immutable | Iterable | () |
Set (set) | An unordered collection of unique items. Defined with curly brackets. Commas separate individual elements. An empty set must be written as set(), because {} creates an empty dictionary. | {1, 2, 3}, {'apple', 'banana'} | Mutable, though an immutable version exists: frozenset() | Iterable | set() |
Dictionary (dict) | A collection of key-value pairs, where keys are unique. Defined with curly brackets, with colons separating each key from its value and commas separating individual key-value pairs. | {'name': 'Alice', 'age': 25} | Mutable | Iterable, by key | {} |
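The type, mutability, iterability, and boolean columns in the tables above can be checked directly in Python with a sketch like this one; the example values are arbitrary.

```python
# Every value has a type, which the built-in type() reports.
examples = [42, 3.14, "hello", True, None, [1, 2], (1, 2), {1, 2}, {"a": 1}]
for value in examples:
    print(type(value).__name__, repr(value), bool(value))

# Zero and "empty" values are treated as False in a boolean context.
falsy = [0, 0.0, "", False, None, [], (), set(), {}]
print(all(not bool(v) for v in falsy))   # True

# {} creates an empty dictionary, not an empty set.
print(type({}).__name__)                 # dict

# Mutable vs immutable: lists can change in place, tuples cannot.
point_list = [1, 2]
point_list[0] = 10
print(point_list)                        # [10, 2]

point_tuple = (1, 2)
try:
    point_tuple[0] = 10
except TypeError:
    print("Tuples are immutable")

# Iterating over a dictionary yields its keys.
person = {"name": "Alice", "age": 25}
print(list(person))                      # ['name', 'age']
```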
Python operators
* See the note below the table for examples of update operator usage in Python.
Operator | Type | Strings | Integers/floats | Boolean |
---|---|---|---|---|
= | Assignment | Assigns a string to a variable name | Assigns a number to a variable name | Assigns a boolean to a variable name |
+ | Arithmetic | Concatenate (combine) strings | Add two numbers | NA |
+= | Update* | Concatenate and assign | Add and assign | NA |
- | Arithmetic | NA | Subtract two numbers | NA |
-= | Update* | NA | Subtract and assign | NA |
* | Arithmetic | Repeat a string N times, where N is an integer | Multiply two numbers | NA |
*= | Update* | Repeat a string N times and assign, where N is an integer | Multiply and assign | NA |
/ | Arithmetic | NA | Divide two numbers, always returning a float (decimal) | NA |
/= | Update* | NA | Divide and assign | NA |
** | Arithmetic | NA | Exponentiation (raise to the power of) | NA |
**= | Update* | NA | Exponentiation and assign | NA |
// | Arithmetic | NA | Floor division: divide two numbers and round down to the nearest whole number (an int if both numbers are ints, otherwise a float such as 3.0) | NA |
//= | Update* | NA | Floor divide and assign | NA |
% | Arithmetic | NA | Modulus: divide two numbers and return the remainder | NA |
%= | Update* | NA | Modulus and assign | NA |
in | Inclusion | Checks whether one string is contained in another, returning a boolean | NA | NA |
and | Logical | NA | NA | Returns True if both conditions are True, otherwise False |
or | Logical | NA | NA | Returns True if at least one condition is True, otherwise False |
not | Logical | NA | NA | Negates a boolean (e.g. returns True if False is input and vice versa) |
== | Comparison | Compares two strings and returns True if they are identical, False otherwise | Compares two numbers and returns True if they are equal, False otherwise | Compares two booleans and returns True if they are identical, False otherwise |
!= | Comparison | Returns True if two strings are not identical, False otherwise | Returns True if two numbers are not equal, False otherwise | Returns True if two booleans are not identical, False otherwise |
< | Comparison | Compares strings character by character in lexicographic (dictionary-like) order, e.g. 'apple' < 'banana' is True | Returns True if the first number is less than the second, False otherwise | Essentially treats True as 1 and False as 0, then compares the two numbers |
<= | Comparison | Lexicographic comparison, as for < | Returns True if the first number is less than or equal to the second, False otherwise | Essentially treats True as 1 and False as 0, then compares the two numbers |
> | Comparison | Lexicographic comparison, as for < | Returns True if the first number is greater than the second, False otherwise | Essentially treats True as 1 and False as 0, then compares the two numbers |
>= | Comparison | Lexicographic comparison, as for < | Returns True if the first number is greater than or equal to the second, False otherwise | Essentially treats True as 1 and False as 0, then compares the two numbers |
* Update operators are shortcuts for reassigning a variable to a new value based on its old one. For example, in Python, if my_variable holds the value 5, one could add 3 to it by typing my_variable = my_variable + 3, at which point print(my_variable) would display the current value: 8
Or, as a shortcut, one could type my_variable += 3, and my_variable would have the same value as above: 8
This works for the other arithmetic operators as well. See the table for all arithmetic and update operators.
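A sketch applying several of the update operators from the table in sequence; count, ratio, and greeting are arbitrary variable names.

```python
count = 10
count += 5      # same as count = count + 5  -> 15
count -= 3      # same as count = count - 3  -> 12
count *= 2      # same as count = count * 2  -> 24
count //= 5     # floor division: 24 // 5    -> 4
count **= 2     # exponentiation: 4 ** 2     -> 16
count %= 5      # remainder of 16 / 5        -> 1
print(count)    # 1

ratio = 7
ratio /= 2      # true division always gives a float
print(ratio)    # 3.5

greeting = "ha"
greeting *= 3   # repeat the string -> 'hahaha'
greeting += "!" # concatenate       -> 'hahaha!'
print(greeting) # hahaha!
```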
R terms
Note that while we give some examples of syntax, the format of these tables does not lend itself to showing exact code, so please consult further documentation for more information on R's syntax.
Term | Definition | Example |
---|---|---|
R | An open-source, functional programming language with an emphasis on statistical analysis and data visualization. | x <- 5 |
RStudio | An IDE built around the R programming language. | NA |
Environment | The set of objects (such as data and functions) currently loaded into memory in an R session, shown in RStudio's Environment pane. | NA |
Console | An interactive command line in RStudio that accepts one R command at a time to be executed when the ENTER key is pressed. | NA |
Package | A collection of functions that can be installed to perform specific tasks, typically distributed through the CRAN repository. | install.packages('ggplot2') |
Library | A collection of functions that can be loaded into the current R session to extend R's functionalities. | library(tidyverse) |
CRAN | The primary repository for R packages, where users can download and install packages used to extend R's functionalities. | install.packages('ggplot2') |
tidyverse | The tidyverse is a collection of R packages designed for data science, offering tools for data manipulation, visualization, and analysis, all adhering to a consistent grammar and intuitive syntax that emphasizes readability and ease of use. | library(tidyverse) |
R data types
Note: While these individual data types are not iterable in R, vectors made up of any data type inherit that type (i.e. a vector of numerics is itself numeric in type) and are iterable (see below).
Data Type | Description | Example | Mutability | Iterability |
---|---|---|---|---|
Numeric | Represents numbers that can be integers or decimals. Typed as digits, with or without a decimal point, and a hyphen (minus sign) to indicate negative numbers. | 42, 3.14, -7, 0 | Immutable | Not Iterable |
Integer | Represents whole numbers, explicitly defined with L at the end of the number or by using the as.integer() function. | 42L, as.integer(42) | Immutable | Not Iterable |
Character | Text or string data, defined within quotation marks, single or double. | 'hello', '99', 'a string' | Immutable | Not Iterable |
Logical | Boolean values representing TRUE or FALSE. | TRUE, FALSE (T and F as shorthand) | Immutable | Not Iterable |
Complex | Numbers with real and imaginary parts, with the imaginary part marked by i. | 1+4i, 2+0i, 3+1i | Immutable | Not Iterable |
NA / NaN | NA indicates missing data; NaN ('not a number') indicates an undefined numeric result, such as 0/0. | NA, NaN | Immutable | Not Iterable |
R data structures
Data Structure | Description | Example | Mutability | Iterability | Element Access |
---|---|---|---|---|---|
Vector | An ordered collection of elements of the same type, created using c() . If given mixed types, the data will be coerced into a single type (e.g. by converting numerics to characters). Vectors inherit the type of the data they contain. |
c(1, 2, 3) |
Mutable in practice | Iterable | vector[1] |
List | An ordered collection of elements that can contain different types. | list(1, 'a', TRUE, c(1, 2, 3), list(x=10, y=20)) |
Mutable | Iterable | list[[1]] accesses the first element by index, returning its value directly; list[1] accesses the first element by index and returns it as a sublist; list$name accesses an element by name. |
Matrix | A 2-D collection of elements of the same type, created using matrix() . |
matrix(1:4, nrow=2) |
Mutable | Iterable | matrix[<row number>, <column number>] |
Data Frame | A table-like structure with equal-length vectors of possibly different types. | data.frame(a=1:3, b=c('A', 'B', 'C')) |
Mutable | Iterable | dataframe$<column name> or dataframe[<row number>, '<column name>'] or dataframe[<row number>, <column number>] |
Factor | A data structure used to categorize data, with a fixed number of possible values called levels. Factors are created using the factor() function. | factor(c('low', 'medium', 'high')) | Immutable in terms of levels | Not Iterable | factor[1] accesses the first element of the factor by index; levels(factor) lists its levels |
Level | A distinct value within a factor. | 'low', 'medium', 'high' | Immutable | Not Iterable | levels(factor)[1] accesses the first level of the factor by index |
Tibble (from tidyverse) | A modern version of the data frame with an enhanced print method, which shows only the first rows and the column types, making large datasets easier to inspect. | tibble(a = 1:3, b = c('A', 'B', 'C')) | Mutable | Iterable | tibble$<column name> or tibble[<row number>, '<column name>'] or tibble[<row number>, <column number>] |
Array | A multi-dimensional generalization of matrices. | array(1:8, dim=c(2,2,2)) | Mutable | Iterable | array[1, 1, 2] |
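The access patterns in the table above can be tried directly in base R. This is a minimal sketch with illustrative variable names. It assumes R 4.0 or later, where data.frame() keeps text columns as characters rather than converting them to factors. The tibble row is left out because it needs the tibble package installed.

```r
# Vectors: mixed types are coerced to one common type (here, character)
v <- c(1, "a", TRUE)
print(v)              # "1" "a" "TRUE"
print(typeof(v))      # "character"

# Vectors can be modified by index
nums <- c(1, 2, 3)
nums[2] <- 10
print(nums)           # 1 10 3

# Lists: [[ ]] returns the element itself, [ ] returns a sub-list
lst <- list(1, "a", TRUE, c(1, 2, 3), list(x = 10, y = 20))
print(lst[[4]])       # 1 2 3
print(class(lst[4]))  # "list"
print(lst[[5]]$x)     # 10

# Matrices: filled column by column, indexed as [row, column]
m <- matrix(1:4, nrow = 2)
print(m[1, 2])        # 3

# Data frames: access by column name or by [row, column]
df <- data.frame(a = 1:3, b = c("A", "B", "C"))
print(df$b)           # "A" "B" "C"
print(df[2, "b"])     # "B"

# Factors: f[1] is the first element; levels() lists the levels
f <- factor(c("low", "medium", "high"))
print(as.character(f[1]))  # "low"
print(levels(f))      # "high" "low" "medium" (sorted alphabetically by default)
print(levels(f)[1])   # "high"

# Arrays: one index per dimension
arr <- array(1:8, dim = c(2, 2, 2))
print(arr[1, 1, 2])   # 5
```

The matrix and array results follow from R filling values column by column. matrix(1:4, nrow = 2) therefore puts 3 in row 1, column 2.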
R operators
Note that R does not have update operators such as += like Python does (see above); instead, reassign the variable explicitly, e.g. x <- x + 1.
Operator | Type | Name | Description | Example |
---|---|---|---|---|
<- or = | Assignment | Assignment | Assignment operator used to assign values to variables. | x <- 5 |
! | Logical | NOT | Used to negate a Boolean expression. | !TRUE returns FALSE |
& | Logical | AND | Used to test multiple conditions. Returns TRUE only if the conditions on both sides are TRUE. | TRUE & FALSE returns FALSE |
\| | Logical | OR | Used to test if at least one condition is TRUE. | TRUE \| FALSE returns TRUE |
+ | Arithmetic | Addition | Adds the two numbers together. | x + y |
- | Arithmetic | Subtraction | Subtracts the number on the right from the number on the left. | x - y |
* | Arithmetic | Multiplication | Multiplies the two numbers together. | x * y |
/ | Arithmetic | Division | Divides the number on the left by the number on the right. | x / y |
^ | Arithmetic | Exponentiation | Raises the number on the left to the power on the right. | x ^ y |
%% | Arithmetic | Modulus/Modulo | Divides the number on the left by the number on the right and returns the remainder. | x %% y |
%/% | Arithmetic | Integer division | Divides the number on the left by the number on the right and returns the result rounded down to the nearest whole number. | x %/% y |
%in% | Inclusion | Inclusion | Used to test if an element is contained in a vector or list. Returns TRUE if the element on the left is in the data structure on the right and FALSE otherwise. | 1 %in% c(1, 2, 3) returns TRUE |
: | Sequence | Sequence | Generates a vector of numbers from the number on the left of the : to the number on the right in steps of 1, inclusive of both numbers. | 1:5 returns c(1, 2, 3, 4, 5) |
> | Comparison | Greater than | Returns TRUE if the number on the left is larger than the number on the right, and FALSE otherwise. | x > y |
< | Comparison | Less than | Returns TRUE if the number on the left is smaller than the number on the right, and FALSE otherwise. | x < y |
>= | Comparison | Greater than or equal to | Returns TRUE if the number on the left is larger than or equal to the number on the right, and FALSE otherwise. | x >= y |
<= | Comparison | Less than or equal to | Returns TRUE if the number on the left is smaller than or equal to the number on the right, and FALSE otherwise. | x <= y |
== | Comparison | Equality | Used to test if two values are equal. | x == y |
!= | Comparison | Inequality | Returns TRUE if the values on both sides are not equal, and FALSE otherwise. | x != y |
%>% (from tidyverse) | Pipe | Pipe | Passes the result of the expression on the left as the first argument to the function on the right, so operations can be chained for readability. | data %>% filter(x > 5) |
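Several of the operators above are shown in action in the following minimal base-R sketch. The variable names are illustrative. The pipe is left out because %>% requires a tidyverse package to be loaded.

```r
x <- 7
y <- 2

# Arithmetic
remainder <- x %% y      # 1: the remainder of 7 / 2
quotient  <- x %/% y     # 3: 7 / 2 = 3.5, rounded down
power     <- x ^ y       # 49
neg_quot  <- (-7) %/% 2  # -4: -3.5 rounded down (toward negative infinity)

# Comparison
bigger <- x > y          # TRUE

# R has no += operator, so reassign explicitly
x <- x + 1               # x is now 8

# Logical operators work element by element on vectors
both   <- c(TRUE, FALSE) & c(TRUE, TRUE)    # TRUE FALSE
either <- c(TRUE, FALSE) | c(FALSE, FALSE)  # TRUE FALSE

# %in% checks each element on the left against the vector on the right
found <- c(1, 5) %in% c(1, 2, 3)            # TRUE FALSE

# : builds a sequence in steps of 1
s <- 1:5                                    # 1 2 3 4 5

print(c(remainder = remainder, quotient = quotient, power = power,
        neg_quot = neg_quot, x = x))
print(list(bigger = bigger, both = both, either = either, found = found, s = s))
```

"Rounded down" for %/% means toward negative infinity. This is why (-7) %/% 2 gives -4 rather than -3.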
High performance computing (HPC) terms
For more information related to Harvard's cluster, see FASRC's documentation, particularly their page on running jobs.
They also provide a more extensive glossary of terms.
Term | Definition |
---|---|
Server | A general term for a computer that allows others to connect to it via a network. In HPC, this is a computer set up for users to connect to and work on remotely, usually with more resources than personal computers to accommodate more resource-intensive commands and multiple users. |
ssh | Secure Shell (SSH) is a protocol used to securely connect to a server remotely, enabling encrypted communications. Typically initiated by typing ssh <username>@<server address> into a terminal, though there are many ways to connect to a server. |
Cluster | An interconnected collection of servers set up so that users can connect to one and submit resource-intensive commands, which are distributed to the others based on available resources. |
Node | One computer within a cluster. |
Login node | The node within the cluster which users connect to and interact with. Users can submit jobs from the login node, but they are not run there. |
Head node | The node within the cluster responsible for managing job scheduling and resource allocation; sometimes serves dual roles as a login node. |
Compute node | A node within the cluster that actually runs the jobs. A cluster typically has many compute nodes. |
Job | A submitted command or set of commands passed from the user to the job scheduler on a cluster. |
Job scheduler | A program that coordinates job submission for all users on the cluster. This program distributes jobs to compute nodes and allocates resources. Harvard uses the SLURM job scheduling program. |
Queue | A list of jobs waiting to be run on a cluster, where priority and scheduling determine the execution sequence. |
Partition | A subset of the cluster's resources that can be allocated to a job. Partitions can have different resource limits and priorities. |
Interactive job | A job that is run on a cluster in real time, allowing the user to interact with the job as it runs. On SLURM, this involves using salloc to allocate resources and srun to execute applications interactively within the allocated environment. |
Batch job | A job that is submitted to a cluster and run without user interaction. On SLURM, this is done with the sbatch command. |
I/O (Input/Output) Operations | The process of transferring data to and from storage devices. In HPC, this is a critical consideration for performance, as slow I/O can bottleneck the speed of a job. |
Scratch | Temporary storage on a cluster that is not backed up and is intended for short-term storage of data. Typically data is deleted periodically. Users are responsible for moving data to more permanent storage. |
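The following rough sketch illustrates how a job's allocation reaches your code. It reads environment variables that SLURM sets inside a job. It assumes the job was submitted with --cpus-per-task; otherwise SLURM_CPUS_PER_TASK is not set and the code falls back to 1. It uses mclapply() from R's bundled parallel package as one possible way to use the cores, not as FASRC's recommended method. Run on a personal computer or a login node, it simply reports that it is not inside a job.

```r
# SLURM_CPUS_PER_TASK is only set inside a job submitted with --cpus-per-task;
# fall back to a single core anywhere else
n_cores <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", unset = "1"))
job_id  <- Sys.getenv("SLURM_JOB_ID", unset = NA)

if (is.na(job_id)) {
  message("Not running inside a SLURM job; using ", n_cores, " core(s)")
} else {
  message("SLURM job ", job_id, " was allocated ", n_cores, " core(s)")
}

# Spread a small computation over the allocated cores.
# mclapply() forks processes, so mc.cores > 1 is not supported on Windows.
results <- parallel::mclapply(1:4, function(i) i^2, mc.cores = n_cores)
print(unlist(results))  # 1 4 9 16
```

In a batch job, a script like this would typically be run with Rscript from the sbatch submission script, so the same code adapts to however many CPUs were requested.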
Installing software
Installing software is a notoriously troublesome task, especially for beginners and when working on a server where you don't have root (administrator) access, and so cannot write to the system-wide directories where software is normally installed.
A couple of strategies have evolved to make this easier:
- Environments: Portions of the user's file system that are adjusted so they can install and run software, giving the user full control.
- Containers: Executable files that internally emulate the file system of the developer's computer, allowing the software in the container to be run without being explicitly installed on the user's computer.
There are several ways to create environments and containers, which are covered below. Additionally, different environment management systems may work with different package repositories and managers, so we go over some of those as well.
Term | Definition |
---|---|
PATH | An environment variable that lists directories where the shell looks for programs to run. A common error when running a program is Command not found, which occurs when the program is not in one of the directories listed in the PATH variable. |
Install | The process of setting up software on a computer system, typically involving downloading, configuring, building executables, and copying files to the appropriate directories. |
Build | The process of compiling and linking source code files to create an executable program or library, often automated using build tools like make. |
Source Code | The original human-readable code of a piece of software written in a programming language like C, C++, or Python, which must be compiled or interpreted to create executable software. |
Makefile | A file containing a set of instructions used by the make build automation tool to build programs. |
make | A build automation tool that interprets the Makefile to compile programs from source code, handling dependencies and execution order. |
Binary distribution | A precompiled package containing executable files, libraries, and resources, ready for installation on a specific platform (like Windows, macOS, or Linux) without requiring compilation. |
Package | A bundled collection of software, libraries, or code that is packaged to ensure consistent distribution, installation, and functionality through package managers. |
Package repository | A central location where software packages are stored and managed. Repositories can be remote, accessed online, or local, residing on a user's system. |
Package manager | A tool that automates the process of installing, updating, configuring, and removing software packages, while managing dependencies for applications or systems. |
Dependency | A package or library needed for another package to function properly. Dependencies are managed automatically by package managers to maintain software compatibility. |
CRAN (Comprehensive R Archive Network) | The primary repository for R packages. Packages are installed within R using the install.packages() function. |
PyPI (Python Package Index) | The official third-party software repository for Python packages. Packages are installed using the pip package manager. |
pip | The package manager for Python, used to install packages from PyPI. |
Conda | An open-source package manager and environment management system, primarily used for managing Python and R packages within isolated environments. |
Mamba | A fast, alternative package manager to Conda, designed to speed up package installation and dependency resolution. |
Homebrew | A macOS package manager that simplifies the installation of software. It is used from the command line. |
apt (Advanced Package Tool) | A package manager used primarily by Debian-based Linux distributions, facilitating the retrieval, installation, and removal of software packages. |
Install script | A small program or set of commands that automate the process of installing software, often used for custom installations not handled by package managers. |
Environment | A general term for any isolated file space where programs are installed and run, providing specific configurations and dependencies required for consistent execution. |
Virtual environment (general) | An isolated computing environment that emulates a complete, independent system, allowing for the execution of applications with their specific dependencies without affecting the host system. |
Virtual environment (Python) | An isolated environment specific to Python projects, enabling developers to install and manage dependencies separately from the system's Python installation using tools like venv and virtualenv . |
Conda environment | An isolated workspace managed by Conda that supports multiple programming languages, allowing users to manage package dependencies and software versions, ensuring consistent, conflict-free installations. |
Container | A package that includes everything needed to run software consistently across different environments, including code, libraries, dependencies, and settings. |
Singularity container | A type of container built with Singularity, used for running applications in a secure and portable way, tailored to scientific and high-performance computing environments. |
Docker container | A type of container created with Docker that packages applications with their dependencies, ensuring consistent execution across various computing environments. |
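The PATH variable and where packages get installed can both be inspected from within R. This minimal sketch uses only base R. The program name a-program-that-does-not-exist is a deliberately made-up placeholder used to show the not-found case. The install.packages() line is commented out because it needs internet access.

```r
# Directories the shell searches for programs, in order
path_dirs <- strsplit(Sys.getenv("PATH"), .Platform$path.sep, fixed = TRUE)[[1]]
print(path_dirs)

# Sys.which() gives the full path of each program found on PATH, or ""
# when it is missing (the situation behind "Command not found" errors)
found <- Sys.which(c("git", "a-program-that-does-not-exist"))
print(found)

# Directories where R installs and looks for packages
print(.libPaths())

# Check whether a package is installed without attaching it
print(requireNamespace("stats", quietly = TRUE))  # TRUE: stats ships with R

# Installing from CRAN (needs internet access, so left commented out):
# install.packages("tibble")
```

On a server without root access, .libPaths() usually includes a personal library in your home directory. This is where install.packages() can put packages without administrator privileges.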
Git terms
Git is a program that stores the history of files in any directory that has been initialized as a Git repository. Used in conjunction with web-based platforms, it makes for a powerful collaboration tool. However, there are many terms associated with Git that may be confusing. In essence, many of these terms are simply other words for "a copy" or "copying" a directory, albeit with slight distinctions. This table tries to define these terms clearly.
Term | Definition |
---|---|
Git | Software for version control, which keeps track of changes to files in a given directory. |
Version control | The process of tracking changes to files over time, allowing you to recall specific versions later. |
GitHub | A web-based platform that facilitates Git's use for collaboration between individuals. Other web-based platforms include GitLab and Bitbucket. |
Repository/Repo | A directory of files, possibly including code, documentation, or data, that has been initialized by Git so that changes to it are tracked. |
Remote repository | A repository that is hosted on a server, typically on a web-based platform like GitHub. |
init | The process of initializing a directory as a Git repository. |
clone | A copy of a repository or the process of copying a repository. Typically used when downloading a repository from GitHub or one of the other web-based Git platforms. |
Fork | Creating a personal copy of a repository on your own account from an original repository, possibly from another account or organization, enabling modifications without affecting the original. |
Branch | A copy of a repository from a certain point within that repository's history. Typically a repository has a "main" branch and other branches are created off of it. Changes on the main branch are not reflected in the new branch unless explicitly synced, and vice versa. |
pull | The process of integrating changes from one version of a repository to another (e.g. from a fork back to the original repo, or from a branch back to the main branch). There are two general use cases: 1) When the owner of a repository makes changes to it, you pull those changes into your local copy. 2) When you make changes to a forked repository or a branch of a repository and want to incorporate the changes back to the original repo or branch, you initiate a pull request, and then whoever is in charge of the original repository can pull those changes in. |
Pull request | When someone has made changes to a fork or a branch that they want the owner of the original repository to incorporate, they initiate a pull request so the owner can review and potentially pull the changes. |
merge | The process of combining changes from one branch into another branch, typically done as part of a pull request. |
Conflict | A situation where two branches have changes in a file that Git cannot automatically merge, requiring manual resolution. |
Staging area | When a user changes or adds files to a repository, they must first add them to an intermediate staging area where they can be reviewed. |
add | The process of adding new/edited files to the staging area. |
commit | The process of saving changes to the repository. This is done after adding files to the staging area. |
push | The process of uploading committed changes from a local repository to a remote repository (e.g. one hosted on GitHub). |
status | A command that shows the status of the repository, including which files have been changed, added, or deleted. |
switch / checkout | A command that allows you to switch between branches. |
.gitignore | A file that tells Git which files to ignore when committing changes. |