In programming, a variable is a symbolic name or identifier that represents a storage location in the computer’s memory. Thus, variables provide a means to access (and manipulate) data stored in the computer’s memory from within the program. In practice a variable acts as a container for storing data or values.
Data Objects. In R, data objects¹ are the fundamental entities that hold data or information. They are the entities that you create and manipulate in your R code. Data objects include vectors, matrices, data frames, lists, etc. Essentially, anything that can store data or represent information in R is considered a data object. Data objects are stored in the computer’s memory, and they are referenced and manipulated through variables.
Data Types. Data types in R refer to the specific classification or categorization of the values that can be stored in data objects. R has several basic data types, also known as atomic data types². These data types represent different kinds of data, such as logical (boolean), numeric (double and integer), complex, character, and raw. Each data type has its own set of operations and behaviors associated with it. For example, logical data types can take values of TRUE, FALSE, and NA (representing missing values), while numeric data types can store numerical values.
In R, variables can hold different data types without the need to declare them in advance. Values are assigned to variables using the assignment operator <-. The symbol = also works for assignment, but <- is the conventional choice and is the one recommended by most R style guides.
Variable names may contain letters, digits, dots, and underscores, but cannot begin with a digit or an underscore (nor with a dot followed by a digit).
In R, the class of a variable refers to the specific type or category of the object that the variable represents. It provides information about how the object is handled by R’s internal functions and methods. Classes in R are typically associated with specific data structures or types. For example, a variable can have a class such as “numeric” if it represents a numeric data type, “character” if it represents a character string, “data.frame” if it represents a data frame, and so on. The class of a variable is important because it influences how R treats the object and determines which functions or methods can be applied to it. Different classes have different behaviors and may have specific functions or methods associated with them.
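As a minimal sketch of the two ideas above (no type declaration, and class-dependent behavior), the following uses hypothetical variable names that do not appear elsewhere in this text:

```r
# A variable can be re-assigned to a value of a different type
x <- 53
class(x)        # "numeric"
x <- "R rocks!"
class(x)        # "character"

# Objects of several classes (names are illustrative only)
x_int <- 5L                       # the L suffix makes an integer
x_lgl <- TRUE
x_df  <- data.frame(id = 1:2)

class(x_int)    # "integer"
class(x_lgl)    # "logical"
class(x_df)     # "data.frame"

# The class determines which method a generic function uses:
# summary() reports different things for a number and a data frame
summary(3.14)
summary(x_df)
```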
Finally, in R, NA, NaN, and NULL are special values used to represent specific situations or indicate missing or undefined data:
NA: NA stands for “Not Available” or “Missing Value”. It is used to represent missing or undefined values in R. NA can belong to different data types, such as logical, numeric, character, etc. For example, if a value is missing in a numeric vector, it can be represented as NA. NA values propagate through most computations (for example, the sum of a vector containing an NA is NA) unless they are explicitly handled, for instance with the argument na.rm = TRUE.
NaN: NaN stands for “Not a Number”. It is a special value used to represent undefined or unrepresentable results of arithmetic operations that do not yield a numeric value. NaN is typically encountered when performing calculations that involve undefined operations, such as dividing zero by zero or taking the square root of a negative number.
NULL: NULL is a special object in R that represents the absence of a value or an empty object. It is often used to indicate the absence of a valid object or as a placeholder. NULL is different from NA, as it represents the absence of any value or object altogether.
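The differences between the three special values can be checked interactively. The following is a small sketch with a hypothetical vector v; it only uses base R functions:

```r
# NA: a missing value inside a vector
v <- c(1, NA, 3)
is.na(v)               # FALSE  TRUE FALSE
sum(v)                 # NA: the missing value propagates
sum(v, na.rm = TRUE)   # 4: the NA is dropped explicitly

# NaN: result of an undefined arithmetic operation
0 / 0                  # NaN
is.nan(0 / 0)          # TRUE
is.na(0 / 0)           # TRUE: NaN also counts as missing
is.nan(NA)             # FALSE: but NA is not NaN

# NULL: absence of an object; it has length zero
length(NULL)           # 0
length(NA)             # 1
is.null(NULL)          # TRUE
c(1, NULL, 2)          # 1 2: NULL vanishes when combined
```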
A_Number <- 53
A_Number
[1] 53
A_Text <- "R rocks!"
A_Text
[1] "R rocks!"
A_Logical <- TRUE  # one of the possible values of a boolean data type
A_Logical
[1] TRUE
A_Logical <- true
Error: object 'true' not found
A_NA <- NA
A_NA
[1] NA
A_NA <- na
Error: object 'na' not found
To summarize, variables are used to refer to specific data objects, which are the containers or entities that hold data. Data types specify the kind of values that can be stored within data objects, and they determine how the data is interpreted and how operations are performed on it.
2 Basic built-in functions
A function in R is a block of code that is written to perform a specific task or operation. The base R system includes many functions that are readily available for use without requiring any additional packages or installations. These built-in functions are part of the R language itself and provide core functionality for performing various operations and tasks.
Built-in functions in R cover a wide range of areas, including data manipulation, statistical analysis, mathematical calculations, plotting, file input/output, and more. They are designed to be efficient, reliable, and consistent across different R installations.
In an introductory R course, it is essential to become familiar with the built-in functions available in the language, understand their required arguments, and learn when to use them. Here are a few examples of basic functions used to manipulate data objects:
Function             Description
class(obj)           returns the class of obj
help(obj)            shows the documentation for obj
ls()                 lists the objects in the current environment
rm(obj)              removes obj from the environment
nchar(arg)           number of characters in arg
as.character(arg)    converts arg to character
Functions usually require arguments, which are the values or variables passed to a function when it is called. They provide input to the function and influence its behavior or output. In R, there are two common ways to specify function arguments:
1. Arguments by position. Arguments are supplied in a specific order, and the function expects the values in the same order as the parameters appear in its definition.
2. Arguments by keyword. Each argument name is written explicitly, followed by its value. The order of the arguments does not matter as long as the names are spelled correctly. Note that each name is paired with its value using the = symbol, not <-.
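The two calling styles can be compared side by side. In this sketch, power() is a hypothetical function defined only for illustration, while seq() is the base R sequence generator:

```r
# A hypothetical two-argument function, defined here for illustration
power <- function(base, exponent) {
  base^exponent
}

# By position: values are matched in the order of the definition
power(2, 3)                     # 8
power(3, 2)                     # 9: swapping the order changes the result

# By keyword: the order no longer matters
power(exponent = 3, base = 2)   # 8

# Both styles can be mixed: positional first, then named
power(2, exponent = 3)          # 8

# The same applies to built-in functions such as seq()
seq(1, 10, 2)                   # 1 3 5 7 9
seq(by = 2, to = 10, from = 1)  # 1 3 5 7 9
```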
# Previous chunk of code must be run prior to this one
ls()
Error in log(Not_A_Number): non-numeric argument to mathematical function
rm(Not_A_Number)
ls()
[1] "A_Logical" "A_NA" "A_Number" "A_Text"
3 Operators
Operators are symbols or characters that represent specific operations or actions to be performed on data or variables. We have already seen one very common operator: the assignment operator <-, which is used to assign values to variables. Operators can be classified into different groups according to the type of operation they perform:
Arithmetic Operators are used to perform mathematical calculations on numeric values.
Operator            Symbol
Addition            +
Subtraction         -
Multiplication      *
Division            /
Power               ^ and **
Modulo              %%
Integer division    %/%
Relational Operators are used to compare values and return logical values (TRUE or FALSE).
Operator                    Symbol
Equal to                    ==
Not equal to                !=
Greater than                >
Less than                   <
Greater than or equal to    >=
Less than or equal to       <=
Logical Operators are used to combine or manipulate logical values.
Operator    Symbol
AND         &
OR          |
XOR         xor()
NOT         !
Membership Operators are used to check whether an element belongs to a set or vector. The operator %in% checks whether the element on its left is present in the vector on its right.
5 + 2
[1] 7
"viva" + "luis"
Error in "viva" + "luis": non-numeric argument to binary operator
a = 5
a / 3
[1] 1.666667
a %% 3
[1] 2
b = "B"
b + "C"
Error in b + "C": non-numeric argument to binary operator
a == b
[1] FALSE
b == "B"
[1] TRUE
b > "A"
[1] TRUE
b > "A" & a != 5
[1] FALSE
xor(b > "A", a != 5)
[1] TRUE
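A few operators from the tables are not exercised in the session above. As a short complementary sketch (the values are arbitrary, and paste() is shown as one common way to join strings, since + does not work on character data):

```r
# Integer division and modulo
7 %/% 2        # 3
7 %% 2         # 1
(-7) %/% 2     # -4: the quotient is rounded down, not toward zero
(-7) %% 2      # 1

# Both power symbols give the same result
2^3            # 8
2**3           # 8

# Membership: one result per element on the left-hand side
3 %in% c(1, 2, 3)               # TRUE
c("a", "z") %in% c("a", "b")    # TRUE FALSE

# Joining strings is done with a function, not with +
paste("viva", "luis")           # "viva luis"
```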
4 Data Structures
In R, a data structure refers to the way data is organized and stored in memory. It determines how the data can be accessed, manipulated, and processed. Data structures in R provide a way to store and organize data objects. Data objects, on the other hand, are specific instances of data that are stored within the data structures. They are the actual values or elements that are held by the data structures. There are five basic data structures, which fall into two groups:
- Homogeneous: all elements must be of the same type.
  - Vectors: one-dimensional collection of numeric, character, or logical data.
  - Matrices: two-dimensional collection of numeric, character, or logical data.
  - Arrays: n-dimensional collection of numeric, character, or logical data. A matrix is just a specific type of array where n = 2. Arrays and matrices are just vectors with the attribute dim. This attribute sets the number of dimensions into which the vector is divided and determines how its elements are retrieved and displayed.
- Heterogeneous: elements can be of different types.
  - Lists: an ordered collection of objects (called components) that can be of the same or different types.
  - Data frames: a type of list that is displayed like a matrix and has some restrictions:
    - The components must be vectors (numeric, character, or logical), factors, numeric matrices, lists, or other data frames.
    - Matrices, lists, and data frames provide as many variables to the new data frame as they have columns, elements, or variables, respectively.
    - Numeric vectors, logicals, and factors are included as is. Since R 4.0.0, character vectors are also kept as character by default; in earlier versions they were coerced to factors by default, with the unique values of the vector as levels (this behavior is controlled by the stringsAsFactors argument).
    - Vector structures appearing as variables of the data frame must all have the same length, and matrix structures must all have the same number of rows.
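The relationships described in this list can be verified directly. The following is a minimal sketch with hypothetical objects (v, mixed, l, df), using only base R:

```r
# A matrix is a vector with a dim attribute
v <- 1:6
is.null(dim(v))    # TRUE: a plain vector has no dim
dim(v) <- c(2, 3)  # divide the same six values into 2 rows x 3 columns
is.matrix(v)       # TRUE
attributes(v)      # $dim: 2 3

# Homogeneous: mixing types in a vector coerces them to a common type
mixed <- c(1, "a", TRUE)
mixed              # "1" "a" "TRUE"
class(mixed)       # "character"

# Heterogeneous: a list keeps each component's own type
l <- list(1, "a", TRUE)
sapply(l, class)   # "numeric" "character" "logical"

# A data frame is a list of equal-length columns
df <- data.frame(x = 1:2, y = c("a", "b"))
is.list(df)        # TRUE
```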
4.1 Vectors
A vector is a (one-dimensional) collection of elements, all of the same type. Vectors play an essential role in R, so much so that R can be considered a vectorized language. That means that operations are applied to each element of the vector automatically, without the need to loop through the elements explicitly, as would be necessary in many other languages such as Python. The most common way to generate a vector is the combine (or concatenate) function c(), which takes the elements separated by commas. Individual elements of the vector are accessed using square brackets and the index of the element.
Warning
Vectors are indexed starting from 1 instead of 0, following a 1-based indexing convention. This means that the first element of a vector is accessed using the index 1, not 0 as in some other programming languages such as Python.
v1 <- c(1, 11:13)  # the concatenate function creates a vector out of the values
class(v1)
[1] "numeric"
is.numeric(v1)
[1] TRUE
v4 <- as.character(v1)  # change the data type of the elements in v1
v4
[1] "1"  "11" "12" "13"
v5 <- c(1, 3, 27)
v5[1]  # elements are retrieved by using square brackets and the index of the element
[1] 1
v5[0]  # Warning! first element is index 1
numeric(0)
# we can 'add' labels (names) to the elements of a vector
names(v5)
NULL
names(v5) <- c("PHY", "BIO", "BQ")
v5
PHY BIO  BQ 
  1   3  27 
names(v5)
[1] "PHY" "BIO" "BQ"
v5["BIO"]  # then, we can retrieve elements by their label
BIO 
  3 
sum(v5)
[1] 31
v6 <- c("PHY", "BIO", "BQ")
# checking the presence of an element in a vector
"BQ" %in% v6
[1] TRUE
"BQ" %in% v5
[1] FALSE
"BQ" %in% names(v5)
[1] TRUE
"PHY" %in% "BIOPHYSICS"  # Warning: it does NOT check for substrings in strings
[1] FALSE
names(v5) == "BIO"
[1] FALSE  TRUE FALSE
which(names(v5) == "BIO")  # returns the index(es) of the elements matching the comparison
[1] 2
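The vectorized behavior mentioned at the start of this section can be seen by comparing an element-wise operation with the explicit loop it replaces. This is a small sketch using a hypothetical vector w:

```r
w <- c(1, 3, 27)

# Arithmetic and comparisons act on every element at once
w * 2               # 2 6 54
w + c(10, 20, 30)   # 11 23 57
w > 2               # FALSE TRUE TRUE

# A logical vector can be used as an index to filter elements
w[w > 2]            # 3 27

# The equivalent explicit loop, as it would be written in many languages
doubled <- numeric(length(w))
for (i in seq_along(w)) {
  doubled[i] <- w[i] * 2
}
identical(doubled, w * 2)   # TRUE
```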
4.1.1 Generation of vectors of periodic sequences
R has a number of facilities for generating commonly used sequences of numbers or characters. The most common are the colon operator :, function seq() and function rep(). See the next examples:
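As a minimal sketch of these generators, together with the functions sort() and order(), the code below uses a hypothetical vector x. It is an illustrative choice, built so that its smallest value, -1, sits at position 7 and the value 1 at position 2:

```r
# The colon operator: consecutive integers
1:5                                # 1 2 3 4 5

# seq(): sequences with a given step or a given length
seq(from = 0, to = 1, by = 0.25)   # 0.00 0.25 0.50 0.75 1.00
seq(2, 10, length.out = 5)         # 2 4 6 8 10

# rep(): repeat a whole vector, or each of its elements
rep(c("a", "b"), times = 2)        # "a" "b" "a" "b"
rep(c("a", "b"), each = 2)         # "a" "a" "b" "b"

# sort() returns the sorted values; order() returns their positions
x <- c(5, 1, 9, 3, 7, 4, -1)       # hypothetical example vector
sort(x)                            # -1 1 3 4 5 7 9
order(x)                           # 7 2 4 6 1 5 3
x[order(x)]                        # same result as sort(x)
```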
Note that the function order() returns the indices of the elements of the vector, sorted according to the value of each element. In the example above, the smallest number is -1, which occupies position 7 (index 7) in the vector. Thus, the first element in the output of order() is 7. The second smallest number is 1, which occupies the second position in the original vector (index 2), so the second element in the output is 2, and so on.
4.2 Factors
Factors are used to record categorical (a.k.a nominal) variables. The function factor() takes a vector and identifies all the different values present in it, then each value is assumed to be a different category or level. The levels represent the distinct categories or groups that the variable can take on.
# the following vectors represent the initial and final
# weight of 15 individuals
W.initial <- c(55, 65, 70, 93, 71, 50, 61, 80, 81, 60, 43, 77,
               78, 65, 100)  # initial weight
W.final <- c(52, 66, 71, 92, 61, 51, 55, 81, 70, 52, 44, 78,
             77, 60, 90)  # final weight
# the following character vector represents the treatment
# received by each individual
Txt <- c("D3", "D2", "D3", "D3", "D1", "D3", "D2", "D2", "D2",
         "D1", "D3", "D2", "D2", "D1", "D1")  # Treatment
Txt
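To see what factor() does with a character vector of this kind, here is a minimal sketch on a shorter, hypothetical treatment vector (the name treatment and its values are illustrative only):

```r
# A short hypothetical vector of treatment labels
treatment <- c("D3", "D2", "D3", "D1", "D2")

# factor() finds the distinct values and turns them into levels
f <- factor(treatment)
levels(f)       # "D1" "D2" "D3": sorted alphabetically by default
nlevels(f)      # 3

# Internally each element is stored as the integer code of its level
as.integer(f)   # 3 2 3 1 2

# table() counts how many elements fall in each level
table(f)
```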
A matrix is a two-dimensional data structure that consists of rows and columns. It is used to store data elements of the same data type arranged in a rectangular grid-like structure. To retrieve a single element, specify the row and column indices within square brackets. An array is a multi-dimensional data structure that extends the concept of matrices to more than two dimensions. It can store elements of the same data type in a grid-like structure with multiple dimensions.
MyVector <- 1:9
MyVector
[1] 1 2 3 4 5 6 7 8 9
MyMatrix <- matrix(MyVector, ncol = 3)  # generates a 3-column matrix with elements from MyVector
MyMatrix
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
t(MyMatrix)  # flips a matrix over its main diagonal (transposition)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
MyVector2 <- 1:12
MyArray <- array(MyVector2, dim = c(2, 2, 3))  # generates a 2x2x3 array with elements from MyVector2
# accessing the second 2x2 slice of the array
MyArray[, , 2]
     [,1] [,2]
[1,]    5    7
[2,]    6    8
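Retrieving single elements, rows, and columns follows the same bracket notation. This sketch rebuilds two small objects, m and a, with the same values as above so that it can run on its own:

```r
m <- matrix(1:9, ncol = 3)   # filled column by column
a <- array(1:12, dim = c(2, 2, 3))

# Single element: [row, column]
m[2, 3]      # 8

# Leaving an index empty selects the whole row or column
m[2, ]       # 2 5 8
m[, 3]       # 7 8 9

# Dimensions of each object
dim(m)       # 3 3
dim(a)       # 2 2 3

# Arrays need one index per dimension: [row, column, layer]
a[1, 2, 3]   # 11
```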
4.4 Data frames
Data frames are two-dimensional tabular data structures where each column is a vector. Unlike matrices, data frames allow columns of different data types within the same structure. This flexibility allows you to handle mixed or heterogeneous data, such as storing numeric measurements, categorical variables, dates, and more within a single data frame. Data frames are created with the data.frame() function and the elements can be retrieved with the square bracket notation specifying the row and column indexes, as we saw for matrices. In addition, column vectors can be retrieved using the $ operator (list-subset operator).
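As a minimal sketch of these access patterns, the data frame below is hypothetical (its column names and values are made up for illustration). The argument stringsAsFactors = FALSE is set explicitly so that the result does not depend on the R version:

```r
# A small hypothetical data frame with columns of different types
df <- data.frame(
  name   = c("Ana", "Juan", "Mar"),
  age    = c(20, 20, 21),
  passed = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Each column keeps its own type
sapply(df, class)     # name: character, age: numeric, passed: logical

# Bracket notation, as for matrices: [row, column]
df[2, 1]              # "Juan"
df[2, "age"]          # 20

# The $ operator extracts a whole column as a vector
df$age                # 20 20 21

# A logical condition on a column selects the matching rows
df[df$age > 20, ]
```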
A list is a data structure consisting of an ordered collection of (potentially different) objects. Lists are constructed using the function list(). The elements (a.k.a. components) of a list are numbered and can be accessed using double square brackets and the index of the component. Oftentimes, the elements of a list are also named. In this case, the components can be accessed using double square brackets and the name of the item instead of its index. Additionally, named components can be retrieved using the $ operator (list-subset operator).
# A list of students recording different features; notice
# that each element can be a different type of object or
# data structure
Students <- list(Names = c("Ana", "Juan", "Mar"),
                 Age = c(20, 20, 21),
                 Sex = factor(c("F", "M", "F")),
                 Courses = list(c("HPBBM", "BD", "BIBMS"),
                                c("HPBBM", "MEB", "BIBMS"),
                                c("BD", "BQS")),
                 Grades = list(c(7, 9, 8), c(5, 5, 6), c(10, 9)))
Students
$Names
[1] "Ana" "Juan" "Mar"
$Age
[1] 20 20 21
$Sex
[1] F M F
Levels: F M
$Courses
$Courses[[1]]
[1] "HPBBM" "BD" "BIBMS"
$Courses[[2]]
[1] "HPBBM" "MEB" "BIBMS"
$Courses[[3]]
[1] "BD" "BQS"
$Grades
$Grades[[1]]
[1] 7 9 8
$Grades[[2]]
[1] 5 5 6
$Grades[[3]]
[1] 10 9
str(Students) #shows the structure of the data object
List of 5
$ Names : chr [1:3] "Ana" "Juan" "Mar"
$ Age : num [1:3] 20 20 21
$ Sex : Factor w/ 2 levels "F","M": 1 2 1
$ Courses:List of 3
..$ : chr [1:3] "HPBBM" "BD" "BIBMS"
..$ : chr [1:3] "HPBBM" "MEB" "BIBMS"
..$ : chr [1:2] "BD" "BQS"
$ Grades :List of 3
..$ : num [1:3] 7 9 8
..$ : num [1:3] 5 5 6
..$ : num [1:2] 10 9
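The three ways of accessing list components described above can be compared on a smaller example. The list student below is hypothetical and independent of the Students object:

```r
# A small hypothetical list with named components of different types
student <- list(name = "Ana", age = 20, grades = c(7, 9, 8))

# Double brackets with an index or a name return the component itself
student[[1]]             # "Ana"
student[["age"]]         # 20

# The $ operator does the same for named components
student$grades           # 7 9 8
student$grades[2]        # 9
mean(student$grades)     # 8

# Single brackets return a sub-list, not the component
class(student["name"])   # "list"
class(student[["name"]]) # "character"
```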
¹ Everything in R is an object (constants, data structures, functions, graphs, etc.). Objects have a mode that describes how the object is stored and a class that describes how the object is handled by functions. The mode corresponds to what is called a class in other languages. Additionally, objects belong to a type, which basically coincides with the mode.
² Note that in R, atomic types are always stored as vectors, even when there is a single element.