4. Spatial Data in R

Vector and raster packages, spatial operations, and cartography.

What is R and RStudio?

The Language (R)

  • R is a powerful programming language used for data analysis and visualization
  • Offers a wide range of packages, making it suitable for many scientific disciplines, including spatial data analysis in wildlife and fisheries science

The IDE (RStudio)

  • Integrated development environment for R (and Python, Julia)
  • Connects to data sources and tables, and provides code completion and syntax highlighting
  • Allows for easy access to version control functionality, such as git and GitHub
  • Alternatives: VS Code, Jupyter Notebook

RStudio interface

Why Use R for Spatial Data?

  • Open source (free) option to build maps, perform analyses, and convert data
    • Many non-profit and academic organizations already have R expertise in-house
  • Allows for reproducible science and workflows (sound familiar?)
    • Supplementary material for manuscripts
    • Co-workers can use entire scripts or snippets to update models and maps
  • Integrates with thousands of other R packages for modeling, spatial statistics, and visualization
  • Well documented, with plenty of examples from a continuously growing community of R users

Basics of R Programming

Each programming language has specific ways of doing things, and R is no different. The following section contains a general introduction to R classes and data types.

Data Classes

Data classes determine how R treats and operates on your data, so the class makes a big difference to which operations you can perform.
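To check which class R has assigned to an object, the base functions class() and typeof() are handy. The short sketch below uses made-up values; it also shows why class matters, since a number stored as text must be converted before arithmetic.

```r
# class() reports how R treats an object; typeof() reports how it is stored
class(3.14)              # "numeric"
class(5L)                # "integer"
class("sppA")            # "character"
class(TRUE)              # "logical"
class(factor("forest"))  # "factor"
typeof(factor("forest")) # "integer": factors are stored as integer codes

# The same digits behave differently as character vs. numeric:
# "10" + 5 would throw an error, so convert the text first
as.numeric("10") + 5     # 15
```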

Numeric

Numeric data class represents continuous numerical values. E.g., temperature

# Print a numeric vector
c(3.14, 2.718, -10, 100.5)
[1]   3.140   2.718 -10.000 100.500

Integer

Integer data class represents whole numbers without decimal points. E.g., counts

# Print an integer vector
c(1L, 2L, -5L, 100L)
[1]   1   2  -5 100

Character

Character data class represents text strings. E.g., site names or animal IDs

# Creating a character vector
c("apple", "banana", "cherry")
[1] "apple"  "banana" "cherry"

Factor

Factor data class represents categorical variables with a fixed set of possible values. E.g., land cover

# Creating a factor vector
factor(c("male", "female", "female", "male"),
       # specify levels
       levels = c("male", "female"))
[1] male   female female male  
Levels: male female
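Factors store each value as an integer code plus a set of labels (levels). The sketch below uses hypothetical land cover observations to show how to inspect levels and count observations per class, plus one common pitfall when converting factors to numbers.

```r
# Hypothetical land cover classes recorded at five sample plots
landcover <- factor(c("forest", "wetland", "forest", "grassland", "wetland"),
                    levels = c("forest", "grassland", "wetland"))

levels(landcover)   # "forest" "grassland" "wetland"
nlevels(landcover)  # 3
table(landcover)    # counts per class: forest 2, grassland 1, wetland 2

# Caution: as.numeric() on a factor returns the integer codes,
# not the printed labels
codes <- factor(c("10", "20", "10"))
as.numeric(codes)                # 1 2 1
as.numeric(as.character(codes))  # 10 20 10
```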

Logical

Logical data class represents binary values indicating true or false. E.g., presence/absence

# Creating a logical vector
c(TRUE, FALSE, TRUE, TRUE)
[1]  TRUE FALSE  TRUE  TRUE
# Specify binary as logical
as.logical(c(1,1,0,1,0))
[1]  TRUE  TRUE FALSE  TRUE FALSE
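Because R treats TRUE as 1 and FALSE as 0 in arithmetic, logical vectors are convenient for presence/absence data. A minimal sketch with hypothetical detections at six sites:

```r
# Hypothetical presence (TRUE) / absence (FALSE) at six survey sites
present <- c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)

# TRUE counts as 1 and FALSE as 0 in arithmetic
sum(present)     # 3 sites with a detection
mean(present)    # 0.5, the proportion of sites with a detection
which(present)   # 1 3 4, the positions of those sites
```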

Data Types/Structures

Vectors:

One-dimensional sequences of values in which every element shares a single type, such as numeric, character, or logical.

# Numeric vector
c(1, 2, 3, 4, 5)
[1] 1 2 3 4 5
# Character vector
c("apple", "banana", "orange", "grape", "kiwi")
[1] "apple"  "banana" "orange" "grape"  "kiwi"  
# Logical vector
c(TRUE, FALSE, TRUE, FALSE, TRUE)
[1]  TRUE FALSE  TRUE FALSE  TRUE

Matrices:

Two-dimensional arrays with rows and columns of the same data type.

# Create a 3x3 matrix, filling the values row-wise (byrow = TRUE)
matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE) 
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
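Matrix elements are accessed with [row, column] indexing, the same grid logic that underlies raster data. Using the matrix above:

```r
m <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)

dim(m)     # 3 3 (rows, columns)
m[2, 3]    # 6: row 2, column 3
m[1, ]     # 1 2 3: the entire first row
m[, 2]     # 2 5 8: the entire second column
```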

Data Frames:

Tabular data structures, similar to spreadsheets, consisting of rows and columns. Each column is a vector, so different columns can hold different data types.

# Create a data frame
data.frame(
  Name = c("John", "Alice", "Bob", "Emily"),
  Age = c(25, 30, 35, 28),
  Gender = c("Male", "Female", "Male", "Female"),
  stringsAsFactors = FALSE
)
   Name Age Gender
1  John  25   Male
2 Alice  30 Female
3   Bob  35   Male
4 Emily  28 Female
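Individual columns of a data frame are extracted with $, and rows can be filtered with a logical condition inside [rows, columns]. A short sketch reusing the example data, assigned here to the hypothetical name df:

```r
df <- data.frame(
  Name = c("John", "Alice", "Bob", "Emily"),
  Age = c(25, 30, 35, 28),
  Gender = c("Male", "Female", "Male", "Female")
)

str(df)                  # one line per column, showing each column's type
df$Age                   # 25 30 35 28: a single column, returned as a vector
mean(df$Age)             # 29.5
df[df$Age > 27, ]        # rows for Alice, Bob, and Emily (all columns)
nrow(df[df$Age > 27, ])  # 3
```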

Lists:

Collections of objects, which can be of different data types.

list(
  numeric_vector = c(1, 2, 3),
  character_vector = c("a", "b", "c"),
  matrix_data = matrix(1:4, nrow = 2),
  data_frame = data.frame(
    Name = c("John", "Alice"),
    Age = c(25, 30),
    stringsAsFactors = FALSE
  )
)
$numeric_vector
[1] 1 2 3

$character_vector
[1] "a" "b" "c"

$matrix_data
     [,1] [,2]
[1,]    1    3
[2,]    2    4

$data_frame
   Name Age
1  John  25
2 Alice  30

Basic Operations

Arithmetic operations:

These perform mathematical operations on one or more numeric values.

# Addition
10 + 5
[1] 15
# Subtraction
10 - 5
[1] 5
# Multiplication
10 * 5
[1] 50
# Division
10 / 5
[1] 2
# Exponentiation
10 ^ 5
[1] 1e+05
# Square root
sqrt(10)
[1] 3.162278
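Arithmetic operators are vectorized, meaning they apply element-wise to every value in a vector without a loop. For example, converting hypothetical polygon areas from square meters to hectares:

```r
# Hypothetical polygon areas in square meters
area_m2 <- c(12500, 4000, 30250)

# Division is applied to every element of the vector at once
area_ha <- area_m2 / 10000
area_ha        # 1.250 0.400 3.025
sum(area_ha)   # 4.675
```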

Logical operations:

These compare values and return a logical result (TRUE or FALSE).

# Less than
10 < 5
[1] FALSE
# Greater than
10 > 5
[1] TRUE
# Equal to
10 == 5
[1] FALSE
# Not equal to
10 != 5
[1] TRUE
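Comparisons can be combined with & (and), | (or), and ! (not), and the resulting logical vector can be used to subset data. The depths below are hypothetical:

```r
depth_m <- c(0.5, 2.1, 3.4, 1.2)  # hypothetical water depths

depth_m > 1                  # FALSE  TRUE  TRUE  TRUE
depth_m > 1 & depth_m < 3    # FALSE  TRUE FALSE  TRUE (AND)
depth_m < 1 | depth_m > 3    #  TRUE FALSE  TRUE FALSE (OR)
!(depth_m > 1)               #  TRUE FALSE FALSE FALSE (NOT)
depth_m[depth_m > 1]         # 2.1 3.4 1.2: keep values meeting the condition

"sppB" %in% c("sppA", "sppB")  # TRUE: membership test
```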

Assignment operators:

These assign values or groups of values to an object that is stored in memory.

# Assigning numeric
var_1 <- 10
var_1
[1] 10
# Assigning character strings
var_2 <- "species"
var_2
[1] "species"
# Assigning vector of strings
var_3 <- c("sppA", "sppB", "sppC")
var_3
[1] "sppA" "sppB" "sppC"

Function calls:

These pass a series of arguments to a pre-defined process, called a function.

name_of_function(argument_1 = value_1,
                 argument_2 = value_2,
                 argument_3 = value_3)
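A concrete instance of the pattern above uses the base function round(), whose arguments are x and digits. The sketch also defines a small custom function; m2_to_ha() is a hypothetical name used only for illustration.

```r
# A real function call: round() takes the arguments x and digits
round(x = 3.14159, digits = 2)  # 3.14

# Arguments can also be matched by position
round(3.14159, 2)               # 3.14

# Writing your own function (hypothetical example)
m2_to_ha <- function(area_m2) {
  area_m2 / 10000
}
m2_to_ha(area_m2 = 25000)       # 2.5
```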

R Packages

Packages in R are collections of functions curated by developers to make life easier. They extend R's functionality by providing additional tools, common workflows, and accessible datasets.

Installing & Loading Packages

R packages can generally be installed from two primary sources: (1) CRAN, the Comprehensive R Archive Network, or (2) a GitHub repository containing the package. While many popular packages are on CRAN, some application-specific packages are only available on GitHub.

# Installing packages from CRAN
install.packages("package_name")

# Installing packages from GitHub
## Install the remotes package (if not already installed)
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
## Install the package from the GitHub repository
remotes::install_github("username/repository")

# Loading any installed package
library(package_name)

The Tidyverse

The tidyverse is a collection of R packages designed for data science and statistical analysis. It provides a cohesive framework for working with data by emphasizing consistency, readability, and efficiency.

# Installing the tidyverse entirely, note the name as a string
install.packages("tidyverse")

The core philosophy of the tidyverse centers around the principles outlined in the “tidy data” concept, where datasets are organized in a structured format with each variable forming a column, each observation forming a row, and each type of observational unit forming a table.

  • ggplot2 - for creating static visualizations using the grammar of graphics
  • dplyr - for data manipulation, filtering, and summarizing
  • tidyr - for tidying data, including reshaping between wide and long formats
  • readr - for reading flat files, like .csv and .tsv
  • purrr - for working with functions and vectors
  • tibble - for easy handling of tidy dataframes
  • stringr - for working with strings
  • forcats - for working with factors (categorical data)
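To make the tidy data principle concrete, the sketch below reshapes hypothetical survey counts from a wide layout (one column per year) into a tidy long layout with tidyr's pivot_longer(). The site names, years, and counts are made up.

```r
library(tidyr)

# Hypothetical counts in "wide" format: one column per survey year
counts_wide <- data.frame(
  site = c("A", "B"),
  y2022 = c(4, 7),
  y2023 = c(5, 6)
)

# Tidy ("long") format: one row per site-year observation, with year and
# count each in their own column; names_prefix strips the leading "y"
counts_long <- pivot_longer(counts_wide,
                            cols = c(y2022, y2023),
                            names_to = "year",
                            values_to = "count",
                            names_prefix = "y")
counts_long
```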

Piping Functions

The tidyverse also popularized the pipe (%>%), which comes from the magrittr package and allows functions to be chained: the output of the previous function becomes the first argument of the next function, so code reads as a sequence of steps. Since version 4.1.0, base R also has its own native pipe (|>), which works with any function and needs no package. In RStudio, Ctrl+Shift+M (Cmd+Shift+M on macOS) inserts a pipe; whether it inserts |> or %>% is set under Tools > Global Options > Code.

# Example code with piped functions (requires dplyr), note the native R pipe used
library(dplyr)

present_veg_area <- present_veg_sf |> 
  mutate(BeaverVegCat = as.factor(BeaverVegCat)) |> 
  group_by(LandscapeID, BeaverVegCat, Survey_Year) |> 
  summarise(total_veg_area_m2 = sum(area_m2, na.rm = TRUE)) |> 
  arrange(LandscapeID)
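Because present_veg_sf comes from the workshop data, here is a self-contained sketch that uses only base R (version 4.1.0 or later for |>). It compares a nested call with the equivalent piped version.

```r
# Without a pipe, calls are nested and read inside-out
round(sqrt(sum(c(4, 9, 16))), 2)              # 5.39

# With the native pipe, the same steps read left to right;
# each result becomes the first argument of the next function
c(4, 9, 16) |> sum() |> sqrt() |> round(2)    # 5.39
```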

Common R Spatial Packages

There are likely more than 100 R packages that can handle some aspect of spatial data. The most popular ones are listed below, and we will explore some of them in this workshop. A more comprehensive list can be found in the CRAN Task View for spatial data: https://cran.r-project.org/web/views/Spatial.html

Spatial packages in R are in a transition period, with many historically common packages being replaced by newer, more performant alternatives. For example, rgdal and rgeos were retired from CRAN in 2023, and sp and raster (along with their spatial object types) have largely been superseded by sf and terra, although they are still dependencies of many packages.


For General Spatial Data Handling

  • sf - for working with vector spatial data
  • terra - for working with raster spatial data
  • stars - for working with spatial time series (vector and raster data cubes)
  • spatstat - for spatial statistics, focusing on spatial point patterns
  • sfnetworks - for analysis of geospatial networks
  • geometa - for writing and reading OGC/ISO geographic metadata in R
  • ncdf4 - for reading, writing, and manipulating netCDF files
  • mapedit - for drawing, editing, and deleting spatial data interactively in R

For Ecology & Natural Resources

  • rgbif - interface to the Global Biodiversity Information Facility (GBIF) for downloading and viewing species occurrence records
  • landscapemetrics - for landscape ecology metric calculations (FRAGSTATS for R)
  • spatialEco - for spatial analysis and modelling of ecological systems
  • ade4 - some capabilities for spatial multivariate analysis methods for ecology
  • adehabitatHR - tools for estimating animal home ranges (e.g., minimum convex polygons and kernel utilization distributions)
  • dismo - tools for species distribution modeling and ecological niche modeling

For Mapping & Cartography

  • ggplot2 - creating static maps in the grammar of graphics style
  • tmap - simple thematic maps, including both static and interactive map options
  • mapsf - thematic map creation from sf objects using base R graphics, successor of cartography
  • leaflet - for creating interactive web maps, including popups and basemaps
  • mapview - provides an interactive viewer for exploring spatial data
  • plotly - turns maps built with ggplot2 into interactive charts
  • ggspatial - adds spatial geometries and annotations to ggplot2

For Getting Spatial Data

  • rnaturalearth - state and national boundaries across the United States and world
  • tidycensus - population, state boundaries, and other census metrics
  • tigris - TIGER/Line files (e.g., roads, highways) from the U.S. Census Bureau
  • geodata - diverse source of climate, elevation, admin boundaries, land use, and more
  • elevatr - elevation rasters across the world
  • osmdata - any and all OpenStreetMap features, including roads, buildings, and more
  • spocc - many species occurrence repositories, including GBIF, BISON, iNaturalist, eBird

Loading Spatial Data Into R

Several R packages can load spatial data into the R environment, but we will focus on sf and terra for this workshop. Note that creating an R spatial object does not automatically save it to a file; it simply loads the spatial information into your current R session (in memory). For large files, it can be useful to read in only the subset covering a specific area of interest (AOI).

Vector Data

Remember that the vector data model represents points, lines, and polygons. We can load these spatial features into our R environment with the st_read() function from the sf package. First, though, we use st_layers() to list all the layers available in our GeoPackage.

## Reference spatial layers from geopackage (n = 25)
gpkg_dsn <- "BeaverHabitatSelection.gpkg"
gpkg_layers <- sf::st_layers(dsn = gpkg_dsn)

## Load in beaver-absent vegetation layer
absent_veg_sf <- sf::st_read(dsn = gpkg_dsn,
                             layer = gpkg_layers$name[3])

If we need to read a shapefile from our directory, we can still use st_read(), but the file path must include the .shp extension.

# Set the file path to your shapefile (.shp)
path <- "path/to/your/shapefile.shp"

# Read the shapefile using sf::st_read()
shapefile <- sf::st_read(path)
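If the workshop GeoPackage is not at hand, the North Carolina counties shapefile that ships with sf works for practice. The sketch below reads it, inspects it, and then reads only a subset of records with an SQL query passed to st_read(), one way to avoid loading an entire large file. The AREA threshold is arbitrary.

```r
library(sf)

# nc.shp (North Carolina counties) is bundled with the sf package
nc_path <- system.file("shape/nc.shp", package = "sf")

# quiet = TRUE suppresses the summary that st_read() prints
nc <- st_read(nc_path, quiet = TRUE)

nrow(nc)                      # 100 counties (features)
unique(st_geometry_type(nc))  # MULTIPOLYGON
st_crs(nc)                    # prints the coordinate reference system

# Read only some columns and records with an OGR SQL query; for a
# shapefile, the layer name is the file name without ".shp" ("nc")
nc_small <- st_read(nc_path,
                    query = "SELECT NAME, AREA FROM nc WHERE AREA < 0.05",
                    quiet = TRUE)
```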

Simple Feature Geometry Types

The “sf” in sf stands for simple features, an open data standard from the Open Geospatial Consortium (OGC). This standard is used across many software systems (e.g., QGIS, PostGIS), and sf supports its seven most common geometry types, each with its own constructor function.

  • Single Geometries
    • POINT - by using st_point()
    • LINESTRING - by using st_linestring()
    • POLYGON - by using st_polygon()
  • Multiple Geometries
    • MULTIPOINT - by using st_multipoint()
    • MULTILINESTRING - by using st_multilinestring()
    • MULTIPOLYGON - by using st_multipolygon()
  • Geometry Collections
    • GEOMETRYCOLLECTION - by using st_geometrycollection()
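The constructors listed above build individual geometries (sfg objects), which can be combined into a geometry column (sfc) with a CRS and then into an sf data frame. The coordinates and site IDs below are hypothetical.

```r
library(sf)

# Single geometries (sfg objects) built from coordinates
pt   <- st_point(c(-111.9, 40.8))
line <- st_linestring(rbind(c(0, 0), c(1, 1), c(2, 1)))
# A polygon ring must be closed: first and last coordinates are identical
poly <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))

class(pt)               # "XY" "POINT" "sfg"
st_area(st_sfc(poly))   # 1 (unitless, because no CRS is set)

# Combine geometries into a geometry column (sfc) with a CRS (WGS 84),
# then attach attributes to get an sf data frame
sites <- st_sf(
  site_id = c("A", "B"),
  geometry = st_sfc(st_point(c(-111.9, 40.8)),
                    st_point(c(-111.8, 40.7)),
                    crs = 4326)
)
sites
```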

Raster Data

Remember that the raster data model represents the world with a continuous grid of cells, called pixels. We can load raster files (e.g., .tif) into our R environment using the rast() function in the terra package.

library(terra)

# Set the file path to your raster, in this case a .tif
# (srtm.tif ships with the spDataLarge package)
raster_filepath = system.file("raster/srtm.tif", package = "spDataLarge")

# Read the raster as a SpatRaster
my_rast = rast(raster_filepath)

Every raster object has a header that can be viewed by calling the object in the console. This contains information like the dimensions and CRS of the raster.

# Call the raster object to view its header
my_rast

#> class       : SpatRaster 
#> dimensions  : 457, 465, 1  (nrow, ncol, nlyr)
#> resolution  : 0.000833, 0.000833  (x, y)
#> extent      : -113, -113, 37.1, 37.5  (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 (EPSG:4326) 
#> source      : srtm.tif 
#> name        : srtm 
#> min value   : 1024 
#> max value   : 2892
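A SpatRaster can also be created from scratch, which is handy for testing. The sketch below builds a hypothetical 3 x 4 grid, fills it with values, and queries some of the properties reported in the header.

```r
library(terra)

# A hypothetical raster with 3 rows and 4 columns over a 4 x 3 unit extent
r <- rast(nrows = 3, ncols = 4, xmin = 0, xmax = 4, ymin = 0, ymax = 3)
values(r) <- 1:12   # cell values are assigned row by row from the top-left

r                   # prints the header: class, dimensions, resolution, extent, CRS
dim(r)              # 3 4 1 (nrow, ncol, nlyr)
res(r)              # 1 1
ncell(r)            # 12
global(r, "mean")   # mean of all cell values: 6.5
```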

It is important to note that several other R packages still require raster data to be in the older format handled by the raster package. In this case, we can coerce our SpatRaster into the raster format.

# Convert to a RasterLayer
my_RasterLayer <- raster::raster(my_rast)

# Convert to a RasterStack
my_RasterStack <- raster::stack(my_rast, my_rast)

Spatial Operations

Here are some example spatial operations in R using sf and the tidyverse, applied to the world and coffee_data datasets from the spData package.

Joining

library(sf)
library(dplyr)
library(spData)  # provides the world and coffee_data datasets

world_coffee = left_join(world, coffee_data)
#> Joining with `by = join_by(name_long)`

class(world_coffee)
#> [1] "sf"         "tbl_df"     "tbl"        "data.frame"

Summarizing Data

world_agg4  = world |> 
  group_by(continent) |> 
  summarize(Pop = sum(pop, na.rm = TRUE),
            Area = sum(area_km2),
            N = n())
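Beyond attribute joins and summaries, sf also provides geometric operations. The minimal sketch below buffers a point and tests which sites fall inside the buffer; the coordinates are hypothetical and have no CRS, so sf treats them as planar units (e.g., meters).

```r
library(sf)

# Hypothetical planar coordinates (no CRS, for illustration only)
sites <- st_sfc(st_point(c(20, 0)), st_point(c(150, 0)), st_point(c(500, 500)))
pond  <- st_sfc(st_point(c(100, 0)))

# Buffer the pond by 100 units to create a (circular) polygon
pond_buffer <- st_buffer(pond, dist = 100)

# Which sites intersect the buffer? sparse = FALSE returns a logical matrix
inside <- st_intersects(sites, pond_buffer, sparse = FALSE)[, 1]
inside                          # TRUE TRUE FALSE

# Straight-line distance from each site to the pond
st_distance(sites, pond)[, 1]   # 80 50 640.3124
```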

Mapping Spatial Data In R

As we saw above, several packages can create maps in R. Which one you use will depend on your familiarity with the package and the purpose of the map you are creating.

In general, there are two types of maps we will create in R: (1) static, such as those for publications, and (2) interactive/dynamic, such as those we can view on our computer screens and allow us to zoom in and drag to navigate around the map.

ggplot2

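library(ggplot2)
library(sf)  # geom_sf() relies on sf to draw sf objects
data("World", package = "tmap")  # World country polygons shipped with tmap

# Create a map using ggplot2
ggplot() +
  geom_sf(data = World,
          fill = "lightblue",
          color = "black") +
  labs(title = "Map using ggplot2") +
  theme_minimal()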

tmap

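library(tmap)
data("World", package = "tmap")

# Create a map using tmap (tmap v3 argument names; in tmap v4 the
# fill color is set with fill = and the border color with col =)
tm_shape(World) +
  tm_polygons(col = "lightblue",
              border.col = "black") +
  tm_layout(title = "Map using tmap")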

mapview

library(mapview)

# Create a map using mapview
mapview(World,
        zcol = NULL,
        col.regions = "lightblue",
        legend = TRUE, 
        map.types = "Esri.WorldGrayCanvas",
        layer.name = "Map using mapview")