3. Scripting in ArcGIS Pro

Jupyter, ArcPy, visualization, and workflows.

Introduction

What is programmatic GIS?

Programmatic GIS refers to the practice of using programming languages and scripts to perform geospatial analysis, automate GIS tasks, and develop custom geospatial applications. Rather than relying solely on graphical user interfaces (GUIs) of GIS software, programmatic GIS empowers users to harness the full capabilities of GIS through code.

While the GUI is a vital part of GIS and creates a venue for speedy learning, it can be very slow (esp. in ArcGIS Pro) and does not make sense for large scope projects or repetitive tasks.

Why use scripting for GIS?

  • Allows for reproducible science and workflows
    • Supplementary material for manuscripts
  • Turns repetitive tasks into simple tools
  • Some tools are only available in the coding interface
  • Integrates with other tools and methods
    • Modules, extensions, APIs
    • Data wrangling and visualization

What is Python?

Python is a versatile and powerful programming language widely used in various fields, including web development, data science, artificial intelligence, and, importantly, GIS.

While R is widely used across academia, many private industries use Python as the standard language for programmatic GIS. Python is the primary coding language used in ArcGIS Pro, along with Arcade (Esri’s proprietary language).

import arcpy

What is Jupyter?

Jupyter Notebooks are interactive computing environments that allow you to combine live code, equations, visualizations, and narrative text all in one document. They support multiple programming languages, including Python, R, and Julia, making them versatile tools for various data analysis tasks.

Jupyter Notebooks is installed as part of the ArcGIS Pro installation process or added later using the ArcGIS Pro package manager. Once installed, you can launch Jupyter Notebooks directly from the ArcGIS Pro interface or a Python command prompt.

Jupyter Notebook within ArcGIS Pro

Basics of Python Programming

Data Classes and Structures

Python provides several built-in data classes and structures that allow developers to organize and manipulate data efficiently.

Numbers

Integer (int): Integers represent whole numbers without any decimal point.

Float (float): Floats represent real numbers with a decimal point.

# Example of integers
x = 5
y = -10
z = 0

print(x, y, z)  # Output: 5 -10 0


# Example of floats
a = 3.14
b = -0.5
c = 2.0

print(a, b, c)  # Output: 3.14 -0.5 2.0

Strings

String (str): Strings are sequences of characters, enclosed within single quotes (’ ’) or double quotes (” “).

# Example of strings
name = "Alice"
message = 'Hello, world!'

print(name)     # Output: Alice
print(message)  # Output: Hello, world!

Boolean

Boolean (bool): Booleans represent truth values, either True or False.

# Example of booleans
is_valid = True
is_greater = False

print(is_valid)    # Output: True
print(is_greater)  # Output: False

Lists

List: Lists are ordered collections of items, which can be of any data type. Lists are mutable, meaning they can be changed after creation.

# Example of lists
numbers = [1, 2, 3, 4, 5]
fruits = ['apple', 'banana', 'orange']

print(numbers)  # Output: [1, 2, 3, 4, 5]
print(fruits)   # Output: ['apple', 'banana', 'orange']

Tuples

Tuple: Tuples are similar to lists, but they are immutable once created.

# Example of tuples
point = (10, 20)
colors = ('red', 'green', 'blue')

print(point)   # Output: (10, 20)
print(colors)  # Output: ('red', 'green', 'blue')

Dictionaries

Dictionary (dict): Dictionaries are unordered collections of key-value pairs. Each key must be unique.

# Example of dictionaries
person = {'name': 'Alice', 'age': 30, 'city': 'New York'}
grades = {'math': 90, 'science': 85, 'history': 88}

print(person)  # Output: {'name': 'Alice', 'age': 30, 'city': 'New York'}
print(grades)  # Output: {'math': 90, 'science': 85, 'history': 88}

Sets

Set: Sets are unordered collections of unique elements. They are useful for mathematical operations like union, intersection, etc.

# Example of sets
set1 = {1, 2, 3, 4, 5}
set2 = {4, 5, 6, 7, 8}

print(set1)  # Output: {1, 2, 3, 4, 5}
print(set2)  # Output: {4, 5, 6, 7, 8}

Common Geospatial Packages

Python has both base and extended functionality where the latter is provided by external packages (i.e., modules) that can be installed, and then imported into our workspace.

  • GeoPandas: GeoPandas extends the Pandas library to work with geometric data types, allowing for easy manipulation and analysis of spatial datasets.

  • Shapely: Shapely is a library for geometric operations and manipulations of geometries (points, lines, and polygons). It is often used in conjunction with GeoPandas.

  • Fiona: Fiona is a Python wrapper around the OGR library, providing an interface for reading and writing spatial data formats (e.g., Shapefile, GeoJSON, etc.).

  • Pyproj: Pyproj is a Python interface to the PROJ library, allowing for geospatial transformations, conversions between coordinate reference systems, and calculations of distances and areas.

  • GDAL (Geospatial Data Abstraction Library): GDAL is a powerful library for reading, writing, and processing raster and vector geospatial data formats. It is often used in combination with other Python libraries like Fiona and Rasterio.

  • Rasterio: Rasterio is a Python library for reading and writing raster geospatial data formats. It provides an intuitive interface for working with raster datasets, including georeferencing and processing.

Python in ArcGIS

arcpy is the main python module to interact with the ArcGIS ecosystem and will be the focus of our exploration today.

import arcpy

roads = "c:/base/data.gdb/roads"
output = "c:/base/data.gdb/roads_Buffer"

# Run Buffer using the variables set above and pass the remaining 
# parameters in as strings
arcpy.Buffer_analysis(roads, output, "distance", "FULL", "ROUND", "NONE")

Best Practices and Tips:

1. Writing Efficient and Readable arcpy Scripts:

  • Use meaningful variable names and comments to enhance code readability.
  • Break down complex tasks into smaller, modular functions.
  • Optimize arcpy scripts by minimizing unnecessary loops and operations.

2. Managing Memory and Resources:

  • Release resources and locks on datasets using del statements and arcpy.ClearWorkspaceCache_management().
  • Use context managers (with statements) for managing arcpy environments and cursors to ensure proper resource cleanup.

3. Documenting Code and Workflows:

  • Document arcpy scripts with clear and concise comments explaining the purpose of each section and important steps.
  • Maintain separate documentation files or READMEs detailing script usage, input/output data, and dependencies.

4. Debugging and Troubleshooting arcpy Scripts:

  • Use print statements and logging to debug arcpy scripts.
  • Handle errors gracefully using try-except blocks to prevent script failures.
  • Utilize arcpy’s error handling mechanisms to identify and resolve issues.

5. Version Control and Collaboration:

  • Use version control systems (e.g., Git) to track changes and collaborate with team members on arcpy projects.
  • Establish coding standards and conventions for consistent arcpy script development within the team.