3. Scripting in ArcGIS Pro
Jupyter, ArcPy, visualization, and workflows.
Introduction
What is programmatic GIS?
Programmatic GIS refers to the practice of using programming languages and scripts to perform geospatial analysis, automate GIS tasks, and develop custom geospatial applications. Rather than relying solely on graphical user interfaces (GUIs) of GIS software, programmatic GIS empowers users to harness the full capabilities of GIS through code.
While the GUI is a vital part of GIS and offers a quick way to learn, it can be very slow (especially in ArcGIS Pro) and is poorly suited to large-scope projects or repetitive tasks.
Why use scripting for GIS?
- Allows for reproducible science and workflows
- Supplementary material for manuscripts
- Turns repetitive tasks into simple tools
- Some tools are only available in the coding interface
- Integrates with other tools and methods
- Modules, extensions, APIs
- Data wrangling and visualization
What is Python?
Python is a versatile and powerful programming language widely used in various fields, including web development, data science, artificial intelligence, and, importantly, GIS.
While R is widely used across academia, many private industries use Python as the standard language for programmatic GIS. Python is the primary coding language used in ArcGIS Pro, along with Arcade (Esri’s proprietary language).
import arcpy
What is Jupyter?
Jupyter Notebooks are interactive computing environments that allow you to combine live code, equations, visualizations, and narrative text all in one document. They support multiple programming languages, including Python, R, and Julia, making them versatile tools for various data analysis tasks.
Jupyter Notebook is installed as part of the ArcGIS Pro installation process, or it can be added later using the ArcGIS Pro package manager. Once installed, you can launch Jupyter Notebook directly from the ArcGIS Pro interface or from a Python command prompt.
Basics of Python Programming
Data Classes and Structures
Python provides several built-in data classes and structures that allow developers to organize and manipulate data efficiently.
Numbers
Integer (int): Integers represent whole numbers without any decimal point.
Float (float): Floats represent real numbers with a decimal point.
# Example of integers
x = 5
y = -10
z = 0
print(x, y, z) # Output: 5 -10 0
# Example of floats
a = 3.14
b = -0.5
c = 2.0
print(a, b, c) # Output: 3.14 -0.5 2.0
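As a brief illustration of how the two numeric types interact, the sketch below shows that mixing an integer with a float produces a float, and that true division and floor division behave differently. The variable names and values are illustrative only.

```python
# Illustrative values; the names are hypothetical
cells = 7          # int
cell_size = 2.5    # float

print(type(cells).__name__)      # int
print(type(cell_size).__name__)  # float

# Mixing an int and a float produces a float
width = cells * cell_size
print(width)                     # 17.5

# True division always returns a float; floor division of ints returns an int
print(7 / 2)                     # 3.5
print(7 // 2)                    # 3

# Converting between types: int() truncates toward zero
print(int(3.99))                 # 3
print(float(5))                  # 5.0
```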
Strings
String (str): Strings are sequences of characters, enclosed within single quotes (' ') or double quotes (" ").
# Example of strings
name = "Alice"
message = 'Hello, world!'
print(name) # Output: Alice
print(message) # Output: Hello, world!
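In GIS scripts, strings are often used to build dataset paths and names. The sketch below shows one way to do this with f-strings, along with a few common string operations. The variable names are illustrative, and the path is only an example.

```python
# Example geodatabase path and layer name, for illustration only
gdb = "c:/base/data.gdb"
layer = "roads"

# f-strings embed variables directly in a string
feature_class = f"{gdb}/{layer}"
print(feature_class)                    # c:/base/data.gdb/roads

# Common string methods and functions
print(layer.upper())                    # ROADS
print(len(layer))                       # 5
print(feature_class.endswith("roads"))  # True

# Indexing and slicing work on individual characters
print(layer[0])                         # r
print(layer[:4])                        # road
```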
Boolean
Boolean (bool): Booleans represent truth values, either True or False.
# Example of booleans
is_valid = True
is_greater = False
print(is_valid) # Output: True
print(is_greater) # Output: False
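Booleans usually arise from comparisons rather than being typed in directly. The following sketch, with hypothetical names and values, shows a comparison producing a boolean and how booleans combine with `and`, `or`, and `not`.

```python
# Illustrative values; the names are hypothetical
elevation = 1200
threshold = 1000

# A comparison returns a bool
is_high = elevation > threshold
print(is_high)                   # True

# bool() converts other values; empty strings are treated as False
has_name = bool("")
print(has_name)                  # False

# Booleans combine with and / or / not
print(is_high and not has_name)  # True

# Booleans control which branch of an if statement runs
if is_high:
    print("above threshold")
```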
Lists
List: Lists are ordered collections of items, which can be of any data type. Lists are mutable, meaning they can be changed after creation.
# Example of lists
numbers = [1, 2, 3, 4, 5]
fruits = ['apple', 'banana', 'orange']
print(numbers) # Output: [1, 2, 3, 4, 5]
print(fruits) # Output: ['apple', 'banana', 'orange']
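To make the idea of mutability concrete, the short sketch below modifies a list in place after it has been created. It reuses the `fruits` values from the example; the added items are arbitrary.

```python
fruits = ['apple', 'banana', 'orange']

# Indexing starts at 0; negative indexes count from the end
print(fruits[0])     # apple
print(fruits[-1])    # orange

# Lists can be changed in place
fruits.append('kiwi')    # add an item to the end
fruits[1] = 'mango'      # replace an existing item
print(fruits)            # ['apple', 'mango', 'orange', 'kiwi']
print(len(fruits))       # 4

# A list comprehension builds a new list from an existing one
lengths = [len(fruit) for fruit in fruits]
print(lengths)           # [5, 5, 6, 4]
```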
Tuples
Tuple: Tuples are similar to lists, but they are immutable once created.
# Example of tuples
point = (10, 20)
colors = ('red', 'green', 'blue')
print(point) # Output: (10, 20)
print(colors) # Output: ('red', 'green', 'blue')
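The sketch below illustrates what immutability means in practice: a tuple can be read and unpacked, but attempting to assign to one of its elements raises a `TypeError`. The name `moved` is illustrative.

```python
point = (10, 20)

# Tuples can be unpacked into separate variables
x, y = point
print(x, y)          # 10 20

# Trying to change an element raises a TypeError
try:
    point[0] = 99
except TypeError as err:
    print("Tuples cannot be modified:", err)

# To "change" a tuple, build a new one instead
moved = (point[0] + 5, point[1])
print(moved)         # (15, 20)
print(point)         # (10, 20), the original is untouched
```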
Dictionaries
Dictionary (dict): Dictionaries are collections of key-value pairs. Each key must be unique. Since Python 3.7, dictionaries preserve the order in which keys were inserted.
# Example of dictionaries
person = {'name': 'Alice', 'age': 30, 'city': 'New York'}
grades = {'math': 90, 'science': 85, 'history': 88}
print(person) # Output: {'name': 'Alice', 'age': 30, 'city': 'New York'}
print(grades) # Output: {'math': 90, 'science': 85, 'history': 88}
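Values in a dictionary are looked up by key rather than by position. The sketch below shows common ways to read, add, and update entries; the `email` key and its value are made up for illustration.

```python
person = {'name': 'Alice', 'age': 30, 'city': 'New York'}

# Look up a value by its key
print(person['name'])        # Alice

# .get() returns None (or a default) instead of raising an error for a missing key
print(person.get('email'))   # None

# Add a new key or update an existing one
person['email'] = 'alice@example.com'   # hypothetical value
person['age'] = 31

# Check for a key and loop over key-value pairs
print('city' in person)      # True
for key, value in person.items():
    print(key, value)
```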
Sets
Set: Sets are unordered collections of unique elements. They are useful for mathematical operations like union, intersection, etc.
# Example of sets
set1 = {1, 2, 3, 4, 5}
set2 = {4, 5, 6, 7, 8}
print(set1) # Output: {1, 2, 3, 4, 5}
print(set2) # Output: {4, 5, 6, 7, 8}
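The set operations mentioned above can be sketched as follows, reusing `set1` and `set2`. Because sets are unordered, the results are passed through `sorted()` so that the printed order is predictable.

```python
set1 = {1, 2, 3, 4, 5}
set2 = {4, 5, 6, 7, 8}

# Union: elements in either set
print(sorted(set1 | set2))   # [1, 2, 3, 4, 5, 6, 7, 8]

# Intersection: elements in both sets
print(sorted(set1 & set2))   # [4, 5]

# Difference: elements in set1 but not in set2
print(sorted(set1 - set2))   # [1, 2, 3]

# Converting a list to a set removes duplicates
unique = set([1, 1, 2, 2, 3])
print(sorted(unique))        # [1, 2, 3]
```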
Common Geospatial Packages
Python has both base and extended functionality; the latter is provided by external packages (i.e., modules) that can be installed and then imported into our workspace.
GeoPandas: GeoPandas extends the Pandas library to work with geometric data types, allowing for easy manipulation and analysis of spatial datasets.
Shapely: Shapely is a library for geometric operations and manipulations of geometries (points, lines, and polygons). It is often used in conjunction with GeoPandas.
Fiona: Fiona is a Python wrapper around the OGR library, providing an interface for reading and writing spatial data formats (e.g., Shapefile, GeoJSON, etc.).
Pyproj: Pyproj is a Python interface to the PROJ library, allowing for geospatial transformations, conversions between coordinate reference systems, and calculations of distances and areas.
GDAL (Geospatial Data Abstraction Library): GDAL is a powerful library for reading, writing, and processing raster and vector geospatial data formats. Other Python libraries, such as Fiona and Rasterio, are built on top of it.
Rasterio: Rasterio is a Python library for reading and writing raster geospatial data formats. It provides an intuitive interface for working with raster datasets, including georeferencing and processing.
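Before importing a package, it can be useful to check whether it is installed in the current Python environment. The sketch below is one way to do that using only the standard library; the helper name `is_installed` is hypothetical. Note that GDAL's Python bindings are imported under the name `osgeo`.

```python
import importlib.util
import math  # a standard-library module, always available

print(math.sqrt(16))   # 4.0

def is_installed(name):
    """Return True if a top-level package can be found without importing it."""
    return importlib.util.find_spec(name) is not None

# Import names of the packages described above, plus arcpy
packages = ["geopandas", "shapely", "fiona", "pyproj", "osgeo", "rasterio", "arcpy"]

for name in packages:
    status = "installed" if is_installed(name) else "not installed"
    print(f"{name}: {status}")
```

Which packages are reported as installed depends on the environment the code runs in; inside ArcGIS Pro's default environment, `arcpy` should be found.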
Python in ArcGIS
arcpy is the main Python module for interacting with the ArcGIS ecosystem and will be the focus of our exploration today.
import arcpy
roads = "c:/base/data.gdb/roads"
output = "c:/base/data.gdb/roads_Buffer"
# Run Buffer using the variables set above and pass the remaining
# parameters in as strings
arcpy.Buffer_analysis(roads, output, "distance", "FULL", "ROUND", "NONE")
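A geoprocessing call like this can be wrapped in a small reusable function. The sketch below is one possible arrangement, assuming it is run inside ArcGIS Pro's Python environment. The function name `buffer_features` and the example distance are hypothetical; `arcpy.analysis.Buffer` is the module-style spelling of `arcpy.Buffer_analysis`.

```python
def buffer_features(in_features, out_features, distance):
    """Buffer in_features by distance (e.g. "100 Meters") and return the output path.

    arcpy is imported inside the function so that this file can still be
    loaded on a machine without ArcGIS Pro; calling the function requires arcpy.
    """
    import arcpy

    # Allow the output to be overwritten when the script is re-run
    arcpy.env.overwriteOutput = True

    # Only the three required parameters are passed; the rest keep their defaults
    arcpy.analysis.Buffer(in_features, out_features, distance)
    return out_features

# Example call (the distance value is an assumption):
# buffer_features("c:/base/data.gdb/roads", "c:/base/data.gdb/roads_Buffer", "100 Meters")
```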
Best Practices and Tips:
1. Writing Efficient and Readable arcpy Scripts:
- Use meaningful variable names and comments to enhance code readability.
- Break down complex tasks into smaller, modular functions.
- Optimize arcpy scripts by minimizing unnecessary loops and operations.
2. Managing Memory and Resources:
- Release resources and locks on datasets using del statements and arcpy.ClearWorkspaceCache_management().
- Use context managers (with statements) for managing arcpy environments and cursors to ensure proper resource cleanup.
3. Documenting Code and Workflows:
- Document arcpy scripts with clear and concise comments explaining the purpose of each section and important steps.
- Maintain separate documentation files or READMEs detailing script usage, input/output data, and dependencies.
4. Debugging and Troubleshooting arcpy Scripts:
- Use print statements and logging to debug arcpy scripts.
- Handle errors gracefully using try-except blocks to prevent script failures.
- Utilize arcpy’s error handling mechanisms to identify and resolve issues.
5. Version Control and Collaboration:
- Use version control systems (e.g., Git) to track changes and collaborate with team members on arcpy projects.
- Establish coding standards and conventions for consistent arcpy script development within the team.
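Several of the tips above (modular functions, logging, and try-except blocks) can be combined in a small helper. The following is a minimal sketch rather than a prescribed pattern: the names `run_step`, `good_step`, and `bad_step` are hypothetical, and the stand-in steps are plain Python so the example runs without arcpy. In an arcpy script, a geoprocessing call would take the place of the stand-in functions, and `arcpy.ExecuteError` could be caught ahead of the generic `Exception`.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
log = logging.getLogger("gis_workflow")  # hypothetical logger name

def run_step(name, func, *args, **kwargs):
    """Run one workflow step, log the outcome, and return (ok, result)."""
    try:
        result = func(*args, **kwargs)
    except Exception as err:
        # Log the failure instead of letting the whole script crash
        log.error("%s failed: %s", name, err)
        return False, None
    log.info("%s succeeded", name)
    return True, result

# Stand-in steps so the sketch runs without arcpy
def good_step(value):
    return value * 2

def bad_step():
    raise ValueError("input dataset not found")

print(run_step("double", good_step, 21))  # (True, 42)
print(run_step("load", bad_step))         # (False, None)
```

Returning a success flag alongside the result lets the calling code decide whether to continue with later steps or stop early.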