Data Analysis and Visualization in Python for Ecologists

Lesson Maintainers: April Wright, Tania Allard, Maxim Belkin

Python is a general-purpose programming language that is useful for writing scripts to work effectively and reproducibly with data.

This is an introduction to Python designed for participants with no programming experience. These lessons can be taught in a day (about 6 hours). They start with some basic information about Python syntax and the Jupyter notebook interface, then move through importing CSV files, using the pandas package to work with data frames, calculating summary information from a data frame, and a brief introduction to plotting. The last lesson demonstrates how to work with databases directly from Python.
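As a rough preview of the workflow described above, the following minimal sketch imports CSV data with pandas and calculates summary information from the resulting data frame. The column names and values are invented for illustration and are not the lesson's data set; an in-memory string stands in for a CSV file on disk.

```python
import io

import pandas as pd

# Hypothetical stand-in for a CSV file on disk; the column names and values
# are invented for illustration and are not the lesson's data set.
csv_text = io.StringIO(
    "record_id,species_id,weight\n"
    "1,DM,40.0\n"
    "2,DM,44.0\n"
    "3,PF,8.0\n"
    "4,PF,\n"
)

# Import the CSV into a data frame (a file path works in place of csv_text).
surveys = pd.read_csv(csv_text)

# Summary information for one column; missing values are skipped by default.
mean_weight = surveys["weight"].mean()

# Summary per group: mean weight for each species.
mean_by_species = surveys.groupby("species_id")["weight"].mean()

print(surveys.dtypes)
print(mean_weight)
print(mean_by_species)
```

With a real file, the same call would take a path, for example `pd.read_csv("my_data.csv")`, where the file name is again only a placeholder.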

Getting Started

Data Carpentry’s teaching is hands-on, so participants are encouraged to use their own computers to ensure the proper setup of tools for an efficient workflow.
These lessons assume no prior knowledge of the skills or tools covered.

To get started, follow the directions in the “Setup” tab to download data to your computer and follow any installation instructions.

Prerequisites

This lesson requires a working copy of Python.
To use these materials most effectively, please install everything before working through this lesson.

For Instructors

If you are teaching this lesson in a workshop, please see the Instructor notes.


Schedule

Setup   Download files required for the lesson
00:00   1. Before we start
           What is Python and why should I learn it?
00:30   2. Short Introduction to Programming in Python
           What is Python?
           Why should I learn Python?
00:30   3. Starting With Data
           How can I import data in Python?
           What is Pandas?
           Why should I use Pandas to work with data?
01:30   4. Indexing, Slicing and Subsetting DataFrames in Python
           How can I access specific data within my data set?
           How can Python and Pandas help me to analyse my data?
02:30   5. Data Types and Formats
           What types of data can be contained in a DataFrame?
           Why is the data type important?
03:15   6. Combining DataFrames with Pandas
           Can I work with data from multiple sources?
           How can I combine data from different data sets?
04:00   7. Data Workflows and Automation
           Can I automate operations in Python?
           What are functions and why should I use them?
05:30   8. Making Plots With plotnine
           How can I visualize data in Python?
           What is ‘grammar of graphics’?
07:00   9. Data Ingest and Visualization - Matplotlib and Pandas
           What other tools can I use to create plots apart from ggplot?
           Why should I use Python to create plots?
08:45   10. Accessing SQLite Databases Using Python and Pandas
           What if my data are stored in an SQL database? Can I manage them with Python?
           How can I write data from Python to be used with SQL?
09:30   Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.