Notes and reflections from the Community Data Science Workshop, presented by Benjamin Mako Hill and friends today at the University of Washington, Dept. of Communication.
Contents
- Introduction
- Using simple APIs
- Putting data in a format you can use
Introduction
What is an API?
- Term stands for Application Programming Interface
- It is a standard (or protocol). It’s not a piece of software.
- It’s a way for one program to talk to another program.
- It’s a way to get data from online platforms about what people are doing on that platform.
Why should I care as an activism researcher?
- Sometimes people are using those platforms for activism.
- You can learn something (not everything) about activism activity on the platform by looking at the traces of that activity that individuals leave in the form of tweets, follows, edits, and more.
- These traces are the data that you access through the API
What can I do with an API?
- ask for data (almost always by requesting a URL)
- get data back (almost always in a data format called JSON)
- build a dataset of content to study (using the data you got through the API)
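As a rough sketch of the "ask for data" step, a request to an API is usually just a URL with some parameters attached. The endpoint `http://api.example.org/search` and the parameter names `q` and `count` below are hypothetical, made up for illustration; each real platform documents its own.

```python
from urllib.parse import urlencode

def build_request_url(base_url, **params):
    """Return base_url with the given parameters appended as a query string."""
    return base_url + "?" + urlencode(params)

# Hypothetical endpoint and parameter names, for illustration only.
url = build_request_url("http://api.example.org/search", q="protest", count=10)
print(url)
```

Requesting a URL like this one is the "asking"; the JSON that comes back is the "getting".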
Challenges
- When a platform changes its architecture, the structure of data can also change.
- Different platforms have different APIs. Though they have similar features, you will need to learn each platform’s API separately.
- API structure and documentation tend to be better for platforms that make money from their API, like Twitter
- Platforms that pay little attention to how people access their data tend to have APIs that are less well structured
- However, for platforms whose data is commonly accessed via API (like Twitter), there will be existing Python modules that have been created to make your task easier.
Using Simple APIs
Use Python to go out onto the web, grab some data, and show it to you
- Use Python, a programming language that you probably already have on your computer and can access from your terminal (here’s how)
- The example Python code fetches the HTML source of the website http://www.python.org
- In this example, the API is the standard that allows a short piece of code to pull data (HTML or other) from a website.
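The workshop's own code is not reproduced in these notes, so here is a minimal sketch of the same idea using only Python 3's standard library. The function name `fetch_text` is my own, and the UTF-8 default is an assumption about the page's encoding.

```python
from urllib.request import urlopen

def fetch_text(url, encoding="utf-8"):
    """Request a URL and return the body of the response as a string."""
    with urlopen(url) as response:
        raw = response.read()  # the raw bytes sent back by the server
    # Assumes the page is UTF-8; undecodable bytes are replaced, not fatal.
    return raw.decode(encoding, errors="replace")
```

With an internet connection, `print(fetch_text("http://www.python.org")[:300])` should show the first 300 characters of the page's HTML.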
Put data from the web into a file on your computer
- This code puts the HTML from http://www.python.org into a file on your computer
- Now the file is on your computer (you can find it by searching your computer for a file named python.html)
- Use the function os.chdir to set your working directory, which is the place on your computer where the file is saved
- The default directory is your personal (home) directory on your computer. For example, mine is called mjoyce.
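A sketch of the saving step, again with the standard library only. The function `save_page` and its `directory` argument are names I made up for illustration; the file name python.html comes from the notes above.

```python
import os
from urllib.request import urlopen

def save_page(url, filename, directory=None):
    """Download url into filename and return the file's absolute path."""
    if directory is not None:
        os.chdir(directory)  # relative file names now resolve inside this folder
    with urlopen(url) as response:
        content = response.read()
    with open(filename, "wb") as f:  # "wb" writes the bytes exactly as received
        f.write(content)
    return os.path.abspath(filename)

# Shows where files with relative names will be saved right now.
print(os.getcwd())
```

Calling `save_page("http://www.python.org", "python.html")` would write the file into whatever folder `os.getcwd()` reports, and the returned path tells you exactly where to look.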
Use a simple API that involves kittens
- Go to placekitten.com, a place to get images of kittens of particular dimensions for use as placeholders in web design
- The dimensions of the pic are in the URL, such as http://placekitten.com/200/200 and http://placekitten.com/400/200
- Here’s code to pull a 300×300 pixel cat image from placekitten.com and put it in a file on your computer called “kitten.jpg”
- And here’s what the file you created looks like on your computer
- In this example, the API is the convention that a URL ending in a width and a height returns an image of a kitten with those dimensions
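The kitten example can be sketched roughly as follows. The helper names `kitten_url` and `download_binary` are hypothetical, and the sketch assumes placekitten.com still answers requests the way these notes describe.

```python
from urllib.request import urlopen

def kitten_url(width, height):
    """Build a placekitten URL for an image of the given pixel dimensions."""
    return "http://placekitten.com/{}/{}".format(width, height)

def download_binary(url, filename):
    """Save the bytes behind url to filename; return the number of bytes written."""
    with urlopen(url) as response, open(filename, "wb") as f:
        return f.write(response.read())

print(kitten_url(300, 300))
```

Calling `download_binary(kitten_url(300, 300), "kitten.jpg")` would then save the image. The file is opened in binary mode because an image is not text.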
Putting Data in a Format You Can Use
Why should I care about JSON?
- When you get data from a website using an API, it will most often be in JSON format.
What is JSON?
- It is a language for structuring data. It is a format used by programs for programs.
- Here’s an example of what a JSON file looks like: http://json.org/example.html
- Here’s a simpler example by Mako: http://mako.cc/cdsw.json
- It looks very similar to the way Python itself writes structured data (dictionaries and lists)
Interpreting a JSON file
- This is an entry about a pet fish with a name, age, and favorite color
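To make the fish entry concrete, here is a hypothetical record of that shape. The field names follow the description above, but the exact keys and all the values are invented for illustration and are not taken from the workshop file.

```python
import json

# Hypothetical record: keys and values are invented for illustration.
raw = '{"name": "Bubbles", "age": 2, "favorite_color": "blue"}'

fish = json.loads(raw)  # turns the JSON text into a Python dictionary
print(fish["name"], fish["age"], fish["favorite_color"])
```

Once parsed, the entry behaves like any Python dictionary, which is why JSON and Python feel so similar.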
Importing a JSON file into Python
- This code displays a JSON file from the web in Python
- This imports a module for interpreting JSON in Python.
- This code stores the contents of the JSON file in a variable named “data” and displays them in Python
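A minimal sketch of the importing step, using Python 3's standard library. The function name `load_json` is my own, and the sketch assumes the file is UTF-8 encoded.

```python
import json
from urllib.request import urlopen

def load_json(url):
    """Fetch a JSON document from url and return it as Python data."""
    with urlopen(url) as response:
        text = response.read().decode("utf-8")  # assumes UTF-8 encoded JSON
    return json.loads(text)
```

With an internet connection, `data = load_json("http://mako.cc/cdsw.json")` followed by `print(data)` would display the contents of Mako's example file, assuming it is still available at that address.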
Putting data from a JSON file into a spreadsheet
- This code puts certain data in the JSON file from the web into a .csv spreadsheet file so it is easier to work with.
- And this is what the .csv file you created looks like
- These are the basics of what you need to do to be a data scientist, either for the study of activism or any other activity carried out online.
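The JSON-to-spreadsheet step described above can be sketched with Python's built-in csv module. The function name `write_csv` is hypothetical, and the sketch assumes the parsed JSON is a list of dictionaries, one per row.

```python
import csv

def write_csv(records, fieldnames, filename):
    """Write the chosen fields of each record (a dict) as one row of a CSV file."""
    with open(filename, "w", newline="", encoding="utf-8") as f:
        # extrasaction="ignore" drops any keys that are not listed in fieldnames.
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(records)
```

For example, `write_csv(data, ["name", "age"], "pets.csv")` would keep only the name and age columns, where `data` is the parsed JSON and the column names and file name are placeholders for your own.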