This tutorial uses the Tweepy library to get data from Twitter with Python. It was presented today at the Community Data Science Workshop @UW. The first tutorial, an introduction to APIs, is here.
1) Get a Twitter API key & access token
- First, you need to set up Twitter authentication so Twitter will give you the data.
- Authentication steps are here: https://openhatch.org/wiki/Community_Data_Science_Workshops/Twitter_authentication_setup
1) Then get a bunch of Python programs for Twitter
- Download this zip file from GitHub to your desktop: https://github.com/makoshark/twitter-data-examples/archive/master.zip
- Unzip the file, then use the “cd” command in your terminal to move into the unzipped folder so you can run these files directly
2) Put the authentication information into a Python program
- Download a text editor like Smultron
- Open a new file and type your keys and tokens so it looks like this:
- Save the file as “twitter_authentication.py” in the folder of Twitter programs you just downloaded from GitHub. (Replace the existing file with this name.)
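As a rough sketch of the authentication file described in this step, it could look something like the following. The four variable names here are assumptions; open the existing “twitter_authentication.py” from the download and keep whatever names it already uses, since the other programs import them by name. The quoted values are placeholders, not real credentials.

```python
# twitter_authentication.py
# NOTE: these variable names are assumed for illustration. Match the names
# used in the copy of this file that came with the downloaded programs.
# Replace each placeholder string with the value from your Twitter app page.
CONSUMER_KEY = "your-consumer-key"
CONSUMER_SECRET = "your-consumer-secret"
ACCESS_TOKEN = "your-access-token"
ACCESS_TOKEN_SECRET = "your-access-token-secret"
```

Keeping the keys in their own file means you can share the other programs without sharing your credentials.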
3) Get tweets from Twitter
- In your terminal run the command “python twitter1.py”.
- Tweets should appear.
- This means you successfully used the Twitter API in a basic way.
- This Twitter data looks messy and that is something you need to get used to.
- 99% of data science is using your brain to figure out what the data you are looking at means.
- 1% of data science is using statistics to interpret the data (or so says Guy)
- Then you use additional commands in Python to extract pieces of information (like time zone) that you may want to analyze.
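To give a feel for that last step, here is a minimal sketch of pulling one field (time zone) out of tweet data and counting the values. It runs without a Twitter connection: the sample tweets are made up for illustration, and the layout (a "user" entry holding a "time_zone" value) is an assumption based on the JSON the Twitter API returned at the time, so check it against the data you actually see. The helper name get_time_zone is hypothetical.

```python
from collections import Counter

# Made-up sample data shaped like tweet JSON (these are not real tweets).
sample_tweets = [
    {"text": "Learning Python today",
     "user": {"screen_name": "alice_example", "time_zone": "Pacific Time (US & Canada)"}},
    {"text": "APIs are fun",
     "user": {"screen_name": "bob_example", "time_zone": "London"}},
    {"text": "Data is messy",
     "user": {"screen_name": "carol_example", "time_zone": "Pacific Time (US & Canada)"}},
    {"text": "No time zone set",
     "user": {"screen_name": "dave_example", "time_zone": None}},
]


def get_time_zone(tweet):
    """Return the author's time zone, or None if it is missing."""
    user = tweet.get("user") or {}
    return user.get("time_zone")


# Collect the time zone of every tweet, skipping tweets without one.
time_zones = [get_time_zone(tweet) for tweet in sample_tweets]
counts = Counter(tz for tz in time_zones if tz is not None)

# Print the time zones from most to least common.
for time_zone, number in counts.most_common():
    print(time_zone, number)
```

Real data often has missing fields, which is why the helper returns None instead of crashing when a tweet has no user or no time zone.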