Creating a Basic Function in Python

Continued adventures in learning Python (started here):

A function gives you a return value based on an argument that you input.  The function len will give you a return value of the number of elements in an object.  The argument you provide is the name of the object.  In this example the object is called “names.”

Screen Shot 2014-05-31 at 10.30.34 AM
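
In case the screenshot doesn't load, here is a minimal sketch of the same idea. The contents of the list are made up for illustration; only the name "names" comes from the example above.

```python
# A hypothetical list; the contents are invented for this sketch.
names = ["Ada", "Grace", "Mary"]

# len takes the object as its argument and returns the number of elements.
number_of_names = len(names)
print(number_of_names)  # prints 3
```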

You create a function by using the keyword def (short for “define”).  This function is called “Mary” (it could be called anything).  The function doubles the argument.

Screen Shot 2014-05-31 at 10.28.44 AM
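
A rough text version of a doubling function like the one described, as a sketch. The lowercase name and the parameter name "number" are my own choices here, not necessarily what the screenshot shows.

```python
def mary(number):
    """Return double the argument."""
    return number * 2


result = mary(4)
print(result)  # prints 8
```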

Digital Activism Research Methods: Using APIs for Social Media Research

I am currently at the International Communication Association (ICA) annual conference in Seattle.  I’ll be posting what I learn.

Facebook Graph API:

Anne Oeldorf-Hirsch,  U of Connecticut, USA

Screen Shot 2014-05-23 at 11.21.56 AM

  1. Create an app (use computer science expertise)
  2. Study participants visit study site and are linked to Facebook after disclosure message and consent about data collection
  3. On Facebook, participants grant permission for study app to access whatever information the researcher wants (example: friend lists, posts, post comments)
  4. You then get a dataset organized by user ID number
  5. You can then use that data as you wish, for example, to populate a survey where the user will explain or describe their Facebook activity.
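
To make steps 4 and 5 concrete, here is a small sketch of organizing records by user ID and turning one participant's posts into survey prompts. Everything in it is hypothetical: the records, the field names ("user_id", "type", "text"), and the wording of the prompt are invented for illustration and are not what the Facebook Graph API or the study app actually returns.

```python
from collections import defaultdict

# Made-up records standing in for what a study app might hand back.
records = [
    {"user_id": "1001", "type": "post", "text": "Rally on Saturday!"},
    {"user_id": "1002", "type": "comment", "text": "Count me in."},
    {"user_id": "1001", "type": "post", "text": "Thanks to everyone who came."},
]


def group_by_user(rows):
    """Organize records by user ID number."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["user_id"]].append(row)
    return dict(grouped)


def survey_questions(rows_for_user):
    """Turn one participant's posts into survey prompts."""
    return [
        'You wrote: "{}". Why did you share this?'.format(row["text"])
        for row in rows_for_user
        if row["type"] == "post"
    ]


dataset = group_by_user(records)
questions = survey_questions(dataset["1001"])
```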

NameGenWeb + Programming in Comm Research:

Nicole Ellison, Michigan State U, USA

Using  the NameGenWeb app to collect Facebook network data

  • NameGenWeb (image above) is a Facebook app that collects information about a user’s network
  • The app is slow in gathering this information, causing some people to quit the app prematurely
  • Like any other app, the user must give the app permission to access their account data on the Facebook platform

Thoughts on programming and communication research

  • The benefit is that you don’t need to rely on self-report – you have the user data
  • The problem is that it is a skill set many comm researchers don’t have
  • Comm grad students should learn programming
  • Relying on computer science students creates a black box problem, because they are unlikely to have substantive expertise in the research question
  • You are now at the mercy of the social media company (lack of control over data collection, plug can be pulled completely)

Getting data from Twitter

Deen Goodwin Freelon, American U, USA

Screen Shot 2014-05-23 at 11.19.45 AM

  • Data scraping: automating collection of digital data
  • Pros: powerful for speed and convenience, free, start immediately, usually pretty easy, good for class projects
  • Cons: APIs limit availability, can only retrieve data within limited time windows, requires very high local system reliability (i.e., an internet outage means data collection shuts off)
  • Tools to use: NodeXL is easiest but not available for Mac, Deen showed a nice table of options which I’ll link to (here) when he posts it
  • Purchasing data is the best option if you can afford it
  • Twitter data vendors: Gnip, Datasift, Sysomos (give them a time period and keywords and they give you the data)
  • Trusting bought data: If you can’t validate an analysis, don’t use it (for example, identifying language or gender)
  • Data formats: csv, xml, JSON, MySQL (you need to learn how to use them)
  • Audit your data:  They might not have included everything that fit your query parameters (time, keywords)
  • Comm needs computational methods and needs a “development core” (for now, apprentice yourself)
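
The "audit your data" advice can be sketched in a few lines of Python. This is only an illustration under assumptions: the records, the field names ("date", "text"), and the function name are invented, and a real vendor file will be structured differently. The sketch flags rows that don't match the query and lists days inside the time window with no matching tweets, which can hint at missing data.

```python
from collections import Counter
from datetime import date, timedelta


def audit(tweets, start, end, keywords):
    """Return rows that don't fit the query and days with no matching rows."""
    lowered = [keyword.lower() for keyword in keywords]
    off_query = []
    per_day = Counter()
    for tweet in tweets:
        day = date.fromisoformat(tweet["date"])
        text = tweet["text"].lower()
        in_window = start <= day <= end
        has_keyword = any(keyword in text for keyword in lowered)
        if in_window and has_keyword:
            per_day[day] += 1
        else:
            off_query.append(tweet)
    n_days = (end - start).days + 1
    all_days = [start + timedelta(days=i) for i in range(n_days)]
    empty_days = [day for day in all_days if per_day[day] == 0]
    return off_query, empty_days


# Made-up sample data.
sample = [
    {"date": "2014-05-01", "text": "Join the #protest downtown"},
    {"date": "2014-05-03", "text": "Protest photos from today"},
    {"date": "2014-05-09", "text": "Protest next week"},  # outside the window
    {"date": "2014-05-02", "text": "Lunch was great"},  # no keyword
]
off_query, empty_days = audit(
    sample, date(2014, 5, 1), date(2014, 5, 3), ["protest"]
)
```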

Computation and comm research

Jeff Hancock, Cornell, USA

  • Sending grad students to comp sci departments to take classes is not the solution: they come back frustrated and without skills useful for comm.
  • This is because comm and comp sci have different priorities
  • Programming languages: Java is useless, Python is great
  • NSF is looking for collaboration between social science and computer science

 

ICA Pre-Conference Talk on Activism

Slides for a talk I’ll be giving tomorrow at the International Communication Association Pre-Conference on Qualitative Political Communication Research.

Contention Beyond Social Movements: Activism and its Benefits from Mary Joyce

How to Get Data from Twitter

This tutorial uses Tweepy, a Python library, to get data from Twitter.  This tutorial was presented today at the Community Data Science Workshop @UW.  The first tutorial, an introduction to APIs, is here.

1) Get a Twitter API key & access token, then get a bunch of Python programs for Twitter

Screen Shot 2014-05-03 at 1.35.18 PM

2) Put the authentication information into a Python program

  • Download a text editor like Smultron
  • Open a new file and type your keys and tokens so it looks like this:

Screen Shot 2014-05-03 at 1.28.53 PM

  • Save the file as “twitter_authentication.py” to the folder of Twitter programs you just downloaded from GitHub.  (Replace the existing file with this name.)
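
If the screenshot doesn't load, the file is roughly a handful of variable assignments. The sketch below is an assumption about its shape: the variable names are my guess, so keep whatever names the file you are replacing already uses, and paste your own keys in place of the placeholder strings.

```python
# twitter_authentication.py
# Placeholder values only: paste in the keys and tokens from your own
# Twitter app. The variable names are assumptions; match the names used
# in the file you are replacing.
CONSUMER_KEY = "your-consumer-key"
CONSUMER_SECRET = "your-consumer-secret"
ACCESS_TOKEN = "your-access-token"
ACCESS_TOKEN_SECRET = "your-access-token-secret"
```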

3) Get tweets from Twitter

  • In your terminal run the command “python twitter1.py”.
  • Tweets should appear.

Screen Shot 2014-05-03 at 1.38.04 PM

  • This means you successfully used the Twitter API in a basic way.
  • This Twitter data looks messy and that is something you need to get used to.

Screen Shot 2014-05-03 at 1.50.20 PM

  • 99% of data science is using your brain to figure out what the data you are looking at means.
  • 1% of data science is using statistics to interpret the data (or so says Guy)
  • Then you use additional commands in Python to extract pieces of information (like time zone) that you may want to analyze.
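
As a sketch of that last step, here is one way to pull the time zone out of a single tweet once it has been parsed. The tweet below is made up and heavily trimmed, and the field names ("user", "time_zone") are assumptions about how the data is laid out; check them against the messy output you actually get.

```python
import json

# A made-up, heavily trimmed tweet; real ones carry many more fields.
raw_tweet = """
{
  "text": "Heading to the workshop",
  "user": {
    "screen_name": "example_user",
    "time_zone": "Pacific Time (US & Canada)"
  }
}
"""


def time_zone_of(tweet):
    """Return the author's time zone, or None if it is missing."""
    return tweet.get("user", {}).get("time_zone")


tweet = json.loads(raw_tweet)
print(time_zone_of(tweet))
```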

 

 

What’s an API and Why Should I Care as an Activism Researcher?

Notes and reflections from the Community Data Science Workshop, presented by Benjamin Mako Hill and friends today at the University of Washington, Dept. of Communication.

Contents

  • Introduction
  • Using simple APIs
  • Putting data in a format you can use

Introduction

What is an API?

  • Term stands for Application Programming Interface
  • It is a standard (or protocol).  It’s not a piece of software.
  • It’s a way for one program to talk to another program.
  • It’s a way to get data from online platforms about what people are doing on that platform.

Why should I care as an activism researcher?

  • Sometimes people are using those platforms for activism.
  • You can learn something (not everything) about activism activity on the platform by looking at the traces of that activity that individuals leave in the form of tweets, follows, edits, and more.
  • These traces are the data that you access through the API

What can I do with an API?

  • ask for data (almost always by requesting a URL)
  • get data back (almost always in a file format called JSON)
  • build a dataset of content to study (using the data you got through the API)
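
The three steps above can be sketched in Python without touching the network. The address "http://api.example.org/activity", the parameter names, and the shape of the response are all invented for illustration; a real platform's documentation will tell you what to use instead.

```python
import json
from urllib.parse import urlencode


def build_request_url(base_url, **params):
    """Ask for data: the request is a URL with parameters on the end."""
    return base_url + "?" + urlencode(params)


def rows_from_response(body):
    """Get data back: parse the JSON text and keep the fields to study."""
    payload = json.loads(body)
    return [(item["user"], item["action"]) for item in payload["items"]]


# Hypothetical request and a hand-written stand-in for the response.
url = build_request_url(
    "http://api.example.org/activity", topic="protest", limit=2
)
fake_body = (
    '{"items": [{"user": "a", "action": "tweet"},'
    ' {"user": "b", "action": "follow"}]}'
)

# Build a dataset: here, a list of (user, action) pairs.
dataset = rows_from_response(fake_body)
```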

Challenges

  • When a platform changes its architecture, the structure of data can also change.
  • Different platforms have different APIs.  Though they have similar features, you will need to learn each platform’s API separately
  • The API structure and documentation will be better for platforms that make money off their API, like Twitter
  • For those that don’t care about how people access their data, the API will not be well-structured
  • However, for platforms whose data is commonly accessed via API (like Twitter), there will be existing Python modules that have been created to make your task easier.

Using Simple APIs

Use Python to go out onto the web, grab some data, and show it to you

  • Use Python, a programming language you probably already have on your computer and can access with your terminal (here’s how)
  • Python example code, which scrapes the HTML code from the website http://www.python.org

Screen Shot 2014-05-03 at 12.24.54 PM

  • In this example, the API is the standard that allows you to pull data (HTML or other) from a website by using the code above.
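
For readers who can't see the screenshot, here is a minimal sketch of fetching a page's HTML with Python 3's standard library. It is not necessarily the code from the workshop, and the function name is my own.

```python
from urllib.request import urlopen


def fetch_html(url):
    """Request a URL and return the page source as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `fetch_html("http://www.python.org")` and printing the result should show the page's HTML, assuming you are online.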

Put data from the web into a file on your computer

  • This code puts the HTML code from http://www.python.org into a file on your computer

Screen Shot 2014-05-03 at 12.14.02 PM

  • Now the file is on your computer (you can find it by searching your computer for a file named python.html)
  • Use the function os.chdir to set your working directory, which is the folder on your computer where the file will be saved
  • The default directory is your personal directory on your computer.  For example, mine is called mjoyce.
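
A sketch of the saving step, assuming Python 3. The function name and its parameters are hypothetical; the point is that os.chdir decides where a plain file name like python.html ends up.

```python
import os


def save_text(folder, filename, text):
    """Change to `folder`, write `text` to `filename` there, return the path."""
    os.chdir(folder)  # relative file names now resolve inside `folder`
    with open(filename, "w", encoding="utf-8") as handle:
        handle.write(text)
    return os.path.join(os.getcwd(), filename)
```

For example, `save_text(os.path.expanduser("~"), "python.html", html)` would write the file to your personal directory, where `html` is the page text you fetched earlier.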

Use a simple API that involves kittens

Screen Shot 2014-05-03 at 12.26.22 PM

  • And here’s what the file you created looks like on your computer

Screen Shot 2014-05-03 at 12.29.44 PM

  • In this example, the API is the standard that lets you get an image of a kitten with certain dimensions by putting those dimensions at the end of a URL
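
Here is a sketch of what "dimensions at the end of the URL" means in practice. The base address is an assumption: I'm using placekitten.com, which serves images at width/height paths, but the screenshot may show a different service.

```python
def kitten_url(width, height, base="http://placekitten.com"):
    """Build the address of a kitten image with the given dimensions."""
    return "{}/{}/{}".format(base, width, height)


print(kitten_url(200, 300))  # prints http://placekitten.com/200/300
```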

Putting Data in a Format You Can Use

Why should I care about JSON?

  • When you get data from a website using an API, it will most often be in JSON format.

What is JSON?

  • It is a language for structuring data.  It is a format used by programs for programs.
  • Here’s an example of what a JSON file looks like: http://json.org/example.html
  • Here’s a simpler example by Mako: http://mako.cc/cdsw.json
  • Its format looks very similar to Python’s own dictionaries and lists

Interpreting a JSON file

    • This is an entry about a pet fish with a name, age, and favorite color

Screen Shot 2014-05-03 at 11.41.43 AM
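
A text version of an entry like that, as a sketch. The values and the exact key names are made up; only the three fields (name, age, favorite color) come from the description above.

```python
import json

# Made-up values; the key names are assumptions.
fish_json = '{"name": "Bubbles", "age": 2, "favorite_color": "blue"}'

# A JSON object becomes a Python dictionary when parsed.
fish = json.loads(fish_json)
print(fish["name"], fish["age"], fish["favorite_color"])
```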

Importing a JSON file into Python

  • This code displays a JSON file from the web in Python

Screen Shot 2014-05-03 at 11.45.35 AM

  • This imports the json module, which lets Python interpret JSON.

Screen Shot 2014-05-03 at 11.47.15 AM

  • This code stores the contents of the JSON file in a variable named “data” and displays those contents in Python

Screen Shot 2014-05-03 at 11.55.06 AM
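
Pulling those steps together, here is a minimal sketch for Python 3 of downloading a JSON file and turning it into Python objects. The function name is hypothetical and the workshop's own code may differ.

```python
import json
from urllib.request import urlopen


def load_json(url):
    """Download a JSON document and return it as Python objects."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))
```

With an internet connection, `data = load_json("http://mako.cc/cdsw.json")` followed by `print(data)` would display the contents, assuming that file is still online.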

Putting data from a JSON file into a spreadsheet

  • This code puts certain data in the JSON file from the web into a .csv spreadsheet file so it is easier to work with.

Screen Shot 2014-05-03 at 12.03.23 PM

  • And this is what the .csv file you created looks like

Screen Shot 2014-05-03 at 12.00.48 PM

  • These are the basics of what you need to do to be a data scientist, either for the study of activism or any other activity carried out online.
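
The JSON-to-spreadsheet step can be sketched with Python's built-in csv module. The records and field names below are invented for illustration; in practice they would come from the JSON you loaded.

```python
import csv


def write_rows(records, fields, path):
    """Write the chosen fields of each record to a .csv file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=fields, extrasaction="ignore"
        )
        writer.writeheader()
        writer.writerows(records)


# Made-up records in the shape json.loads might return.
pets = [
    {"name": "Bubbles", "age": 2, "favorite_color": "blue", "species": "fish"},
    {"name": "Rex", "age": 5, "favorite_color": "red", "species": "dog"},
]
```

Calling `write_rows(pets, ["name", "age"], "pets.csv")` would create a file with a header row and one row per pet, keeping only the two chosen columns.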

TedEd: Terrifying for Grad Students

Grad school is a pyramid scheme.  There are lots of smart students and only a few academic jobs for us when we graduate.  Oh, and then there’s the fucking internet, which promises to have only a few of the most talented instructors teach all the children on earth.  Which means even fewer academic jobs.  Fuck them!

Actually, TedEd seems awesome.  But grad students like myself are still screwed.

Data Section is Open

I’ve just added a new Data section to this site.  I’ll be sharing data files resulting from my research on global digital activism, all of which have a Creative Commons Attribution-NonCommercial-ShareAlike license.

The first group of files is related to the second version of the Global Digital Activism Data Set, which I created while working at the Digital Activism Research Project, and which I am continuing to work on and improve.

Image: Digital Activism Research Project

Open Methods Section is… Open!

I’ve just added a new Open Methods section to this site.  I’ll be sharing research materials related to my current study on digital activism effectiveness, all of which have a Creative Commons Attribution-NonCommercial-ShareAlike license.

Ch-ch-ch-check it out!

Image: Flickr/Kris Krug

So Many Feels: Occupy Scholarship as an Emotional Endeavor

Even though Occupy emerged and faded almost two years ago, the scholarship of Occupy continues.  Occupy publications in the last year include a paper that explores the values, attitudes, and beliefs of occupiers as they relate to their use of technology and another that looks at changes in participant engagement, interests, and social connectivity on Twitter.  Another paper looks at the implication of the occupations for public space.

In the British Journal of Sociology, the American sociologist Craig Calhoun wishes to evaluate Occupy, but clarifies that he does not wish to critique it too harshly.  “[T]here is no shame in being more moment than movement,” he writes. “It is no denigration of Occupy Wall Street (or the Occupy movement(s) more generally) to say it may not have a future as such.”

Yet if an activist mobilization doesn’t “have a future,” that certainly isn’t a good thing.  Why not say so? Why this hedging (which is not particular to Calhoun)? The great challenge of analyzing activism success is that the people who analyze activism outcomes as intellectuals also care about activism outcomes as emotional beings. 

The danger here is that we as scholars find ourselves looking for effects rather than measuring effects. The former is a result of post-facto analysis, Monday morning quarterbacking.  There is always some outcome to be found which can be construed as successful (“changing the discourse,” for example). The latter method seeks to record what a campaign was initially trying to achieve and is willing to say that it did not succeed in that attempt.

When we as scholars care about changing the world – and most scholars of activism do – there is a great danger of the former approach, in which emotion clouds empiricism. And there is often too little of the latter approach, in which unsentimental empiricism leads to recognition of a disappointing reality.

As a scholar who wants activists to succeed I know that I run the risk of producing the former type of analysis, and I am trying very hard to produce the latter. This means remembering that activism campaigns are trying to change the state of the world in some way, and the extent to which they do or do not bring about this change must be the primary criterion for evaluating success.

This is the logic behind the variables I’m currently creating. I know this will frustrate some people, because I am missing the lesser effects of (failed) campaigns. But I think it is the right way to go. If we want to make activists more effective we need to be rigorous and unsentimental. Calling a failure a success may make us feel better, but it’s unlikely to lead to better activism and real change.

Image: Wikipedia

Why do Trolls Troll?

Now that the Republicans trolling American democracy have recently (and temporarily) been vanquished, it seems as good a time as any to think about the nature of trolling.

Why do people become horrible trolls online, taunting and harassing others for spite, joy, or profit?  John Suler of Rider University has some answers.  In a 2004 article entitled “The Online Disinhibition Effect,” he theorizes that people “say and do things in cyberspace that they wouldn’t ordinarily say and do in the face-to-face world” for six reasons, cumulatively called the online disinhibition effect.  I find 1 through 3 to be the most convincing, but here’s the whole list.

    1. dissociative anonymity: “When people have the opportunity to separate their actions online from their in-person lifestyle and identity, they feel less vulnerable about self-disclosing and acting out.”
    2. invisibility: “In many online environments, especially those that are text-driven, people cannot see each other. When people visit web sites, message boards, and even some chat rooms, other people may not even know they… This invisibility gives people the courage to go places and do things that they otherwise wouldn’t.”
    3. asynchronicity: “In e-mail and message boards, communication… [p]eople don’t interact with each other in real time…. Not having to cope with someone’s immediate reaction disinhibits people.”
    4. solipsistic introjection: Solipsism is the belief that one’s own existence is the only thing that is real or meaningful.   “Absent face-to-face cues… can alter self-boundaries….  Reading another person’s message might be experienced as a voice within one’s head….The online companion then becomes a character within one’s intrapsychic world…” and not a real person with thoughts and feelings.
    5. dissociative imagination: “Consciously or unconsciously, people may feel that the imaginary characters they ‘created’ exist in a different space… [a] make-believe dimension, separate and apart from the demands and responsibilities of the real world.”
    6. minimization of authority:  This one relates specifically to why people feel comfortable trolling others that have higher offline authority or status, why trolls have neither fear nor respect for these people (celebs, politicians) online. “Authority figures express their status and power in their dress, body language, and in the trappings of their environmental settings. The absence of those cues in the text environments of cyberspace reduces the impact of their authority.”

Suler is careful to argue that trolls are not necessarily jerks offline as well.  The online context, combined with inherent personality traits, brings forth the habits of trolling.

Image: Troll Hunter (film)
