Dynamic Documents And Word Clouds

From Sean_Carver

In Case You Missed Last Class

Because we are almost done with the material on the syllabus, we decided today to branch out with something new: Dynamic Documents with Word Clouds (see below). Please complete the Preliminaries (before class) by Wednesday. It would also be good if some of you would attempt to complete all of the steps below. There will be plenty of new things to do in class and you can help others if there are problems. Email me if you have problems: carver@american.edu

Dynamic Documents

What is a dynamic document? A dynamic document has text and possibly headings and other features, such as bold and underline, lists, etc. More importantly, it accesses data (either by loading it from a file or, in our case, by connecting to Twitter and asking for it), then processes the data and derives summaries and figures to put in the document.

Dynamic documents are powerful.

Say you have data. You write a paper or report based on that data. Then you collect more data, or fix some problem with the data. If you write the dynamic document properly, when the data changes, all you have to do is press one button and all of the statistics and all of the figures get updated everywhere in the entire document.

If you had to update every statistic and every figure in the document, every time the data changed, this would take a long time and be prone to error. Dynamic documents are the way to go.
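The idea can be sketched in a few lines of base R. This is only an illustration, not code from WordClouds.Rmd: the `scores` data and the `summary_sentence` helper are made up for the example. The point is that the sentence is computed from the data, so it changes by itself when the data change.

```r
# Hypothetical data; a real dynamic document would load this from a file
# or, in our case, request it from Twitter.
scores <- c(72, 85, 90, 66, 78)

# Build a sentence of the report directly from the data.
summary_sentence <- function(x) {
  sprintf("We collected %d observations with a mean of %.1f.",
          length(x), mean(x))
}

print(summary_sentence(scores))

# More data arrive: rerunning the same code updates the sentence.
scores <- c(scores, 95, 88)
print(summary_sentence(scores))
```

In an R Markdown file, the same kind of computation sits inside the document, and pressing knit reruns all of it at once.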

Word Clouds

I was inspired by the recent (2016) election to address the following question: what is going on with our culture? What better way to do that than to mine the Twitter archive? There are many sophisticated techniques you could learn, and it would take a whole career to stay on top of them. I wanted to create a dynamic document, easily customizable, that would graphically display some cultural information from Twitter. After poking around with Google, I settled on a word cloud. A word cloud is a graphical display of word frequencies, in this case the frequencies of words that appear in a random sample of tweets satisfying certain search criteria. The more frequent the word, the larger it appears. Word clouds can also encode frequency with color.
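To make "frequency of words" concrete, here is a minimal sketch in base R. The three sample tweets and the `word_frequencies` helper are invented for illustration; the actual document may well count words differently (for example with the tm package). The last lines draw a cloud only if the wordcloud package happens to be installed.

```r
# Hypothetical sample; a real run would use text returned by a Twitter search.
tweets <- c("Compassion matters",
            "Show compassion today",
            "Today is a good day")

# Count how often each word appears, ignoring case and punctuation.
word_frequencies <- function(texts) {
  words <- unlist(strsplit(tolower(texts), "[^a-z']+"))
  words <- words[nzchar(words)]          # drop empty strings left by splitting
  sort(table(words), decreasing = TRUE)  # most frequent words first
}

freq <- word_frequencies(tweets)
print(freq)

# Draw the cloud only when the wordcloud package is available.
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(names(freq), as.integer(freq), min.freq = 1)
}
```

The words with the highest counts are the ones the cloud draws largest.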

What we are going to do

I am going to give you a dynamic document that searches Twitter and draws a word cloud, then displays it as part of a document that was actually my lesson plan for part of last class. You will make minor modifications to this file and rerun it. Maybe you will change the word or words we are looking for, or maybe you will change the search criteria: the date range or the geographic location. You will also change the text of the document. This should be very easy once you have set up the software. The steps below are for setting up the software. There are a lot of steps, but they are all pretty easy.
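As a rough picture of what "changing the search criteria" might look like, the sketch below collects a search term, a date range, and a location into one list. The `make_search` helper, the dates, and the coordinates are all assumptions made for this example and are not taken from WordClouds.Rmd. The argument names (`searchString`, `n`, `since`, `until`, `geocode`) are those of `searchTwitter` in the twitteR package; the search call itself is left as a comment because it needs the package and valid Twitter keys.

```r
# Build the arguments for one Twitter search (illustrative helper).
make_search <- function(term, since, until, geocode = NULL, n = 500) {
  # Dates must be in YYYY-MM-DD form, with since no later than until.
  dates <- as.Date(c(since, until), format = "%Y-%m-%d")
  if (anyNA(dates) || dates[1] > dates[2]) {
    stop("since and until must be YYYY-MM-DD dates with since <= until")
  }
  args <- list(searchString = term, n = n,
               since = format(dates[1]), until = format(dates[2]))
  # geocode is "latitude,longitude,radius"; leave it out to search everywhere.
  if (!is.null(geocode)) args$geocode <- geocode
  args
}

# Example: tweets mentioning "compassion" within 25 miles of a point
# in Washington, DC (approximate coordinates), over a two-week range.
dc_search <- make_search("compassion", "2016-11-01", "2016-11-15",
                         geocode = "38.9072,-77.0369,25mi")
str(dc_search)

# With twitteR installed and authenticated, the search would be:
# tweets <- do.call(twitteR::searchTwitter, dc_search)
```

Changing the word, the dates, or the location then means editing one line and knitting again.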

Preliminaries (before class)

  • If you haven't done so already, download and install the statistical software package "R". Some computers will say that R comes from a source that is untrusted. Don't worry about that. Millions of people have downloaded R with no problems.
  • Now download and install "RStudio." Use the free desktop version.
  • Make sure RStudio opens. RStudio will access R.

Preliminaries (in class)

  • Download the dynamic document: WordClouds.Rmd. All of the filenames, here and below, are links to the actual files. Click on them and choose save.
  • You don't need this file, but in case you wanted to run a word cloud from the R console, you could use this file: Wcloud.R.
  • You do, however, need the following file for both the dynamic document and the console program: My_access_template.R.
  • Once you have downloaded My_access_template.R, copy the file to a new file called "my_access.R". Filenames are case sensitive. After downloading the files, you can probably find them in your Downloads directory.
  • You will need to edit my_access.R to put in the "passwords" that twitter gives you. Instructions are below. But first, to get the passwords from twitter, follow this link: http://apps.twitter.com/
  • Log on to twitter. You need a twitter account if you don't have one already.
  • Click on "Create New App"
  • Put in a name for your application (no two people can have the same application name).
  • Put in a description of your app: "Draws a word cloud!" is fine.
  • Leave "Callback URL" blank.
  • Agree to the "Developer Agreement."
  • Click on "Create your twitter application"
  • Now you get the home page for your app. Click on the tab at the top that says "Keys and Access Tokens."
  • You need from this page your "Consumer Key," "Consumer Secret", "Access Token", and "Access Token Secret." You may have to tell twitter to generate or regenerate these keys. They will be strings of numbers and letters like "123kewarfadf03242324lkmaerlafdsafsd".
  • Copy and paste these keys into the my_access.R file in the places indicated. Use RStudio (or another text editor) to edit my_access.R. To open my_access.R choose File --> Open File. Replace all text between quotation marks with the "passwords" provided by twitter (as indicated in the code), but don't erase the quotation marks. Then save the file.
  • One more thing. You need an image from the web. Download one from images.google.com and name it compassion.jpeg, or use this one at this link: compassion.jpeg. Make sure you save (or rename) it as "compassion.jpeg" with a lower case 'c'. (The MediaWiki software I use for my website doesn't like file names that start with a lower case letter, so the linked file may need renaming.)
  • Put all files downloaded and created in the same directory: The 'Downloads' directory is fine, but you might want a new directory to keep them separate from other files.
  • Start R Studio, if you haven't already.
  • In the R Studio console type setwd("...") where ... is the directory where your files are. This may or may not be necessary.
  • Load the dynamic document, WordClouds.Rmd, using File --> Open File.
  • Press the knit button (ball of yarn).
  • This will not work yet (you will get an error) because we need to install certain packages and possibly other software. At the top of the lower right pane in RStudio there is a tab called Packages. If you click on it you will get a button called Install. Click the button and type in the name of the package you need. I use the following packages to draw the word cloud: twitteR, tm, wordcloud, and RColorBrewer. However, installing the packages may give you additional errors, with messages saying you need to install other packages or other software. Try that if you want (inspect the messages), but this may require my help. Note that package names are case sensitive: yes, you must type "twitteR".
  • Once all the needed software is installed, the "knit" button should bring up a window with the document rendered and displayed. Note that the code may need to be tweaked for different machines (Mac, Windows, Linux). I use Linux. I will almost certainly need to help you with that, but the approach is to ask Google.
  • I know this is complicated, but you only need to do this once.
  • At the top of the document window you can select "Open in Browser," which should open the document in your default browser. Mine is Firefox, but it might be different for everyone.
  • Most browsers will allow you to print the file to a PDF or to a printer. Do that.
  • The next step is to change the dynamic document, and press knit again (you don't need to save because "knit" also saves). We will work with this next step on Thursday, or Wednesday if there is time.
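
Before pressing knit, it can help to check from the RStudio console that everything from the steps above is in place. The sketch below is one possible check and is not part of the files provided: the helper names `missing_packages` and `missing_files` are made up, and it assumes you have already used setwd("...") to move to the directory holding your files. The package and file names are the ones listed in the steps.

```r
# Packages and files named in the steps above.
needed_packages <- c("twitteR", "tm", "wordcloud", "RColorBrewer")
needed_files    <- c("WordClouds.Rmd", "my_access.R", "compassion.jpeg")

# Return the packages in `pkgs` that are not installed yet.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Return the files in `files` that are not found in directory `dir`.
missing_files <- function(files, dir = getwd()) {
  files[!file.exists(file.path(dir, files))]
}

to_install <- missing_packages(needed_packages)
if (length(to_install) > 0) {
  message("Still to install: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)   # uncomment to install from CRAN
}

not_found <- missing_files(needed_files)
if (length(not_found) > 0) {
  message("Not found in ", getwd(), ": ", paste(not_found, collapse = ", "))
}

# Once nothing is missing, the knit button (or, from the console,
# rmarkdown::render("WordClouds.Rmd")) should be able to run.
```

Remember that file and package names are case sensitive, so "My_access.R" or "twitter" will be reported as missing.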