Resources mentioned in class on October 29

OK Cupid uses big data to map who’s gay curious in the U.S. and Canada

Edmond Chang’s keynote talk on gamification for THATCamp Boise State

Playing with collections at the Cooper-Hewitt (September 19)

1. Some terms to look up and define for your group, in your own words:

  • metadata
  • API (Hint: You’re probably going to find the PDFs at the bottom of the Wikipedia article more accessible than the Wikipedia entry itself.)
  • Creative Commons
  • public domain
  • open source
  • Omeka
  • Dublin Core

2. Visit the Cooper-Hewitt website.  What do you think the museum’s mission is?  Why would it decide to make its collections data digitally accessible to so many people? How does that release of data correspond to its mission?

3. What do Ridge and Murray-John suggest are the biggest hurdles to working with collections data from museums?  Why might these hurdles exist?

4. Micah Walter writes that the visualizations he shares in his blog post “are only possible because we released the collection data as a single dump. If we had, like many museums, only provided an API, this would not have been possible (or at least been much more difficult) to do.”

  • What is the difference between releasing a collection’s metadata as a single file and providing an API to the museum’s collections?
  • What are the challenges to museum staff and end users in each scenario (API and data release in a single file)?
  • Why would a museum opt to use either approach rather than (or in addition to) sharing its collections through a web browser interface it built, as the Powerhouse Museum has done?  (See also the Cooper-Hewitt’s searchable collections database.)
  • What are the advantages and liabilities of each approach?
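To make the contrast in question 4 concrete, here is a minimal sketch in Python. It is not the Cooper-Hewitt's actual data or API: the records, the field names, and the `fake_api` function are all invented for illustration, and the "API" is simulated in memory rather than called over the network. The sketch only shows the general shape of the two approaches: a single file can be read in one pass, while a paginated API has to be asked for one slice at a time.

```python
import csv
import io

# Hypothetical collection records, invented for illustration only.
RECORDS = [
    {"id": "1", "title": "Poster", "medium": "lithograph"},
    {"id": "2", "title": "Chair", "medium": "wood"},
    {"id": "3", "title": "Textile sample", "medium": "silk"},
    {"id": "4", "title": "Teapot", "medium": "porcelain"},
    {"id": "5", "title": "Sketch", "medium": "graphite"},
]


def make_dump(records):
    """Serialize every record into one CSV string (stands in for a single-file release)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "title", "medium"])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def read_dump(dump_text):
    """One read hands the user the whole collection at once."""
    return list(csv.DictReader(io.StringIO(dump_text)))


def fake_api(page, per_page=2):
    """Stand-in for a paginated API: each call returns only a slice of the records."""
    start = (page - 1) * per_page
    return RECORDS[start:start + per_page]


def read_via_api(per_page=2):
    """Rebuild the collection by requesting pages until an empty one comes back."""
    collected, page, calls = [], 1, 0
    while True:
        batch = fake_api(page, per_page)
        calls += 1
        if not batch:
            break
        collected.extend(batch)
        page += 1
    return collected, calls


from_dump = read_dump(make_dump(RECORDS))
from_api, api_calls = read_via_api()
print(len(from_dump), "records from one file")
print(len(from_api), "records from", api_calls, "API calls")
```

Both paths end with the same records, but the API path needs several separate requests to get there. With a real collection of many thousands of objects, that difference is one reason whole-collection visualizations may be easier to build from a single file.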

5. Does what you learned today, through the readings and during class discussion, change how you would approach the big data visualization project your group proposed during the last class?  If so, how and why? If not, why not?

Recent Look at the United States Murder Rate

We compared and analyzed data from different websites on the geography and rate of violent crimes within the United States.  We divided crime rates by characteristics, based on the United States Census Bureau’s statistics for certain crimes within major, minor, and rural metropolitan cities.

We asked questions like “What rates in and outside of metropolitan cities were higher for what crimes?”

Data, visuals, and visualization (September 17)

Resources for in-class discussion

Map of Salem Village and map of witchcraft accusers and accused

Cholera in London

Napoleon’s invasion of Russia in 1812

On the Origin of Species: The Preservation of Favoured Traces

Animal City

Digitally reconstructing Washington, DC as it appeared circa 1814

Chronozoom project

Name Voyager and Name Mapper


1. How might a humanist approach or use data sets differently than a scientist would?

2. Why might historians want to create visualizations?

3. What are the advantages and liabilities (for historians and their audiences) of transforming data into visualizations?

4. Which of the visualizations in the reading, or at the links above, do you find particularly interesting or persuasive, and why?  Which ones are less interesting or persuasive?

An in-class exercise

1. Find online either (a) sources that you could convert into data or (b) an existing dataset drawn from primary sources.

2. What questions might an historian ask of this data?

3. What methods might the historian use to make sense of this data?

4. What kind(s) of visualization(s) do you think would be most useful to (a) the historian as she conducts her analysis and (b) the audience for her work?

5. Post your responses to these questions, along with a link to the data or dataset you used in your example, to the course blog.  In the category list, check the box for “data experimentation.”  Be sure to include the first names of everyone in your group.


Digital Data Curation (September 12)

1. What is a collection? What is an aggregation?

2. What are the differences among the practices of preserving, curating, and aggregating data?  What are the challenges of each of these practices?

3. What are some of the tensions present in data curation?  (For example, “openness and access vs. intellectual property rights”)  How are these tensions being addressed, and how, if at all, might they be resolved?

4. How would you go about digitally preserving or documenting your family history? How much would you share, and where/how would you share it? How would you determine what to keep private, if anything? How would you organize the information? How much would you curate the collection?  (For example, would you just put up a searchable database of the digital objects, or would you create finding aids, set up browsable categories, or write an essay that provides an overview of the collection?) Would you make it easy for other people to contribute to the collection, or would it be a closed collection (limited to your own objects)?  How would you determine what objects, people, or topics belonged in the collection and what did not?

5. Knowing that archivists’ time and preservation resources aren’t unlimited, what criteria should the Library of Congress use to determine which election-related or end-of-presidential-term websites should be preserved? In what form should those websites be preserved, and through what interface should they be made findable by researchers and others?

Resources for in-class discussion of big data (Sept. 10)

Questions for discussion

1. Why does Croll claim big data is a civil rights issue?  How has the technology shifted to make it a civil rights issue?  How is Croll’s claim relevant to historians using big data in their research?

2. In “Big Data On Campus Is Like A Keg Stand For Your Brain,” Sinclair writes that he wants to develop digital tools that guide the reader in asking questions about the data.  What are the advantages and liabilities of such a tool in the humanities?

3. Reading between the lines of the articles we’ve read so far for this course (but looking especially at the one by Gibbs and Owens), what are the methods, forms, and values of the traditional humanities?  What are the challenges in merging the methods, forms, and values of digital practice with the traditional humanities?

4. What do Gibbs and Owens mean when they write that “rigorous mathematics is not necessarily essential for using data efficiently and effectively.  In particular, work with data can be exploratory and deliberately without the mathematical rigor that social scientists must use to support their epistemological claims”?  How might humanists’ engagement with big data differ from social scientists’?


Google Ngram Viewer

Text analyzer