In the News

Life happened the past few weeks, but I’m looking forward to getting back into the swing of things here on Cultural Heritage and Information. Today I thought I’d post some interesting stories from around the web that relate to topics here on the blog.

HathiTrust Digital Library Wins Latest Round in Battle With Authors

With new publishing technologies and research practices, the copyright debates will continue to evolve in legal and other settings. The Chronicle of Higher Education posted a summary (by Jennifer Howard) on June 10th about the latest developments in the HathiTrust Digital Library vs. Authors Guild case. Ultimately, the U.S. Court of Appeals for the Second Circuit (in New York) decided for the library. Its decision will allow a searchable, full-text database of the Library’s works under the “fair use” clause, and will also allow dissemination of works in different formats for vision-impaired users.

What Happens When Preservation and Innovation Collide?

The National Trust for Historic Preservation reflects on two years of innovation strategy development with EmcArts’ Innovation Lab for Museums. In the post (by Estevan Rael-Galvez), they share their ideas, challenges, and successes. Most interesting, in my opinion, is their idea to transition traditional historic house museums (which I adore) from static, contrived experiences to more integrated, immersive experiences that stimulate all of the visitors’ senses.

“Bit Rot: The Limits of Conservation” (a post by Martha Buskirk on June 9) discusses how time affects access to and preservation of electronic media. The article supports “lots of copies keep stuff safe” as a general strategy to work toward in the preservation and conservation of cultural and art artifacts. It also describes common obstacles, such as getting artists’ input on migration to new technologies, obsolescence of older technologies, copyright issues, determining which aspects of a work hold its value, and the consequences of benign neglect. Best practice? Awareness and vigilance about what we want to save, and what has value to us.

How difficult can your manuscripts be?

The National Conservation Service in the UK blogged about some challenges that crop up when digitizing manuscripts. Some issues they faced during the digitization process for Khojki manuscripts from the Institute of Ismaili Studies include illegible text located in awkward places (e.g., the gutters), curved and warped pages, and ink degradation.

World Cups

Just for fun, I’m sharing the Horniman Museum and Gardens’ World Cup tie-in, about a digital exhibit they created on cups from around the world (“world cups”… get it?). Cups from locations such as Burma, China, Japan, Indonesia, and Colombia feature in the exhibit.


Fighting, err… working with SharePoint

For the past several months I have been creating, adding, and quality controlling metadata in a SharePoint 2010 document library.* I thought those months had given me some crucial SharePoint skills, and that SharePoint had become relatively intuitive.

There are four kinds of people in the world, according to an Arab proverb (according to Bartleby):

… those who don’t know that they don’t know; those who know that they don’t know; those who don’t know that they know; and those who know that they know.**

Reflecting on my experiences this past week (read on, I get to that shortly), I used to be the first kind of person, and now I feel pretty confident that I’m in the second category.


A call for public accountability & transparency of national surveillance policies

I want to share the link to an article by ALA (the American Library Association, for non-librarians) about the recent news about government surveillance policies and practices, such as PRISM.

This is a huge issue for me. That the government has collected such vast amounts of personal information about individuals’ online activities and communication, secretly, worries me a great deal. Secret surveillance and collection practices, “warrants” which are approved by a secret court, and which encompass vast amounts of “incidental” personal information, have terrible potential for the future of our democracy and for the authority of the Constitution, which should protect against such wholesale surveillance practices. I am not convinced that the claimed congressional oversight is stringent enough to protect citizens who are being surveilled without their knowledge. Without public accountability and transparency, I will not be reassured by the government’s claims that these laws are upheld with integrity, and that the warrants are executed justly and within the bounds of the Constitution.

Water usage/shortage: an infographic


Do you know how much water you use every day? How much can you save?



I’m going to get started saving more water today.

Digital preservation in a webinar, part four

Finally! The recap of the fourth Introduction to Digital Preservation webinar, hosted by ASERL.

[To listen to the recordings and view the PowerPoint presentations, see ASERL’s archive]

The title of this webinar: “Using FITS to Identify File Formats and Extract Metadata.” It was presented by Andrea Goethals, of Harvard University.

The highlights:

What is FITS?

  • “File Information Tool Set”

Some complications

  • Format specifications often have different versions.
  • Specifications are things that file formats conform to.
  • Authoritative specification information does not always exist for files. Sometimes it can be unclear, complex or long, it can reference other file formats, and can depend on other specifications.

Further complications for tool builders and users

  • OpenDocument formats are packaged as ZIP files; knowing only that a file is a ZIP is not sufficient for preservation.
  • Many formats (e.g., XML) are text formats.
  • Some formats lack obvious identifying features.
  • File formats can be difficult to accurately identify.
  • Some tools report more specific results than others, so their output is inconsistent.
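
To make the ZIP point concrete, here is a minimal sketch in Python (not something shown in the webinar, and not how FITS itself works). It assumes only the OpenDocument convention that a package carries a "mimetype" entry naming the real format; the function names and sample files are made up for illustration.

```python
import io
import zipfile

ZIP_SIGNATURE = b"PK\x03\x04"  # local file header that starts a non-empty ZIP


def identify(data: bytes) -> str:
    """Return the most specific format name this sketch can find."""
    # Step 1: the signature only tells us about the container.
    if not data.startswith(ZIP_SIGNATURE):
        return "unknown"
    # Step 2: look inside the package for a "mimetype" entry.
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        if "mimetype" in package.namelist():
            return package.read("mimetype").decode("ascii").strip()
    # Nothing more specific was found, so all we can say is "ZIP".
    return "application/zip"


def make_package(entries: dict) -> bytes:
    """Build a small in-memory ZIP so the example needs no files on disk."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        for name, content in entries.items():
            package.writestr(name, content)
    return buffer.getvalue()


odt_like = make_package({
    "mimetype": "application/vnd.oasis.opendocument.text",
    "content.xml": "<doc/>",
})
plain_zip = make_package({"notes.txt": "hello"})

print(identify(odt_like))
print(identify(plain_zip))
print(identify(b"just some text"))
```

A tool that stops at step 1 would report both sample files as ZIP, which is the kind of too-general answer the bullet above is warning about.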

How does FITS help?

  • Combines the functionality of different file format identification tools.

Why build FITS?

  • The motivation at Harvard was to offset the risk of accepting any format (including web archives, email attachments, donated external hard drives).
  • Additionally, to integrate into existing preservation workflows.
  • Strategy: to develop a tool manager instead of a tool, and to account for tool inaccuracy: to check tools against each other, and to verify results.

What is required?

  • XML, for tools without a graphical interface

What does FITS do?

  • Identifies many file formats
  • Validates a few file formats
  • Extracts metadata
  • Calculates basic file information
  • Outputs technical metadata
  • Identifies problem files (e.g., conflicting opinions on format, metadata values; unidentifiable formats)

The Process

  • FITS translates each tool’s output to a common XML format, consolidates the results into a single FITS XML file, and can then translate that FITS XML file to standard XML.
  • You can store the FITS XML files wherever you store metadata in the repository.
  • The file is not modified during the process.
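
The consolidation step might look something like the Python sketch below. This is only an illustration of the idea: the element names ("report", "tool") and the tool names are hypothetical and are not the real FITS XML schema.

```python
import xml.etree.ElementTree as ET


def consolidate(file_name, tool_reports):
    """Merge per-tool results into one XML tree.

    Only the reports are read here; the file being described is never opened
    or changed by this step.
    """
    root = ET.Element("report", {"file": file_name})
    for tool_name, fields in tool_reports.items():
        tool = ET.SubElement(root, "tool", {"name": tool_name})
        # Sort the keys so the output is stable from run to run.
        for key, value in sorted(fields.items()):
            ET.SubElement(tool, key).text = str(value)
    return root


# Hypothetical results, already translated into a common set of field names.
reports = {
    "tool_a": {"format": "Portable Document Format", "version": "1.4"},
    "tool_b": {"format": "Portable Document Format"},
}

root = consolidate("example.pdf", reports)
print(ET.tostring(root, encoding="unicode"))
```

The resulting XML is a separate record that can be stored alongside other metadata in the repository.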

Normalization (translation)

  • The key to using multiple tools
  • Assists with tools that provide different names for the same format
  • Assists with tools that provide different values for the same metadata
  • Assists with tools that provide different ways of saying when they can’t identify the format of a file
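
Here is a rough Python sketch of what normalization could look like. The alias table and the list of "unknown" markers are invented for illustration; FITS keeps its own mappings, which I am not reproducing here.

```python
# Hypothetical aliases: different tools, different names, same format.
FORMAT_ALIASES = {
    "pdf": "Portable Document Format",
    "adobe pdf": "Portable Document Format",
    "portable document format": "Portable Document Format",
    "tiff": "Tagged Image File Format",
    "tagged image file format": "Tagged Image File Format",
}

# Hypothetical ways a tool might say "I could not identify this file".
UNKNOWN_MARKERS = {"", "unknown", "unidentified", "n/a"}


def normalize(raw_name):
    """Map a tool-specific format name to one shared name, or None if unknown."""
    key = raw_name.strip().lower()
    if key in UNKNOWN_MARKERS:
        return None
    # Names not in the table pass through unchanged (minus extra whitespace).
    return FORMAT_ALIASES.get(key, raw_name.strip())


def has_conflict(raw_names):
    """True if the tools still disagree after their answers are normalized."""
    known = {normalize(name) for name in raw_names} - {None}
    return len(known) > 1


print(normalize("Adobe PDF"))
print(has_conflict(["PDF", "Adobe PDF", "n/a"]))
print(has_conflict(["PDF", "TIFF"]))
```

Only after this step does it make sense to compare tools against each other, because otherwise "PDF" and "Adobe PDF" would look like a disagreement.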

[Then we watched nifty demonstrations in Windows as Andrea Goethals took us through what FITS does and how it does it. I discovered I can read basic XML.]

At Harvard University Libraries

  • They store metadata in XML form in a metaschema
  • Output is parsed and packaged
  • Some of the FITS data fits well into PREMIS
  • Standard metadata block is added
  • Other information is included with administrative metadata

Questions & Answers

Q: Are there plans to integrate FITS into large systems/repositories?

A: Archivematica uses it. DuraCloud looked into it, but it is mostly used in individual repositories.

Q: Do you need to have the individual File Format Identification tools loaded locally?

A: All necessary tools are downloaded with FITS.

Q: When FITS notes conflicts between tools’ results, how do you know which one is right?

A: Conflicts often occur in relatively unused formats. There is an XML file included that can be used to educate oneself, to determine if it is really a more specific version of a broader format. (It provides a format tree).
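
To picture how a format tree helps, here is a small Python sketch. The tree below is a made-up example, not the XML file that ships with FITS, and the function names are my own.

```python
# Hypothetical format tree: each specific format points to its broader parent.
FORMAT_PARENT = {
    "PDF/A": "Portable Document Format",
    "OpenDocument Text": "ZIP Format",
}


def ancestors(fmt):
    """List the broader formats above fmt, nearest first."""
    chain = []
    while fmt in FORMAT_PARENT:
        fmt = FORMAT_PARENT[fmt]
        chain.append(fmt)
    return chain


def resolve(first, second):
    """Return the more specific answer if one format refines the other.

    Returns None when the two answers are unrelated, which is a real conflict
    that a person needs to look at.
    """
    if first == second:
        return first
    if second in ancestors(first):
        return first
    if first in ancestors(second):
        return second
    return None


print(resolve("ZIP Format", "OpenDocument Text"))
print(resolve("PDF/A", "ZIP Format"))
```

In the first case the two tools agree, and one is simply more specific; in the second, there is no relationship in the tree and the file would be flagged.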

Where to find FITS

  • Download the link
  • OSS
  • The mailing list is good for new versions, other news.

My Take-aways

This whole series has been incredibly informative. Having listened to these experts talk about common/important tools that they use for digital preservation, I now have a better idea of not only the processes involved in digital preservation, but also how the different pieces fit together. The project planning information was pretty straightforward, and generally, not very different from many projects I’ve worked on in the past or learned about in library school.

Now that I know that some FITS information fits well into PREMIS, and that other information from FITS fits into administrative metadata sections, and that XML can carry them all, I have a better idea of how to use the metadata categories described in the second webinar.

I know which kinds of tools are meant to be used for various tasks in digital preservation projects, and I know what I need to learn (and what I don’t) in order to use them. I can point to FITS and PREMIS and say that they may be used in the implementation stage.

Lastly, I know so much more about where to go to find out more about the tools, processes, best practices, and current projects.