Digital preservation in a webinar, part three

I not-so-recently “went to” the third Introduction to Digital Preservation webinar hosted by ASERL (Association of Southeastern Research Libraries).

[To listen to the recordings and view the PowerPoint presentations, see ASERL’s archive]

This webinar was titled “Management of Incoming Born-Digital Special Collections” and was presented by Gretchen Gueguen of the University of Virginia.

Without further ado, my notes:

What is born-digital?

There are two layers:

  1. Content
  2. Supporting software and operating system(s) (OS)

The same software/OS can be used for multiple files.

The Crucial Dependency

  • Hardware. Including (but not limited to) ports, wires, ribbons, drives, connectors
  • Translation between older and newer hardware can be achieved by write-blockers

The process

Imagine a doughnut.

“Preserve” is positioned in the doughnut hole, smack dab in the middle. Around the edges of the doughnut (starting on the left and moving clockwise, if you’re curious) reside:

  • “Provide Access”
  • “Appraise”
  • “Accession”
  • “Arrange/Describe”

Appraisal

Appraisal covers both new collections and old ones, that is, legacy material that has already been collected.

Do you further process these legacy collections, or deaccession them?

Appraisal Phase 1: Inventory

(the following list contains information/data you may want to collect in the inventory phase)

  • Disk #
  • ID #
  • Collection name/title
  • Record # (MARC or EAD)
  • Media type
  • Manufacturer
  • Capacity
  • Date (from label info)
  • Color
  • Damage
  • Label info

The above information can be used in cataloging, and can help future identification/location of items in the collection.
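
As a rough sketch of how the inventory fields above could be recorded without any special infrastructure, here is a small Python example that writes one row per disk to a CSV file. The class name, the field names, and the function name are my own assumptions for illustration; nothing like this was shown in the webinar.

```python
import csv
from dataclasses import dataclass, asdict, fields


@dataclass
class MediaRecord:
    """One row of a media inventory. Field names are illustrative only."""
    disk_number: str
    id_number: str
    collection_title: str
    record_number: str  # MARC or EAD identifier
    media_type: str
    manufacturer: str
    capacity: str
    label_date: str     # date as written on the label
    color: str
    damage: str
    label_info: str


def write_inventory(records, path):
    """Write the records to a CSV file with a header row."""
    column_names = [field.name for field in fields(MediaRecord)]
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=column_names)
        writer.writeheader()
        for record in records:
            writer.writerow(asdict(record))
```

A spreadsheet program can open the resulting file directly, so the same inventory can feed cataloging later on.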

It may be necessary to:

  • Research accession records
  • Search the stacks
  • Conduct a physical survey of a statistically significant sample of disks
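
For the physical survey, one way to choose which disks to pull is a simple random sample drawn from the inventory. This is only a sketch: the function name and the disk IDs are hypothetical, and it does not answer how large the sample has to be to count as statistically significant.

```python
import random


def pick_survey_sample(disk_ids, sample_size, seed=None):
    """Return a random sample of disk IDs, drawn without replacement.

    Passing a seed makes the selection repeatable, which helps when
    documenting how the survey was carried out.
    """
    rng = random.Random(seed)
    pool = list(disk_ids)
    size = min(sample_size, len(pool))
    return sorted(rng.sample(pool, size))
```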

Appraisal Phase 2: Evaluate

Legacy collections

  • Available resources (work, costs)
  • File types and formats present in materials
  • Volume of data vs. capacity to take it
  • Condition of content: changed? corrupted?
  • Dependencies on software, hardware
  • Institution’s commitment to the content
  • Migration or transformation required?
  • Can you appraise/view the intellectual content?

New acquisitions

  • Policy framework (update it frequently and proactively)
  • What capacity do you have for acquiring new born-digital collections?
  • How will you deal with certain scenarios?
  • Do you need special hardware to read the content?
  • Do you have that hardware?
  • Does someone else (e.g., eBay)? Given the scarcity of obsolete hardware, there is growing interest in sharing equipment
  • Is the disk/drive natively Read Only?

Accessioning

Hardware types

  • Zip
  • DVD/Blu-ray
  • Jaz
  • Others (e.g., floppies) are difficult
  • Write blockers/forensic bridges: Hardware devices or software that block any writing onto disks (e.g., Tableau, WiebeTech; SAFE Block XP, MacForensicsLab)

Software barriers

How to transfer the data to a new medium?

1. Disk imaging – one file, bit-level copy

  • Captures unused space (sometimes called “file slack”), often made up of binary zeros, which can take up a lot of space
  • Benefits: compact, single file, intact, complete
  • Drawbacks: can capture unwanted data, requires specialized tech, can transfer across write-blocker if file is still readable
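
To make “one file, bit-level copy” a little more concrete, here is a minimal Python sketch that reads a source byte by byte (in chunks) into a single image file and computes a checksum along the way. It is an illustration of the idea only, not the tooling described in the webinar: real imaging is normally done with dedicated forensic software behind a write blocker, and the function name and paths here are hypothetical.

```python
import hashlib


def image_device(source_path, image_path, chunk_size=1024 * 1024):
    """Copy every byte of source_path into image_path.

    Nothing is skipped, so unused or zero-filled regions of the source
    end up in the image too. Returns the SHA-256 hex digest of the data.
    """
    digest = hashlib.sha256()
    with open(source_path, "rb") as source, open(image_path, "wb") as image:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break
            image.write(chunk)
            digest.update(chunk)
    return digest.hexdigest()
```

Recording the returned digest at imaging time gives you a reference value to check the image against later.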

2. Logical imaging – select what you want and create an image

Transfer using (examples)

  • NTFS (New Technology File System)
  • Mac: HFS (Hierarchical File System)

Rendering files

  • Transfer methods: over a network, using Duke Data Accessioner or BagIt, or with an FTP transfer tool such as FileZilla or Cyberduck (how about these names?).
  • Web harvesting (e.g., Internet Archive)
  • Save to modern media (CD, external hard drive)
  • Image the hard drive in person

Management

Is the file corrupted, lost, or changed?

  • Checksums. If these haven’t changed, the file hasn’t changed.
  • Check for viruses (stabilizing material): Do this in an un-networked space BEFORE uploading the files to a network!
  • Search for Personally Identifiable Information (PII)
  • Search for duplicate files using checksums.
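
The two checksum uses above (spotting change and spotting duplicates) can be sketched in a few lines of Python with the standard library. The function names are my own, and SHA-256 is an assumption on my part; the webinar did not say which checksum algorithm to use.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, recorded_digest):
    """True if the file still matches the checksum recorded earlier."""
    return sha256_of(path) == recorded_digest


def find_duplicates(root):
    """Group files under root by checksum.

    Returns a dict mapping each digest to a sorted list of paths,
    keeping only digests shared by more than one file.
    """
    by_digest = defaultdict(list)
    for folder, _subfolders, names in os.walk(root):
        for name in names:
            full_path = os.path.join(folder, name)
            by_digest[sha256_of(full_path)].append(full_path)
    return {d: sorted(paths) for d, paths in by_digest.items() if len(paths) > 1}
```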

Arrange/Describe

Metadata

  • Use media inventory
  • File inventory of contents (e.g., date, size, file name, type)
  • Extract technical, forensic, and preservation metadata (using PREMIS or PBCore, for example)
  • Use a spreadsheet if you don’t have fancy infrastructure to record this information
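
Here is a small sketch of the “file inventory in a spreadsheet” idea: walk a folder of transferred files and write the name, size, date, and a guessed type for each into a CSV. The column names are my own choice, and the type is only guessed from the file extension, which is far cruder than a real format identification tool.

```python
import csv
import mimetypes
import os
from datetime import datetime, timezone


def build_file_inventory(root):
    """Return one dict per file under root with basic descriptive fields."""
    rows = []
    for folder, _subfolders, names in os.walk(root):
        for name in sorted(names):
            full_path = os.path.join(folder, name)
            info = os.stat(full_path)
            guessed_type, _encoding = mimetypes.guess_type(name)
            modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
            rows.append({
                "file_name": os.path.relpath(full_path, root),
                "size_bytes": info.st_size,
                "modified": modified.isoformat(),
                "type": guessed_type or "unknown",  # guess from extension only
            })
    return rows


def write_file_inventory(rows, csv_path):
    """Write the inventory rows to a CSV file with a header row."""
    columns = ["file_name", "size_bytes", "modified", "type"]
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)
```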

Storage

  • Make multiple copies! (Lots of Copies Keep Stuff Safe, heh heh)
  • Use repositories or a managed service system for metadata and storage
  • If you don’t have one, how will you store and track content? (Spreadsheet and storage database)
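
If you are making the copies by hand rather than through a repository, it is worth confirming that each copy really matches the original. A minimal sketch, assuming SHA-256 checksums and hypothetical function and folder names of my own:

```python
import hashlib
import os
import shutil


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source_path, destination_dirs):
    """Copy a file into each directory and check every copy's checksum.

    Returns a list of (copy_path, matches_original) tuples.
    """
    expected = sha256_of(source_path)
    results = []
    for directory in destination_dirs:
        os.makedirs(directory, exist_ok=True)
        copy_path = shutil.copy2(source_path, directory)
        results.append((copy_path, sha256_of(copy_path) == expected))
    return results
```

The results list can go straight into the tracking spreadsheet alongside the date each copy was made.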

Questions (a selection)

Q: Do you have any rules of thumb for materials NOT to accession?

A: The folks at the University of Virginia have not seen anything that they have decided not to take – nothing too unusual. Make sure you have access to the hardware to read the data/content. For some formats UVA doesn’t actually have, they obtained copies of the software from the donor.

Q: Do you manage the bit-stream or the physical media for commercially produced materials, such as DVDs related to other materials?

A: Only physical management at the moment.

Q: Does UVA’s gift agreement contain language for digital preservation?

A: The agreement does state that the donor will agree not to offer the same content to other sources or institutions. It provides information about intellectual property rights. UVA reserves the right to do whatever is needed to preserve the content. It allows donors to ask for access restrictions. It does not contain any statement to the effect that UVA agrees to preserve content via a particular material or for a specific time.

Final words

Appraisal and accession are CRUCIAL.

Metadata is important – use checksums, spreadsheets.

Consider consortia – have someone else read the disks you can’t, and vice versa.
