I not-so-recently “went to” the third Introduction to Digital Preservation webinar hosted by ASERL (Association of Southeastern Research Libraries).
[To listen to the recordings and view the PowerPoint presentations, see ASERL’s archive]
This webinar was titled “Management of Incoming Born-Digital Special Collections” and was presented by Gretchen Gueguen of the University of Virginia.
Without further ado, my notes:
What is born-digital?
There are two layers:
- Supporting software and operating system(s) (OS). The same software/OS can be used for multiple files.
- Hardware, the crucial dependency. This includes (but is not limited to) ports, wires, ribbons, drives, and connectors. Write-blockers can help bridge older and newer hardware.
Imagine a doughnut.
“Preserve” is positioned in the doughnut hole, smack dab in the middle. Around the edges of the doughnut (starting on the left and moving clockwise, if you’re curious) reside:
- “Provide Access”
This applies to new collections and to old ones, i.e., legacy material that has already been collected.
Do you further process these legacy collections, or deaccession them?
Appraisal Phase 1: Inventory
(The following list contains information you may want to collect during the inventory phase.)
- Disk #
- ID #
- Collection name/title
- Record # (MARC or EAD)
- Media type
- Date (from label info)
- Label info
The above information can be used in cataloging and can help with future identification and location of items in the collection.
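If the inventory lives in a spreadsheet, it helps to fix the columns up front. Here is a minimal Python sketch that writes the fields listed above to a CSV file any spreadsheet program can open. The field names, the `MediaRecord` and `write_inventory` names, and the example row are all hypothetical, not a UVA or ASERL schema.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class MediaRecord:
    """One row of a media inventory (hypothetical column names)."""
    disk_no: str
    id_no: str
    collection: str
    record_no: str   # MARC or EAD record number
    media_type: str
    label_date: str  # date as transcribed from the label
    label_info: str


def write_inventory(records, out) -> None:
    """Write MediaRecord rows, preceded by a header row, as CSV to a text stream."""
    writer = csv.DictWriter(out, fieldnames=[f.name for f in fields(MediaRecord)])
    writer.writeheader()
    for rec in records:
        writer.writerow(asdict(rec))


if __name__ == "__main__":
    # Hypothetical example values, for illustration only.
    rows = [MediaRecord("1", "ACC-001", "Example Papers", "EAD-42",
                        "3.5-inch floppy", "1994", "Drafts, ch. 1-3")]
    buf = io.StringIO()
    write_inventory(rows, buf)
    print(buf.getvalue())
```

To write to a real file instead, pass `open("inventory.csv", "w", newline="", encoding="utf-8")` as `out`.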
It may be necessary to:
- Research accession records
- Search the stacks
- Conduct a physical survey of a statistically significant sample of disks
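How big is a “statistically significant” sample? The webinar didn’t specify a method. One common approach is Cochran’s formula for estimating a proportion, n0 = z²·p·(1 - p) / e², followed by the finite population correction n = n0 / (1 + (n0 - 1) / N), where N is the total number of disks. The sketch below assumes 95% confidence (z = 1.96), a margin of error of plus or minus 5%, and p = 0.5 (the most conservative guess). These defaults and the `sample_size` name are assumptions; adjust them to your own needs.

```python
import math


def sample_size(population: int, z: float = 1.96, margin: float = 0.05,
                p: float = 0.5) -> int:
    """Cochran's sample size for a proportion, with finite population correction.

    population: total number of disks (N)
    z: z-score for the confidence level (1.96 is roughly 95%)
    margin: acceptable margin of error (0.05 means plus or minus 5 percent)
    p: expected proportion of disks with the trait; 0.5 is most conservative
    """
    if population <= 0:
        raise ValueError("population must be positive")
    n0 = (z ** 2) * p * (1 - p) / margin ** 2      # infinite-population size
    n = n0 / (1 + (n0 - 1) / population)            # correct for a finite N
    return min(population, math.ceil(n))            # never more disks than exist


if __name__ == "__main__":
    print(sample_size(1000))  # 278
```

So for a backlog of 1,000 disks, surveying about 278 of them (chosen at random) would meet these assumptions.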
Appraisal Phase 2: Evaluate
- Available resources (work, costs)
- File types and formats present in materials
- Volume of data vs. capacity to take it
- Condition of content: changed? corrupted?
- Dependencies on software, hardware
- Institution’s commitment to the content
- Migration or transformation required?
- Can you appraise/view the intellectual content?
- Policy framework (update it frequently and proactively)
- What capacity do you have for acquiring new born-digital collections?
- How will you deal with certain scenarios?
- Do you need special hardware to read the content?
- Do you have that hardware?
- Does someone else have it (e.g., a seller on eBay)? Given the scarcity of obsolete hardware, there is growing interest in sharing equipment.
- Is the disk/drive natively read-only?
- Other media (e.g., floppies) are writable, which makes them more difficult to handle safely
- Write-blockers/forensic bridges: hardware devices (e.g., Tableau, WiebeTech) or software (e.g., SAFE Block XP, MacForensicLab) that block any writes to the disk
How do you transfer the data to a new medium?
1. Disk imaging – one file, bit-level copy
- Captures unused space (sometimes called “file slack”), which may hold binary zeros or leftover data and can take up a lot of room
- Benefits: compact, single file, intact, complete
- Drawbacks: can capture unwanted data; requires specialized tech; can transfer across a write-blocker if the file is still readable
2. Logical imaging – select what you want and create an image
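The difference between the two approaches can be sketched in Python. `image_device` reads every byte of the source (including unused space) into a single image file and returns a checksum, while `logical_copy` copies only the files you select and verifies each one by checksum. Both are illustrations under loud assumptions: the media are attached through a write-blocker; the source is readable as a device path (e.g., `/dev/sdb` on Linux, which usually needs administrator rights) or as a mounted folder; there are no bad sectors; and the function names are hypothetical. Dedicated imaging tools handle read errors, forensic image formats, and logging far better, and the logical copy here produces a plain folder rather than a logical-image container.

```python
import hashlib
import shutil
from pathlib import Path

CHUNK = 1024 * 1024  # read 1 MiB at a time


def sha256_file(path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            h.update(chunk)
    return h.hexdigest()


def image_device(source, dest) -> str:
    """Disk imaging: copy every byte of `source` (device or file) into one image.

    Returns the SHA-256 of the bytes read so it can be compared with the image.
    Unreadable sectors are NOT handled; a read error simply raises OSError.
    """
    h = hashlib.sha256()
    with open(source, "rb") as src, open(dest, "wb") as dst:
        for chunk in iter(lambda: src.read(CHUNK), b""):
            h.update(chunk)
            dst.write(chunk)
    return h.hexdigest()


def logical_copy(source_root: Path, selected: list[str],
                 dest_root: Path) -> dict[str, str]:
    """Logical copy: copy only the selected files (relative paths) to dest_root.

    shutil.copy2 also carries over modification times and permission bits where
    the platform allows. Returns {relative_path: sha256}; raises on a mismatch.
    """
    manifest = {}
    for rel in selected:
        src, dst = source_root / rel, dest_root / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        digest = sha256_file(dst)
        if digest != sha256_file(src):
            raise OSError(f"checksum mismatch for {rel}")
        manifest[rel] = digest
    return manifest
```

For example (hypothetical paths), `image_device("/dev/sdb", "disk001.img")` should return the same digest as `sha256_file("disk001.img")`; if it doesn’t, the image is not a faithful copy.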
Transfer using (examples):
- NTFS (New Technology File System)
- Mac: HFS (Hierarchical File System)
- Transfer methods: over a network using Duke Data Accessioner, BagIt, or an FTP transfer tool such as FileZilla or Cyberduck (how about these names?)
- Web harvesting (e.g., Internet Archive)
- Save to modern media (CD, external hard drive)
- Image the hard drive in person
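BagIt (a packaging specification published as RFC 8493) is essentially a folder convention: the payload goes in `data/`, a `bagit.txt` declaration sits beside it, and a manifest lists a checksum for every payload file so the recipient can confirm nothing changed in transit. The sketch below builds and validates a minimal bag with Python’s standard library only. It assumes SHA-256 as the checksum, skips optional parts of the spec (such as `bag-info.txt` and tag manifests), and does not percent-encode unusual filenames, so in practice a maintained tool such as the Library of Congress’s bagit-python is a better choice. `make_bag` and `validate_bag` are hypothetical names.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_file(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()


def make_bag(payload_dir: Path, bag_dir: Path) -> Path:
    """Build a minimal BagIt bag from payload_dir (bag_dir/data must not exist)."""
    data_dir = bag_dir / "data"
    shutil.copytree(payload_dir, data_dir)  # also creates bag_dir if needed
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8")
    lines = []
    for f in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        rel = f.relative_to(bag_dir).as_posix()  # e.g. "data/letters/1994.txt"
        lines.append(f"{sha256_file(f)}  {rel}\n")
    (bag_dir / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")
    return bag_dir


def validate_bag(bag_dir: Path) -> bool:
    """Recompute every manifest checksum; also require no unlisted payload files."""
    listed = set()
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue
        digest, rel = line.split(maxsplit=1)
        listed.add(rel)
        target = bag_dir / rel
        if not target.is_file() or sha256_file(target) != digest:
            return False
    actual = {p.relative_to(bag_dir).as_posix()
              for p in (bag_dir / "data").rglob("*") if p.is_file()}
    return listed == actual
```

The sender runs something like `make_bag(Path("accession_001"), Path("bags/accession_001"))` (hypothetical paths), and the recipient runs `validate_bag` on arrival.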
Is the file corrupted, lost, or changed?
- Checksums: if a file’s checksum hasn’t changed, its contents haven’t changed either.
- Check for viruses (part of stabilizing the material). Do this on a non-networked machine BEFORE uploading the files to a network!
- Search for Personally Identifiable Information (PII)
- Search for duplicate files using checksums.
- Use the media inventory
- File inventory of contents (e.g., date, size, file name, type)
- Extract technical, forensic, and preservation metadata, and record it using a standard such as PREMIS or PBCore
- Use a spreadsheet if you don’t have fancy infrastructure to record this information
- Make multiple copies! (Lots of Copies Keep Stuff Safe, heh heh)
- Use repositories or a managed service system for metadata and storage
- If you don’t have one, how will you store and track content? (Spreadsheet and storage database)
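Several items in the list above (checksums for fixity, duplicate detection, and a file inventory of date, size, name, and type) can come from a single pass over the files. The Python sketch below is one way to do it, assuming the transferred files sit in one folder. The column names and function names are hypothetical, SHA-256 is an assumed choice of checksum, and the file type is only guessed from the extension via `mimetypes` (a dedicated format-identification tool would be more reliable). The resulting CSV is the kind of spreadsheet the list suggests when there’s no fancier infrastructure.

```python
import csv
import hashlib
import mimetypes
from collections import defaultdict
from datetime import datetime, timezone
from pathlib import Path

COLUMNS = ["path", "name", "size", "modified", "type", "sha256"]


def sha256_file(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()


def build_inventory(root: Path) -> list[dict]:
    """One row per file under root: path, name, size, mtime, guessed type, checksum."""
    rows = []
    for p in sorted(q for q in root.rglob("*") if q.is_file()):
        st = p.stat()
        rows.append({
            "path": p.relative_to(root).as_posix(),
            "name": p.name,
            "size": st.st_size,
            "modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat(),
            # Guessed from the file extension only.
            "type": mimetypes.guess_type(p.name)[0] or "unknown",
            "sha256": sha256_file(p),
        })
    return rows


def find_duplicates(rows) -> dict[str, list[str]]:
    """Group paths by checksum; keep only checksums shared by 2+ files."""
    by_hash = defaultdict(list)
    for r in rows:
        by_hash[r["sha256"]].append(r["path"])
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}


def check_fixity(root: Path, rows) -> list[str]:
    """Return paths that are missing or whose checksum no longer matches."""
    problems = []
    for r in rows:
        p = root / r["path"]
        if not p.is_file() or sha256_file(p) != r["sha256"]:
            problems.append(r["path"])
    return problems


def save_inventory(rows, csv_path: Path) -> None:
    """Write the inventory to a CSV file that any spreadsheet program can open."""
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
```

Run `check_fixity` again later (for example, after copying to new storage); an empty list means every file still matches its recorded checksum.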
Questions (a selection)
Q: Do you have any rules of thumb for materials NOT to accession?
A: The folks at the University of Virginia have not yet seen anything they decided not to take; nothing too unusual has come in. Make sure you have access to the hardware to read the data/content. For some formats UVA didn’t have software for, they obtained copies of the software from the donor.
Q: Do you manage the bitstream, or only the physical item, for commercially produced materials (such as DVDs) that relate to other materials?
A: Only physical management at the moment.
Q: Does UVA’s gift agreement contain language for digital preservation?
A: The agreement states that the donor agrees not to offer the same content to other sources or institutions. It provides information about intellectual property rights. UVA reserves the right to do whatever is needed to preserve the content, and donors may ask for access restrictions. It does not contain any statement to the effect that UVA agrees to preserve content via a particular material or for a specific time.
Appraisal and accession are CRUCIAL.
Metadata is important – use checksums and spreadsheets.
Consider consortia – have someone else read the disks you can’t, and vice versa.