Last weekend, I attended the educational, inspiring, and fun Data Driven: Digital Humanities in the Library conference at the College of Charleston. I have a lot of information to digest, so over the next few posts I will share some of my notes, along with some implications for my projects at the DC/SLA.
In this post, I begin with my notes from the pre-workshop readings for the “From Theory to Action: A Pragmatic Approach to Digital Preservation Strategies and Tools” workshop at the conference in Charleston, SC, June 20-22, 2014.
NDSA Levels of Preservation (an assessment tool for institutions and organizations)
You’ve Got To Walk Before You Can Run (high-level view of the basic requirements to make digital preservation operational)
Walk This Way (detailed steps for implementing DP – introductions to each section were recommended reading)
Library of Congress DPOE (optional)
POWRR website (optional – POWRR is the group that taught the workshop, and there is lots of good material here)
NDSA Levels of Preservation – Where I see the DC/SLA Archives Committee:
- Storage and Geographic location
- Level 0 – still determining where things are, how they have been stored
- File fixity and Data integrity
- What is fixity? (I learned that checking fixity means, for example, running checksums to determine whether digital objects have changed or been corrupted over time. A checksum is a value that an algorithm computes from the contents of a file, so it corresponds to one specific version of that file and is, for practical purposes, unique to it.)
- Information Security
- Level 0-1 – We have determined in policy documents who *should* have read authorization (the general public, in most cases, with some redactions/delays in dissemination for PII and financials)
- The Archives Committee will be the only ones, aside from possibly a Board liaison, to have other authorizations (edit, delete, etc.)
- Metadata
- Level 0 – We will soon be conducting an inventory of content, which will include an investigation into what metadata has been included
- File formats
- Level 0 – We will soon determine what formats have been and should be used
So, clearly, we still have a lot of work to do.
“You’ve got to walk before you can run: first steps for managing born-digital content received on physical media” (OCLC/Ricky Erway, 2012)
- Audience: those who have or are currently acquiring such born-digital materials, but have not yet begun to manage them
- identifying and stabilizing holdings
- Four Essential Principles
- Do no harm (to physical media or content)
- Don’t do anything that unnecessarily precludes future action and use
- Don’t let the first two principles be obstacles to action
- Document what you do!!
- Survey and Inventory Materials in your Current Holdings
- Locate existing holdings
- Gather info about digital media already in collections
- Do collections inventory to locate computer media in any physical form
- Count and describe all identified media (NOT mounting or viewing content on media)
- Gather info from donor files, acquisition records, collections, etc.
- Remove media but retain order by photographing digital media and storing printouts in physical collection
- Alternative: place separator sheets in physical collection
- Assign appropriate inventory # / barcode to each physical piece
- Record location, inventory #, type of physical medium, any identifying info found on labels /media, e.g., Creator, Title, etc.
- Record anything known about hardware, operating system, software; use consistent terms
- Count # of each media type, and indicate max capacity of each media type, max amount of data stored, then calculate overall total for the collection
- Return physical media to suitable storage
- Add summary description of digital media to any existing accession record, collection level record, or finding aid
- Prioritize collections for further treatment, based on:
- value, importance, needs of collection as a whole and level of use (anticipated use) of collection
- whether there is danger of loss of content
- whether there appears to be significant digital content not replicated among analog materials
- whether use of digital content that is replicated in analog form would add measurably to users’ ability to analyze or study content
- whether printouts might suffice, when just a few files can be represented on a page
- Repeat these steps every time you receive new media.
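
The counting step above (number of items per media type, maximum capacity per type, overall total) is simple arithmetic, so here is a minimal Python sketch of it. The media types, counts, and capacity figures are my own illustrative assumptions, not values from the OCLC report or from any real collection, and the result is only an upper bound on how much data to plan for, not a measurement of what is actually on the media.

```python
from collections import Counter

# Hypothetical nominal capacities in megabytes; adjust to the media actually found.
MAX_CAPACITY_MB = {
    "3.5-inch floppy": 1.44,
    "CD-R": 700,
    "DVD-R": 4700,
}

def summarize_inventory(media_types):
    """Count items per media type and compute the maximum possible data volume.

    media_types: an iterable of media-type labels, one per physical item.
    Returns (counts, per_type_mb, total_mb).
    """
    counts = Counter(media_types)
    per_type_mb = {m: n * MAX_CAPACITY_MB[m] for m, n in counts.items()}
    total_mb = sum(per_type_mb.values())
    return counts, per_type_mb, total_mb

# Made-up example inventory: 12 floppies, 3 CD-Rs, 1 DVD-R
items = ["3.5-inch floppy"] * 12 + ["CD-R"] * 3 + ["DVD-R"]
counts, per_type_mb, total_mb = summarize_inventory(items)
for media, n in counts.items():
    print(f"{media}: {n} item(s), up to {per_type_mb[media]:.2f} MB")
print(f"Upper bound for the collection: {total_mb:.2f} MB")
```

In a real inventory the labels would come from the spreadsheet or database where each item's inventory number and medium are recorded, which is one more reason to use consistent terms.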
Walk This Way (OCLC/Julianna Barrera-Gomez and Ricky Erway, 2013)
- Draft a workflow before beginning? Revise during execution?
- Existing digital preservation policies may include donor agreements (which can explain what info may be transferred from digital media) and policies on accessioning or de-accessioning records or physical media
- Consult policies (IT?) on software use or server backups
- AIMS project 2012 report about digital collection stewardship provides objectives for informing policy and glossary for non-archivists
- Documenting the project
- What info about the process will be needed in future to understand scope, steps taken, and why?
- provides context to ensure process; forms key part of evidence for provenance; indicates authenticity of material
- manage associated metadata (auto-generated or manually created)
- content management systems (Archon, Archivists’ Toolkit): use to create accession records that link the project’s documentation to other holdings
- Create a physical project directory with folders
- Master Folder (Preservation Copy, Archival Copy Folder) – holds master copies of files
- Working Folder – holds working copies of master files
- Documentation Folder – to hold metadata and other information associated with the project
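
The three-folder layout above is easy to set up by hand, but scripting it keeps every project consistent. This is a small sketch, assuming the folder names "Master", "Working", and "Documentation" (my shorthand for the folders in the notes) and a hypothetical accession identifier as the project name.

```python
from pathlib import Path

# Folder names follow the notes above; the exact spelling is an assumption.
SUBFOLDERS = ("Master", "Working", "Documentation")

def create_project_directory(root, project_name):
    """Create <root>/<project_name>/ with the three subfolders and return its Path."""
    project = Path(root) / project_name
    for name in SUBFOLDERS:
        # exist_ok=True makes the function safe to re-run on an existing project
        (project / name).mkdir(parents=True, exist_ok=True)
    return project

if __name__ == "__main__":
    import tempfile

    # Demonstrate in a throwaway directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        project = create_project_directory(tmp, "accession-2014-001")  # hypothetical ID
        print(sorted(p.name for p in project.iterdir()))
```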
- Preparing the Workstation (Mandatory) – this may be a problem, unless we find a way around having a physical workstation for preservation work.
- dedicated workstation to connect to source media
- start with a single type of media from a collection to aid efficiency and keeping track of materials, metadata.
- What alternatives to this? Physical space and financial obstacles for DC/SLA
- Use a computer that is regularly scanned for viruses
- consider keeping it non-networked until a connection is needed (e.g., for file transfers, software/virus definition updates)
- DO NOT open files on source media!
- Connect the source media
- Examine media for cracks/breaks/defects
- Consider removing sticky notes or other ephemera (take digital photo first)
- DO NOT attempt to open files yet!
- Transfer Data
- Copy files or create a disk image
- Copy files individually or in groups – practical way for new archivists to get started
- Disk image – captures more information and makes it easier to ensure authenticity. Imaging makes an exact, sector-by-sector bit stream copy of a disk’s contents, retaining original metadata, and produces a single file containing an authentic copy of the files and file system structure on the disk.
- Forensic images capture everything, including deleted files and unallocated space. Logical copies omit deleted files and unallocated space.
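
To make "bit stream copy" concrete for myself: the idea is to read the source in fixed-size blocks and write each block unchanged into a single image file, hashing as you go. The sketch below only illustrates that idea. It assumes an ordinary file stands in for the source, the function name is my own, and it does nothing about write-blocking or unreadable sectors, which is what dedicated imaging tools are for.

```python
import hashlib

def copy_bitstream(source_path, image_path, block_size=1024 * 1024):
    """Copy source to image block by block; return (bytes_copied, sha256_hex)."""
    digest = hashlib.sha256()
    copied = 0
    with open(source_path, "rb") as src, open(image_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:  # an empty read means end of source
                break
            dst.write(block)       # bytes are written exactly as read
            digest.update(block)   # hash computed during the same pass
            copied += len(block)
    return copied, digest.hexdigest()
```

The hash returned here is the one to record at transfer time, so that later fixity checks have something to compare against.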
- Check for viruses
- Record the file directory
- Make a copy of the directory tree
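
Recording the directory tree can be as simple as walking the folder structure and writing every relative path and file size to a text file. This is a sketch under my own assumptions (function names, tab-separated output); it reads only directory metadata and never opens a file's contents, and I would still run it against the transferred copy rather than the source media.

```python
import os

def directory_listing(root):
    """Return sorted (relative_path, size_in_bytes) pairs for every file under root."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            entries.append((rel, os.path.getsize(full)))
    return sorted(entries)

def write_listing(root, listing_path):
    """Write the listing as tab-separated text, one file per line."""
    with open(listing_path, "w", encoding="utf-8") as out:
        for rel, size in directory_listing(root):
            out.write(f"{rel}\t{size}\n")
```

The listing file should be written outside the folder being listed (the Documentation Folder is the natural place), otherwise it would end up describing itself.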
- Run Checksums or Hashes
- a unique value based on the contents of a file, generated by a specific algorithm (there are different algorithms, so using the same one consistently is important)
- identify whether/when a file has changed
- regularly hashing a file or image you have copied and checking those new hashes against the hashes made at the time of the transfer should be part of your digital curation workflow
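
Here is how I picture that hash-then-recheck routine, as a minimal sketch using Python's standard hashlib. The choice of SHA-256 and the function names are my assumptions; the readings only say to pick an algorithm and stick with it.

```python
import hashlib
from pathlib import Path

def sha256_of(path, block_size=65536):
    """Hash a file in blocks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()

def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hash."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def verify(root, manifest):
    """Compare current hashes with a stored manifest.

    Returns (changed, missing): paths whose hash differs, and paths no longer present.
    """
    current = build_manifest(root)
    changed = sorted(p for p in manifest if p in current and current[p] != manifest[p])
    missing = sorted(p for p in manifest if p not in current)
    return changed, missing
```

The manifest built at transfer time would be saved with the project documentation; each later check calls `verify` with that saved manifest, and two empty lists mean nothing has changed or gone missing.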
- Securing project files
- consolidate documentation
- Prepare for Storage
- arrange for space on a backed-up network server that is secure
- Transfer to a secure location
- additional copies – preservation master copies that must be kept safe from unintentional alteration
- Store or de-accession source media
- if destruction, use a secure method in conjunction with donor agreement and policies
- Validate file types
- determine whether you can open and read the contents of digital files (from the working copies!)
- use working copies
- hex editors – show file properties (byte representation)
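
What a hex editor shows at the very start of a file is often a format signature, and checking a few of those bytes is a quick first test of whether a file is what its extension claims. The sketch below compares the leading bytes against a handful of well-known signatures. The list is deliberately tiny, the function name is my own, and real format-identification tools cover far more formats; as always, run it on working copies only.

```python
# A few well-known file signatures ("magic numbers"); far from exhaustive.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"PK\x03\x04", "ZIP container (also used by DOCX, XLSX, ODT)"),
]

def identify_by_signature(path):
    """Return a format label based on the file's leading bytes, or None if unrecognized."""
    with open(path, "rb") as f:
        header = f.read(16)  # long enough for every signature listed above
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return None
```

A result of `None` does not mean the file is damaged, only that its first bytes match nothing in this short list (plain text files, for example, have no signature).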
- Assess Content (optional)
- use working copies
- Reviewing files
- only working copies
- Finding duplicate files
- if you delete duplicates, you will also need to delete them from the Master Folder, which has already been moved to secure storage
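
Since identical files produce identical hashes, duplicates can be found by grouping files by hash. A minimal sketch, again assuming SHA-256 and a function name of my own, meant to run on the Working Folder:

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def find_duplicates(root):
    """Group files under root by SHA-256 hash; return only groups with two or more files."""
    by_hash = defaultdict(list)
    for p in sorted(Path(root).rglob("*")):
        if p.is_file():
            # read_bytes() loads the whole file; fine for a sketch, not for huge files
            file_hash = hashlib.sha256(p.read_bytes()).hexdigest()
            by_hash[file_hash].append(p.relative_to(root).as_posix())
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

This only reports candidates. Whether two byte-identical files in different folders are really redundant is still a judgment call, because their location can be part of the original order.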
- Dealing with Personally Identifying or Sensitive information
- sensitive information must be kept restricted and secure on workstations, file servers, backup or transfer copies
- Redact or anonymize before making available to users
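
For plain-text material, a first pass at redaction can be pattern-based. The sketch below is only an illustration under my own assumptions: it looks for two patterns (US-style Social Security numbers and email addresses), replaces them with a marker, and reports how many it found. It will miss names, addresses, and anything in an unexpected format, so it could support a human review but not replace one, and it should be applied to an access copy, never to the master.

```python
import re

# Illustrative patterns only: US-style SSNs and email addresses.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}

def redact(text):
    """Replace each match with a [REDACTED <label>] marker; return (text, counts)."""
    counts = {}
    for label, pattern in PATTERNS.items():
        # subn returns the new string and the number of replacements made
        text, n = pattern.subn(f"[REDACTED {label}]", text)
        counts[label] = n
    return text, counts
```

The counts are worth keeping with the project documentation, since they record what was changed in the access copy and why.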