'Reusing digital content: towards making research using this content limited by what is possible rather than what is permissible', Mining Digital Repositories: Challenges and Horizons event at the KB, The Hague, 11 April 2014

Notes from a short talk I gave at the Mining Digital Repositories: Challenges and Horizons event at the KB, The Hague, 11 April 2014

The following text represents my notes rather than precisely what was said on the day and should be taken in that spirit.

Slides: http://slidesha.re/R71gBz Notes: https://gist.github.com/drjwbaker/10422453


Intro

Background of team, multi-disciplinary team with broad skill set S sense of importance of open S ethos of more than resource discovery, think of digital research with respect to societal change S deluge of data et al, too much to read Sx2 category of research we support S new contexts for scholarship in the humanities and social sciences. S changing libraries: places full of different things (web content), performing different roles (offering open linked data, running digital research labs S Labs competition open…), driving change (teams of humanities researchers, practitioners and scholars).


Re-use scenarios

S Microsoft Live Books Search (2006-2008) … 68k volumes from the BL digitised … when the project wound up MS gave the content to us and we dedicated it to the Public Domain for unrestricted use and reuse … access solution through catalogue. PDFs of books.

Last year we (the Digital Research Team and the Andrew Mellon funded British Library Labs team) started investigating better access to this content, access that would encourage reuse. We began by wondering: what else is in there apart from text? … found 1 million images in the OCR … @MechCurBot story - from playing (faces), to idea (exhibiting), to Tumblr (serendipitous publication) … http://mechanicalcurator.tumblr.com/

S Flickr story … Improve discovery and research through distributed metadata generation (interesting side issue: what does this method of metadata collection mean for researchers?) http://www.flickr.com/photos/britishlibrary Since we released the collection in early December: over 132 million image views, over 81,000 tags. Built sets that refresh daily of the least seen and least tagged images: every image has now been seen, only around 70,000 images have had fewer than 5 views, and no more than 50k images have no human generated tags.

All this has encouraged reuse.

Automated reuse of the data:

S Wikimedia

S Digital NZ

Manual, creative reuse of the content:

S Nicola Demonte has been using the images in his art history - ‘Memory Lane’ - classes for students with memory loss, brain injuries and Alzheimer’s at Summerwood of Chanhassen, a senior living institution in Minnesota.

S Michael Hancher, University of Minnesota (‘Doing things with a million British Library book illustrations’ http://blog.lib.umn.edu/mh/dh2/2014/01/doing-things-with-a-million-british-library-book-illustrations.html and exercise at https://docs.google.com/file/d/0B08KKnzlfYMNRmVuMjl1SEFSdjA/edit ) Getting students to sample the book illustrations and assess the experience of coming up with informative tags for several dozen images. For each image, he has asked them to be prepared to discuss the appropriateness of the tags and the questions that they raise. Such as, for this image:

  • Do the buildings include churches?
  • Seated man or seated woman? An adult, not a child? Evidence for that?
  • What kind of hat?
  • Is that really a sketch pad?
  • What does the signature say?
  • Does the text in this book relate in a significant way to this picture?

S Secret santa map…

S Comics and memes…


Architecture

Mention of the text that surrounds the images brings me to architecture and to the evolving role of libraries. The MS books were low-hanging fruit, a consolidated collection unrestricted by copyright or licensing conditions in an otherwise fragmented and complex landscape. We used Tumblr and Flickr as means of enabling reuse because both are deployed architectures we could exploit with minimal engineering, because they have APIs for machine readable access, and because both present visual material relatively well to humans … (to stress, were it needed, we are not wedded to Tumblr or Flickr - or for that matter Yahoo services - as platforms for our content.)

As we explore a vision for architecture and infrastructure around our digital collections, we bring this desire to exploit off-the-shelf technologies with us (we don’t want to continually reinvent the wheel). And we have chosen this approach in response to a set of problems we have identified:

  • That infrastructures are restrictive and proscriptive.
  • That assets are distributed unevenly across organisations and systems.
  • That access restrictions unpredictably limit where, how and by whom items can be used.

Alongside these problems we are experiencing changing demand from researchers with respect to digital content:

  • For scalable access to large quantities of digital content; be that text, images, sound, video, data.
  • For the ability to bring their own tools, work in whatever way they want, use any workflow, address any sort of problem.
  • For the ability to work across collections irrespective of content owner or licence terms.

Deployed technologies and services are available to create interoperable infrastructures and virtual research environments that would address these problems and meet these needs. And outside of the humanities they are being used: at the European Bioinformatics Institute, at CERN.

We see this infrastructure vision as a priority because, as suggested independently but almost simultaneously a few months back by Andrew Prescott (DH KCL … and in the audience) and Bob Nicholson (a periodicals scholar at Edge Hill in the UK), the present situation is that research that uses digital assets is limited by what is permissible as much as what is possible.

This isn’t good enough.

For research using digital content at scale to thrive and flourish in the humanities, we need to somehow bridge the gap between a notion (though it is of course an illusion) that anything is possible with traditional, hand-crafted, ‘non-digital’ humanities research approaches, and a digital research landscape shaped by unevenly distributed and restricted digital content; a landscape that restricts creative, novel and unexpected reuse of the digital assets we have invested substantial funds, time and effort to create; digital assets that have the potential to change the understanding of and engagement with past experience, our shared heritage; the stuff heritage institutions have spent so much time, effort and money trying to capture.

S [slides and notes]


Some admin…

This work is licensed under a Creative Commons Attribution 3.0 Unported License.

Outreach and learning communities at British Library Digital Research

Outreach and learning communities at British Library Digital Research: what we’ve done and what we can do for you and your students

Notes from a talk I gave at ‘Digital Literacies: Building Learning Communities in the Humanities’, HEA event at Liverpool John Moores, 2 April 2014

http://www.heacademy.ac.uk/events/detail/2014/Seminars/AH/GEN913_LJMU

The following text represents my notes rather than precisely what was said on the day and should be taken in that spirit.

Slides: http://www.slideshare.net/drjwbaker/2014-0331-ljmdigilitslides


S License.


Intro

S Background of team, more than resource discovery, situate turn toward digital research within a response to external forces S deluge of data et al, libraries increasingly full of data as much as books S category of research we support, S new contexts for scholarship in HSS.


Outreach - PGs

S Outreach: focus thus far on PGs, ECRs … part of the future of humanities and social science, offering support to those who may not find expertise they need re digital research among supervisory group, or senior faculty.

Doctoral Open Days. Role across the programme - embedded element, talking about doing things with our collections. Our own day on Digital Research advertised to students across HSS.

Usual elements: talks that set the scene, meet the curators.

But central to the day was a prototyping task. For this each group had a flip-chart, some pens, and some cards. The cards represent hypothetical tools for digital research and hypothetical digital collections – though in both cases they resemble real things. On each card we specified the sorts of properties these tools and collections have (and some deliberately chosen pitfalls), and what each group was tasked with doing was to look at the cards and come up with a potential research project that might be possible (they had about an hour plus a lunch break for this). They then had two minutes to pitch their projects, followed by some Q&A, and a discussion of what might be needed to fill the gap between idea and reality (so training needs, tool development, team work, conceptual apparatus).

In setting it up this way, without computers, without data, I’m very much inspired by the great social historian Emmanuel Le Roy Ladurie who wrote in 1973 that for researchers:

What counts is not the machine but the problem. The machine is only interesting insofar as it allows one to tackle new questions, content and especially scale

So we urged the students to proceed with ideas of the novelty of questions, content and scale, as opposed to technology in and for itself. A line of thinking that came out, to some extent, of discussions had during a session I facilitated at the Digital Pedagogies THATCamp the HEA and UCL-DH put on in June 2013 http://digitalpedagogies2013.thatcamp.org/

Resources available to download on the BL Digital Scholarship Blog http://britishlibrary.typepad.co.uk/digital-scholarship/2014/01/prototyping-task-for-digital-research-novices.html

S Novelty also at the heart of BL Labs, a Mellon Funded project that encourages researchers of any kind, with perhaps a leaning towards postgraduates and early-career researchers, as well as software developers and folks from the GLAM sectors in its broadest sense, to experiment with our digital collections through competitions, hack events, workshops. Current competition, with the offer of a residency and a cash prize, open until 22 April (nb: proposals around research/outreach ideas needing collaboration with us for the skills, technical expertise we have are most welcome).

Example of a previous winner: Pieter Francois.


Outreach - more general

S BUT, of course, the work of the team, as Labs suggests, does go beyond PG communities.

In particular:

S Microsoft Live Books Search (2006-2008) … 68k volumes from the BL digitised … when the project was wound up MS gave the content to us and we dedicated it to the PD … access solution through catalogue. PDFs of books.

Last year we (the Digital Research Team and the Mellon Funded BL Labs team) started investigating better modes of access to this content. We began wondering what else is in there apart from text? 1 million images in the OCR … MechCurBot story - from playing (faces), to idea (exhibiting), to Tumblr (serendipitous publication). http://mechanicalcurator.tumblr.com/

S Flickr story. Publishing for enhancement, reuse, discovery, research. http://www.flickr.com/photos/britishlibrary

  • Not a project, rather we were working as ‘skunks in the library’, as Beth Nowviskie, Director of the University of Virginia’s Scholars’ Lab, has called such research and research tech teams: doing things that are risky, aren’t normally done, shouldn’t be done, and asking questions of our own systems (such as our ability to publish content via the BL website), in order to effect change - call it productive disruption if you like.

Flickr

Engagement & learning communities interaction:

  • S Wikimedia
  • S Digital NZ
  • S Nicola Demonte has been using the images in his art history - ‘Memory Lane’ - classes for students with memory loss, brain injuries and Alzheimer’s at Summerwood of Chanhassen, a senior living institution in Minnesota.
  • S Michael Hancher, University of Minnesota (no relation to above, see ‘Doing things with a million British Library book illustrations’ http://blog.lib.umn.edu/mh/dh2/2014/01/doing-things-with-a-million-british-library-book-illustrations.html and exercise at https://docs.google.com/file/d/0B08KKnzlfYMNRmVuMjl1SEFSdjA/edit ) Getting students to sample the trove of book illustrations and assess the experience of coming up with informative tags for several dozen images. For each image, he has asked them to be prepared to discuss the appropriateness of the tags and the questions that they raise. Such as, for this image:
      • Do the buildings include churches?
      • Seated man or seated woman? An adult, not a child? Evidence for that?
      • What kind of hat?
      • Is that really a sketch pad?
      • What does the signature say?
      • Does the text in this book relate in a significant way to this picture?

This example neatly underlines the huge potential here, beyond the mere use and reuse of material in FE and HE teaching and learning.

S And this can even go beyond interaction with content to how it is delivered.

To sharing why we have done this - the internal disruption, how the work questions notions of ‘publication’, how institutions think of derived data, sets of content algorithmically derived. Masters students at City University taking the module ‘Libraries and Publishing in an Information Society’ were especially interested when I went to speak to them about all this in March http://jameswbaker.tumblr.com/post/79550296266/future-libraries-considering-publishing-libraries

To describing what it means to be part of a research ecosystem where this sort of thing is being done, where content of this kind is available at scale.

To discussing the fragmenting, blurring boundaries between communities invested in our shared past. Within our Flickr set, discovery very much driven by contribution from non-traditional domains, so not librarians, historians et al (one individual has exceeded 12,000 individual, hand typed tags - and not nonsense or generics, but Latin names for plants, georeference points for buildings).

And maybe this then is something we are best positioned to offer: a focal point for connecting, by way of our activities and digital collections, your students with wider communities interested in what we have and what we do, and how those communities, from Flickr to Wikipedia, are key partners in knowledge creation and dissemination in the digital age.


Some admin…

This work is licensed under a Creative Commons Attribution 3.0 Unported License.