Clipper Jisc RDN workshop, Cambridge 6th September 2016 – sparking ideas

I attended a very busy and interesting meeting of the Jisc RDN (Research Data Network) and gave a presentation about our work on the Clipper project. Many of the attendees were involved with the Jisc shared service pilots in this area. The event was held in the historic Corpus Christi College, and the main plenaries took place in the McCrum Lecture Theatre – up a side alley from the famous Eagle pub (where I had a very fine pint of Greene King IPA – after work). You never know what may turn up at these events, and it pays to keep an open mind about possible connections; this was one of those days when sparks seemed to fly from different ideas.

Schematic showing the overlaps between web annotation and data citation

The day began with a really interesting and thought-provoking keynote from Danny Kingsley – the Head of Scholarly Communications at Cambridge. During it she mentioned the challenges presented by time-based data such as audio and video (Clipper, I thought!). But Danny also mentioned the growing field of data citation and the challenges it presented. This created Spark No. 1 – I thought to myself: well, Clipper is actually a form of data citation, specialising in time-based data (citing part of a web data resource via a URI and making some comments about it in context).

The more I thought about this as I sat in the lecture theatre, the more notes I scribbled. Clipper is also a web annotation tool that uses the emerging W3C standards in this area, so that standard could provide a nice vehicle for creating and transporting data citations more generally. This got me thinking about the work we have been doing in the project with the Roslin Institute at Edinburgh University (see the draft ‘Clipper Snapshot Case Studies‘ document), where we discussed linking Clipper annotations to the DataCite DOIs ‘minted’ by Roslin for the data connected to the time-based media files we were annotating. The DOIs provide the provenance of the data we are ‘clipping’ and annotating; that made a lot of sense in the Clipper project, and perhaps it now makes sense in the wider field of general data citation. After all, the body of a W3C web annotation can carry any information we like, so it should be able to accommodate all disciplines and emerging data citation formats.

I was musing about this at the lunch break when I briefly bumped into Neil Jefferies (Head of Innovation at the Bodleian Library, Oxford), whom I knew from the Jisc Data Spring Programme. I was explaining these ideas to him when he added the idea of bringing the ORCID standard into the mix to identify researchers and link them to their data – so that was Spark No. 2. It’s an attractive idea: use existing standards (DOI, ORCID) with the soon-to-be-standard W3C Web Annotation data model as a means of creating and transporting data citations. One of the advantages of this is that the citations themselves would be easily shared on the web and so accessible to search engines and analytics services.
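As a very rough sketch of how these pieces might fit together, here is what a single Clipper-style annotation could look like in the W3C Web Annotation JSON-LD serialisation. Only the @context is the real W3C one – the identifiers, the ORCID, the DOI and the media URL below are all hypothetical placeholders:

    {
      "@context": "http://www.w3.org/ns/anno.jsonld",
      "id": "http://clipper.example.org/anno/42",
      "type": "Annotation",
      "creator": {
        "id": "https://orcid.org/0000-0002-1825-0097",
        "type": "Person",
        "name": "A. Researcher"
      },
      "body": [
        {
          "type": "TextualBody",
          "value": "Feeding behaviour begins at this point.",
          "purpose": "commenting"
        },
        {
          "id": "https://doi.org/10.5061/dryad.example",
          "purpose": "linking"
        }
      ],
      "target": "http://media.example.org/study-recording.mp4#t=120,185"
    }

The target uses a W3C Media Fragments URI to cite a specific time segment of the recording, the linking body carries the DataCite DOI that provides the provenance, and the creator block ties the citation to an individual researcher via ORCID.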

Perhaps at some point it would be useful to do some pilot work in this area…

Some images from the Cambridge event are below, and here is the SlideShare version of our workshop.

Addendum: Neil got back in touch and suggested I look at the subject of ‘nano pubs’ (nanopublications) – at first, I have to confess, I thought of microbreweries! But a search turned up this link:

http://nanopub.org/wordpress/?page_id=65

It seems to map nicely onto what we have been discussing…hopefully to be continued.

Images from the RDN event are below.

Where the Clipper project workshop was held – the ‘new’ part of Corpus Christi College

The old part of Corpus Christi College, where the other workshops were held

The Corpus Christi Dining Hall at lunchtime.

It’s the little things… Clipper & the W3C at Berlin

Trevor and I attended the IAnnotate web annotation conference in Berlin this week, having been kindly alerted to it by colleagues at EUSCREEN. We had previously encountered the image annotation standard IIIF through colleagues from Digirati in the UK. Previous experience had made us a little wary, as standards work can sometimes lose contact with practical everyday experience and become an expensive end in its own right, consuming vast resources but leading nowhere – my own experience with educational interoperability standards confirms that :-).

So, we were wary of getting entangled in a standards runaway. As it happens, some of the other participants had similar reservations about past standards initiatives, including W3C ones. However, our experience of attending the W3C working group briefing on the development of the web annotation standards was like a breath of fresh air. One statement in particular stuck in my mind – it went something like:

“Look, we don’t care what you do inside your own [web annotation] systems, but when you come to share your data with the outside world it makes sense to do it in a standardised way – so that others can make sense of it and use it”

This was the turning point for me – the little thing that revealed the intent – that, and the fact that the proposed standard is admirably practical and lightweight, and makes useful reuse of other W3C standards such as Media Fragments. Believe it or not, I have seen developers and designers trying to adopt a heavyweight standard internally in their systems in a slavish and sometimes pedantic manner – leading to what might most charitably be described as ‘sub-optimal outcomes’.
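(For the curious: Media Fragments is the W3C convention that lets a plain URI address part of a media resource. The file names below are made up, but the #t= and #xywh= syntax comes from the specification.)

    http://example.org/lecture.mp4#t=120,185               a clip from 2:00 to 3:05
    http://example.org/painting.jpg#xywh=160,120,320,240   a 320x240 pixel region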

So, a great result for us from attending the conference – we also get a ready-made data model that we can adopt and build on without having to dream up our own, one that also makes compliance with the emerging W3C web annotation standards easier and more useful.

John


Down the Rabbit Hole

In this third stage of Clipper development we have, after some discussion, decided to change the technical infrastructure we have been using (JavaScript, SQL, PHP) to a more modern, powerful and scalable set of technologies (Angular2, MongoDB, NodeJS, JSON-LD). This comes at a price: some of it is very new and still evolving (Angular2), and the rest is new to us as technologists and developers. In a small team with fixed project time limits, this presents risks and extremely steep learning curves. Our first encounters in creating a stripped-down test version (‘Clipper Lite’) have confirmed this, yet we think the potential benefits – future development speed (eventually!) and the other related products and services we can create on the same foundation – outweigh the risks.

Hence the title of this post:

“Down the rabbit hole”, a metaphor for an entry into the unknown, the disorienting or the mentally deranging, from its use in Alice’s Adventures in Wonderland

Addendum – September 13 2016

It seems to have paid off: we are making great progress now and entering a testing cycle before releasing a beta service and code for evaluation.

Clipper @ IIIF Audio/Video Workshop

IIIF AV Workshop attendees

Last week the Clipper team participated in an invited workshop at the British Library, organised by the International Image Interoperability Framework (IIIF) consortium. The purpose of the workshop was to collate use cases and start outlining a development road map for extending the IIIF to include support for Audio/Video annotation. This was a great opportunity to find out more about the IIIF and the collaborative design process that has produced it.


Open University Workshop Videos

On Friday the 27th of November we held a Clipper project meeting at the OU, followed by two workshops that were videoed and webcast live over the internet by the OU. It was a long day but very productive. The workshops were held at the Knowledge Media Institute, Open University, Milton Keynes.

IIIF Workshop

The first workshop was delivered by Tom Crane of Digirati, with whom we have been discussing which technical standards to include in the Clipper project. The subject of the workshop was the International Image Interoperability Framework (IIIF); we have been discussing how this might be extended to cover annotating audio and video resources. You can find the webcast at this link: http://stadium.open.ac.uk/2620

Clipper Workshop

The second workshop was a short overview of the Clipper project, based on our previous community engagement workshops, followed by a question and answer session. You can find the webcast at this link: http://stadium.open.ac.uk/2624


Technical Standards / System Design Part 2: Looking Forwards to Phase 3

The current prototype Clipper application is built using these open Web standards:

Moving forwards in Phase 3, we envisage using / investigating these standards:

Our aim from the beginning has been to create a toolkit that has little or no dependency on proprietary and ‘closed’ technologies or standards. Choosing the above standards was a good start. Moving forwards, we shall need to create a more detailed data model. We have been aware for some time of the W3C Annotation Data Model (http://www.w3.org/TR/annotation-model/) and the W3C Web Annotation Working Group (http://www.w3.org/annotation/).

From a research point of view, the following three standards could provide the vital ‘glue’ to bind a Clipper installation or service into the global digital research ecosystem:

  1. DOI (Digital Object Identifier System): In our discussions at the Roslin Institute we have identified the possible use of DOIs to identify Cliplists, clips and annotations, as well as the audio-visual resources they are linked to.
  2. ORCID: Provides a way of linking annotations etc. to individual researchers.
  3. OAI-PMH: Provides a useful way of sharing Cliplist information between repositories (see the example request after this list).
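By way of illustration, a minimal OAI-PMH harvesting request might look like the line below. The endpoint and set name are invented, but the verb and metadataPrefix parameters are standard parts of the protocol:

    http://clipper.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&set=cliplists

Any OAI-PMH-aware repository or aggregator could then harvest Cliplist records without needing to know anything about Clipper itself.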

As a result of our community engagement activities we have been fortunate in encountering Tom Crane and the Digirati company, and in the ensuing discussions Tom has been suggesting that these existing and emerging standards will be really worth exploring in Phase 3. We think they look really promising:

Tom has pointed out that the IIIF Presentation API (http://iiif.io/api/presentation/2.1/), with its concept of an IIIF manifest, is close to our idea of the project being the container for Cliplists etc. He has also suggested that the IIIF Shared Canvas model (http://iiif.io/model/shared-canvas/1.0/index.html) can be extended to time-based media. With some time-based media vocabulary, the IIIF work might be just what we need in Clipper. Tom is coming to the OU this Friday (27/11/15) to present the work of the IIIF, and we hope to discuss this further with him then and make plans for Phase 3.
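To give a flavour of the idea, here is a deliberately minimal sketch of a Clipper project expressed as an IIIF Presentation manifest. The identifiers and labels are invented, and note that a real IIIF canvas currently requires pixel height and width – the extension of the canvas to a time dimension is exactly what we would be exploring in Phase 3:

    {
      "@context": "http://iiif.io/api/presentation/2/context.json",
      "@id": "http://clipper.example.org/project/1/manifest",
      "@type": "sc:Manifest",
      "label": "Example Clipper project",
      "sequences": [{
        "@type": "sc:Sequence",
        "canvases": [{
          "@id": "http://clipper.example.org/project/1/canvas/1",
          "@type": "sc:Canvas",
          "label": "Interview recording, part 1"
        }]
      }]
    }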

Technical Standards / System Design Part 1: Reflections

We have been discussing the Clipper toolkit with people recently as part of our community consultation process. One interesting question we have been asked by the digital library / information community is what ‘data model’ we are using. To be honest, we have not thought too much about this until now, as we did a fair bit of work on it previously, around 2009. So, a bit of explanation here might help us to clarify our position going forwards.

In the earliest phase of Clipper (around 2009) we created it in Adobe Flash and ActionScript, using the Adobe AIR ‘rich internet application’ runtime to create a cross-platform app (PC and Mac, that is). This was a little before the HTML5 take-off and the rise of tablets and smartphones. In that earlier project we did a lot of thinking about the data flows involved in the user interacting with audio-visual resources, and about what data the system would need to gather to deliver the functionality the user needed. You can find a set of graphic flowcharts representing the data flow at this link. At the time we were fortunate to be working with a colleague at Manchester University (Gayle Calverley) who had just completed a study for Jisc on the types of metadata needed for the storage and management of time-based media in repositories. The report Gayle created was thorough and really useful; it was called the “Time Based Media Application Profile” (TBMAP), and it is still online:

http://wiki.manchester.ac.uk/tbmap/index.php/Main_Page

In the end we did not implement a detailed data model based on that study; instead we developed our own ‘slimline’ version, based on user ‘walkthroughs’ of the system and ‘reverse engineering’ approaches to see what data would be required to deliver the functionality we needed. The metadata schema we came up with was based on Dublin Core. We produced our own report detailing our approach to metadata and, with Gayle’s help, mapped it to the Jisc TBMAP report. This approach certainly made our life a lot easier then, and to an extent it still does today; it is useful to reflect on this as we go forwards, and I think we shall certainly be using this and Gayle’s report in Phase 3.
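For flavour, the kind of record that slimline schema produces might look something like this. The element names are Dublin Core, but the values – and this JSON rendering of them – are purely illustrative:

    {
      "dc:title": "Clip 3 – onset of feeding behaviour",
      "dc:creator": "A. Researcher",
      "dc:date": "2015-11-27",
      "dc:description": "Two-minute clip from the main interview recording",
      "dc:source": "http://example.org/media/interview-01.mp4",
      "dc:format": "video/mp4"
    }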