The shared DAMS (digital asset management system) can be used by the UC Libraries to create and manage object-level metadata and content files. It also provides optional workflows for publishing digital objects to Calisphere (for public access) and/or depositing them in Merritt (for long-term preservation). The shared DAMS is a customized implementation of Nuxeo, an enterprise platform that a variety of organizations, from media companies to CollectionSpace (developed by UC Berkeley and the Museum of the Moving Image), use for their content management needs. Nuxeo was deployed as part of the UC Libraries Digital Collection (UCLDC) implementation project.
The UCLDC implementation project was a multi-year initiative by the UC Libraries and the CDL to deploy a technical stack for managing, aggregating, and providing access to digital resources owned by institutions across the libraries and the state. It provided:
- A pipeline for exposing metadata broadly
- A showcase for digital collections
- A Shared DAMS (Nuxeo) for creating and storing digital content and metadata
- A registry for planning future initiatives
Our project wiki has some older but still-relevant information about the technical model.
Only authorized users—UC Libraries staff, student workers, and other individuals expressly designated by the UC Libraries—may use the shared DAMS. The libraries can control permissions for their staff for performing various functions in the system such as creating objects and editing metadata. For more information…
Note that in the UCLDC model, the shared DAMS does not provide public access to end-users. Rather, the public interface (Calisphere) is a separate layer of the UCLDC stack, which allows us to aggregate and display collections from both the DAMS and other sources such as campus Fedora-based systems.
At this time, the shared DAMS is open only to digital content owned and/or stewarded by the UC Libraries. The initial use cases for the software have focused on unique materials held and digitized by the UC Libraries, although there is the potential to provide the service more broadly. Nuxeo supports a wide range of file formats–including images, text documents, audio files, video files, and beyond–and metadata options. Files and metadata may be modeled into both simple and complex objects. Learn more about specific file formats and the metadata scheme that we have implemented…
Background: How We Arrived at Nuxeo
The vision for a systemwide DAMS, and for the UCLDC platform as a whole, emerged from a systemwide planning process that spanned approximately six years and involved various task forces and committees (DLSTF1, DLSTF2, NGTS New Modes of Access, and NGTS POT1).
By 2009-2010, broad consensus had emerged that a systemwide solution was needed for the description and management of digital content, with a particular and acute urgency around unique digital content owned and stewarded by the libraries: images, texts, and A/V materials predominantly (but by no means entirely) held by special collections and archives. Without such a solution, several of the libraries had found that they were unable to generate, steward, and provide access to their digital collections on an ongoing basis.
This situation was seen as a serious threat not only to the libraries’ ability to independently serve their researchers, but also to the vision of a “UC Libraries Digital Collection”: a coordinated, combined digital collection that would showcase the riches of the libraries to users worldwide, as envisioned in a paper by the UC Libraries Collection Development Committee.
Under NGTS POT1, a number of systemwide “lightning teams” were formed to work on different aspects of the UCLDC vision. Two teams in particular most directly shaped the specific direction of the model and the technologies now being implemented. Those teams were focused on establishing the requirements and the technical model for the UCLDC project.
- Requirements: Lightning Team 1A surveyed staff across the libraries to determine requirements for a systemwide DAMS (as well as for a harvest component and a public access layer). The requirements heavily emphasized the functionality that campus libraries need to perform management functions related to common workflows for digitization and metadata creation. The team did not include advanced technical requirements, such as linked data capabilities, in its report.
- Technical Model: Lightning Team 1C, which included technologists from four campuses and CDL, put forth a technical model for the UCLDC and recommended a platform for the DAMS. The team based these decisions on an analysis of vetted systems and a comparison of the systems relative to the requirements put forth by LT1A (described above). The team determined that Nuxeo held the greatest possibility for meeting the most requirements on the quickest possible timeline. Thus, they recommended it be implemented “as the first step towards a systemwide digital asset management system strategy.” Meanwhile, mindful of growing interest on the campuses in participating in community development in this space, and particularly around the Fedora framework, the team recommended an additional “long-term strategy of participating in the library-specific Project Hydra and Islandora communities utilizing the Fedora (or Fedora Futures) repository framework” (POT1 Lightning Team 1C report, p. 10).
The Systemwide Operations and Planning Advisory Group (SOPAG) accepted POT1’s summary report, with the exception that it would not endorse Fedora as the long-term solution for the DAMS infrastructure:
While the LT1C Final Report introduces the concept of short- and long-term strategies for developing a systemwide DAMS, SOPAG supports only the technical model as presented in Figure 1, p.3 and does not believe that long-term system technologies can be defined at this stage. As with any UC Libraries shared service initiative, after the service has moved from the implementation stage and into the operational stage, an operations team will be charged with the responsibility to monitor technology developments and provide recommendations for service enhancements. We trust that following this established process for operations and continual improvement will meet the service and technical needs of the future UC Libraries Digital Collection.
The University Librarians subsequently charged CDL to implement the UCLDC, now under the auspices of SAG2. CDL followed the recommendations of the lightning teams/SOPAG in its implementation project.
CDL opened the DAMS to campus libraries in July 2014, after approximately ten months of implementation work. This was about a year ahead of POT1’s timeline, with the aim of getting campus users in “on the ground” and enabling them to test, provide feedback, and begin to transition over the remaining months of the implementation period. Campus users at that point could log into Nuxeo using Shibboleth, upload files, build simple and complex objects using the custom metadata model, and batch edit metadata. Since then, we have continued to develop toward additional requirements (both existing and newly uncovered), for example by adding a bulk file import client.
The soft launch of the Calisphere beta site in July 2015 was on track with POT1’s timeline and signified the successful implementation of the full “pipeline” and goals of the project: the implementation of the DAMS, the harvest from various campus systems, the development of a robust public interface, and the distribution of all metadata to the Digital Public Library of America (a service not originally in scope but folded into the project upon recognition of the overlap in required infrastructure). Full public access was available with the release of the Calisphere beta site approximately 4-8 weeks later, in time for the new school year.
Only three major requirements will not be met on the two-year timeline: controlled vocabularies in the DAMS; support for “on-the-fly” creation of topical collections on Calisphere; and restricted access to content through Calisphere. Prioritization was necessary given the short timeline for the multi-layered project, and, in consultation with campus partners, these three requirements were deemed lower priorities for launch. Now that the pipeline is in place, we could work on implementing these and other more advanced features that the Libraries need and want.
The CDL implementation of Nuxeo currently meets a majority of the requirements defined by the POT1 lightning team, for example:
- login and authentication through Shibboleth
- support for complex objects
- batch edit, bulk import, and other critical functionality for library workflows
- a flexible metadata model, allowing for custom schemas if necessary
- accommodation of all file types and unlimited storage on Amazon’s S3 cloud service
We don't maintain an official list of features for Nuxeo, because a) there are different use cases which reflect different functionality and b) the product is always changing, as we build new tools and leverage its upgrades. That said, we realize that campuses often want to compare functional requirements. Here is an annotated list of original requirements created by the systemwide precursor to this service. However, this list does not represent the full range of capabilities of the platform.
As the current service is structured, there are effectively two “layers” of support for libraries using the DAMS: Nuxeo and CDL.
CDL has a support contract and service level agreement (“Silver level”) with Nuxeo, which provides us with ready access to Nuxeo’s technical team – as well as access to the latest hotfixes, service packs, and upgrades to the platform. We have been impressed with the rapid response that we’ve received from Nuxeo to help us implement and customize the product. As a recent example, the Nuxeo team helped us troubleshoot and enhance the product to support importing of PCD (photo compact disc) format files–as required by UC Irvine–within the span of a few days. Nuxeo has also been responsive to incorporating enhancements that we’ve requested into their development roadmap, such as updates to Studio (an online tool that makes it easy for us to configure and customize the software without expending development resources). Read more about our service level agreement…
Nuxeo’s open-source code base, myriad plug-ins, and extension points allow CDL to provide a second layer of support for the Libraries. The software has proven flexible and customizable enough for us to meet the requirements identified by POT1 as well as new needs identified by the UC Libraries. For example, a number of campus libraries expressed a desire for bulk importing large numbers of files into Nuxeo without CDL’s mediation. (Although Nuxeo does latently support bulk upload through the user interface, it is necessarily limited by bandwidth.) Within a matter of weeks, we were able to leverage Nuxeo’s API to rapidly prototype and develop a client application, which campus users could install on their desktops. Campus users have been using the client for the past few months to successfully upload batches of files on a self-serve basis.
More information about specific Nuxeo functionality is available throughout this user guide.
Nuxeo’s metadata scheme is highly customizable and extensible, and can be modified using the Studio tool rather than requiring developer time. To date, we have configured Nuxeo to support a specific metadata scheme, which was developed in consultation with the UCLDC Project Stakeholder Group. The schema adapts Dublin Core-based elements (which in turn have analogs with other standard data structure schemes such as VRA Core, MODS, and MARC). The schema was intentionally designed to account for a broad range of content types and to support discovery and use. So far this schema has proved appropriate and effective for the Libraries’ content, and it accommodates repeating fields, multi-valued fields, complex objects, etc.
If a campus library requires a custom metadata schema, we have the option to modify the schema or add additional schemas. For example, over the course of using Nuxeo, we have extended particular metadata fields (e.g., Description) to address needs voiced by campus libraries to have more granular types of descriptive notes in their metadata – which can then be available for indexing and display. It is also possible to integrate external vocabularies into the schema for ease of cataloging and authority control.
The metadata records in Nuxeo can be obtained in the form of XML; the XML can then be transformed into other outputs (e.g., JSON, Dublin Core XML) using standard transformation tools such as XSLT. Hence, if there is a scenario where a particular output is required by a campus library, the Nuxeo XML output is highly adaptable.
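As a rough illustration of this kind of transformation, the sketch below converts one XML record into JSON and into simple Dublin Core XML. It uses Python's standard library rather than XSLT, and the sample record (its element names, the `id` attribute, and the `record` wrapper element) is hypothetical; it is not the actual Nuxeo export format.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record: element names are illustrative only,
# not the real structure of a Nuxeo XML export.
SAMPLE_XML = """
<document id="example-001">
  <title>Campus view</title>
  <creator>Unknown photographer</creator>
  <subject>Universities</subject>
  <subject>Architecture</subject>
</document>
""".strip()

DC_NS = "http://purl.org/dc/elements/1.1/"


def record_to_dict(xml_text):
    """Collect each child element into a list so repeated fields are preserved."""
    root = ET.fromstring(xml_text)
    record = {"id": root.get("id")}
    for child in root:
        record.setdefault(child.tag, []).append((child.text or "").strip())
    return record


def record_to_dublin_core(record):
    """Serialize the title, creator, and subject fields as simple Dublin Core XML."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")
    for field in ("title", "creator", "subject"):
        for value in record.get(field, []):
            ET.SubElement(root, f"{{{DC_NS}}}{field}").text = value
    return ET.tostring(root, encoding="unicode")


record = record_to_dict(SAMPLE_XML)
as_json = json.dumps(record, indent=2)
as_dc = record_to_dublin_core(record)
print(as_json)
print(as_dc)
```

Storing every field as a list keeps repeating elements (such as multiple subjects) intact across both output formats. An XSLT stylesheet could express the same mapping declaratively.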
The technology stack utilizes Apache Tomcat with Redis, Postgres, and Elasticsearch components. The core repository utilizes CMIS/Visible Content Store. For file storage, we are utilizing the Amazon Web Services (AWS) Simple Storage Service (S3). S3 is dynamically provisioned, so we can scale to any volume without pre-provisioning capacity.
The software is updated using .jar files. Customizations can be applied using Studio, an online platform that generates .jar configuration files.
Yes. We are using the Amazon Web Services (AWS) Simple Storage Service (S3) to store content files and the AWS Relational Database Service (RDS) to store the backend Nuxeo databases/metadata records. The S3 “Standard Storage” tier that we are utilizing includes secure cross-region replication of files and version services. RDS also includes automated backup, database snapshots, and data recovery services.
Although the metadata managed within Nuxeo is not natively stored in the RDF format, both Nuxeo specifically and the broader technical stack underlying Calisphere generally support Linked Open Data use cases. Within Nuxeo, for example, we believe it would be both immediately advantageous and relatively straightforward to integrate linked data controlled vocabularies. The metadata scheme in Nuxeo anticipates drawing on name authorities, thesauri, and other controlled vocabularies that are available as linked data, and we could create a “picker” in the user interface to make it easy for library staff to find and add this metadata to their objects.
Our harvesting infrastructure, meanwhile – which harvests and aggregates metadata from Nuxeo as well as other platforms maintained across the UC Libraries – latently incorporates linked data elements. For example, harvested metadata is normalized and structured based on the Digital Public Library of America’s (DPLA) Metadata Application Profile; the harvested metadata is subsequently expressed as JSON-LD and shared with DPLA. The harvested metadata is also being stored in a Solr index underlying Calisphere, accessible through an API as well as schema.org encodings in the forthcoming Calisphere BETA site.
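To make the normalize-then-express-as-JSON-LD idea concrete, here is a minimal sketch. It is not the harvester's actual code and does not implement the DPLA Metadata Application Profile. The field mapping, the Dublin Core context, and the `urn:example:` identifiers are all assumptions chosen for illustration.

```python
import json

# Assumed mapping from source-specific field names to common Dublin Core terms.
# A real application profile is considerably richer than this.
FIELD_MAP = {
    "title": "dc:title",
    "name": "dc:title",
    "creator": "dc:creator",
    "author": "dc:creator",
    "subject": "dc:subject",
}

# JSON-LD context that expands the "dc:" prefix to the Dublin Core namespace.
CONTEXT = {"dc": "http://purl.org/dc/elements/1.1/"}


def to_json_ld(source_record, identifier):
    """Normalize one harvested record and wrap it as a JSON-LD document."""
    doc = {"@context": CONTEXT, "@id": identifier}
    for field, value in source_record.items():
        term = FIELD_MAP.get(field)
        if term is None:
            continue  # unmapped fields are simply dropped in this sketch
        values = value if isinstance(value, list) else [value]
        doc.setdefault(term, []).extend(values)
    return doc


# Two hypothetical sources that describe similar content with different field names.
nuxeo_style = {"title": "Campus view", "subject": ["Universities", "Architecture"]}
other_style = {"name": "Campus view", "author": "Unknown photographer"}

doc_a = to_json_ld(nuxeo_style, "urn:example:a")
doc_b = to_json_ld(other_style, "urn:example:b")
print(json.dumps(doc_a, indent=2))
print(json.dumps(doc_b, indent=2))
```

Both records end up with the same term names regardless of their source, which is what allows an aggregator to index and share them uniformly.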
We are interested in the potential for linked data and open to exploring with the Libraries additional use cases for incorporating it throughout the technical stack. That said, we believe that linked data opportunities are not dependent on using RDF as a primary data store, and that we should focus specifically on the linked data opportunities that promote efficiencies and/or advance access to the Libraries’ collections.
For requirements that were not met at the time of Calisphere's soft launch in July 2015, Nuxeo provides promising solutions (and, indeed, we see some of these requirements explicitly on the product’s roadmap). In most cases, requirements were not met simply because they require more systemwide discussion than could be accomplished within the project’s ambitious timeframe. CDL anticipates working with the Libraries to further unpack, prioritize, and add new requirements to this service offering, whether we meet these requirements within Nuxeo or with another product.
Nuxeo is a flexible platform with a well-documented API, offering a range of opportunities for co-developing new features and integrations with other systems. As an example, staff from the UC Santa Cruz Libraries and the CDL Nuxeo team developed a strategy for integrating data between Omeka and Nuxeo. UC Santa Cruz subsequently created a plug-in for Omeka that allows users to import content from Nuxeo into Omeka, utilizing the APIs of each platform. The content can then be published through Omeka, which provides a tailored or customized view of that content (e.g., as part of an exhibit).
More broadly, Calisphere and Nuxeo also support a range of opportunities for co-development, leveraging the APIs that are available.
We welcome opportunities to work with the UC Libraries to identify and develop other enhancements to Nuxeo and the broader platform underlying Calisphere. For example, we anticipate that Nuxeo could be extended to integrate shared name authority and thesaurus data, potentially using linked open data techniques; this type of functionality could benefit from co-development approaches. If there is a particular area of the platform that you would like to extend, and you are interested in collaborating with us on development work, please contact us!
CDL does not currently charge the Libraries for use of the Nuxeo DAMS, nor for the storage of metadata and files in Amazon S3. We have no plans to start charging for the service. Only if storage costs became unsustainable for CDL would we consider re-charging for storage only; we would make an assessment in consultation with the Libraries. Note that we have provided an ingest pipeline to Merritt from the Nuxeo DAMS.
Once a given UC Library transfers content from Nuxeo into Merritt, the pass-through storage costs associated with Merritt will be charged to that UC Library. Only a Nuxeo Administrator can designate content for transfer using this workflow.
Yes. Given the express statement in the UCLDC planning process of the need for long-term planning around a DAMS solution–and given the rapidly changing nature of technology and the digital library landscape–CDL has been following alternative developments, particularly in the Fedora/Hydra communities, even while we have moved forward on the implementation of the Nuxeo solution as charged by the libraries. Indeed, we have de facto considered it part of our charge to keep an eye on this space and consider its implications for the ongoing development and provisioning of a systemwide DAMS service. (For example, we are currently evaluating the Portland Common Data Model and we participated in the UC San Diego Hydra/Fedora/PCDM camp.)
Meanwhile, we have learned many things from implementing and customizing the Nuxeo platform that we believe warrant discussion among the UC Libraries, vis-à-vis a potential pivot to Fedora. For example, one potential strategy to engage in library community development efforts–while maintaining the high level of service enabled by the Nuxeo platform–would be to incorporate other library technologies into the broader technical stack (such as we have done using Loris); “borrow and build” may in fact be an option. We have anticipated a full evaluation given the explicit recommendation voiced by LT1C and SOPAG’s cover letter—and, indeed, believe it is necessary.
The timing and move to new technologies should be a systemwide conversation that is respectful of the process, the needs voiced, and the resources required.
Nuxeo supports batch exporting of metadata, with references to associated content files in Amazon S3, through its administrative user interface as well as through its API. Hence, we have flexible options for migrating content from Nuxeo to another platform.
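A paginated export through the API might look roughly like the sketch below. Several details are assumptions that should be checked against the deployed Nuxeo version: the `/api/v1/query` path and its `currentPageIndex` and `pageSize` parameters, the `entries` and `isNextPageAvailable` response keys, and the `file:content` property with its `name` and `digest` fields. The host name and the `fake_fetch` function are hypothetical stand-ins so the example runs without a server or credentials.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def build_query_url(base_url, nxql, page_index, page_size):
    """Build a paginated query URL (endpoint path and parameter names are assumed)."""
    params = urlencode(
        {"query": nxql, "currentPageIndex": page_index, "pageSize": page_size}
    )
    return f"{base_url}/api/v1/query?{params}"


def export_rows(fetch_page, base_url, nxql, page_size=100):
    """Yield one flat row per document, following pagination until no pages remain.

    fetch_page is any callable that takes a URL and returns the decoded JSON page.
    """
    page_index = 0
    while True:
        page = fetch_page(build_query_url(base_url, nxql, page_index, page_size))
        for doc in page.get("entries", []):
            props = doc.get("properties", {})
            content = props.get("file:content") or {}
            yield {
                "uid": doc.get("uid"),
                "title": props.get("dc:title"),
                "file_name": content.get("name"),
                # Assumed to identify the stored binary; verify before relying on it.
                "file_digest": content.get("digest"),
            }
        if not page.get("isNextPageAvailable"):
            break
        page_index += 1


# Hypothetical documents used only to exercise the pagination logic.
_FAKE_DOCS = [
    {"uid": "doc-1", "properties": {"dc:title": "Campus view",
     "file:content": {"name": "campus.tif", "digest": "abc123"}}},
    {"uid": "doc-2", "properties": {"dc:title": "Letter",
     "file:content": {"name": "letter.pdf", "digest": "def456"}}},
    {"uid": "doc-3", "properties": {"dc:title": "Folder of prints",
     "file:content": None}},
]


def fake_fetch(url):
    """Stand-in for an authenticated HTTP GET; slices _FAKE_DOCS by page."""
    query = parse_qs(urlparse(url).query)
    index = int(query["currentPageIndex"][0])
    size = int(query["pageSize"][0])
    start = index * size
    return {
        "entries": _FAKE_DOCS[start:start + size],
        "isNextPageAvailable": start + size < len(_FAKE_DOCS),
    }


rows = list(
    export_rows(
        fake_fetch,
        "https://nuxeo.example.org/nuxeo",
        "SELECT * FROM Document",
        page_size=2,
    )
)
for row in rows:
    print(row)
```

Passing the fetch function in as a parameter keeps authentication and HTTP details separate from the export logic, so the same loop could feed a migration script for whatever target platform is chosen.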