This page provides an overview of our Calisphere harvesting system, as well as our plans to deploy the updated system. 

Upgrading our harvesting infrastructure has included work to: build a new harvester, update our pipelines to contributors’ digital collections platforms, and implement a new index through fresh re-harvests from those platforms. Re-harvesting ensures that Calisphere has the most current version of your published collections. 

June 11, 2024: As our first step, we published the first index built with the new system.

  • This index reflects collections that we re-harvested using updated pipelines. We successfully updated the bulk of our existing pipelines; there are a handful of remaining pipelines that we will update after this transition.
    • For a summary of pipelines currently in place – and pipelines under active development – see Supported Harvest Methods and Platforms.
    • In cases where we were not able to complete the updating of an existing pipeline by June 11, 2024, we migrated a “snapshot” of the collection from our old index to the new index.
  • Exact phrase search: Exact phrase search is back!
    • Search tip: To find items with an exact phrase, use quotes [" "] around your keyword phrase. For example: "sierra buttes" or "Malcolm X".

  • Date filter on search results: The “decade” filter feature on Calisphere search results now surfaces exact “dates,” as provided in the item record metadata. This is a temporary adjustment and we will be modifying the functionality in the coming months, as we explore the options available with this new system.

Resuming harvesting services: Please feel free to submit requests to harvest and re-harvest collections. We will be in touch with details on our publication schedule. 

  • Completed pipelines: We will resume conducting harvests and publishing new indexes, where we have completed pipelines to contributors’ digital collections platforms. We will follow up with a preview of the requested collections.
  • Pipelines in progress: We are actively working towards completing updates to a handful of remaining pipelines, so we can resume harvesting from those systems (see Supported Harvest Methods and Platforms). We will queue your requests for harvesting in the meantime.

Please review this page for additional details.


About the new harvesting system

Re-establishing pipelines with contributors' platforms


About the new harvesting system

In 2021, we started an active development project, called Rikolti, to replace our legacy Calisphere harvesting system. In June 2024, we are transitioning all harvesting operations to the new system and will sunset the legacy harvester. The new system is designed to be modular and fast, using current, well-supported technologies. More detailed information about our approach is available in the Rikolti project GitHub repository.

What is harvesting?

Calisphere uses a harvesting model for contributing your collections into this aggregation. This strategy allows us to programmatically “fetch” collections from your local digital asset management system – specifically, descriptive metadata and thumbnails for items in the collections. Once fetched, we “map” the metadata into a central index underlying Calisphere, to support searching and browsing of the items.
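As a rough illustration only, the "fetch" and "map" steps described above could be sketched as follows. This is not the Rikolti code: the function names, the source field names (dc_title, dc_date, thumb), and the target index fields are all hypothetical, and the sample records stand in for what a real harvest would request from a contributor's platform over the network.

```python
from typing import Dict, Iterable, List

# Hypothetical stand-in for records published by a contributor's platform.
# A real harvest would retrieve these over the network.
SAMPLE_PLATFORM_RECORDS = [
    {"id": "rec-1", "dc_title": ["Sierra Buttes"], "dc_date": ["1905"],
     "thumb": "https://example.org/thumbs/rec-1.jpg"},
    {"id": "rec-2", "dc_title": ["Main Street parade"], "dc_date": [],
     "thumb": None},
]


def fetch_records(source: Iterable[Dict]) -> List[Dict]:
    """'Fetch' step: collect the raw records as the platform publishes them."""
    return list(source)


def map_record(raw: Dict) -> Dict:
    """'Map' step: translate platform-specific fields into one shared schema.

    The field names on both sides are assumptions made for this sketch.
    """
    return {
        "id": raw["id"],
        "title": raw.get("dc_title", []),
        "date": raw.get("dc_date", []),
        "thumbnail_url": raw.get("thumb"),
    }


def harvest(source: Iterable[Dict]) -> List[Dict]:
    """Fetch every record, then map each one into the central index shape."""
    return [map_record(raw) for raw in fetch_records(source)]


index_docs = harvest(SAMPLE_PLATFORM_RECORDS)
for doc in index_docs:
    print(doc["id"], doc["title"], doc["thumbnail_url"])
```

Keeping the two steps separate means a change on a contributor's platform usually only affects the fetch or map logic for that platform, not the index itself.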

Why did you develop a new harvester?

Our previous Calisphere harvester was outdated, and used deprecated, unsupported technologies adapted from the Digital Public Library of America’s (DPLA) open-source code base from 2013. We’re committed to developing infrastructure that uses current, well-supported systems that can continuously support the statewide aggregation of digital cultural heritage resources.

What changes in harvesting will contributors notice? Will there be changes to Calisphere?

We are using an updated technology framework to create a more efficient, flexible harvesting operation. With the new system, we anticipate quicker turnaround times for harvesting and publishing collections to Calisphere. (Previously, we were able to publish new indexes on a biweekly basis.) We do not anticipate any changes to the process and steps involved with sharing collections with Calisphere. We also do not anticipate any impacts or changes to how collections and items are searched, browsed, and displayed in Calisphere.

When will development of the new harvester be complete?

In June 2024, we are transitioning all harvesting operations to the new system. After June, we will be working on completing updates to a handful of remaining pipelines, so we can resume harvesting from those systems. For a summary of pipelines currently in place – and pipelines under active development – see Supported Harvest Methods and Platforms. We will also continue to review our priorities to strategize development of feature enhancements.


Re-establishing pipelines with contributors' platforms

My organization’s collections are already in Calisphere. Why do you need to re-establish a pipeline to harvest our digital collections?

Calisphere harvests collections by connecting to the platforms that contributors are using to manage and publish their digital collections. As part of our harvesting system upgrade, we also upgraded how we “fetch” collections from each of your local digital asset management systems, and how we “map” the metadata into a central index underlying Calisphere. Please see Supported Harvest Methods and Platforms to view a list of currently supported platforms.

As a way of testing the new infrastructure, we re-harvested your collections to the Calisphere "stage" site to ensure a continued connection to your platform. By doing so, we were able to verify that the full pipeline, from harvesting to building the Calisphere index, continues to perform as expected. This was also our opportunity to refresh your collections in Calisphere, so we are in sync with the records currently published on your public platform.

Have all contributor pipelines been re-established as part of the harvesting system upgrade?

For the latest progress on each currently supported harvesting platform, please see Supported Harvest Methods and Platforms. We are actively working towards completing updates to a handful of remaining pipelines, so we can resume harvesting from these systems.

Did you QA the re-harvested collections?

As part of testing the new harvesting infrastructure, we conducted “data validation tests” to ensure we are correctly “fetching” your digital collection data, and “mapping” your data into our underlying Calisphere index. CDL staff reviewed the results of these tests during the re-harvesting.

This validation testing programmatically compared the existing collection data in Calisphere with the newly re-harvested collection data. The validation output listed any differences found between the two data sources; as a baseline goal, we aimed for 100% data fidelity between the two sets for prioritized, core fields. However, we found that many collections had been updated in your public platforms (records added, metadata updated, records removed), which our validation tests surfaced as data differences. We therefore reviewed the results first to determine whether our fetching and mapping processes were correctly configured.
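To make the idea of a programmatic comparison more concrete, here is a minimal sketch of how two versions of a collection could be compared on a set of core fields. It is an illustration under stated assumptions, not our actual validation code: the function name compare_collections, the choice of core fields, and the sample records are all hypothetical.

```python
# Hypothetical list of "core" fields; the real prioritized fields may differ.
CORE_FIELDS = ("title", "date", "identifier")


def compare_collections(existing, reharvested, fields=CORE_FIELDS):
    """Compare two {record_id: record} mappings and report the differences.

    Returns the record IDs that were added or removed, plus the per-field
    (old value, new value) pairs for records present in both versions.
    """
    existing_ids, new_ids = set(existing), set(reharvested)
    report = {
        "added": sorted(new_ids - existing_ids),
        "removed": sorted(existing_ids - new_ids),
        "changed": {},
    }
    for rec_id in sorted(existing_ids & new_ids):
        diffs = {
            field: (existing[rec_id].get(field), reharvested[rec_id].get(field))
            for field in fields
            if existing[rec_id].get(field) != reharvested[rec_id].get(field)
        }
        if diffs:
            report["changed"][rec_id] = diffs
    return report


# Made-up sample data: the collection as currently indexed vs. re-harvested.
existing = {
    "a": {"title": "Sierra Buttes", "date": "1905"},
    "b": {"title": "Main Street", "date": "1920"},
}
reharvested = {
    "a": {"title": "Sierra Buttes", "date": "circa 1905"},
    "c": {"title": "Harbor view", "date": "1931"},
}

report = compare_collections(existing, reharvested)
print(report)
```

A report like this cannot tell on its own whether a difference comes from a fetching or mapping problem or from a legitimate update on the contributor's platform, which is why staff review of the results is still needed.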

My organization has contributed collections to Calisphere, but we have since migrated to a new platform. What will happen to our collections?

Please contact us if you have migrated to another platform, and we will arrange an initial call to discuss the details. We will need to write a new mapper to work with your new platform, and will be able to continue this service after we transition fully to the new harvesting system in June 2024.

My organization worked with California Revealed to contribute collections to Calisphere. What will happen to our collections?

We are coordinating with the California Revealed team on establishing a pipeline to California Revealed’s Archipelago platform, and will provide an update when this is in place.