This page provides an overview of a new Calisphere harvesting system under development and our plans to transition fully to the new system by the end of the 2023 calendar year.
As a next step in development, we will be testing the new infrastructure to verify that the full pipeline, from harvesting to building the Calisphere index, is performing as expected.
If your organization has contributed digital collections to Calisphere, we will be in touch regarding test re-harvests of your collections. We will share the results with you to preview on our Calisphere-test site. The collections will not be published on the public/live Calisphere site until we transition completely to our new system, by the end of 2023. No QA work is required on your part.
Testing the full harvesting process is a critical part of developing this new infrastructure. This will also ensure that Calisphere has the most current version of your published collections. Please review this page for additional details.
About the new harvesting system
- What is harvesting?
- Why are you developing a new harvester?
- What changes in harvesting will contributors notice? Will there be changes to Calisphere?
- When will development of the new harvester be complete?
Re-harvesting digital collections
- My organization’s collections are already in Calisphere. Why do they need to be re-harvested?
- Do we have to do any QA checking of our re-harvested collections?
- What do you mean by “data validation testing”?
- How do I preview the results from the re-harvest?
- When will these newly re-harvested digital collections be published?
Harvesting digital collections via the legacy harvester
- My organization has collections that we’d like harvested to Calisphere before the end of 2023. How do I request that?
- My organization has contributed collections to Calisphere, but we have since migrated to a new platform. What will happen to our collections?
- My organization worked with California Revealed to contribute collections to Calisphere. What will happen to our collections?
About the new harvesting system
In 2021, we started an active development project, called Rikolti, to replace our current (“legacy”) Calisphere harvesting system. By the end of 2023, we are planning to sunset the legacy harvester and transition fully to the new system, which is designed to be modular and fast, using current, well-supported technologies. More detailed information about our approach is available in the Rikolti project GitHub.
What is harvesting?
Calisphere uses a harvesting model for contributing your collections into this aggregation. This strategy allows us to programmatically “fetch” collections from your local digital asset management system – specifically, descriptive metadata and thumbnails for items in the collections. Once fetched, we “map” the metadata into a central index underlying Calisphere, to support searching and browsing of the items.
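As a rough illustration of the “map” step described above, the sketch below renames fields from a fetched source record to the field names of a central index. This is not Rikolti's actual code: the record, the field names, and the `map_record` function are all hypothetical and chosen only to show the idea.

```python
from typing import Any

# Hypothetical source record, as it might look after being fetched
# from a contributor's platform (field names are assumptions).
SOURCE_RECORD = {
    "id": "item-001",
    "dc_title": ["View of a harbor"],
    "dc_date": "1923",
    "thumb": "https://example.org/thumbs/item-001.jpg",
}

# Hypothetical mapping from source field names to index field names.
FIELD_MAP = {
    "id": "identifier",
    "dc_title": "title",
    "dc_date": "date",
    "thumb": "thumbnail_url",
}


def map_record(source: dict[str, Any], field_map: dict[str, str]) -> dict[str, list[str]]:
    """Rename source fields to index fields, storing every value as a list of strings."""
    mapped: dict[str, list[str]] = {}
    for source_field, index_field in field_map.items():
        value = source.get(source_field)
        if value is None:
            continue  # field absent in this record; skip it
        values = value if isinstance(value, list) else [value]
        mapped[index_field] = [str(v) for v in values]
    return mapped


indexed = map_record(SOURCE_RECORD, FIELD_MAP)
```

In practice each platform would need its own mapping, since contributors' systems expose metadata under different field names.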
Why are you developing a new harvester?
Our existing Calisphere harvester is outdated and uses deprecated, unsupported technologies adapted from the Digital Public Library of America’s (DPLA) open-source code base from 2013. We’re committed to developing infrastructure that uses current, well-supported systems that can continuously support the statewide aggregation of digital cultural heritage resources. We are planning to sunset our existing harvester and transition fully to Rikolti by the end of the 2023 calendar year.
What changes in harvesting will contributors notice? Will there be changes to Calisphere?
We are using an updated technology framework to create a more efficient, flexible harvesting operation. We do not anticipate any changes to the process and steps involved with sharing collections with Calisphere. We also do not anticipate any impacts or changes to how collections and items are searched, browsed, and displayed in Calisphere.
When will development of the new harvester be complete?
We are aiming to deploy a fully functional first iteration of the new harvester (a “minimum viable product”) by the end of 2023, and to sunset our existing system at that time. After this work is complete, we will continue to review our priorities and plan the development of feature enhancements.
Re-harvesting digital collections
My organization’s collections are already in Calisphere. Why do they need to be re-harvested?
Calisphere harvests collections by connecting to the platforms that contributors are using to manage and publish their digital collections. As a way of testing the new infrastructure, we will be re-harvesting your collections to the Calisphere-test site to ensure a continued connection to your platform. By doing so, we will be able to verify that the full pipeline, from harvesting to building the Calisphere index, is performing as expected. Additionally, re-harvesting gives us the opportunity to refresh your collections in Calisphere, so that we are in sync with the records currently published in your public platform.
Do we have to do any QA checking of our re-harvested collections?
As part of testing the new harvesting infrastructure, we will be conducting “data validation tests” to ensure we are correctly “fetching” your digital collection data, and “mapping” your data into our underlying Calisphere index. CDL staff will be reviewing the results of these tests during the re-harvesting.
No QA work is required on your part, though we will be in touch if we have any questions related to re-harvesting. We will also be in touch with contributors to share a preview of your re-harvested digital collections in our Calisphere-test site.
What do you mean by “data validation testing”?
As part of our harvesting development, we are also conducting data validation testing. This testing will programmatically compare the existing collection data in Calisphere with the newly re-harvested collection data, and will report any differences found between the two data sources. As a baseline goal, we are aiming for 100% data fidelity between the two sets for prioritized, core fields. However, we do anticipate that collections may have been updated in your public platforms (new records added, metadata updated, records removed), and our validation tests will surface these changes as data differences. We will review the results of the data validation tests first to determine whether our fetching and mapping processes are correctly configured.
Note that the newly re-harvested collections will only appear on the Calisphere-test site, and will not be published on the public/live Calisphere site. Test harvests will be shared with you to preview on the Calisphere-test site.
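The kind of comparison described above can be sketched as follows. This is only an illustrative example under stated assumptions, not the validation code used in the project: the choice of core fields, the record structure, and the `compare_collections` function are hypothetical.

```python
# Hypothetical set of prioritized core fields to compare.
CORE_FIELDS = ("title", "date")


def compare_collections(legacy, new, core_fields=CORE_FIELDS):
    """Compare two {record_id: record} dicts and report differences.

    Returns record IDs only in the new data ("added"), IDs only in the
    legacy data ("removed"), and per-field (legacy, new) value pairs for
    records present in both whose core fields differ ("changed").
    """
    legacy_ids, new_ids = set(legacy), set(new)
    changed = {}
    for record_id in sorted(legacy_ids & new_ids):
        diffs = {
            field: (legacy[record_id].get(field), new[record_id].get(field))
            for field in core_fields
            if legacy[record_id].get(field) != new[record_id].get(field)
        }
        if diffs:
            changed[record_id] = diffs
    return {
        "added": sorted(new_ids - legacy_ids),
        "removed": sorted(legacy_ids - new_ids),
        "changed": changed,
    }


# Made-up sample data for illustration only.
legacy_index = {
    "a": {"title": "Harbor view", "date": "1923"},
    "b": {"title": "Old title", "date": "1950"},
    "c": {"title": "Withdrawn item", "date": "1960"},
}
new_index = {
    "a": {"title": "Harbor view", "date": "1923"},
    "b": {"title": "New title", "date": "1950"},
    "d": {"title": "Recently added item", "date": "2001"},
}

report = compare_collections(legacy_index, new_index)
```

A report like this does not distinguish between a mapping error and a genuine update made in the contributor's platform; that distinction still needs staff review, as noted above.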
How do I preview the results from the re-harvest?
The Calisphere-test site has a banner at the top of the page, featuring a toggle option. This toggle switches between two distinct data sources: “View Legacy Index” displays collection data harvested by our current Calisphere harvester, and “Preview New Index” displays collection data re-harvested by the new Calisphere harvester.
In our email to you, we will include a link to your contributor landing page on the Calisphere-test site. From that link, you will be able to preview your digital collections.
When will these newly re-harvested digital collections be published?
Throughout the rest of the 2023 calendar year, our goal is to completely replace our current harvesting infrastructure with the new system; this includes transitioning to the new Calisphere index. Once we switch Calisphere’s underlying index over to the new Calisphere index, the newly re-harvested digital collections will be published on the public/live Calisphere site. We will be in touch with a more specific timeframe on this switch.
Before this transition happens, we will reach out to each Calisphere contributor to share a preview of your re-harvested collections in the Calisphere-test site.
Harvesting digital collections via the legacy harvester
My organization has collections that we’d like harvested to Calisphere before the end of 2023. How do I request that?
Our legacy harvester will be sunset only after our new harvester is fully deployed and operational. Until we transition to the new harvester, by the end of 2023, we will continue to run harvesting requests using our existing operational workflows. Please submit requests to harvest new collections and/or re-harvest existing collections using our Harvest and Re-harvest Request Form.
Note that our legacy harvester will need to have an established connection with your platform (i.e., Calisphere has previously harvested from your system). If we have not harvested from your system before, we will first need to establish a connection with your platform; we will be able to begin onboarding new contributors and/or platforms once we have fully transitioned to the new harvester, by the end of 2023. Please feel free to contact us for more information.
My organization has contributed collections to Calisphere, but we have since migrated to a new platform. What will happen to our collections?
Please contact us if you have migrated to another platform, and we will arrange for an initial call to discuss the details. We will need to write a new mapper to work with your new platform, and will be able to continue this service once we have fully transitioned to the new harvester, by the end of 2023.
My organization worked with California Revealed to contribute collections to Calisphere. What will happen to our collections?
We are coordinating with the California Revealed team; once we are fully on the new harvester, we are planning to re-harvest your collections from California Revealed’s platform.