⚠ Calisphere's underlying Solr index and associated Solr API will be deprecated in February 2024:

As mentioned in our recent progress update on the development of the new Calisphere harvester, we plan to transition all harvesting operations to the new harvesting system early this year.

The new harvesting system will use OpenSearch as the underlying index. Once we transition, we will be deprecating the legacy Solr index and associated Solr API; as of February 2024, any API keys that have been assigned will no longer be able to access data through the legacy Solr API.

Following our transition to OpenSearch, Calisphere's new indexing platform, we will begin to develop an updated strategy to provide programmatic access to Calisphere data. We will provide updates through this service announcement.

In the meantime, we continue to share Calisphere data with the Digital Public Library of America (DPLA) on a bi-annual basis. DPLA maintains an API (and static data file) of content in their aggregation (https://pro.dp.la/developers); please feel free to query the DPLA API. DPLA’s latest snapshot of Calisphere data is from November 2023. 


About the APIs

There are two different APIs associated with content published in Calisphere:

  • Solr API: The Solr index contains object level metadata only. Each record has collection and repository fields referencing collections and institutions in the Collection Registry.
  • Collection Registry API: The Collection Registry contains collection and institution level metadata.

Solr API

In order to access the Solr API, you will have to obtain an API Authentication Token. Request a token by contacting us.

Queries can be made using curl, or any Solr library that supports authentication tokens:

curl -H 'X-Authentication-Token: xxxx-xxxx-xxxx-xxxx' "https://solr.calisphere.org/solr/query/?q=fred"
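The same request can be put together with Python's standard library. This is a minimal sketch: the token is the placeholder from the curl example, and build_solr_request is a name chosen here for illustration, not part of any Calisphere client. The request is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

SOLR_QUERY_URL = "https://solr.calisphere.org/solr/query/"


def build_solr_request(token, **params):
    """Return a Request for the Solr query endpoint with the token header set."""
    url = SOLR_QUERY_URL + "?" + urlencode(params)
    return Request(url, headers={"X-Authentication-Token": token})


# Placeholder token; substitute the one issued to you.
req = build_solr_request("xxxx-xxxx-xxxx-xxxx", q="fred")
print(req.full_url)
# To send it, pass req to urllib.request.urlopen() and read the response body.
```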

We use solrpy, a Python library with Solr bindings, to query Solr: https://github.com/edsu/solrpy.

The following are common query parameters. Consult the Solr documentation for a complete list of available query parameters.

q - the main query parameter; searches the text field unless a different field is specified. Takes a comma-separated list of field: search_string pairs:

rows - number of objects to return, default value is 10

start - object to start on, default value is 0 - combine start and rows to page through results
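As a small illustration of paging with start and rows, the sketch below converts a 1-based page number into the two parameters. The helper name page_params is an assumption made for this example.

```python
def page_params(page, rows=10):
    """Translate a 1-based page number into Solr start/rows parameters."""
    if page < 1 or rows < 1:
        raise ValueError("page and rows must be positive")
    return {"start": (page - 1) * rows, "rows": rows}


print(page_params(1))      # first page at the default page size
print(page_params(3, 25))  # third page of 25 objects
```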

fq - filter the query by specifying a value for a field. Filter on the string (_ss) version of the field to avoid tokenization and provide exact matches
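One way to build such a filter is to quote the value so that Solr treats it as a single exact term. The sketch below is illustrative only: exact_filter is a hypothetical helper, and the field values are made up.

```python
from urllib.parse import urlencode


def exact_filter(field, value):
    """Build an fq clause that matches value exactly on a string (_ss) field."""
    # Escape backslashes and double quotes so the value stays one quoted phrase.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


# fq may be repeated; each clause narrows the result set further.
params = [
    ("q", "fred"),
    ("fq", exact_filter("type_ss", "image")),            # hypothetical value
    ("fq", exact_filter("subject_ss", "Gold mines")),    # hypothetical value
]
print(urlencode(params))
```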

facet - true|false

facet_limit - number of facets to return (set to -1 to display all facets)

facet_field - fields to return facets for (ex: collection_name)
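Solr itself names these parameters facet.limit and facet.field; the underscore spellings above are assumed here to be the keyword-argument form that solrpy accepts. The sketch below builds the dotted form for a raw HTTP request and unpacks facet counts from a response fragment. The sample data is invented, shaped like Solr's default JSON output, where each facet is a flat list of alternating values and counts.

```python
def facet_params(fields, limit=-1):
    """Build raw Solr facet parameters as a list of (name, value) pairs."""
    params = [("facet", "true"), ("facet.limit", str(limit))]
    params += [("facet.field", field) for field in fields]
    return params


def pair_facets(flat):
    """Turn a flat [value, count, value, count, ...] list into (value, count) pairs."""
    return list(zip(flat[::2], flat[1::2]))


# Hypothetical response fragment; real collection names and counts will differ.
sample = {
    "facet_counts": {
        "facet_fields": {
            "collection_name": ["Collection A", 12, "Collection B", 3],
        }
    }
}
counts = pair_facets(sample["facet_counts"]["facet_fields"]["collection_name"])
print(facet_params(["collection_name"]))
print(counts)
```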

The Solr Fields

This schema is still undergoing active development. Find the most up-to-date schema on GitHub: github.com/ucldc/solr_api/.

  • Name: indicates the field name
  • Type: indicates the field type
  • Comments: notes regarding use of the field
  • Multi-valued: indicates if field is repeatable
  • Indexed: indicates if the value of the field can be used in queries to retrieve matching documents
  • Stored: indicates if the value of the field is stored in the index, and the value of the field can be retrieved by queries

 

Name | Type | Comments | Multi-valued | Indexed? | Stored?

General and administrative fields
created | date | refers to creation of the metadata document, not creation of the Solr document, nor creation of the content object | no | yes | yes
created_s | string | string variant of created for wildcard searching | no | yes | yes
id | string | Unique identifier assigned by CDL to the object, derived from identifier (if the value is an ARK) or otherwise auto-generated. This value is also used within the context of the URL for the object in Calisphere. | no | yes | yes
last_modified | date | refers to the date the metadata document was last modified | no | yes | yes
last_modified_s | string | string variant of last_modified for wildcard searching | no | yes | yes
text | text_general | not stored; catchall text field for keyword search that indexes tokens - for each object, contains the following fields: title, contributor, creator, coverage, date, description, extent, format, identifier, language, publisher, relation, rights, source, subject, and type | yes | yes | no
text_rev | text_general_rev | not stored; the same as the text field, but in reverse for efficient leading wildcard queries | yes | yes | no
timestamp | date | timestamp on the Solr document - default value is NOW, i.e. the time of object creation in the Solr index. | no | yes | yes

Metadata fields (supplied through the Collection Registry; all multivalued so an object can be related to more than one Campus, Repository, and/or Collection)
campus | string | campus stores the URL to the registry API campus object | yes | yes | yes
campus_data | string | campus_name::campus_url | yes | yes | yes
campus_name | string | Stores the name of the campus, so that clients don’t need to look up against the registry API | yes | yes | yes
campus_url | string | Stores the URL to the registry API campus object | yes | yes | yes
collection_data | string | collection_url::collection_name | yes | yes | yes
collection_name | string | Stores the name of the collection, so that clients don’t need to look up against the registry API | yes | yes | yes
collection_url | string | Stores the URL to the registry API collection object | yes | yes | yes
repository_data | string | repository_url::repository_name | yes | yes | yes
repository_name | string | Stores the name of the repository, so that clients don’t need to look up against the registry API | yes | yes | yes
repository_url | string | Stores the URL to the registry API repository object | yes | yes | yes

Metadata fields (stored and indexed as strings, instead of tokenized text)
NOTE: Use these values for display in your app.
alternative_title_ss | string | | yes | yes | yes
contributor_ss | string | | yes | yes | yes
coverage_ss | string | | yes | yes | yes
creator_ss | string | | yes | yes | yes
date_ss | string | | yes | yes | yes
extent_ss | string | | yes | yes | yes
format_ss | string | | yes | yes | yes
genre_ss | string | | yes | yes | yes
identifier_ss | string | | yes | yes | yes
item_count | integer | Used to indicate complex objects (indicates the total number of components, for a given complex object) | no | yes | yes
language_ss | string | | yes | yes | yes
location_ss | string | | yes | yes | yes
provenance_ss | string | | yes | yes | yes
publisher_ss | string | | yes | yes | yes
relation_ss | string | | yes | yes | yes
rights_ss | string | | yes | yes | yes
rights_holder_ss | string | | yes | yes | yes
rights_note_ss | string | | yes | yes | yes
rights_date_ss | string | | yes | yes | yes
rights_uri | string | Used to indicate a URI to a rights expression (e.g., Creative Commons, RightsStatements.org) | no | yes | yes
source_ss | string | | yes | yes | yes
spatial_ss | string | | yes | yes | yes
subject_ss | string | | yes | yes | yes
temporal_ss | string | | yes | yes | yes
title_ss | string | only required field | yes | yes | yes
type_ss | string | | yes | yes | yes

Sort fields (normalized fields to enable easier sorting)
sort_collection_data | string | collection data with a normalized collection name for sorting | yes | yes | yes
sort_date_start | date | | no | yes | yes
sort_date_end | date | | no | yes | yes
sort_title | alphaSpaceSort | Version of title used for lexical ordering | no | yes | yes

Content file fields
manifest | string | intended for IIIF manifest information (forthcoming) | no | no | yes
object_template | string | intended for Nuxeo object form/genre, to facilitate display in front-end (forthcoming) | no | no | yes
url_item | string | best guess at home URL for the item. Filled in at time of harvesting, currently indexed to search for items with it filled in | no | yes | yes
reference_image_dimensions | string | Pixel width:height. | no | no | yes
reference_image_md5 | string | not indexed; holds the md5 of the best image found for image objects; this will then be passed to the thumbnail server for nicely sized images. For now you can use md5s3stash to calculate the URL to the image | no | yes | yes
structmap_text | string | | no | yes | no
structmap_url | string | Only present for objects harvested from Nuxeo (https://github.com/ucldc/ucldc-docs/wiki/media.json) | no | no | yes

Metadata fields (stored and indexed as tokenized text)
NOTE: Use these for searching a specific field. In future versions of the index, these may not be stored. Use the _s or _ss versions of the field for display of values. These will remain indexed for searching against the tokenized values.
alternative_title | text_general | | yes | yes | yes
contributor | text_general | | yes | yes | yes
coverage | text_general | | yes | yes | yes
creator | text_general | | yes | yes | yes
date | text_general | | yes | yes | yes
description | text_general | | yes | yes | yes
extent | text_general | | yes | yes | yes
facet_decade | string | | yes | yes | yes
format | text_general | | yes | yes | yes
genre | text_general | | yes | yes | yes
identifier | text_general | | yes | yes | yes
language | text_general | | yes | yes | yes
location | text_general | | yes | yes | yes
provenance | text_general | | yes | yes | yes
publisher | text_general | | yes | yes | yes
relation | text_general | | yes | yes | yes
rights | text_general | | yes | yes | yes
rights_holder | text_general | | yes | yes | yes
rights_note | text_general | | yes | yes | yes
rights_date | text_general | | yes | yes | yes
rights_uri | string | | no | yes | yes
source | text_general | | yes | yes | yes
spatial | text_general | | yes | yes | yes
subject | text_general | | yes | yes | yes
temporal | text_general | | yes | yes | yes
transcription | text_general | | yes | yes | yes
title | text_general | only required field | yes | yes | yes
type | text_general | | yes | yes | yes

Collection Registry API

The Collection Registry has a HATEOAS API powered by Tastypie.

HATEOAS APIs attempt to be self-describing. All available endpoints may be discovered from the base URL.

Format

The following formats are supported: json, jsonp, xml, yaml, plist.
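Tastypie-based APIs usually select the response format with a format query parameter and return a root document that maps each resource name to its list_endpoint and schema URLs. The sketch below shows both ideas offline. The base URL is a placeholder (not the registry's real address), the sample root document is invented, and with_format and list_endpoints are hypothetical helper names.

```python
from urllib.parse import urlencode, urljoin

SUPPORTED_FORMATS = ("json", "jsonp", "xml", "yaml", "plist")


def with_format(url, fmt="json"):
    """Append the ?format= selector to an endpoint URL."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    separator = "&" if "?" in url else "?"
    return url + separator + urlencode({"format": fmt})


def list_endpoints(base_url, root_doc):
    """Map each resource name in a root document to its absolute list URL."""
    return {
        name: urljoin(base_url, info["list_endpoint"])
        for name, info in root_doc.items()
    }


BASE = "https://registry.example.org/api/v1/"  # placeholder, not the real base URL
sample_root = {  # hypothetical root document in Tastypie's usual shape
    "collection": {
        "list_endpoint": "/api/v1/collection/",
        "schema": "/api/v1/collection/schema/",
    },
    "repository": {
        "list_endpoint": "/api/v1/repository/",
        "schema": "/api/v1/repository/schema/",
    },
}
print(with_format(BASE))
print(list_endpoints(BASE, sample_root))
```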

Base

Repository

Collection

Campus

Institution JSON

  • Join http://dsc.cdlib.org/institution-json/ with the "ark" value (no trailing slash) from a Repository or Campus record to get the address, phone number, email, etc. from the voro dashboard.
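The join described above amounts to simple string concatenation. A minimal sketch, with a made-up ARK standing in for the "ark" value of a Repository or Campus record and institution_json_url as a hypothetical helper name:

```python
INSTITUTION_JSON_BASE = "http://dsc.cdlib.org/institution-json/"


def institution_json_url(ark):
    """Join the institution-json base with an ARK, dropping any trailing slash."""
    return INSTITUTION_JSON_BASE + ark.strip().rstrip("/")


# Placeholder ARK; use the "ark" value from the registry record instead.
print(institution_json_url("ark:/13030/xxxxxxxx/"))
```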

The API is configured in Django in this file.