Calisphere API : OAC/Calisphere Contributor Help Center

⚠ Calisphere's underlying Solr index and associated Solr API will be deprecated in February 2024:

As mentioned in our recent progress update to develop the new Calisphere harvester, we are planning to transition all harvesting operations to the new harvesting system early this year.

The new harvesting system will use OpenSearch as the underlying index. Once we transition, we will be deprecating the legacy Solr index and associated Solr API; as of February 2024, any API keys that have been assigned will no longer be able to access data through the legacy Solr API.

Following our transition to OpenSearch, Calisphere's new indexing platform, we will begin to develop an updated strategy to provide programmatic access to Calisphere data. We will provide updates through this service announcement.

In the meantime, we continue to share Calisphere data with the Digital Public Library of America (DPLA) on a bi-annual basis. DPLA maintains an API (and static data file) of content in their aggregation (https://pro.dp.la/developers); please feel free to query the DPLA API. DPLA’s latest snapshot of Calisphere data is from November 2023.

About the APIs

There are two different APIs associated with content published in Calisphere:

Solr API: The Solr index contains object level metadata only. Each record has collection and repository fields referencing collections and institutions in the Collection Registry.
Collection Registry API: The Collection Registry contains collection and institution level metadata.

Solr API

In order to access the Solr API, you will have to obtain an API Authentication Token. Request a token by contacting us.

Queries can be made using curl, or any Solr library that supports authentication tokens:

curl -H 'X-Authentication-Token: xxxx-xxxx-xxxx-xxxx' "https://solr.calisphere.org/solr/query/?q=fred"

We use solrpy, a Python library with Solr bindings, to query Solr: https://github.com/edsu/solrpy.
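For readers who prefer not to install a Solr client, the curl call above can be approximated with Python's standard library. This is a minimal sketch, not the project's own client code: the helper names `build_solr_request` and `run_query` are invented for illustration, the token is the same placeholder used in the curl example, and `run_query` only works with network access, a valid token, and while the legacy Solr API is still available.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example above.
SOLR_QUERY_URL = "https://solr.calisphere.org/solr/query/"


def build_solr_request(token, **params):
    """Return a urllib Request with the token header and URL-encoded parameters."""
    query = urllib.parse.urlencode(params)
    return urllib.request.Request(
        f"{SOLR_QUERY_URL}?{query}",
        headers={"X-Authentication-Token": token},
    )


def run_query(token, **params):
    """Send the request and decode the JSON body (hypothetical helper; needs network)."""
    params.setdefault("wt", "json")
    with urllib.request.urlopen(build_solr_request(token, **params)) as response:
        return json.load(response)


# Building the request does not touch the network, so it is safe to inspect.
request = build_solr_request("xxxx-xxxx-xxxx-xxxx", q="fred", wt="json")
print(request.full_url)
```

Separating request construction from sending makes it easy to check the URL and header before spending a real API call.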

The following are common query parameters. Consult the Solr documentation for a complete list of available query parameters.

q - the main query parameter. It searches the text field unless a different field is specified. Takes a comma-separated list of field: search_string pairs:

q=*:* - returns all objects in the index
https://solr.calisphere.org/solr/query/?q=*:*&wt=json&indent=true
q=mosswood park - returns all objects in the index with an instance of mosswood park in their metadata
https://solr.calisphere.org/solr/query/?q=mosswood+park&wt=json&indent=true
q=title: "mosswood park", collection_name: "Parks in Oakland, California - Views" - returns all objects with mosswood park in the title and a collection name of "Parks in Oakland, California - Views"
https://solr.calisphere.org/solr/query/?q=title: "mosswood park", collection_name: "Parks in Oakland, California - Views"&wt=json&indent=true
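The field-qualified example above contains spaces, quotes, and commas, which must be percent-encoded before the URL is sent by most HTTP clients. The sketch below shows one way to assemble and encode such a query; `field_query` is a hypothetical helper, and it assumes the values themselves contain no double quotes.

```python
from urllib.parse import urlencode


def field_query(**fields):
    """Join field: "phrase" pairs with commas, following the q examples above."""
    return ", ".join(f'{name}: "{value}"' for name, value in fields.items())


q = field_query(
    title="mosswood park",
    collection_name="Parks in Oakland, California - Views",
)

# urlencode percent-encodes quotes, colons, and commas and turns spaces into "+".
url = "https://solr.calisphere.org/solr/query/?" + urlencode(
    {"q": q, "wt": "json", "indent": "true"}
)
print(url)
```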

rows - number of objects to return, default value is 10

rows=6 - returns the first six objects in the index
https://solr.calisphere.org/solr/query/?q=*:*&rows=6&wt=json&indent=true

start - zero-based offset of the first object to return, default value is 0 - combine start and rows to paginate through results

start=2 - returns the third object through the twelfth in the index (ten objects, the default rows value)
https://solr.calisphere.org/solr/query/?q=*:*&start=2&wt=json&indent=true
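To make the relationship between start and rows concrete, here is a small sketch of the offset arithmetic. The function names are illustrative only; `num_found` stands for the `numFound` total that Solr reports in its response.

```python
def page_params(page, rows=10):
    """Return the start/rows pair for a zero-based page number."""
    if page < 0 or rows < 1:
        raise ValueError("page must be >= 0 and rows must be >= 1")
    return {"start": page * rows, "rows": rows}


def page_starts(num_found, rows=10):
    """List every start offset needed to walk through num_found results."""
    return list(range(0, num_found, rows))


print(page_params(2, rows=6))  # third page when each page holds six objects
print(page_starts(25))         # offsets needed to cover 25 results, ten at a time
```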

fq - filter the query by specifying a value for a field. Filter on the string (_ss) version of the field to avoid tokenization and provide exact matches

fq=type_ss: image - returns all objects with a type of 'image'
https://solr.calisphere.org/solr/query/?q=*:*&fq=type_ss: image&wt=json&indent=true
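Filter values that contain spaces or punctuation are safest when wrapped in double quotes. The sketch below builds quoted filters and repeats fq to apply more than one; `exact_filter` is a hypothetical helper, and repeating fq is standard Solr behavior rather than something this page documents.

```python
from urllib.parse import urlencode


def exact_filter(field, value):
    """Build a field:"value" filter, escaping backslashes and embedded quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


# A list of tuples lets the same parameter name (fq) appear more than once.
params = [
    ("q", "*:*"),
    ("fq", exact_filter("type_ss", "image")),
    ("fq", exact_filter("collection_name", "Parks in Oakland, California - Views")),
    ("wt", "json"),
]
query_string = urlencode(params)
print(query_string)
```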

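facet - true|false; turns faceting on or off

facet_limit - number of facet values to return per field (set to -1 to return all); standard Solr request syntax spells this facet.limit

facet_field - field to return facets for (ex: collection_name); standard Solr request syntax spells this facet.field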
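When faceting is on, Solr's JSON response carries facet values under `facet_counts.facet_fields`. The sketch below assumes Solr's default JSON layout, in which each field maps to a flat list that alternates value and count; the sample response and its counts are invented purely for illustration.

```python
def facet_counts(response, field):
    """Convert Solr's flat [value, count, value, count, ...] list into a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))


# Invented sample shaped like a Solr JSON response with faceting on type_ss.
sample = {
    "facet_counts": {
        "facet_fields": {"type_ss": ["image", 12, "text", 3]},
    }
}
print(facet_counts(sample, "type_ss"))
```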

The Solr Fields

This schema is still undergoing active development. Find the most up-to-date schema on GitHub: github.com/ucldc/solr_api/.

Name: indicates the field name
Type: indicates the field type
Comments: notes regarding use of the field
Multi-valued: indicates if field is repeatable
Indexed: indicates if the value of the field can be used in queries to retrieve matching documents
Stored: indicates if the value of the field is stored in the index, and the value of the field can be retrieved by queries

Name	Type	Comments	Multi-Valued	Indexed?	Stored?
General and administrative fields
created	date	refers to creation of the metadata document, not creation of the Solr document, nor creation of the content object	no	yes	yes
created_s	string	string variant of created for wildcard searching	no	yes	yes
id	string	Unique identifier assigned by CDL to the object, derived from identifier (if the value is an ARK) or otherwise auto-generated. This value is also used within the context of the URL for the object in Calisphere.	no	yes	yes
last_modified	date	refers to the date the metadata document was last modified	no	yes	yes
last_modified_s	string	string variant of last_modified for wildcard searching	no	yes	yes
text	text_general	not stored; catchall text field for keyword search that indexes tokens - for each object, contains the following fields: title, contributor, creator, coverage, date, description, extent, format, identifier, language, publisher, relation, rights, source, subject, and type	yes	yes	no
text_rev	text_general_rev	not stored; the same as the text field, but in reverse for efficient leading wildcard queries	yes	yes	no
timestamp	date	timestamp on the Solr document - default value is NOW, i.e., the time of object creation in the Solr index.	no	yes	yes
Metadata fields (supplied through the Collection Registry; all multivalued so an object can be related to more than one Campus, Repository, and/or Collection)
campus	string	campus stores the URL to the registry API campus object	yes	yes	yes
campus_data	string	campus_name::campus_url	yes	yes	yes
campus_name	string	Stores the name of the campus, so that clients don’t need to look up against the registry API	yes	yes	yes
campus_url	string	Stores the URL to the registry API campus object	yes	yes	yes
collection_data	string	collection_url::collection_name	yes	yes	yes
collection_name	string	Stores the name of the collection, so that clients don’t need to look up against the registry API	yes	yes	yes
collection_url	string	Stores the URL to the registry API collection object	yes	yes	yes
repository_data	string	repository_url::repository_name	yes	yes	yes
repository_name	string	Stores the name of the repository, so that clients don’t need to look up against the registry API	yes	yes	yes
repository_url	string	Stores the URL to the registry API repository object	yes	yes	yes
Metadata fields (stored and indexed as strings, instead of tokenized text) NOTE: Use these values for display in your app
alternative_title_ss	string		yes	yes	yes
contributor_ss	string		yes	yes	yes
coverage_ss	string		yes	yes	yes
creator_ss	string		yes	yes	yes
date_ss	string		yes	yes	yes
extent_ss	string		yes	yes	yes
format_ss	string		yes	yes	yes
genre_ss	string		yes	yes	yes
identifier_ss	string		yes	yes	yes
item_count	integer	Used to indicate complex objects (indicates the total number of components for a given complex object)	no	yes	yes
language_ss	string		yes	yes	yes
location_ss	string		yes	yes	yes
provenance_ss	string		yes	yes	yes
publisher_ss	string		yes	yes	yes
relation_ss	string		yes	yes	yes
rights_ss	string		yes	yes	yes
rights_holder_ss	string		yes	yes	yes
rights_note_ss	string		yes	yes	yes
rights_date_ss	string		yes	yes	yes
rights_uri	string	Used to indicate a URI to a rights expression (e.g., Creative Commons, RightsStatements.org)	no	yes	yes
source_ss	string		yes	yes	yes
spatial_ss	string		yes	yes	yes
subject_ss	string		yes	yes	yes
temporal_ss	string		yes	yes	yes
title_ss	string	only required field	yes	yes	yes
type_ss	string		yes	yes	yes
Sort fields (Normalized fields to enable easier sorting)
sort_collection_data	string	collection data with a normalized collection name for sorting	yes	yes	yes
sort_date_start	date		no	yes	yes
sort_date_end	date		no	yes	yes
sort_title	alphaSpaceSort	Version of title used for lexical ordering	no	yes	yes
Content file fields
manifest	string	intended for IIIF manifest information (forthcoming)	no	no	yes
object_template	string	intended for Nuxeo object form/genre, to facilitate display in front-end (forthcoming)	no	no	yes
url_item	string	best guess at home URL for the item. Filled in at time of harvesting, currently indexed to search for items with it filled in	no	yes	yes
reference_image_dimensions	string	Pixel width:height.	no	no	yes
reference_image_md5	string	not indexed; holds the md5 of the best image found for image objects. This is then passed to the thumbnail server for nicely sized images. For now you can use md5s3stash to calculate the URL to the image	no	yes	yes
structmap_text	string		no	yes	no
structmap_url	string	Only present for objects harvested from Nuxeo https://github.com/ucldc/ucldc-docs/wiki/media.json	no	no	yes
Metadata fields (stored and indexed as tokenized text) NOTE: Use these for searching a specific field. In future versions of the index, these may not be stored. Use the _s or _ss versions of the field for display of values. These will remain indexed for searching against the tokenized values.
alternative_title	text_general		yes	yes	yes
contributor	text_general		yes	yes	yes
coverage	text_general		yes	yes	yes
creator	text_general		yes	yes	yes
date	text_general		yes	yes	yes
description	text_general		yes	yes	yes
extent	text_general		yes	yes	yes
facet_decade	string		yes	yes	yes
format	text_general		yes	yes	yes
genre	text_general		yes	yes	yes
identifier	text_general		yes	yes	yes
language	text_general		yes	yes	yes
location	text_general		yes	yes	yes
provenance	text_general		yes	yes	yes
publisher	text_general		yes	yes	yes
relation	text_general		yes	yes	yes
rights	text_general		yes	yes	yes
rights_holder	text_general		yes	yes	yes
rights_note	text_general		yes	yes	yes
rights_date	text_general		yes	yes	yes
rights_uri	string		no	yes	yes
source	text_general		yes	yes	yes
spatial	text_general		yes	yes	yes
subject	text_general		yes	yes	yes
temporal	text_general		yes	yes	yes
transcription	text_general		yes	yes	yes
title	text_general	only required field	yes	yes	yes
type	text_general		yes	yes	yes
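The `*_data` fields in the table pack two values into one string separated by `::` (for example, collection_data is listed as collection_url::collection_name). A minimal sketch of splitting such a value follows; the sample URL and name are hypothetical, and because the table lists campus_data in the opposite order (name first), check the order for the field you are reading.

```python
def split_pair(value):
    """Split a 'left::right' string at the first '::' separator."""
    left, _, right = value.partition("::")
    return left, right


# Hypothetical collection_data value in url::name order.
sample = "https://registry.cdlib.org/api/v1/collection/0/::Example Collection"
collection_url, collection_name = split_pair(sample)
print(collection_url)
print(collection_name)
```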

Collection Registry API

The Collection Registry has a HATEOAS API powered by Tasty Pie.

HATEOAS APIs attempt to be self describing. All available endpoints may be discovered from the base URL.

Format

The following formats are supported: [‘json’, ‘jsonp’, ‘xml’, ‘yaml’, ‘plist’]

Base

https://registry.cdlib.org/api/v1/?format=json
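Because the API is self-describing, a client can start from the base URL and resolve every list endpoint from the response. The sketch below assumes the usual Tastypie index shape, a mapping of resource names to `list_endpoint` and `schema` paths; `sample_index` is an illustrative stand-in, not a captured response.

```python
from urllib.parse import urljoin

BASE_URL = "https://registry.cdlib.org/api/v1/?format=json"


def absolute_endpoints(index, base_url=BASE_URL):
    """Resolve each resource's relative list_endpoint against the base URL."""
    return {
        name: urljoin(base_url, info["list_endpoint"])
        for name, info in index.items()
    }


# Illustrative stand-in for the JSON returned by the base URL.
sample_index = {
    "campus": {"list_endpoint": "/api/v1/campus/", "schema": "/api/v1/campus/schema/"},
    "collection": {"list_endpoint": "/api/v1/collection/", "schema": "/api/v1/collection/schema/"},
    "repository": {"list_endpoint": "/api/v1/repository/", "schema": "/api/v1/repository/schema/"},
}
endpoints = absolute_endpoints(sample_index)
print(endpoints["collection"])
```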

Repository

https://registry.cdlib.org/api/v1/repository/?format=json
- https://registry.cdlib.org/api/v1/repository/schema/?format=json

Collection

https://registry.cdlib.org/api/v1/collection/?format=json
- https://registry.cdlib.org/api/v1/collection/schema/?format=json

Campus

https://registry.cdlib.org/api/v1/campus/?format=json
- https://registry.cdlib.org/api/v1/campus/schema/?format=json
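List endpoints such as these return results one page at a time. The sketch below walks every page by following the `meta.next` link that Tastypie list responses normally include; that response shape is an assumption here, and `fetch` is a caller-supplied function (for example, one that performs an HTTP GET and parses JSON). The `fake_pages` data exists only so the example runs without network access.

```python
from urllib.parse import urljoin


def iter_objects(first_url, fetch):
    """Yield every object from a paginated list, following meta.next links."""
    url = first_url
    while url:
        page = fetch(url)
        yield from page["objects"]
        next_path = page["meta"].get("next")
        # meta.next is a relative path, or None on the last page.
        url = urljoin(url, next_path) if next_path else None


# Invented two-page response set standing in for the live API.
first = "https://registry.cdlib.org/api/v1/collection/?format=json"
second = "https://registry.cdlib.org/api/v1/collection/?format=json&limit=2&offset=2"
fake_pages = {
    first: {
        "meta": {"next": "/api/v1/collection/?format=json&limit=2&offset=2"},
        "objects": [{"id": 1}, {"id": 2}],
    },
    second: {"meta": {"next": None}, "objects": [{"id": 3}]},
}

collected = list(iter_objects(first, fake_pages.__getitem__))
print(collected)
```

Passing `fetch` in as a parameter keeps the paging logic independent of any particular HTTP library.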

Institution JSON

Join http://dsc.cdlib.org/institution-json/ with the "ark" value (no trailing slash) from a Repository or Campus record to get the address, phone number, email, etc. from the voro dashboard.
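The join described above amounts to simple string concatenation. A minimal sketch follows, with a made-up ARK used purely as a placeholder; `institution_json_url` is a hypothetical helper name.

```python
INSTITUTION_JSON_BASE = "http://dsc.cdlib.org/institution-json/"


def institution_json_url(ark):
    """Append an ARK to the institution-json base URL, dropping any trailing slash."""
    return INSTITUTION_JSON_BASE + ark.strip().rstrip("/")


# Placeholder ARK for illustration only; use the "ark" value from a real record.
print(institution_json_url("ark:/13030/example/"))
```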

The API is configured in Django in this file.
