⚠ Calisphere's underlying Solr index and associated Solr API will be deprecated in February 2024:
As mentioned in our recent progress update on the development of the new Calisphere harvester, we plan to transition all harvesting operations to the new harvesting system early this year.
The new harvesting system will use OpenSearch as the underlying index. Once we transition, we will be deprecating the legacy Solr index and associated Solr API; as of February 2024, any API keys that have been assigned will no longer be able to access data through the legacy Solr API.
Following our transition to OpenSearch, Calisphere's new indexing platform, we will begin to develop an updated strategy to provide programmatic access to Calisphere data. We will provide updates through this service announcement.
In the meantime, we continue to share Calisphere data with the Digital Public Library of America (DPLA) on a bi-annual basis. DPLA maintains an API (and static data file) of content in their aggregation (https://pro.dp.la/developers); please feel free to query the DPLA API. DPLA’s latest snapshot of Calisphere data is from November 2023.
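As a minimal sketch, a DPLA keyword search URL can be built with the Python standard library. It assumes DPLA's v2 items endpoint (https://api.dp.la/v2/items) and its q and api_key parameters as described on the DPLA developer pages; "YOUR_DPLA_KEY" is a placeholder, and narrowing results to Calisphere content is left to DPLA's own filter parameters.

```python
from urllib.parse import urlencode

# Assumption: DPLA's v2 item-search endpoint (see https://pro.dp.la/developers).
DPLA_ITEMS_ENDPOINT = "https://api.dp.la/v2/items"


def dpla_search_url(query: str, api_key: str) -> str:
    """Build a DPLA item-search URL for a keyword query."""
    params = {"q": query, "api_key": api_key}
    return f"{DPLA_ITEMS_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    # "YOUR_DPLA_KEY" is a placeholder; request a real key from DPLA first.
    print(dpla_search_url("mosswood park", "YOUR_DPLA_KEY"))
```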
About the APIs
There are two different APIs associated with content published in Calisphere:
- Solr API: The Solr index contains object-level metadata only. Each record has collection and repository fields that reference collections and institutions in the Collection Registry.
- Collection Registry API: The Collection Registry contains collection- and institution-level metadata.
Solr API
To access the Solr API, you will need an API authentication token. Request a token by contacting us.
Queries can be made using curl, or any Solr library that supports authentication tokens:
curl -H 'X-Authentication-Token: xxxx-xxxx-xxxx-xxxx' "https://solr.calisphere.org/solr/query/?q=fred"
We use solrpy, a Python library with Solr bindings, to query Solr: https://github.com/edsu/solrpy.
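For readers not using solrpy, the curl request above can be sketched with the Python standard library. This is a minimal illustration, not the client we use: it only sends the X-Authentication-Token header shown in the curl example, the helper names are hypothetical, and it works only while the legacy Solr API remains available.

```python
import json
import urllib.request
from urllib.parse import urlencode

SOLR_QUERY_URL = "https://solr.calisphere.org/solr/query/"


def build_solr_request(token: str, params: dict) -> urllib.request.Request:
    """Build a GET request for the Solr API carrying the authentication header."""
    url = f"{SOLR_QUERY_URL}?{urlencode(params, doseq=True)}"
    # urllib normalizes the header name's capitalization; HTTP header names
    # are case-insensitive, so the server still sees X-Authentication-Token.
    return urllib.request.Request(url, headers={"X-Authentication-Token": token})


def run_query(token: str, params: dict) -> dict:
    """Send the request and decode the JSON response (requires a valid token)."""
    with urllib.request.urlopen(build_solr_request(token, params)) as resp:
        return json.load(resp)
```

For example, run_query("xxxx-xxxx-xxxx-xxxx", {"q": "fred", "wt": "json"}) mirrors the curl call, with wt=json added explicitly as in the URL examples below.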
The following are common query parameters. Consult the Solr documentation for a complete list of available query parameters.
q - the default query parameter; searches the text field unless a different field is specified. Takes a comma-separated list of field: search_string pairs:
- q=*:* - returns all objects in the index
https://solr.calisphere.org/solr/query/?q=*:*&wt=json&indent=true
- q=mosswood park - returns all objects in the index with an instance of mosswood park in their metadata
https://solr.calisphere.org/solr/query/?q=mosswood+park&wt=json&indent=true
- q=title: "mosswood park", collection_name: "Parks in Oakland, California - Views" - returns all objects with mosswood park in the title and a collection name of "Parks in Oakland, California - Views"
https://solr.calisphere.org/solr/query/?q=title: "mosswood park", collection_name: "Parks in Oakland, California - Views"&wt=json&indent=true
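The last example URL contains raw spaces, quotes, and commas, which a browser encodes automatically but a script or curl will not. A minimal Python sketch, assuming standard percent-encoding via urllib.parse.urlencode, builds the fielded query string in the comma-separated form used above (without the space after each colon) and encodes it. The helper names are hypothetical, and how Solr combines the clauses depends on the server's query-parser settings.

```python
from urllib.parse import urlencode

SOLR_QUERY_URL = "https://solr.calisphere.org/solr/query/"


def quote_phrase(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and embedded quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def fielded_query(pairs: list[tuple[str, str]]) -> str:
    """Join (field, value) pairs into the comma-separated form shown above."""
    return ", ".join(f"{field}:{quote_phrase(value)}" for field, value in pairs)


def query_url(q: str) -> str:
    """Percent-encode q (spaces, quotes, colons, commas) into a query URL."""
    return f"{SOLR_QUERY_URL}?{urlencode({'q': q, 'wt': 'json', 'indent': 'true'})}"


q = fielded_query([
    ("title", "mosswood park"),
    ("collection_name", "Parks in Oakland, California - Views"),
])
url = query_url(q)
```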
rows - number of objects to return, default value is 10
- rows=6 - returns the first six objects in the index
https://solr.calisphere.org/solr/query/?q=*:*&rows=6&wt=json&indent=true
start - zero-based offset of the first object to return; default value is 0. Specifying start and rows together enables pagination
- start=2 - returns the third through the 12th objects in the index (10 objects, the default rows value)
https://solr.calisphere.org/solr/query/?q=*:*&start=2&wt=json&indent=true
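Pagination with start and rows can be sketched as a loop that advances start by rows until numFound (the total hit count in Solr's JSON response) is reached. This is a minimal sketch: fetch is a hypothetical callable that sends q, start, and rows and returns the parsed JSON. For very deep paging, Solr also offers cursorMark, which this sketch does not use.

```python
from typing import Callable, Iterator


def paginate(fetch: Callable[[int, int], dict], rows: int = 10) -> Iterator[dict]:
    """Yield every document by stepping `start` forward in increments of `rows`.

    `fetch(start, rows)` is a hypothetical callable that runs the query with
    those parameters and returns Solr's parsed JSON response.
    """
    start = 0
    while True:
        response = fetch(start, rows)["response"]
        docs = response["docs"]
        yield from docs
        start += rows
        # Stop once the next offset passes the total hit count (or a page is empty).
        if not docs or start >= response["numFound"]:
            break
```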
fq - filters the query by requiring a value for a field. Filter on the string (_ss) version of the field to avoid tokenization and get exact matches
- fq=type_ss: image - returns all objects with a type of 'image'
https://solr.calisphere.org/solr/query/?q=*:*&fq=type_ss: image&wt=json&indent=true
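A minimal sketch of exact-match filtering: quoting the value keeps a multi-word string together as one value on an untokenized string field, and several fq parameters can be sent at once (Solr returns only documents matching all of them). The helper name is hypothetical; collection_name is used because the field table lists it as a string field.

```python
from urllib.parse import urlencode


def exact_filter(field: str, value: str) -> str:
    """Build an fq clause that matches value exactly on an untokenized string field."""
    # Quote the value so multi-word strings stay whole; escape \ and ".
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


# Each fq is a separate filter; results must satisfy all of them.
params = [
    ("q", "*:*"),
    ("fq", exact_filter("type_ss", "image")),
    ("fq", exact_filter("collection_name", "Parks in Oakland, California - Views")),
    ("wt", "json"),
]
query_string = urlencode(params)
```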
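facet - true|false; set to true to return facet counts
facet.limit - number of facet values to return per field (set to -1 to return all values); written facet_limit when passed as a solrpy keyword argument
facet.field - field to return facets for (ex: collection_name); repeat the parameter for multiple fields; written facet_field when passed as a solrpy keyword argument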
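As a sketch, a facet-only request can set rows=0 and list each field in its own facet.field parameter. In Solr's default JSON output, each field's counts arrive as a flat list alternating value and count, which the hypothetical helper below pairs into a dictionary. The sample response in the test is made up for illustration.

```python
def facet_params(fields: list[str], limit: int = -1) -> list[tuple[str, str]]:
    """Request facet counts only (rows=0) for the given fields."""
    params = [("q", "*:*"), ("rows", "0"), ("facet", "true"),
              ("facet.limit", str(limit)), ("wt", "json")]
    # facet.field may be repeated, once per field.
    params += [("facet.field", field) for field in fields]
    return params


def parse_facet_field(response: dict, field: str) -> dict[str, int]:
    """Turn Solr's flat [value, count, value, count, ...] list into a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))
```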
The Solr Fields
This schema is still undergoing active development. Find the most up-to-date schema on GitHub: github.com/ucldc/solr_api/.
- Name: indicates the field name
- Type: indicates the field type
- Comments: notes regarding use of the field
- Multi-valued: indicates if field is repeatable
- Indexed: indicates if the value of the field can be used in queries to retrieve matching documents
- Stored: indicates if the value of the field is stored in the index, and the value of the field can be retrieved by queries
Name | Type | Comments | Multi-Valued | Indexed? | Stored? |
General and administrative fields | |||||
created | date | refers to creation of the metadata document, not creation of the Solr document, nor creation of the content object | no | yes | yes |
created_s | string | string variant of created for wildcard searching | no | yes | yes |
id | string | Unique identifier assigned by CDL to the object, derived from identifier (if the value is an ARK) or otherwise auto-generated. This value is also used in the object's Calisphere URL. | no | yes | yes |
last_modified | date | refers to the date the metadata document was last modified | no | yes | yes |
last_modified_s | string | string variant of last_modified for wildcard searching | no | yes | yes |
text | text_general | not stored; catchall text field for keyword search that indexes tokens - for each object, contains the following fields: title, contributor, creator, coverage, date, description, extent, format, identifier, language, publisher, relation, rights, source, subject, and type | yes | yes | no |
text_rev | text_general_rev | not stored; the same as the text field, but in reverse for efficient leading wildcard queries | yes | yes | no |
timestamp | date | timestamp on the Solr document; default value is NOW, i.e. the time the object was created in the Solr index. | no | yes | yes |
Metadata fields (supplied through the Collection Registry; all multivalued so an object can be related to more than one Campus, Repository, and/or Collection) | |||||
campus | string | campus stores the URL to the registry API campus object | yes | yes | yes |
campus_data | string | campus_name::campus_url | yes | yes | yes |
campus_name | string | Stores the name of the campus, so that clients don’t need to look up against the registry API | yes | yes | yes |
campus_url | string | Stores the URL to the registry API campus object | yes | yes | yes |
collection_data | string | collection_url::collection_name | yes | yes | yes |
collection_name | string | Stores the name of the collection, so that clients don’t need to look up against the registry API | yes | yes | yes |
collection_url | string | Stores the URL to the registry API collection object | yes | yes | yes |
repository_data | string | repository_url::repository_name | yes | yes | yes |
repository_name | string | Stores the name of the repository, so that clients don’t need to look up against the registry API | yes | yes | yes |
repository_url | string | Stores the URL to the registry API repository object | yes | yes | yes |
Metadata fields (stored and indexed as strings, instead of tokenized text) NOTE: Use these values for display in your app | |||||
alternative_title_ss | string | | yes | yes | yes |
contributor_ss | string | | yes | yes | yes |
coverage_ss | string | | yes | yes | yes |
creator_ss | string | | yes | yes | yes |
date_ss | string | | yes | yes | yes |
extent_ss | string | | yes | yes | yes |
format_ss | string | | yes | yes | yes |
genre_ss | string | | yes | yes | yes |
identifier_ss | string | | yes | yes | yes |
item_count | integer | Used to indicate complex objects (the total number of components in a given complex object) | no | yes | yes |
language_ss | string | | yes | yes | yes |
location_ss | string | | yes | yes | yes |
provenance_ss | string | | yes | yes | yes |
publisher_ss | string | | yes | yes | yes |
relation_ss | string | | yes | yes | yes |
rights_ss | string | | yes | yes | yes |
rights_holder_ss | string | | yes | yes | yes |
rights_note_ss | string | | yes | yes | yes |
rights_date_ss | string | | yes | yes | yes |
rights_uri | string | Used to indicate a URI to a rights expression (e.g., Creative Commons, RightsStatements.org) | no | yes | yes |
source_ss | string | | yes | yes | yes |
spatial_ss | string | | yes | yes | yes |
subject_ss | string | | yes | yes | yes |
temporal_ss | string | | yes | yes | yes |
title_ss | string | only required field | yes | yes | yes |
type_ss | string | | yes | yes | yes |
Sort fields (Normalized fields to enable easier sorting) | |||||
sort_collection_data | string | collection data with a normalized collection name for sorting | yes | yes | yes |
sort_date_start | date | | no | yes | yes |
sort_date_end | date | | no | yes | yes |
sort_title | alphaSpaceSort | Version of title used for lexical ordering | no | yes | yes |
Content file fields | |||||
manifest | string | intended for IIIF manifest information (forthcoming) | no | no | yes |
object_template | string | intended for Nuxeo object form/genre, to facilitate display in front-end (forthcoming) | no | no | yes |
url_item | string | best guess at the home URL for the item. Filled in at harvest time; currently indexed so you can search for items that have it filled in | no | yes | yes |
reference_image_dimensions | string | Pixel width:height. | no | no | yes |
reference_image_md5 | string | not indexed; holds the md5 of the best image found for image objects. This is passed to the thumbnail server to produce nicely sized images; for now you can use md5s3stash to calculate the URL to the image | no | yes | yes |
structmap_text | string | | no | yes | no |
structmap_url | string | Only present for objects harvested from Nuxeo https://github.com/ucldc/ucldc-docs/wiki/media.json | no | no | yes |
Metadata fields (stored and indexed as tokenized text) NOTE: Use these for searching a specific field. In future versions of the index, these may not be stored. Use the _s or _ss versions of the field for display of values. These will remain indexed for searching against the tokenized values | |||||
alternative_title | text_general | | yes | yes | yes |
contributor | text_general | | yes | yes | yes |
coverage | text_general | | yes | yes | yes |
creator | text_general | | yes | yes | yes |
date | text_general | | yes | yes | yes |
description | text_general | | yes | yes | yes |
extent | text_general | | yes | yes | yes |
facet_decade | string | | yes | yes | yes |
format | text_general | | yes | yes | yes |
genre | text_general | | yes | yes | yes |
identifier | text_general | | yes | yes | yes |
language | text_general | | yes | yes | yes |
location | text_general | | yes | yes | yes |
provenance | text_general | | yes | yes | yes |
publisher | text_general | | yes | yes | yes |
relation | text_general | | yes | yes | yes |
rights | text_general | | yes | yes | yes |
rights_holder | text_general | | yes | yes | yes |
rights_note | text_general | | yes | yes | yes |
rights_date | text_general | | yes | yes | yes |
rights_uri | string | | no | yes | yes |
source | text_general | | yes | yes | yes |
spatial | text_general | | yes | yes | yes |
subject | text_general | | yes | yes | yes |
temporal | text_general | | yes | yes | yes |
transcription | text_general | | yes | yes | yes |
title | text_general | only required field | yes | yes | yes |
type | text_general | | yes | yes | yes |
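The *_data fields in the table pack a name and a registry URL into one string, but the order differs: campus_data is documented as campus_name::campus_url, while collection_data and repository_data are url::name. A minimal sketch of a parser that follows the table (verify the order against live records); the helper names and sample values are hypothetical.

```python
from typing import NamedTuple


class RegistryRef(NamedTuple):
    name: str
    url: str


def parse_registry_data(field: str, value: str) -> RegistryRef:
    """Split one *_data value into (name, url) using the order given in the table."""
    if field == "campus_data":
        # campus_name::campus_url; the URL itself never contains "::".
        name, url = value.rsplit("::", 1)
    elif field in ("collection_data", "repository_data"):
        # url::name; split at the first "::", which follows the URL.
        url, name = value.split("::", 1)
    else:
        raise ValueError(f"unsupported field: {field}")
    return RegistryRef(name=name, url=url)


def parse_all(field: str, values: list[str]) -> list[RegistryRef]:
    """These fields are multi-valued, so parse every entry."""
    return [parse_registry_data(field, v) for v in values]
```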
Collection Registry API
The Collection Registry has a HATEOAS API powered by Tastypie.
HATEOAS APIs are designed to be self-describing: all available endpoints can be discovered from the base URL.
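Discovery can be sketched as follows, assuming Tastypie's usual top-level index, which maps each resource name to an object with list_endpoint and schema paths. The base URL is a parameter because it is not given in the text above; the one in the test is a placeholder, and the helper names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urljoin


def fetch_json(url: str) -> dict:
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def list_endpoints(base_url: str, index: dict) -> dict[str, str]:
    """Map each resource name in a Tastypie API index to an absolute list URL."""
    return {name: urljoin(base_url, info["list_endpoint"]) for name, info in index.items()}


def discover(base_url: str) -> dict[str, str]:
    """Fetch the API index at base_url (a placeholder here) and list its resources."""
    return list_endpoints(base_url, fetch_json(base_url + "?format=json"))
```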
Format
The following formats are supported: json, jsonp, xml, yaml, plist
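A sketch of reading a whole list endpoint, assuming Tastypie's conventions: the format is chosen with a format query parameter, and list responses carry an objects array plus a meta block whose next field holds the path of the following page (or null on the last page). fetch is a hypothetical callable that returns parsed JSON for a URL, for example the standard-library fetch shown for discovery.

```python
from typing import Callable, Iterator
from urllib.parse import urlencode, urljoin


def with_format(url: str, fmt: str = "json") -> str:
    """Append Tastypie's format parameter (json, jsonp, xml, yaml, or plist)."""
    sep = "&" if "?" in url else "?"
    return f"{url}{sep}{urlencode({'format': fmt})}"


def iter_objects(list_url: str, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every object from a list endpoint by following meta["next"]."""
    url = with_format(list_url)
    while url:
        page = fetch(url)
        yield from page["objects"]
        next_path = page["meta"].get("next")
        # "next" is a relative path; resolve it against the list URL.
        url = urljoin(list_url, next_path) if next_path else None
```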
Base
Repository
Collection
Campus
Institution JSON
- Join http://dsc.cdlib.org/institution-json/ with the "ark" value (no trailing slash) from a Repository or Campus record to get the address, phone number, email, etc. from the voro dashboard.
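The join described above amounts to simple string concatenation. A minimal sketch; the helper name and the ARK value in the test are hypothetical, and any trailing slash is stripped to follow the "no trailing slash" note.

```python
INSTITUTION_JSON_BASE = "http://dsc.cdlib.org/institution-json/"


def institution_json_url(ark: str) -> str:
    """Append a Repository or Campus "ark" value to the institution-json base URL."""
    # Per the note above, the resulting URL must not end with a slash.
    return INSTITUTION_JSON_BASE + ark.strip().rstrip("/")
```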
The API is configured in Django in this file.