About the APIs

There are two different APIs associated with content published in Calisphere:

  • Solr API: The Solr index contains object-level metadata only. Each record has collection and repository fields that reference collections and institutions in the Collection Registry.
  • Collection Registry API: The Collection Registry contains collection-level and institution-level metadata.

Solr API

To access the Solr API, you need an API authentication token. Request a token by contacting us.

Queries can be made using curl, or any Solr library that supports authentication tokens:

curl -H 'X-Authentication-Token: xxxx-xxxx-xxxx-xxxx' "https://solr.calisphere.org/solr/query/?q=fred"

We use solrpy, a Python library with Solr bindings, to query Solr: https://github.com/edsu/solrpy.
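For readers who prefer Python's standard library to curl or solrpy, the request above can be sketched roughly as follows. The helper names build_request and run_query are illustrative, not part of any Calisphere library, and the sketch assumes the endpoint answers with JSON.

```python
import json
import urllib.parse
import urllib.request

SOLR_QUERY_URL = "https://solr.calisphere.org/solr/query/"


def build_request(params, token):
    """Return a GET request for the Solr API that carries the authentication token."""
    url = SOLR_QUERY_URL + "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(url, headers={"X-Authentication-Token": token})


def run_query(params, token):
    """Send the request and decode the body (needs network access and a real token).

    Assumes the response body is JSON.
    """
    with urllib.request.urlopen(build_request(params, token)) as response:
        return json.load(response)


# Building the request does not touch the network.
request = build_request({"q": "fred"}, "xxxx-xxxx-xxxx-xxxx")
print(request.full_url)
```

Keeping request construction separate from sending makes it easy to inspect the URL and headers before spending a real query.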

The following are common query parameters. Consult the Solr documentation for a complete list of available query parameters.

q - default query parameter. Searches the text field unless a different field is specified. Takes a comma-separated list of field:search_string pairs.

rows - number of objects to return, default value is 10

start - offset of the first object to return, default value is 0. Use start and rows together to page through results
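As a minimal sketch of pagination with start and rows, the offset for a zero-based page number is simply page * rows. The function name page_params is made up for this example.

```python
import urllib.parse


def page_params(query, page, rows=10):
    """Return q/start/rows parameters for a zero-based page number."""
    if page < 0 or rows < 1:
        raise ValueError("page must be >= 0 and rows must be >= 1")
    return {"q": query, "start": page * rows, "rows": rows}


# Third page of 25 results per page: objects 50 through 74.
params = page_params("fred", page=2, rows=25)
print(urllib.parse.urlencode(params))
```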

fq - filter the query by specifying a value for a field. Filter on the string (_ss) version of the field to avoid tokenization and provide exact matches
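One way to build such a filter is to quote the value as a phrase on the _ss field. This is only a sketch: exact_filter is a hypothetical helper, and "moving image" is an illustrative value rather than one confirmed to exist in the index.

```python
import urllib.parse


def exact_filter(field, value):
    """Build an fq clause that matches value exactly on the string (_ss) field."""
    # Escape backslashes and double quotes so the value stays one quoted phrase.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}_ss:"{escaped}"'


params = {"q": "fred", "fq": exact_filter("type", "moving image")}
print(params["fq"])
print(urllib.parse.urlencode(params))
```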

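facet - true|false; turns faceting on or off

facet_limit - maximum number of facet values to return per field (set to -1 to return all of them). In a raw Solr URL, such as the curl example above, this parameter is spelled facet.limit

facet_field - field to return facets for (ex: collection_name). In a raw Solr URL this parameter is spelled facet.field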
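The facet parameters can be combined into a query string as sketched below. Standard Solr spells them with dots (facet.limit, facet.field) in a raw URL, so the sketch uses that spelling. Setting rows to 0 is an addition here, made so that only facet counts come back. The pair_facet_counts helper is hypothetical and assumes Solr's default JSON layout, in which each facet field is a flat list of alternating values and counts.

```python
import urllib.parse

# Raw Solr spellings use dots; the values here are only illustrative.
params = {
    "q": "fred",
    "rows": 0,  # assumption: skip the objects and return facet counts only
    "facet": "true",
    "facet.field": "collection_name",
    "facet.limit": -1,
}
query_string = urllib.parse.urlencode(params)


def pair_facet_counts(flat):
    """Turn a flat list [value, count, value, count, ...] into (value, count) pairs."""
    return list(zip(flat[0::2], flat[1::2]))


print(query_string)
```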

The Solr Fields

This schema is still under active development. Find the most up-to-date schema on GitHub: github.com/ucldc/solr_api/.

  • Name: indicates the field name
  • Type: indicates the field type
  • Comments: notes regarding use of the field
  • Multi-valued: indicates if field is repeatable
  • Indexed: indicates if the value of the field can be used in queries to retrieve matching documents
  • Stored: indicates if the value of the field is stored in the index, and the value of the field can be retrieved by queries

 

Name | Type | Comments | Multi-valued | Indexed | Stored

General and administrative fields
created | date | Refers to creation of the metadata document, not creation of the Solr document or of the content object | no | yes | yes
created_s | string | String variant of created for wildcard searching | no | yes | yes
id | string | Unique identifier assigned by CDL to the object, derived from identifier (if the value is an ARK) or otherwise auto-generated. This value is also used in the URL for the object in Calisphere. | no | yes | yes
last_modified | date | Refers to the date the metadata document was last modified | no | yes | yes
last_modified_s | string | String variant of last_modified for wildcard searching | no | yes | yes
text | text_general | Not stored; catchall text field for keyword search that indexes tokens. For each object, contains the following fields: title, contributor, creator, coverage, date, description, extent, format, identifier, language, publisher, relation, rights, source, subject, and type | yes | yes | no
text_rev | text_general_rev | Not stored; the same as the text field, but reversed for efficient leading-wildcard queries | yes | yes | no
timestamp | date | Timestamp on the Solr document. Default value is NOW, i.e. the time the object was created in the Solr index. | no | yes | yes

Metadata fields (supplied through the Collection Registry; all multivalued so an object can be related to more than one Campus, Repository, and/or Collection)
campus | string | Stores the URL to the registry API campus object | yes | yes | yes
campus_data | string | campus_name::campus_url | yes | yes | yes
campus_name | string | Stores the name of the campus, so that clients don’t need to look it up in the registry API | yes | yes | yes
campus_url | string | Stores the URL to the registry API campus object | yes | yes | yes
collection_data | string | collection_url::collection_name | yes | yes | yes
collection_name | string | Stores the name of the collection, so that clients don’t need to look it up in the registry API | yes | yes | yes
collection_url | string | Stores the URL to the registry API collection object | yes | yes | yes
repository_data | string | repository_url::repository_name | yes | yes | yes
repository_name | string | Stores the name of the repository, so that clients don’t need to look it up in the registry API | yes | yes | yes
repository_url | string | Stores the URL to the registry API repository object | yes | yes | yes

Metadata fields (stored and indexed as strings, instead of tokenized text). NOTE: Use these values for display in your app.
alternative_title_ss | string | | yes | yes | yes
contributor_ss | string | | yes | yes | yes
coverage_ss | string | | yes | yes | yes
creator_ss | string | | yes | yes | yes
date_ss | string | | yes | yes | yes
extent_ss | string | | yes | yes | yes
format_ss | string | | yes | yes | yes
genre_ss | string | | yes | yes | yes
identifier_ss | string | | yes | yes | yes
item_count | integer | Used to indicate complex objects (the total number of components for a given complex object) | no | yes | yes
language_ss | string | | yes | yes | yes
location_ss | string | | yes | yes | yes
provenance_ss | string | | yes | yes | yes
publisher_ss | string | | yes | yes | yes
relation_ss | string | | yes | yes | yes
rights_ss | string | | yes | yes | yes
rights_holder_ss | string | | yes | yes | yes
rights_note_ss | string | | yes | yes | yes
rights_date_ss | string | | yes | yes | yes
rights_uri | string | Used to indicate a URI to a rights expression (e.g., Creative Commons, RightsStatements.org) | no | yes | yes
source_ss | string | | yes | yes | yes
spatial_ss | string | | yes | yes | yes
subject_ss | string | | yes | yes | yes
temporal_ss | string | | yes | yes | yes
title_ss | string | Only required field | yes | yes | yes
type_ss | string | | yes | yes | yes

Sort fields (normalized fields to enable easier sorting)
sort_collection_data | string | Collection data with a normalized collection name for sorting | yes | yes | yes
sort_date_start | date | | no | yes | yes
sort_date_end | date | | no | yes | yes
sort_title | alphaSpaceSort | Version of title used for lexical ordering | no | yes | yes

Content file fields
manifest | string | Intended for IIIF manifest information (forthcoming) | no | no | yes
object_template | string | Intended for Nuxeo object form/genre, to facilitate display in front-end (forthcoming) | no | no | yes
url_item | string | Best guess at the home URL for the item. Filled in at time of harvesting; currently indexed to search for items with it filled in | no | yes | yes
reference_image_dimensions | string | Pixel width:height | no | no | yes
reference_image_md5 | string | Not indexed; holds the MD5 of the best image found for image objects, which is then passed to the thumbnail server for nicely sized images. For now you can use md5s3stash to calculate the URL to the image | no | yes | yes
structmap_text | string | | no | yes | no
structmap_url | string | Only present for objects harvested from Nuxeo. See https://github.com/ucldc/ucldc-docs/wiki/media.json | no | no | yes

Metadata fields (stored and indexed as tokenized text). NOTE: Use these for searching a specific field. In future versions of the index, these may not be stored. Use the _s or _ss versions of the field for display of values. These will remain indexed for searching against the tokenized values.
alternative_title | text_general | | yes | yes | yes
contributor | text_general | | yes | yes | yes
coverage | text_general | | yes | yes | yes
creator | text_general | | yes | yes | yes
date | text_general | | yes | yes | yes
description | text_general | | yes | yes | yes
extent | text_general | | yes | yes | yes
facet_decade | string | | yes | yes | yes
format | text_general | | yes | yes | yes
genre | text_general | | yes | yes | yes
identifier | text_general | | yes | yes | yes
language | text_general | | yes | yes | yes
location | text_general | | yes | yes | yes
provenance | text_general | | yes | yes | yes
publisher | text_general | | yes | yes | yes
relation | text_general | | yes | yes | yes
rights | text_general | | yes | yes | yes
rights_holder | text_general | | yes | yes | yes
rights_note | text_general | | yes | yes | yes
rights_date | text_general | | yes | yes | yes
rights_uri | string | | no | yes | yes
source | text_general | | yes | yes | yes
spatial | text_general | | yes | yes | yes
subject | text_general | | yes | yes | yes
temporal | text_general | | yes | yes | yes
transcription | text_general | | yes | yes | yes
title | text_general | Only required field | yes | yes | yes
type | text_general | | yes | yes | yes
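
The *_data fields listed above (campus_data, collection_data, repository_data) pack two values into one string separated by "::". A small sketch of unpacking them follows. The helper split_data_field and the sample value are hypothetical, and the halves are returned in stored order because the documented order differs between fields.

```python
def split_data_field(value):
    """Split a *_data value on the first "::" and return both halves in stored order."""
    first, separator, second = value.partition("::")
    if not separator:
        raise ValueError(f"expected '::' in {value!r}")
    return first, second


# Made-up collection_data value, documented as collection_url::collection_name.
sample = "https://registry.example.org/api/v1/collection/1/::Example Collection"
url, name = split_data_field(sample)
print(url)
print(name)
```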

Collection Registry API

The Collection Registry has a HATEOAS API powered by Tastypie.

HATEOAS APIs attempt to be self-describing. All available endpoints may be discovered from the base URL.

Format

The following formats are supported: ['json', 'jsonp', 'xml', 'yaml', 'plist']
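
Tastypie lets a client choose among these formats with a format query parameter. The sketch below appends it to a URL; with_format is a hypothetical helper, and registry.example.org is a placeholder because the registry's base URL is not given here.

```python
import urllib.parse

SUPPORTED_FORMATS = ("json", "jsonp", "xml", "yaml", "plist")


def with_format(url, fmt="json"):
    """Append Tastypie's format query parameter to a registry URL."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    parts = urllib.parse.urlsplit(url)
    # Keep any existing query parameters and add format at the end.
    query = urllib.parse.parse_qsl(parts.query) + [("format", fmt)]
    return urllib.parse.urlunsplit(parts._replace(query=urllib.parse.urlencode(query)))


# Placeholder URL; substitute the registry's real API root.
print(with_format("https://registry.example.org/api/v1/collection/", "xml"))
```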

Base

Repository

Collection

Campus

Institution JSON

  • Join http://dsc.cdlib.org/institution-json/ with the "ark" value (no trailing slash) from a Repository or Campus record to get the address, phone number, email, etc. from the voro dashboard.
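
The join described above amounts to string concatenation. A minimal sketch, in which institution_json_url is a hypothetical helper and the ARK is made up for illustration:

```python
INSTITUTION_JSON_BASE = "http://dsc.cdlib.org/institution-json/"


def institution_json_url(ark):
    """Join the base URL with an ARK, dropping any trailing slash on the ARK."""
    return INSTITUTION_JSON_BASE + ark.strip().rstrip("/")


# Made-up ARK; use the "ark" value from a Repository or Campus record.
print(institution_json_url("ark:/13030/example/"))
```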

The API is configured in Django in this file.