Extent of content hosted in Nuxeo
Extent statistics for the total file size of objects in Nuxeo are available in the Admin/Aggregate project folder, within Nuxeo itself. The "deduplicated count" columns reflect the total size of truly unique files managed in Nuxeo, which includes 1) Main Content Files and auxiliary/supplemental files directly imported into Nuxeo, and 2) derivative files automatically generated by Nuxeo.
We also maintain logs that record filesize and fixity data for every file in Nuxeo. The logs are available in the Admin project folder within Nuxeo itself, and are divided into individual project folders for each campus library (e.g., Admin/UCR).
Here is an example row of data, from one of the files, with an explanation of the individual data elements:
- name=sim_182_000_011_0010.tif data=http://localhost:8080/nuxeo/nxfile/default/4afa5bfb-d589-4064-803b-76886d8e3d2d/file:content/sim_182_000_011_0010.tif
uid: The first element is the unique identifier for the object, automatically generated and assigned by Nuxeo.
path: This indicates the full path/directory for the file in Nuxeo.
xpath: Nuxeo documents can be thought of as XML documents with a special "binary" node type. This is an XPath in the Nuxeo document that can be used to access the file.
name: This is the source filename for the Main Content File, as ingested into Nuxeo.
data: This URL (with modifications) can be used as the basis to download the file.
md5: This is a checksum for the file. It is also used as the filename for the object within the context of Amazon S3.
size: Filesize, in bytes
size_h: Filesize in human-readable, 1024-based units.
media: This designates the MIME type for the file.
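The fields above can be used to spot-check a downloaded copy of a file against the log. The following is a minimal Python sketch, not a supported tool. It assumes each log row is a series of space-separated key=value pairs, as the example row suggests (a value containing spaces would need different handling), and that you already have a local copy of the file. The function names parse_log_row, md5_of_file, human_size, and fixity_ok are hypothetical and chosen for illustration only.

```python
import hashlib
import os


def parse_log_row(row):
    """Split a 'key=value key=value' row into a dict (assumed row layout)."""
    fields = {}
    # Drop the leading "- " list marker shown in the example row.
    for token in row.lstrip("- ").split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a local file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def human_size(num_bytes):
    """Format a byte count in 1024-based units, similar in spirit to size_h."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    size = float(num_bytes)
    for unit in units:
        if size < 1024 or unit == units[-1]:
            return f"{size:.1f} {unit}"
        size /= 1024


def fixity_ok(fields, local_path):
    """Return True if the local file matches the logged md5 and size."""
    same_size = os.path.getsize(local_path) == int(fields["size"])
    same_md5 = md5_of_file(local_path) == fields["md5"].lower()
    return same_size and same_md5


if __name__ == "__main__":
    row = (
        "- name=sim_182_000_011_0010.tif "
        "data=http://localhost:8080/nuxeo/nxfile/default/"
        "4afa5bfb-d589-4064-803b-76886d8e3d2d/file:content/"
        "sim_182_000_011_0010.tif"
    )
    print(parse_log_row(row)["name"])
```

The size comparison is a quick first check; the md5 comparison is what actually confirms fixity. The exact unit labels used in the size_h column may differ from the ones printed by human_size.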
Number of Published Objects (i.e., harvested to Calisphere)
Extent statistics for the number of objects published in Calisphere are available at https://voro.cdlib.org/calisphere.org/. (And, just in case you're also interested in Usage Stats for your published objects, you can read about how to get those in the Calisphere guide.)
UC Libraries Statistics Schedule F Reporting Tips
Each year, every campus library prepares an annual snapshot of holdings information, which is collocated by CDL on behalf of the UC Libraries. Schedule F, in particular, is used to report digital holdings for a given year. We have the following recommendations for factoring in Nuxeo-based holdings:
Number of collections: Count the number of project folders in Nuxeo (if your project folders have a one-to-one alignment with how your library models "collections"). Alternatively, count the total number of your Nuxeo-based collections that have been published in Calisphere. (If you take the latter approach, we recommend indicating a caveat that you are only counting published collections.)
Megabytes: You can obtain total file size stats in Nuxeo -- see instructions above on obtaining extent information, using the statistics in the Admin/Aggregate project folder.
Items: Contact us to obtain a count of the total number of items in Nuxeo.
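If you would like to cross-check the Megabytes figure against the per-file logs, the arithmetic can be sketched in Python as below. This is an illustration under stated assumptions, not the method used to produce the Admin/Aggregate statistics. It assumes the log rows have already been parsed into dictionaries with "md5" and "size" keys, and it treats two rows with the same md5 as the same file. Whether your report should use decimal megabytes (1,000,000 bytes) or binary megabytes (1,048,576 bytes) is not specified here, so the divisor is left as a parameter. The function names are hypothetical.

```python
def deduplicated_bytes(records):
    """Sum file sizes, counting each distinct md5 only once."""
    size_by_md5 = {}
    for record in records:
        # Rows sharing an md5 are assumed to describe identical content.
        size_by_md5[record["md5"]] = int(record["size"])
    return sum(size_by_md5.values())


def to_megabytes(num_bytes, bytes_per_megabyte=1_000_000):
    """Convert bytes to megabytes; pass 1024 * 1024 for binary megabytes."""
    return num_bytes / bytes_per_megabyte


if __name__ == "__main__":
    # Made-up records for illustration only; not real log data.
    sample = [
        {"md5": "aaa", "size": "1500000"},
        {"md5": "bbb", "size": "500000"},
        {"md5": "aaa", "size": "1500000"},  # duplicate of the first file
    ]
    total = deduplicated_bytes(sample)
    print(to_megabytes(total))
```

For reporting purposes, the statistics in the Admin/Aggregate project folder remain the recommended source; a calculation like this is only useful as a sanity check.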