Provenance in APPLAUSE DR3
Provenance is defined as “the fact of coming from some particular source or quarter; source, derivation” in the Oxford English Dictionary. The term was originally used mostly in relation to works of art, but is now used in a similar sense in a wide range of fields, including science and computing. For the APPLAUSE archive, provenance information describes the path from the plate, taken at a particular time and location, to the extracted sources we offer in our source_calib or lightcurve tables, or at least as much information about the steps involved as is available, including the scanning process, the software used, and the organisations and people involved. Provenance is closely related to the workflow that creates the digital objects.
All example queries below use ‘postgres’ as the query dialect. You can copy and paste them directly into the APPLAUSE query interface.
Example 1: Find out which sources, plates and corresponding scans (with their previews) were involved in the construction of the lightcurve with the UCAC4 id 614-089373:
SELECT lc.source_id, lc.scan_id, lc.plate_id, preview.filename
FROM APPLAUSE_DR3.lightcurve AS lc
INNER JOIN APPLAUSE_DR3.plate AS pl ON lc.plate_id = pl.plate_id
INNER JOIN APPLAUSE_DR3.preview AS preview ON preview.plate_id = pl.plate_id
WHERE lc.ucac4_id = '614-089373'
The result table lists the source ids, plate ids and scan ids, together with the filenames of the previews for the scans.
APPLAUSE DR3 stores the workflow/provenance information alongside the data. Each scan is tied to its plate by the pair (scan_id, plate_id), and each process is likewise connected to a (scan_id, plate_id) pair. The plates are organised in collections, called archives, so each plate also records which archive it belongs to.
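The linkage described above can be tried out on a toy model before querying the archive itself. The following is only a minimal sketch using Python's built-in sqlite3 module: the table and column names mirror those used in the example queries on this page, but the reduced schema, the helper name provenance_chain and all rows (ids and version strings) are invented for illustration and do not reflect the real APPLAUSE DR3 tables.

```python
import sqlite3

# Toy in-memory model of the plate -> scan -> process relationships.
# Schema is a simplified assumption; every row below is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE plate   (plate_id INTEGER PRIMARY KEY, archive_id INTEGER);
CREATE TABLE scan    (scan_id INTEGER PRIMARY KEY, plate_id INTEGER);
CREATE TABLE process (process_id INTEGER PRIMARY KEY, scan_id INTEGER,
                      plate_id INTEGER, pyplate_version TEXT);
INSERT INTO plate   VALUES (1, 100), (2, 100);
INSERT INTO scan    VALUES (10, 1), (11, 1), (20, 2);
INSERT INTO process VALUES (500, 10, 1, '3.0'), (501, 11, 1, '3.0'),
                           (502, 11, 1, '4.0'), (503, 20, 2, '4.0');
""")


def provenance_chain(plate_id):
    """Return (archive_id, scan_id, process_id, pyplate_version) rows for one plate."""
    query = """
        SELECT pl.archive_id, sc.scan_id, pr.process_id, pr.pyplate_version
        FROM plate AS pl
        INNER JOIN scan AS sc ON sc.plate_id = pl.plate_id
        -- a process is matched on the full (scan_id, plate_id) pair
        INNER JOIN process AS pr
                ON pr.scan_id = sc.scan_id AND pr.plate_id = sc.plate_id
        WHERE pl.plate_id = ?
        ORDER BY sc.scan_id, pr.process_id
    """
    return conn.execute(query, (plate_id,)).fetchall()


for row in provenance_chain(1):
    print(row)
```

In this toy data, scan 11 was processed twice, so it appears in two rows; the same pattern is what lets a query distinguish repeated processing runs of one scan.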
Example 2: Which version of PyPlate was used on plate number 2180 and on the scans taken from this plate:
SELECT DISTINCT pr.scan_id, pr.process_id, pr.pyplate_version, pl.archive_id
FROM applause_dr3.plate AS pl
INNER JOIN applause_dr3.process AS pr ON pr.plate_id = pl.plate_id
WHERE pl.plate_id = 2180
Example 3: Get all information on plate 2180 from scanning to source extraction and crossmatching:
SELECT sc.plate_id, sc.scan_id AS scan, sc."scan_date" AS scanned_on,
sc.filename_scan, pr.pyplate_version AS pyplate, pr.timestamp_end AS processed_on
FROM applause_dr3.scan AS sc
INNER JOIN applause_dr3.process AS pr ON pr.scan_id = sc.scan_id
WHERE sc.plate_id = 2180