Add documentation for "gold" Changemaker data #1300

2 changes: 1 addition & 1 deletion README.md
@@ -17,7 +17,7 @@ With the above overview in mind, we can now summarize what this service layer do
- Store **proposals** -- the actual responses to opportunities, with long-term consistency provided via the above-described mapping.
- **Authenticate** users and provide **access control**, so that a given organization's data is only shared with whom that organization has authorized.
- Provide a **programmatic interface** (an [API](https://en.wikipedia.org/wiki/API)) by which authorized users (both changemakers and funders) can **browse**, **search**, and, where appropriate, **update** opportunities and proposals, subject to the access controls defined by data owners.
-- Track the **provenance** and **update history** of all information, noticing and handling discrepancies. For example, if two different [GMS](https://en.wikipedia.org/wiki/Grant_management_software) tools connect to the PDC and provide conflicting information about an application or an applicant, the PDC may be able to pick the right answer automatically (based on an up-to-date date or on some other precedence rule), or it may flag the conflict and require a human to resolve it.
+- Track the **provenance** and **update history** of all information, noticing and handling discrepancies. For example, if two different [GMS](https://en.wikipedia.org/wiki/Grant_management_software) tools connect to the PDC and provide conflicting information about an application or an applicant, the PDC may be able to pick the right answer automatically (based on an up-to-date date or on some other precedence rule), or it may flag the conflict and require a human to resolve it. See [more on Changemaker data](docs/CHANGEMAKER_DATA.md).

Of all these features, the API is probably the most important, because it is the heart of the PDC's interoperability. It enables GMSs and other systems to connect to the PDC to give and receive information about opportunities and proposals. For example, it can enable a second funder to discover a proposal that a changemaker had proposed to some other potential funder originally; it even provides ways for the originally considered funder to deliberately share (assuming the changemaker authorizes) a good proposal with a specific funder that might be more appropriate for it.

59 changes: 59 additions & 0 deletions docs/CHANGEMAKER_DATA.md
@@ -0,0 +1,59 @@
# Changemaker data in the Philanthropy Data Commons (PDC)

Data about Changemakers (such as grant seekers, applicants, and non-profits)
are stored and retrieved via the central PDC instance API.

## Storing Changemaker data

To add a Changemaker to the PDC, use `POST /changemakers`. Note, however, that
this endpoint only registers a Changemaker by tax ID and name. It is not the
endpoint for adding other Changemaker attributes such as location or contacts.
Those attributes are written to the PDC via `POST /proposalVersions` and its
associated endpoints. In other words, proposal data contain much data about
Changemakers, so proposal data are an important source of Changemaker data
within the PDC. As of this writing, proposals are the sole source of
Changemaker data.
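
As a minimal sketch of the registration call described above, the following
Python builds (but does not send) a `POST /changemakers` request. The base URL,
the bearer-token header, and the `taxId` and `name` body keys are assumptions
made for illustration; this document only states that the endpoint registers a
Changemaker by tax ID and name, so check the API specification for the actual
request shape.

```python
import json
from urllib.request import Request


def build_register_changemaker_request(
    base_url: str, token: str, tax_id: str, name: str
) -> Request:
    """Build, without sending, a request that registers a Changemaker."""
    # Assumed body keys: the documentation names only "tax ID" and "name".
    payload = {"taxId": tax_id, "name": name}
    return Request(
        url=base_url.rstrip("/") + "/changemakers",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed authentication scheme (bearer token).
            "Authorization": "Bearer " + token,
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Other
attributes, such as location or contacts, would not go in this body; per the
text above they arrive via `POST /proposalVersions`.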

## Viewing Changemaker data

To see Changemaker data in the PDC, use `GET /changemakers`. This endpoint
retrieves rich data about Changemakers. The back-end service aggregates and
prioritizes these data to present a best-effort ("gold") version of each
Changemaker's attributes from PDC data. For each base field in the PDC that has
more than one associated response value for the given Changemaker, the PDC
returns exactly one prioritized value. Each returned value may come from a
separate data source, such as a proposal to a funder, a data platform provider
(DataProvider in PDC), or the Changemakers themselves. As of this writing,
values come solely from proposals.
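
A small sketch of building the read request, again without sending it. The
base URL and the bearer-token scheme are assumptions, not taken from this
document. The token is optional here because this document notes that
unauthenticated users still get a response, just with an empty list of field
values.

```python
from typing import Optional
from urllib.request import Request


def build_list_changemakers_request(
    base_url: str, token: Optional[str] = None
) -> Request:
    """Build, without sending, a GET /changemakers request."""
    headers = {"Accept": "application/json"}
    if token is not None:
        # Assumed authentication scheme (bearer token).
        headers["Authorization"] = "Bearer " + token
    return Request(
        url=base_url.rstrip("/") + "/changemakers",
        headers=headers,
        method="GET",
    )
```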

### Data prioritization or conflict resolution ("gold" data)

Changemaker data can vary across or within data sources. The PDC automatically
selects the best available data on a field-by-field basis using a heuristic. As
of this writing, the PDC uses the following heuristic:

- only valid data (i.e. well-formatted data) are returned,
- Changemaker-sourced data are better than Funder-sourced data,
- Funder-sourced data are better than DataProvider-sourced data,
- DataProvider-sourced data are better than old PDC (source unknown) data,
- and newer data are better than older data.

The "valid data" rule is a hard filter: no invalid data are returned. The next
three rules choose among categories of sources. The last rule breaks ties when
more than one value comes from the same category of source.

For example, if the same funder posted multiple proposals from a given
Changemaker to the PDC, and these were the only source of data for that
Changemaker, the most recent data values for each base field would be returned.
However, if a Changemaker added a value to the PDC (only a theoretical case as
of this writing), that Changemaker-added value would take priority for that one
base field. In either case, the response may have data from multiple sources
because the prioritization applies separately to each base field that has
associated data values.
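
The heuristic above can be sketched in Python as follows. This is an
illustration only, not the PDC implementation (which is in SQL). The
`FieldValue` type, the source labels, and the base field names used with it are
hypothetical. How two values with the same source category and the same
timestamp are ordered is not specified here; this sketch keeps whichever it
saw first.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Tuple

# Higher number wins; the ordering follows the rules listed above.
SOURCE_PRIORITY = {
    "changemaker": 3,
    "funder": 2,
    "data_provider": 1,
    "unknown": 0,  # old PDC data whose source is unknown
}


@dataclass(frozen=True)
class FieldValue:
    """Hypothetical record of one candidate value for one base field."""
    base_field: str
    value: str
    source: str  # a key of SOURCE_PRIORITY
    created_at: datetime
    is_valid: bool


def _rank(candidate: FieldValue) -> Tuple[int, datetime]:
    # Source category first, then recency as the tie-breaker.
    return (SOURCE_PRIORITY[candidate.source], candidate.created_at)


def select_gold_values(values: Iterable[FieldValue]) -> Dict[str, FieldValue]:
    """Pick at most one value per base field using the heuristic above."""
    gold: Dict[str, FieldValue] = {}
    for candidate in values:
        if not candidate.is_valid:
            continue  # hard filter: invalid data are never returned
        current = gold.get(candidate.base_field)
        if current is None or _rank(candidate) > _rank(current):
            gold[candidate.base_field] = candidate
    return gold
```

Because the selection runs per base field, the resulting dictionary can mix
sources: one field may come from a funder's latest proposal while another comes
from the Changemaker directly.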

For exact prioritization details, see the source code at
`src/database/initialization/changemaker_to_json.sql`.

Only authenticated users may see rich field values. Unauthenticated users will
always see an empty list of field values in the response.

In the future, with finer-grained permissioning, any given user should only see
the values that user has been authorized to see.