Commit e235257

Merge branch 'develop' into 9464-schema-creator-validator
2 parents 866b5ea + 960751f commit e235257

18 files changed: +534 -232 lines changed

conf/solr/9.3.0/solrconfig.xml

Lines changed: 3 additions & 0 deletions
@@ -588,6 +588,7 @@
 check for "Circuit Breakers tripped" in logs and the corresponding error message should tell
 you what transpired (if the failure was caused by tripped circuit breakers).
 -->
+
 <!--
 <str name="memEnabled">true</str>
 <str name="memThreshold">75</str>
@@ -599,10 +600,12 @@
 whether the circuit breaker is enabled and the average load over the last minute at which the
 circuit breaker should start rejecting queries.
 -->
+
 <!--
 <str name="cpuEnabled">true</str>
 <str name="cpuThreshold">75</str>
 -->
+
 </circuitBreaker>
 
 <!-- Request Dispatcher
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+- As of this release, application-side support has been added for the "circuit breaker" mechanism in Solr, which makes it drop requests more gracefully when the search engine is experiencing load issues.
+
+Please see the "Installing Solr" section of the Installation Prerequisites guide.
+

doc/sphinx-guides/source/installation/config.rst

Lines changed: 15 additions & 1 deletion
@@ -2868,7 +2868,6 @@ To enable setting file-level PIDs per collection::
 
 When :AllowEnablingFilePIDsPerCollection is true, setting File PIDs to be enabled/disabled for a given collection can be done via the Native API - see :ref:`collection-attributes-api` in the Native API Guide.
 
-
 .. _:IndependentHandleService:
 
 :IndependentHandleService
@@ -3109,6 +3108,21 @@ If ``:SolrFullTextIndexing`` is set to true, the content of files of any size wi
 
 ``curl -X PUT -d 314572800 http://localhost:8080/api/admin/settings/:SolrMaxFileSizeForFullTextIndexing``
 
+
+.. _:DisableSolrFacets:
+
+:DisableSolrFacets
+++++++++++++++++++
+
+Setting this to ``true`` will make the collection ("dataverse") page start showing search results without the usual search facets on the left side of the page. A message will be shown in that column informing the users that facets are temporarily unavailable. Generating the facets is more resource-intensive for Solr than the main search results themselves, so applying this measure will significantly reduce the load on the search engine when its performance becomes an issue.
+
+This setting can be used in combination with the "circuit breaker" mechanism on the Solr side (see the "Installing Solr" section of the Installation Prerequisites guide). An admin can choose to enable it, or even create an automated system for enabling it in response to Solr beginning to drop incoming requests with the HTTP code 503.
+
+To enable the setting::
+
+  curl -X PUT -d true "http://localhost:8080/api/admin/settings/:DisableSolrFacets"
+
+
 .. _:SignUpUrl:
 
 :SignUpUrl
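
The `:DisableSolrFacets` documentation in this file mentions an automated system that enables the setting once Solr starts rejecting requests with HTTP 503. The Java sketch below shows one way such a watchdog could decide when to flip the switch. The class name `FacetToggleSketch`, the sliding-window size, and the trip ratio are hypothetical and not part of Dataverse; only the settings URL and the PUT of `true` come from the documentation above. Sending `false` to turn facets back on is an assumption.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical watchdog helper; not part of the Dataverse code base. */
class FacetToggleSketch {
    private final Deque<Boolean> window = new ArrayDeque<>();
    private final int windowSize;   // number of recent Solr responses to consider
    private final double tripRatio; // share of 503s in the window that triggers the toggle

    FacetToggleSketch(int windowSize, double tripRatio) {
        this.windowSize = windowSize;
        this.tripRatio = tripRatio;
    }

    /**
     * Records one observed Solr HTTP status code.
     * Returns true when the window is full and the share of 503 responses
     * has reached the trip ratio, i.e. facets should be disabled.
     */
    boolean record(int httpStatus) {
        window.addLast(httpStatus == 503);
        if (window.size() > windowSize) {
            window.removeFirst();
        }
        long rejected = window.stream().filter(b -> b).count();
        return window.size() == windowSize
                && (double) rejected / windowSize >= tripRatio;
    }

    /** Builds the same PUT request as the curl command in the documentation. */
    static HttpRequest settingRequest(String baseUrl, boolean disableFacets) {
        URI uri = URI.create(baseUrl + "/api/admin/settings/:DisableSolrFacets");
        return HttpRequest.newBuilder(uri)
                .PUT(HttpRequest.BodyPublishers.ofString(Boolean.toString(disableFacets)))
                .build();
    }
}
```

A caller would feed `record(...)` with the status codes it observes from Solr and, when it returns `true`, send the request built by `settingRequest("http://localhost:8080", true)` with a `java.net.http.HttpClient`. Requiring a full window before tripping avoids reacting to a single rejected request.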

doc/sphinx-guides/source/installation/prerequisites.rst

Lines changed: 19 additions & 0 deletions
@@ -211,6 +211,25 @@ Finally, you need to tell Solr to create the core "collection1" on startup::
 
   echo "name=collection1" > /usr/local/solr/solr-9.3.0/server/solr/collection1/core.properties
 
+The Dataverse collection ("dataverse") page uses Solr very heavily. On a busy instance this may cause the search engine to become the performance bottleneck, making these pages take increasingly long to load, potentially affecting the overall performance of the application and/or causing Solr itself to crash. If this is observed on your instance, we recommend uncommenting the following lines in the ``<circuitBreaker ...>`` section of the ``solrconfig.xml`` file::
+
+  <str name="memEnabled">true</str>
+  <str name="memThreshold">75</str>
+
+and::
+
+  <str name="cpuEnabled">true</str>
+  <str name="cpuThreshold">75</str>
+
+This will activate the Solr "circuit breaker" mechanisms, which make Solr start dropping incoming requests with the HTTP code 503 when it starts experiencing load issues. As of Dataverse 6.1, the collection page will recognize this condition and display a customizable message to the users informing them that the search engine is unavailable because of heavy load, with the assumption that the condition is transient, and suggesting that they try again later. This is still an inconvenience to the users, but it handles the problem more gracefully than letting the pages time out or causing crashes. You may need to experiment and adjust the threshold values defined in the lines above.
+
+If this becomes a common issue, another temporary workaround an admin may choose to use is to enable the following setting::
+
+  curl -X PUT -d true "http://localhost:8080/api/admin/settings/:DisableSolrFacets"
+
+This will make the collection page show the search results without the usual search facets on the left side of the page. Another customizable message will be shown in that column informing the users that facets are temporarily unavailable. Generating these facets is more resource-intensive for Solr than the main search results themselves, so applying this measure will significantly reduce the load on the search engine.
+
+
 Solr Init Script
 ================
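
Since the guide suggests experimenting with the threshold values, it can help to see what kind of numbers they are compared against. The Java sketch below reads the current JVM heap usage as a percentage and the one-minute system load average through the standard `java.lang.management` API. It is only an illustration: the class `CircuitBreakerSketch` and the method `wouldTrip` are hypothetical, and it assumes (per the `solrconfig.xml` comments and the Solr documentation) that `memThreshold` is a percentage of the maximum heap and that `cpuThreshold` is compared with the load average. It does not reproduce Solr's actual implementation.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Hypothetical illustration of the quantities a circuit breaker could watch. */
class CircuitBreakerSketch {

    /** Current heap usage of this JVM as a percentage of the maximum heap. */
    static double heapUsagePercent() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // getMax() returns -1 when the maximum is undefined; fall back to committed memory.
        long max = heap.getMax() > 0 ? heap.getMax() : heap.getCommitted();
        return 100.0 * heap.getUsed() / max;
    }

    /** System load average over the last minute, or a negative value if unavailable. */
    static double loadAverage() {
        return ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage();
    }

    /** True when a valid (non-negative) observation has reached the threshold. */
    static boolean wouldTrip(double observed, double threshold) {
        return observed >= 0 && observed >= threshold;
    }
}
```

Logging `heapUsagePercent()` and `loadAverage()` on the Solr host during busy periods is one way to pick thresholds that trip before pages start timing out, without rejecting requests under normal load.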

src/main/java/edu/harvard/iq/dataverse/DatasetPage.java

Lines changed: 41 additions & 8 deletions
@@ -754,17 +754,29 @@ public boolean isIndexedVersion() {
         if (isIndexedVersion != null) {
             return isIndexedVersion;
         }
+
+        // Just like on the collection page, facets on the Dataset page can be
+        // disabled instance-wide by an admin:
+        if (settingsWrapper.isTrueForKey(SettingsServiceBean.Key.DisableSolrFacets, false)) {
+            return isIndexedVersion = false;
+        }
+
         // The version is SUPPOSED to be indexed if it's the latest published version, or a
-        // draft. So if none of the above is true, we return false right away:
-
+        // draft. So if none of the above is true, we can return false right away.
         if (!(workingVersion.isDraft() || isThisLatestReleasedVersion())) {
             return isIndexedVersion = false;
         }
-
-        // ... but if it is the latest published version or a draft, we want to test
-        // and confirm that this version *has* actually been indexed and is searchable
-        // (and that solr is actually up and running!), by running a quick solr search:
-        return isIndexedVersion = isThisVersionSearchable();
+        // If this is the latest published version, we want to confirm that this
+        // version was successfully indexed after the last publication
+
+        if (isThisLatestReleasedVersion()) {
+            return isIndexedVersion = (workingVersion.getDataset().getIndexTime() != null)
+                    && workingVersion.getDataset().getIndexTime().after(workingVersion.getReleaseTime());
+        }
+
+        // Drafts don't have the indextime stamps set/incremented when indexed,
+        // so we'll just assume it is indexed, and will then hope for the best.
+        return isIndexedVersion = true;
     }
 
     /**
@@ -820,8 +832,18 @@ public List<FacetLabel> getFileTagsFacetLabels() {
     /**
      * Verifies that solr is running and that the version is indexed and searchable
      * @return boolean
-     */
+     * Commenting out this method for now, since we have decided it was not
+     * necessary to query solr just to figure out if we can query solr. We will
+     * rely solely on the latest-released status and the indexed timestamp from
+     * the database for that. - L.A.
+     *
     public boolean isThisVersionSearchable() {
+        // Just like on the collection page, facets on the Dataset page can be
+        // disabled instance-wide by an admin:
+        if (settingsWrapper.isTrueForKey(SettingsServiceBean.Key.DisableSolrFacets, false)) {
+            return false;
+        }
+
         SolrQuery solrQuery = new SolrQuery();
 
         solrQuery.setQuery(SearchUtil.constructQuery(SearchFields.ENTITY_ID, workingVersion.getDataset().getId().toString()));
@@ -856,6 +878,7 @@ public boolean isThisVersionSearchable() {
 
         return false;
     }
+    */
 
     /**
      * Finds the list of numeric datafile ids in the Version specified, by running
@@ -967,10 +990,19 @@ public Set<Long> getFileIdsInVersionFromSolr(Long datasetVersionId, String patte
             logger.fine("Remote Solr Exception: " + ex.getLocalizedMessage());
             String msg = ex.getLocalizedMessage();
             if (msg.contains(SearchFields.FILE_DELETED)) {
+                // This is a backward compatibility hook put in place many versions
+                // ago, to accommodate instances running Solr with schemas that
+                // don't include this flag yet. Running Solr with an up-to-date
+                // schema has been a hard requirement for a while now; should we
+                // remove it at this point? - L.A.
                 fileDeletedFlagNotIndexed = true;
+            } else {
+                isIndexedVersion = false;
+                return resultIds;
             }
         } catch (Exception ex) {
             logger.warning("Solr exception: " + ex.getLocalizedMessage());
+            isIndexedVersion = false;
             return resultIds;
         }

@@ -983,6 +1015,7 @@ public Set<Long> getFileIdsInVersionFromSolr(Long datasetVersionId, String patte
                 queryResponse = solrClientService.getSolrClient().query(solrQuery);
             } catch (Exception ex) {
                 logger.warning("Caught a Solr exception (again!): " + ex.getLocalizedMessage());
+                isIndexedVersion = false;
                 return resultIds;
             }
         }
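
The revised `isIndexedVersion()` in the first hunk of this file replaces a live Solr query with a decision based on database state alone. As a reading aid, the sketch below restates that decision as a standalone function. `IndexedVersionSketch` and its parameter names are hypothetical stand-ins for the page bean's fields, and `java.time.Instant` is used here instead of the timestamp type in the real code.

```java
import java.time.Instant;

/** Hypothetical standalone restatement of the decision made in isIndexedVersion(). */
class IndexedVersionSketch {

    static boolean isIndexed(boolean facetsDisabled,
                             boolean isDraft,
                             boolean isLatestReleased,
                             Instant indexTime,
                             Instant releaseTime) {
        // An admin has switched facets off instance-wide: behave as if not indexed.
        if (facetsDisabled) {
            return false;
        }
        // Only drafts and the latest released version are expected to be indexed.
        if (!(isDraft || isLatestReleased)) {
            return false;
        }
        // For the latest released version, the index timestamp must be newer than
        // the release timestamp (releaseTime is assumed non-null for a released version).
        if (isLatestReleased) {
            return indexTime != null && indexTime.isAfter(releaseTime);
        }
        // Drafts carry no reliable index timestamp, so they are assumed indexed.
        return true;
    }
}
```

The trade-off visible here is that a draft is always reported as indexed, which is why the later hunks reset `isIndexedVersion` to `false` whenever an actual Solr call fails.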

src/main/java/edu/harvard/iq/dataverse/GuestbookPage.java

Lines changed: 10 additions & 8 deletions
@@ -288,19 +288,21 @@ public String save() {
 
         Command<Dataverse> cmd;
         try {
+            // Per recent #dv-tech conversation w/ Jim - copying the code
+            // below from his QDR branch; the code that used to be here called
+            // UpdateDataverseCommand when saving new guestbooks, and that involved
+            // an unnecessary reindexing of the dataverse (and, in some cases,
+            // reindexing of the underlying datasets). - L.A.
             if (editMode == EditMode.CREATE || editMode == EditMode.CLONE ) {
                 guestbook.setCreateTime(new Timestamp(new Date().getTime()));
-                guestbook.setUsageCount(new Long(0));
+                guestbook.setUsageCount(Long.valueOf(0));
                 guestbook.setEnabled(true);
                 dataverse.getGuestbooks().add(guestbook);
-                cmd = new UpdateDataverseCommand(dataverse, null, null, dvRequestService.getDataverseRequest(), null);
-                commandEngine.submit(cmd);
                 create = true;
-            } else {
-                cmd = new UpdateDataverseGuestbookCommand(dataverse, guestbook, dvRequestService.getDataverseRequest());
-                commandEngine.submit(cmd);
-            }
-
+            }
+            cmd = new UpdateDataverseGuestbookCommand(dataverse, guestbook, dvRequestService.getDataverseRequest());
+            commandEngine.submit(cmd);
+
         } catch (EJBException ex) {
             StringBuilder error = new StringBuilder();
             error.append(ex).append(" ");

src/main/java/edu/harvard/iq/dataverse/api/Info.java

Lines changed: 8 additions & 14 deletions
@@ -1,6 +1,5 @@
 package edu.harvard.iq.dataverse.api;
 
-import edu.harvard.iq.dataverse.api.auth.AuthRequired;
 import edu.harvard.iq.dataverse.settings.JvmSettings;
 import edu.harvard.iq.dataverse.settings.SettingsServiceBean;
 import edu.harvard.iq.dataverse.util.SystemConfig;
@@ -9,8 +8,6 @@
 import jakarta.json.JsonValue;
 import jakarta.ws.rs.GET;
 import jakarta.ws.rs.Path;
-import jakarta.ws.rs.container.ContainerRequestContext;
-import jakarta.ws.rs.core.Context;
 import jakarta.ws.rs.core.Response;
 
 @Path("info")
@@ -35,30 +32,27 @@ public Response getMaxEmbargoDurationInMonths() {
     }
 
     @GET
-    @AuthRequired
     @Path("version")
-    public Response getInfo(@Context ContainerRequestContext crc) {
+    public Response getInfo() {
         String versionStr = systemConfig.getVersion(true);
         String[] comps = versionStr.split("build",2);
         String version = comps[0].trim();
         JsonValue build = comps.length > 1 ? Json.createArrayBuilder().add(comps[1].trim()).build().get(0) : JsonValue.NULL;
-
-        return response( req -> ok( Json.createObjectBuilder().add("version", version)
-                .add("build", build)), getRequestUser(crc));
+        return ok(Json.createObjectBuilder()
+                .add("version", version)
+                .add("build", build));
     }
 
     @GET
-    @AuthRequired
     @Path("server")
-    public Response getServer(@Context ContainerRequestContext crc) {
-        return response( req -> ok(JvmSettings.FQDN.lookup()), getRequestUser(crc));
+    public Response getServer() {
+        return ok(JvmSettings.FQDN.lookup());
     }
 
     @GET
-    @AuthRequired
     @Path("apiTermsOfUse")
-    public Response getTermsOfUse(@Context ContainerRequestContext crc) {
-        return response( req -> ok(systemConfig.getApiTermsOfUse()), getRequestUser(crc));
+    public Response getTermsOfUse() {
+        return ok(systemConfig.getApiTermsOfUse());
     }
 
     @GET

src/main/java/edu/harvard/iq/dataverse/api/Search.java

Lines changed: 3 additions & 1 deletion
@@ -157,7 +157,9 @@ public Response search(
                     numResultsPerPage,
                     true, //SEK get query entities always for search API additional Dataset Information 6300 12/6/2019
                     geoPoint,
-                    geoRadius
+                    geoRadius,
+                    showFacets, // facets are expensive, no need to ask for them if not requested
+                    showRelevance // no need for highlights unless requested either
                 );
         } catch (SearchException ex) {
             Throwable cause = ex;

src/main/java/edu/harvard/iq/dataverse/engine/command/impl/UpdateDataverseCommand.java

Lines changed: 26 additions & 5 deletions
@@ -32,6 +32,8 @@ public class UpdateDataverseCommand extends AbstractCommand<Dataverse> {
     private final List<DatasetFieldType> facetList;
     private final List<Dataverse> featuredDataverseList;
     private final List<DataverseFieldTypeInputLevel> inputLevelList;
+
+    private boolean datasetsReindexRequired = false;
 
     public UpdateDataverseCommand(Dataverse editedDv, List<DatasetFieldType> facetList, List<Dataverse> featuredDataverseList,
             DataverseRequest aRequest, List<DataverseFieldTypeInputLevel> inputLevelList ) {
@@ -74,9 +76,13 @@ public Dataverse execute(CommandContext ctxt) throws CommandException {
             }
         }
 
-        DataverseType oldDvType = ctxt.dataverses().find(editedDv.getId()).getDataverseType();
-        String oldDvAlias = ctxt.dataverses().find(editedDv.getId()).getAlias();
-        String oldDvName = ctxt.dataverses().find(editedDv.getId()).getName();
+        Dataverse oldDv = ctxt.dataverses().find(editedDv.getId());
+
+        DataverseType oldDvType = oldDv.getDataverseType();
+        String oldDvAlias = oldDv.getAlias();
+        String oldDvName = oldDv.getName();
+        oldDv = null;
+
         Dataverse result = ctxt.dataverses().save(editedDv);
 
         if ( facetList != null ) {
@@ -101,6 +107,14 @@ public Dataverse execute(CommandContext ctxt) throws CommandException {
             }
         }
 
+        // We don't want to reindex the children datasets unnecessarily:
+        // When these values are changed we need to reindex all children datasets
+        // This check is not recursive as all the values just report the immediate parent
+        if (!oldDvType.equals(editedDv.getDataverseType())
+                || !oldDvName.equals(editedDv.getName())
+                || !oldDvAlias.equals(editedDv.getAlias())) {
+            datasetsReindexRequired = true;
+        }
 
         return result;
     }
@@ -110,9 +124,16 @@ public boolean onSuccess(CommandContext ctxt, Object r) {
 
         // first kick off async index of datasets
         // TODO: is this actually needed? Is there a better way to handle
+        // It appears that we at some point lost some extra logic here, where
+        // we only reindex the underlying datasets if one or more of the specific set
+        // of fields have been changed (since these values are included in the
+        // indexed solr documents for datasets). So I'm putting that back. -L.A.
         Dataverse result = (Dataverse) r;
-        List<Dataset> datasets = ctxt.datasets().findByOwnerId(result.getId());
-        ctxt.index().asyncIndexDatasetList(datasets, true);
+
+        if (datasetsReindexRequired) {
+            List<Dataset> datasets = ctxt.datasets().findByOwnerId(result.getId());
+            ctxt.index().asyncIndexDatasetList(datasets, true);
+        }
 
         return ctxt.dataverses().index((Dataverse) r);
     }
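
The reindex condition added to `execute()` compares three values captured before the save with the edited collection. The sketch below isolates that comparison so it can be read and tested on its own. `ReindexCheckSketch` and `DvSnapshot` are hypothetical names, the collection type is represented as a plain string, and `Objects.equals` is used as a null-safe variant of the `.equals` calls in the diff, which assume the old values are never null.

```java
import java.util.Objects;

/** Hypothetical isolation of the "do child datasets need reindexing?" check. */
class ReindexCheckSketch {

    /** The three collection attributes that end up in the datasets' Solr documents. */
    static final class DvSnapshot {
        final String type;
        final String name;
        final String alias;

        DvSnapshot(String type, String name, String alias) {
            this.type = type;
            this.name = name;
            this.alias = alias;
        }
    }

    /** True when any of the indexed parent attributes differs between the two snapshots. */
    static boolean datasetsReindexRequired(DvSnapshot before, DvSnapshot after) {
        return !Objects.equals(before.type, after.type)
                || !Objects.equals(before.name, after.name)
                || !Objects.equals(before.alias, after.alias);
    }
}
```

As the comment in the diff notes, the check is not recursive: it only concerns the immediate children of the edited collection, so an edit that touches none of these three values skips the dataset reindex entirely.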

src/main/java/edu/harvard/iq/dataverse/mydata/DataRetrieverAPI.java

Lines changed: 6 additions & 2 deletions
@@ -39,7 +39,6 @@
 import jakarta.ws.rs.Path;
 import jakarta.ws.rs.Produces;
 import jakarta.ws.rs.QueryParam;
-import jakarta.ws.rs.DefaultValue;
 import jakarta.ws.rs.container.ContainerRequestContext;
 import jakarta.ws.rs.core.Context;
 
@@ -226,7 +225,12 @@ private SolrQueryResponse getTotalCountsFromSolr(DataverseRequest dataverseReque
                     //SearchFields.RELEASE_OR_CREATE_DATE, SortBy.DESCENDING,
                     0, //paginationStart,
                     true, // dataRelatedToMe
-                    SearchConstants.NUM_SOLR_DOCS_TO_RETRIEVE //10 // SearchFields.NUM_SOLR_DOCS_TO_RETRIEVE
+                    SearchConstants.NUM_SOLR_DOCS_TO_RETRIEVE, //10 // SearchFields.NUM_SOLR_DOCS_TO_RETRIEVE
+                    true,
+                    null,
+                    null,
+                    false, // no need to request facets here ...
+                    false // ... same for highlights
                 );
         } catch (SearchException ex) {
             logger.severe("Search for total counts failed with filter query");

src/main/java/edu/harvard/iq/dataverse/search/IndexServiceBean.java

Lines changed: 1 addition & 0 deletions
@@ -420,6 +420,7 @@ public void asyncIndexDataset(Dataset dataset, boolean doNormalSolrDocCleanUp) {
         }
     }
 
+    @Asynchronous
     public void asyncIndexDatasetList(List<Dataset> datasets, boolean doNormalSolrDocCleanUp) {
         for(Dataset dataset : datasets) {
             asyncIndexDataset(dataset, true);
