Commit b96f2fd

namedgraph and claude authored
Fix connection pool exhaustion from proxy requests (#292)
* Fix connection pool exhaustion from ?uri= proxy requests (#287)

  - Add 30 s connectionRequestTimeout to both HTTP client builders in Application so pool exhaustion fails fast instead of blocking forever
  - Replace allMatch(HTMLMediaTypePredicate) with Request.selectVariant() in ProxyRequestFilter so real browser Accept headers (text/html, application/xml;q=0.9, */*;q=0.8) correctly trigger the early return, leaving (X)HTML responses to the downstream handler and Varnish cache
  - In client.xsl ldh:rdf-document-response, detect external ?uri= URIs and replace-content on #content-body with bs2:Row rendering of the fetched RDF instead of iterating stale home-page blocks

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Server-side condition

* Server-side progress bar

* Fix external URI proxy bypass and client-side rendering

  - ProxyRequestFilter: use Core-only MediaTypes (no HTML) with combined Model+ResultSet writable variant list; selectVariant==null is the sole bypass signal so Accept:*/* correctly reaches the proxy instead of falling through to the HTML handler
  - Thread pre-computed Variant through all getResponse() overloads to avoid a second selectVariant call inside Core's Response constructor
  - client.xsl onsubmit: skip the XHTML round-trip for external URIs and call PushState + RDFDocumentLoad directly, advancing the progress bar to 66% between the two steps; fixes the double-click issue
  - client.xsl ldh:rdf-document-response: respect the #layout-modes mode selector for client-side rendered external resources; refactor the duplicate id('content-body') lookup out of both xsl:choose branches
  - ProxyRequestFilterTest: stub Request.selectVariant() to return a non-null Variant so both tests reach the logic they exercise

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* ProxyRequestFilter: document HTML bypass rationale; cache MediaTypes instance

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* ProxyRequestFilter: clarify HTML bypass as resource exhaustion defence

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Make HTTP client connectionRequestTimeout configurable

  Defaults to 30000 ms (via Dockerfile ENV). Passed through the CATALINA_OPTS path (same as allowInternalUrls) to avoid exceeding the ~30-param libxslt limit already reached by context.xsl.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix ProxyRequestFilter HTML bypass: check Accept header explicitly

  Replace the selectVariant==null bypass with an explicit check for non-wildcard text/html or application/xhtml+xml in the Accept header. Browsers list these types explicitly (q=1.0) and get bypassed to the app shell; API clients that send only */* reach the proxy.

  The old approach (Core MediaTypes, selectVariant==null) failed for browsers because their */*;q=0.8 wildcard matched RDF variants, causing the proxy to return RDF instead of the (X)HTML app shell.

  Add testHtmlAcceptBypassesProxy to cover the bypass path.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix ContentMode block rendering for proxied external resources

  Proxied resources' ContentMode blocks (charts, maps) were querying the local SPARQL endpoint instead of the remote one because ProxyRequestFilter discarded all external response headers and ResponseHeadersFilter then injected the local sd:endpoint Link.

  - ApplicationFilter: register external ?uri= target in request context (AC.uri property) as authoritative proxy marker
  - ProxyRequestFilter: forward all Link headers from external response
  - ResponseHeadersFilter: skip local sd:endpoint/ldt:ontology/ac:stylesheet for proxy requests; removes now-unused parseLinkHeaderValues/getLinksByRel
  - client.xsl (ldh:rdf-document-response): extract sd:endpoint from Link header and store in LinkedDataHub.endpoint, mirroring acl:mode pattern
  - functions.xsl (sd:endpoint()): return LinkedDataHub.endpoint when set, fall back to local /sparql — no changes needed in view.xsl or chart.xsl
  - CLAUDE.md: document the proxy/client-side rendering architecture

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Replace proxy-detection heuristics with ac:uri() / $ac:uri throughout

  - ApplicationFilter: store external URI as AC.uri context property; strip ?uri= from UriInfo
  - ProxyRequestFilter: read proxy target from AC.uri context property; bypass HTML requests
  - XsltExecutableFilter: remove SYSTEM_ID_PROPERTY; XSLTWriterBase reads AC.uri directly
  - XSLTWriterBase: pass $ac:uri to server-side XSLT when proxying
  - layout.xsl: declare $ac:uri param; use it for export links and search input pre-fill
  - document.xsl: remove proxy spinner branch from bs2:ContentBody
  - client/functions.xsl: add ac:uri() function (dynamic read of ixsl:query-params()?uri); ldh:base-uri() now calls ac:uri() instead of stale global $ac:uri
  - client.xsl: drop global $ac:uri param; ldh:HTMLDocumentLoaded passes ldh:base-uri(.) to ldh:RDFDocumentLoad after pushState so URL is already updated
  - ProxyRequestFilterTest: update mocks to use AC.uri context property

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Add server-side ac:uri() function; refactor ActionBar templates into document.xsl

  - Add ac:uri() server-side function to imports/default.xsl (mirrors acl:mode() pattern)
  - Move ActionBarLeft/ActionBarMain/ActionBarRight/BreadCrumbBar/ModeList/MediaTypeList templates from layout.xsl to document.xsl
  - Fix $effective-mode type error (xs:string → xs:anyURI) and simplify with [1] idiom
  - Use ac:uri() instead of $ac:uri in MediaTypeList hrefs

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent fdd5a72 commit b96f2fd

17 files changed, +644 -502 lines changed
CLAUDE.md

Lines changed: 22 additions & 2 deletions
@@ -91,8 +91,28 @@ The application runs as a multi-container setup:
 1. Requests come through nginx proxy
 2. Varnish provides caching layer
 3. LinkedDataHub application handles business logic
-4. Data persisted to appropriate Fuseki triplestore
-5. XSLT transforms data for client presentation
+4. RDF data is read/written via the **Graph Store Protocol** — each document in the hierarchy corresponds to a named graph in the triplestore; the document URI is the graph name
+5. Data persisted to appropriate Fuseki triplestore
+6. XSLT transforms data for client presentation
+
+### Linked Data Proxy and Client-Side Rendering
+
+LDH includes a Linked Data proxy that dereferences external URIs on behalf of the browser. The original design rendered proxied resources identically to local ones — server-side RDF fetch + XSLT. This created a DDoS/resource-exhaustion vector: scraper bots routing arbitrary external URIs through the proxy would trigger a full server-side pipeline (HTTP fetch → XSLT rendering) per request, exhausting HTTP connection pools and CPU.
+
+The current design splits rendering by request origin:
+
+- **Browser requests** (`Accept: text/html`): `ProxyRequestFilter` bypasses the proxy entirely. The server returns the local application shell. Saxon-JS then issues a second, RDF-typed request (`Accept: application/rdf+xml`) from the browser.
+- **RDF requests** (API clients, Saxon-JS second pass): `ProxyRequestFilter` fetches the external RDF, parses it, and returns it to the caller. No XSLT happens server-side.
+- **Client-side rendering**: Saxon-JS receives the raw RDF and applies the same XSLT 3 templates used server-side (shared stylesheet), so proxied resources look almost identical to local ones.
+
+Key implementation files:
+- `ProxyRequestFilter.java` — intercepts `?uri=` and `lapp:Dataset` proxy requests; HTML bypass; forwards external `Link` headers
+- `ApplicationFilter.java` — registers external proxy target URI in request context (`AC.uri` property) as authoritative proxy marker
+- `ResponseHeadersFilter.java` — skips local-only hypermedia links (`sd:endpoint`, `ldt:ontology`, `ac:stylesheet`) for proxy requests; external ones are forwarded by `ProxyRequestFilter`
+- `client.xsl` (`ldh:rdf-document-response`) — receives the RDF proxy response client-side; extracts `sd:endpoint` from `Link` header; stores it in `LinkedDataHub.endpoint`
+- `functions.xsl` (`sd:endpoint()`) — returns `LinkedDataHub.endpoint` when set (external proxy), otherwise falls back to the local SPARQL endpoint
+
+The SPARQL endpoint forwarding chain ensures ContentMode blocks (charts, maps) query the **remote** app's SPARQL endpoint, not the local one. `LinkedDataHub.endpoint` is reset to the local endpoint by `ldh:HTMLDocumentLoaded` on every HTML page navigation, so there is no stale state when navigating back to local documents.
 
 ### Key Extension Points
 - **Vocabulary definitions** in `com.atomgraph.linkeddatahub.vocabulary`
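
The browser/API split described in the CLAUDE.md addition above hinges on whether the `Accept` header names an HTML type explicitly, as opposed to matching only through a wildcard. The following is a minimal sketch of such a check, assuming plain string parsing rather than whatever media type classes the real `ProxyRequestFilter` uses. The class and method names (`HtmlAcceptSketch`, `explicitlyAcceptsHtml`) are hypothetical, and the sketch deliberately ignores `q=0` exclusions.

```java
import java.util.Locale;

/** Hypothetical helper for illustration; not the actual ProxyRequestFilter code. */
final class HtmlAcceptSketch
{
    /**
     * Returns true if the Accept header lists text/html or application/xhtml+xml
     * explicitly. Wildcards such as *&#47;* or text/* do not count.
     */
    static boolean explicitlyAcceptsHtml(String acceptHeader)
    {
        if (acceptHeader == null) return false;

        for (String range : acceptHeader.split(","))
        {
            // drop parameters such as ";q=0.9" and normalise case before comparing
            String type = range.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
            if (type.equals("text/html") || type.equals("application/xhtml+xml")) return true;
        }

        return false;
    }
}
```

With a typical browser header such as `text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8` the sketch returns `true` (bypass to the app shell); with `*/*` alone it returns `false`, so the request would reach the proxy.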

Dockerfile

Lines changed: 2 additions & 0 deletions
@@ -109,6 +109,8 @@ ENV MAX_TOTAL_CONN=40
 
 ENV MAX_REQUEST_RETRIES=3
 
+ENV CONNECTION_REQUEST_TIMEOUT=30000
+
 ENV IMPORT_KEEPALIVE=
 
 ENV MAX_IMPORT_THREADS=10

docker-compose.yml

Lines changed: 1 addition & 0 deletions
@@ -65,6 +65,7 @@ services:
       - SIGN_UP_CERT_VALIDITY=180
       - MAX_CONTENT_LENGTH=${MAX_CONTENT_LENGTH:-2097152}
       - ALLOW_INTERNAL_URLS=${ALLOW_INTERNAL_URLS:-}
+      - CONNECTION_REQUEST_TIMEOUT=${CONNECTION_REQUEST_TIMEOUT:-}
       - NOTIFICATION_ADDRESS=LinkedDataHub <notifications@localhost>
       - MAIL_SMTP_HOST=email-server
       - MAIL_SMTP_PORT=25

platform/entrypoint.sh

Lines changed: 4 additions & 0 deletions
@@ -1037,6 +1037,10 @@ if [ -n "$ALLOW_INTERNAL_URLS" ]; then
     export CATALINA_OPTS="$CATALINA_OPTS -Dcom.atomgraph.linkeddatahub.allowInternalUrls=$ALLOW_INTERNAL_URLS"
 fi
 
+if [ -n "$CONNECTION_REQUEST_TIMEOUT" ]; then
+    export CATALINA_OPTS="$CATALINA_OPTS -Dcom.atomgraph.linkeddatahub.connectionRequestTimeout=$CONNECTION_REQUEST_TIMEOUT"
+fi
+
 if [ -n "$MAX_CONTENT_LENGTH" ]; then
     MAX_CONTENT_LENGTH_PARAM="--stringparam ldhc:maxContentLength '$MAX_CONTENT_LENGTH' "
 fi

src/main/java/com/atomgraph/linkeddatahub/Application.java

Lines changed: 21 additions & 10 deletions
@@ -214,6 +214,7 @@
 import org.apache.http.HttpClientConnection;
 import org.apache.http.HttpHost;
 import org.apache.http.client.HttpRequestRetryHandler;
+import org.apache.http.client.config.RequestConfig;
 import org.apache.http.config.Registry;
 import org.apache.http.config.RegistryBuilder;
 import org.apache.http.conn.socket.ConnectionSocketFactory;
@@ -358,6 +359,8 @@ public Application(@Context ServletConfig servletConfig) throws URISyntaxExcepti
             servletConfig.getServletContext().getInitParameter(LDHC.maxConnPerRoute.getURI()) != null ? Integer.valueOf(servletConfig.getServletContext().getInitParameter(LDHC.maxConnPerRoute.getURI())) : null,
             servletConfig.getServletContext().getInitParameter(LDHC.maxTotalConn.getURI()) != null ? Integer.valueOf(servletConfig.getServletContext().getInitParameter(LDHC.maxTotalConn.getURI())) : null,
             servletConfig.getServletContext().getInitParameter(LDHC.maxRequestRetries.getURI()) != null ? Integer.valueOf(servletConfig.getServletContext().getInitParameter(LDHC.maxRequestRetries.getURI())) : null,
+            System.getProperty("com.atomgraph.linkeddatahub.connectionRequestTimeout") != null ? Integer.valueOf(System.getProperty("com.atomgraph.linkeddatahub.connectionRequestTimeout")) :
+                servletConfig.getServletContext().getInitParameter(LDHC.connectionRequestTimeout.getURI()) != null ? Integer.valueOf(servletConfig.getServletContext().getInitParameter(LDHC.connectionRequestTimeout.getURI())) : null,
             servletConfig.getServletContext().getInitParameter(LDHC.maxImportThreads.getURI()) != null ? Integer.valueOf(servletConfig.getServletContext().getInitParameter(LDHC.maxImportThreads.getURI())) : null,
             servletConfig.getServletContext().getInitParameter(LDHC.notificationAddress.getURI()) != null ? servletConfig.getServletContext().getInitParameter(LDHC.notificationAddress.getURI()) : null,
             servletConfig.getServletContext().getInitParameter(LDHC.supportedLanguages.getURI()) != null ? servletConfig.getServletContext().getInitParameter(LDHC.supportedLanguages.getURI()) : null,
@@ -445,7 +448,7 @@ public Application(final ServletConfig servletConfig, final MediaTypes mediaType
             final String baseURIString, final String proxyScheme, final String proxyHostname, final Integer proxyPort,
             final String uploadRootString, final boolean invalidateCache,
             final Integer cookieMaxAge, final boolean enableLinkedDataProxy, final boolean allowInternalUrls, final Integer maxContentLength,
-            final Integer maxConnPerRoute, final Integer maxTotalConn, final Integer maxRequestRetries, final Integer maxImportThreads,
+            final Integer maxConnPerRoute, final Integer maxTotalConn, final Integer maxRequestRetries, final Integer connectionRequestTimeout, final Integer maxImportThreads,
             final String notificationAddressString, final String supportedLanguageCodes, final boolean enableWebIDSignUp, final String oidcRefreshTokensPropertiesPath,
             final String frontendProxyString, final String backendProxyAdminString, final String backendProxyEndUserString,
             final String mailUser, final String mailPassword, final String smtpHost, final String smtpPort,
@@ -709,10 +712,10 @@ public Application(final ServletConfig servletConfig, final MediaTypes mediaType
                 trustStore.load(trustStoreInputStream, clientTrustStorePassword.toCharArray());
             }
 
-            client = getClient(keyStore, clientKeyStorePassword, trustStore, maxConnPerRoute, maxTotalConn, null, false);
-            externalClient = getClient(keyStore, clientKeyStorePassword, trustStore, maxConnPerRoute, maxTotalConn, null, false);
-            importClient = getClient(keyStore, clientKeyStorePassword, trustStore, maxConnPerRoute, maxTotalConn, maxRequestRetries, true);
-            noCertClient = getNoCertClient(trustStore, maxConnPerRoute, maxTotalConn, maxRequestRetries);
+            client = getClient(keyStore, clientKeyStorePassword, trustStore, maxConnPerRoute, maxTotalConn, null, false, connectionRequestTimeout);
+            externalClient = getClient(keyStore, clientKeyStorePassword, trustStore, maxConnPerRoute, maxTotalConn, null, false, connectionRequestTimeout);
+            importClient = getClient(keyStore, clientKeyStorePassword, trustStore, maxConnPerRoute, maxTotalConn, maxRequestRetries, true, connectionRequestTimeout);
+            noCertClient = getNoCertClient(trustStore, maxConnPerRoute, maxTotalConn, maxRequestRetries, connectionRequestTimeout);
 
             if (maxContentLength != null)
             {
@@ -1527,7 +1530,7 @@ public void submitImport(RDFImport rdfImport, com.atomgraph.linkeddatahub.apps.m
      * @throws UnrecoverableKeyException key loading error
      * @throws KeyManagementException key loading error
      */
-    public static Client getClient(KeyStore keyStore, String keyStorePassword, KeyStore trustStore, Integer maxConnPerRoute, Integer maxTotalConn, Integer maxRequestRetries, boolean buffered) throws NoSuchAlgorithmException, KeyStoreException, UnrecoverableKeyException, KeyManagementException
+    public static Client getClient(KeyStore keyStore, String keyStorePassword, KeyStore trustStore, Integer maxConnPerRoute, Integer maxTotalConn, Integer maxRequestRetries, boolean buffered, Integer connectionRequestTimeout) throws NoSuchAlgorithmException, KeyStoreException, UnrecoverableKeyException, KeyManagementException
     {
         if (keyStore == null) throw new IllegalArgumentException("KeyStore cannot be null");
         if (keyStorePassword == null) throw new IllegalArgumentException("KeyStore password string cannot be null");
@@ -1592,7 +1595,11 @@ public void releaseConnection(final HttpClientConnection managedConn, final Obje
         config.property(ClientProperties.FOLLOW_REDIRECTS, true);
         config.property(ClientProperties.REQUEST_ENTITY_PROCESSING, RequestEntityProcessing.BUFFERED); // https://stackoverflow.com/questions/42139436/jersey-client-throws-cannot-retry-request-with-a-non-repeatable-request-entity
         config.property(ApacheClientProperties.CONNECTION_MANAGER, conman);
-
+        if (connectionRequestTimeout != null)
+            config.property(ApacheClientProperties.REQUEST_CONFIG, RequestConfig.custom().
+                setConnectionRequestTimeout(connectionRequestTimeout).
+                build());
+
         if (maxRequestRetries != null)
             config.property(ApacheClientProperties.RETRY_HANDLER, (HttpRequestRetryHandler) (IOException ex, int executionCount, HttpContext context) ->
             {
@@ -1629,7 +1636,7 @@ public void releaseConnection(final HttpClientConnection managedConn, final Obje
      * @param maxRequestRetries maximum number of times that the HTTP client will retry a request
      * @return client instance
      */
-    public static Client getNoCertClient(KeyStore trustStore, Integer maxConnPerRoute, Integer maxTotalConn, Integer maxRequestRetries)
+    public static Client getNoCertClient(KeyStore trustStore, Integer maxConnPerRoute, Integer maxTotalConn, Integer maxRequestRetries, Integer connectionRequestTimeout)
     {
         try
         {
@@ -1688,7 +1695,11 @@ public void releaseConnection(final HttpClientConnection managedConn, final Obje
             config.property(ClientProperties.FOLLOW_REDIRECTS, true);
             config.property(ClientProperties.REQUEST_ENTITY_PROCESSING, RequestEntityProcessing.BUFFERED); // https://stackoverflow.com/questions/42139436/jersey-client-throws-cannot-retry-request-with-a-non-repeatable-request-entity
             config.property(ApacheClientProperties.CONNECTION_MANAGER, conman);
-
+            if (connectionRequestTimeout != null)
+                config.property(ApacheClientProperties.REQUEST_CONFIG, RequestConfig.custom().
+                    setConnectionRequestTimeout(connectionRequestTimeout).
+                    build());
+
             if (maxRequestRetries != null)
                 config.property(ApacheClientProperties.RETRY_HANDLER, (HttpRequestRetryHandler) (IOException ex, int executionCount, HttpContext context) ->
                 {
@@ -1708,7 +1719,7 @@ public void releaseConnection(final HttpClientConnection managedConn, final Obje
                     }
                     return false;
                 });
-
+
             return ClientBuilder.newBuilder().
                 withConfig(config).
                 sslContext(ctx).
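
The nested ternary added to the `Application` constructor encodes a precedence rule: the `com.atomgraph.linkeddatahub.connectionRequestTimeout` system property (set through `CATALINA_OPTS`) wins over the servlet init parameter, and `null` leaves the HTTP client default untouched. In Apache HttpClient 4.x the connection request timeout bounds how long a caller waits to lease a connection from the connection manager, which is why a finite value turns pool exhaustion into a fast failure. The sketch below isolates only the precedence logic; `TimeoutConfigSketch` and `resolveTimeout` are hypothetical names, and the two string arguments stand in for the system property and init parameter lookups.

```java
/** Hypothetical illustration of the precedence rule; not part of the commit. */
final class TimeoutConfigSketch
{
    /**
     * Resolves the timeout in milliseconds.
     * The system property value takes precedence over the init parameter;
     * null means neither was configured, so the client default applies.
     */
    static Integer resolveTimeout(String systemPropertyValue, String initParamValue)
    {
        if (systemPropertyValue != null) return Integer.valueOf(systemPropertyValue);
        if (initParamValue != null) return Integer.valueOf(initParamValue);
        return null;
    }
}
```

As in the committed code, a non-numeric value would raise `NumberFormatException` from `Integer.valueOf`; the `-n` guard in `entrypoint.sh` only prevents the empty-string case from reaching the system property.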

src/main/java/com/atomgraph/linkeddatahub/server/filter/request/ApplicationFilter.java

Lines changed: 17 additions & 1 deletion
@@ -107,7 +107,23 @@ public void filter(ContainerRequestContext request) throws IOException
 
                 requestURI = builder.build();
             }
-            else requestURI = request.getUriInfo().getRequestUri();
+            else
+            {
+                request.setProperty(AC.uri.getURI(), graphURI); // authoritative external proxy marker
+
+                // strip ?uri= from the effective request URI — server-side sees only the path;
+                // the ContainerRequestContext property is the sole indicator of proxy mode
+                MultivaluedMap<String, String> externalQueryParams = new MultivaluedHashMap();
+                externalQueryParams.putAll(request.getUriInfo().getQueryParameters());
+                externalQueryParams.remove(AC.uri.getLocalName());
+
+                UriBuilder externalBuilder = UriBuilder.fromUri(request.getUriInfo().getAbsolutePath());
+                for (Entry<String, List<String>> params : externalQueryParams.entrySet())
+                    for (String value : params.getValue())
+                        externalBuilder.queryParam(params.getKey(), value);
+
+                requestURI = externalBuilder.build();
+            }
         }
         catch (URISyntaxException ex)
         {
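
The `else` branch above rebuilds the request URI without the `uri` query parameter while keeping every other parameter. A rough standalone sketch of the same idea follows, using only `java.net.URI` instead of the JAX-RS `UriBuilder` that the commit relies on. The names `StripUriParamSketch` and `stripParam` are hypothetical, and the sketch assumes an absolute hierarchical URI with an authority, no fragment, and a parameter name that needs no percent-encoding.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration; the commit itself uses UriBuilder and MultivaluedMap. */
final class StripUriParamSketch
{
    /** Returns the URI with every occurrence of the named query parameter removed. */
    static URI stripParam(URI requestUri, String name)
    {
        String rawQuery = requestUri.getRawQuery();
        if (rawQuery == null) return requestUri; // nothing to strip

        // keep the still-encoded "key=value" pairs whose key differs from the given name
        List<String> kept = new ArrayList<>();
        for (String pair : rawQuery.split("&"))
        {
            String key = pair.split("=", 2)[0];
            if (!pair.isEmpty() && !key.equals(name)) kept.add(pair);
        }

        // rebuild from raw components so existing percent-encoding is preserved
        String rebuilt = requestUri.getScheme() + "://" + requestUri.getRawAuthority() + requestUri.getRawPath();
        if (!kept.isEmpty()) rebuilt += "?" + String.join("&", kept);

        return URI.create(rebuilt);
    }
}
```

In the commit the removed value is not discarded: it is stored beforehand as the `AC.uri` request property, which then serves as the only proxy-mode indicator for downstream filters.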
