1. Introduction
This is an implementation of the Accumulo Instance Application Programming Interface (API) that uses the Accumulo-supplied, Apache Thrift-based proxy server as the back end for communicating with an Accumulo instance.
Apache Accumulo provides two methods for interacting with an Accumulo instance: a Java-based client library and an Apache Thrift-based API. The Java-based client library requires that the client application have direct network access to all data nodes in the Accumulo instance cloud as the client application communicates directly with the tablet servers where the data is stored.
The Thrift-based API interacts with an Accumulo instance through a Proxy Server; the client application only needs direct network access to the Proxy Server (the Proxy Server, in turn, communicates with the tablet servers directly on behalf of the requesting client application). While providing capabilities similar to those of the Java-based client library, the Thrift-based API is significantly different from the Java-based API. The Thrift API was originally developed to provide Accumulo access to non-Java applications. However, in situations where the Accumulo cloud is not entirely network addressable by Java-based client applications (e.g., isolated behind a firewall), it is useful to allow Java clients to utilize the proxy service. Furthermore, it would be ideal to expose the proxy service through the same API as the traditional Java-based client library to protect client source code from significant changes based only on differences in the network topology.
This Proxy Instance implementation fills that role. It is a Java-based client library for interacting with Accumulo’s ProxyService via the Thrift interface, but it exposes the same Java API as the traditional Java-based client library. As a result, if the client code is later moved onto the isolated Accumulo network (e.g., after development and testing), a simple switch of the Instance type created lets the Java client application take advantage of the better performance of the traditional Java client library.
This version was written, compiled, and tested against Accumulo 1.6.2.
2. Building
The source hierarchy for ProxyInstance is:
-
proxy-instance-project: POM contains general plugin versions and configurations
-
proxy-instance: The actual source and test code for the ProxyInstance
-
proxy-instance-docs: This documentation
-
proxy-instance-build: Resources necessary for building (e.g., license information, formatting templates)
To build the system, you can execute:
mvn package
To build it while executing the integration tests (requires an external Accumulo instance and Proxy Server configured and running):
mvn failsafe:integration-test package -Daccumulo.proxy.host=myhost -Daccumulo.proxy.port=myport -Daccumulo.proxy.user=myuser -Daccumulo.proxy.password=mypassword
3. Usage
This section contains a brief introduction to getting set up with the Proxy Instance.
-
You must have the Accumulo Proxy Server up and running. See http://accumulo.apache.org/1.6/accumulo_user_manual.html#_proxy for more information.
-
Add this dependency to your Maven POM (or download the latest JARs from Maven Central):
<dependency>
  <groupId>edu.jhuapl.accumulo</groupId>
  <artifactId>proxy-instance</artifactId>
  <version>${proxy.version}</version>
</dependency>
The current version is 1.0.0.
-
Create an instance of ProxyInstance and provide it the hostname (or IP address) and port where the proxy is running.
Instance instance = new ProxyInstance("proxyhost", 4567);
Connector connector = instance.getConnector(user, new PasswordToken(password));
-
Use the instance and connector objects as you normally would using the traditional Java API.
-
When finished, we also need to close the ProxyInstance. Unfortunately, a method to close an Instance does not exist in the public API. Therefore, we must add something like this when we are done with the Instance:
if (instance instanceof ProxyInstance) {
  ((ProxyInstance) instance).close();
}
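To make sure the close step is not skipped when an exception is thrown, the check can be wrapped in a small helper and called from a finally block. The sketch below is illustrative only: Instance and ProxyInstance are minimal stand-ins declared locally so the example compiles without Accumulo on the classpath, and CloseExample.closeIfProxy is a hypothetical helper, not part of the library.

```java
// Stand-ins so the sketch compiles on its own. In real code, use
// org.apache.accumulo.core.client.Instance and the library's ProxyInstance.
interface Instance {}

class ProxyInstance implements Instance {
    private boolean closed = false;
    public void close() { closed = true; }
    public boolean isClosed() { return closed; } // stand-in only, for the demo
}

final class CloseExample {
    private CloseExample() {}

    /** Closes the instance if it is a ProxyInstance; returns true if close() was called. */
    static boolean closeIfProxy(Instance instance) {
        if (instance instanceof ProxyInstance) {
            ((ProxyInstance) instance).close();
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Instance instance = new ProxyInstance();
        try {
            // ... obtain a Connector and do the real work here ...
        } finally {
            // Runs even if the work above throws.
            closeIfProxy(instance);
        }
    }
}
```

Because the helper does nothing for other Instance types, the same client code keeps working if the Instance type is later switched to the traditional Java client library.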
3.1. Configuration Parameters
3.1.1. BatchScanner Fetch Size
When fetching data from a BatchScanner, the Proxy Server API allows you to request data in batches. By batching multiple {key,value} pairs up into a single fetch request, data latency and network bandwidth consumed may be reduced. The pairs are queued in the Proxy Server and returned in a batch when the fetch size (or the end of the Scanner) is reached.
The ProxyInstance BatchScanner implementation defaults to a fetch size of 1,000 pairs. In general, this is good for fetching a lot of "small" data from very large tables. However, in certain circumstances this batching can actually increase the data latency for the first data element. The Proxy Server will fill the entire fetch size buffer before sending any data to the client. If very selective filtering is applied to the scanner on the server side, it may take a long time for the Proxy Server to find 1,000 pairs to return even if a few are found very quickly. Furthermore, if the keys and/or values are sufficiently large (or the RAM available to the Proxy Server is sufficiently limited), queuing 1,000 pairs in RAM can cause the Proxy Server to crash with an OutOfMemoryError.
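To make the latency effect concrete, the following is a small, self-contained simulation of the buffering behavior described above. It does not use the Accumulo or proxy APIs: rowsScannedBeforeFirstBatch is a hypothetical helper that counts how many rows a server would have to scan before it could release its first batch, under the assumption that exactly one row in every matchEvery passes the filter.

```java
final class FetchSizeLatencyDemo {
    private FetchSizeLatencyDemo() {}

    /**
     * Counts the rows scanned before the first batch can be returned. The
     * batch is released once fetchSize matching rows are buffered or the
     * table is exhausted, whichever comes first.
     */
    static long rowsScannedBeforeFirstBatch(long totalRows, int matchEvery, int fetchSize) {
        int buffered = 0;
        long scanned = 0;
        while (scanned < totalRows && buffered < fetchSize) {
            scanned++;
            if (scanned % matchEvery == 0) {
                buffered++; // row passes the filter and is queued for the client
            }
        }
        return scanned;
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;
        System.out.println("fetchSize=1000: " + rowsScannedBeforeFirstBatch(rows, 100, 1000));
        System.out.println("fetchSize=10:   " + rowsScannedBeforeFirstBatch(rows, 100, 10));
    }
}
```

Under these assumptions, a fetch size of 1,000 requires scanning 100,000 rows before the first pair reaches the client, whereas a fetch size of 10 requires only 1,000.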
Therefore, the ProxyInstance provides a way for clients to modify the default fetch size via an optional argument to the ProxyInstance constructor. Unfortunately, there is no place within the Java API to specify a fetch size for a BatchScanner on a per-scanner basis. Consequently, changing the fetch size via the ProxyInstance constructor changes the fetch size for all BatchScanners created by that instance. If multiple fetch sizes are required, the client application will have to create and manage multiple ProxyInstances and use the correct instance to create each BatchScanner based on its fetch size requirements.
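One possible way to manage several instances is a small cache keyed by fetch size, so that each distinct size maps to exactly one instance. This is only a sketch and assumes Java 8 or later: FetchSizeRegistry is a hypothetical class, and it is generic over the instance type T so that it compiles without Accumulo. In practice the factory would be something like size -> new ProxyInstance(host, port, size).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntFunction;

final class FetchSizeRegistry<T> {
    private final Map<Integer, T> bySize = new ConcurrentHashMap<>();
    private final IntFunction<T> factory;

    /** The factory creates a new instance for a given fetch size. */
    FetchSizeRegistry(IntFunction<T> factory) {
        this.factory = factory;
    }

    /** Returns the instance for this fetch size, creating it on first use. */
    T forFetchSize(int fetchSize) {
        return bySize.computeIfAbsent(fetchSize, size -> factory.apply(size));
    }

    /** Number of distinct instances created so far. */
    int size() {
        return bySize.size();
    }
}
```

Every instance created this way still has to be closed individually once the application is done with it.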
The fetch size specified must be strictly greater than 0 and less than or equal to 2,000. If the value provided is outside of that range, a warning will be logged and the default value of 1,000 will be used.
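The range rule can be summarized in a few lines. The following is a sketch of the described behavior, not the library's actual source: FetchSizes, sanitize, and the constant names are hypothetical, and java.util.logging is used only to keep the example free of external dependencies.

```java
import java.util.logging.Logger;

final class FetchSizes {
    static final int DEFAULT_FETCH_SIZE = 1000;
    static final int MAX_FETCH_SIZE = 2000;
    private static final Logger LOG = Logger.getLogger(FetchSizes.class.getName());

    private FetchSizes() {}

    /**
     * Returns the requested size if it lies in (0, 2000]; otherwise logs a
     * warning and returns the default of 1000.
     */
    static int sanitize(int requested) {
        if (requested <= 0 || requested > MAX_FETCH_SIZE) {
            LOG.warning("Fetch size " + requested + " is outside (0, " + MAX_FETCH_SIZE
                    + "]; using default of " + DEFAULT_FETCH_SIZE);
            return DEFAULT_FETCH_SIZE;
        }
        return requested;
    }
}
```

Note that an out-of-range value is replaced silently apart from the log message, so a client that requests, say, 5,000 pairs will get batches of 1,000 rather than an exception.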
To set a new fetch size via the constructor, use the 3-argument constructor:
String host = "myhost";
int port = 5432;
int fetchSize = 10;
Instance inst = new ProxyInstance(host, port, fetchSize);
...
4. Development
This chapter includes information for those wishing to further develop the ProxyInstance.
4.1. Unit and Integration Tests
The ProxyInstance repository contains a number of unit tests (XxxTest.java) and integration tests (XxxIT.java) in the src/test directory, but more are needed.
As a convenience, developers may choose to have their unit and integration test classes subclass the provided ConnectorBase class. ConnectorBase handles obtaining the Instance and Connector references that can be utilized by the tests.
A special tag interface called IntegrationTest has also been created. If a subclass of ConnectorBase containing tests implements the IntegrationTest interface, it will be provided a ProxyInstance configured using the required System properties: accumulo.proxy.host, accumulo.proxy.port, accumulo.proxy.user, and accumulo.proxy.password. The integration tests assume a real Proxy Server is running somewhere on the network. If the test class does not implement IntegrationTest, a local Proxy Server backed by a MockInstance Accumulo will be created (the mini Accumulo cluster is not used due to ACCUMULO-3293).
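As an illustration of how the four required properties might be read and validated before an integration test runs, consider the sketch below. It is not the actual ConnectorBase code: ProxyTestConfig and its members are hypothetical names, and only the property keys come from this document.

```java
final class ProxyTestConfig {
    final String host;
    final int port;
    final String user;
    final String password;

    private ProxyTestConfig(String host, int port, String user, String password) {
        this.host = host;
        this.port = port;
        this.user = user;
        this.password = password;
    }

    /** Reads the four accumulo.proxy.* system properties, failing fast if any is missing. */
    static ProxyTestConfig fromSystemProperties() {
        String host = require("accumulo.proxy.host");
        int port = Integer.parseInt(require("accumulo.proxy.port"));
        String user = require("accumulo.proxy.user");
        String password = require("accumulo.proxy.password");
        return new ProxyTestConfig(host, port, user, password);
    }

    private static String require(String key) {
        String value = System.getProperty(key);
        if (value == null || value.isEmpty()) {
            throw new IllegalStateException("Missing required system property: " + key);
        }
        return value;
    }
}
```

These are the same properties supplied with -D on the mvn command line when running the integration tests.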
A common pattern when testing is to be able to test the logic via the local mock instance proxy server as a unit test and test the same capabilities again against an external proxy server as an integration test. The current infrastructure enables this without duplication of code. The general pattern is:
-
Create your unit tests by subclassing ConnectorBase and writing your tests against the protected member variables instance and connector. When run as unit tests, these will automatically be connected to a local, mock-based proxy server.
public class SomeClassTest extends ConnectorBase {
  @Test
  public void testSomething() {
    // make use of member variables 'instance' and 'connector' as needed,
    // already initialized by parent ConnectorBase
    BatchWriter bw = connector.createBatchWriter(...);
    // do more stuff...
  }
}
-
Create your integration test by simply subclassing your unit test and implementing the tag interface IntegrationTest. That is it. When you run the integration tests, the instance and connector member variables will be connected to the remote proxy server based on the required system properties.
public class SomeClassIT extends SomeClassTest implements IntegrationTest {
  // Do nothing more; the ConnectorBase parent will recognize this is an
  // IntegrationTest and provide the appropriate 'instance' and
  // 'connector' references and then execute all of the same logic the unit
  // test executed.
  // Of course, you can create new methods here if there is additional
  // integration testing desired beyond what the unit test performed.
}
5. Known Issues
Here we document the known issues.
-
Not all Java APIs are supported through the Proxy interface. Currently, the proxy throws an UnsupportedOperationException in such cases. We should enter tickets to update the proxy Thrift interface to support these additional capabilities.
-
We currently need to close the instance explicitly, but a close method is not part of the public Instance API. Maybe we should open and close the transport every time we access the instance? Is there a better way to handle closing the transport without incurring the overhead of re-establishing the connection on every call?
-
Others?
Copyright 2014-2015 The Johns Hopkins University / Applied Physics Laboratory
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.