1. Introduction

This is an implementation of the Accumulo Instance Application Programming Interface (API) which uses the Accumulo-supplied, Apache Thrift-based proxy server as the back-end for communicating with an Accumulo instance.

Apache Accumulo provides two methods for interacting with an Accumulo instance: a Java-based client library and an Apache Thrift-based API. The Java-based client library requires that the client application have direct network access to all data nodes in the Accumulo instance cloud, because the client application communicates directly with the tablet servers where the data is stored.

[Figure: Traditional]

The Thrift-based API interacts with an Accumulo instance through a Proxy Server; the client application needs direct network access only to the Proxy Server, which in turn communicates with the tablet servers directly on behalf of the requesting client application. While it provides capabilities similar to those of the Java-based client library, the Thrift-based API differs significantly from the Java-based API. The Thrift API was originally developed to give non-Java applications access to Accumulo. However, in situations where the Accumulo cloud is not entirely network addressable by Java-based client applications (e.g., isolated behind a firewall), it is useful to allow Java clients to use the proxy service. Furthermore, it would be ideal to expose the proxy service through the same API as the traditional Java-based client library, so that client source code is protected from significant changes based only on differences in network topology.

This Proxy Instance implementation provides exactly that. It is a Java-based client library that interacts with Accumulo’s ProxyService via the Thrift interface but exposes the same Java API as the traditional Java-based client library. As a result, if the client code is later moved onto the isolated Accumulo network (e.g., after development and testing), a simple switch of the Instance type created allows the Java client application to take advantage of the performance increase offered by the traditional Java client library.

[Figure: Proxy]

This version was written, compiled, and tested against Accumulo 1.6.2.

2. Building

The source hierarchy for ProxyInstance is:

  • proxy-instance-project: POM contains general plugin versions and configurations

    • proxy-instance: The actual source and test code for the ProxyInstance

    • proxy-instance-docs: This documentation

    • proxy-instance-build: Resources necessary for building (e.g., license information, formatting templates)

To build the system, you can execute:

mvn package

To build it while executing the integration tests (requires an external Accumulo instance and Proxy Server configured and running):

mvn failsafe:integration-test package -Daccumulo.proxy.host=myhost -Daccumulo.proxy.port=myport -Daccumulo.proxy.user=myuser -Daccumulo.proxy.password=mypassword

3. Usage

This section contains a brief introduction to getting set up with the Proxy Instance.

  1. You must have the Accumulo Proxy Server up and running. See http://accumulo.apache.org/1.6/accumulo_user_manual.html#_proxy for more information.

  2. Include this dependency in your Maven POM (or download the latest JARs from Maven Central):

    <dependency>
       <groupId>edu.jhuapl.accumulo</groupId>
       <artifactId>proxy-instance</artifactId>
       <version>${proxy.version}</version>
    </dependency>

    The current version is 1.0.0.

  3. Create an instance of ProxyInstance and provide it the hostname (or IP address) and port where the proxy is running.

     Instance instance = new ProxyInstance("proxyhost", 4567);
     Connector connector = instance.getConnector(user, new PasswordToken(password));

  4. Use the instance and connector objects as you normally would using the traditional Java API.

  5. When finished, close the ProxyInstance. Unfortunately, the public Instance API does not provide a method to close an Instance, so add something like this when you are done with the Instance:

      if (instance instanceof ProxyInstance) {
         ((ProxyInstance)instance).close();
      }

3.1. Configuration Parameters

3.1.1. BatchScanner Fetch Size

When fetching data from a BatchScanner, the Proxy Server API allows you to request data in batches. By batching multiple {key,value} pairs up into a single fetch request, data latency and network bandwidth consumed may be reduced. The pairs are queued in the Proxy Server and returned in a batch when the fetch size (or the end of the Scanner) is reached.

The ProxyInstance BatchScanner implementation defaults to a fetch size of 1,000 pairs. In general, this is good for fetching a lot of "small" data from very large tables. However, in certain circumstances this batching can actually increase the data latency for the first data element, because the Proxy Server fills the entire fetch size buffer before sending any data to the client. If very selective filtering is applied to the scanner on the server side, it may take a long time for the Proxy Server to find 1,000 pairs to return even if a few are found very quickly. Furthermore, if the keys and/or values are sufficiently large (or the RAM available to the Proxy Server is sufficiently limited), queuing 1,000 pairs in RAM can cause the Proxy Server to crash with an OutOfMemoryError.
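
The buffering behavior described above can be illustrated with a small, self-contained sketch. It is not the Proxy Server's actual code; the class name BatchFetchSketch and the method nextBatch are hypothetical and exist only to show why nothing reaches the client until the buffer is full or the source is exhausted.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only; not the Proxy Server's implementation.
final class BatchFetchSketch {

    private BatchFetchSketch() {
    }

    // Drains up to fetchSize elements from the source into a buffer.
    // The buffer is returned only once it is full or the source has no
    // more elements, so a slow or highly selective source delays the
    // delivery of the very first element.
    static <T> List<T> nextBatch(Iterator<T> source, int fetchSize) {
        List<T> buffer = new ArrayList<>(fetchSize);
        while (buffer.size() < fetchSize && source.hasNext()) {
            buffer.add(source.next());
        }
        return buffer;
    }
}
```

With a smaller fetch size the loop returns sooner, which trades more round trips for lower latency on the first element.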

Therefore, the ProxyInstance provides a way for clients to modify the default fetch size via an optional argument to the ProxyInstance constructor. Unfortunately, there is no place within the Java API to specify a fetch size for a BatchScanner on a per-scanner basis. Therefore, changing the fetch size via the ProxyInstance constructor changes the fetch size for all BatchScanners created by that instance. If multiple fetch sizes are required/desired, the client application will have to create and manage multiple ProxyInstances and utilize the correct instance to create individual BatchScanners based on the fetch size requirements.

The fetch size specified must be strictly greater than 0 and less than or equal to 2,000. If the value provided is outside of that range, a warning will be logged and the default value of 1,000 will be used.
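
The range rule above can be expressed as a short sketch. This is an illustration under stated assumptions, not the ProxyInstance source: the class FetchSizeSketch, its constants, and resolveFetchSize are hypothetical names, and java.util.logging is used only to keep the example free of external dependencies.

```java
import java.util.logging.Logger;

// Hypothetical illustration of the documented rule; not the ProxyInstance code.
final class FetchSizeSketch {

    static final int DEFAULT_FETCH_SIZE = 1000;
    static final int MAX_FETCH_SIZE = 2000;

    private static final Logger LOG =
            Logger.getLogger(FetchSizeSketch.class.getName());

    private FetchSizeSketch() {
    }

    // Returns the requested size when 0 < requested <= 2000; otherwise
    // logs a warning and falls back to the default of 1000.
    static int resolveFetchSize(int requested) {
        if (requested > 0 && requested <= MAX_FETCH_SIZE) {
            return requested;
        }
        LOG.warning("Fetch size " + requested + " is outside (0, "
                + MAX_FETCH_SIZE + "]; using default " + DEFAULT_FETCH_SIZE);
        return DEFAULT_FETCH_SIZE;
    }
}
```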

To set a new fetch size via the constructor, use the 3-argument constructor:

   String host = "myhost";
   int port = 5432;
   int fetchSize = 10;

   Instance inst = new ProxyInstance(host, port, fetchSize);
   ...
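
Because the fetch size is fixed per instance, an application that needs several fetch sizes has to keep one instance per size. One possible way to manage this is a small registry keyed by fetch size, sketched below. PerFetchSizeRegistry is a hypothetical helper, and the type parameter T stands in for ProxyInstance so that the sketch compiles without Accumulo on the classpath.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntFunction;

// Hypothetical helper; T stands in for ProxyInstance.
final class PerFetchSizeRegistry<T> {

    private final Map<Integer, T> bySize = new ConcurrentHashMap<>();
    private final IntFunction<T> factory;

    // The factory creates a new instance for a given fetch size.
    PerFetchSizeRegistry(IntFunction<T> factory) {
        this.factory = factory;
    }

    // Returns the instance for this fetch size, creating it on first use
    // and reusing it on every later call with the same size.
    T forFetchSize(int fetchSize) {
        return bySize.computeIfAbsent(fetchSize, size -> factory.apply(size));
    }

    // Number of distinct instances created so far.
    int size() {
        return bySize.size();
    }
}
```

In a real application the factory would presumably be something like size -> new ProxyInstance(host, port, size), and each instance held by the registry would still need to be closed when the application is finished with it.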

4. Development

This chapter includes information for those wishing to contribute to the development of the ProxyInstance.

4.1. Unit and Integration Tests

The ProxyInstance repository contains a number of unit tests (XxxTest.java) and integration tests (XxxIT.java) in the src/test directory, but more are needed.

As a convenience, developers may choose to have their unit and integration test classes subclass the provided ConnectorBase class. ConnectorBase handles obtaining the Instance and Connector references that the tests can use.

A special tag interface called IntegrationTest has also been created. If a subclass of ConnectorBase containing tests implements the IntegrationTest interface, it will be provided a ProxyInstance configured using the required System properties: accumulo.proxy.host, accumulo.proxy.port, accumulo.proxy.user, and accumulo.proxy.password. The integration tests assume a real Proxy Server is running somewhere on the network. If the test class does not implement IntegrationTest, a local Proxy Server will be created, backed by an Accumulo MockInstance (we do not use MiniAccumuloCluster due to ACCUMULO-3293).
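
As an illustration of how the four required System properties might be read and validated before an integration test connects, consider the sketch below. It is not the actual ConnectorBase code; the class ProxyTestProperties and its load method are hypothetical, and only the property names come from this document.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; not taken from ConnectorBase.
final class ProxyTestProperties {

    // Property names as documented for the integration tests.
    static final String[] REQUIRED = {
        "accumulo.proxy.host",
        "accumulo.proxy.port",
        "accumulo.proxy.user",
        "accumulo.proxy.password"
    };

    private ProxyTestProperties() {
    }

    // Reads every required property and fails fast, naming the first
    // property that is missing or empty.
    static Map<String, String> load() {
        Map<String, String> values = new LinkedHashMap<>();
        for (String key : REQUIRED) {
            String value = System.getProperty(key);
            if (value == null || value.isEmpty()) {
                throw new IllegalStateException(
                        "Missing required system property: " + key);
            }
            values.put(key, value);
        }
        return values;
    }
}
```

These are the same properties passed with -D on the mvn command line shown in the Building section.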

A common pattern when testing is to be able to test the logic via the local mock instance proxy server as a unit test and test the same capabilities again against an external proxy server as an integration test. The current infrastructure enables this without duplication of code. The general pattern is:

  1. Create your unit tests by subclassing ConnectorBase and writing your tests against the protected member variables instance and connector. When run as unit tests, these will automatically be connected to a local, mock-based proxy server.

    public class SomeClassTest extends ConnectorBase {
    
         @Test
         public void testSomething() {
            // make use of member variables 'instance' and 'connector' as needed,
            // already initialized by parent ConnectorBase
            BatchWriter bw = connector.createBatchWriter(...);
    
            // do more stuff...
         }
    }

  2. Create your integration test by simply subclassing your unit test and implementing the tag interface IntegrationTest. That is all. When you run the integration tests, the instance and connector member variables will be connected to the remote proxy server based on the required system properties.

    public class SomeClassIT extends SomeClassTest implements IntegrationTest {
    
        // Do nothing more; the ConnectorBase parent will recognize this is an
        // IntegrationTest and provide the appropriate 'instance' and
        // 'connector' references and then execute all of the same logic the unit
        // test executed.
    
        // Of course, you can create new methods here if there is additional
        // integration testing desired beyond what the unit test performed.
    
    }
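
The mechanism behind this pattern, a base class that changes its behavior depending on whether the concrete test class carries a tag interface, can be sketched in plain Java. The names below (RemoteProxyTag, TagAwareBase, ExampleTest, ExampleIT) are hypothetical stand-ins for IntegrationTest, ConnectorBase, and the test classes; how ConnectorBase actually performs the check is an assumption.

```java
// Hypothetical stand-in for the IntegrationTest tag interface.
interface RemoteProxyTag {
}

// Hypothetical stand-in for ConnectorBase.
abstract class TagAwareBase {

    // True when the concrete test class implements the tag interface,
    // either directly or through inheritance.
    final boolean usesRemoteProxy() {
        return this instanceof RemoteProxyTag;
    }

    // Describes which back-end the tests would be wired to.
    final String backend() {
        return usesRemoteProxy() ? "remote proxy server" : "local mock proxy";
    }
}

// A unit test class: no tag, so it gets the local mock back-end.
class ExampleTest extends TagAwareBase {
}

// The integration test inherits all test logic and only adds the tag.
class ExampleIT extends ExampleTest implements RemoteProxyTag {
}
```

Because the check runs against the runtime type, the inherited test methods need no changes to target a different back-end.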

5. Known Issues

Here we document the known issues.

  • Not all Java APIs are supported through the Proxy interface. Currently, the ProxyInstance throws an UnsupportedOperationException in such cases. We should enter tickets to update the proxy Thrift interface to support these additional capabilities.

  • We currently need to close the instance, which is not part of the public API. One option is to open and close the transport every time we access the instance. Is there a better way to handle closing the transport without incurring the overhead of re-establishing the connection on every call?

  • Others?

JHU/APL

Copyright 2014-2015 The Johns Hopkins University / Applied Physics Laboratory

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.