%HadoopGateway.Connection
hidden class %HadoopGateway.Connection extends %HadoopGateway.Base
This class represents a connection to a Hadoop instance via Hadoop Gateway. It provides methods for:
Connecting, disconnecting, and checking the state of a connection: %Connect(), %Disconnect(), %IsConnected(), %CurrentHadoopHome(), %CurrentHost(), %CurrentPort().
Creating an iterator to import a MapReduce result: %CreateMapReduceResult().
Executing commands on the Hadoop Namenode machine: %Execute().
Setting and getting transient config options for this connection: %SetOption(), %SetClassOption(), %SetNamespaceOption(), %GetOption(), %GetClassOption(), %GetNamespaceOption(). Transient values set by the set methods override the persistent values of the same options, which are set via the same-named class methods of %HadoopGateway.Config. Transient values apply only to this connection and persist across calls to %Connect() and %Disconnect() for as long as this connection instance is in memory. They do not apply to a different connection instance, even one that connects using the same "HadoopHome", "Host", and "Port" values. The get methods return the transient value of the specified option if one has been set for this connection; otherwise they return the persistent value set via the %HadoopGateway.Config set methods, or the default if no persistent value has been set. For options that can be specified both globally and for a class, if no transient value has been set for a given class, %GetClassOption() returns the transient global value if one has been set; otherwise it returns the value that the %HadoopGateway.Config %GetClassOption() class method would return. See %HadoopGateway.Config for details of available options.
Synchronizing data from Caché to Hadoop files: %StartSync(), %StopSync(), %Synchronize(), %SynchronizeAll(), %TimeSync(), %GetExportFilePathname(), %GetCurrentSyncJobID(), %GetSyncJobIDs(), %GetAllSyncJobIDs(), %GetSyncJobUpdateCount(), %GetSyncJobSyncCount(), %GetSyncJobStartTime(), %GetSyncJobLastTime(), %GetSyncJobErrors(), %GetSyncJobClassName(), %GetSyncJobUserName(), %GetSyncJobHadoopHome(), %GetSyncJobHost(), %IsSyncJobAlive(). These methods allow data from a Caché table to be synchronized to the Hadoop Distributed File System (HDFS), where it can be used as input for MapReduce analysis. Use %StartSync() to start near-real-time synchronization of a table to HDFS. Use %StopSync() to stop synchronizing a table. Use %Synchronize() to perform a one-time synchronization of all not-yet-synchronized inserts, updates, and deletes to a table. Use %SynchronizeAll() to perform a one-time synchronization of all data in a table, including data that existed prior to adding "DSTIME=AUTO" to the class definition. To get information about current and previously running background synchronization jobs, use %GetCurrentSyncJobID(), %GetSyncJobIDs(), %GetAllSyncJobIDs(), %GetSyncJobUpdateCount(), %GetSyncJobSyncCount(), %GetSyncJobStartTime(), %GetSyncJobLastTime(), %GetSyncJobErrors(), %GetSyncJobClassName(), %GetSyncJobUserName(), %GetSyncJobHadoopHome(), %GetSyncJobHost(), and %IsSyncJobAlive().
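The option lookup order described above for the get methods can be modeled as a layered lookup. The following Python sketch is only an illustration of that precedence (transient, then persistent, then default); it is not the %HadoopGateway API. The function names, the dictionary layout, and the config_get_class_option callback are hypothetical, and option names are treated as case-sensitive keys purely for simplicity. The default values shown are the ones stated in this document.

```python
# Hypothetical model of the option lookup order; not the %HadoopGateway API.
# Defaults taken from this document: "HadoopHome", Java Gateway port,
# delimiter, and line separator.
DEFAULTS = {
    "HadoopHome": "/hadoophome",
    "Port": 56789,
    "Delimiter": ",",
    "LineSeparator": "\n",
}


def get_option(name, transient, persistent, defaults=DEFAULTS):
    """Return the transient value if set, else the persistent one, else the default."""
    for layer in (transient, persistent, defaults):
        if name in layer:
            return layer[name]
    raise KeyError(name)


def get_class_option(name, class_name, transient_class, transient_global,
                     config_get_class_option):
    """Model of the class-level lookup.

    Order: transient value for the class, then transient global value,
    then whatever the persistent class-level lookup would return
    (represented here by the hypothetical callback config_get_class_option).
    """
    per_class = transient_class.get(class_name, {})
    if name in per_class:
        return per_class[name]
    if name in transient_global:
        return transient_global[name]
    return config_get_class_option(name, class_name)
```

In this model, a second connection object would simply start with its own empty transient dictionaries, which mirrors the statement that transient values do not carry over to another connection instance.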
Prerequisites:
The configuration option "HadoopHome" must be set to the pathname of the root of a Hadoop installation. (Use methods of %HadoopGateway.Config to set persistent values of this and other options, or use the methods %SetOption(), %SetClassOption(), and %SetNamespaceOption() of the present class to set transient option values. All other options have reasonable defaults, but "HadoopHome" defaults to "/hadoophome", which is unlikely to be the actual pathname.)
The Java Gateway must be running on the machine which is the "namenode" of the Hadoop instance. Start it in an operating system command shell with the command:
java -cp classpath com.intersys.gateway.JavaGateway port [logfile]
where:
classpath must include the Caché jar files cachejdbc.jar and cachegateway.jar.
port is the port number on which Java Gateway listens. Hadoop Gateway assumes a default Java Gateway port number of 56789, so if you use a different number you must set the configuration option "Port" to that number.
logfile is an optional argument specifying the pathname of a file in which Java Gateway logs all messages and errors.
A table to be synchronized to Hadoop has these additional prerequisites:
Its class definition must specify the parameter setting DSTIME=AUTO.
Optionally, the class definition may include a class query called HadoopExport. This query is used to determine which columns of the table to export in which order, and any desired conversions or calculations. The query's WHERE clause must contain the predicate "%id=:pID", and the query definition must specify the keyword SqlProc. Here is an example of a HadoopExport query definition:
Query HadoopExport(pID As %String) As %SQLQuery [ SqlProc ]
{
select patient, to_char(fromtime, 'YYYY-MM-DD'), to_char(totime, 'YYYY-MM-DD'), eventtype
from HSAA.EventCareProviderSite
where %id=:pID
}
If no HadoopExport query is specified, the query "select * from tablename where %id=:pID" is used.
Data is exported in delimited text format, with delimiter defaulting to "," (comma) and line separator defaulting to LF (line feed character, $c(10)). These can be customized using configuration options "Delimiter" and "LineSeparator" respectively.
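To make the export format concrete, here is a small Python sketch of how one exported row might look under the default and under customized "Delimiter" and "LineSeparator" values. It is an assumption-laden illustration, not the gateway's implementation: the function name format_row is hypothetical, the sample values are invented, and the source does not say whether values containing the delimiter are quoted or escaped, so this sketch does neither.

```python
# Hypothetical illustration of the delimited text format; not gateway code.
def format_row(values, delimiter=",", line_separator="\n"):
    """Join column values with the delimiter and terminate with the line separator.

    Assumption: every row ends with the line separator, and no quoting or
    escaping is applied (the source does not specify either behavior).
    """
    return delimiter.join(str(v) for v in values) + line_separator


# Invented sample row shaped like the HadoopExport example query:
# patient, fromtime, totime, eventtype.
row = ["P001", "2011-01-05", "2011-01-09", "Admit"]
default_line = format_row(row)
tab_crlf_line = format_row(row, delimiter="\t", line_separator="\r\n")
```

A MapReduce job reading these files would need to split on the same delimiter and line separator that were in effect at export time.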
Method Inventory
- %Connect()
- %CreateMapReduceResult()
- %CurrentHadoopHome()
- %CurrentHost()
- %CurrentPort()
- %Disconnect()
- %Execute()
- %GetAllSyncJobIDs()
- %GetClassOption()
- %GetCurrentSyncJobID()
- %GetExportFilePathname()
- %GetNamespaceOption()
- %GetOption()
- %GetSyncJobClassName()
- %GetSyncJobErrors()
- %GetSyncJobHadoopHome()
- %GetSyncJobHost()
- %GetSyncJobIDs()
- %GetSyncJobLastTime()
- %GetSyncJobStartTime()
- %GetSyncJobSyncCount()
- %GetSyncJobUpdateCount()
- %GetSyncJobUserName()
- %IsConnected()
- %SetClassOption()
- %SetNamespaceOption()
- %SetOption()
- %StartSync()
- %StopSync()
- %SyncJobIsAlive()
- %Synchronize()
- %SynchronizeAll()
- %TimeSync()
Methods
Connects via Java Gateway to the Hadoop instance on the machine specified by pHost whose root directory is specified by pHadoopHome, connecting to Java Gateway using the port number specified by pPort. If any of the arguments is omitted, the value of the corresponding config option is used.
Values of any arguments specified are set as the transient values of the corresponding config options for this connection, so that if it is disconnected, it can re-connect to the same Hadoop instance by calling this method with no arguments.
The following methods of this class require this connection to be connected, and will throw an exception if called for a connection that is not connected: %StartSync(), %Synchronize(), %SynchronizeAll(), %TimeSync(), %CreateMapReduceResult(), %Execute().
pDirPath specifies the HDFS pathname of a directory containing one or more MapReduce result files; the MapReduceResult instance iterates over the lines of those files. pLineSeparator optionally specifies a sequence of one or more characters used to separate lines in the MapReduce result files. If not specified, it defaults to the value of the "LineSeparator" configuration option.
This method throws an exception if the connection is not currently connected.
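The iteration described above (lines across all result files in a directory, split on a configurable separator) can be approximated locally as follows. This Python sketch uses the local filesystem as a stand-in for HDFS and is not how the gateway reads data. The function name iter_result_lines is hypothetical, and three behaviors are assumptions not stated in the source: files are visited in sorted name order, they are read as UTF-8, and empty segments (such as the one after a trailing separator) are skipped.

```python
import os


# Hypothetical local model of iterating over MapReduce result lines.
def iter_result_lines(dir_path, line_separator="\n"):
    """Yield each line of every regular file in dir_path.

    Assumptions: sorted file order, UTF-8 content, empty segments skipped.
    """
    for name in sorted(os.listdir(dir_path)):
        path = os.path.join(dir_path, name)
        if not os.path.isfile(path):
            continue  # ignore subdirectories
        # newline="" disables newline translation so that the separator
        # is matched exactly as stored in the file.
        with open(path, encoding="utf-8", newline="") as f:
            content = f.read()
        for line in content.split(line_separator):
            if line:
                yield line
```

Because the function is a generator, a caller can process results one line at a time without loading every file into memory first.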
Disconnects this connection. The connection retains any transient config options set via %SetOption(), %SetClassOption(), or %SetNamespaceOption(), and can be re-connected to the same Hadoop instance, or connected to a different Hadoop instance, by calling %Connect().
This method facilitates executing MapReduce jobs, starting and stopping the Hadoop instance, and performing other Hadoop management tasks from within Caché ObjectScript running on the client machine. Commands are executed via the Java Gateway, and execute with the operating system user, group, and privileges of the user who started the Java Gateway.
This method throws an exception if the connection is not currently connected.
If optional arguments pHadoopHome and pHost are not specified, returns the job ID of the background job that is connected to the Hadoop instance to which the current connection is connected. If the current connection is not connected, the Hadoop instance is the one specified by the values that this connection's %GetOption() method would return for options "HadoopHome" and "Host".
If optional arguments pHadoopHome and pHost are not specified, lists jobs that synchronized to the Hadoop instance to which the current connection is connected. If the current connection is not connected, the Hadoop instance is the one specified by the values that this connection's %GetOption() method would return for options "HadoopHome" and "Host".
Starts a background job to synchronize the table specified by pClassName. Returns the job ID of the newly-started job, or 0 if no job was started because a job for the specified table was already running. All inserts, updates, and deletes which were not already synchronized to Hadoop prior to this call are synchronized to Hadoop in near-real time, until %StopSync() is called, or until this Caché instance is shut down.
Inserts, updates, or deletes which were performed prior to adding "DSTIME=AUTO" to the class definition are not synchronized. To synchronize these, call %SynchronizeAll().
This method throws an exception if the connection is not currently connected.
If optional arguments pHadoopHome and pHost are not specified, stops the background job that is connected to the Hadoop instance to which the current connection is connected. If the current connection is not connected, the Hadoop instance is the one specified by the values that this connection's %GetOption() method would return for options "HadoopHome" and "Host". A background job connected to a different Hadoop instance can be stopped by using the arguments pHadoopHome and pHost to specify the values for that instance.
Returns the job ID of the job that was stopped, or 0 if there was no such job running.
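Several of the sync job methods share the same rule for deciding which Hadoop instance they refer to when pHadoopHome and pHost are omitted. The Python sketch below models only that rule. The function name resolve_target, its parameters, and the tuple representation are hypothetical. The source describes only two cases (both arguments given, or both omitted), so the mixed case is rejected here rather than guessed.

```python
# Hypothetical model of how the target Hadoop instance is chosen;
# not the %HadoopGateway API.
def resolve_target(p_hadoop_home=None, p_host=None,
                   connected_to=None, get_option=None):
    """Return a (hadoop_home, host) pair identifying the Hadoop instance.

    connected_to: (hadoop_home, host) of the live connection, or None
                  if the connection is not connected.
    get_option:   callable standing in for the connection's option lookup.
    """
    if p_hadoop_home is not None and p_host is not None:
        # Explicit arguments select a possibly different instance.
        return (p_hadoop_home, p_host)
    if p_hadoop_home is None and p_host is None:
        if connected_to is not None:
            # Use the instance the connection is currently connected to.
            return connected_to
        # Not connected: fall back to the option values.
        return (get_option("HadoopHome"), get_option("Host"))
    # Assumption: the source does not describe supplying only one argument.
    raise ValueError("specify both pHadoopHome and pHost, or neither")
```

Under this model, stopping a job for another instance amounts to passing both values explicitly, which matches the description of the pHadoopHome and pHost arguments above.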
Inserts, updates, or deletes which were performed prior to adding "DSTIME=AUTO" to the class definition are not synchronized. To synchronize these, call %SynchronizeAll().
This method throws an exception if the connection is not currently connected.
%SynchronizeAll() and %TimeSync() likewise throw an exception if the connection is not currently connected.
Inherited Members
Inherited Methods
- %AddToSaveSet()
- %ClassIsLatestVersion()
- %ClassName()
- %ConstructClone()
- %DispatchClassMethod()
- %DispatchGetModified()
- %DispatchGetProperty()
- %DispatchMethod()
- %DispatchSetModified()
- %DispatchSetMultidimProperty()
- %DispatchSetProperty()
- %Extends()
- %GetParameter()
- %IsA()
- %IsModified()
- %New()
- %NormalizeObject()
- %ObjectModified()
- %OriginalNamespace()
- %PackageName()
- %RemoveFromSaveSet()
- %SerializeObject()
- %SetModified()
- %ValidateObject()