Tuesday, April 24, 2012

Use tSystem in Talend with Windows Command Prompt

I was working with a colleague of mine to list the directories/files in a directory using Talend. We could have used tFileList but on 4.1.2 it is really slow especially when you want additional info such as file properties.

We struggled with this venture for a while but finally nailed it down.To get a list of file names in Windows you need to run the dir /b command on the cmd prompt. After a few reties and help from google -
http://stackoverflow.com/questions/515309/what-does-cmd-c-mean, we figured it out.

In the tSystem component make sure you have the home directory set to what-ever the top level directory is where your files exist and in the command box - type in
 cmd /c dir /b

This forces the command line to execute the command dir and close after it does so. If you use the /K command your Talend server/console will hang.

Hope this helps.  


Friday, April 20, 2012

Renaming the output from a tMap in Talend 4.1.2

Renaming the output from a tMap in Talend 4.1.2

I was working this morning on a tMap with 6 different outputs. I realized that I had named some of the outputs incorrectly. I tried to rename the tMap output in the editor but it did not work. 5 minutes on google and I got my answer..:-)

Instead of renaming the tMap output inside the editor, all you need to do is to rename the output link that connects that output to the next step. Once you do that, you should see your output name change in your tMap editor.

Monday, April 9, 2012

Setting up Pentaho 3.8.0 Repository in Oracle


As I am working towards a stable Pentaho deployment plan, I realized I needed some kind of a HA and DR solution for the Pentaho repository. The current test servers have the default MySQL repository which is on the same box and just does not cut it anymore. So I set out today to move the repository over to our Oracle 11g R2 RAC.  The other important requirement I had was to use one oracle schema/user for both the Hibernate and the Quartz databases

Here are the basic steps that I took to get this done –

Create the Repository
 I used an existing schema in Oracle and ran the script below. This script is derived from the default MySQL script that Pentaho provides. All I did was converted the data-types to ORACLE types(like VARCHAR to VARCHAR2, SMALL IN and BOOLEAN to NUMBER etc.) and shortened a few constraint names.


DROP TABLE GRANTED_AUTHORITIES; DROP TABLE DATASOURCE; DROP TABLE USERS; DROP TABLE AUTHORITIES; CREATE TABLE USERS(USERNAME VARCHAR2(50) NOT NULL PRIMARY KEY,PASSWORD VARCHAR2(50) NOT NULL,ENABLED NUMBER(1) NOT NULL,DESCRIPTION VARCHAR2(100)); CREATE TABLE AUTHORITIES(AUTHORITY VARCHAR2(50) NOT NULL PRIMARY KEY,DESCRIPTION VARCHAR2(100)); CREATE TABLE GRANTED_AUTHORITIES(USERNAME VARCHAR2(50) NOT NULL,AUTHORITY VARCHAR2(50) NOT NULL,CONSTRAINT FK_GRANTED_AUTH_USERS FOREIGN KEY(USERNAME) REFERENCES USERS(USERNAME), CONSTRAINT FK_GRANTED_AUTH_AUTHORITIES FOREIGN KEY(AUTHORITY) REFERENCES AUTHORITIES(AUTHORITY)); CREATE TABLE DATASOURCE(NAME VARCHAR2(50) NOT NULL PRIMARY KEY,MAXACTCONN NUMBER NOT NULL,DRIVERCLASS VARCHAR2(50) NOT NULL,IDLECONN NUMBER NOT NULL,USERNAME VARCHAR2(50) NULL,PASSWORD VARCHAR2(150) NULL, URL VARCHAR2(512) NOT NULL,QUERY VARCHAR2(100) NULL,WAIT NUMBER NOT NULL); DROP TABLE QRTZ_JOB_LISTENERS; DROP TABLE QRTZ_TRIGGER_LISTENERS; DROP TABLE QRTZ_FIRED_TRIGGERS; DROP TABLE QRTZ_PAUSED_TRIGGER_GRPS; DROP TABLE QRTZ_SCHEDULER_STATE; DROP TABLE QRTZ_LOCKS; DROP TABLE QRTZ_SIMPLE_TRIGGERS; DROP TABLE QRTZ_CRON_TRIGGERS; DROP TABLE QRTZ_BLOB_TRIGGERS; DROP TABLE QRTZ_TRIGGERS; DROP TABLE QRTZ_JOB_DETAILS; DROP TABLE QRTZ_CALENDARS; CREATE TABLE QRTZ_JOB_DETAILS ( JOB_NAME VARCHAR2(80) NOT NULL, JOB_GROUP VARCHAR2(80) NOT NULL, DESCRIPTION VARCHAR2(120) NULL, JOB_CLASS_NAME VARCHAR2(128) NOT NULL, IS_DURABLE VARCHAR2(1) NOT NULL, IS_VOLATILE VARCHAR2(1) NOT NULL, IS_STATEFUL VARCHAR2(1) NOT NULL, REQUESTS_RECOVERY VARCHAR2(1) NOT NULL, JOB_DATA BLOB, PRIMARY KEY (JOB_NAME,JOB_GROUP) ); CREATE TABLE QRTZ_JOB_LISTENERS ( JOB_NAME VARCHAR2(80) NOT NULL, JOB_GROUP VARCHAR2(80) NOT NULL, JOB_LISTENER VARCHAR2(80) NOT NULL, PRIMARY KEY (JOB_NAME,JOB_GROUP,JOB_LISTENER), FOREIGN KEY (JOB_NAME,JOB_GROUP) REFERENCES QRTZ_JOB_DETAILS(JOB_NAME,JOB_GROUP) ); CREATE TABLE QRTZ_TRIGGERS ( TRIGGER_NAME VARCHAR2(80) NOT NULL, TRIGGER_GROUP VARCHAR2(80) NOT NULL, JOB_NAME VARCHAR2(80) NOT NULL, JOB_GROUP VARCHAR2(80) NOT NULL, IS_VOLATILE VARCHAR2(1) NOT NULL, DESCRIPTION VARCHAR2(120) NULL, NEXT_FIRE_TIME NUMBER(13) NULL, PREV_FIRE_TIME NUMBER(13) NULL, TRIGGER_STATE VARCHAR2(16) NOT NULL, TRIGGER_TYPE VARCHAR2(8) NOT NULL, START_TIME NUMBER(13) NOT NULL, END_TIME NUMBER(13), CALENDAR_NAME VARCHAR2(80), MISFIRE_INSTR NUMBER(2) , JOB_DATA BLOB , PRIMARY KEY (TRIGGER_NAME,TRIGGER_GROUP), FOREIGN KEY (JOB_NAME,JOB_GROUP) REFERENCES QRTZ_JOB_DETAILS(JOB_NAME,JOB_GROUP) ); CREATE TABLE QRTZ_SIMPLE_TRIGGERS ( TRIGGER_NAME VARCHAR2(80) NOT NULL, TRIGGER_GROUP VARCHAR2(80) NOT NULL, REPEAT_COUNT NUMBER(7) NOT NULL, REPEAT_INTERVAL NUMBER(12) NOT NULL, TIMES_TRIGGERED NUMBER(7) NOT NULL, PRIMARY KEY (TRIGGER_NAME,TRIGGER_GROUP), FOREIGN KEY (TRIGGER_NAME,TRIGGER_GROUP) REFERENCES QRTZ_TRIGGERS(TRIGGER_NAME,TRIGGER_GROUP) ); CREATE TABLE QRTZ_CRON_TRIGGERS ( TRIGGER_NAME VARCHAR2(80) NOT NULL, TRIGGER_GROUP VARCHAR2(80) NOT NULL, CRON_EXPRESSION VARCHAR2(80) NOT NULL, TIME_ZONE_ID VARCHAR2(80), PRIMARY KEY (TRIGGER_NAME,TRIGGER_GROUP), FOREIGN KEY (TRIGGER_NAME,TRIGGER_GROUP) REFERENCES QRTZ_TRIGGERS(TRIGGER_NAME,TRIGGER_GROUP) ); CREATE TABLE QRTZ_BLOB_TRIGGERS ( TRIGGER_NAME VARCHAR2(80) NOT NULL, TRIGGER_GROUP VARCHAR2(80) NOT NULL, BLOB_DATA BLOB NULL, PRIMARY KEY (TRIGGER_NAME,TRIGGER_GROUP), FOREIGN KEY (TRIGGER_NAME,TRIGGER_GROUP) REFERENCES QRTZ_TRIGGERS(TRIGGER_NAME,TRIGGER_GROUP) ); CREATE TABLE QRTZ_TRIGGER_LISTENERS ( TRIGGER_NAME VARCHAR2(80) NOT NULL, TRIGGER_GROUP VARCHAR2(80) NOT NULL, TRIGGER_LISTENER VARCHAR2(80) NOT NULL, PRIMARY KEY (TRIGGER_NAME,TRIGGER_GROUP,TRIGGER_LISTENER), FOREIGN KEY (TRIGGER_NAME,TRIGGER_GROUP) REFERENCES QRTZ_TRIGGERS(TRIGGER_NAME,TRIGGER_GROUP) ); CREATE TABLE QRTZ_CALENDARS ( CALENDAR_NAME VARCHAR2(80) NOT NULL, CALENDAR BLOB NOT NULL, PRIMARY KEY (CALENDAR_NAME) ); CREATE TABLE QRTZ_PAUSED_TRIGGER_GRPS ( TRIGGER_GROUP VARCHAR2(80) NOT NULL, PRIMARY KEY (TRIGGER_GROUP) ); CREATE TABLE QRTZ_FIRED_TRIGGERS ( ENTRY_ID VARCHAR2(95) NOT NULL, TRIGGER_NAME VARCHAR2(80) NOT NULL, TRIGGER_GROUP VARCHAR2(80) NOT NULL, IS_VOLATILE VARCHAR2(1) NOT NULL, INSTANCE_NAME VARCHAR2(80) NOT NULL, FIRED_TIME NUMBER(13) NOT NULL, STATE VARCHAR2(16) NOT NULL, JOB_NAME VARCHAR2(80) NULL, JOB_GROUP VARCHAR2(80) NULL, IS_STATEFUL VARCHAR2(1) NULL, REQUESTS_RECOVERY VARCHAR2(1) NULL, PRIMARY KEY (ENTRY_ID) ); CREATE TABLE QRTZ_SCHEDULER_STATE ( INSTANCE_NAME VARCHAR2(80) NOT NULL, LAST_CHECKIN_TIME NUMBER(13) NOT NULL, CHECKIN_INTERVAL NUMBER(13) NOT NULL, RECOVERER VARCHAR2(80) NULL, PRIMARY KEY (INSTANCE_NAME) ); CREATE TABLE QRTZ_LOCKS ( LOCK_NAME VARCHAR2(40) NOT NULL, PRIMARY KEY (LOCK_NAME) ); INSERT INTO QRTZ_LOCKS values('TRIGGER_ACCESS'); INSERT INTO QRTZ_LOCKS values('JOB_ACCESS'); INSERT INTO QRTZ_LOCKS values('CALENDAR_ACCESS'); INSERT INTO QRTZ_LOCKS values('STATE_ACCESS'); INSERT INTO QRTZ_LOCKS values('MISFIRE_ACCESS'); commit; INSERT INTO USERS VALUES('joe','cGFzc3dvcmQ=',1,''); INSERT INTO USERS VALUES('pat','cGFzc3dvcmQ=',1,''); INSERT INTO USERS VALUES('suzy','cGFzc3dvcmQ=',1,''); INSERT INTO USERS VALUES('tiffany','cGFzc3dvcmQ=',1,''); commit; -- Create default roles INSERT INTO AUTHORITIES VALUES('Admin','Super User'); INSERT INTO AUTHORITIES VALUES('Anonymous','User has not logged in'); INSERT INTO AUTHORITIES VALUES('Authenticated','User has logged in'); INSERT INTO AUTHORITIES VALUES('ceo','Chief Executive Officer'); INSERT INTO AUTHORITIES VALUES('cto','Chief Technology Officer'); INSERT INTO AUTHORITIES VALUES('dev','Developer'); INSERT INTO AUTHORITIES VALUES('devmgr','Development Manager'); INSERT INTO AUTHORITIES VALUES('is','Information Services'); commit; INSERT INTO GRANTED_AUTHORITIES VALUES('joe','Admin'); INSERT INTO GRANTED_AUTHORITIES VALUES('joe','ceo'); INSERT INTO GRANTED_AUTHORITIES VALUES('joe','Authenticated'); INSERT INTO GRANTED_AUTHORITIES VALUES('suzy','cto'); INSERT INTO GRANTED_AUTHORITIES VALUES('suzy','is'); INSERT INTO GRANTED_AUTHORITIES VALUES('suzy','Authenticated'); INSERT INTO GRANTED_AUTHORITIES VALUES('pat','dev'); INSERT INTO GRANTED_AUTHORITIES VALUES('pat','Authenticated'); INSERT INTO GRANTED_AUTHORITIES VALUES('tiffany','dev'); INSERT INTO GRANTED_AUTHORITIES VALUES('tiffany','devmgr'); INSERT INTO GRANTED_AUTHORITIES VALUES('tiffany','Authenticated'); commit;


Please note that once you re-start your BI server, you will see a bunch of other tables that Pentaho creates. Please make sure that the user you use to log-in to Oracle there-fore has DDL privileges.

Modify the hibernate settings
To modify the hibernate db settings, go to the directory [[PENTAHO_SOL_DIR]]/system/hibernate. You should see a file named -  oracle10g.hibernate.cfg.xml  . If not just create an xml file with the below content 

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE hibernate-configuration
  PUBLIC "-//Hibernate/Hibernate Configuration DTD//EN"
  "http://hibernate.sourceforge.net/hibernate-configuration-3.0.dtd">
<hibernate-configuration>
  <session-factory>


    <property name="cache.provider_class">org.hibernate.cache.EhCacheProvider</property>


    <property name="hibernate.generate_statistics">true</property>
    <property name="hibernate.cache.use_query_cache">true</property>
    <property name="cache.provider_class">org.hibernate.cache.EhCacheProvider</property>


    <!--  Oracle 10g Configuration -->
    <property name="connection.driver_class">oracle.jdbc.driver.OracleDriver</property>
    <property name="connection.url">jdbc:oracle:oci:@//[[URL]]</property>
    <property name="dialect">org.hibernate.dialect.Oracle10gDialect</property>
    <property name="connection.username">[[USERNAME]]</property>
    <property name="connection.password">{{PASSWORD]]</property>
    <property name="connection.pool_size">10</property>
    <property name="show_sql">false</property>
    <property name="hibernate.jdbc.use_streams_for_binary">true</property>
    <!-- replaces DefinitionVersionManager -->
    <property name="hibernate.hbm2ddl.auto">update</property>
    <!-- load resource from classpath -->
    <mapping resource="hibernate/oracle10g.hbm.xml" />
    <!-- mapping resource above is from CE; below is from EE -->
    <mapping resource="hibernate/oracle10g.EE.hbm.xml" />
    <!--  This is only used by Pentaho Administration Console. Spring Security will not use these mapping files -->
    <mapping resource="PentahoUser.hbm.xml" />
    <mapping resource="PentahoRole.hbm.xml" />


  </session-factory>
</hibernate-configuration>
~

You would also need to edit the - hibernate-settings.xml to point to the above oracle10g configuration file instead of the default MySQL file. Here is how the entry should look like –


Modify context.xml for the Quartz connection
You would also need to modify your context.xml under [[PENTAHO_INSTALL_DIR]]/server/biserver-ee/tomcat/webapps/Pentaho/META-INF (or wherever your Pentaho web-app/tomcat is installed). Modify the JNDI entries for Quartz and hibernate to point to the new connections. Here is the sample below –

<Resource name="jdbc/Hibernate" auth="Container" type="javax.sql.DataSource"
                driverClassName="oracle.jdbc.driver.OracleDriver"
                factory="oracle.jdbc.pool.OracleDataSourceFactory"
                url="jdbc:oracle:thin:@//[[URL]]"
                user="[[USER_NAME]]" password="[[PASSWORD]]" maxActive="200" maxIdle="50" maxWait="-1" defaultReadOnly="true"
                validationQuery="select 1 from dual"/>

        <Resource name="jdbc/Quartz" auth="Container" type="javax.sql.DataSource"
                driverClassName="oracle.jdbc.driver.OracleDriver"
                factory="oracle.jdbc.pool.OracleDataSourceFactory"
                url="jdbc:oracle:thin:@//[[URL]]"
                user="[[USER]]" password="[[PASSWORD]]" maxActive="200" maxIdle="50" maxWait="-1" defaultReadOnly="true"
                validationQuery="select 1 from dual"/>

Other Optional Steps
If you use the EE and are using the repository for storing JNDI connections used in your reports, copy over the HIBERNATE.DATASOURCE connections from MySQL to Oracle. You could choose to do it through the Enterprise Console otherwise. I was too lazy for the latter approach. :-)

If you are using JBDC security, you might have to modify the applicationcontext-security-jdbc.xml to point to Oracle. I am using LDAP/AD. So I personally did not do this.

Finally…
Re-start the BI Server and you should be good,