
Software downloads

Oracle Big Data Connectors: ODCH

Download:

http://www.oracle.com/technetwork/bdc/big-data-connectors/downloads/index.html

Java SE download:

http://www.oracle.com/technetwork/java/javase/downloads/jdk6u38-downloads-1877406.html

Oracle 11g download:

Oracle Enterprise Linux download (an Oracle account is required):

https://edelivery.oracle.com/linux

Installing the Oracle server

It can be installed in either a virtual machine or on a physical machine:

The OS install is basically a series of Next clicks.

Then install Oracle 11g. This genuinely took several days of fiddling;

I finally got it installed by following this article:

http://wenku.baidu.com/view/fc804254be23482fb4da4c63.html

I think the best resource for this step is Oracle's official guide:

http://docs.oracle.com/cd/E11882_01/install.112/e24326/toc.htm

Installing Hadoop

http://blog.csdn.net/lichangzai/article/details/8206834

Problem 1:

After setting up passwordless SSH login, it still prompts for a password.

Fix:

chmod 700 ~/.ssh

chmod 700 /home/userName
chmod 600 ~/.ssh/authorized_keys
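Why these three chmod calls matter: sshd silently falls back to password authentication when the home directory, `~/.ssh`, or `authorized_keys` is group- or world-writable. A minimal sketch of the required permission bits, run against a scratch directory so it is safe to execute anywhere (the real targets are `$HOME` and `$HOME/.ssh`):

```shell
# Demonstrate the permission bits sshd requires before honoring authorized_keys.
# demo_home stands in for the real $HOME of the hadoop user.
demo_home=$(mktemp -d)
mkdir -p "$demo_home/.ssh"
touch "$demo_home/.ssh/authorized_keys"

chmod 700 "$demo_home"                        # home dir: no group/world write
chmod 700 "$demo_home/.ssh"                   # .ssh dir: owner only
chmod 600 "$demo_home/.ssh/authorized_keys"   # key file: owner read/write only

stat -c '%a %n' "$demo_home" "$demo_home/.ssh" "$demo_home/.ssh/authorized_keys"
```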

Problem 2:

Agent admitted failure to sign using the key

Fix:

Use the ssh-add command to register the private key (the file name may not be id_rsa):
$ ssh-add ~/.ssh/id_rsa

Hadoop/Oracle architecture

Machine deployment:

Hostname | IP             | OS version   | Oracle node     | Hadoop node | Hadoop processes
Master   | 192.168.74.153 | OEL 6.4      | Oracle 11.2.0.2 | master      | namenode, jobtracker
Node1    | 192.168.2.147  | Ubuntu 12.04 | -               | slave       | datanode, tasktracker
Node2    | 192.168.2.148  | Ubuntu 12.04 | -               | slave       | datanode, tasktracker

Main environment variables:

Variable         | Value
ORACLE_HOME      | /opt/oracle/11g
ORACLE_SID       | orcl
HADOOP_HOME      | /home/hadoop/hadoop/hadoop-1.2.0
HADOOP_CONF_DIR  | $HADOOP_HOME/conf
OSCH_HOME        | /opt/odch/orahdfs-2.2.0
ORAHDFS_JAR      | $OSCH_HOME/jlib/orahdfs.jar
HDFS_BIN_PATH    | $OSCH_HOME/bin
HADOOP_CLASSPATH | $OSCH_HOME/jlib/*

The rest of the walkthrough follows this configuration.

Installing ODCH and setting environment variables

Create an odch directory under /opt, copy the archive there, and extract it.

/opt/odch

Extract:

unzip orahdfs-2.2.0.zip

Then add the main Hadoop/Oracle/ODCH environment variables.

As the hadoop user, open ~/.bash_profile in the home directory: vi .bash_profile

My configuration is below (for reference):

export ORACLE_BASE=/opt/oracle
export ORACLE_HOME=$ORACLE_BASE/11g
export ORACLE_SID=orcl
export PATH=/usr/sbin:$PATH
export PATH=$ORACLE_HOME/bin:$PATH
export LD_LIBRARY_PATH=$ORACLE_HOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/usr/X11R6/lib64/
export CLASSPATH=$ORACLE_HOME/JRE:$ORACLE_HOME/jlib:$ORACLE_HOME/rdbms/jlib

export HADOOP_HOME=/home/hadoop/hadoop/hadoop-1.2.0
PATH=$PATH:$HOME/bin:/home/hadoop/hadoop/jdk1.7.0_21/bin:$HADOOP_HOME/bin
export HADOOP_CONF_DIR=$HADOOP_HOME/conf

export OSCH_HOME=/opt/odch/orahdfs-2.2.0
export ORAHDFS_JAR=$OSCH_HOME/jlib/orahdfs.jar
export HDFS_BIN_PATH=$OSCH_HOME/bin

export JAVA_HOME=/home/hadoop/hadoop/jdk1.7.0_21
export PATH
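One easy mistake when copying these lines is dropping the `$` in front of variable references (a plain `ORACLE_BASE/11g` is a literal relative path, not an expansion). A quick sanity check that the derived variables expand the way the table above intends:

```shell
# Re-derive the composite variables and print them; if any '$' were missing
# in .bash_profile these would come out as literal text like 'OSCH_HOME/jlib'.
export OSCH_HOME=/opt/odch/orahdfs-2.2.0
export ORAHDFS_JAR=$OSCH_HOME/jlib/orahdfs.jar
export HDFS_BIN_PATH=$OSCH_HOME/bin
export HADOOP_CLASSPATH="$OSCH_HOME/jlib/*"   # quoted to stop the shell globbing

echo "$ORAHDFS_JAR"
echo "$HDFS_BIN_PATH"
```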

Modifying the hadoop user's group membership:

Note: Hadoop was installed under the hadoop OS user, while Oracle was installed under the oracle OS user. To make things easier to operate and give the hadoop user permission to work with the Oracle database, add the hadoop user to the same groups as the oracle user.

--Check the current group membership

[hadoop@master ] id hadoop
uid=501(hadoop) gid=503(hadoop) groups=503(hadoop)
[hadoop@master ] id oracle
uid=500(oracle) gid=501(oinstall) groups=501(oinstall),502(dba)

--Modify the group membership

vi /etc/group
--Check the group membership again
[hadoop@master ] id hadoop
uid=501(hadoop) gid=503(hadoop) groups=503(hadoop),501(oinstall),502(dba)
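Editing /etc/group by hand works, but it is easy to get wrong; on most Linux systems `usermod -aG oinstall,dba hadoop` (run as root) achieves the same thing. A sketch of what the edit amounts to, simulated on a throwaway copy of the group file so it is safe to run (the gid values match the `id` output above):

```shell
# Simulate the /etc/group edit on a temp copy. On the real system the
# equivalent one-liner is:  usermod -aG oinstall,dba hadoop
grpfile=$(mktemp)
cat > "$grpfile" <<'EOF'
oinstall:x:501:
dba:x:502:
hadoop:x:503:hadoop
EOF

# Append 'hadoop' to the member lists of oinstall and dba:
sed -i -e 's/^oinstall:x:501:$/oinstall:x:501:hadoop/' \
       -e 's/^dba:x:502:$/dba:x:502:hadoop/' "$grpfile"

grep -E '^(oinstall|dba):' "$grpfile"
```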

Then reboot the machine.

Starting and testing Oracle

Connect to Oracle and start the database:

[hadoop@master Desktop] sqlplus sys/oracle as sysdba
SQL*Plus: Release 11.2.0.1.0 Production on Sat Aug 17 11:06:08 2013
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to an idle instance.
//start the database
SQL> startup;
ORACLE instance started.

Total System Global Area 1653518336 bytes
Fixed Size 2213896 bytes
Variable Size 973080568 bytes
Database Buffers 671088640 bytes
Redo Buffers 7135232 bytes
Database mounted.
Database opened.
//show the database name
SQL> show parameter db_name;

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
db_name string orcl

Then start the Oracle listener:

[hadoop@master Desktop] lsnrctl start
LSNRCTL for Linux: Version 11.2.0.1.0 - Production on 17-AUG-2013 11:21:33
Copyright (c) 1991, 2009, Oracle. All rights reserved.
Starting /opt/oracle/11g/bin/tnslsnr: please wait...
TNSLSNR for Linux: Version 11.2.0.1.0 - Production
System parameter file is /opt/oracle/11g/network/admin/listener.ora
Log messages written to /opt/oracle/diag/tnslsnr/master/listener/alert/log.xml
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=master)(PORT=1521)))

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC1521)))
STATUS of the LISTENER
------------------------
Alias LISTENER
Version TNSLSNR for Linux: Version 11.2.0.1.0 - Production
Start Date 17-AUG-2013 11:21:33
Uptime 0 days 0 hr. 0 min. 0 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /opt/oracle/11g/network/admin/listener.ora
Listener Log File /opt/oracle/diag/tnslsnr/master/listener/alert/log.xml
Listening Endpoints Summary...
 (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
 (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=master)(PORT=1521)))
The listener supports no services
The command completed successfully
//check the listener status
[hadoop@master Desktop] lsnrctl stat
LSNRCTL for Linux: Version 11.2.0.1.0 - Production on 17-AUG-2013 11:21:53
Copyright (c) 1991, 2009, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC1521)))
STATUS of the LISTENER
------------------------
Alias LISTENER
Version TNSLSNR for Linux: Version 11.2.0.1.0 - Production
Start Date 17-AUG-2013 11:21:33
Uptime 0 days 0 hr. 0 min. 19 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /opt/oracle/11g/network/admin/listener.ora
Listener Log File /opt/oracle/diag/tnslsnr/master/listener/alert/log.xml
Listening Endpoints Summary...
 (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
 (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=master)(PORT=1521)))
The listener supports no services
The command completed successfully

Configuring hdfs_stream and creating the external table

Go to /opt/odch/orahdfs-2.2.0/bin, open the hdfs_stream script with vi, and add the following:

export HADOOP_HOME=/home/hadoop/hadoop/hadoop-1.2.0
export OSCH_HOME=/opt/odch/orahdfs-2.2.0
export PATH=/usr/bin:/bin:$HADOOP_HOME/bin

Make sure the oracle user has read/write access to the ODCH log directory

Because Oracle needs to create log/bad files in the directory behind the ODCH_LOG_DIR object, make sure the oracle user has read/write access to it.

--Change the directory permissions and test reading/writing a file

[root@gc opt]# chmod -R 777 odch/
[root@gc opt]# su - oracle
[oracle@gc ] cd /opt/odch/orahdfs-2.2.0/log/
[oracle@gc log] touch ora_access_test
[oracle@gc log] rm ora_access_test

Configuring the OS directories and the database directory objects

--Create the OS directories

[root@gc ]# mkdir -p /opt/odch/orahdfs-2.2.0/logs
[root@gc ]# mkdir -p /opt/odch/orahdfs-2.2.0/extdir
[root@gc ]# chmod 777 /opt/odch/orahdfs-2.2.0/logs
[root@gc ]# chmod 777 /opt/odch/orahdfs-2.2.0/extdir

--Create the Oracle directory objects

sqlplus "/as sysdba"

SQL> create or replace directory ODCH_LOG_DIR as '/opt/odch/orahdfs-2.2.0/logs';
Directory created.
SQL> grant read, write on directory ODCH_LOG_DIR to SCOTT;
Grant succeeded.
SQL> create or replace directory ODCH_DATA_DIR as '/opt/odch/orahdfs-2.2.0/extdir';
Directory created.
SQL> grant read, write on directory ODCH_DATA_DIR to SCOTT;
Grant succeeded.
SQL> create or replace directory HDFS_BIN_PATH as '/opt/odch/orahdfs-2.2.0/bin';
Directory created.
SQL> grant execute on directory HDFS_BIN_PATH to SCOTT;
Grant succeeded.
SQL> grant read, write on directory HDFS_BIN_PATH to SCOTT;
Grant succeeded.

--Directory object descriptions

HDFS_BIN_PATH: the directory containing the hdfs_stream script.

ODCH_DATA_DIR: the directory holding the "location files". A location file is a configuration file containing the HDFS paths/file names and the file encoding format.

ODCH_LOG_DIR: the directory where Oracle writes the external table's log/bad files.
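Taken together, the three directory objects simply map onto a plain OS directory tree under the ODCH install. A sketch of the expected layout and permissions, built under a scratch root so it can be run anywhere:

```shell
# Recreate the layout the three directory objects point at, under a scratch
# root standing in for /opt/odch/orahdfs-2.2.0:
root=$(mktemp -d)/orahdfs-2.2.0
mkdir -p "$root/bin" "$root/logs" "$root/extdir"
chmod 777 "$root/logs" "$root/extdir"   # oracle user needs read/write here

# HDFS_BIN_PATH -> $root/bin    (hdfs_stream; oracle needs execute)
# ODCH_LOG_DIR  -> $root/logs   (log/bad files; oracle needs read/write)
# ODCH_DATA_DIR -> $root/extdir (location files; oracle needs read/write)
ls "$root"
```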

Creating the Oracle external table:

conn scott/tiger

Logging in raised:
ORA-28000: the account is locked
so unlock the account first (as sysdba):
ALTER USER scott ACCOUNT UNLOCK;

Once connected, create the table:

CREATE TABLE odch_ext_table
( ID       NUMBER,
  OWNER    VARCHAR2(128),
  NAME     VARCHAR2(128),
  MODIFIED DATE,
  Val      NUMBER
)
ORGANIZATION EXTERNAL
( TYPE oracle_loader
  DEFAULT DIRECTORY ODCH_DATA_DIR
  ACCESS PARAMETERS
  ( records delimited by newline
    preprocessor HDFS_BIN_PATH:hdfs_stream
    badfile ODCH_LOG_DIR:'odch_ext_table%a_%p.bad'
    logfile ODCH_LOG_DIR:'odch_ext_table%a_%p.log'
    fields terminated by ',' OPTIONALLY ENCLOSED BY '"'
    missing field values are null
    ( ID       DECIMAL EXTERNAL,
      OWNER    CHAR(200),
      NAME     CHAR(200),
      MODIFIED CHAR DATE_FORMAT DATE MASK "YYYY-MM-DD HH24:MI:SS",
      Val      DECIMAL EXTERNAL
    )
  )
  LOCATION ('Tmpdata.csv')
)
PARALLEL
REJECT LIMIT UNLIMITED;

Preparing the sample file

--Sample file contents

Tmpdata.csv (included as an attachment):

1,SYS,ORABASE,15-AUG-09,11
2,SYS,DUAL,15-AUG-09,116
3,PUBLIC,DUAL,15-AUG-09,33
4,PUBLIC,MAP_OBJECT,15-AUG-09,55

How the file was produced:

Tmpdata.csv was generated from all_objects; the SQL is: select rownum,owner,object_name,created,data_object_id from all_objects

 -- Create the stored procedure
 CREATE OR REPLACE PROCEDURE SQL_TO_CSV
 (
   P_QUERY    IN VARCHAR2, -- the SQL query to export
   P_DIR      IN VARCHAR2, -- directory object for the output file
   P_FILENAME IN VARCHAR2  -- CSV file name
 )
 IS
   L_OUTPUT       UTL_FILE.FILE_TYPE;
   L_THECURSOR    INTEGER DEFAULT DBMS_SQL.OPEN_CURSOR;
   L_COLUMNVALUE  VARCHAR2(4000);
   L_STATUS       INTEGER;
   L_COLCNT       NUMBER := 0;
   L_SEPARATOR    VARCHAR2(1);
   L_DESCTBL      DBMS_SQL.DESC_TAB;
   P_MAX_LINESIZE NUMBER := 32000;
 BEGIN
   --OPEN FILE
   L_OUTPUT := UTL_FILE.FOPEN(P_DIR, P_FILENAME, 'W', P_MAX_LINESIZE);
   --DEFINE DATE FORMAT
   EXECUTE IMMEDIATE 'ALTER SESSION SET NLS_DATE_FORMAT=''YYYY-MM-DD HH24:MI:SS''';
   --OPEN CURSOR
   DBMS_SQL.PARSE( L_THECURSOR, P_QUERY, DBMS_SQL.NATIVE );
   DBMS_SQL.DESCRIBE_COLUMNS( L_THECURSOR, L_COLCNT, L_DESCTBL );
   --DUMP TABLE COLUMN NAME
   FOR I IN 1 .. L_COLCNT LOOP
     UTL_FILE.PUT( L_OUTPUT, L_SEPARATOR || '"' || L_DESCTBL(I).COL_NAME || '"' );
     DBMS_SQL.DEFINE_COLUMN( L_THECURSOR, I, L_COLUMNVALUE, 4000 );
     L_SEPARATOR := ',';
   END LOOP;
   UTL_FILE.NEW_LINE( L_OUTPUT );
   --EXECUTE THE QUERY STATEMENT
   L_STATUS := DBMS_SQL.EXECUTE(L_THECURSOR);
   --DUMP TABLE COLUMN VALUE
   WHILE ( DBMS_SQL.FETCH_ROWS(L_THECURSOR) > 0 ) LOOP
     L_SEPARATOR := '';
     FOR I IN 1 .. L_COLCNT LOOP
       DBMS_SQL.COLUMN_VALUE( L_THECURSOR, I, L_COLUMNVALUE );
       UTL_FILE.PUT( L_OUTPUT, L_SEPARATOR || '"' ||
         TRIM(BOTH ' ' FROM REPLACE(L_COLUMNVALUE, '"', '""')) || '"');
       L_SEPARATOR := ',';
     END LOOP;
     UTL_FILE.NEW_LINE( L_OUTPUT );
   END LOOP;
   --CLOSE CURSOR
   DBMS_SQL.CLOSE_CURSOR(L_THECURSOR);
   --CLOSE FILE
   UTL_FILE.FCLOSE( L_OUTPUT );
 EXCEPTION
   WHEN OTHERS THEN
     RAISE;
 END;
 /

 -- Create the directory object for the output file
 CREATE OR REPLACE DIRECTORY MYDIR AS 'C:\';

 -- Run it
 begin
   sql_to_csv('select * from au_jxs ','MYDIR','EXAMPLE.CSV');
 end;
 /

Paths and file names can be adjusted to suit your environment.

Starting Hadoop

[hadoop@master ] start-all.sh 
Warning: HADOOP_HOME is deprecated.

starting namenode, logging to /home/hadoop/hadoop/hadoop-1.2.0/libexec/../logs/hadoop-hadoop-namenode-master.out
node1: starting datanode, logging to /home/hadoop/hadoop/hadoop-1.2.0/libexec/../logs/hadoop-hadoop-datanode-node1.out
node2: starting datanode, logging to /home/hadoop/hadoop/hadoop-1.2.0/libexec/../logs/hadoop-hadoop-datanode-node2.out
master: starting secondarynamenode, logging to /home/hadoop/hadoop/hadoop-1.2.0/libexec/../logs/hadoop-hadoop-secondarynamenode-master.out
starting jobtracker, logging to /home/hadoop/hadoop/hadoop-1.2.0/libexec/../logs/hadoop-hadoop-jobtracker-master.out
node1: starting tasktracker, logging to /home/hadoop/hadoop/hadoop-1.2.0/libexec/../logs/hadoop-hadoop-tasktracker-node1.out
node2: starting tasktracker, logging to /home/hadoop/hadoop/hadoop-1.2.0/libexec/../logs/hadoop-hadoop-tasktracker-node2.out

--First create a directory in HDFS, then put Tmpdata.csv into it

[hadoop@master ] hadoop dfs -mkdir odch
Warning: HADOOP_HOME is deprecated.

[hadoop@master ] hadoop dfs -put Tmpdata.csv odch
Warning: HADOOP_HOME is deprecated.

[hadoop@master ] hadoop dfs -ls odch
Warning: HADOOP_HOME is deprecated.

Found 1 items
-rw-r--r-- 2 hadoop supergroup 113 2013-08-18 21:41 /user/hadoop/odch/Tmpdata.csv

Generating the location files

--Run the following command

export HADOOP_CLASSPATH="$OSCH_HOME/jlib/*"
[hadoop@master ] hadoop jar \
> ${ORAHDFS_JAR} oracle.hadoop.hdfs.exttab.ExternalTable \
> -D oracle.hadoop.hdfs.exttab.tableName=odch_ext_table \
> -D oracle.hadoop.hdfs.exttab.datasetPaths=odch \
> -D oracle.hadoop.hdfs.exttab.datasetRegex=Tmpdata.csv \
> -D oracle.hadoop.hdfs.exttab.connection.url="jdbc:oracle:thin:@//192.168.74.153:1521/orcl" \
> -D oracle.hadoop.hdfs.exttab.connection.user=SCOTT \
> -publish
Warning: HADOOP_HOME is deprecated.

DEPRECATED: The class oracle.hadoop.hdfs.exttab.ExternalTable is deprecated.
It is replaced by oracle.hadoop.exttab.ExternalTable.
Oracle SQL Connector for HDFS Release 2.2.0 - Production
Copyright (c) 2011, 2013, Oracle and/or its affiliates. All rights reserved.
[Enter Database Password:]
The publish command succeeded.
ALTER TABLE "SCOTT"."ODCH_EXT_TABLE" 
LOCATION
(
 'osch-20130818094259-1319-1'
);
The following location files were created.
osch-20130818094259-1319-1 contains 1 URI, 113 bytes
 113 hdfs://master:9000/user/hadoop/odch/Tmpdata.csv
The following location files were deleted.

Parameter descriptions:

ExternalTable: the ExternalTable command-line tool invoked via hadoop jar

-D: sets a configuration property

tableName: the name of the external table

datasetPaths: the HDFS path where the source data lives

datasetRegex: a regular expression that the source file names must match

connection.url: the Oracle JDBC connection string

connection.user: the database user (scott)

After the command runs, it prompts for the database password (oracle here).
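Note that datasetRegex is a regular expression matched against file names, not a shell glob. The bare `Tmpdata.csv` works here, but only because `.` happens to match any character; an anchored, escaped pattern is stricter. A quick illustration (the extra file names are made up):

```shell
# Compare candidate file names against an anchored, escaped regex:
for f in Tmpdata.csv TmpdataXcsv Tmpdata.csv.bak; do
  if echo "$f" | grep -qE '^Tmpdata\.csv$'; then
    echo "match:    $f"
  else
    echo "no match: $f"
  fi
done
```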

Then apply the ALTER TABLE statement printed by the publish step:

ALTER TABLE "SCOTT"."ODCH_EXT_TABLE"
LOCATION
(
'osch-20130818094259-1319-1'
);

Viewing the location file contents and the external table

Go to the extdir directory:

[hadoop@master ] cd /opt/odch/orahdfs-2.2.0/extdir/
[hadoop@master extdir] ls
osch-20130818083220-7675-1 osch-20130818084052-4686-1 osch-20130818085941-2623-1 osch-20130818094259-1319-1
[hadoop@master extdir] more osch-20130818094259-1319-1
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<locationFile>
 <header>
 <version>1.0</version>
 <fileName>osch-20130818094259-1319-1</fileName>
 <createDate>2013-08-18T21:15:40</createDate>
 <publishDate>2013-08-18T09:42:59</publishDate>
 <productName>Oracle SQL Connector for HDFS Release 2.2.0 - Production</productName>
 <productVersion>2.2.0</productVersion>
 </header>
 <uri_list>
 <uri_list_item size="113" compressionCodec="">hdfs://master:9000/user/hadoop/odch/Tmpdata.csv</uri_list_item>
 </uri_list>
</locationFile>

You can see the location file now points at the file stored in Hadoop.
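The only part of the location file the database actually needs at query time is the URI list, so when a query against the external table fails, extracting the URIs is a quick way to check where it is pointing. A sketch using a heredoc copy of the file above (plain sed, since the format is simple XML):

```shell
# Extract the HDFS URIs from a location file. The heredoc reproduces the
# file shown above so the pipeline can be tried without HDFS.
uris=$(sed -n 's/.*<uri_list_item[^>]*>\(.*\)<\/uri_list_item>.*/\1/p' <<'EOF'
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<locationFile>
 <uri_list>
  <uri_list_item size="113" compressionCodec="">hdfs://master:9000/user/hadoop/odch/Tmpdata.csv</uri_list_item>
 </uri_list>
</locationFile>
EOF
)
echo "$uris"
```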

Querying the external table that links to Hadoop:

As the scott user:

SQL> set line 150 pages 1000; 
SQL> col owner for a10 
SQL> col name for a20 
SQL> select * from odch_ext_table;

 ID OWNER NAME MODIFIED VAL
---------- ---------- -------------------- --------- ----------
 1 SYS ORABASE 09-AUG-15 11
 2 SYS DUAL 09-AUG-15 116
 3 PUBLIC DUAL 09-AUG-15 33
 4 PUBLIC MAP_OBJECT 09-AUG-15 55

This is the data from Hadoop, queried through Oracle.

