expdp user/password directory=datapump dumpfile=oracle_example.dmp content=metadata_only logfile=example.log
impdp user/password directory=datapump dumpfile=oracle_example.dmp sqlfile=example_sql.sql TRANSFORM=SEGMENT_ATTRIBUTES:N:table PARTITION_OPTIONS=MERGE
您需要将每个表的数据提取到数据列表文件(.lst)中。我们编写了一个简单的SQL脚本(spool.sql)来执行此操作:
cat spool.sql
SET ECHO OFF
SET TERMOUT OFF
SET FEEDBACK OFF
SET PAGESIZE 0
SET TRIMSPOOL ON
SET WRAP OFF
set linesize 30000
SET RECSEP OFF
SET VERIFY OFF
SET ARRAYSIZE 10000
SET COLSEP '|'
SPOOL '&1'
ALTER SESSION SET nls_date_format = 'YYYY-MM-DD HH24:MI:SS';
SELECT * from &1;
SPOOL OFF
SET PAGESIZE 24
SET FEEDBACK ON
SET TERMOUT ON
要提取数据,我们为SQL * Plus中的每个表运行了脚本:
$ sqlplus user/password
@spool CUSTOMERS
@spool ADDRESSES
@spool CARD_DETAILS
@spool WAREHOUSES
@spool ORDER_ITEMS
@spool ORDERS
@spool INVENTORIES
@spool PRODUCT_INFORMATION
@spool LOGON
@spool PRODUCT_DESCRIPTIONS
@spool ORDERENTRY_METADATA
.lst为每个表创建一个带有前导和尾随空格的数据列表文件()。
退出SQL * Plus:
>EXIT
每个表的数据列表文件都需要转换为CSV并针对hubble进行格式化。我们编写了一个简单的Python脚本(fix-example.py)来执行此操作:
cat fix-example.py
import csv
import string
import sys
for lstfile in sys.argv[1:]:
filename = lstfile.split(".")[0]
with open(sys.argv[1]) as f:
reader = csv.reader(f, delimiter="|")
with open(filename+".csv", "w") as fo:
writer = csv.writer(fo)
for rec in reader:
writer.writerow(map(string.strip, rec))
python fix-example.py CUSTOMERS.lst ADDRESSES.lst CARD_DETAILS.lst WAREHOUSES.lst ORDER_ITEMS.lst ORDERS.lst INVENTORIES.lst PRODUCT_INFORMATION.lst LOGON.lst PRODUCT_DESCRIPTIONS.lst ORDERENTRY_METADATA.lst
使用之前创建的SQL文件,编写IMPORT TABLE与要导入的表数据的模式匹配的语句。
删除所有特定于Oracle的属性,重新映射所有Oracle数据类型,重构所有CREATE TABLE语句以包括主键。
使用下表进行数据类型映射:
Oracle数据类型 | hubble数据类型 |
---|---|
BLOB | BYTES 1个 |
CHAR(n),CHARACTER(n)n <256 | CHAR(n),CHARACTER(n) |
CLOB | STRING 1个 |
DATE | DATE |
FLOAT(n) | DECIMAL(n) |
INTERVAL YEAR(p) TO MONTH | VARCHAR, INTERVAL |
INTERVAL DAY(p) TO SECOND(s) | VARCHAR, INTERVAL |
JSON | JSON 2 |
LONG | STRING |
LONG RAW | BYTES |
NCHAR(n)n <256 | CHAR(n) |
NCHAR(n)n> 255 | VARCHAR, STRING |
NCLOB | STRING |
NUMBER(p,0),NUMBER(p)1 <= p <5 | INT2 3 |
NUMBER(p,0),NUMBER(p)5 <= p <9 | INT4 3 |
NUMBER(p,0),NUMBER(p)9 <= p <19 | INT8 3 |
NUMBER(p,0),NUMBER(p)19 <= p <= 38 | DECIMAL(p) |
NUMBER(p,s) s> 0 | DECIMAL(p,s) |
NUMBER, NUMBER(*) | DECIMAL |
NVARCHAR2(n) | VARCHAR(n) |
RAW(n) | BYTES |
TIMESTAMP(p) | TIMESTAMP |
TIMESTAMP(p) WITH TIME ZONE | TIMESTAMP WITH TIMEZONE |
VARCHAR(n), VARCHAR2(n) | VARCHAR(n) |
XML | JSON 2 |
如同CSV数据迁移一致