2017年4月20日木曜日

HDP-2.6で追加されたMERGEを使って見る

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Merge
https://hortonworks.com/blog/apache-hive-moving-beyond-analytics-offload-with-sql-merge/
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_data-access/content/new-feature-insert-values-update-delete.html

ACIDが必要 (AcidOutputFormat)

 CREATE TABLE `ts_test2`(
   `id` int,
   `ts` timestamp)
 CLUSTERED BY (id) INTO 4 BUCKETS
 STORED AS ORC
 TBLPROPERTIES (
   'transactional'='true');

すべて上書き:
MERGE INTO ts_test2 AS T
USING ( select * from ts_test) AS S
ON T.id = S.id
when matched then update set ts = S.ts
when not matched then insert values (S.id, S.ts);

なかったら追加:
MERGE INTO ts_test2 AS T
USING ( select * from ts_test) AS S
ON T.id = S.id
when not matched then insert values (S.id, S.ts);


出力例:
0: jdbc:hive2://node7.localdomain:10000> select * from ts_test;
Getting log thread is interrupted, since query is done!
+-------------+------------------------+--+
| ts_test.id  |       ts_test.ts       |
+-------------+------------------------+--+
| 1           | NULL                   |
| 2           | 1970-01-01 00:00:00.0  |
+-------------+------------------------+--+
2 rows selected (0.23 seconds)
0: jdbc:hive2://node7.localdomain:10000> select * from ts_test2;
Getting log thread is interrupted, since query is done!
+--------------+------------------------+--+
| ts_test2.id  |      ts_test2.ts       |
+--------------+------------------------+--+
| 1            | 1973-10-20 00:00:00.0  |
+--------------+------------------------+--+
1 row selected (0.187 seconds)
0: jdbc:hive2://node7.localdomain:10000> MERGE INTO ts_test2 AS T
0: jdbc:hive2://node7.localdomain:10000> USING ( select * from ts_test) AS S
0: jdbc:hive2://node7.localdomain:10000> ON T.id = S.id
0: jdbc:hive2://node7.localdomain:10000> when matched then update set ts = S.ts
0: jdbc:hive2://node7.localdomain:10000> when not matched then insert values (S.id, S.ts);
INFO  : Session is already open
INFO  : Dag name: MERGE INTO ts_test2 AS T
USING ( sel...S.ts)(Stage-3)
INFO  : Setting tez.task.scale.memory.reserve-fraction to 0.30000001192092896
INFO  : Status: Running (Executing on YARN cluster with App id application_1492638963323_0002)

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
Map 5 ..........   SUCCEEDED      4          4        0        0       0       0
Reducer 2 ......   SUCCEEDED      4          4        0        0       0       0
Reducer 3 ......   SUCCEEDED      1          1        0        0       0       0
Reducer 4 ......   SUCCEEDED      4          4        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 05/05  [==========================>>] 100%  ELAPSED TIME: 14.64 s
--------------------------------------------------------------------------------
Getting log thread is interrupted, since query is done!
INFO  : Loading data to table default.ts_test2 from hdfs://nnha-c6-ubu02/apps/hive/warehouse/ts_test2/.hive-staging_hive_2017-04-20_12-07-04_535_7291759981953472305-1/-ext-10002
INFO  : Loading data to table default.merge_tmp_table from hdfs://nnha-c6-ubu02/tmp/hive/hive/31e5df2a-7dab-498c-85c0-0810484896f4/_tmp_space.db/merge_tmp_table/.hive-staging_hive_2017-04-20_12-07-04_535_7291759981953472305-1/-ext-10004
INFO  : Loading data to table default.ts_test2 from hdfs://nnha-c6-ubu02/apps/hive/warehouse/ts_test2/.hive-staging_hive_2017-04-20_12-07-04_535_7291759981953472305-1/-ext-10000
INFO  : Table default.ts_test2 stats: [numFiles=3, numRows=0, totalSize=1855, rawDataSize=0]
INFO  : Table default.merge_tmp_table stats: [numFiles=0, totalSize=0]
INFO  : Table default.ts_test2 stats: [numFiles=3, numRows=0, totalSize=1855, rawDataSize=0]
No rows affected (16.536 seconds)
0: jdbc:hive2://node7.localdomain:10000> select * from ts_test2;
Getting log thread is interrupted, since query is done!
+--------------+------------------------+--+
| ts_test2.id  |      ts_test2.ts       |
+--------------+------------------------+--+
| 1            | NULL                   |
| 2            | 1970-01-01 00:00:00.0  |
+--------------+------------------------+--+
0: jdbc:hive2://node7.localdomain:10000> MERGE INTO ts_test2 AS T
0: jdbc:hive2://node7.localdomain:10000> USING ( select * from ts_test) AS S
0: jdbc:hive2://node7.localdomain:10000> ON T.id = S.id
0: jdbc:hive2://node7.localdomain:10000> when not matched then insert values (S.id, S.ts);
INFO  : Session is already open
INFO  : Dag name: MERGE INTO ts_test2 AS T
USING ( sel...S.ts)(Stage-1)
INFO  : Setting tez.task.scale.memory.reserve-fraction to 0.30000001192092896
INFO  : Status: Running (Executing on YARN cluster with App id application_1492638963323_0002)

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
Map 3 ..........   SUCCEEDED      4          4        0        0       0       0
Reducer 2 ......   SUCCEEDED      4          4        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 03/03  [==========================>>] 100%  ELAPSED TIME: 8.57 s
--------------------------------------------------------------------------------
Getting log thread is interrupted, since query is done!
INFO  : Loading data to table default.ts_test2 from hdfs://nnha-c6-ubu02/apps/hive/warehouse/ts_test2/.hive-staging_hive_2017-04-20_12-11-24_801_8984288262712289131-1/-ext-10000
INFO  : Table default.ts_test2 stats: [numFiles=2, numRows=0, totalSize=1271, rawDataSize=0]
No rows affected (10.344 seconds)
0: jdbc:hive2://node7.localdomain:10000> select * from ts_test2;
Getting log thread is interrupted, since query is done!
+--------------+------------------------+--+
| ts_test2.id  |      ts_test2.ts       |
+--------------+------------------------+--+
| 1            | 1973-10-20 00:00:00.0  |
| 2            | 1970-01-01 00:00:00.0  |
+--------------+------------------------+--+
2 rows selected (0.485 seconds)

0: jdbc:hive2://node7.localdomain:10000>


エラーになる例:
なぜかエイリアスは使えない
when matched then update set T.ts = S.ts
ワイルドカードはダメ
when not matched then insert values (S.*);

0 件のコメント:

コメントを投稿