https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Merge
https://hortonworks.com/blog/apache-hive-moving-beyond-analytics-offload-with-sql-merge/
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_data-access/content/new-feature-insert-values-update-delete.html
ACIDが必要 (AcidOutputFormat)
CREATE TABLE `ts_test2`(
`id` int,
`ts` timestamp)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES (
'transactional'='true');
USING ( select * from ts_test) AS S
ON T.id = S.id
when matched then update set ts = S.ts
when not matched then insert values (S.id, S.ts);
Getting log thread is interrupted, since query is done!
+-------------+------------------------+--+
| ts_test.id | ts_test.ts |
+-------------+------------------------+--+
| 1 | NULL |
| 2 | 1970-01-01 00:00:00.0 |
+-------------+------------------------+--+
2 rows selected (0.23 seconds)
0: jdbc:hive2://node7.localdomain:10000> select * from ts_test2;
Getting log thread is interrupted, since query is done!
+--------------+------------------------+--+
| ts_test2.id | ts_test2.ts |
+--------------+------------------------+--+
| 1 | 1973-10-20 00:00:00.0 |
+--------------+------------------------+--+
1 row selected (0.187 seconds)
0: jdbc:hive2://node7.localdomain:10000> MERGE INTO ts_test2 AS T
0: jdbc:hive2://node7.localdomain:10000> USING ( select * from ts_test) AS S
0: jdbc:hive2://node7.localdomain:10000> ON T.id = S.id
0: jdbc:hive2://node7.localdomain:10000> when matched then update set ts = S.ts
0: jdbc:hive2://node7.localdomain:10000> when not matched then insert values (S.id, S.ts);
INFO : Session is already open
INFO : Dag name: MERGE INTO ts_test2 AS T
USING ( sel...S.ts)(Stage-3)
INFO : Setting tez.task.scale.memory.reserve-fraction to 0.30000001192092896
INFO : Status: Running (Executing on YARN cluster with App id application_1492638963323_0002)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED 1 1 0 0 0 0
Map 5 .......... SUCCEEDED 4 4 0 0 0 0
Reducer 2 ...... SUCCEEDED 4 4 0 0 0 0
Reducer 3 ...... SUCCEEDED 1 1 0 0 0 0
Reducer 4 ...... SUCCEEDED 4 4 0 0 0 0
--------------------------------------------------------------------------------
VERTICES: 05/05 [==========================>>] 100% ELAPSED TIME: 14.64 s
--------------------------------------------------------------------------------
Getting log thread is interrupted, since query is done!
INFO : Loading data to table default.ts_test2 from hdfs://nnha-c6-ubu02/apps/hive/warehouse/ts_test2/.hive-staging_hive_2017-04-20_12-07-04_535_7291759981953472305-1/-ext-10002
INFO : Loading data to table default.merge_tmp_table from hdfs://nnha-c6-ubu02/tmp/hive/hive/31e5df2a-7dab-498c-85c0-0810484896f4/_tmp_space.db/merge_tmp_table/.hive-staging_hive_2017-04-20_12-07-04_535_7291759981953472305-1/-ext-10004
INFO : Loading data to table default.ts_test2 from hdfs://nnha-c6-ubu02/apps/hive/warehouse/ts_test2/.hive-staging_hive_2017-04-20_12-07-04_535_7291759981953472305-1/-ext-10000
INFO : Table default.ts_test2 stats: [numFiles=3, numRows=0, totalSize=1855, rawDataSize=0]
INFO : Table default.merge_tmp_table stats: [numFiles=0, totalSize=0]
INFO : Table default.ts_test2 stats: [numFiles=3, numRows=0, totalSize=1855, rawDataSize=0]
No rows affected (16.536 seconds)
0: jdbc:hive2://node7.localdomain:10000> select * from ts_test2;
Getting log thread is interrupted, since query is done!
+--------------+------------------------+--+
| ts_test2.id | ts_test2.ts |
+--------------+------------------------+--+
| 1 | NULL |
| 2 | 1970-01-01 00:00:00.0 |
+--------------+------------------------+--+
0: jdbc:hive2://node7.localdomain:10000> MERGE INTO ts_test2 AS T
0: jdbc:hive2://node7.localdomain:10000> USING ( select * from ts_test) AS S
0: jdbc:hive2://node7.localdomain:10000> ON T.id = S.id
0: jdbc:hive2://node7.localdomain:10000> when not matched then insert values (S.id, S.ts);
INFO : Session is already open
INFO : Dag name: MERGE INTO ts_test2 AS T
USING ( sel...S.ts)(Stage-1)
INFO : Setting tez.task.scale.memory.reserve-fraction to 0.30000001192092896
INFO : Status: Running (Executing on YARN cluster with App id application_1492638963323_0002)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED 1 1 0 0 0 0
Map 3 .......... SUCCEEDED 4 4 0 0 0 0
Reducer 2 ...... SUCCEEDED 4 4 0 0 0 0
--------------------------------------------------------------------------------
VERTICES: 03/03 [==========================>>] 100% ELAPSED TIME: 8.57 s
--------------------------------------------------------------------------------
Getting log thread is interrupted, since query is done!
INFO : Loading data to table default.ts_test2 from hdfs://nnha-c6-ubu02/apps/hive/warehouse/ts_test2/.hive-staging_hive_2017-04-20_12-11-24_801_8984288262712289131-1/-ext-10000
INFO : Table default.ts_test2 stats: [numFiles=2, numRows=0, totalSize=1271, rawDataSize=0]
No rows affected (10.344 seconds)
0: jdbc:hive2://node7.localdomain:10000> select * from ts_test2;
Getting log thread is interrupted, since query is done!
+--------------+------------------------+--+
| ts_test2.id | ts_test2.ts |
+--------------+------------------------+--+
| 1 | 1973-10-20 00:00:00.0 |
| 2 | 1970-01-01 00:00:00.0 |
+--------------+------------------------+--+
2 rows selected (0.485 seconds)
0: jdbc:hive2://node7.localdomain:10000>
エラーになる例:
なぜかエイリアスは使えない
when matched then update set T.ts = S.ts
ワイルドカードはダメ
when not matched then insert values (S.*);
https://hortonworks.com/blog/apache-hive-moving-beyond-analytics-offload-with-sql-merge/
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_data-access/content/new-feature-insert-values-update-delete.html
ACIDが必要 (AcidOutputFormat)
CREATE TABLE `ts_test2`(
`id` int,
`ts` timestamp)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES (
'transactional'='true');
すべて上書き:
MERGE INTO ts_test2 AS TUSING ( select * from ts_test) AS S
ON T.id = S.id
when matched then update set ts = S.ts
when not matched then insert values (S.id, S.ts);
なかったら追加:
MERGE INTO ts_test2 AS T
USING ( select * from ts_test) AS S
ON T.id = S.id
when not matched then insert values (S.id, S.ts);
MERGE INTO ts_test2 AS T
USING ( select * from ts_test) AS S
ON T.id = S.id
when not matched then insert values (S.id, S.ts);
出力例:
0: jdbc:hive2://node7.localdomain:10000> select * from ts_test;Getting log thread is interrupted, since query is done!
+-------------+------------------------+--+
| ts_test.id | ts_test.ts |
+-------------+------------------------+--+
| 1 | NULL |
| 2 | 1970-01-01 00:00:00.0 |
+-------------+------------------------+--+
2 rows selected (0.23 seconds)
0: jdbc:hive2://node7.localdomain:10000> select * from ts_test2;
Getting log thread is interrupted, since query is done!
+--------------+------------------------+--+
| ts_test2.id | ts_test2.ts |
+--------------+------------------------+--+
| 1 | 1973-10-20 00:00:00.0 |
+--------------+------------------------+--+
1 row selected (0.187 seconds)
0: jdbc:hive2://node7.localdomain:10000> MERGE INTO ts_test2 AS T
0: jdbc:hive2://node7.localdomain:10000> USING ( select * from ts_test) AS S
0: jdbc:hive2://node7.localdomain:10000> ON T.id = S.id
0: jdbc:hive2://node7.localdomain:10000> when matched then update set ts = S.ts
0: jdbc:hive2://node7.localdomain:10000> when not matched then insert values (S.id, S.ts);
INFO : Session is already open
INFO : Dag name: MERGE INTO ts_test2 AS T
USING ( sel...S.ts)(Stage-3)
INFO : Setting tez.task.scale.memory.reserve-fraction to 0.30000001192092896
INFO : Status: Running (Executing on YARN cluster with App id application_1492638963323_0002)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED 1 1 0 0 0 0
Map 5 .......... SUCCEEDED 4 4 0 0 0 0
Reducer 2 ...... SUCCEEDED 4 4 0 0 0 0
Reducer 3 ...... SUCCEEDED 1 1 0 0 0 0
Reducer 4 ...... SUCCEEDED 4 4 0 0 0 0
--------------------------------------------------------------------------------
VERTICES: 05/05 [==========================>>] 100% ELAPSED TIME: 14.64 s
--------------------------------------------------------------------------------
Getting log thread is interrupted, since query is done!
INFO : Loading data to table default.ts_test2 from hdfs://nnha-c6-ubu02/apps/hive/warehouse/ts_test2/.hive-staging_hive_2017-04-20_12-07-04_535_7291759981953472305-1/-ext-10002
INFO : Loading data to table default.merge_tmp_table from hdfs://nnha-c6-ubu02/tmp/hive/hive/31e5df2a-7dab-498c-85c0-0810484896f4/_tmp_space.db/merge_tmp_table/.hive-staging_hive_2017-04-20_12-07-04_535_7291759981953472305-1/-ext-10004
INFO : Loading data to table default.ts_test2 from hdfs://nnha-c6-ubu02/apps/hive/warehouse/ts_test2/.hive-staging_hive_2017-04-20_12-07-04_535_7291759981953472305-1/-ext-10000
INFO : Table default.ts_test2 stats: [numFiles=3, numRows=0, totalSize=1855, rawDataSize=0]
INFO : Table default.merge_tmp_table stats: [numFiles=0, totalSize=0]
INFO : Table default.ts_test2 stats: [numFiles=3, numRows=0, totalSize=1855, rawDataSize=0]
No rows affected (16.536 seconds)
0: jdbc:hive2://node7.localdomain:10000> select * from ts_test2;
Getting log thread is interrupted, since query is done!
+--------------+------------------------+--+
| ts_test2.id | ts_test2.ts |
+--------------+------------------------+--+
| 1 | NULL |
| 2 | 1970-01-01 00:00:00.0 |
+--------------+------------------------+--+
0: jdbc:hive2://node7.localdomain:10000> MERGE INTO ts_test2 AS T
0: jdbc:hive2://node7.localdomain:10000> USING ( select * from ts_test) AS S
0: jdbc:hive2://node7.localdomain:10000> ON T.id = S.id
0: jdbc:hive2://node7.localdomain:10000> when not matched then insert values (S.id, S.ts);
INFO : Session is already open
INFO : Dag name: MERGE INTO ts_test2 AS T
USING ( sel...S.ts)(Stage-1)
INFO : Setting tez.task.scale.memory.reserve-fraction to 0.30000001192092896
INFO : Status: Running (Executing on YARN cluster with App id application_1492638963323_0002)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED 1 1 0 0 0 0
Map 3 .......... SUCCEEDED 4 4 0 0 0 0
Reducer 2 ...... SUCCEEDED 4 4 0 0 0 0
--------------------------------------------------------------------------------
VERTICES: 03/03 [==========================>>] 100% ELAPSED TIME: 8.57 s
--------------------------------------------------------------------------------
Getting log thread is interrupted, since query is done!
INFO : Loading data to table default.ts_test2 from hdfs://nnha-c6-ubu02/apps/hive/warehouse/ts_test2/.hive-staging_hive_2017-04-20_12-11-24_801_8984288262712289131-1/-ext-10000
INFO : Table default.ts_test2 stats: [numFiles=2, numRows=0, totalSize=1271, rawDataSize=0]
No rows affected (10.344 seconds)
0: jdbc:hive2://node7.localdomain:10000> select * from ts_test2;
Getting log thread is interrupted, since query is done!
+--------------+------------------------+--+
| ts_test2.id | ts_test2.ts |
+--------------+------------------------+--+
| 1 | 1973-10-20 00:00:00.0 |
| 2 | 1970-01-01 00:00:00.0 |
+--------------+------------------------+--+
2 rows selected (0.485 seconds)
0: jdbc:hive2://node7.localdomain:10000>
エラーになる例:
なぜかエイリアスは使えない
when matched then update set T.ts = S.ts
ワイルドカードはダメ
when not matched then insert values (S.*);
0 件のコメント:
コメントを投稿