2017年4月25日火曜日

HDP 2.5.3のKnox Demo LDAPをAmbariと連携して見る

参考:https://hadoop-and-hdp.blogspot.com.au/2017/01/hdp-253knox-demo-ldapbeeline.html
http://docs.hortonworks.com/HDPDocuments/Ambari-2.4.2.0/bk_ambari-security/content/configuring_ambari_for_ldap_or_active_directory_authentication.html

[root@node1 ~]# ambari-server setup-ldap
Using python  /usr/bin/python
Setting up LDAP properties...
Primary URL* {host:port} : node1.localdomain:33389
Secondary URL {host:port} :
Use SSL* [true/false] (false):
User object class* (posixAccount): person
User name attribute* (uid):
Group object class* (posixGroup): groupofnames
Group name attribute* (cn):
Group member attribute* (memberUid): member
Distinguished name attribute* (dn):
Base DN* : dc=hadoop,dc=apache,dc=org
Referral method [follow/ignore] : ignore
Bind anonymously* [true/false] (false):
Manager DN* : uid=admin,ou=people,dc=hadoop,dc=apache,dc=org
Enter Manager Password* :
Re-enter password:
====================
Review Settings
====================
authentication.ldap.managerDn: uid=admin,ou=people,dc=hadoop,dc=apache,dc=org
authentication.ldap.managerPassword: *****
Save settings [y/n] (y)?
Saving...done
Ambari Server 'setup-ldap' completed successfully.

[root@node1 ~]# ambari-server restart

[root@node1 ambari-server]# grep ldap /etc/ambari-server/conf/ambari.properties
ambari.ldap.isConfigured=true
authentication.ldap.baseDn=dc=hadoop,dc=apache,dc=org
authentication.ldap.bindAnonymously=false
authentication.ldap.dnAttribute=dn
authentication.ldap.groupMembershipAttr=member
authentication.ldap.groupNamingAttr=cn
authentication.ldap.groupObjectClass=groupofnames
authentication.ldap.managerDn=uid=admin,ou=people,dc=hadoop,dc=apache,dc=org
authentication.ldap.managerPassword=/etc/ambari-server/conf/ldap-password.dat
authentication.ldap.pagination.enabled=false
authentication.ldap.primaryUrl=node1.localdomain:33389
authentication.ldap.referral=ignore
authentication.ldap.useSSL=false
authentication.ldap.userObjectClass=person
authentication.ldap.usernameAttribute=uid
client.security=ldap


/etc/ambari-server/conf/log4j.properties
#LDAP debug (そんなに有効ではない)
log4j.logger.org.apache.ambari.server.security=DEBUG


AmbariServerログに、下記のエラーが出たらBase DNを疑ってみる。
25 Apr 2017 07:23:15,290 FATAL [pool-16-thread-1] AbstractRequestControlDirContextProcessor:186 - No matching response control found for paged results - looking for 'class javax.naming.ldap.PagedResultsResponseControl

それでもだめなら、下記をambari.propertiesに追加してみる
authentication.ldap.pagination.enabled=false


補足1: 'admin'ユーザがかぶるので、パスワードがadmin-passwordになります。
再度、ローカルユーザ'admin'を作ると、Ambariが正しくユーザリストを表示しなくなります。(DBはユーザネームがユニークでないのに、Ambari側はユーニークなのを期待している模様)
そこで、Admin以外のユーザだけを追加する:
ambari-server sync-ldap --users <(grep -E '^uid' /etc/knox/conf/users.ldif | grep -vw admin | awk '{print $2}' | tr '\n' ',') --verbose


補足2:ユーザをデータベースを消すのはよくなさそうなので、APIで:
for u in `grep -E '^uid' /etc/knox/conf/users.ldif | grep -vw admin | awk '{print $2}'`; do curl -i -u admin:admin -H 'X-Requested-By: ambari' -X DELETE http://localhost:8080/api/v1/users/$u; done

2017年4月20日木曜日

HDP-2.6で追加されたMERGEを使って見る

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Merge
https://hortonworks.com/blog/apache-hive-moving-beyond-analytics-offload-with-sql-merge/
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_data-access/content/new-feature-insert-values-update-delete.html

ACIDが必要 (AcidOutputFormat)

 CREATE TABLE `ts_test2`(
   `id` int,
   `ts` timestamp)
 CLUSTERED BY (id) INTO 4 BUCKETS
 STORED AS ORC
 TBLPROPERTIES (
   'transactional'='true');

すべて上書き:
MERGE INTO ts_test2 AS T
USING ( select * from ts_test) AS S
ON T.id = S.id
when matched then update set ts = S.ts
when not matched then insert values (S.id, S.ts);

なかったら追加:
MERGE INTO ts_test2 AS T
USING ( select * from ts_test) AS S
ON T.id = S.id
when not matched then insert values (S.id, S.ts);


出力例:
0: jdbc:hive2://node7.localdomain:10000> select * from ts_test;
Getting log thread is interrupted, since query is done!
+-------------+------------------------+--+
| ts_test.id  |       ts_test.ts       |
+-------------+------------------------+--+
| 1           | NULL                   |
| 2           | 1970-01-01 00:00:00.0  |
+-------------+------------------------+--+
2 rows selected (0.23 seconds)
0: jdbc:hive2://node7.localdomain:10000> select * from ts_test2;
Getting log thread is interrupted, since query is done!
+--------------+------------------------+--+
| ts_test2.id  |      ts_test2.ts       |
+--------------+------------------------+--+
| 1            | 1973-10-20 00:00:00.0  |
+--------------+------------------------+--+
1 row selected (0.187 seconds)
0: jdbc:hive2://node7.localdomain:10000> MERGE INTO ts_test2 AS T
0: jdbc:hive2://node7.localdomain:10000> USING ( select * from ts_test) AS S
0: jdbc:hive2://node7.localdomain:10000> ON T.id = S.id
0: jdbc:hive2://node7.localdomain:10000> when matched then update set ts = S.ts
0: jdbc:hive2://node7.localdomain:10000> when not matched then insert values (S.id, S.ts);
INFO  : Session is already open
INFO  : Dag name: MERGE INTO ts_test2 AS T
USING ( sel...S.ts)(Stage-3)
INFO  : Setting tez.task.scale.memory.reserve-fraction to 0.30000001192092896
INFO  : Status: Running (Executing on YARN cluster with App id application_1492638963323_0002)

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
Map 5 ..........   SUCCEEDED      4          4        0        0       0       0
Reducer 2 ......   SUCCEEDED      4          4        0        0       0       0
Reducer 3 ......   SUCCEEDED      1          1        0        0       0       0
Reducer 4 ......   SUCCEEDED      4          4        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 05/05  [==========================>>] 100%  ELAPSED TIME: 14.64 s
--------------------------------------------------------------------------------
Getting log thread is interrupted, since query is done!
INFO  : Loading data to table default.ts_test2 from hdfs://nnha-c6-ubu02/apps/hive/warehouse/ts_test2/.hive-staging_hive_2017-04-20_12-07-04_535_7291759981953472305-1/-ext-10002
INFO  : Loading data to table default.merge_tmp_table from hdfs://nnha-c6-ubu02/tmp/hive/hive/31e5df2a-7dab-498c-85c0-0810484896f4/_tmp_space.db/merge_tmp_table/.hive-staging_hive_2017-04-20_12-07-04_535_7291759981953472305-1/-ext-10004
INFO  : Loading data to table default.ts_test2 from hdfs://nnha-c6-ubu02/apps/hive/warehouse/ts_test2/.hive-staging_hive_2017-04-20_12-07-04_535_7291759981953472305-1/-ext-10000
INFO  : Table default.ts_test2 stats: [numFiles=3, numRows=0, totalSize=1855, rawDataSize=0]
INFO  : Table default.merge_tmp_table stats: [numFiles=0, totalSize=0]
INFO  : Table default.ts_test2 stats: [numFiles=3, numRows=0, totalSize=1855, rawDataSize=0]
No rows affected (16.536 seconds)
0: jdbc:hive2://node7.localdomain:10000> select * from ts_test2;
Getting log thread is interrupted, since query is done!
+--------------+------------------------+--+
| ts_test2.id  |      ts_test2.ts       |
+--------------+------------------------+--+
| 1            | NULL                   |
| 2            | 1970-01-01 00:00:00.0  |
+--------------+------------------------+--+
0: jdbc:hive2://node7.localdomain:10000> MERGE INTO ts_test2 AS T
0: jdbc:hive2://node7.localdomain:10000> USING ( select * from ts_test) AS S
0: jdbc:hive2://node7.localdomain:10000> ON T.id = S.id
0: jdbc:hive2://node7.localdomain:10000> when not matched then insert values (S.id, S.ts);
INFO  : Session is already open
INFO  : Dag name: MERGE INTO ts_test2 AS T
USING ( sel...S.ts)(Stage-1)
INFO  : Setting tez.task.scale.memory.reserve-fraction to 0.30000001192092896
INFO  : Status: Running (Executing on YARN cluster with App id application_1492638963323_0002)

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
Map 3 ..........   SUCCEEDED      4          4        0        0       0       0
Reducer 2 ......   SUCCEEDED      4          4        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 03/03  [==========================>>] 100%  ELAPSED TIME: 8.57 s
--------------------------------------------------------------------------------
Getting log thread is interrupted, since query is done!
INFO  : Loading data to table default.ts_test2 from hdfs://nnha-c6-ubu02/apps/hive/warehouse/ts_test2/.hive-staging_hive_2017-04-20_12-11-24_801_8984288262712289131-1/-ext-10000
INFO  : Table default.ts_test2 stats: [numFiles=2, numRows=0, totalSize=1271, rawDataSize=0]
No rows affected (10.344 seconds)
0: jdbc:hive2://node7.localdomain:10000> select * from ts_test2;
Getting log thread is interrupted, since query is done!
+--------------+------------------------+--+
| ts_test2.id  |      ts_test2.ts       |
+--------------+------------------------+--+
| 1            | 1973-10-20 00:00:00.0  |
| 2            | 1970-01-01 00:00:00.0  |
+--------------+------------------------+--+
2 rows selected (0.485 seconds)

0: jdbc:hive2://node7.localdomain:10000>


エラーになる例:
なぜかエイリアスは使えない
when matched then update set T.ts = S.ts
ワイルドカードはダメ
when not matched then insert values (S.*);