Apache Impala

Impala Web UI ports

  • Impala Daemon: 25000 ( cluster-wn1.abc.com:25000 )
  • Statestore Daemon: 25010 ( cluster-nn.abc.com:25010 )
  • Catalog Daemon: 25020 ( cluster-nn.abc.com:25020 )

Internal Wiki Related to Impala

Impala Related External References

Production Architecture

  • The impala-server service can run on the NameNode as well as on the WorkerNodes. However, impala-server itself consumes a lot of memory and CPU, so it is better not to run it on the NameNode.
  • The statestore daemon and the catalog daemon run on the NameNode.

Impala built-in functions

show functions in _impala_builtins;
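
The full list is long, so it can help to filter it. A small sketch using the LIKE clause of SHOW FUNCTIONS, where * is the wildcard; the pattern 'substr*' is only an illustration:

-- list only the built-ins whose name starts with "substr"
show functions in _impala_builtins like 'substr*';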

Using UDFs in Impala

SHOW functions

impala-shell> show functions;
Query: show functions
+-------------+------------------------------------+-------------+---------------+
| return type | signature                          | binary type | is persistent |
+-------------+------------------------------------+-------------+---------------+
| STRING      | json_get_object(STRING, STRING)    | NATIVE      | true          |
+-------------+------------------------------------+-------------+---------------+
Fetched 1 row(s) in 0.02s

DROP function

impala-shell> drop function json_get_object(string, string);

CREATE function

create function json_get_object(string, string) returns string location '/user/hive/udf/impala_udf.so' symbol='JsonGetObject';
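
When an updated library is redeployed, the DROP and CREATE statements above are typically run back to back. A minimal sketch, assuming the same HDFS path and symbol as above; IF EXISTS only suppresses the error raised when the function is not currently registered:

-- remove the old registration, if any, then register the function again
drop function if exists json_get_object(string, string);
create function json_get_object(string, string) returns string location '/user/hive/udf/impala_udf.so' symbol='JsonGetObject';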

Tracking Bug or Enhancement

Inserting data into INT, STRING

  • env: Impala 2.8
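  • A value inserted into an INT column must not be quoted.
  • A value inserted into a STRING column must be quoted. Impala does not convert between numeric and STRING types implicitly, so a mismatch fails with an AnalysisException, as the session below shows.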
    create TABLE test_sorting_column_type (
        int_column INT,
        str_column STRING)
    STORED AS PARQUET;
    
    Fetched 0 row(s) in 0.25s
    
    [local-cluster:21000] > insert into test_sorting_column_type values (1,1);
    Query: insert into test_sorting_column_type values (1,1)
    Query submitted at: 2016-10-27 16:21:56 (Coordinator: http://local-cluster:25000)
    ERROR: AnalysisException: Target table 'default.test_sorting_column_type' is incompatible with source expressions.
    Expression '1' (type: TINYINT) is not compatible with column 'str_column' (type: STRING)
    
    [local-cluster:21000] > insert into test_sorting_column_type values (1,"1");
    Query: insert into test_sorting_column_type values (1,"1")
    Query submitted at: 2016-10-27 16:22:09 (Coordinator: http://local-cluster:25000)
    Query progress can be monitored at: http://local-cluster:25000/query_plan?query_id=684020643ae905bc:97d4a38200000000
    Inserted 1 row(s) in 11.23s
    
    [local-cluster:21000] > insert into test_sorting_column_type values ("2","2");
    Query: insert into test_sorting_column_type values ("2","2")
    Query submitted at: 2016-10-27 16:22:28 (Coordinator: http://local-cluster:25000)
    ERROR: AnalysisException: Target table 'default.test_sorting_column_type' is incompatible with source expressions.
    Expression ''2'' (type: STRING) is not compatible with column 'int_column' (type: INT)
    
    [local-cluster:21000] > insert into test_sorting_column_type values (3,"3");  
    Query: insert into test_sorting_column_type values (3,"3")
    Query submitted at: 2016-10-27 16:22:39 (Coordinator: http://local-cluster:25000)
    Query progress can be monitored at: http://local-cluster:25000/query_plan?query_id=3541655ee7e2fcf1:d450803700000000
    Inserted 1 row(s) in 10.57s
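
  • When the source value has the wrong type, one possible workaround is an explicit CAST. A minimal sketch against the same table; the values 4 and "5" are illustrative only and were not part of the session above:
    -- convert an integer literal for the STRING column
    insert into test_sorting_column_type values (4, cast(4 as string));
    -- convert a string literal for the INT column
    insert into test_sorting_column_type values (cast("5" as int), "5");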
    

Impala Sorting

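  • env: Impala 2.8
  • If a column is declared as STRING but holds integer values, ORDER BY sorts it as text (character by character), not numerically. In the first query below, "111" therefore sorts between "2" and "1".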
    create TABLE test_sorting_column_type (
        int_column INT,
        str_column STRING)
    STORED AS PARQUET;
    
    [local-cluster:21000] > select * from  test_sorting_column_type order by str_column desc;
    Query: select * from  test_sorting_column_type order by str_column desc
    Query submitted at: 2016-10-27 16:26:36 (Coordinator: http://local-cluster:25000)
    Query progress can be monitored at: http://local-cluster:25000/query_plan?query_id=b5437027611394fd:aa55219800000000
    +------------+------------+
    | int_column | str_column |
    +------------+------------+
    | 3          | 3          |
    | 2          | 2          |
    | 111        | 111        |
    | 1          | 1          |
    +------------+------------+
    WARNINGS: Unknown disk id.  This will negatively affect performance. Check your hdfs settings to enable block location metadata.
    
    Fetched 4 row(s) in 0.22s
    [local-cluster:21000] > select * from  test_sorting_column_type order by int_column desc;
    Query: select * from  test_sorting_column_type order by int_column desc
    Query submitted at: 2016-10-27 16:26:47 (Coordinator: http://local-cluster:25000)
    Query progress can be monitored at: http://local-cluster:25000/query_plan?query_id=4747300451d042e5:71f8387800000000
    +------------+------------+
    | int_column | str_column |
    +------------+------------+
    | 111        | 111        |
    | 3          | 3          |
    | 2          | 2          |
    | 1          | 1          |
    +------------+------------+
    WARNINGS: Unknown disk id.  This will negatively affect performance. Check your hdfs settings to enable block location metadata.
    
    Fetched 4 row(s) in 0.22s
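
  • To sort such a column numerically, one option is to cast it in the ORDER BY clause. A minimal sketch, assuming every str_column value is a valid integer; Impala's CAST returns NULL for a string that cannot be parsed as a number, so such rows would sort as NULLs:
    -- sort the STRING column by its numeric value instead of as text
    select * from test_sorting_column_type order by cast(str_column as int) desc;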
    

Performance Tuning

Impala Profile Log

Last modified on 03/15/18 12:36:29