Oushu Database 3.0新特性 (发布日期:2017年9月20日) -------------------------------------------------------------- Oushu Database 3.0是一个Major Release,最大的一个亮点功能是完全重新设计的新执行器,性>能比2.2版本的执行器要快10倍左右,是当之无愧的世界上先进的执行器。新执行器充分利用了新>的CPU硬件指令,比如SIMD等,可以做到不浪费CPU的每一个时钟周期。 众所周知,Oushu Database 2.2版本的执行器基于Apache HAWQ的执行器,而Apache HAWQ的执行器 起源于Greenplum Database和PostgreSQL,是一个很成熟的执行器,经过多年的优化和沉淀。Oushu Database 3.0的全新执行器能够比老的执行器快10倍左右,其中难度可想而知,性能的指数级提 升是偶数卓越技术团队经过无数个日夜探索的巨大突破。 具体来说,Oushu Database 3.0版本比Oushu Database 2.2.0版本有如下新特性增强: * 全新执行器,目前已支持scan,projection,filter,aggregation等基本操作,性能提升约10倍。不支持的操作会fallback到老的执行器 * 支持ORC外部存储格式,结合新的执行器,外部存储的性能提升10-50倍 * 支持新一代可插拔存储框架,只需编写几个函数就可以添加一个外部数据源 * 支持PostGIS,提供了存储、查询和修改空间关系的能力 * 关键Bug fixes 下表给出了Oushu Database 3.0版本支持的操作系统。 .. list-table:: :widths: auto :header-rows: 1 * - 操作系统 - 版本 * - Redhat/Centos - 7.0 * - Redhat/Centos - 7.1 * - Redhat/Centos - 7.2 * - Redhat/Centos - 7.3 体验新执行器 ++++++++++++++++++++++++++ 本章节通过TPCH lineitem 表来展示新执行器的使用。 建立lineitem 外部表用来生成TPCH lineitem 数据, :: CREATE EXTERNAL WEB TABLE E_LINEITEM ( L_ORDERKEY INT8 , L_PARTKEY INTEGER , L_SUPPKEY INTEGER , L_LINENUMBER INTEGER , L_QUANTITY FLOAT , L_EXTENDEDPRICE FLOAT , L_DISCOUNT FLOAT , L_TAX FLOAT , L_RETURNFLAG VARCHAR(1) , L_LINESTATUS VARCHAR(1) , L_SHIPDATE TEXT , L_COMMITDATE TEXT , L_RECEIPTDATE TEXT , L_SHIPINSTRUCT CHAR(25) , L_SHIPMODE VARCHAR(10) , L_COMMENT VARCHAR(44) ) EXECUTE 'bash -c "$GPHOME/bin/dbgen -b $GPHOME/bin/dists.dss -T L -s 1 -N 6 -n $((GP_SEGMENT_ID + 1))"' on 6 format 'text' (delimiter '|'); 创建orc表, 其中localhost:9000 指向 "hawq_dfs_url"。 :: CREATE EXTERNAL TABLE lineitem ( L_ORDERKEY INT8, L_PARTKEY INTEGER, L_SUPPKEY INTEGER, L_LINENUMBER INTEGER, L_QUANTITY FLOAT, L_EXTENDEDPRICE FLOAT, L_DISCOUNT FLOAT, L_TAX FLOAT, L_RETURNFLAG TEXT, L_LINESTATUS TEXT, L_SHIPDATE TEXT, L_COMMITDATE TEXT, L_RECEIPTDATE TEXT, L_SHIPINSTRUCT TEXT, L_SHIPMODE TEXT, L_COMMENT TEXT) LOCATION ('hdfs://localhost:9000/hawq_default/lineitem') FORMAT 'orc'; 插入数据 :: INSERT INTO lineitem SELECT * FROM e_lineitem; 使用新执行器(注:现在新执行器只支持ORC格式,不支持的功能会自动fallback到老执行器) :: -----获取表行数------ postgres=# set new_executor = on; SET postgres=# SELECT COUNT(*) FROM lineitem; count --------- 6001215 (1 row) Time: 17.006 ms postgres=# set new_executor = off; SET postgres=# SELECT COUNT(*) FROM lineitem; count --------- 6001215 (1 row) Time: 213.248 ms -----TPCH 查询 1 ------ postgres=# set new_executor = on; SET postgres=# SELECT l_returnflag, l_linestatus, sum(l_quantity)::bigint as sum_qty, sum(l_extendedprice)::bigint as sum_base_price, sum(l_extendedprice * (1 - l_discount))::bigint as sum_disc_price, sum(l_extendedprice * (1 - l_discount) * (1 + l_tax))::bigint as sum_charge, avg(l_quantity)::bigint as avg_qty, avg(l_extendedprice)::bigint as avg_price, avg(l_discount)::bigint as avg_disc, count(*) as count_order FROM lineitem WHERE l_shipdate <= '1998-08-20' GROUP BY l_returnflag, l_linestatus; l_returnflag | l_linestatus | sum_qty | sum_base_price | sum_disc_price | sum_charge | avg_qty | avg_price | avg_disc | count_order --------------+--------------+----------+----------------+----------------+--------------+---------+-----------+----------+------------- R | F | 37719753 | 56568041381 | 53741292685 | 55889619120 | 26 | 38251 | 0 | 1478870 N | F | 991417 | 1487504710 | 1413082168 | 1469649223 | 26 | 38284 | 0 | 38854 A | F | 37734107 | 56586554401 | 53758257135 | 55909065223 | 26 | 38273 | 0 | 1478493 N | O | 73808911 | 110700990251 | 105167436999 | 109377979031 | 26 | 38248 | 0 | 2894278 (4 rows) Time: 234.376 ms postgres=# set new_executor = off; SET postgres=# SELECT l_returnflag, l_linestatus, sum(l_quantity)::bigint as sum_qty, sum(l_extendedprice)::bigint as sum_base_price, sum(l_extendedprice * (1 - l_discount))::bigint as sum_disc_price, sum(l_extendedprice * (1 - l_discount) * (1 + l_tax))::bigint as sum_charge, avg(l_quantity)::bigint as avg_qty, avg(l_extendedprice)::bigint as avg_price, avg(l_discount)::bigint as avg_disc, count(*) as count_order FROM lineitem WHERE l_shipdate <= '1998-08-20' GROUP BY l_returnflag, l_linestatus; l_returnflag | l_linestatus | sum_qty | sum_base_price | sum_disc_price | sum_charge | avg_qty | avg_price | avg_disc | count_order --------------+--------------+----------+----------------+----------------+--------------+---------+-----------+----------+------------- R | F | 37719753 | 56568041381 | 53741292685 | 55889619120 | 26 | 38251 | 0 | 1478870 N | F | 991417 | 1487504710 | 1413082168 | 1469649223 | 26 | 38284 | 0 | 38854 A | F | 37734107 | 56586554401 | 53758257135 | 55909065223 | 26 | 38273 | 0 | 1478493 N | O | 73808911 | 110700990251 | 105167436999 | 109377979031 | 26 | 38248 | 0 | 2894278 (4 rows) Time: 2341.147 ms 从 2.2.0 升级到 3.0.0 +++++++++++++++++++++++++++++++++++++++++ 1. 在master节点关闭hawq 2.2.0集群 :: sudo su gpadmin hawq stop cluster -a 2. 在所有hawq节点上备份 hawq2.2.0的设置 :: hawq ssh -f hostfile -e 'cp -rf $GPHOME/etc ~/' 3. 在每个hawq节点上安装hawq 3.0.0的yum源 :: hawq ssh -f hostfile -e 'sudo wget -O /etc/yum.repos.d/oushu-database.repo $获取的repo地址' hawq ssh -f hostfile -e 'sudo wget -O /etc/yum.repos.d/oushu-database-utils.repo $获取的repo地址' hawq ssh -f hostfile -e 'sudo yum makecache' 4. 在每个hawq节点上安装hawq 3.0.0的rpm包 :: hawq ssh -f hostfile -e 'sudo yum remove -y hawq' sudo yum install -y hawq hawq ssh -f hostfile -e 'sudo yum install -y hawq' 5. 配置每个hawq节点 :: hawq ssh -f hostfile -e 'cp -rf ~/etc/* $GPHOME/etc/' 6. 安装ORC 1) 在master节点上配置集群到升级模式 :: hawq start cluster hawq config -c upgrade_mode -v on --skipvalidation hawq restart cluster -a 2) 在master节点上的template1库的pg_catalog.pg_proc表上安装ORC,在$GPHOME/etc/hawq-site.xml文件中找到$hawq_master_address_port的值,用其实际值替换命令中的$hawq_master_address_port :: PGOPTIONS='-c gp_session_role=utility' psql -a -p $hawq_master_address_port -d template1 -f $GPHOME/share/postgresql/orc_install.sql > orc_install.out 2>&1 3) 在每个segment节点的template1库的pg_catalog.pg_proc表上安装ORC,在$GPHOME/etc/hawq-site.xml文件中找到$hawq_segment_address_port的值,用其实际值替换命令中的$hawq_segment_address_port :: source /usr/local/hawq/greenplum_path.sh PGOPTIONS='-c gp_session_role=utility' psql -a -p $hawq_segment_address_port -d template1 -f $GPHOME/share/postgresql/orc_install.sql > orc_install.out 2>&1 4) 在master节点上将集群配置为普通模式 :: hawq config -c upgrade_mode -v off --skipvalidation hawq restart cluster -a 5) 在master节点获取数据库用户名称 :: psql -a -d template1 -c "select datname from pg_database where datname not in ('hcatalog', 'template0', 'template1') order by datname;" 6) 在master节点的每个数据库的pg_catalog.pg_proc表中安装ORC :: psql -a -d $user_database_name -f $GPHOME/share/postgresql/orc_install.sql > orc_install.out 2>&1 7) 在master节点重启集群 :: hawq restart cluster -a