pgpool: Features and Development

adorepump 1,903 views 38 slides Aug 23, 2009
Slide 1
Slide 1 of 38
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30
Slide 31
31
Slide 32
32
Slide 33
33
Slide 34
34
Slide 35
35
Slide 36
36
Slide 37
37
Slide 38
38

About This Presentation

No description available for this slideshow.


Slide Content

pgpool: Features and
Development
Tatsuo Ishii
pgpool Global Development Group
SRA OSS, Inc. Japan

Agenda
●Developers
●History
●Existing pgpool project
●Ongoing pgpool-II project
●Demonstration

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 3
Who are we?
●pgpool Global Development Group
–Tatsuo Ishii(SRA OSS, Inc. Japan)
–Devrim Gunduz(Command Prompt, Inc.)
–Yoshiyuki Asaba(SRA OSS, Inc. Japan)
–Taiki Yamaguchi(SRA OSS, Inc. Japan)
–pgpoo-II development team
●In addition to Tatsuo, Yoshiyuki and Taiki:
–Tomoaki Sato, Yoshiharu Mori, Kaori Inaba (all from SRA
OSS, Inc. Japan)

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 4
Developers!
Yoshiyuki
Project leader
pgpool
developer
Yoshiharu
Query rewriting
Parallel execution
engin
Kaori
Project manager
System DB
Tatsuo
Enhance
pgpool-I
pgpool
developer
Taiki
PCP
Query Cache
pgpool
developer
Tomoaki
Communication
manager
Our Boss
“Green Turtle”

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 5
pgpool-I: the history
●V0.1: Started as a personal project
(2003/6/27)
●V1.0: Synchronous replication (2004/3)
●V2.0: Load balance (2004/6)
●V3.0: pgpool Global Development
Group(2006/2)

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 6
Why pgpool?
●No general purpose connection pooling
software was available
–Java has its own, but PHP does not...
●No small to mid scale/light weight
synchronous replication tool was
available

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 7
pgpool
parent process
pgpool
child process
pre-fork
PostgreSQL
backend process
pgpool process architecture
pgpool.conf

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 8
pgpool functionality:
connection pooling
●Reduce connection overhead
●Limit maximum number of connections to
the PostgreSQL backend
– New incoming connections are queued in
the kernel if all pgpool processes are busy

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 9
How does connection pooling
work?(1)
PostgreSQL
pgpool
user1/db1
user1/db1
PostgreSQL
user1/db1
user1/db1
PostgreSQL
user1/db1
PostgreSQL
user1/db1

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 10
How does connection pooling
work?(2)
pgpool
user1/db1
user2/db2
PostgreSQL
user2/db2
user2/db2
PostgreSQL
user3/db3
user3/db3
user2/db2
PostgreSQL
user3/db3

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 11
pgpool understands
frontend/backend protocol
●Pgpool is transparent to both applications
and PostgreSQL
●Virtually no modifications to applications
are needed
●No special APIs are needed
●Can be used with existing language APIs
●Can be more efficient than libpq because
of smaller controlling granuality

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 12
Limitations of current
implementation
●resetting pooled connection may not be
perfect(need modification to applications)
–temp tables
–need help from PostgreSQL
●SSL is not supported
–fall back to non SSL mode
●no pg_hba.conf like IP based
authentication

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 13
pgpool functionality: replication
●Queries are duplicated and sent to
PostgreSQL servers
pgpool
PostgreSQL
PostgreSQL
query
query

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 14
Dead lock problem
session 1 session 2 session 1 session 2
master secondary
lock
lock
lock
lock
t
wait
wait

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 15
“Strict” mode in replication
to avoid deadlock problem
pgpool
PostgreSQL
PostgreSQL
Query
packet
pgpool
PostgreSQL
PostgreSQL
Wait until
something
returns
pgpool
PostgreSQL
PostgreSQL
Query
pgpool
PostgreSQL
PostgreSQL
Wait until
something
returns
reply back
the result
Query

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 16
pgpool functionality: load
balanace
●SELECT queries are sent to randomly
chosen backend
●The ratio for load balancing can be
changed
pgpool
PostgreSQL
PostgreSQL

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 17
Limitations of current
implementation in replication
●CURRENT_TIMESTAMP and server- dependent-
value-returning-functions cannot be replicated
●MD5 and crypt authentication are not supported
●Sequences and SERIAL needs table locking if
there are more than 1 connections
●Functions having side effects cannot be load
balanced

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 18
pgpool-II project
●Information-Technology Promotion
Agency, Japan (IPA: http://www.ipa.go.jp)
granted project
●Started in February 2006, expected to
release the first version in September
2006 under BSD license
●Features including parallel query and
enhancement to pgpool
●Successor to pgpool

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 19
Goal of pgpool-II project
●Implement parallel query processing
●Enhance pgpool
–Allow to have more than 2 DB nodes
–More precise control using shared memory
–Easy to manage
●Control port/protocol
●Detailed statistics on node status
●GUI management tool

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 20
pgpool-II architecture overview
Communication
Manager
PCP lib
PCP command
SQL
Parser
pgpool
Catalog
PostgreSQL
Query
Rewriting
Parallel
execution
engine
PostgreSQL
shared
memory
manager
pgpoolAdmin
Replication/load
balance
engine
pgpool-II
System DB
DB node
Client
Query
Cache

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 21
pgpool-I and pgpool-II mode
pgpoo-I
mode
1000
2000
3000
4000
1000
2000
3000
4000
pgpoo-II
mode
1000
2000
3000
4000
replication
load balance
fail over
virtually compatible with pgpool
parallel query

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 22
Table partitioning control
CREATE TABLE dist_def (
dbname TEXT, -- database name
schema_name TEXT, --schema name
table_name TEXT, -- table name
col_name TEXT, -- key col name
col_list TEXT[], -- col names
type_list TEXT[], -- col types
dist_def_func TEXT, -- function name
PRIMARY KEY (dbname, schema_name, table_name)
);
INSERT INTO dist_def VALUES ('y-mori','public','accounts','aid',
ARRAY['aid','bid','abalance','filler'],
ARRAY['integer','integer','integer','character(84)'],'dist_def_accounts'
);
CREATE OR REPLACE FUNCTION dist_def_accounts (val ANYELEMENT)
RETURNS INTEGER AS '
SELECT CASE WHEN $1 >= 0 and $1 < 100001 THEN 0
WHEN $1 >= 100001 and $1 < 200001 THEN 1
WHEN $1 >= 200001 and $1 < 300000 THEN 2
END' LANGUAGE SQL;

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 23
Simple parallel query
SELECT * FROM accounts
WHERE aid = 1000;
pgpool
PostgreSQL
PostgreSQL
SELECT * FROM accounts
WHERE aid = 1000;
SELECT * FROM accounts
WHERE aid = 1000;

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 24
Complex query example
SELECT * FROM accounts
WHERE aid = 1000
ORDER BY aid;
pgpool
PostgreSQL
SELECT * FROM accounts
WHERE aid = 1000;
System DB
PostgreSQL
SELECT * dblink('con',
'SELECT pool_parallel('SELECT * FROM accounts
WHERE aid = 1000')') AS foo(...)
ORDER BY aid;
pgpool SELECT pool_parallel('SELECT * FROM accounts
WHERE aid = 1000')')

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 25
DML
●INSERT recognize partition key value in a
query and INSERT into appropreate DB
node
●UPDATE/DELETE simply issues the same
query to all DB nodes

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 26
INSERT
INSERT INTO accounts(aid, bid)
VALUES(500,100);
SELECT
dist_def_accounts(500);
call
return
DB node = 1
Syetm DB
DB node 0 DB node 1 DB node 2
aid = 0-499 aid = 500-999 aid = 1000-1499
INSERT

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 27
Query Cache
CREATE TABLE pgpool_catalog.query_cache (
hash TEXT, -- query string(MD5 hashed)
query TEXT, -- query string
value bytea, -- query result(RowDescription and
DataRow packets)
dbname TEXT, -- database name
create_time TIMESTAMP WITH TIME ZONE,-- cache creation time
PRIMARY KEY(hash)
);
●Caches query result
●Caches can be validated by pgpoolAdmin

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 28
Using parallel query and
replication together
pgpool-II
with parallel query
pgpool-II
with replication
pgpool-II
with replication
PostgreSQL PostgreSQL PostgreSQL PostgreSQL
Parallel query
layer
Replication
and load
balance
layer

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 29
pgpoolAdmin
●Web based pgpool
management tool
●Apache/PHP/Smarty
●functions
–Stopping pgpool
–Switch over
–Monitoring connection pool status
–Monitoring process status
–Editing configuration file
–Query Cache management

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 30
Benchmarks
●Hitachi Blade Symphony BS1000
–10 blades
–Xeon 3.0GHz x 1
–1GB Mem
–UL320 SCSI Disks
●Cent OS 4.3
●PostgreSQL 8.1.4/pgbench
●Please note that these results are measured on
pre-alpha version of pgpool-II and maybe slightly
different from the future official release version

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 31
Sequencial Scan case(mid size)
1 2 4 8 16 32
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
1.1
1.2
1.3
PostgreSQLpgpool-II
concurrent users
T
P
S
●scale factor = 90
(90M rows)
●SELECT * FROM
accounts WHERE
aid = :aid
●pgpool-II is 7 times
faster at the best
(32 concurrent
users)

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 32
Index Scan case(mid size)
1 2 4 8 16 32
0
250
500
750
1000
1250
1500
1750
2000
2250
2500
2750
3000
3250
3500
3750
PostgreSQLpgpool-II
concurrent users
T
P
S
●scale factor = 90
(90M rows)
●SELECT * FROM
accounts WHERE
aid = :aid
●pgpool-II is 9 times
faster at the best
(16 concurrent
users)

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 33
Index Scan case(large size)
1 2 4 8 16 32
0
100
200
300
400
500
600
700
800
900
1000
1100
1200
1300
PostgreSQLpgpool-II
concurrent users
T
P
S
●scale factor = 900
(900M rows)
●SELECT * FROM
accounts WHERE
aid = :aid
●pgpool-II is 18
times faster at the
best (32
concurrent users)

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 34
complex query case
1 2 4 8
0
5
10
15
20
25
30
35
40
45
PostgreSQLpgpool-II
concurrent users
T
P
S
●scale factor = 900
(900M rows)
●SELECT a1.abalance
FROM accounts as a1
,accounts as a2
WHERE a1.aid = :aid1
and a2.aid = :aid2
●pgpool-II is faster
only when at 8
concurrent users

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 35
pgpool-II restrictions
●SQL restrictions
–COPY is not supported
–SELECT INTO, INSERT INTO ... SELECT is not supported
–Extended protocol is not supported
–INSERT requires explicit values if target column is the key for data
patitioning
–UPDATE with WHERE clause including function call is not supported
–CREATE TABLE/ALTER TABLE requires pgpool restarting
●Transactions
–If a DB node fails during INSERT/UPDATE/DELETE, data consistency will
be broken
–SQL commands via dblink will be executed as separate transactions

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 36
Future plans(TODO)
●More intelligent query rewriting
●Remove SQL restrictions
●Employ two phase commit to keep consistency
among DB nodes. However this technique can
be used only for queries outside transaction
block since PREPARE TRANSACTION closes
current transaction block
●More intelligent cache validation

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 37
References
●pgpool page
–http://pgpool.projects.postgresql.org/
● pgpool-II page
–English page
●http://pgpool.projects.postgresql.org/pgpool-II/en/
–Japanese page
●http://pgpool.projects.postgresql.org/pgpool-II/ja/

2006/07/09 Tronto Copyright(c) 2006 pgpool DG 38
Thank you!