Pourquoi un hash join alors que la table ne tient pas en mémoire

rigaux · 31/12/2015 11:32:12

Bonjour tout le monde,

Je cherche à comprendre le fonctionnement de l'optimiseur (parce ce que je vais être amené à l'expliquer moi-même).

J'ai créé une base assez volumineuse dont le schéma est donné ci-dessous. J'ai 5M de films, 20M d'artistes, la taille des tables est supérieure au GO, j'ai limité la taille du cache à 256MO (effective_cache_size = 256MB).

J'exécute la requête suivante: explain select * from Film as f, Artiste as a where id_realisateur = a.id

Je m'attends à un parcours séquentiel de Film, combiné avec accès à l'index d'artiste. Mais Postgres applique un hash join.

"Hash Join (cost=709777.86..1838467.53 rows=5729727 width=402)"
" Hash Cond: (a.id = f.id_realisateur)"
" -> Seq Scan on artiste a (cost=0.00..340480.70 rows=20514170 width=21)"
" -> Hash (cost=352788.27..352788.27 rows=5729727 width=381)"
" -> Seq Scan on film f (cost=0.00..352788.27 rows=5729727 width=381)"

Je ne comprends pas vraiment pourquoi, la table ne tient pas en mémoire. De fait l'exécution prend une éternité. Si j'interdis le hash join, c'est le bon algo (selon moi) qui est choisi et le temps de réponse devient correct.

Je découvre postgres, jusqu'à présent j'étais plutôt sur Oracle. Merci d'avance pour toute suggestion!

CREATE TABLE Artiste (
id integer NOT NULL,
nom varchar(30) NOT NULL,
prenom varchar(30) NOT NULL,
annee_naissance integer DEFAULT NULL,
primary key (id)
) ;

CREATE TABLE Film (
id integer NOT NULL,
titre varchar(50) NOT NULL,
annee integer NOT NULL,
id_realisateur integer NOT NULL,
genre varchar(30) NOT NULL,
resume text,
code_pays varchar(4) NOT NULL,
version integer,
primary key (id),
foreign key (id_realisateur) references Artiste(id),
foreign key (code_pays) references Pays(code)
) ;

rjuju · 31/12/2015 12:07:27

Bonjour,

le paramètre effective_cache_size ne sert qu'à donner à postgres une estimation de la taille du cache du système d'exploitation. Concrètement, diminuer ce paramètre aura pour conséquence de moins utiliser les index.

Quel est le plan qui selon vous est le bon ? Si ce plan est plus efficace, c'est à priori parce que la configuration de postgres ne reflète pas les performances de votre système (je pense principalement aux paramètres effective_cache_size et random_page cost). Sans beaucoup plus de détails (quantité de mémoire, type de disque, type de raid, configuration postgres, différents plans d'exécution etc) difficile de vous aider plus.

Marc Cousin · 31/12/2015 14:55:23

Par ailleurs, un hash join n'a pas besoin que la plus petite table tienne entièrement en mémoire. C'est mieux (plus performant), mais il s'en sort en partitionnant les tables de hachage (l'algo utilisé est celui-ci: https://en.wikipedia.org/wiki/Hash_join … _hash_join)

On peut voir le nombre de partitions utilisées à l'exécution avec explain analyze.

gleu · 01/01/2016 10:47:14

Un "EXPLAIN (ANALYZE, BUFFERS)" de la requête pour les deux plans permettrait d'en dire plus, surtout si c'est accompagné de la configuration complète de PostgreSQL.

rigaux · 02/01/2016 14:32:45

Bonjour,

Merci beaucoup de votre aide. Pour moi, le "bon" plan est le suivant: le système parcourt une à une les lignes de la table Film, et pour chaque ligne accède à l'index sur la clé primaire de Artiste. Cela permet d'obtenir quasi instantanément la première ligne du résultat, et les lignes suivantes très rapidement à chaque appel du "fetch". C'est ce qui est appliqué par oracle en tout cas.

L'inconvénient du hash join c'est qu'il faut hacher toute la table Film (2 GO quand même) dans la phase de préparation. Ne connaissant pas bien Postgres, je ne sais pas pourquoi c'est la solution choisie. Il se peut que je cherche à optimiser le temps de réponse, alors que PG optimise le temps d'exécution global?

Ci-dessous le PEX avec analyze et buffers, et la configuration

"Hash Join (cost=709777.86..1838467.53 rows=5729727 width=402) (actual time=118145.485..198654.683 rows=5400000 loops=1)"
" Hash Cond: (a.id = f.id_realisateur)"
" Buffers: shared hit=294 read=430539, temp read=381444 written=379398"
" -> Seq Scan on artiste a (cost=0.00..340480.70 rows=20514170 width=21) (actual time=0.018..18057.600 rows=20200000 loops=1)"
" Buffers: shared hit=130 read=135209"
" -> Hash (cost=352788.27..352788.27 rows=5729727 width=381) (actual time=118059.100..118059.100 rows=5400000 loops=1)"
" Buckets: 1024 Batches: 1024 Memory Usage: 2318kB"
" Buffers: shared hit=161 read=295330, temp written=269755"
" -> Seq Scan on film f (cost=0.00..352788.27 rows=5729727 width=381) (actual time=0.452..40701.681 rows=5400000 loops=1)"
" Buffers: shared hit=161 read=295330"
"Planning time: 20.733 ms"
"Execution time: 199343.278 ms"

Fichier de configuration: je n'ai changé que effective_cache_size = 256MB. Il me semble que réduire la taille du cache encourage l'utilisation des index...

# -----------------------------
# PostgreSQL configuration file
# -----------------------------
#
# This file consists of lines of the form:
#
# name = value
#
# (The "=" is optional.) Whitespace may be used. Comments are introduced with
# "#" anywhere on a line. The complete list of parameter names and allowed
# values can be found in the PostgreSQL documentation.
#
# The commented-out settings shown in this file represent the default values.
# Re-commenting a setting is NOT sufficient to revert it to the default value;
# you need to reload the server.
#
# This file is read on server startup and when the server receives a SIGHUP
# signal. If you edit the file on a running system, you have to SIGHUP the
# server for the changes to take effect, or use "pg_ctl reload". Some
# parameters, which are marked below, require a server shutdown and restart to
# take effect.
#
# Any parameter can also be given as a command-line option to the server, e.g.,
# "postgres -c log_connections=on". Some parameters can be changed at run time
# with the "SET" SQL command.
#
# Memory units: kB = kilobytes Time units: ms = milliseconds
# MB = megabytes s = seconds
# GB = gigabytes min = minutes
# TB = terabytes h = hours
# d = days

#------------------------------------------------------------------------------
# FILE LOCATIONS
#------------------------------------------------------------------------------

# The default values of these variables are driven from the -D command-line
# option or PGDATA environment variable, represented here as ConfigDir.

#data_directory = 'ConfigDir' # use data in another directory
# (change requires restart)
#hba_file = 'ConfigDir/pg_hba.conf' # host-based authentication file
# (change requires restart)
#ident_file = 'ConfigDir/pg_ident.conf' # ident configuration file
# (change requires restart)

# If external_pid_file is not explicitly set, no extra PID file is written.
#external_pid_file = '' # write an extra PID file
# (change requires restart)

#------------------------------------------------------------------------------
# CONNECTIONS AND AUTHENTICATION
#------------------------------------------------------------------------------

# - Connection Settings -

listen_addresses = '*' # what IP address(es) to listen on;
# comma-separated list of addresses;
# defaults to 'localhost'; use '*' for all
# (change requires restart)
port = 5432 # (change requires restart)
max_connections = 100 # (change requires restart)
# Note: Increasing max_connections costs ~400 bytes of shared memory per
# connection slot, plus lock space (see max_locks_per_transaction).
#superuser_reserved_connections = 3 # (change requires restart)
#unix_socket_directories = '/tmp' # comma-separated list of directories
# (change requires restart)
#unix_socket_group = '' # (change requires restart)
#unix_socket_permissions = 0777 # begin with 0 to use octal notation
# (change requires restart)
#bonjour = off # advertise server via Bonjour
# (change requires restart)
#bonjour_name = '' # defaults to the computer name
# (change requires restart)

# - Security and Authentication -

#authentication_timeout = 1min # 1s-600s
#ssl = off # (change requires restart)
#ssl_ciphers = 'HIGH:MEDIUM:+3DES:!aNULL' # allowed SSL ciphers
# (change requires restart)
#ssl_prefer_server_ciphers = on # (change requires restart)
#ssl_ecdh_curve = 'prime256v1' # (change requires restart)
#ssl_renegotiation_limit = 0 # amount of data between renegotiations
#ssl_cert_file = 'server.crt' # (change requires restart)
#ssl_key_file = 'server.key' # (change requires restart)
#ssl_ca_file = '' # (change requires restart)
#ssl_crl_file = '' # (change requires restart)
#password_encryption = on
#db_user_namespace = off

# GSSAPI using Kerberos
#krb_server_keyfile = ''
#krb_caseins_users = off

# - TCP Keepalives -
# see "man 7 tcp" for details

#tcp_keepalives_idle = 0 # TCP_KEEPIDLE, in seconds;
# 0 selects the system default
#tcp_keepalives_interval = 0 # TCP_KEEPINTVL, in seconds;
# 0 selects the system default
#tcp_keepalives_count = 0 # TCP_KEEPCNT;
# 0 selects the system default

#------------------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#------------------------------------------------------------------------------

# - Memory -

shared_buffers = 128MB # min 128kB
# (change requires restart)
#huge_pages = try # on, off, or try
# (change requires restart)
#temp_buffers = 8MB # min 800kB
#max_prepared_transactions = 0 # zero disables the feature
# (change requires restart)
# Note: Increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
# It is not advisable to set max_prepared_transactions nonzero unless you
# actively intend to use prepared transactions.
#work_mem = 4MB # min 64kB
#maintenance_work_mem = 64MB # min 1MB
#autovacuum_work_mem = -1 # min 1MB, or -1 to use maintenance_work_mem
#max_stack_depth = 2MB # min 100kB
dynamic_shared_memory_type = posix # the default is the first option
# supported by the operating system:
# posix
# sysv
# windows
# mmap
# use none to disable dynamic shared memory

# - Disk -

#temp_file_limit = -1 # limits per-session temp file space
# in kB, or -1 for no limit

# - Kernel Resource Usage -

#max_files_per_process = 1000 # min 25
# (change requires restart)
#shared_preload_libraries = '' # (change requires restart)

# - Cost-Based Vacuum Delay -

#vacuum_cost_delay = 0 # 0-100 milliseconds
#vacuum_cost_page_hit = 1 # 0-10000 credits
#vacuum_cost_page_miss = 10 # 0-10000 credits
#vacuum_cost_page_dirty = 20 # 0-10000 credits
#vacuum_cost_limit = 200 # 1-10000 credits

# - Background Writer -

#bgwriter_delay = 200ms # 10-10000ms between rounds
#bgwriter_lru_maxpages = 100 # 0-1000 max buffers written/round
#bgwriter_lru_multiplier = 2.0 # 0-10.0 multipler on buffers scanned/round

# - Asynchronous Behavior -

#effective_io_concurrency = 1 # 1-1000; 0 disables prefetching
#max_worker_processes = 8

#------------------------------------------------------------------------------
# WRITE AHEAD LOG
#------------------------------------------------------------------------------

# - Settings -

#wal_level = minimal # minimal, archive, hot_standby, or logical
# (change requires restart)
#fsync = on # turns forced synchronization on or off
#synchronous_commit = on # synchronization level;
# off, local, remote_write, or on
#wal_sync_method = fsync # the default is the first option
# supported by the operating system:
# open_datasync
# fdatasync (default on Linux)
# fsync
# fsync_writethrough
# open_sync
#full_page_writes = on # recover from partial page writes
#wal_log_hints = off # also do full page writes of non-critical updates
# (change requires restart)
#wal_buffers = -1 # min 32kB, -1 sets based on shared_buffers
# (change requires restart)
#wal_writer_delay = 200ms # 1-10000 milliseconds

#commit_delay = 0 # range 0-100000, in microseconds
#commit_siblings = 5 # range 1-1000

# - Checkpoints -

#checkpoint_segments = 3 # in logfile segments, min 1, 16MB each
#checkpoint_timeout = 5min # range 30s-1h
#checkpoint_completion_target = 0.5 # checkpoint target duration, 0.0 - 1.0
#checkpoint_warning = 30s # 0 disables

# - Archiving -

#archive_mode = off # allows archiving to be done
# (change requires restart)
#archive_command = '' # command to use to archive a logfile segment
# placeholders: %p = path of file to archive
# %f = file name only
# e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
#archive_timeout = 0 # force a logfile segment switch after this
# number of seconds; 0 disables

#------------------------------------------------------------------------------
# REPLICATION
#------------------------------------------------------------------------------

# - Sending Server(s) -

# Set these on the master and on any standby that will send replication data.

#max_wal_senders = 0 # max number of walsender processes
# (change requires restart)
#wal_keep_segments = 0 # in logfile segments, 16MB each; 0 disables
#wal_sender_timeout = 60s # in milliseconds; 0 disables

#max_replication_slots = 0 # max number of replication slots
# (change requires restart)

# - Master Server -

# These settings are ignored on a standby server.

#synchronous_standby_names = '' # standby servers that provide sync rep
# comma-separated list of application_name
# from standby(s); '*' = all
#vacuum_defer_cleanup_age = 0 # number of xacts by which cleanup is delayed

# - Standby Servers -

# These settings are ignored on a master server.

#hot_standby = off # "on" allows queries during recovery
# (change requires restart)
#max_standby_archive_delay = 30s # max delay before canceling queries
# when reading WAL from archive;
# -1 allows indefinite delay
#max_standby_streaming_delay = 30s # max delay before canceling queries
# when reading streaming WAL;
# -1 allows indefinite delay
#wal_receiver_status_interval = 10s # send replies at least this often
# 0 disables
#hot_standby_feedback = off # send info from standby to prevent
# query conflicts
#wal_receiver_timeout = 60s # time that receiver waits for
# communication from master
# in milliseconds; 0 disables

#------------------------------------------------------------------------------
# QUERY TUNING
#------------------------------------------------------------------------------

# - Planner Method Configuration -

#enable_bitmapscan = on
#enable_hashagg = on
#enable_hashjoin = on
#enable_indexscan = on
#enable_indexonlyscan = on
#enable_material = on
#enable_mergejoin = on
#enable_nestloop = on
#enable_seqscan = on
#enable_sort = on
#enable_tidscan = on

# - Planner Cost Constants -

#seq_page_cost = 1.0 # measured on an arbitrary scale
#random_page_cost = 4.0 # same scale as above
#cpu_tuple_cost = 0.01 # same scale as above
#cpu_index_tuple_cost = 0.005 # same scale as above
#cpu_operator_cost = 0.0025 # same scale as above
effective_cache_size = 256MB

# - Genetic Query Optimizer -

#geqo = on
#geqo_threshold = 12
#geqo_effort = 5 # range 1-10
#geqo_pool_size = 0 # selects default based on effort
#geqo_generations = 0 # selects default based on effort
#geqo_selection_bias = 2.0 # range 1.5-2.0
#geqo_seed = 0.0 # range 0.0-1.0

# - Other Planner Options -

#default_statistics_target = 100 # range 1-10000
#constraint_exclusion = partition # on, off, or partition
#cursor_tuple_fraction = 0.1 # range 0.0-1.0
#from_collapse_limit = 8
#join_collapse_limit = 8 # 1 disables collapsing of explicit
# JOIN clauses

#------------------------------------------------------------------------------
# ERROR REPORTING AND LOGGING
#------------------------------------------------------------------------------

# - Where to Log -

log_destination = 'stderr' # Valid values are combinations of
# stderr, csvlog, syslog, and eventlog,
# depending on platform. csvlog
# requires logging_collector to be on.

# This is used when logging to stderr:
logging_collector = on # Enable capturing of stderr and csvlog
# into log files. Required to be on for
# csvlogs.
# (change requires restart)

# These are only used if logging_collector is on:
#log_directory = 'pg_log' # directory where log files are written,
# can be absolute or relative to PGDATA
#log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' # log file name pattern,
# can include strftime() escapes
#log_file_mode = 0600 # creation mode for log files,
# begin with 0 to use octal notation
#log_truncate_on_rotation = off # If on, an existing log file with the
# same name as the new log file will be
# truncated rather than appended to.
# But such truncation only occurs on
# time-driven rotation, not on restarts
# or size-driven rotation. Default is
# off, meaning append to existing files
# in all cases.
#log_rotation_age = 1d # Automatic rotation of logfiles will
# happen after that time. 0 disables.
#log_rotation_size = 10MB # Automatic rotation of logfiles will
# happen after that much log output.
# 0 disables.

# These are relevant when logging to syslog:
#syslog_facility = 'LOCAL0'
#syslog_ident = 'postgres'

# This is only relevant when logging to eventlog (win32):
#event_source = 'PostgreSQL'

# - When to Log -

#client_min_messages = notice # values in order of decreasing detail:
# debug5
# debug4
# debug3
# debug2
# debug1
# log
# notice
# warning
# error

#log_min_messages = warning # values in order of decreasing detail:
# debug5
# debug4
# debug3
# debug2
# debug1
# info
# notice
# warning
# error
# log
# fatal
# panic

#log_min_error_statement = error # values in order of decreasing detail:
# debug5
# debug4
# debug3
# debug2
# debug1
# info
# notice
# warning
# error
# log
# fatal
# panic (effectively off)

#log_min_duration_statement = -1 # -1 is disabled, 0 logs all statements
# and their durations, > 0 logs only
# statements running at least this number
# of milliseconds

# - What to Log -

#debug_print_parse = off
#debug_print_rewritten = off
#debug_print_plan = off
#debug_pretty_print = on
#log_checkpoints = off
#log_connections = off
#log_disconnections = off
#log_duration = off
#log_error_verbosity = default # terse, default, or verbose messages
#log_hostname = off
log_line_prefix = '%t ' # special values:
# %a = application name
# %u = user name
# %d = database name
# %r = remote host and port
# %h = remote host
# %p = process ID
# %t = timestamp without milliseconds
# %m = timestamp with milliseconds
# %i = command tag
# %e = SQL state
# %c = session ID
# %l = session line number
# %s = session start timestamp
# %v = virtual transaction ID
# %x = transaction ID (0 if none)
# %q = stop here in non-session
# processes
# %% = '%'
# e.g. '<%u%%%d> '
#log_lock_waits = off # log lock waits >= deadlock_timeout
#log_statement = 'none' # none, ddl, mod, all
#log_temp_files = -1 # log temporary files equal or larger
# than the specified size in kilobytes;
# -1 disables, 0 logs all temp files
log_timezone = 'Europe/Paris'

#------------------------------------------------------------------------------
# RUNTIME STATISTICS
#------------------------------------------------------------------------------

# - Query/Index Statistics Collector -

#track_activities = on
#track_counts = on
#track_io_timing = off
#track_functions = none # none, pl, all
#track_activity_query_size = 1024 # (change requires restart)
#update_process_title = on
#stats_temp_directory = 'pg_stat_tmp'

# - Statistics Monitoring -

#log_parser_stats = off
#log_planner_stats = off
#log_executor_stats = off
#log_statement_stats = off

#------------------------------------------------------------------------------
# AUTOVACUUM PARAMETERS
#------------------------------------------------------------------------------

#autovacuum = on # Enable autovacuum subprocess? 'on'
# requires track_counts to also be on.
#log_autovacuum_min_duration = -1 # -1 disables, 0 logs all actions and
# their durations, > 0 logs only
# actions running at least this number
# of milliseconds.
#autovacuum_max_workers = 3 # max number of autovacuum subprocesses
# (change requires restart)
#autovacuum_naptime = 1min # time between autovacuum runs
#autovacuum_vacuum_threshold = 50 # min number of row updates before
# vacuum
#autovacuum_analyze_threshold = 50 # min number of row updates before
# analyze
#autovacuum_vacuum_scale_factor = 0.2 # fraction of table size before vacuum
#autovacuum_analyze_scale_factor = 0.1 # fraction of table size before analyze
#autovacuum_freeze_max_age = 200000000 # maximum XID age before forced vacuum
# (change requires restart)
#autovacuum_multixact_freeze_max_age = 400000000 # maximum multixact age
# before forced vacuum
# (change requires restart)
#autovacuum_vacuum_cost_delay = 20ms # default vacuum cost delay for
# autovacuum, in milliseconds;
# -1 means use vacuum_cost_delay
#autovacuum_vacuum_cost_limit = -1 # default vacuum cost limit for
# autovacuum, -1 means use
# vacuum_cost_limit

#------------------------------------------------------------------------------
# CLIENT CONNECTION DEFAULTS
#------------------------------------------------------------------------------

# - Statement Behavior -

#search_path = '"$user",public' # schema names
#default_tablespace = '' # a tablespace name, '' uses the default
#temp_tablespaces = '' # a list of tablespace names, '' uses
# only default tablespace
#check_function_bodies = on
#default_transaction_isolation = 'read committed'
#default_transaction_read_only = off
#default_transaction_deferrable = off
#session_replication_role = 'origin'
#statement_timeout = 0 # in milliseconds, 0 is disabled
#lock_timeout = 0 # in milliseconds, 0 is disabled
#vacuum_freeze_min_age = 50000000
#vacuum_freeze_table_age = 150000000
#vacuum_multixact_freeze_min_age = 5000000
#vacuum_multixact_freeze_table_age = 150000000
#bytea_output = 'hex' # hex, escape
#xmlbinary = 'base64'
#xmloption = 'content'
#gin_fuzzy_search_limit = 0

# - Locale and Formatting -

datestyle = 'iso, mdy'
#intervalstyle = 'postgres'
timezone = 'Europe/Paris'
#timezone_abbreviations = 'Default' # Select the set of available time zone
# abbreviations. Currently, there are
# Default
# Australia (historical usage)
# India
# You can create your own file in
# share/timezonesets/.
#extra_float_digits = 0 # min -15, max 3
#client_encoding = sql_ascii # actually, defaults to database
# encoding

# These settings are initialized by initdb, but they can be changed.
lc_messages = 'C' # locale for system error message
# strings
lc_monetary = 'C' # locale for monetary formatting
lc_numeric = 'C' # locale for number formatting
lc_time = 'C' # locale for time formatting

# default configuration for text search
default_text_search_config = 'pg_catalog.english'

# - Other Defaults -

#dynamic_library_path = '$libdir'
#local_preload_libraries = ''
#session_preload_libraries = ''

#------------------------------------------------------------------------------
# LOCK MANAGEMENT
#------------------------------------------------------------------------------

#deadlock_timeout = 1s
#max_locks_per_transaction = 64 # min 10
# (change requires restart)
# Note: Each lock table slot uses ~270 bytes of shared memory, and there are
# max_locks_per_transaction * (max_connections + max_prepared_transactions)
# lock table slots.
#max_pred_locks_per_transaction = 64 # min 10
# (change requires restart)

#------------------------------------------------------------------------------
# VERSION/PLATFORM COMPATIBILITY
#------------------------------------------------------------------------------

# - Previous PostgreSQL Versions -

#array_nulls = on
#backslash_quote = safe_encoding # on, off, or safe_encoding
#default_with_oids = off
#escape_string_warning = on
#lo_compat_privileges = off
#quote_all_identifiers = off
#sql_inheritance = on
#standard_conforming_strings = on
#synchronize_seqscans = on

# - Other Platforms and Clients -

#transform_null_equals = off

#------------------------------------------------------------------------------
# ERROR HANDLING
#------------------------------------------------------------------------------

#exit_on_error = off # terminate session on any error?
#restart_after_crash = on # reinitialize after backend crash?

#------------------------------------------------------------------------------
# CONFIG FILE INCLUDES
#------------------------------------------------------------------------------

# These options allow settings to be loaded from files other than the
# default postgresql.conf.

#include_dir = 'conf.d' # include files ending in '.conf' from
# directory 'conf.d'
#include_if_exists = 'exists.conf' # include file only if it exists
#include = 'special.conf' # include file

#------------------------------------------------------------------------------
# CUSTOMIZED OPTIONS
#------------------------------------------------------------------------------

# Add settings for extensions here

gleu · 02/01/2016 17:41:50

Avec un cache aussi petit (au niveau PostgreSQL comme au niveau noyau), cela me semble logique qu'il ne veuille surtout pas utiliser un index qui est bien plus cher à parcourir qu'une table. Le problème principal du plan est que la table de hachage ne tient pas en mémoire, ce qui cause des écritures sur disque qui ralentissent l'exécution de la requête.

J'aimerais bien voir ce que donnerait le plan d'exécution avec les jointures par hachage désactivées. Faites un enable_hashjoin to off, puis refaites le EXPLAIN et collez le résultat ici. Ça nous permettra d'aller plus loin dans l'analyse.

Et après, il serait bon de revoir la configuration de PostgreSQL. Notamment shared_buffers, work_mem et effective_cache_size. De combien de mémoire dispose le serveur ?

rigaux · 02/01/2016 18:15:46

Voici avec la jointure par hachage désactivée. ca correspond au plan que j'attends

"Merge Join (cost=3992318.43..4807065.34 rows=5729727 width=402)"
" Merge Cond: (a.id = f.id_realisateur)"
" -> Index Scan using artiste_pkey on artiste a (cost=0.44..668878.99 rows=20514170 width=21)"
" -> Materialize (cost=3992315.13..4020963.76 rows=5729727 width=381)"
" -> Sort (cost=3992315.13..4006639.45 rows=5729727 width=381)"
" Sort Key: f.id_realisateur"
" -> Seq Scan on film f (cost=0.00..352788.27 rows=5729727 width=381)"

Ce qui m'échappe c'est pourquoi PGSQL ne choisit pas ce plan. Normalement l'index est petit et est beaucoup plus efficace à utiliser qu'un parcours séquentiel?

Je suis sur un Mac Book pro, 4GO de RAM.

gleu · 02/01/2016 18:55:49

Encore une fois, un "EXPLAIN (ANALYZE, BUFFERS)", plutôt qu'un "EXPLAIN" tout court qui ne dit rien.

Tout ce que l'EXPLAIN nous permet de dire là, c'est que le coût est deux fois plus important avec un Merge Join qu'avec un Hash Join, d'où le choix du Hash Join. Ce coût si important est dû à l'opération de tri nécessaire sur la colonne id_realisateur qui n'est pas indexée. Il serait pertinent qu'elle le soit étant une colonne référencée dans une clé étrangère.

rigaux · 02/01/2016 19:10:47

Ok je comprends mieux. Postgres cherche à fusionner deux listes triées: l'index de artiste et la colonne id_realisateur

En ajoutant set enable_sort=off j'obtiens exactement le plan d'exécution que j'attends, sans avoir à créer un index supplémentaire.

"Nested Loop (cost=0.44..45788415.64 rows=5729727 width=402) (actual time=0.382..57633.488 rows=5400000 loops=1)"
" Buffers: shared hit=21444359 read=483647 written=3204"
" -> Seq Scan on film f (cost=0.00..352788.27 rows=5729727 width=381) (actual time=0.356..7129.131 rows=5400000 loops=1)"
" Buffers: shared read=295491"
" -> Index Scan using artiste_pkey on artiste a (cost=0.44..7.92 rows=1 width=21) (actual time=0.007..0.008 rows=1 loops=5400000)"
" Index Cond: (id = f.id_realisateur)"
" Buffers: shared hit=21444359 read=188156 written=3204"
"Planning time: 7.825 ms"
"Execution time: 58538.947 ms"

Et mon temps d'exécution tombe à 58s, presque 4 fois moins qu'en permettant le tri. Pourquoi postgres ne choisit-il pas cette solution qui semble bien meilleure?

Merci beaucoup en tout cas de vos remarques. Si vous avez des explications je suis preneur, je creuse la question de mon côté.

PR

gleu · 02/01/2016 19:29:39

Hé bien, là-aussi, le coût calculé est prohibitif par rapport au Hash Join. Il est de 45 millions dans le cas du Nested Loop, et de 1,8 million dans le cas du Hash Join. Il n'y a pas photo pour lui :-)

Le coût de 45 millions est principalement dû à la lecture de l'index (7.92*5400000 donne en gros 42.7 millions, à comparer au 45.7 millions globalement). Ce qu'il ne sait pas, c'est que l'index est majoritairement en cache. Si on voulait simuler ça, une idée serait de faire un "SET random_page_cost TO 1", puis de nouveau l'EXPLAIN (après avoir remis enable_hashjoin et enable_sort à on).

De combien de mémoire dispose le serveur ?

rigaux · 02/01/2016 20:47:03

D'accord, ça devient de plus en plus clair.

En fait le hash join optimise le temps total d'exécution, mais le temps de réponse (start-up cost dans la doc postgres) est nettement moins bon à cause de la phase
de pré-processing. Normal.

L'algo basé sur les index optimise le temps de réponse, au détriment du temps total d'exécution. C'est logique car le plan engendre beaucoup d'accès aléatoires, mais c'est mieux si on traite les lignes une à une dans une appli.

En tout cas, j'ai les explications qu'il me faut. Merci encore: je continue mes explorations.

Forums PostgreSQL.fr

#1 31/12/2015 11:32:12

Pourquoi un hash join alors que la table ne tient pas en mémoire

#2 31/12/2015 12:07:27

Re : Pourquoi un hash join alors que la table ne tient pas en mémoire

#3 31/12/2015 14:55:23

Re : Pourquoi un hash join alors que la table ne tient pas en mémoire

#4 01/01/2016 10:47:14

Re : Pourquoi un hash join alors que la table ne tient pas en mémoire

#5 02/01/2016 14:32:45

Re : Pourquoi un hash join alors que la table ne tient pas en mémoire

#6 02/01/2016 17:41:50

Re : Pourquoi un hash join alors que la table ne tient pas en mémoire

#7 02/01/2016 18:15:46

Re : Pourquoi un hash join alors que la table ne tient pas en mémoire

#8 02/01/2016 18:55:49

Re : Pourquoi un hash join alors que la table ne tient pas en mémoire

#9 02/01/2016 19:10:47

Re : Pourquoi un hash join alors que la table ne tient pas en mémoire

#10 02/01/2016 19:29:39

Re : Pourquoi un hash join alors que la table ne tient pas en mémoire

#11 02/01/2016 20:47:03

Re : Pourquoi un hash join alors que la table ne tient pas en mémoire

Pied de page des forums