How to populate a table's foreign keys from other tables
Solution 1
This can be simplified to:
INSERT INTO translation (id, translated, language_id, template_id)
SELECT tmp.id, tmp.translated, l.id, t.id
FROM tmp_table tmp
JOIN language l USING (langname)
JOIN template t USING (tplname, source, domain)
ORDER BY tmp.id
I added an ORDER BY clause that you don't strictly need, but certain queries may benefit if you insert your data clustered that (or some other) way.
If you want to avoid losing rows where you can't find a matching row in language or template, make it LEFT JOIN instead of JOIN for both tables (provided that language_id and template_id can be NULL).
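A minimal sketch of that LEFT JOIN variant (same tables and columns as above); rows from tmp_table with no match in language or template are kept, with the corresponding foreign key set to NULL:

```sql
INSERT INTO translation (id, translated, language_id, template_id)
SELECT tmp.id, tmp.translated, l.id, t.id
FROM   tmp_table tmp
LEFT   JOIN language l USING (langname)
LEFT   JOIN template t USING (tplname, source, domain)
ORDER  BY tmp.id;
```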
In addition to what I already listed under the previous question: if the INSERT is huge and constitutes a large proportion of the target table, it is probably faster to DROP all indexes on the target table and recreate them afterwards. Creating indexes from scratch is a lot faster than updating them incrementally for every row.
Unique indexes additionally serve as constraints, so you'll have to consider whether to drop them and enforce those rules later, or leave them in place during the load.
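As a sketch of the drop/load/recreate cycle (the index name is the one mentioned in the comments below; adapt the list of indexes to your actual schema):

```sql
BEGIN;
DROP INDEX translation_language_id;  -- repeat for the other indexes on translation

INSERT INTO translation (id, translated, language_id, template_id)
SELECT tmp.id, tmp.translated, l.id, t.id
FROM   tmp_table tmp
JOIN   language l USING (langname)
JOIN   template t USING (tplname, source, domain);

-- recreate each dropped index from scratch
CREATE INDEX translation_language_id ON translation USING btree (language_id);
COMMIT;
```

Running it all in one transaction keeps concurrent readers from seeing the table without its indexes.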
Solution 2
insert into translation (id, translated, language_id, template_id)
select tmp.id, tmp.translated, l.id, t.id
from tmp_table tmp, language l, template t
where l.langname = tmp.langname
and t.tplname = tmp.tplname
and t.source = tmp.source
and t.domain = tmp.domain;
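The same query in explicit JOIN syntax (equivalent result; the comma-separated FROM list with join conditions in WHERE is older style and, as a commenter notes, harder to read):

```sql
INSERT INTO translation (id, translated, language_id, template_id)
SELECT tmp.id, tmp.translated, l.id, t.id
FROM   tmp_table tmp
JOIN   language l ON l.langname = tmp.langname
JOIN   template t ON t.tplname = tmp.tplname
                 AND t.source  = tmp.source
                 AND t.domain  = tmp.domain;
```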
Solution 3
I'm not as familiar with PostgreSQL as with other RDBMSs, but it should be something like:
INSERT INTO translation (id, translated, language_id, template_id)
SELECT s.id, s.translated, l.id, t.id FROM tmp_table s
INNER JOIN language l ON (l.langname = s.langname)
INNER JOIN template t ON (t.tplname = s.tplname AND t.source = s.source AND t.domain = s.domain)
Looks like someone just posted basically the same answer with slightly different syntax, but keep in mind: if there is no matching langname or tplname in the joined tables, the rows from tmp_table will not get inserted at all. This query also does nothing to prevent duplicates of translation.id, so make sure you don't run it more than once.
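To make the statement safe to re-run, one option is ON CONFLICT DO NOTHING (PostgreSQL 9.5+, assuming a primary key or unique constraint on translation.id); this is a sketch, not part of the original answer:

```sql
INSERT INTO translation (id, translated, language_id, template_id)
SELECT tmp.id, tmp.translated, l.id, t.id
FROM   tmp_table tmp
JOIN   language l USING (langname)
JOIN   template t USING (tplname, source, domain)
ON     CONFLICT (id) DO NOTHING;

-- Pre-9.5 alternative: filter out ids that are already present
-- ... WHERE NOT EXISTS (SELECT 1 FROM translation tr WHERE tr.id = tmp.id);
```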
David Planella
I work at GitLab as Director of Community Relations. Before that, I worked for Canonical as the Ubuntu Community Team Manager. As an open-source contributor, I am mostly involved in app development and localization: I'm the developer of Qreator, formerly led the Ubuntu Catalan Translators team, and am also a GNOME translator. In the past I've contributed to other projects, such as Debian and Mozilla.
Updated on August 12, 2022
Comments
-
David Planella over 1 year
I've got the following tables, of which translation is empty and I'm trying to fill:
translation { id translated language_id template_id }
language { id langname langcode }
template { id tplname source domain total }
The source data to fill translation is a temporary table that I've populated from an external CSV file:
tmp_table { id translated langname tplname source domain }
What I'd like to do is to fill translation with the values from tmp_table. The translated field can be copied directly, but I'm not quite sure how to fetch the right language_id (tmp_table.langname can be used to determine language.id) and template_id (tmp_table.tplname, tmp_table.source and tmp_table.domain together can be used to determine template.id). It might be a trivial question, but I'm quite new to SQL and not sure what the best query would be to populate the translation table. Any ideas?
-
sam yi about 12 years
Use joins. This is very difficult to read. hashmysql.org/wiki/Comma_vs_JOIN
-
David Planella about 12 years
Thanks! I'm not too clear on the indexes part, though. The target table was generated from Django models, and looking at it from pgadmin3, it looks like it got created with an index on each foreign key (e.g. CREATE INDEX translation_language_id ON translation USING btree (language_id);). You mean that in order to increase the performance of the insert operation, I should drop all of these indexes before the insert query and then generate each one of them again with the same CREATE INDEX [...] query?
-
Erwin Brandstetter about 12 years
@DavidPlanella: Exactly. This would also be safe, as the nature of the query observes the foreign key rules. If you worry about concurrent operations, do it all in one transaction. If you are not convinced this would be faster, just run a test on a copy of your database. EXPLAIN ANALYZE can be used for timing.
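A sketch of such a timing test; note that EXPLAIN ANALYZE actually executes the INSERT, so wrapping it in a rolled-back transaction discards the test rows while keeping the timing output:

```sql
BEGIN;
EXPLAIN ANALYZE
INSERT INTO translation (id, translated, language_id, template_id)
SELECT tmp.id, tmp.translated, l.id, t.id
FROM   tmp_table tmp
JOIN   language l USING (langname)
JOIN   template t USING (tplname, source, domain);
ROLLBACK;  -- undo the test insert, keep only the measured plan and timing
```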