0

I have some Postgres tables:

CREATE TABLE source_redshift.staticprompts (
    id              INT,
    projectid       BIGINT,
    scriptid        INT,
    promptnum       INT,
    prompttype      VARCHAR(20),
    inputs          VARCHAR(2000),
    attributes      VARCHAR(2000),
    text            VARCHAR(2000),
    corpuscode      VARCHAR(2000),
    comment         VARCHAR(2000),
    created         TIMESTAMP,
    modified        TIMESTAMP


);

and 

CREATE TABLE target_redshift.user_input_conf (
    collect_project_id      BIGINT NOT NULL UNIQUE,
    prompt_type             VARCHAR(20),
    prompt_input_desc       VARCHAR(300),
    prompt_input_name       VARCHAR(100),
    no_of_prompt_count      BIGINT,
    prompt_input_value      VARCHAR(100) UNIQUE,
    prompt_input_value_id   BIGSERIAL PRIMARY KEY,
    script_id               BIGINT,
    corpuscode              VARCHAR(20),
    min_recordings          VARCHAR(2000),
    max_recordings          VARCHAR(2000),
    recordings_count        VARCHAR(2000),
    lease_duration          VARCHAR(2000),
    date_created            TIMESTAMP WITHOUT TIME ZONE NOT NULL DEFAULT NOW(),
    date_updated            TIMESTAMP WITHOUT TIME ZONE,
    CONSTRAINT must_be_different UNIQUE (prompt_input_value,collect_project_id)

);

I need copy data from staticprompts to user_input_conf with this rules:

Primary Key : prompt_input_value_id

Unique Values : collect_project_id, prompt_input_value

Data Load Logic :

Insert only when new prompt input value is found for given collect project from source. Inputs column stores the values in JSON format in staticprompts table.

Insert :

Generate unique sequence number for each of the new prompt input value for a collect project id from source and store in prompt_input_value_id.

Update :

If prompt value already exists for a collect project and if there are any value changes on prompt_input_desc or prompt input name or prompt input value then update only those columns.

prompt_input_value_id - Generate unique sequence number for the combination of each prompt_input_value and collect_project_id

prompt_input_value - Inputs.value is stored in the inputs column as JSON text. Create a unique record for each inputs.value. Look at the example below this table.

I try to use this query:

INSERT INTO target_redshift.user_input_conf AS t (
            collect_project_id,
            prompt_type,
            prompt_input_desc,
            prompt_input_name,
            prompt_input_value,
            script_id,
            corpuscode)
        SELECT
            s.projectid,
            s.prompttype,
            s.inputs::jsonb#>>'{inputs,0,desc}' AS desc,
            s.inputs::jsonb#>>'{inputs,0,name}' AS name,
            s.inputs::jsonb#>>'{inputs,0,values}' AS values,
            s.scriptid,
            s.corpuscode
        FROM source_redshift.staticprompts AS s
        ON CONFLICT (collect_project_id, prompt_input_value)
        DO UPDATE SET
            (prompt_input_desc, prompt_input_name, prompt_input_value, date_updated) =
            (EXCLUDED.prompt_input_desc, EXCLUDED.prompt_input_name, EXCLUDED.prompt_input_value, NOW())
        WHERE t.prompt_input_desc != EXCLUDED.prompt_input_desc
            OR t.prompt_input_name != EXCLUDED.prompt_input_name
            OR t.prompt_input_value != EXCLUDED.prompt_input_value;
    """)

But I get an error:

psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "user_input_conf_collect_project_id_key"
DETAIL:  Key (collect_project_id)=(1) already exists.
Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263

1 Answers1

1

I think there is a misunderstanding. A unique constraint over two columns does not mean that each of the columns is unique, but that the combination of the two columns is unique.

So your must_be_different is different (and weaker) than the unique constraints on prompt_input_value and collect_project_id. For example, if you have the three rows

 collect_project_id | prompt_input_value
--------------------+--------------------
                  1 | a
                  1 | b
                  2 | b

they will create a conflict with both single-column unique constraints, but nor with must_be_different.

I guess that the underlying problem is that you want to use INSERT ... ON CONFLICT with multiple unique constraints. That cannot be done; see this question for a discussion and potential solutions.

Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263