I'm streaming data into a BigQuery table by building an InsertAllRequest, which is then inserted using the insertAll method from com.google.cloud.bigquery.BigQuery. I got it all to work in the sense that I can insert data into the table, but I'm after a specific behavior: I'd like to implement some kind of "composite key" in the table.
Here's what the table looks like:
Field name | Type | Mode
--------------------------------------
order_id | STRING | REQUIRED
modified_ts | TIMESTAMP | REQUIRED
order_sum | INTEGER | NULLABLE
order_reference | STRING | NULLABLE
So, I'd like the key to be order_id and modified_ts; in other words, I'd like to be able to track changes to an order over time. If an existing key is inserted again, I'd hope for an error, though simply ignoring the new row (treating it as a duplicate) would work fine for me as well.
Unfortunately, I haven't yet succeeded in getting BigQuery to do this. Here's the code I tested:
String rowId = String.valueOf("order_id, modified_ts");
InsertAllRequest req = InsertAllRequest.newBuilder(ORDER)
.addRow(rowId, mapOrder(o, modifiedTs))
.build();
InsertAllResponse resp = bigQuery.insertAll(req);
log.info("response was: {}", resp.toString());
ORDER in newBuilder is a TableId object, and mapOrder(o, modifiedTs) maps the incoming object to a Map<String, Object>.
All works fine if I define rowId as String.valueOf("order_id"), but obviously all updates of an order just update the existing row, not generating any history. The solution above with comma-separated column names behaves the same way, simply ignoring modified_ts.
So, my question is simply: how can I get this to work? What I want is - somewhat simplified - the following:
order_id | modified_ts | data
------------------------------------------
1 | 2020-12-10 | some data
1 | 2020-12-15 | some changed data
2 | 2020-12-15 | some more data
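
To make the idea concrete, here is a minimal sketch of what a "composite" row ID could look like if it were built from the actual values of order_id and modified_ts rather than from the column names. The class CompositeRowId, its method of, the "|" separator and the variable orderId are all hypothetical names made up for this illustration; none of them come from the BigQuery client library. As far as I understand, the row ID passed to addRow is only used for best-effort de-duplication of streamed rows, so this would not be a strict key constraint either.

```java
import java.time.Instant;

/** Hypothetical helper for illustration; not part of the BigQuery client library. */
public class CompositeRowId {

    // Joins the actual values of order_id and modified_ts into one string.
    // The "|" separator is an arbitrary choice for this sketch.
    public static String of(String orderId, Instant modifiedTs) {
        return orderId + "|" + modifiedTs;
    }

    public static void main(String[] args) {
        // Same order, two different modification timestamps: two different IDs.
        String first = of("1", Instant.parse("2020-12-10T00:00:00Z"));
        String second = of("1", Instant.parse("2020-12-15T00:00:00Z"));
        System.out.println(first);
        System.out.println(second);

        // Assumed usage in the request from the question (orderId is hypothetical):
        // .addRow(CompositeRowId.of(orderId, modifiedTs), mapOrder(o, modifiedTs))
    }
}
```

With IDs like these, the same order at two different timestamps gets two distinct row IDs, while re-sending the same order with the same timestamp produces an identical ID.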