dedup_table - BigQuery

Function

dedup_table(table_name, timestamp_column, unique_column, output_table_suffix)

Description

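Dedup Table - Creates a deduplicated copy of the table <table_name> and writes it to a new table named <table_name> followed by <output_table_suffix> (default: _dedup). For each value of <unique_column>, only the row with the latest <timestamp_column> is kept. The arguments <timestamp_column>, <unique_column>, and <output_table_suffix> are optional: pass an empty string ('') to omit one. With only <unique_column> set, one arbitrary row per value is kept; with neither <unique_column> nor <timestamp_column> set, exact duplicate rows are removed with SELECT DISTINCT *. Setting <timestamp_column> without <unique_column> is not supported.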

Example Query

CALL `justfunctions.eu.dedup_table`("your_project.your_dataset.your_table","created_at","user_id","_dedup")

/*--Output--
your_project.your_dataset.your_table_dedup
*/
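
Because the procedure takes four positional STRING arguments, an "optional" argument is omitted by passing an empty string rather than by leaving it out. The calls below are a sketch of the three supported modes, reusing the placeholder table and column names from the example above; the suffix _latest is only an illustration.

```sql
-- 1. No key and no timestamp: exact duplicate rows are removed (SELECT DISTINCT *).
--    Output table: your_project.your_dataset.your_table_dedup (default suffix).
CALL `justfunctions.eu.dedup_table`("your_project.your_dataset.your_table", "", "", "");

-- 2. Key only: one arbitrary row is kept per user_id.
CALL `justfunctions.eu.dedup_table`("your_project.your_dataset.your_table", "", "user_id", "");

-- 3. Key and timestamp with a custom suffix: the newest row per user_id is kept.
--    Output table: your_project.your_dataset.your_table_latest.
CALL `justfunctions.eu.dedup_table`("your_project.your_dataset.your_table", "created_at", "user_id", "_latest");
```

Each call uses CREATE OR REPLACE TABLE internally, so running it again overwrites the previous output table.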

Statement

CREATE OR REPLACE PROCEDURE `your_project_id.your_dataset_id.dedup_table`(`table_name` string, `timestamp_column` string, `unique_column` string, `output_table_suffix` string)
options(
    description = '''Creates a deduplicated version of the table <table_name>, retaining the latest row based on the <timestamp_column>, and outputs it with the suffix <output_table_suffix>. Arguments <timestamp_column>, <unique_column>, <output_table_suffix> are optional.'''
)
BEGIN

DECLARE sql STRING;
DECLARE final_table_name STRING;

-- Output table name: the input table name plus the suffix ('_dedup' when no suffix is given).
SET final_table_name = CONCAT(table_name, IF(output_table_suffix != '', output_table_suffix, '_dedup'));

IF unique_column = '' AND timestamp_column = '' THEN
  SET sql = FORMAT("""
    CREATE OR REPLACE TABLE %s AS
    SELECT DISTINCT *
    FROM %s;
  """, final_table_name, table_name);
ELSEIF unique_column != '' AND timestamp_column = '' THEN
  SET sql = FORMAT("""
    CREATE OR REPLACE TABLE %s AS
    SELECT * EXCEPT(row_num)
    FROM (
      SELECT
        *,
        ROW_NUMBER() OVER (PARTITION BY %s) AS row_num
      FROM
        %s
    )
    WHERE row_num = 1;
  """, final_table_name, unique_column, table_name);
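ELSEIF unique_column = '' THEN
  -- A timestamp without a key would produce an empty PARTITION BY clause, which is invalid SQL.
  RAISE USING MESSAGE = 'dedup_table: unique_column is required when timestamp_column is set';
ELSE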
  SET sql = FORMAT("""
    CREATE OR REPLACE TABLE %s AS
    SELECT * EXCEPT(row_num)
    FROM (
      SELECT
        *,
        ROW_NUMBER() OVER (PARTITION BY %s ORDER BY %s DESC) AS row_num
      FROM
        %s
    )
    WHERE row_num = 1;
  """, final_table_name, unique_column, timestamp_column, table_name);
END IF;

EXECUTE IMMEDIATE sql;


END;
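
To make the last branch concrete, this is roughly the statement the procedure builds and runs for the example call (table your_project.your_dataset.your_table, key user_id, timestamp created_at, suffix _dedup). It is a hand-expanded sketch of the FORMAT template with comments added, not output captured from BigQuery.

```sql
CREATE OR REPLACE TABLE your_project.your_dataset.your_table_dedup AS
SELECT * EXCEPT(row_num)
FROM (
  SELECT
    *,
    -- Number the rows of each user_id from newest to oldest.
    ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS row_num
  FROM
    your_project.your_dataset.your_table
)
-- Keep only the newest row per user_id.
WHERE row_num = 1;
```

Rows that share both user_id and created_at are tied, so which of them is kept is not deterministic.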

Regions

justfunctions.eu.dedup_table(table_name, timestamp_column, unique_column, output_table_suffix),
justfunctions.us.dedup_table(table_name, timestamp_column, unique_column, output_table_suffix)

Type

User Defined SQL Procedure

How to Use

Frequently Asked Questions

User-Defined Functions (UDFs) in Google BigQuery are custom functions that you create to perform operations that the standard SQL functions do not cover. They let you extend BigQuery's SQL capabilities to suit your specific data processing needs.

JustFunctions is a collection of open-source User-Defined Functions (UDFs) designed to extend the capabilities of Google BigQuery. These functions cover a wide range of applications, including text manipulation, URL processing, date processing, email handling, similarity measures, and more. Moreover, JustFunctions is frequently updated to include more use cases.

We welcome any feedback or questions you may have. You can contact us or report an issue on GitHub.

Functions and procedures from JustFunctions can be used directly in any of your projects.
To start, open any function, copy its 'Example Query', and run it in your BigQuery console.
You can also copy the 'Statement' to create your own private copy of the function or procedure.
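
As a sketch of the private-copy route: once the 'Statement' has been run with your own project and dataset substituted for your_project_id and your_dataset_id, the procedure is called by that path instead of the justfunctions one. The follow-up query is an optional sanity check and not part of JustFunctions; it assumes the example's user_id key and the _dedup suffix, and should return no rows.

```sql
-- Call the private copy created by the CREATE OR REPLACE PROCEDURE statement.
CALL `your_project_id.your_dataset_id.dedup_table`(
  "your_project.your_dataset.your_table", "created_at", "user_id", "_dedup");

-- Optional check: list any user_id that still appears more than once.
SELECT user_id, COUNT(*) AS row_count
FROM `your_project.your_dataset.your_table_dedup`
GROUP BY user_id
HAVING COUNT(*) > 1;
```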

JustFunctions is completely free to use.

JustFunctions is currently available only for Google BigQuery. Support for PostgreSQL is planned.