fuzzy_nysiis - BigQuery

Function

fuzzy_nysiis(string)

Description

Fuzzy Nysiis - Calculates NYSIIS code for <string>.

Example Query


SELECT `justfunctions.eu.fuzzy_nysiis`("python")
                                            
/*--Output--
pytan
*/

Statement

CREATE OR REPLACE FUNCTION `your_project_id.your_dataset_id.fuzzy_nysiis`(`string` STRING) 
  RETURNS STRING AS (((
select
  string_agg(c, "" order by off asc)
from (select
        c,
        off,
        coalesce(lag(c) over (order by off) = c, false) as same_as_prev_char
      from unnest(split((select
              regexp_replace(regexp_replace(regexp_replace(string_agg(new_c, "" order by off), r"s+$", ""), "ay$", "y"), "a+$", "") as step_7
            from (select
                    c,
                    if(off = 0, c, 
                    case 
                      when off != 0 and c = 'e' and next_c = 'v' then 'a'
                      when off != 1 and c = 'v' and prev_c = 'e' then 'f'

                      when c in ('a', 'e', 'i', 'o', 'u') then 'a'

                      when c = 'q' then 'g'
                      when c = 'z' then 's'
                      when c = 'm' then 'n'

                      when c = 'k' and next_c = 'n' then 'n'
                      when off != 1 and prev_c = 'k' and c = 'n' then null
                      when c = 'k' then 'c'

                      when c = 's' and next_c = 'c' and next2_c = 'h' then 's'
                      when off != 1 and prev_c = 's' and c = 'c' and next_c = 'h' then 's'
                      when off != 2 and prev2_c = 's' and prev_c = 'c' and c = 'h' then 's'

                      when c = 'p' and next_c = 'h' then 'f'
                      when off != 1 and prev_c = 'p' and c = 'h' then 'f'

                      when off != 1 and (c = 'h' and (prev_c not in ('a', 'e', 'i', 'o', 'u') or next_c not in ('a', 'e', 'i', 'o', 'u'))) then prev_c
                      when off != 1 and c = 'w' and prev_c in ('a', 'e', 'i', 'o', 'u') then prev_c
                      else c
                    end) as new_c,
                    off
                  from (select
                          c,
                          off,
                          lag(c) over (order by off) as prev_c,
                          lag(c, 2) over (order by off) as prev2_c,
                          lead(c) over (order by off) as next_c,
                          lead(c, 2) over (order by off) as next2_c
                        from unnest(split(regexp_replace(regexp_replace(regexp_replace(
                                          regexp_replace(regexp_replace(regexp_replace(
                                          regexp_replace(lower(string), "^mac", "mcc"),
                                                          "^kn", "nn"),
                                                          "^k", "c"),
                                                          "^p(h|f)", "ff"),
                                                          "^sch", "sss"),
                                                          "(e|i)e$", "y"),
                                                          "(dt|rt|rd|nt|nd)$", "d")
                                      , "")) as c with offset off -- starts at 0
                        )
                  )
            where new_c is not null
            ), "")) as c with offset off
      )
where ((not same_as_prev_char) or (same_as_prev_char and off = 1))
)))
  OPTIONS ( description = '''Calculates NYSIIS code for <string>.''')

Regions

justfunctions.eu.fuzzy_nysiis(string),
justfunctions.us.fuzzy_nysiis(string)

Source

Type

SQL User Defined Function (SQL UDF)

Related Functions

How to Use

Frequently Asked Questions

User-Defined Functions (UDFs) in Google BigQuery are custom functions that you can create to perform operations that aren't available through the standard SQL functions. These UDFs allow you to extend BigQuery's SQL capabilities to suit your specific data processing needs. JustFunctions is a collection of open-source user-defined functions (UDFs).

JustFunctions is a collection of open-source User-Defined Functions (UDFs) designed to extend the capabilities of Google BigQuery. These functions cover a wide range of applications, including text manipulation, URL processing, date processing, email handling, similarity measures, and more. Moreover, JustFunctions is frequently updated to include more use cases.

We welcome any feedback or questions you may have. You can Contact us or report an issue on Github.

Functions and procedures from JustFunctions can be used directly in any of your projects.
To start, simply click 👆 on any function,  Copy  the 'Example Query' and run it in your BigQuery console.
You can also  Copy  the 'Statement' to create your own private user-defined function.

Yes, JustFunctions is completely free to use.

Yes, currently JustFunctions is only available for Google BigQuery. In the future, we will also support PostgreSQL.

See something wrong? Contact us or report an issue on Github.