Product Promotion
0x5a.live
for different kinds of informations and explorations.
GitHub - datanymizer/datanymizer: Powerful database anonymizer with flexible rules. Written in Rust.
Powerful database anonymizer with flexible rules. Written in Rust. - datanymizer/datanymizer
Visit SiteGitHub - datanymizer/datanymizer: Powerful database anonymizer with flexible rules. Written in Rust.
Powerful database anonymizer with flexible rules. Written in Rust. - datanymizer/datanymizer
Powered by 0x5a.live 💗
[Data]nymizer
Powerful database anonymizer with flexible rules. Written in Rust.
Datanymizer is created & supported by Evrone. See what else we develop with Rust.
More information you can find in articles in English and Russian.
How it works
Database -> Dumper (+Faker) -> Dump.sql
You can import or process your dump with supported database without 3rd-party importers.
Datanymizer generates database-native dump.
Installation
There are several ways to install pg_datanymizer
, choose a more convenient option for you.
Pre-compiled binary
# Linux / macOS / Windows (MINGW and etc). Installs it into ./bin/ by default
$ curl -sSfL https://raw.githubusercontent.com/datanymizer/datanymizer/main/cli/pg_datanymizer/install.sh | sh -s
# Or more shorter way
$ curl -sSfL https://git.io/pg_datanymizer | sh -s
# Specify installation directory and version
$ curl -sSfL https://git.io/pg_datanymizer | sudo sh -s -- -b /usr/local/bin v0.2.0
# Alpine Linux (wget)
$ wget -q -O - https://git.io/pg_datanymizer | sh -s
Homebrew / Linuxbrew
# Installs the latest stable release
$ brew install datanymizer/tap/pg_datanymizer
# Builds the latest version from the repository
$ brew install --HEAD datanymizer/tap/pg_datanymizer
Docker
$ docker run --rm -v `pwd`:/app -w /app datanymizer/pg_datanymizer
Getting started with CLI dumper
First, inspect your database schema, choose fields with sensitive data, and create a config file based on it.
# config.yml
tables:
- name: markets
rules:
name_translations:
template:
format: '{"en": "{{_1}}", "ru": "{{_2}}"}'
rules:
- words:
min: 1
max: 2
- words:
min: 1
max: 2
- name: franchisees
rules:
operator_mail:
template:
format: user-{{_1}}-{{_2}}
rules:
- random_num: {}
- email:
kind: Safe
operator_name:
first_name: {}
operator_phone:
phone:
format: +###########
name_translations:
template:
format: '{"en": "{{_1}}", "ru": "{{_2}}"}'
rules:
- words:
min: 2
max: 3
- words:
min: 2
max: 3
- name: users
rules:
first_name:
first_name: {}
last_name:
last_name: {}
- name: customers
rules:
email:
template:
format: user-{{_1}}-{{_2}}
rules:
- random_num: {}
- email:
kind: Safe
uniq:
required: true
try_count: 5
phone:
phone:
format: +7##########
uniq: true
city:
city: {}
age:
random_num:
min: 10
max: 99
first_name:
first_name: {}
last_name:
last_name: {}
birth_date:
datetime:
from: 1990-01-01T00:00:00+00:00
to: 2010-12-31T00:00:00+00:00
And then start to make dump from your database instance:
pg_datanymizer -f /tmp/dump.sql -c ./config.yml postgres://postgres:postgres@localhost/test_database
It creates new dump file /tmp/dump.sql
with native SQL dump for Postgresql database.
You can import fake data from this dump into new Postgresql database with command:
psql -U postgres -d new_database < /tmp/dump.sql
Dumper can stream dump to STDOUT
like pg_dump
and you can use it in other pipelines:
pg_datanymizer -c ./config.yml postgres://postgres:postgres@localhost/test_database > /tmp/dump.sql
Additional options
Tables filter
You can specify which tables you choose or ignore for making dump.
For dumping only public.markets
and public.users
data.
# config.yml
#...
filter:
only:
- public.markets
- public.users
For ignoring those tables and dump data from others.
# config.yml
#...
filter:
except:
- public.markets
- public.users
You can also specify data and schema filters separately.
This is equivalent to the previous example.
# config.yml
#...
filter:
data:
except:
- public.markets
- public.users
For skipping schema and data from other tables.
# config.yml
#...
filter:
schema:
only:
- public.markets
- public.users
For skipping schema for markets
table and dumping data only from users
table.
# config.yml
#...
filter:
data:
only:
- public.users
schema:
except:
- public.markets
You can use wildcards in the filter
section:
?
matches exactly one occurrence of any character;*
matches arbitrary many (including zero) occurrences of any character.
Dump conditions and limit
You can specify conditions (SQL WHERE
statement) and limit for dumped data per table:
# config.yml
tables:
- name: people
query:
# don't dump some rows
dump_condition: "last_name <> 'Sensitive'"
# select maximum 100 rows
limit: 100
Transform conditions and limit
As the additional option, you can specify SQL conditions that define which rows will be transformed (anonymized):
# config.yml
tables:
- name: people
query:
# don't dump some rows
dump_condition: "last_name <> 'Sensitive'"
# preserve original values for some rows
transform_condition: "NOT (first_name = 'John' AND last_name = 'Doe')"
# select maximum 100 rows
limit: 100
You can use the dump_condition
, transform_condition
and limit
options in any combination (only
transform_condition
; transform_condition
and limit
; etc).
Global variables
You can specify global variables available from any template
rule.
# config.yml
tables:
users:
bio:
template:
format: "User bio is {{var_a}}"
age:
template:
format: {{_0 | float * global_multiplicator}}
#...
globals:
var_a: Global variable 1
global_multiplicator: 6
Available rules
Rule | Description |
---|---|
email |
Emails with different options |
ip |
IP addresses. Supports IPv4 and IPv6 |
words |
Lorem words with different length |
first_name |
First name generator |
last_name |
Last name generator |
city |
City names generator |
phone |
Generate random phone with different format |
pipeline |
Use pipeline to generate more complicated values |
capitalize |
Like filter, it capitalizes input value |
template |
Template engine for generate random text with included rules |
digit |
Random digit (in range 0..9 ) |
random_num |
Random number with min and max options |
password |
Password with different length options (support max and min options) |
datetime |
Make DateTime strings with options (from and to ) |
more than 70 rules in total... |
For the complete list of rules please refer this document.
Uniqueness
You can specify that result values must be unique (they are not unique by default). You can use short or full syntax.
Short:
uniq: true
Full:
uniq:
required: true
try_count: 5
Uniqueness is ensured by re-generating values when they are same.
You can customize the number of attempts with try_count
(this is an optional field, the default number of tries
depends on the rule).
Currently, uniqueness is supported by: email
, ip
, phone
, random_num
.
Locales
You can specify the locale for individual rules:
first_name:
locale: RU
The default locale is EN
but you can specify a different default locale:
tables:
# ........
default:
locale: RU
We also support ZH_TW
(traditional chinese) and RU
(translation in progress).
Referencing row values from templates
You can reference values of other row fields in templates.
Use prev
for original values and final
- for anonymized:
tables:
- name: some_table
# You must specify the order of rule execution when using `final`
rule_order:
- greeting
- options
rules:
first_name:
first_name: {}
greeting:
template:
# Keeping the first name, but anonymizing the last name
format: "Hello, {{ prev.first_name }} {{ final.last_name }}!"
options:
template:
# Using the anonymized value again
format: "{greeting: \"{{ final.greeting }}\"}"
You must specify the order of rule execution when using final
with rule_order
.
All rules not listed will be placed at the beginning (i.e. you must list only rules with final
).
Sharing information between rows
We implemented a built-in key-value store that allows information to be exchanged between anonymized rows.
It is available via the special functions in templates.
Take a look at an example:
tables:
- name: users
rules:
name:
template:
# Save a name to the store as a side effect, the key is `user_names.<USER_ID>`
format: "{{ _1 }}{{ store_write(key='user_names.' ~ prev.id, value=_1) }}"
rules:
- person_name: {}
- name: user_operations
rules:
user_name:
template:
# Using the saved value again
format: "{{ store_read(key='user_names.' ~ prev.user_id) }}"
Supported databases
- Postgresql
- MySQL or MariaDB (TODO)
Documentation
- pg_datanymizer CLI application manual.
- config.yml file specification.
- Full list of transformation rules.
- Integration testing manual.
Sponsors
License
Development
Cross compilation
Mac to Linux
rustup target add x86_64-unknown-linux-gnu
brew tap messense/macos-cross-toolchains
brew install x86_64-unknown-linux-gnu
CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_LINKER=x86_64-linux-gnu-gcc cargo build --target x86_64-unknown-linux-gnu --release --features openssl/vendored
Rust Resources
are all listed below.
Made with ❤️
to provide different kinds of informations and resources.