Useful Data Tools
These utilities are useful for working with common data formats.
| Tool | Purpose |
|---|---|
| jq | JSON manipulation |
| eBay Utils | CSV/TSV manipulation |
| Re2 | Regex testing |
| Regexper | Regex pattern visualization |
Go DateTime Tools
Hydrolix uses Go's date parsing to ingest datetimes. The resources below are useful for checking that a datetime format parses the way you expect.
| Resource | Link |
|---|---|
| Go time package docs | https://golang.org/pkg/time/ |
| Go Playground | https://play.golang.org/ |
Use the Go Playground to check how a datetime format is parsed. This is especially helpful if a format isn't parsing the way you expect.
In the Go Playground editor, enter a short program that calls time.Parse with a layout and a sample timestamp.
Set layout to your timestamp format, then click Run. The output tells you whether the value parses.
Catch-All Transform
Sometimes you won't know the format of your incoming data until you actually see it streaming into your Hydrolix cluster. This makes it very difficult to write a transform for that data.
One way around this problem is to dump incoming data into a table for inspection. Then you can write your transform based on what you find. Assuming the data you're receiving is in JSON format, here's a transform that will do that for you, applying a timestamp to each line:
Another, more complex catch-all transform is available on the Test and Validate Transforms page.
Anonymizing Datasets
The following technique anonymizes customer data while maintaining the data's "shape": column cardinality and value lengths are preserved. You can use it to obfuscate personally identifiable information (PII).
In the example below, we anonymize two fields: name and age. We use ClickHouse's SHA256 function to map each integer and string value to different characters deterministically, so the same input always produces the same output.
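SHA256 is a good step toward anonymization, but it isn't sufficient on its own: a plain hash is deterministic, so anyone who can guess candidate values (such as every plausible age) can hash them and compare the results. Feel free to experiment with other ClickHouse hash functions, but keep an eye on compute and memory overhead.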