# node-unicode-data [![Build status](https://travis-ci.org/mathiasbynens/node-unicode-data.svg?branch=master)](https://travis-ci.org/mathiasbynens/node-unicode-data)

JavaScript-compatible Unicode data generator. Arrays of code points, arrays of symbols, and regular expressions for every Unicode version’s categories, scripts, script extensions, blocks, bidi data, and other properties — neatly packaged into a separate npm package per Unicode version.

## Using the data in your scripts

To use the generated data, simply install one of [the npm modules generated by this script](https://npmjs.org/browse/keyword/unicode-data). Separate packages are available for each Unicode version. This allows you to do stuff like:

```js
// Get an array of all code points with the `White_Space` property:
const codePoints = require('unicode-6.3.0/Binary_Property/White_Space/code-points');
// Get an array of strings (containing one symbol each) in the `Lu` category:
const symbols = require('unicode-6.3.0/General_Category/Uppercase_Letter/symbols');
// Get a regular expression that matches any symbol in the `Aegean Numbers` block:
const regex = require('unicode-6.3.0/Block/Aegean_Numbers/regex');
// Get an array of all code points in the `Egyptian_Hieroglyphs` script:
const hieroglyphs = require('unicode-6.3.0/Script/Egyptian_Hieroglyphs/code-points');
// Get the general category that a given code point belongs to:
// (Note: U+0041 is LATIN CAPITAL LETTER A)
const category = require('unicode-6.3.0/General_Category').get(0x41);
// Get an array of all code points with a given bidi class:
const lre = require('unicode-6.3.0/Bidi_Class/Left_To_Right_Embedding/code-points');
// Get the directionality of a given code point:
const directionality = require('unicode-6.3.0/Bidi_Class').get(0x41);
// What glyph is the mirror image of `«` (U+00AB)?
const mirrored = require('unicode-6.3.0/Bidi_Mirroring_Glyph').get(0xAB);
// Get a regular expression that matches all opening brackets:
const openingBrackets = require('unicode-6.3.0/Bidi_Paired_Bracket_Type/Open/regex');
// …you get the idea.
```
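The `code-points` modules export plain arrays of numbers. If you need many membership checks, it can be worth wrapping such an array in a `Set` first. The sketch below assumes that `codePoints` is one of those arrays. `createMatcher` and `everySymbolMatches` are hypothetical helper names and are not part of the generated packages. To keep the demo self-contained, it uses a small hand-picked subset of whitespace code points instead of the real `White_Space` data.

```js
// Hypothetical helper (not part of the unicode-* packages): wraps an array of
// code points, such as the one exported by a `…/code-points` module, in a Set
// so that each membership check is a constant-time lookup.
function createMatcher(codePoints) {
	const set = new Set(codePoints);
	return (codePoint) => set.has(codePoint);
}

// Returns true if every symbol in `string` matches. `for…of` iterates a string
// by code point, so astral symbols (surrogate pairs) count as single symbols.
function everySymbolMatches(string, matches) {
	for (const symbol of string) {
		if (!matches(symbol.codePointAt(0))) {
			return false;
		}
	}
	return true;
}

// Illustrative subset only; in practice you would pass something like
// require('unicode-6.3.0/Binary_Property/White_Space/code-points').
const whiteSpaceSubset = [0x09, 0x0A, 0x0D, 0x20, 0xA0, 0x3000];
const isWhiteSpace = createMatcher(whiteSpaceSubset);

console.log(everySymbolMatches(' \t\u3000', isWhiteSpace)); // true
console.log(everySymbolMatches(' a ', isWhiteSpace)); // false
```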

For more information, see the README for the package you’re interested in. [Here’s the full list of npm packages generated by this script](https://npmjs.org/browse/keyword/unicode-data):

* [_unicode-1.1.5_](https://npmjs.org/package/unicode-1.1.5#readme) ([repository](https://github.com/mathiasbynens/unicode-1.1.5#readme))
* [_unicode-2.0.14_](https://npmjs.org/package/unicode-2.0.14#readme) ([repository](https://github.com/mathiasbynens/unicode-2.0.14#readme))
* [_unicode-2.1.2_](https://npmjs.org/package/unicode-2.1.2#readme) ([repository](https://github.com/mathiasbynens/unicode-2.1.2#readme))
* [_unicode-2.1.5_](https://npmjs.org/package/unicode-2.1.5#readme) ([repository](https://github.com/mathiasbynens/unicode-2.1.5#readme))
* [_unicode-2.1.8_](https://npmjs.org/package/unicode-2.1.8#readme) ([repository](https://github.com/mathiasbynens/unicode-2.1.8#readme))
* [_unicode-2.1.9_](https://npmjs.org/package/unicode-2.1.9#readme) ([repository](https://github.com/mathiasbynens/unicode-2.1.9#readme))
* [_unicode-3.0.0_](https://npmjs.org/package/unicode-3.0.0#readme) ([repository](https://github.com/mathiasbynens/unicode-3.0.0#readme))
* [_unicode-3.0.1_](https://npmjs.org/package/unicode-3.0.1#readme) ([repository](https://github.com/mathiasbynens/unicode-3.0.1#readme))
* [_unicode-3.1.0_](https://npmjs.org/package/unicode-3.1.0#readme) ([repository](https://github.com/mathiasbynens/unicode-3.1.0#readme))
* [_unicode-3.2.0_](https://npmjs.org/package/unicode-3.2.0#readme) ([repository](https://github.com/mathiasbynens/unicode-3.2.0#readme))
* [_unicode-4.0.0_](https://npmjs.org/package/unicode-4.0.0#readme) ([repository](https://github.com/mathiasbynens/unicode-4.0.0#readme))
* [_unicode-4.0.1_](https://npmjs.org/package/unicode-4.0.1#readme) ([repository](https://github.com/mathiasbynens/unicode-4.0.1#readme))
* [_unicode-4.1.0_](https://npmjs.org/package/unicode-4.1.0#readme) ([repository](https://github.com/mathiasbynens/unicode-4.1.0#readme))
* [_unicode-5.0.0_](https://npmjs.org/package/unicode-5.0.0#readme) ([repository](https://github.com/mathiasbynens/unicode-5.0.0#readme))
* [_unicode-5.1.0_](https://npmjs.org/package/unicode-5.1.0#readme) ([repository](https://github.com/mathiasbynens/unicode-5.1.0#readme))
* [_unicode-5.2.0_](https://npmjs.org/package/unicode-5.2.0#readme) ([repository](https://github.com/mathiasbynens/unicode-5.2.0#readme))
* [_unicode-6.0.0_](https://npmjs.org/package/unicode-6.0.0#readme) ([repository](https://github.com/mathiasbynens/unicode-6.0.0#readme))
* [_unicode-6.1.0_](https://npmjs.org/package/unicode-6.1.0#readme) ([repository](https://github.com/mathiasbynens/unicode-6.1.0#readme))
* [_unicode-6.2.0_](https://npmjs.org/package/unicode-6.2.0#readme) ([repository](https://github.com/mathiasbynens/unicode-6.2.0#readme))
* [_unicode-6.3.0_](https://npmjs.org/package/unicode-6.3.0#readme) ([repository](https://github.com/mathiasbynens/unicode-6.3.0#readme))
* [_unicode-7.0.0_](https://npmjs.org/package/unicode-7.0.0#readme) ([repository](https://github.com/mathiasbynens/unicode-7.0.0#readme))
* [_unicode-8.0.0_](https://npmjs.org/package/unicode-8.0.0#readme) ([repository](https://github.com/mathiasbynens/unicode-8.0.0#readme))
* [_unicode-9.0.0_](https://npmjs.org/package/unicode-9.0.0#readme) ([repository](https://github.com/mathiasbynens/unicode-9.0.0#readme))
* [_unicode-10.0.0_](https://npmjs.org/package/unicode-10.0.0#readme) ([repository](https://github.com/mathiasbynens/unicode-10.0.0#readme))
* [_unicode-11.0.0_](https://npmjs.org/package/unicode-11.0.0#readme) ([repository](https://github.com/mathiasbynens/unicode-11.0.0#readme))
* [_unicode-12.0.0_](https://npmjs.org/package/unicode-12.0.0#readme) ([repository](https://github.com/mathiasbynens/unicode-12.0.0#readme))

Note that these READMEs are auto-generated by this script, too; they describe all the data that is available for that particular Unicode version. To programmatically get the list of available categories, scripts, script extensions, blocks, and properties for a given Unicode version, just `require` the main module for that version:

```js
> require('unicode-6.3.0');
{
	'Binary_Property': [
		'Alphabetic', 'Any', 'ASCII', 'ASCII_Hex_Digit', 'Assigned', …
	],
	'General_Category': [
		'Cased_Letter', 'Close_Punctuation', 'Connector_Punctuation', …
	],
	'Script': [
		'Arabic', 'Armenian', 'Avestan', …
	],
	'Script_Extensions': [
		'Arabic', 'Armenian', 'Avestan', …
	],
	'Block': [
		'Aegean Numbers', 'Alchemical Symbols', …
	],
	'Case_Folding': [
		'C', 'F', 'S', 'T'
	],
	'Bidi_Class': [
		'Arabic_Letter', 'Arabic_Number', 'Boundary_Neutral', …
	],
	'Bidi_Mirroring_Glyph': [],
	'Bidi_Paired_Bracket_Type': [
		'Close', 'None', 'Open'
	]
}
```
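The main module returns a plain object that maps each property type to an array of value names. You can therefore search it to find out which property types a given name belongs to. For example, `Arabic` appears under both `Script` and `Script_Extensions`. `findPropertyTypes` is a hypothetical helper, not part of the packages. The `index` object below is an abridged copy of the listing above, limited to the values shown there.

```js
// Hypothetical helper (not part of the unicode-* packages): given the object
// returned by `require('unicode-<version>')`, list every property type whose
// array of value names contains `name`.
function findPropertyTypes(index, name) {
	return Object.keys(index).filter((type) => index[type].includes(name));
}

// Abridged copy of the `unicode-6.3.0` index shown above; in practice:
// const index = require('unicode-6.3.0');
const index = {
	'General_Category': ['Cased_Letter', 'Close_Punctuation', 'Connector_Punctuation'],
	'Script': ['Arabic', 'Armenian', 'Avestan'],
	'Script_Extensions': ['Arabic', 'Armenian', 'Avestan'],
	'Bidi_Class': ['Arabic_Letter', 'Arabic_Number', 'Boundary_Neutral'],
	'Bidi_Paired_Bracket_Type': ['Close', 'None', 'Open']
};

console.log(findPropertyTypes(index, 'Arabic')); // [ 'Script', 'Script_Extensions' ]
console.log(findPropertyTypes(index, 'Open')); // [ 'Bidi_Paired_Bracket_Type' ]
```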

## Generating the data

`npm run-script download` (re-)downloads the Unicode source files for all the Unicode versions defined in `data/resources.js`, saving them in the `data` folder.

`npm run-script build` generates data for categories, scripts, blocks, and properties for all the Unicode versions defined in `data/resources.js`. This may take a few minutes; in total, roughly 1.5 GB of data is generated. The regular expressions are generated using [Regenerate](https://mths.be/regenerate).
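The core idea behind turning a list of code points into a compact regular expression is to merge consecutive code points into ranges. The sketch below is a simplified stand-in for illustration; it is **not** how Regenerate works internally. Regenerate also handles details this sketch skips, such as emitting surrogate-pair patterns for environments without the `u` flag. This version sorts and deduplicates the input, collapses runs into ranges, and emits a single character class that relies on ES2015's `u` flag for the `\u{…}` escapes. `toRanges`, `hex`, and `toRegExp` are hypothetical names.

```js
// Simplified stand-in (not Regenerate's algorithm): collapse code points into
// inclusive [start, end] ranges of consecutive values.
function toRanges(codePoints) {
	const sorted = [...new Set(codePoints)].sort((a, b) => a - b);
	const ranges = [];
	for (const codePoint of sorted) {
		const last = ranges[ranges.length - 1];
		if (last && codePoint === last[1] + 1) {
			last[1] = codePoint; // extend the current range
		} else {
			ranges.push([codePoint, codePoint]); // start a new range
		}
	}
	return ranges;
}

// Formats a code point as a `\u{…}` escape (valid only with the `u` flag).
function hex(codePoint) {
	return `\\u{${codePoint.toString(16).toUpperCase()}}`;
}

// Builds a single character class from the ranges.
function toRegExp(codePoints) {
	const parts = toRanges(codePoints).map(([start, end]) =>
		start === end ? hex(start) : `${hex(start)}-${hex(end)}`
	);
	return new RegExp(`[${parts.join('')}]`, 'u');
}

// U+10100 to U+10102 fall in the Aegean Numbers block (U+10100 to U+1013F).
const regex = toRegExp([0x10100, 0x10101, 0x10102, 0x41]);
console.log(regex.source); // [\u{41}\u{10100}-\u{10102}]
console.log(regex.test('\u{10101}')); // true
console.log(regex.test('B')); // false
```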

## Testing

`npm test` generates the data for the oldest and latest available Unicode versions. This is a good way to test changes to the generator scripts before running `npm run-script build`.

`npm run-script cover` generates [the code coverage report](http://rawgithub.com/mathiasbynens/node-unicode-data/master/coverage/index.html).

## Author

| [![twitter/mathias](https://gravatar.com/avatar/24e08a9ea84deb17ae121074d0f17125?s=70)](https://twitter.com/mathias "Follow @mathias on Twitter") |
|---|
| [Mathias Bynens](https://mathiasbynens.be/) |

## License

This module is available under the [MIT](https://mths.be/mit) license.
