use unicode

Unicode character constants and classification.

Constants

Name Value Description
A 65 Letter: Uppercase A.
a 97 Letter: Lowercase a.
ampersand 38 Punctuation: Ampersand (&).
asterisk 42 Punctuation: Asterisk (*).
at 64 Punctuation: At sign (@).
B 66 Letter: Uppercase B.
b 98 Letter: Lowercase b.
backslash 92 Punctuation: Backslash (\).
backtick 96 Punctuation: Backtick (`).
C 67 Letter: Uppercase C.
c 99 Letter: Lowercase c.
caret 94 Punctuation: Caret (^).
colon 58 Punctuation: Colon (:).
comma 44 Punctuation: Comma (,).
cr 13 Control: Carriage return.
d 100 Letter: Lowercase d.
D 68 Letter: Uppercase D.
digit0 48 Digit: 0.
digit1 49 Digit: 1.
digit2 50 Digit: 2.
digit3 51 Digit: 3.
digit4 52 Digit: 4.
digit5 53 Digit: 5.
digit6 54 Digit: 6.
digit7 55 Digit: 7.
digit8 56 Digit: 8.
digit9 57 Digit: 9.
dollar 36 Punctuation: Dollar sign ($).
dot 46 Punctuation: Period/dot (.).
dquote 34 Punctuation: Double quote (").
e 101 Letter: Lowercase e.
E 69 Letter: Uppercase E.
equals 61 Punctuation: Equals (=).
esc 27 Control: Escape character.
exclaim 33 Punctuation: Exclamation mark (!).
f 102 Letter: Lowercase f.
F 70 Letter: Uppercase F.
g 103 Letter: Lowercase g.
G 71 Letter: Uppercase G.
gt 62 Punctuation: Greater than (>).
h 104 Letter: Lowercase h.
H 72 Letter: Uppercase H.
hash 35 Punctuation: Hash (#).
i 105 Letter: Lowercase i.
I 73 Letter: Uppercase I.
j 106 Letter: Lowercase j.
J 74 Letter: Uppercase J.
k 107 Letter: Lowercase k.
K 75 Letter: Uppercase K.
l 108 Letter: Lowercase l.
L 76 Letter: Uppercase L.
lbrace 123 Punctuation: Left brace ({).
lbracket 91 Punctuation: Left bracket ([).
lparen 40 Punctuation: Left parenthesis.
lt 60 Punctuation: Less than (<).
m 109 Letter: Lowercase m.
M 77 Letter: Uppercase M.
minus 45 Punctuation: Minus/hyphen (-).
n 110 Letter: Lowercase n.
N 78 Letter: Uppercase N.
newline 10 Control: Newline (line feed).
nul 0 Control: Null character.
o 111 Letter: Lowercase o.
O 79 Letter: Uppercase O.
p 112 Letter: Lowercase p.
P 80 Letter: Uppercase P.
percent 37 Punctuation: Percent (%).
pipe 124 Punctuation: Pipe (|).
plus 43 Punctuation: Plus sign (+).
q 113 Letter: Lowercase q.
Q 81 Letter: Uppercase Q.
question 63 Punctuation: Question mark (?).
r 114 Letter: Lowercase r.
R 82 Letter: Uppercase R.
rbrace 125 Punctuation: Right brace (}).
rbracket 93 Punctuation: Right bracket (]).
rparen 41 Punctuation: Right parenthesis.
s 115 Letter: Lowercase s.
S 83 Letter: Uppercase S.
semicolon 59 Punctuation: Semicolon (;).
slash 47 Punctuation: Forward slash (/).
space 32 Whitespace: Space.
squote 39 Punctuation: Single quote (').
t 116 Letter: Lowercase t.
T 84 Letter: Uppercase T.
tab 9 Control: Tab character.
tilde 126 Punctuation: Tilde (~).
u 117 Letter: Lowercase u.
U 85 Letter: Uppercase U.
underscore 95 Punctuation: Underscore (_).
v 118 Letter: Lowercase v.
V 86 Letter: Uppercase V.
w 119 Letter: Lowercase w.
W 87 Letter: Uppercase W.
x 120 Letter: Lowercase x.
X 88 Letter: Uppercase X.
y 121 Letter: Lowercase y.
Y 89 Letter: Uppercase Y.
z 122 Letter: Lowercase z.
Z 90 Letter: Uppercase Z.

Functions

fn digit_value

Convert digit character to its numeric value.

Signature: (c:i64 -- result:i64)

Parameter Type Description
c i64 Character code (must be a digit)
Output Type Description
result i64 Numeric value (0-9), or -1 if not a digit

Example:

53 unicode::digit_value print  // 5
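
The contract above can be modeled in a few lines. The following is a minimal sketch in Python, used here only as a neutral modeling language; it is not the library's implementation, and `parse_decimal` is a hypothetical helper added to show how the -1 result might be used for validation.

```python
def digit_value(c: int) -> int:
    """Model of the documented contract: 0-9 for codes 48-57, otherwise -1."""
    return c - 48 if 48 <= c <= 57 else -1


def parse_decimal(text: str) -> int:
    """Hypothetical helper: accumulate digits left to right.

    Returns -1 for an empty string or if any character is not a digit.
    """
    if not text:
        return -1
    total = 0
    for ch in text:
        d = digit_value(ord(ch))
        if d < 0:
            return -1
        total = total * 10 + d
    return total


print(digit_value(53))        # 5
print(parse_decimal("2024"))  # 2024
print(parse_decimal("20x4"))  # -1
```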

fn hex_digit_value

Convert hex digit character to its numeric value.

Signature: (c:i64 -- result:i64)

Parameter Type Description
c i64 Character code (must be a hex digit)
Output Type Description
result i64 Numeric value (0-15), or -1 if not a hex digit

Example:

65 unicode::hex_digit_value print  // 10 (A)
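
As a rough illustration of the mapping, the sketch below models the three accepted ranges (0-9, A-F, a-f) in Python. It is an assumption-based model of the documented behavior, not the library's code; `hex_pair_to_byte` is a hypothetical helper showing how two digit values combine into one byte.

```python
def hex_digit_value(c: int) -> int:
    """Model of the documented contract: 0-15 for hex digits, otherwise -1."""
    if 48 <= c <= 57:    # '0'..'9'
        return c - 48
    if 65 <= c <= 70:    # 'A'..'F'
        return c - 65 + 10
    if 97 <= c <= 102:   # 'a'..'f'
        return c - 97 + 10
    return -1


def hex_pair_to_byte(hi: int, lo: int) -> int:
    """Hypothetical helper: combine two hex digit codes into a byte, or -1."""
    h, l = hex_digit_value(hi), hex_digit_value(lo)
    if h < 0 or l < 0:
        return -1
    return h * 16 + l


print(hex_digit_value(65))                   # 10
print(hex_digit_value(102))                  # 15
print(hex_digit_value(71))                   # -1
print(hex_pair_to_byte(ord("1"), ord("F")))  # 31
```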

fn is_alnum

Check if character is alphanumeric (A-Z, a-z, or 0-9).

Signature: (c:i64 -- result:i64)

Parameter Type Description
c i64 Character code
Output Type Description
result i64 1 if alphanumeric, 0 otherwise

Example:

65 unicode::is_alnum print  // 1

fn is_alpha

Check if character is an alphabetic letter (A-Z or a-z).

Signature: (c:i64 -- result:i64)

Parameter Type Description
c i64 Character code
Output Type Description
result i64 1 if alphabetic, 0 otherwise

Example:

65 unicode::is_alpha print  // 1

fn is_ascii

Check if character is ASCII (0-127).

Signature: (c:i64 -- result:i64)

Parameter Type Description
c i64 Character code
Output Type Description
result i64 1 if ASCII, 0 otherwise

Example:

65 unicode::is_ascii print  // 1

fn is_control

Check if character is a control character (ASCII 0-31 and 127).

Signature: (c:i64 -- result:i64)

Parameter Type Description
c i64 Character code
Output Type Description
result i64 1 if control character, 0 otherwise

Example:

10 unicode::is_control print  // 1 (newline)

fn is_digit

Check if character is a digit (0-9).

Signature: (c:i64 -- result:i64)

Parameter Type Description
c i64 Character code
Output Type Description
result i64 1 if digit, 0 otherwise

Example:

48 unicode::is_digit print  // 1

fn is_hex_digit

Check if character is a hexadecimal digit (0-9, A-F, a-f).

Signature: (c:i64 -- result:i64)

Parameter Type Description
c i64 Character code
Output Type Description
result i64 1 if hex digit, 0 otherwise

Example:

65 unicode::is_hex_digit print  // 1 (A)

fn is_ident_cont

Check if character is a valid identifier continuation (letter, digit, or underscore).

Signature: (c:i64 -- result:i64)

Parameter Type Description
c i64 Character code
Output Type Description
result i64 1 if valid identifier continuation, 0 otherwise

Example:

95 unicode::is_ident_cont print  // 1 (_)

fn is_ident_start

Check if character is a valid identifier start (letter or underscore).

Signature: (c:i64 -- result:i64)

Parameter Type Description
c i64 Character code
Output Type Description
result i64 1 if valid identifier start, 0 otherwise

Example:

95 unicode::is_ident_start print  // 1 (_)
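
The two identifier predicates are typically used together in a scanner: one test for the first character, another for every character after it. The sketch below models that pattern in Python under the ASCII definitions given in this document. The predicates return booleans here rather than 1/0, and `scan_identifier` is a hypothetical helper, not part of the module.

```python
def is_alpha(c: int) -> bool:
    return 65 <= c <= 90 or 97 <= c <= 122


def is_digit(c: int) -> bool:
    return 48 <= c <= 57


def is_ident_start(c: int) -> bool:
    # Letter or underscore (95).
    return is_alpha(c) or c == 95


def is_ident_cont(c: int) -> bool:
    # Letter, digit, or underscore.
    return is_ident_start(c) or is_digit(c)


def scan_identifier(text: str, pos: int) -> int:
    """Hypothetical helper: return the index just past the identifier
    starting at pos, or pos itself if no identifier starts there."""
    if pos >= len(text) or not is_ident_start(ord(text[pos])):
        return pos
    end = pos + 1
    while end < len(text) and is_ident_cont(ord(text[end])):
        end += 1
    return end


print(scan_identifier("_tmp1 = 3", 0))  # 5
print(scan_identifier("9lives", 0))     # 0
```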

fn is_lower

Check if character is a lowercase letter (a-z).

Signature: (c:i64 -- result:i64)

Parameter Type Description
c i64 Character code
Output Type Description
result i64 1 if lowercase, 0 otherwise

Example:

97 unicode::is_lower print  // 1

fn is_print

Check if character is printable (space through tilde, ASCII 32-126).

Signature: (c:i64 -- result:i64)

Parameter Type Description
c i64 Character code
Output Type Description
result i64 1 if printable, 0 otherwise

Example:

65 unicode::is_print print  // 1

fn is_punct

Check if character is punctuation (printable but not alphanumeric or space).

Signature: (c:i64 -- result:i64)

Parameter Type Description
c i64 Character code
Output Type Description
result i64 1 if punctuation, 0 otherwise

Example:

33 unicode::is_punct print  // 1 (!)
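
The definition above is a set difference: printable characters minus alphanumerics minus space. A small Python model, assuming the ASCII ranges stated elsewhere in this document, makes the resulting set concrete. This is a sketch of the documented rule, not the library's implementation.

```python
import string


def is_print(c: int) -> bool:
    return 32 <= c <= 126


def is_alnum(c: int) -> bool:
    return 48 <= c <= 57 or 65 <= c <= 90 or 97 <= c <= 122


def is_punct(c: int) -> bool:
    # Printable, but neither alphanumeric nor the space character (32).
    return is_print(c) and not is_alnum(c) and c != 32


punct = "".join(chr(c) for c in range(128) if is_punct(c))
print(len(punct))                   # 32
print(punct == string.punctuation)  # True
```

Under this model, 95 printable characters minus 62 alphanumerics minus 1 space leaves 32 punctuation characters, the same set as Python's `string.punctuation`.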

fn is_space

Check if character is whitespace (space, tab, newline, carriage return).

Signature: (c:i64 -- result:i64)

Parameter Type Description
c i64 Character code
Output Type Description
result i64 1 if whitespace, 0 otherwise

Example:

32 unicode::is_space print  // 1

fn is_upper

Check if character is an uppercase letter (A-Z).

Signature: (c:i64 -- result:i64)

Parameter Type Description
c i64 Character code
Output Type Description
result i64 1 if uppercase, 0 otherwise

Example:

65 unicode::is_upper print  // 1

fn is_utf8_cont

Check if byte is a UTF-8 continuation byte (10xxxxxx).

Signature: (b:i64 -- result:i64)

Parameter Type Description
b i64 Byte value
Output Type Description
result i64 1 if continuation byte, 0 otherwise

Example:

0x80 unicode::is_utf8_cont print  // 1
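
The `10xxxxxx` pattern can be tested by masking the top two bits. The Python sketch below models that check and shows one common use: counting codepoints in well-formed UTF-8 by skipping continuation bytes. `count_codepoints` is a hypothetical helper, and the sketch assumes the input is already valid UTF-8.

```python
def is_utf8_cont(b: int) -> bool:
    """True if the top two bits of the byte are 10."""
    return (b & 0xC0) == 0x80


def count_codepoints(data: bytes) -> int:
    """Hypothetical helper: every byte that is not a continuation byte
    begins a new codepoint (assumes well-formed UTF-8)."""
    return sum(1 for b in data if not is_utf8_cont(b))


data = "héllo".encode("utf-8")
print(len(data))               # 6
print(count_codepoints(data))  # 5
```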

fn is_utf8_start

Check if byte is the start of a UTF-8 sequence.

Signature: (b:i64 -- result:i64)

Parameter Type Description
b i64 Byte value
Output Type Description
result i64 1 if start byte, 0 otherwise

Example:

0xC2 unicode::is_utf8_start print  // 1

fn is_valid_codepoint

Check if a codepoint is valid Unicode.

Signature: (cp:i64 -- valid:i64)

Parameter Type Description
cp i64 Codepoint to check
Output Type Description
valid i64 1 if valid, 0 otherwise

Example:

0x1F600 unicode::is_valid_codepoint print  // 1 (😀)

fn to_lower

Convert uppercase letter to lowercase.

Signature: (c:i64 -- result:i64)

Parameter Type Description
c i64 Character code
Output Type Description
result i64 Lowercase character code (unchanged if not uppercase)

Example:

65 unicode::to_lower print  // 97

fn to_upper

Convert lowercase letter to uppercase.

Signature: (c:i64 -- result:i64)

Parameter Type Description
c i64 Character code
Output Type Description
result i64 Uppercase character code (unchanged if not lowercase)

Example:

97 unicode::to_upper print  // 65
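
In ASCII, each uppercase letter sits exactly 32 below its lowercase counterpart (A is 65, a is 97), so both conversions reduce to a range check and an offset. The Python sketch below models `to_lower` and `to_upper` on that basis, assuming they are ASCII-only as the A-Z and a-z ranges in this document suggest. `equals_ignore_case` is a hypothetical helper.

```python
def to_lower(c: int) -> int:
    """Add 32 for A-Z; return anything else unchanged."""
    return c + 32 if 65 <= c <= 90 else c


def to_upper(c: int) -> int:
    """Subtract 32 for a-z; return anything else unchanged."""
    return c - 32 if 97 <= c <= 122 else c


def equals_ignore_case(a: str, b: str) -> bool:
    """Hypothetical helper: ASCII case-insensitive comparison."""
    return len(a) == len(b) and all(
        to_lower(ord(x)) == to_lower(ord(y)) for x, y in zip(a, b)
    )


print(to_lower(65))  # 97
print(to_upper(97))  # 65
print(to_upper(49))  # 49
print(equals_ignore_case("Hello", "hELLO"))  # True
```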

fn utf8_encode_len

Get the number of UTF-8 bytes needed to encode a codepoint.

Signature: (cp:i64 -- len:i64)

Parameter Type Description
cp i64 Unicode codepoint
Output Type Description
len i64 Number of bytes needed (1-4), or 0 if invalid

Example:

0x00E9 unicode::utf8_encode_len print  // 2 (é)
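
The encoded length follows from the standard UTF-8 range boundaries. The Python sketch below models it. What counts as "invalid" is not specified in this document, so the sketch assumes it means values outside 0 to 0x10FFFF or in the surrogate range 0xD800-0xDFFF; the library may draw that line differently.

```python
def utf8_encode_len(cp: int) -> int:
    """Model of the documented contract: 1-4 bytes, or 0 if invalid."""
    # ASSUMPTION: "invalid" means out of range or a surrogate.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        return 0
    if cp < 0x80:       # 7 bits fit in one byte
        return 1
    if cp < 0x800:      # 11 bits fit in two bytes
        return 2
    if cp < 0x10000:    # 16 bits fit in three bytes
        return 3
    return 4            # up to 21 bits fit in four bytes


print(utf8_encode_len(0x41))      # 1
print(utf8_encode_len(0x00E9))    # 2
print(utf8_encode_len(0x20AC))    # 3
print(utf8_encode_len(0x1F600))   # 4
print(utf8_encode_len(0x110000))  # 0
```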

fn utf8_seq_len

Get the number of bytes in a UTF-8 sequence based on the first byte. Returns 1 for ASCII, 2-4 for multi-byte sequences, 0 for invalid.

Signature: (b:i64 -- len:i64)

Parameter Type Description
b i64 First byte of UTF-8 sequence
Output Type Description
len i64 Number of bytes (1-4), or 0 if invalid

Example:

0xC2 unicode::utf8_seq_len print  // 2
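
The sequence length is encoded in the leading bits of the first byte. The Python sketch below models that classification by bit pattern alone. It is an illustrative model, not the library's code: it does not reject lead bytes such as 0xC0 or 0xF5 that fit a pattern but never occur in well-formed UTF-8, and the library may or may not do so. `split_sequences` is a hypothetical helper.

```python
def utf8_seq_len(b: int) -> int:
    """Classify a first byte by its leading bits; 0 means not a valid lead."""
    if (b & 0x80) == 0x00:   # 0xxxxxxx: ASCII
        return 1
    if (b & 0xE0) == 0xC0:   # 110xxxxx
        return 2
    if (b & 0xF0) == 0xE0:   # 1110xxxx
        return 3
    if (b & 0xF8) == 0xF0:   # 11110xxx
        return 4
    return 0                 # continuation byte or other invalid lead


def split_sequences(data: bytes) -> list:
    """Hypothetical helper: cut a byte string into per-codepoint slices."""
    out, i = [], 0
    while i < len(data):
        n = utf8_seq_len(data[i])
        if n == 0 or i + n > len(data):
            raise ValueError(f"invalid or truncated sequence at offset {i}")
        out.append(data[i:i + n])
        i += n
    return out


print(utf8_seq_len(0xC2))  # 2
print([len(s) for s in split_sequences("aé€😀".encode("utf-8"))])  # [1, 2, 3, 4]
```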