regex

Backtracking regex engine written in pure Jda. No external dependencies.

Supported Syntax

PatternDescription
.Any character
*Zero or more (greedy)
+One or more (greedy)
?Zero or one
[abc]Character class
[a-z]Character range
[^abc]Negated character class
^Start-of-string anchor
$End-of-string anchor
\dDigit [0-9]
\wWord character [a-zA-Z0-9_]
\sWhitespace [ \t\n\r]
\.Escaped literal dot

Usage

import regex

fn main() {
    // Full match: does the entire string match?
    print(regex_match("hello", 5, "hello", 5))   // 1
    print(regex_match("hel+o", 5, "hello", 5))   // 1

    // Search: find first match position
    let pos = regex_search("\\d+", 3, "abc123def", 9)
    print(pos)   // 3

    // Count: non-overlapping matches
    let n = regex_count("[a-z]+", 6, "foo bar baz", 11)
    print(n)     // 3
}

Function Reference

FunctionSignatureDescription
regex_match(pat: &i8, pl: i64, text: &i8, tl: i64) -> i64Full match (entire text)
regex_search(pat: &i8, pl: i64, text: &i8, tl: i64) -> i64Find first match position
regex_count(pat: &i8, pl: i64, text: &i8, tl: i64) -> i64Count non-overlapping matches
re_match_at(pat: &i8, pl: i64, text: &i8, tp: i64, tl: i64) -> i64Match at specific position

Detailed API

regex_match

fn regex_match(pat: &i8, pl: i64, text: &i8, tl: i64) -> i64

Test whether the pattern matches the entire text string. The match must consume all bytes.

// Exact match
print(regex_match("abc", 3, "abc", 3))       // 1
print(regex_match("abc", 3, "abcd", 4))      // 0

// With quantifiers
print(regex_match("a.*c", 4, "abbc", 4))     // 1

// Character classes
print(regex_match("[0-9]+", 6, "42", 2))      // 1
print(regex_match("[0-9]+", 6, "abc", 3))     // 0

Parameters:

  • pat – pattern byte buffer
  • pl – pattern length
  • text – text byte buffer
  • tl – text length

Returns: 1 if the pattern matches the entire text, 0 otherwise.

fn regex_search(pat: &i8, pl: i64, text: &i8, tl: i64) -> i64

Search for the first occurrence of the pattern anywhere in the text. If the pattern starts with ^, it only tries matching at position 0.

// Find digits in a string
print(regex_search("\\d+", 3, "item42price", 11))   // 4

// Anchored search
print(regex_search("^abc", 4, "abc123", 6))          // 0
print(regex_search("^abc", 4, "xabc", 4))            // -1

// Not found
print(regex_search("xyz", 3, "hello", 5))            // -1

Returns: Index of the first match position, or -1 if no match found.

regex_count

fn regex_count(pat: &i8, pl: i64, text: &i8, tl: i64) -> i64

Count all non-overlapping matches of the pattern in the text. After each match, scanning advances by the match length (or by 1 if the match is zero-length).

// Count words
print(regex_count("[a-z]+", 6, "one two three", 13))   // 3

// Count digits
print(regex_count("\\d", 2, "a1b2c3", 6))              // 3

// Count specific pattern
print(regex_count("ab", 2, "ababab", 6))                // 3

Returns: Number of non-overlapping matches.

re_match_at

fn re_match_at(pat: &i8, pl: i64, text: &i8, tp: i64, tl: i64) -> i64

Try to match the pattern at a specific position tp in the text. This is the low-level matching function used by regex_match, regex_search, and regex_count.

// Match "abc" starting at position 3
let mlen = re_match_at("abc", 3, "xxxabc", 3, 6)
print(mlen)   // 3 (matched 3 bytes)

// No match at this position
let mlen2 = re_match_at("abc", 3, "xxxabc", 0, 6)
print(mlen2)  // -1 (no match)

Returns: Number of bytes matched, or -1 if no match at that position.


re_match_escape

fn re_match_escape(pp: i64, ch: i64) -> i64

Internal: Check if character ch matches an escape sequence at pattern position pp. Supports \d (digit), \w (word), and \s (whitespace).


re_match_class

fn re_match_class(pp: i64, ch: i64) -> i64

Internal: Check if character ch matches the character class [...] starting at pattern position pp. Handles ranges [a-z] and negation [^...].


re_skip_class

fn re_skip_class(pp: i64) -> i64

Internal: Skip past a character class [...] in the pattern string. Returns the new pattern position.


re_skip_elem

fn re_skip_elem(pp: i64) -> i64

Internal: Skip one pattern element (literal, dot, or class) and its following quantifier (*, +, ?) if present.


re_char_match

fn re_char_match(pp: i64, ch: i64) -> i64

Internal: Match a single pattern element at pp (literal, dot, or class) against a single text character ch.


re_match_star

fn re_match_star(ep: i64, rp: i64, tp: i64) -> i64

Internal: Greedy matcher for the * quantifier. Matches as many occurrences of the element at ep as possible, then backtracks to match the rest of the pattern at rp.


re_match_plus

fn re_match_plus(ep: i64, rp: i64, tp: i64) -> i64

Internal: Matcher for the + quantifier. Requires at least one match of the element at ep, then behaves like *.


re_match_quest

fn re_match_quest(ep: i64, rp: i64, tp: i64) -> i64

Internal: Matcher for the ? quantifier. Tries matching the element at ep once, and if that fails (or if the rest of the pattern fails), tries matching zero occurrences.


re_match_here

fn re_match_here(pp: i64, tp: i64) -> i64

Internal: The core recursive backtracking matcher. Tries to match the pattern starting at pp against the text starting at tp.


Internal Functions

FunctionSignatureDescription
re_match_escape(pp: i64, ch: i64) -> i64Check \d, \w, \s escapes
re_match_class(pp: i64, ch: i64) -> i64Check character class membership
re_skip_class(pp: i64) -> i64Skip past [...] in pattern
re_skip_elem(pp: i64) -> i64Skip one pattern element
re_char_match(pp: i64, ch: i64) -> i64Match one pattern element against one char
re_match_star(ep: i64, rp: i64, tp: i64) -> i64Greedy * quantifier
re_match_plus(ep: i64, rp: i64, tp: i64) -> i64+ quantifier (one or more)
re_match_quest(ep: i64, rp: i64, tp: i64) -> i64? quantifier (zero or one)
re_match_here(pp: i64, tp: i64) -> i64Core recursive matcher