Validating Strings as Hiragana Using Regex in PHP
This article explains how to use regular expressions in PHP to check if a string consists only of Hiragana characters.
Basic Pattern
To check that a string consists of one or more Hiragana characters, you can use the regular expression pattern /^[ぁ-ん]+$/u.
In this pattern, "^" represents the start of the string, "$" represents the end of the string, and "[ぁ-ん]" specifies the range for Hiragana characters. The + quantifier indicates that the pattern matches one or more occurrences. The final "u" enables UTF-8 mode for proper recognition of multibyte characters like Japanese.
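As a minimal sketch of the pattern described above, the following wraps it in a small function. The name isBasicHiragana is hypothetical and chosen only for illustration, and the source file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper for illustration; the name is not part of any library.
function isBasicHiragana(string $str): bool
{
    // ^ and $ anchor the match to the whole string, [ぁ-ん] is the basic
    // Hiragana range, + requires at least one character, and the u modifier
    // makes PCRE treat the pattern and subject as UTF-8.
    return preg_match('/^[ぁ-ん]+$/u', $str) === 1;
}

var_dump(isBasicHiragana('ひらがな'));  // bool(true)
var_dump(isBasicHiragana('カタカナ'));  // bool(false): Katakana is outside the range
var_dump(isBasicHiragana('ひらがな1')); // bool(false): contains a digit
var_dump(isBasicHiragana(''));          // bool(false): + needs at least one character
```

Comparing the result with 1 keeps the return value strictly boolean, since preg_match() returns 0 for no match and false on error.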
Special Cases: Iterative Characters and Diacritics
If your string contains Hiragana iteration marks, as in "こゝろ" or "みすゞ", you can widen the range so that it ends at ゞ, for example /^[ぁ-ゞ]+$/u.
However, note that this pattern may also match isolated diacritical marks like ゛ (voicing mark) or ゜ (semi-voicing mark).
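The trade-off can be sketched as follows, assuming the wider range is written as [ぁ-ゞ]. The stricter alternative that lists ゝ and ゞ explicitly is an illustration added here, not a pattern from the article, and both function names are hypothetical.

```php
<?php
// Hypothetical helpers for illustration only.

// Wide range: everything from ぁ (U+3041) through ゞ (U+309E),
// which also takes in the voicing marks ゛ and ゜.
function isHiraganaWideRange(string $str): bool
{
    return preg_match('/^[ぁ-ゞ]+$/u', $str) === 1;
}

// Stricter alternative: basic range plus only the two iteration marks.
function isHiraganaWithIterationMarks(string $str): bool
{
    return preg_match('/^[ぁ-んゝゞ]+$/u', $str) === 1;
}

var_dump(isHiraganaWideRange('こゝろ'));          // bool(true)
var_dump(isHiraganaWideRange('みすゞ'));          // bool(true)
var_dump(isHiraganaWideRange('゛'));              // bool(true): isolated voicing mark
var_dump(isHiraganaWithIterationMarks('こゝろ')); // bool(true)
var_dump(isHiraganaWithIterationMarks('゛'));     // bool(false)
```

Listing the extra characters explicitly is more verbose, but it avoids accepting marks that you did not intend to allow.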
Unicode Property Escape
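If your string contains less common Hiragana characters such as "ゕ", "ゖ", "ゔ", or "ゟ", you can use a Unicode property escape pattern such as /^[\p{Hiragana}]+$/u. For long vowels, as in "けーき", add the long vowel mark ー to the character class explicitly, because its Unicode Script property is Common rather than Hiragana.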
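A minimal sketch of a property-escape pattern follows. It lists the long vowel mark ー explicitly, which is an assumption made here: whether \p{Hiragana} alone matches ー may depend on the PCRE2 version your PHP build uses, so naming the character avoids relying on that. The function name is hypothetical.

```php
<?php
// Hypothetical helper for illustration only.
function isHiraganaWithLongVowel(string $str): bool
{
    // \p{Hiragana} matches characters in the Hiragana script, including
    // ゔ, ゕ, ゖ and ゟ. The long vowel mark ー (U+30FC) is added by hand.
    return preg_match('/^[\p{Hiragana}ー]+$/u', $str) === 1;
}

var_dump(isHiraganaWithLongVowel('けーき')); // bool(true)
var_dump(isHiraganaWithLongVowel('ゔ'));     // bool(true)
var_dump(isHiraganaWithLongVowel('ゟ'));     // bool(true)
var_dump(isHiraganaWithLongVowel('ケーキ')); // bool(false): Katakana letters
```

Because ー is shared between Hiragana and Katakana text, allowing it does not by itself let Katakana words through: the surrounding letters still have to be Hiragana.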
Additional Notes
This article assumes UTF-8 encoding.
Note that the way modifiers are specified might differ depending on the regular expression library you are using. Refer to the library's documentation for specific details.
Below is a summary of commonly used regular expression patterns in PHP. Use these depending on your requirements:
# | Matching Condition | Range Pattern | Unicode Property Pattern |
---|---|---|---|
1 | Contains only hiragana | ^[ぁ-ん]+$ | ^[\p{Hiragana}]+$ |
2 | Fixed length of n hiragana | ^[ぁ-ん]{n}$ | ^[\p{Hiragana}]{n}$ |
3 | At least n hiragana | ^[ぁ-ん]{n,}$ | ^[\p{Hiragana}]{n,}$ |
4 | No more than m hiragana | ^[ぁ-ん]{1,m}$ | ^[\p{Hiragana}]{1,m}$ |
5 | Between n and m hiragana | ^[ぁ-ん]{n,m}$ | ^[\p{Hiragana}]{n,m}$ |
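To show how the placeholders n and m in the table turn into concrete PHP patterns, here is a small sketch. The helper hiraganaLengthPattern is hypothetical; it only assembles the quantifier and adds the delimiters and the u modifier that preg_match() needs.

```php
<?php
// Hypothetical helper for illustration: builds the table's patterns
// for a given minimum and optional maximum number of characters.
function hiraganaLengthPattern(int $min = 1, ?int $max = null): string
{
    if ($max === null) {
        $quantifier = sprintf('{%d,}', $min);          // rows 1 and 3: at least $min
    } elseif ($min === $max) {
        $quantifier = sprintf('{%d}', $min);           // row 2: exactly n
    } else {
        $quantifier = sprintf('{%d,%d}', $min, $max);  // rows 4 and 5: between n and m
    }

    return '/^[\p{Hiragana}]' . $quantifier . '$/u';
}

echo hiraganaLengthPattern(3, 3), "\n"; // /^[\p{Hiragana}]{3}$/u
echo hiraganaLengthPattern(2), "\n";    // /^[\p{Hiragana}]{2,}$/u
echo hiraganaLengthPattern(1, 4), "\n"; // /^[\p{Hiragana}]{1,4}$/u

var_dump(preg_match(hiraganaLengthPattern(3, 3), 'さくら')); // int(1)
var_dump(preg_match(hiraganaLengthPattern(3, 3), 'さく'));   // int(0)
```

Note that {1,} is equivalent to +, so row 1 of the table is simply the default case of this helper.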
Source Code
Next, we introduce a PHP function to determine if an input string consists only of Hiragana characters. This function uses Unicode property escapes for flexibility and future-proofing. It can validate based on the following conditions:
- When minimum length is omitted: Verifies the string consists only of Hiragana characters within the specified maximum length.
- When maximum length is omitted: Ensures the string has only Hiragana characters and meets the minimum length requirement.
- When both minimum and maximum lengths are omitted: Verifies that the entire string contains only Hiragana characters.
/**
 * Checks whether a given string consists only of Hiragana characters.
 *
 * @param string $str The input string.
 * @param ?int $minLength The minimum number of characters (treated as 1 if null).
 * @param ?int $maxLength The maximum number of characters (treated as the length of the string if null).
 * @return bool Returns true if the input string meets the conditions; otherwise, returns false.
 * @throws InvalidArgumentException Throws an exception if the minimum length is less than 1,
 *                                  or if the maximum length is less than the minimum length.
 */
function isHiragana(string $str, ?int $minLength = null, ?int $maxLength = null): bool {
    // Set default values
    $min = $minLength ?? 1;
    $max = $maxLength ?? mb_strlen($str);

    // Validate arguments
    if ($min < 1) {
        // Throw an exception if the minimum length is less than 1
        throw new InvalidArgumentException('The minimum length must be an integer greater than or equal to 1.');
    }
    if (!is_null($maxLength) && $max < $min) {
        // Throw an exception if the maximum length is less than the minimum length
        throw new InvalidArgumentException('The minimum length must not exceed the maximum length.');
    }

    // Construct the regular expression pattern for Hiragana characters.
    // The D modifier makes $ match only at the very end of the string;
    // without it, a string with a trailing newline would also pass.
    $pattern = is_null($maxLength)
        ? sprintf('/^[\p{Hiragana}]{%d,}$/Du', $min) // No limit on the maximum number of characters
        : sprintf('/^[\p{Hiragana}]{%d,%d}$/Du', $min, $max);

    // Perform the validation using regular expressions
    // (preg_match returns 1 on a match, 0 on no match, and false on error)
    return preg_match($pattern, $str) === 1;
}
Validation
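Specify the minimum and maximum lengths in characters, not bytes. With the u modifier, each Hiragana character counts as one character in a quantifier, even though it occupies three bytes in UTF-8.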
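The difference between characters and bytes can be seen in a short sketch. The variable names are chosen here for illustration, and the source file is assumed to be saved as UTF-8.

```php
<?php
$word = 'さくら';

// strlen() counts bytes: each of these characters takes three bytes in UTF-8.
$byteLength = strlen($word);

// With the u modifier, the quantifier counts characters, not bytes.
$matchesThree = preg_match('/^[\p{Hiragana}]{3}$/u', $word);
$matchesNine  = preg_match('/^[\p{Hiragana}]{9}$/u', $word);

var_dump($byteLength);   // int(9)
var_dump($matchesThree); // int(1)
var_dump($matchesNine);  // int(0)
```

A length limit taken from a byte count, such as the result of strlen(), would therefore be three times too large for Hiragana input.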