The quoting of weird names is fixed now too, as expected. The preview fix should be easy as well.
I'm working on the image resize fix. It seems that the forum is generating 50x50 thumbnails, but the theme is using CSS to scale them down to 40x40, which causes them to look like shit.
Quoting the two most fucked up usernames:
Okay, so presumably "ElJoe0 (v2.8)" and "¡It Is GünterTime!" do not pass validation.
What's your validation regex? From what I can tell, you've allowed spaces and periods in addition to the standard alphanumeric chars (letters/digits/underscores).
It seems to me that ElJoe's name ought to be considered acceptable, though Gunter's name is clearly pretty marginal.
I need to allow parentheses for ElJoe. And I need to somehow allow all sorts of utf8 characters that \w apparently does not include.
Well, there are two approaches that can be taken with the regex:
1) Sequentially allow bits and pieces as people ask for them.
2) Allow absolutely everything except for HTML tags or the like.
With regards to Unicode and regex, apparently you have to add the 'u' modifier (the '/u' at the end of the regex string) to make PHP match in Unicode (UTF-8) mode - see here. There is likely plenty of regex throughout the Vanilla Forums software that would require updating in this way.
Instead of '\w', it seems that the standard regex for what is considered to be a Unicode 'letter' is '\p{L}' or, more verbosely, '\p{Letter}' - see here.
So, for Unicode support you'll probably need to add in a lot of '/u' and '\p{L}' all around the shop.
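A minimal sketch of what the 'u' modifier and '\p{L}' change, assuming PHP 7 or later (for the "\u{...}" string escape) and a PCRE build with Unicode property support. The variable names are only for illustration; none of this is Vanilla's own code.

```php
<?php
// "G\u{00FC}nter" is "Günter" with a precomposed u-umlaut (U+00FC).
$name = "G\u{00FC}nter";

// With the u modifier, pattern and subject are treated as UTF-8,
// so \p{L} can match the umlaut as a single letter.
$lettersOnly = preg_match('/^\p{L}+$/u', $name);

// strlen() counts bytes, while /./u steps through code points.
$byteCount      = strlen($name);
$codePointCount = preg_match_all('/./u', $name);
```

Here the name is 7 bytes but 6 code points, which is why byte-oriented patterns and length checks can go wrong on names like this.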
I tried adding \X to the regex, and the quoting plugin does add the /u, but it still won't let me put in a username full of umlauts and such.
Fixed gravatar and some other problems. What happened was I backed up the plugins folder, and then used all the backed up plugins. However, a bunch of those plugins are actually delivered with Vanilla itself, so I was using old plugin code with new Vanilla. Now it's fixed.
From what I can tell, I think if you're matching in unicode mode using '\X', you're matching absolutely everything anyway, so it probably should just be '\X' by itself in that situation. It's probably worth a try to see what will happen.
If there are some things you don't want to match, don't use '\X'.
I took out the \X when I read that it matches newlines. Need to find another solution. What they should have done is just made a blacklist instead of a whitelist: use a regex just to match the bad and evil characters, and allow all the other ones.
Well, you can still override their ValidateUsername function if you so choose. It's not the best of solutions, but it would work, and you don't have to directly change the Vanilla code; see their documentation here on function overrides.
However, since in this situation it's nicer to come up with a Unicode regex, I'll come up with one shortly.
If we want stuff like umlauts and such, it appears we need to match two different Unicode categories: '\p{L}' for Unicode letters, and '\p{M}' for mark code points (combining accents and suchlike).
Have you tried adding '\p{L}\p{M}' to the regex?
I'd suggest trying it by itself just to see if it allows basic words, possibly with umlauts and such, and then try expanding it with spaces, digits, underscores, and whatever else we want.
Oh, and I think I get why '\X' wouldn't work. The regex you're giving it is being put inside a regex character class (i.e. it ends up as '[\X\w]{3,20}' or something like that) - but \X isn't actually allowable within a character class because it isn't actually a single character! What it is is a Unicode grapheme which can in fact consist of multiple code points with combining marks and suchlike. Hence \X cannot be used here, but stuff like \p{L} and \p{M} should work, because those things are always single code points.
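To make the grapheme versus code point distinction concrete, here is a small sketch, assuming PHP 7 or later and a PCRE build with Unicode support. It only illustrates the general behaviour of '\X' and character classes; how a given PCRE version treats '\X' inside '[...]' is not shown here and may vary.

```php
<?php
// "u" followed by U+0308 COMBINING DIAERESIS: one visible character, two code points.
$decomposed = "u\u{0308}";

// \X consumes the whole grapheme cluster in one go...
$graphemes = preg_match_all('/\X/u', $decomposed);

// ...whereas a character class only ever matches one code point at a time,
// so the letter and the mark each have to be allowed individually.
$codePoints = preg_match_all('/[\p{L}\p{M}]/u', $decomposed);

// Without \p{M} in the class, the combining mark is refused.
$lettersOnly = preg_match('/^[\p{L}]+$/u', $decomposed);
```

So a class like '[\p{L}\p{M}]' covers both precomposed and decomposed umlauts, while '\p{L}' alone only covers the precomposed form.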
My suggestion would be this:
' \p{L}\p{M}\p{N}\p{P}'
which consists of five elements:
1) Standard single space
2) All letters
3) All marks (accents, umlauts and such)
4) All numbers
5) All punctuation
Here we specifically choose to leave out \p{C} (control characters) and \p{Z} (whitespace) for obvious reasons.
Unfortunately I think we can't just allow \p{S} (symbols, including math and such) because although it would be nice to, it is pretty critical that we leave out the less than sign in order to stop people from sticking HTML into their usernames.
Does anyone know of something we're missing with ' \p{L}\p{M}\p{N}\p{P}'?
Unfortunately, it's not a very robust approach, but it's the best way to do things without overriding the Vanilla code. However, if you do override the Vanilla code, then you can use a blacklist approach instead. Basically, all you'd have to do is invalidate HTML code, newlines, and the like - that would be more robust and wouldn't need lots of stuff added in, but on the other hand you could break stuff with future updates or other plugins due to overriding the Vanilla code.
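A rough sketch of how the whitelist string ' \p{L}\p{M}\p{N}\p{P}' behaves once it is dropped into a '[...]{3,20}' class. The function name, the '\A' and '\z' anchors, and the exact wrapper are assumptions made for this example, not Vanilla's actual validation code.

```php
<?php
// Hypothetical wrapper around the suggested whitelist; not Vanilla's real function.
function matchesWhitelist(string $name): bool
{
    // Space, letters, marks, numbers and punctuation, as suggested above.
    $classBody = ' \p{L}\p{M}\p{N}\p{P}';
    // \A and \z anchor the whole string (assumed; the real wrapper may differ).
    return preg_match('/\A[' . $classBody . ']{3,20}\z/u', $name) === 1;
}

$elJoe  = matchesWhitelist('ElJoe0 (v2.8)');
$gunter = matchesWhitelist("\u{00A1}It Is G\u{00FC}nterTime!");
$html   = matchesWhitelist('<div>');
$quotes = matchesWhitelist('"""');
```

Both awkward usernames pass and '<div>' fails, because '<' and '>' are symbols rather than punctuation. Note that quotes, slashes and backslashes all fall under '\p{P}', so a name made only of double quotes passes as well.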
Did you look into the preview thing, Scott? It seems pretty easy.
Hells yeah!
Also, since the < sign is currently disallowed, I think it will also prevent people from having stuff like a <div> in their username unless we're missing something.
Well, my name is now 3 double quotes. That's pretty fucked up, but I don't know if it's a problem. The cool thing is that it still quotes properly.
EDIT: There's definitely a problem. My public profile page isn't accessible.
Scott, do you know what characters the profile URL is capable of handling? It works okay for weird ones like ElJoe and Gunter, but it can't seem to handle mine at the moment.
Okay, based on what's being done by ValidateUrlStringRelaxed, we ought to be disallowing forward slashes, backslashes, single and double quotes, and the less than and greater than signs.
The \p{P} approach to punctuation therefore clearly doesn't work, since it allows a bunch of things that it shouldn't. However, do not despair! I've come up with an even better solution anyway!
There's an easy way to switch it to a blacklist. For example, if you make it '^abc' then it's going to end up as '[^abc]{3,20}' and so it'll end up matching 3-20 characters as long as none of them are the letter a b or c.
My current suggestion for the regex string is this:
'^/\\<>\'"\p{Zl}\p{Zp}\p{C}'
which ought to be everything except the following:
1) forward slash
2) backslash (needs backslash escaping within the string)
3) less than sign
4) greater than sign
5) single quote (needs backslash escaping within the string)
6) double quote
7) All kinds of line separator
8) All kinds of paragraph separator
9) All kinds of control character
Does that sound good? Are there any other notable "dangerous" characters?
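A sketch of the blacklist idea as a standalone check, under a few assumptions: the function name is made up, the class is wrapped as '[...]{3,20}' with '\A' and '\z' anchors, '~' is used as the delimiter so '/' needs no escaping, and the backslash is written with four backslashes in the PHP literal so that a literal backslash really ends up inside the class. Vanilla's own wrapper may do any of this differently.

```php
<?php
// Hypothetical helper; Vanilla's real validation function may differ.
function isAllowedUsername(string $name): bool
{
    // Excluded: / \ < > ' " plus line separators, paragraph separators
    // and the \p{C} "other" category (control characters and the like).
    // In a single-quoted PHP string, \\\\ yields two backslashes,
    // which PCRE then reads as one literal backslash.
    $classBody = '^/\\\\<>\'"\p{Zl}\p{Zp}\p{C}';
    // \z (rather than $) so a trailing newline cannot sneak past the check.
    return preg_match('~\A[' . $classBody . ']{3,20}\z~u', $name) === 1;
}

$elJoe     = isAllowedUsername('ElJoe0 (v2.8)');
$gunter    = isAllowedUsername("\u{00A1}It Is G\u{00FC}nterTime!");
$quotes    = isAllowedUsername('"""');
$html      = isAllowedUsername('<div>');
$slashes   = isAllowedUsername('a/b/c');
$backslash = isAllowedUsername('a\\b\\c');
$newline   = isAllowedUsername("abc\n");
```

With this version the two awkward usernames are still accepted, while the triple-quote name, HTML tags, slashes, backslashes and embedded newlines are all refused.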
Actually, if we get the regex right, we might also be able to handle stuff like @UserName sensibly, since it will use the regex to determine whether it's a username. There shouldn't be too many clashes where something could be both @UserName and a smiley.
Also, I've realised that I really should be commenting on the GitHub forum about this stuff, not here.
Comments
EDIT: ElJoe0 is okay now; I guess you updated it.
I also need to fix the unicode in general.
Here's a correction, adding in # and @ since Vanilla thinks they're special too:
'^\/\<">#@\'\p{Zl}\p{Zp}\p{C}'
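One thing worth checking is how many layers of escaping the backslash goes through. The sketch below assumes the string is pasted into PHP exactly as it appears here (the forum may have eaten a backslash) and uses a made-up helper to list which single characters a given class body refuses. The second class body is an assumed fix, not something confirmed against Vanilla.

```php
<?php
// Hypothetical helper: report which single characters a character-class body refuses.
function rejectedBy(string $classBody, array $candidates): array
{
    $rejected = [];
    foreach ($candidates as $ch) {
        // '~' is the delimiter, so '/' needs no extra escaping in the pattern.
        if (preg_match('~\A[' . $classBody . ']\z~u', $ch) !== 1) {
            $rejected[] = $ch;
        }
    }
    return $rejected;
}

$candidates = ['/', '\\', '<', '>', '"', "'", '#', '@', 'a'];

// The class body exactly as it appears in the post above.
$posted = '^\/\<">#@\'\p{Zl}\p{Zp}\p{C}';
// Assumed fix: four backslashes in the PHP literal become one literal backslash in the class.
$escaped = '^/\\\\<>"\'#@\p{Zl}\p{Zp}\p{C}';

$postedRejects  = rejectedBy($posted, $candidates);
$escapedRejects = rejectedBy($escaped, $candidates);
```

In the first string, '\/' and '\<' are just escaped copies of '/' and '<' as far as PCRE is concerned, so the backslash itself is not on the blacklist. The second string refuses it along with the others.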