I guess the Scunthorpe problem is still a thing. https://en.wikipedia.org/wiki/S...

tyingq · on Jan 25, 2021

My particular favorite is this one: "In October 2020 a profanity filter banned the word bone at a paleontology conference."

https://www.vice.com/en/article/dyzamj/a-profanity-filter-ba...

koheripbal · on Jan 25, 2021

Why are we letting tech companies treat us like children?

Once I confirm to them I'm an adult, I should be able to choose to see everything.

int_19h · on Jan 25, 2021

A long time ago, I saw a guy unable to post on a corporate blog of his own team. Turned out that his name was flagged by a filter.

What made this particularly egregious is that the name in question, "Hui", wasn't a swear word in either his own native language (Chinese) or in English. But it closely resembles a Russian profanity. Turned out that the filter was "multilingual", and applied rules for all languages to all posts...

grishka · on Jan 25, 2021

Why the hell would it even apply Russian filters to something that isn't written in Cyrillic? And this isn't the best English transliteration of that word either... That's really some dedication.

emayljames · on Jan 25, 2021

Yeah, they would have to do literal translation based on phonetics. That is just insane.

thaumasiotes · on Jan 25, 2021

> they would have to do literal translation based on phonetics.

That seems pretty unlikely to have happened here; I don't know the Russian word in question, but the Chinese "hui" rhymes with English "clay". (It also rhymes with the more sensibly spelled Chinese "wei"; the 'e' is only omitted when the syllable begins with a consonant. Compare "feng shui".) I'd be surprised if that were a possible reading of any Russian that might be transliterated "hui".

int_19h · on Jan 25, 2021

A native Russian speaker who is familiar with how Russian is usually transliterated, but unfamiliar with Chinese, would read it very similar to the Russian word in question ("khoo-y").

As to why the filter was applied to Latin characters - I'm not sure, but I'm assuming that's to prevent people from using translit to sneak in profanities. Of course, this ends up being a pointless game of whack-a-mole - there are so many possible ways to spell something like that with Unicode...
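To illustrate the whack-a-mole point, here is a minimal Python sketch. The helper name `normalize_for_filter` is hypothetical and this is not how any particular filter works; it only shows that standard Unicode normalization folds some alternate spellings (fullwidth letters) but not cross-script lookalikes (a Cyrillic letter standing in for a Latin one).

```python
import unicodedata

def normalize_for_filter(text: str) -> str:
    """Fold compatibility forms (e.g. fullwidth letters) and letter case."""
    return unicodedata.normalize("NFKC", text).casefold()

fullwidth = "\uff48\uff55\uff49"  # fullwidth Latin letters spelling "hui"
mixed = "b\u0430d"                # Latin "b", Cyrillic "а" (U+0430), Latin "d"

# NFKC maps fullwidth letters back to plain ASCII...
print(normalize_for_filter(fullwidth) == "hui")  # True
# ...but a Cyrillic lookalike is a different letter and stays that way.
print(normalize_for_filter(mixed) == "bad")      # False
```

Catching the second case would need an extra confusables mapping on top, and each such mapping tends to open up new false positives.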

thaumasiotes · on Jan 26, 2021

Huh, I looked up the word. хуй?

Looks like Russians and Americans can find common ground on thinking Chinese last names look like "penis", even if we're making fun of Wang and they're making fun of Hui.

Clewza313 · on Jan 26, 2021

In Chinese socialism is shè huì zhǔ yì, which had to be intentionally misromanized as шэхуэйчжуи to avoid dick jokes.

thaumasiotes · on Jan 26, 2021

What's the mistake? Look up; хуэй is a much better representation of the pronunciation of 会 than хуй would be. The pinyin spelling "hui" omits the primary vowel of the syllable.

grishka · on Jan 26, 2021

As another native Russian speaker, "i" isn't the most common transliteration for "й", and that's what bothers me here. "Hui" would be a plural, with an "и". Й is usually written as "y" or "j". Except when you're getting an international passport, then there's a good chance your name will end with "ii" because the federal migration service hates you.

int_19h · on Jan 26, 2021

It's not the most common transliteration, but it's common enough; and even in Cyrillic, if you see "и" where "й" would normally be expected, you'd usually read it like the latter; e.g. "йод" is sometimes spelled "иод", but everybody will read it the same. Given that the written distinction between и/й dates back to Peter's civil script reform, and that it wasn't even considered a separate letter of the alphabet until the 1918 spelling reform, it's not really surprising.

grishka · on Jan 26, 2021

Hm. I thought Й was being used in the old style (pre-1918) writing as well? At least this[1] translator keeps it in masculine adjectives. Though it doesn't keep the dots on a Ё. I've never seen И substituted for Й, but Ё -> Е is common, especially in names (for example some people write "Артем" but everyone still reads it as if there's a "ё").

[1] http://slavenica.com

int_19h · on Jan 27, 2021

It was used before 1918 - it was first standardized in the Civil Script (1710). But it wasn't considered a separate letter until 1918 - so e.g. the standard alphabetic sorting ignored the distinction. For this reason, it wasn't always used consistently, although it was still much more consistent than Ё. And even today, "иод" is still considered valid spelling; indeed, it's the preferred one in scientific context.

This still shows up in some contexts - e.g. Й, like Ё, isn't used in bullet lists; try it in Word - it'll go from И straight to К.

ubermonkey · on Jan 25, 2021

I'm just agog that people are still doing dumb pattern matching for profanity filters. I just assumed that YEARS AGO people realized how dumb it is, but apparently: No.

Majromax · on Jan 25, 2021

This is Google. It's probably very smart pattern matching for profanity.

The neural network may have taken millions of core-hours to learn to be as dumb (here) as a blind keyword search.

ttt0 · on Jan 25, 2021

We had to give up our privacy to create a highly sophisticated technology that doesn't even work half of the time. I love the future, it was totally worth it.

smichel17 · on Jan 25, 2021

Well, obviously. If it were a dumb profanity filter then it would be possible to fix it!

aidos · on Jan 25, 2021

I once had a bug that I traced back to a rule (can’t remember in which part of the stack - though I think it was client controlled IIS) that was stripping the “select” from the word “selected” in query string params in an attempt to thwart sql injection. From memory it was naive enough that “sselectelect” was converted nicely in the process.
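A minimal Python sketch of why a single-pass strip like this fails. The function name `strip_keyword` is hypothetical, and this is not the actual rule from that stack, just an illustration of the behavior described:

```python
def strip_keyword(value: str, keyword: str = "select") -> str:
    """Remove every occurrence of keyword in a single pass (the naive approach)."""
    return value.replace(keyword, "")

# Collateral damage: an innocent parameter value loses its prefix.
print(strip_keyword("selected"))      # prints "ed"

# Bypass: removing the inner keyword reassembles the outer one.
print(strip_keyword("sselectelect"))  # prints "select"
```

Repeating the strip until nothing changes would close the bypass but not the collateral damage; parameterized queries avoid the problem without touching the input at all.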

jimsmart · on Jan 25, 2021

Similar: Yahoo used to (2002) replace any instance of the character sequence 'eval' (and other 'bad' strings) in their emails, in an attempt to prevent JavaScript exploits. Needless to say it created a small amount of havoc!

http://news.bbc.co.uk/1/hi/sci/tech/2138014.stm

https://en.wiktionary.org/wiki/medireview

distances · on Jan 25, 2021

I hadn't heard of this and I'm now flabbergasted. Is it even legal for a service provider to secretly change email contents? It's absolutely outlandish to imagine how someone first thought this could be a good idea, and then found someone capable of executing the plan who apparently agreed.

simion314 · on Jan 25, 2021

I had similar issues. The software is ModSecurity, which some hosting companies use, and some of its rules will empty out a POST request field if it contains text like ".... select ...from..." even when the two keywords are paragraphs apart.

boring_twenties · on Jan 25, 2021

Not super relevant or anything but I just can't help but share my favorite profanity filter story, so here you go.

I worked at a place that had a profanity filter in two parts.

The first part was in C, several pages of if (!strcmp(x, a)) return 0;

After all that, it then invokes popen() to ssh to another machine and run a shell script there, which contains several more pages of string comparisons, this time in shell.

tyingq · on Jan 25, 2021

Doesn't popen() pass strings to a shell? Sounds dangerous, as you would have to escape semicolons, quotes, etc.
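The concern can be sketched in Python (the original code was C, and the script name `./check_word.sh` is a hypothetical stand-in). The sketch only builds the command strings a shell would receive; it does not execute anything.

```python
import shlex

def naive_command(word: str) -> str:
    # Unescaped interpolation: the shell sees whatever the user typed.
    return f"./check_word.sh {word}"

def quoted_command(word: str) -> str:
    # shlex.quote wraps the word so a POSIX shell treats it as one argument.
    return f"./check_word.sh {shlex.quote(word)}"

user_input = "foo; echo pwned"

print(naive_command(user_input))   # prints "./check_word.sh foo; echo pwned"
print(quoted_command(user_input))  # prints "./check_word.sh 'foo; echo pwned'"

# After quoting, the input parses back as a single argument.
print(shlex.split(quoted_command(user_input)))
```

In the naive string, a shell would treat the semicolon as a command separator and run `echo pwned` as a second command. Going through ssh adds a second round of shell parsing on the remote side, so one layer of quoting would not be enough there; passing an argument list without a shell, where possible, sidesteps the issue.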

ljm · on Jan 25, 2021

I might be wrong but I think it's about censoring the 'hell' in 'shell'. Because some parts of the world consider words like 'hell' and 'damn' to be profane.

siltpotato · on Jan 25, 2021

Hold on, what? Okay, return false if they aren't equal, then open another process to repeat this method once again in the shell... I can't guess the reason. Would you know if there is any reason this might have been done?

boring_twenties · on Jan 26, 2021

I wouldn't know the real reason for sure, but this seems plausible:

1) They got tired of having to modify C code and wait for the deploy cycle to modify the filter

2) Using, for example, the database would be more work than calling a shell script. On top of that, it might actually be beyond the abilities of the programmer involved.

3) The C code executes on an arbitrary machine. Hence the ssh to a specific machine, so that the shell script would only have to be maintained in one place

saagarjha · on Jan 25, 2021

strcmp returns 0 if the strings are equal.

Blikkentrekker · on Jan 25, 2021

A great many places do this and automatically refuse content based on arbitrary “bad words” regardless of the context.

I remember being blocked from posting a forum post containing the phrase “tardive dyskinesia”, apparently because the filter rejected anything with the string “tard” in it.

I'm not sure as to whom they think to be helping with that, but it's entirely possible that their advertisement revenue will actually suffer, if the string “tard” be found on their pages.

drzoltar · on Jan 25, 2021

FWIW, general profanity detection is a highly nontrivial problem. It’s true that such subword profanity filters aren’t that great, but slightly more sophisticated ones (eg whole word matching or n-grams) tend to have relatively good precision. You could train a fancy neural network, but the overall return on precision and recall tends to be not that great (compared to the exponential change in speed and cost). The problem almost always crops up in out-of-distribution sentences (such as “bone” at a paleontology conference).
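A minimal Python sketch of the difference between subword and whole-word matching. The blocklist and function names are hypothetical, with entries taken from examples mentioned in this thread:

```python
import re

# Hypothetical blocklist, for illustration only.
BLOCKLIST = {"tard", "ass", "bone"}

def substring_hit(text: str) -> bool:
    """Subword matching: flag if any blocked term appears anywhere."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)

def whole_word_hit(text: str) -> bool:
    """Whole-word matching: flag only if a full token is blocked."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return any(word in BLOCKLIST for word in words)

for phrase in ("tardive dyskinesia", "assault rifle", "a fossil bone"):
    print(phrase, substring_hit(phrase), whole_word_hit(phrase))
# prints:
# tardive dyskinesia True False
# assault rifle True False
# a fossil bone True True
```

Whole-word matching removes the Scunthorpe-style false positives, but the last line shows it still has no notion of context: "bone" gets flagged at a paleontology conference either way.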

wiml · on Jan 25, 2021

Even humans with full general intelligence and domain knowledge will fail at profanity detection. I think the problem here is not so much that there are false triggers, but that there is no way to deal with the false triggers — no way to appeal to reason or utility.

Blikkentrekker · on Jan 25, 2021

It's a problem with a subjective answer.

One man's profanity is not another man's profanity.

Of course, the personality trait of wanting to censor “bad words” seems to correlate highly with a belief in objective morality: the others are wrong about what they find profane!

gaius_baltar · on Jan 25, 2021

They just rebranded it as "AI-powered profanity filter" :)

015a · on Jan 25, 2021

Reminiscent to me of Call of Duty Warzone; it has loadouts which you can give custom names (that only you see!) which are protected with a profanity filter. Comically, some of the literal names of the guns are banned as being profane, like "MP5".

mikestew · on Jan 25, 2021

My CoD group of friends still occasionally calls the assault rifle "analsault". Stupid, huh? Not as stupid as an earlier version of CoD (Black Ops 1, IIRC) that wouldn't let you name a load out "assault $WHATEVER", 'cuz you know, "ass". But "anal" is so much better so that was allowed.

They fixed it in later versions, but I still have a "penetration" class because I'm immature that way.

tschwimmer · on Jan 25, 2021

See also Dark Souls multiplayer, in which you can see many “K***hts” running around.

hatsunearu · on Jan 25, 2021

Oh my god that's fucking hilarious.

vaduz · on Jan 25, 2021

Isn't it just to double-plus-ensure that no one "accidentally" uses a name that ActivisionBlizzard did not license from the appropriate gun manufacturer?

I.e., it has little to do with profanity and a lot to do with preventing someone from producing, in some type of legal action, a screenshot of a loadout with a gun that looks like an MP5, is named by them "MP5 whatever", and behaves like an MP5?

sli · on Jan 25, 2021

I cannot imagine the horror of the precedent that would be set if H&K successfully sued AB over copyright infringement for names that are visible only to the player who entered them. Those names are not shown publicly.

vaduz · on Jan 25, 2021

Whilst I agree - and fervently hope we won't have to live in such a world - I thought the same about the API copyrightability and that one is not exactly going the reasonable way at the moment.

H&K has a US trademark consisting of just "MP5" in relation to a ton of things (though not video games!) so they could at least try to make a case out of it not being purely nominative use and tie AB up in court, if they wished. It would be PR suicide, but still, not the most stupid thing they have done.

015a · on Jan 25, 2021

Unclear. They do absolutely refer to their gun as, say, the "MP5" in-game.

Though, interestingly, in Modern Warfare (2019), many guns have two names; for example, the MP5 is also called SMG Charlie (as in, NATO phonetic alphabet for C). I kind of got the impression that it was laying groundwork for a long-term goal of removing the actual names of the guns; possibly due to licensing fees, or maybe to divorce the ugly reality of killing from video game killing, I don't know.

ufmace · on Jan 25, 2021

It feels like it would be pretty bizarre if a court somewhere actually ruled in favor of a gun manufacturer for lost revenue in a trademark suit because somebody was genuinely confused between a weapon in a video game and ordering an actual physical weapon, that can only be legally ordered by licensed firearm dealers and government organizations.

floatboth · on Jan 26, 2021

Need for Speed Heat doesn't allow you to put "69" or "420" on your car. But "6 9" and "4 2 0" are fine :D Best filter ever, completely defeated by just spaces

wheybags · on Jan 25, 2021

Maybe I'm being dense here, but what possible profane meaning is there in MP5?

015a · on Jan 26, 2021

You're not being dense. It's inexplicable. The only thing I can come up with is that "5" looks like "S", so maybe it's banning "MPS", but even that is nearly meaningless; urbandictionary has some explicit things it stands for, though they're not well-upvoted.

CamperBob2 · on Jan 25, 2021

The RIAA is trying to get ahead of various up-and-coming formats that will be used to pirate their content.

emayljames · on Jan 25, 2021

The case where the AFA filtered a news article about Tyson Gay, replacing every instance of his surname with 'homosexual', is a hilarious example of why you need context.

ourcat · on Jan 25, 2021

And the Arsenal pocketwatch.