[Phase 1] HIP 27: Allow 1-character mistakes in displayed addresses

mizu · August 20, 2021, 10:32pm

HIP: 27
title: Allow 1-character mistakes in displayed addresses
author: Mizu
status: Draft Phase 1
created: 2021-08-20

Simple Summary

This proposal would modify the registry policy to allow for limited mistakes or omissions in the address displayed in a profile’s video.

Abstract

Many profiles get challenged due to mistakes affecting a single character in the address displayed by the submitter in their video. This proposal would make it so a single such mistake would no longer be grounds for rejection. This would not reduce the security of the profile or the PoH registry in any practical terms as will be demonstrated below.

Motivation

Submitters will often write down their address by hand instead of printing it or displaying it on another screen. This might be because they don’t have a printer or another device with a screen they can easily display it on, or simply because they find writing the address by hand to be more convenient at that time.

As one might expect, it is often the case that submitters make mistakes when copying their address. Sometimes, these are significant errors such as the omission of a large part of the address, in which case it might be possible for an attacker to generate matching addresses, but often the error will affect only one character. We may distinguish 3 types of errors:

omitted character: a character is omitted from the address (e.g. “abcd” → “abd”)
mistaken character: a different character has been written in place of the one expected in that position (e.g. “abcd” → “ab9d”)
swapped adjacent characters: two characters adjacent to each other have been swapped (e.g. “abcd” → “acbd”)

This proposal would allow at most one of the above three errors in a displayed address. The effects on the security of an address (i.e. on the ability of an attacker to find a private key generating an address which would match the displayed address) would be the following:

omitted character: 9.32 bits: lg(40*16): 40 positions to insert the missing character which has 16 possible values
mistaken character: 9.23 bits: lg(40*15): 40 characters with 15 possible invalid values each
swapped adjacent characters: 5.29 bits: lg(39): 39 possible swaps of adjacent characters, although note that this can be slightly lower still since not all swaps have an effect (e.g. in “abbd”, swapping the two "b"s has no effect).

Note that if a character is missing (case 1.) the two other cases are no longer allowed, and conversely, if the displayed address has all 40 characters (case 2. and 3.), the first case is no longer allowed. As a result the maximum security loss is max(lg(N_1), lg(N_2 + N_3)) = max(9.32, 9.32) = 9.32 bits, reducing the total security of an address from 160 to 150.68 bits. Given the low stakes involved in being able to create a fake profile with an existing PoH registration video and the fact that no one is likely to be able to crack 150 bits of an Ethereum key for the foreseeable future, this should be an acceptable security compromise. (EDIT: I had overestimated the loss of security by adding the security bits of cases 2 and 3 when it is actually the number of possible changes that needs to be added.)

At this point, it is worth noting that there is one disadvantage to this proposal: It will no longer be possible to find a person’s profile from the video alone without trying all possible allowed errors, which is to say 639 trials in the worst case. This is easily remedied with a simple software loop, but something to keep in mind.

It might also be desirable to make the conventional “0x” prefix optional too since it does not really serve a purpose and people have occasionally been penalized for omitting it. I am adding this to the Specification below but it may be removed if it causes controversy.

Specification

The following text will be appended at the end of the first bullet point of subsection 4. of the “List of current required/optional elements and submission rules”:

The displayed address need not contain the prefix “0x”. A single one of the following errors occurring once will be tolerated in the displayed address:

omitted character: a character is omitted from the address (e.g. “abcd” → “abd”)

mistaken character: a different character has been written in place of the one expected in that position (e.g. “abcd” → “ab9d”)

swapped adjacent characters: two characters adjacent to each other have been swapped (e.g. “abcd” → “acbd”, but not “adcb”)

mnrr · August 20, 2021, 11:20pm

First one mistake is allowed… Then two…

mizu · August 20, 2021, 11:29pm

You’re right. Totally changed my mind with your in-depth argumentation. Actually, I now think we should require 512 bit SHA3 hashes of each user’s public key instead of the pathetic 160 bit nonsense Ethereum encourages. This slippery slope ends today, I say!

mnrr · August 20, 2021, 11:45pm

hahaha, I think it is very simple what I mean. Where is the limit?

There are much more constructive proposals (Senga’s work with UI for example) that would avoid resorting to this kind of flexibilization.

mizu · August 21, 2021, 12:04am

Okay, sorry for the snark. I never said I didn’t understand what you meant. My point is this does not actually hurt PoH. If you think it does, please provide a concrete failure case. I would certainly be interested in having @william check my math and give his opinion though. Regarding where we draw the limit, honestly, even 100 bits of security (down from 160) would probably be good enough for PoH but on the other hand we don’t want to make the job of checking addresses too hard for challengers, and one error (145 bits of security) is probably good enough to avoid most rejections due to honest mistakes while not confusing anyone wanting to check the address.

Regarding UI work, I don’t see how you would prevent people from making mistakes when they write down their addresses. Maybe warning them to be more careful could help, but I’m not sure how much. I guess you could integrate some form of OCR into the UI but is that planned? I have no idea how hard that would be nowadays but I’m guessing you would need an OCR server and it could be frustrating if it gives lots of false positives.

mnrr · August 21, 2021, 12:27am

No problem Mizu.
I understand your point of view behind mathematics and security tho I am not as good as I would like in this science, haha.
I keep what I said before, if we flex once, why not twice and so on?

The UI improvement will make simpler and understandable the registration process.
The voucher and vouchee relation its not just pressing a button and facilitating access to the platform.
Vouchers have the responsability to protect the vouchee and look for any mistake which could lead to a challenge. Such as checking if the address is well written.
Not only challengers curate the list. Vouchers should also do it.

mizu · August 21, 2021, 1:48am

That’s why I’m asking for William George’s opinion.

Because at some point, it either becomes insecure or confusing. I don’t see why we would ever want to allow more than two mistakes. I mean, if you have three or more mistakes, then you either haven’t made any effort, in which case you kind of deserve losing your deposit, or you’re dyslexic, in which case you should not be writing down the address yourself.

I totally agree, but unfortunately, there is no lack of empirical data showing that often, vouchers don’t care to check.

So I reiterate my point that implementing this proposal does no harm. And since you’re making a slippery slope argument, you’ll need to justify why you think people would want to continually weaken the standard. Since I would expect the number of challenges for address errors to go down exponentially with each allowed errors, I don’t think the community will have much appetite for going further than allowing 2 errors. Unless I messed up the math again (I edited my first post; the security is actually better than I thought), allowing for 2 errors would weaken the security of the address by at most 19 bits, so down to 141 bits, and once again there’s no way an attack on that would make any economical sense in the context of PoH even if it were possible. To put things into perspective, based on the current hash rate (128EHash/s), all existing bitcoin miners put together running for 100 years wouldn’t even be able to crack 90 bits of SHA-256, and computing a hash is a simpler operation than deriving a public key and then computing its hash, which is what an attacker would need to do here.

mnrr · August 21, 2021, 2:01am

I apologize for not communicating what I wanted to.

My point of view is much broader. Not only with mistakes in writing the addresses.

What I was getting at is if we can be flexible from one side, why not from the other?
I mean, if we allow an error in the address, someone could propose that, idk, 5cm of the face could be covered or not shown propperly, for example.
Am I clearer now?

mizu · August 21, 2021, 2:12am

Oh, I see what you mean now, but once again, I don’t think there’s an actual slippery slope here. The problem with the address is it’s really easy to miss one character and to not see it even after checking several times. When it comes to showing the full face, well you can see straight away whether it’s fully visible or not, so I see that as a quite different problem. Not to mention that being able to see the whole face is really fundamental to the whole concept of PoH and that we will probably have to ask to see more of the face (e.g. different angles) as deepfakes improve.

RoboTeddy · August 21, 2021, 4:02am

I think this is a great proposal.

I don’t think there’s a slippery-slope issue at all: this is backed up by a mathematical analysis, and future HIPs would be as well. If there were a problem with a future HIP, we could reject that one. But this HIP isn’t problematic — it just seems like an improvement!

mizu · August 22, 2021, 8:53am

Now that I think of it, it would also make sense to add a 4th error case:
4. additional character: an additional character has been inserted anywhere in the address (e.g. “abcd” → “abc0d”)

Such a case has actually occurred several times if I recall correctly, including for addresses displayed on screen, where I presume people copy-pasted their address and accidentally hit a key afterwards.

This 4th case has basically no additional security downsides since it only kicks in when there are 41 characters and in that case (and in that case only) will have a security impact of lg(41) = 5.36 bits (41 choices for which character to delete).

Here is the updated specification for clarity:

Specification

The following text will be appended at the end of the first bullet point of subsection 4. of the “List of current required/optional elements and submission rules”:

The displayed address need not contain the prefix “0x”. A single one of the following errors occurring once will be tolerated in the displayed address:

omitted character: a character is omitted from the address (e.g. “abcd” → “abd”)

mistaken character: a different character has been written in place of the one expected in that position (e.g. “abcd” → “ab9d”)

swapped adjacent characters: two characters adjacent to each other have been swapped (e.g. “abcd” → “acbd”, but not “adcb”)

additional character: an additional character has been inserted anywhere in the address (e.g. “abcd” → “abc0d”)

william · August 24, 2021, 3:51am

Your math looks good to me.

In terms of where one would draw the line, if one allows two errors than the kinds of cases @mizu considers here can interact to a greater extent having the consequence that there would be more situations where one has to add security losses together rather than taking maxima. (E.g. one error can be forgetting a character and the other error adding in an incorrect character somewhere else; so as you are then back to 40 characters a potential attacker could include modifications of this form in attack addresses that she is looking for along with addresses where a character is replaced or swapped.) So that makes the math of computing the security loss of going from allowing one error to allowing two errors somewhat more complicated, but one could expect it would be more than twice the loss going from allowing zero errors to one error. Moreover, at some number of errors the challenger’s task of checking if there is some sequence of allowable modifications between the registered address and the address in the video could become quite complicated. However, at one allowable error, the kinds of acceptable modifications to the address are still categorized enough that the challenger’s task is probably not that much harder than currently.