[Phase 2] HIP-41: Allow verbal confirmation for registering users

nicobilinkis.eth · March 28, 2022, 1:55pm

HIP 41
Title: Allow verbal confirmation for registering users
Authors: avsa.eth and nicobilinkis.eth
Status: Phase 2
Created: 2022-03-28

Simple Summary

This proposal would let a user give a verbal confirmation of their public address instead of holding a sign showing it. Either visual or verbal confirmation would be accepted.

Abstract

When onboarding, the user should be able to choose a visual confirmation, a verbal confirmation, or both, of their public address during their video.

The current rules for profile submission require a sign (which can be a screen) showing the full Ethereum address. This has been a point of failure for many people. Although many efforts were made to help resolve those problems, such as HIP-27 allowing for one-character mistakes, the sign itself is just one more step to fight deep-fakes. Considering all those facts, we believe that it’s feasible to replace that sign with a verbal confirmation. Instead of showing a sign with the ETH address, the submitter would say out loud a series of words that confirm ownership of both the address and the media files.

Motivation

The advantages are clear:

Easier flow, as the user can complete the whole process in one take of the video by following simple on-screen instructions.
No props needed. The human doesn’t need to fish around for pen and paper or find a second screen, nor spend a long time slowly copying the address.
More accessible. For people with movement or vision impairments, this would make it easier to be included in the system.
DOESN’T EXCLUDE SIGNS: users who still prefer to hold signs (maybe someone who is speech impaired, or just shy) can still use visual confirmation of their address.

Implementation

BIP 039 words

The user must look at the camera and say (in one of the officially accepted languages):
"I certify that I am a real human and that I am not already in this registry. My identifier is "

This is followed by an identifier, obtained by converting the hexadecimal Ethereum address into base 2048 and then mapping those digits to words from the official BIP039 dictionary. The user must then speak the first 6 resulting words in the same language as the phrase.

Example:
The address 0x17a91203a9e9c3519c2f76210497ef7f4be2352f
Would be spoken as: Able Barrel, Debris Siege, Pretty Inquiry. (commas and spacing are optional)

The BIP039 dictionary MUST be in the same language as the rest of the phrase, so when a future HIP approves a new language, this HIP will not need to be updated. If a future HIP approves a new official Proof of Humanity language that does not have an official BIP039 dictionary yet, then the HIP that adds the new language must also define an official word dictionary for the purposes of this HIP.
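The conversion described above can be sketched roughly as follows. This is only an illustration under assumptions the proposal does not spell out: digits are taken most significant first, no padding is applied, and the function names (`address_to_base2048_digits`, `spoken_identifier`) are hypothetical. The 2048-word list is passed in by the caller, so the official BIP039 dictionary for the chosen language would have to be loaded separately.

```python
def address_to_base2048_digits(address: str) -> list[int]:
    """Convert a hex Ethereum address into base-2048 digits.

    Assumption: digits are ordered most significant first and no
    leading-zero padding is added. The proposal does not specify either.
    """
    value = int(address, 16)  # int() accepts the "0x" prefix when base is 16
    digits = []
    while value > 0:
        value, remainder = divmod(value, 2048)
        digits.append(remainder)
    digits.reverse()
    return digits or [0]


def spoken_identifier(address: str, wordlist: list[str], word_count: int = 6) -> list[str]:
    """Map the first `word_count` base-2048 digits to dictionary words.

    `wordlist` must hold exactly 2048 entries, e.g. a BIP039 dictionary
    in the same language as the spoken phrase.
    """
    if len(wordlist) != 2048:
        raise ValueError("wordlist must contain exactly 2048 words")
    digits = address_to_base2048_digits(address)
    return [wordlist[d] for d in digits[:word_count]]
```

Each base-2048 digit carries 11 bits, so a 160-bit address yields at most 15 digits, of which only the first 6 would be spoken under the current text.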

NingFid · March 29, 2022, 9:59am

How exactly can future submitters convert the Ethereum address into base 2048? This might need to be specified on the submission page should the proposal get approved.

ludovico · March 29, 2022, 3:07pm

I think that for Phase 3 we should, for the time being, keep both registration methods available and, if all goes well, phase out the “old” method via another HIP.

mizu · March 29, 2022, 11:55pm

Here’s a little analysis of the security of this proposal.

Assumptions I’ll make:

The security granted by each word is about 10 bits rather than 11 bits (log_2(2048)) due to likely mispronunciations by non-native speakers (e.g. “fee” and “feed” are likely to be hard to distinguish).
An upper bound for the cost of hashing can be determined by the hash rate of the Bitcoin network, which currently performs 221 EH/s (exahashes per second). At a price of 45000 USD, with a 6 BTC reward per block, and given Bitcoin’s 10 minute block time, that means 221 EH/s * 600 s costs 6 * 45000 USD. So 1 EH costs 6 * 45000 / (221 * 600) ≈ 2 USD.
Generating a usable address is only a matter of hashing some random numbers (and doesn’t require performing other operations orders of magnitude slower). There’s a good chance this is not true and any knowledge I might have had about ECC is long dead. Still, it’s generally good to assume attackers will become more powerful than one can imagine when designing a cryptographic system so let’s just roll with it?

1EH is almost 2^60 hashes, so under these assumptions, it can cost an adversary 2 USD to break 60 bits of security and the time would be on the order of ten seconds (with one tenth of the Bitcoin network’s hashing power lol). So if we use 6 words assumed to provide 10 bits of security each, that’s… 2USD and 10 seconds to generate a matching address. So not nearly enough to discourage an attacker from generating such an address and front-running a submitter.

Every word we add adds 10 bits of security and multiplies the cost of such an attack by 1000 (1024 to be exact). So at 7 words (~70 bits), an attack would cost around 2000USD and a few hours to perform (still not enough IMO), but at 8 words (~80 bits), it would cost around 2M USD and a few months to perform. There’s still a chance that a wealthy attacker would attempt such an attack for the fun of it but it’s unlikely to be profitable (especially considering how easy it is to detect and to remedy, by submitting a removal request). It would also only work on submissions where the submission transaction has been made with extremely low fees and is not expected to be mined for months.
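The cost arithmetic above can be reproduced with a short script. It is a sketch that uses only the figures and assumptions stated in this post (221 EH/s, 600 s blocks, 6 BTC reward, 45000 USD per BTC, 10 effective bits per word); the constant and function names are made up for illustration, and it covers cost only, not attack duration.

```python
# Figures taken from the assumptions in the post above.
NETWORK_HASHRATE_EH_PER_S = 221   # Bitcoin network hash rate, exahashes per second
BLOCK_TIME_S = 600                # 10 minute block time
BLOCK_REWARD_BTC = 6
BTC_PRICE_USD = 45_000
BITS_PER_WORD = 10                # assumed effective security per spoken word
HASHES_PER_EH = 10**18


def usd_per_exahash() -> float:
    """Cost of one exahash, priced by what one block's work earns."""
    block_value_usd = BLOCK_REWARD_BTC * BTC_PRICE_USD
    exahashes_per_block = NETWORK_HASHRATE_EH_PER_S * BLOCK_TIME_S
    return block_value_usd / exahashes_per_block


def attack_cost_usd(word_count: int) -> float:
    """Estimated cost of brute-forcing an address matching `word_count` words."""
    hashes_needed = 2 ** (BITS_PER_WORD * word_count)
    return hashes_needed / HASHES_PER_EH * usd_per_exahash()


if __name__ == "__main__":
    for words in (6, 7, 8):
        print(words, "words:", round(attack_cost_usd(words), 2), "USD")
```

Under these inputs the estimate comes out at a little over 2 USD for 6 words, a bit over 2000 USD for 7, and somewhat above 2M USD for 8, in line with the rounded figures quoted here.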

Conclusion: The number of words should be increased to 8.

EDIT: When I said the 2M$ attack was unlikely to be profitable (as opposed to no chance of being profitable) I was thinking of a case where one might acquire the PoH profile of a highly respected individual and use it to perform scams.

mizu · March 30, 2022, 1:02am

Here’s a random thought btw. Instead of the words just being a hash of the address, we could kill two birds with one stone and make them a hash of the address and a recent block (the block number used could be stored in the same json file as the rest of the submission metadata). This would solve the issue we currently have where profiles can be (re)submitted months or years after they were made due to a lack of timestamp without requiring saying or writing a separate thing. Actually, we might as well make it a hash of the whole json file while we’re at it. That way, any additional data we might want to integrate in the future would always be automatically included.
EDIT: Not possible because the json contains the video… Oops.
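The narrower version of the idea (words derived from the address plus a recent block number, rather than from the whole JSON) could look something like this. Everything concrete here is an assumption for illustration: SHA-256 as the hash, the 20-byte address followed by an 8-byte big-endian block number as the input, 8 words of 11 bits each, and the function name itself. None of it is specified in the HIP.

```python
import hashlib


def words_from_address_and_block(
    address: str, block_number: int, wordlist: list[str], word_count: int = 8
) -> list[str]:
    """Derive spoken words from hash(address || block_number).

    Assumptions (not from the HIP): SHA-256, address bytes followed by an
    8-byte big-endian block number, words read from the top bits down.
    """
    if len(wordlist) != 2048:
        raise ValueError("wordlist must contain exactly 2048 words")
    if not 1 <= word_count <= 23:  # a 256-bit digest holds 23 full 11-bit groups
        raise ValueError("word_count must be between 1 and 23")

    hex_part = address[2:] if address.lower().startswith("0x") else address
    payload = bytes.fromhex(hex_part) + block_number.to_bytes(8, "big")
    digest = int.from_bytes(hashlib.sha256(payload).digest(), "big")

    words = []
    for i in range(word_count):
        shift = 256 - 11 * (i + 1)          # take successive 11-bit groups
        words.append(wordlist[(digest >> shift) & 0x7FF])
    return words
```

A validator would recompute the words from the address and the block number stored in the submission metadata, then check that the block is recent enough relative to the submission.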

nicobilinkis.eth · March 30, 2022, 11:36am

Small update to help solve some user doubts: The phrase derived from the address would be generated automatically in the DAPP UI. Also it would be shown on every profile that chose this verification method, so that validators can still curate the list as usual.

korinektomas · March 30, 2022, 1:40pm

Although I like the general idea of simplifying the process for the registrants, I am not sure whether having two options is actually a good way of doing so. The process is already quite confusing for a lot of people, so having more options may work against the original idea (from a UX point of view)…

Also, I would like to know what the risk related to deepfakes is. I am not an expert in that area, but they are getting better and better, so a video of nothing but a human saying the words does not sound very bulletproof in the long run… I believe adding something extra to the visual content increases the complexity (=protection) a lot…

Considering these things, an idea of combining the two options popped into my mind… What about changing the process so that the sign would still have to be present, but instead of the address, the person would be expected to write down those BIP039 words? (They would be generated automatically in the UI, as @nicobilinkis.eth suggested)…

Still not sure whether this is making the process easier. But at the same time it could resolve the problems of typos in the addresses… So I leave the idea here, maybe somebody will find it useful

Mads · March 30, 2022, 2:18pm

I agree that writing down the words would be better: deep-faking a video will be more difficult when you have to handle the special case of someone holding a sign with specific words on it. More work to fake.

I believe writing words rather than the address would significantly reduce the possibility of error. The problem is that writing a string of numbers with no meaning to the writer is extremely error-prone. This is not the case when you are writing words down with a definite meaning in your own language.