The Self-Hacking Manual: Casting Kotoamatsukami on Oneself

Disclaimer: The quit‑smoking example is used solely to illustrate the methodology and does not constitute medical advice.

Because I am not Uchiha Shisui, I cannot cast the authentic Kotoamatsukami, i.e. direct belief manipulation, on myself. However, thanks to the rise of Kagaku Ningu (Scientific Ninja Tools), namely the research of Schwartzstein & Sunderam, “Using Models to Persuade” (AER 2021), and Aina’s tailored‑stories framework, I can at least mimic the effect of Kotoamatsukami.

In this post, I work through an example adapted from Section 4.2 of Aina (2025) on tailored stories.

Setting

Suppose I am a smoking addict attempting to quit. When I make this decision, I know that in the next period I am very likely to lose willpower and give in by smoking another cigarette. My prior belief that I have strong willpower with respect to abstaining from smoking is low, say, $0.07$.

Formally, let the state space be $\Omega=\{H,L\}$, where $H$ means I have strong willpower to abstain and $L$ means I am weak, with $\mu_{0}(H)=0.07$.

The sender is the $t=0$ self, who decides to quit; the receiver is the $t=1$ self, who must implement the unpleasant task of smoking cessation. There is also a future self at $t=2$ who enjoys the health benefits of quitting without suffering, whereas the $t=1$ self endures the temptation without yet enjoying a healthier body (or mind).

The set of signal realisations is $S=\{G,B\}$. A good signal $G$ could be less coughing, feeling more comfortable during my nightly run, or simply feeling less tempted to smoke. A bad signal $B$ could be severe cravings, worse coughing, and so on, i.e. the opposites of the examples in $G$.

All of these examples are observable, consistent with the notion of a signal realisation.

The true signal-generating process is $\pi_{\text{true}}(G\mid H)=\pi_{\text{true}}(B\mid L)=0.75$. Even conditional on $H$, I might occasionally still be short of breath while running (e.g. because recovery takes time or air quality is poor). Likewise, conditional on $L$, my lifestyle might improve by chance, leading to unexpectedly smooth recovery, a mere “regression” to the mean. Thus the process remains stochastic.

A modelling complication arises: the $t=0$ self knows the true process, whereas the $t=1$ self does not (the $t=2$ self is inactive and merely benefits). I gloss over the epistemic justification; call it the privilege of age and a brain deteriorated by years of smoking.

(As you might have noticed already, the $t=1$ me, or a receiver in this framework generally, is NOT fully (Bayesian) rational: the receiver is not only unaware of the true model, but does not even have a prior over possible models; the receiver uses only a heuristic rule to choose which model to adopt.)

Aina refers to $G$ or $B$ as a “signal”; I call each an individual signal realisation. The difference is terminological only.

The action space is $A=\{SM,NSM\}$, meaning “smoke” or “not smoke”.

Although I have said the lucky inactive $t=2$ self enjoys the benefits, these are ex-post gains worth, say, $v=120$. From an expected-utility perspective, both the $t=0$ and $t=1$ selves value the discounted benefits and costs. The cost of suffering at $t=1$ is $c=5$. I exhibit present bias (Laibson 1997, eq. (3), pp. 442–443). Specifically, the $t=0$ discount-weight vector is $(1,\beta\delta,\beta\delta^{2})$, and the $t=1$ vector is $(1,1,\beta\delta)$ (or $(1,\beta\delta)$ after dropping the bygone first coordinate).

Interpretation: From the $t=0$ viewpoint, current utility is weighted by 1, the cost $c$ by $\beta\delta$, and the benefit $v$ by $\beta\delta^{2}$. The common factor $\beta$ cancels when comparing the two, so the trade-off is effectively $c$ versus $\mu_{0}(H)\,\delta v$; since $\delta$ is close to 1, cost and benefit are weighed almost symmetrically.

Similarly, from the anticipated $t=1$ perspective, the suffering cost has weight 1, whereas the future benefit has weight $\beta\delta$. The $t=0$ self therefore expects the $t=1$ self to feel the pain more acutely than the benefit and perhaps relapse.

We leave the fortunate $t=2$ self aside; he never decides.

From the $t=0$ me's perspective, taking $a=NSM$ rather than $a=SM$ earns the increment

$$U_{0}(NSM)-U_{0}(SM)=-\beta\delta c+\mu_{0}(H)\beta\delta^{2} v,$$

so the $t=0$ me will choose to quit if $U_{0}(NSM)-U_{0}(SM)>0$; the $\beta$ cancels, which is why the threshold looks like this: $\bar{\mu}_{0}:=\frac{c}{\delta v}$.

Suppose $\delta=0.97$ and $\beta=0.7$. Then the threshold for the $t=0$ me is

$$\frac{c}{\delta v}=\frac{5}{0.97\times 120}\approx 0.043.$$

From the $t=1$ me's perspective, taking $a=NSM$ rather than $a=SM$ earns the increment

$$U_{1}(NSM)-U_{1}(SM)=-c+\mu_{1}(H)\beta\delta v,$$

so the $t=1$ me will actually implement quitting if $U_{1}(NSM)-U_{1}(SM)>0$, which is why the threshold looks like this: $\bar{\mu}_{1}:=\frac{c}{\beta\delta v}$.

So for the $t=1$ me the threshold is

$$\frac{c}{\beta\delta v}=\frac{5}{0.7\times 0.97\times 120}\approx 0.061.$$

Reading the numbers, the latter threshold is harder to clear, but it is the one that matters, because it belongs to the poor boy who actually has to implement the plan: the $t=1$ me.
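The two thresholds can be verified numerically; here is a minimal sketch in Python (the variable names are mine, not from Aina):

```python
# Parameters from the example above
c, v = 5, 120            # suffering cost at t=1, health benefit at t=2
beta, delta = 0.7, 0.97  # present-bias factor and exponential discount factor

# t=0 self wants to quit if mu_0(H) > c/(delta*v);
# t=1 self implements the quit if mu_1(H) > c/(beta*delta*v).
threshold_t0 = c / (delta * v)
threshold_t1 = c / (beta * delta * v)

print(round(threshold_t0, 3))  # 0.043
print(round(threshold_t1, 3))  # 0.061
```

Since $\beta<1$, the $t=1$ threshold is strictly higher, formalising why the implementing self is harder to convince.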

Tailored Stories: the Kagaku Ningu version of Kotoamatsukami

This part is the reason I named the post the way I did; those not in the mood for Naruto references may simply focus on the methodology itself. I call this trick “Kotoamatsukami on myself”, in the spirit of Aina's tailored models.

At time 0, I prepare two narratives/models for the later me-s (me, in the plural). Why just “two” models? You can check xxx in Aina if interested.

$$m_{1}:\quad \pi_{m_{1}}(B\mid H)=0.9,\ \pi_{m_{1}}(G\mid H)=0.1;\quad \pi_{m_{1}}(B\mid L)=0.6,\ \pi_{m_{1}}(G\mid L)=0.4$$

$$m_{2}:\quad \pi_{m_{2}}(G\mid H)=0.9,\ \pi_{m_{2}}(B\mid H)=0.1;\quad \pi_{m_{2}}(G\mid L)=0.6,\ \pi_{m_{2}}(B\mid L)=0.4$$

So, if the $t=1$ me observes the signal realization $G$, the fit of model $m_{1}$ will be

$$\text{Pr}_{m_{1}}(G)=\mu_{0}(H)\pi_{m_{1}}(G\mid H)+\mu_{0}(L)\pi_{m_{1}}(G\mid L),$$

while that of model $m_{2}$ will be

$$\text{Pr}_{m_{2}}(G)=\mu_{0}(H)\pi_{m_{2}}(G\mid H)+\mu_{0}(L)\pi_{m_{2}}(G\mid L).$$

I omit the analogous algebra for $\text{Pr}_{m_{1}}(B)$ and $\text{Pr}_{m_{2}}(B)$.

Notice that the fit of model $m_{i}$ after observing a signal realization is the marginal probability of that realization computed with the likelihood from model $m_{i}$, which generally differs from the “objective” probability of that realization, because the latter is the marginal under the true likelihood, i.e. under the signal-generating process $m_{\text{true}}$.

If the $t=1$ me adopts model $m_{i}$, $i\in\{1,2\}$, after observing signal realization $s\in\{G,B\}$ (which happens when the fit of model $m_{i}$ is higher, i.e. $\text{Pr}_{m_{i}}(s)>\text{Pr}_{m_{3-i}}(s)$),

then Bayesian updating kicks in, and the belief (the posterior after observing the signal realization) becomes

$$\mu_{s}(H\mid G)=\frac{\mu_{0}(H)\pi_{m_{i}}(G\mid H)}{\text{Pr}_{m_{i}}(G)};$$

$\mu_{s}(H\mid B)$ is derived analogously.

According to Aina:

if $\mu_{0}(H)>\frac{c}{\delta v}\approx 0.043$, then the $t=0$ me would like to quit smoking;

(With the choice of $\mu_{0}(H)=0.07$, we have $0.07>0.043$. So okay, at least the $t=0$ me wants to quit smoking.)

if $\mu_{s}(H\mid G)>\frac{c}{\beta\delta v}\approx 0.061$, then the $t=1$ me would like to implement it after observing a good signal realization;

if $\mu_{s}(H\mid B)>\frac{c}{\beta\delta v}\approx 0.061$, then the $t=1$ me would like to implement it after observing a bad signal realization.

The essence of the tailored-stories device is this: the $t=0$ self presents the $t=1$ self with two alternative models $m_{1}$ and $m_{2}$ (maintaining the assumption that the $t=0$ me knows $m_{\text{true}}$ while the $t=1$ me does not) before any signal is realized. The $t=1$ me then chooses the model with the highest fit conditional on the realization it observes, while by then the $t=0$ me is only history, with no idea which realization actually occurred. Precisely this interpretation, that the past me is history and time travel is impossible, gives the $t=0$ me's presentation of models the necessary commitment power: I will no longer exist by then, so I cannot distort anything.

Now let us substitute out $\mu_{s}(H\mid G)$ and $\mu_{s}(H\mid B)$ using Bayesian updating, without yet committing to which model the $t=1$ me will actually choose, i.e. leaving the $i$ in $m_{i}$ generic:

if $\mu_{0}(H)>\frac{c}{\beta\delta v}\cdot\frac{\text{Pr}_{m_{i}}(G)}{\pi_{m_{i}}(G\mid H)}$, then the $t=1$ me would like to implement it after observing a good signal realization;

if $\mu_{0}(H)>\frac{c}{\beta\delta v}\cdot\frac{\text{Pr}_{m_{i}}(B)}{\pi_{m_{i}}(B\mid H)}$, then the $t=1$ me would like to implement it after observing a bad signal realization.
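These prior-space conditions can be checked directly in code. A small sketch, anticipating the model choices and fits computed just below (both relevant fits turn out to be $0.621$ with likelihood $0.9$ on $H$):

```python
# t=1 implements after realization s iff
#   mu0 > tau * Pr_mi(s) / pi_mi(s|H),
# where m_i is the model adopted after s (m2 after G, m1 after B).
mu0 = 0.07
tau = 5 / (0.7 * 0.97 * 120)  # c / (beta * delta * v)

bound_G = tau * 0.621 / 0.9   # Pr_m2(G) / pi_m2(G|H)
bound_B = tau * 0.621 / 0.9   # Pr_m1(B) / pi_m1(B|H)

print(mu0 > bound_G, mu0 > bound_B)  # True True
```

Note that the fit ratio here happens to shrink the threshold (the bound is about $0.042$), because the adopted model assigns the observed realization a high likelihood under $H$.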

Now let us turn to the model-selection problem of the $t=1$ self.

After observing a good signal realization,

$$\text{Pr}_{m_{1}}(G)=\mu_{0}(H)\pi_{m_{1}}(G\mid H)+\mu_{0}(L)\pi_{m_{1}}(G\mid L)=0.07\times 0.1+0.93\times 0.4=0.379;$$

$$\text{Pr}_{m_{2}}(G)=\mu_{0}(H)\pi_{m_{2}}(G\mid H)+\mu_{0}(L)\pi_{m_{2}}(G\mid L)=0.07\times 0.9+0.93\times 0.6=0.621.$$

Since $\text{Pr}_{m_{2}}(G)>\text{Pr}_{m_{1}}(G)$, the $t=1$ me will adopt model $m_{2}$, the model with the highest fit, after observing a good signal realization, obtaining $\mu_{s}(H\mid G)=\frac{0.07\times 0.9}{0.621}\approx 0.1014$.

Similarly, the $t=1$ me will adopt model $m_{1}$ after observing a bad signal realization, obtaining the posterior $\mu_{s}(H\mid B)=\frac{0.07\times 0.9}{0.621}\approx 0.1014$, since $\text{Pr}_{m_{1}}(B)=0.07\times 0.9+0.93\times 0.6=0.621$. (Of course, this coincidence was predictable from the values I assigned symmetrically to the two models.)
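The whole fit-then-update step can be reproduced in a few lines; a sketch (the dictionary encoding of the likelihoods is mine):

```python
# Tailored-stories step: the t=1 self picks the model with the higher fit
# (the marginal probability of the observed realization), then updates via Bayes.
mu0 = 0.07  # prior on H

# Likelihoods pi(s | state) for the two prepared models
m1 = {('B', 'H'): 0.9, ('G', 'H'): 0.1, ('B', 'L'): 0.6, ('G', 'L'): 0.4}
m2 = {('G', 'H'): 0.9, ('B', 'H'): 0.1, ('G', 'L'): 0.6, ('B', 'L'): 0.4}

def fit(model, s):
    """Marginal probability of realization s under the model (its 'fit')."""
    return mu0 * model[(s, 'H')] + (1 - mu0) * model[(s, 'L')]

def posterior(model, s):
    """Bayesian posterior on H after observing s, using the model's likelihood."""
    return mu0 * model[(s, 'H')] / fit(model, s)

for s in ('G', 'B'):
    chosen = m2 if fit(m2, s) > fit(m1, s) else m1
    name = 'm2' if chosen is m2 else 'm1'
    print(s, name, round(posterior(chosen, s), 4))
# G m2 0.1014
# B m1 0.1014
```

Either way, the adopted model pushes the posterior on $H$ above the prior.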

Note that the posterior belief $\mu_{s}(H\mid s)$ after observing either signal realization $s\in\{G,B\}$ is higher than the prior $\mu_{0}(H)$, seemingly contradicting Bayes plausibility (the martingale property, or splitting lemma) that we typically encounter in the Bayesian persuasion literature. In fact, that property is simply inapplicable here.

This is because a different signal realization may prompt the receiver to adopt a different model. In contrast, in Bayesian persuasion it is always the same model, the likelihoods $\{\pi(s\mid\omega)\}_{s\in S,\omega\in\Omega}$ chosen by the sender along with the sender's choice of $S$, that is used; and in terms of the signal-generating process, it is always the true model that produces the marginal of each realization, i.e. the fit.

Here, a different feasibility criterion applies instead, a Bayes-consistency condition based on a harmonic mean, which I set aside for now and return to in the Appendix.


Okay: no matter which signal realization the $t=1$ me observes, good or bad, the somewhat narcissistic posterior that “I have strong willpower to quit smoking” is always higher than the criterion $0.061$. Great, it works! Now we can be sure the $t=1$ me will actually implement the quit.

Notice that if the prior is so low that even the lower threshold $\mu_{0}(H)>\frac{c}{\delta v}$ cannot be cleared, then even the $t=0$ me would never contemplate quitting, so the Kotoamatsukami would not be initiated. If the prior is so high that it already passes the higher threshold $\mu_{0}(H)>\frac{c}{\beta\delta v}$, then the Kotoamatsukami is not needed to get started, though it can still matter once a signal arrives, since a bad realization interpreted under an unfavourable model could push the belief back below the threshold. So it is roughly when the prior lies between these two thresholds that the problem is interesting: without the Kotoamatsukami the quit attempt fails, but with it, it succeeds.

Appendix

Reverse-engineering $m_{1}$ and $m_{2}$

Given $\mu_{0}:=\mu_{0}(H)$, parametrise the two models as

$$\begin{aligned} m_{1}:\quad &\pi_{m_{1}}(B\mid H)=a, &&\pi_{m_{1}}(G\mid H)=1-a;\\ &\pi_{m_{1}}(B\mid L)=b, &&\pi_{m_{1}}(G\mid L)=1-b; \end{aligned}$$

$$\begin{aligned} m_{2}:\quad &\pi_{m_{2}}(G\mid H)=1-c, &&\pi_{m_{2}}(B\mid H)=c;\\ &\pi_{m_{2}}(G\mid L)=1-d, &&\pi_{m_{2}}(B\mid L)=d. \end{aligned}$$

What we need is to get the receiver to *voluntarily* choose $m_{1}$ after observing signal realization $B$, and $m_{2}$ after observing signal realization $G$.

The first requires that

$$\text{fit}(m_{1})(B)=\text{Pr}_{\text{marginal}}(B;m_{1})>\text{Pr}_{\text{marginal}}(B;m_{2})=\text{fit}(m_{2})(B),$$

which is

$$\mu_{0}a+(1-\mu_{0})b>\mu_{0}c+(1-\mu_{0})d,$$

i.e.

$$b-d>(c-a+b-d)\mu_{0}.$$

Similarly,

$$\text{fit}(m_{2})(G)=\text{Pr}_{\text{marginal}}(G;m_{2})>\text{Pr}_{\text{marginal}}(G;m_{1})=\text{fit}(m_{1})(G).$$

Then Bayesian updating gives

$$\mu_{s}(H\mid B;m_{1})=\frac{\mu_{0}a}{\mu_{0}a+(1-\mu_{0})b}$$

and

$$\mu_{s}(H\mid G;m_{2})=\frac{\mu_{0}(1-c)}{\mu_{0}(1-c)+(1-\mu_{0})(1-d)}.$$

Denote the criterion by $\tau$ (e.g. the $\frac{c}{\beta\delta v}$ we have seen before);

then we need both

$$\frac{\mu_{0}a}{\mu_{0}a+(1-\mu_{0})b}>\tau$$

and

$$\frac{\mu_{0}(1-c)}{\mu_{0}(1-c)+(1-\mu_{0})(1-d)}>\tau.$$
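As a sanity check, the concrete models of the main text satisfy all four constraints. A sketch, with $a=\pi_{m_{1}}(B\mid H)$, $b=\pi_{m_{1}}(B\mid L)$, and (renamed to avoid clashing with the cost $c$) $cc=\pi_{m_{2}}(B\mid H)$, $dd=\pi_{m_{2}}(B\mid L)$:

```python
# Verify the appendix constraints for the concrete models of the main text.
mu0 = 0.07
tau = 5 / (0.7 * 0.97 * 120)   # criterion c / (beta * delta * v)
a, b, cc, dd = 0.9, 0.6, 0.1, 0.4

# Model-selection constraints: m1 fits best after B, m2 fits best after G
fit_m1_B = mu0 * a + (1 - mu0) * b
fit_m2_B = mu0 * cc + (1 - mu0) * dd
fit_m1_G = mu0 * (1 - a) + (1 - mu0) * (1 - b)
fit_m2_G = mu0 * (1 - cc) + (1 - mu0) * (1 - dd)
print(fit_m1_B > fit_m2_B and fit_m2_G > fit_m1_G)  # True

# Posterior constraints: both posteriors must exceed the criterion tau
post_B = mu0 * a / fit_m1_B
post_G = mu0 * (1 - cc) / fit_m2_G
print(post_B > tau and post_G > tau)  # True
```

A reverse-engineering routine could just as well search over $(a, b, cc, dd)$ until all four inequalities hold.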

The set of feasible vectors of posteriors

When is a vector of posteriors implementable? In this example, two posteriors are needed, one for the good signal realization and one for the bad: each must exceed the threshold determined by $c$, $v$, $\beta$, and $\delta$. In other words, we need a vector of posteriors with every element sufficiently high. A similar requirement is prevalent in the Bayesian persuasion literature; for example, a posterior above $\frac{1}{2}$ might be needed in a binary setting such as the classical prosecutor–juror example, but there the martingale property (that the posteriors average back to the prior) forces some other posterior to be sufficiently low. As mentioned above, the martingale property is inapplicable here, because different realizations may prompt the receiver to adopt different models, and averaging with the respective likelihoods by no means guarantees that the posteriors average back to the prior. How, then, should we check whether a vector of posteriors is feasible? Aina's Theorem 1 gives a harmonic-mean criterion characterizing the set of feasible posterior vectors.

My vector of posteriors is $(\mu_{s_1}(H),\mu_{s_2}(H))$ and my prior is $\mu_{0}(H)$. Take $\mu_{s_1}(H)$: the movements of this posterior are $\left(\frac{\mu_{s_1}(H)}{\mu_{0}(H)},\frac{1-\mu_{s_1}(H)}{1-\mu_{0}(H)}\right)$, where the first coordinate relates to state $H$ and the second to state $L$. The maximal movement is thus $\max\left\{\frac{\mu_{s_1}(H)}{\mu_{0}(H)},\frac{1-\mu_{s_1}(H)}{1-\mu_{0}(H)}\right\}$; denote it $\text{MaxMove}(\mu_{s_1})$, the maximal movement (taken across states) of posterior $\mu_{s_1}$. Similarly we obtain $\text{MaxMove}(\mu_{s_2})$. We then take the harmonic mean of $\text{MaxMove}(\mu_{s_1})$ and $\text{MaxMove}(\mu_{s_2})$ and verify that it is no greater than the number of possible signal realizations (hence the maximal number of different stories that might be adopted), here $2$ (Good or Bad).

It just so happens, by my assignment of numbers, that the two posteriors coincide at $0.1014$. Given the prior $0.07$, the maximal movement is the larger of $\frac{0.1014}{0.07}$ and $\frac{1-0.1014}{1-0.07}$, approximately $1.45$; the harmonic mean of $1.45$ and $1.45$ is still $1.45$, which is smaller than $2$, ensuring that $(0.1014, 0.1014)$ is feasible.
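The harmonic-mean check, as described above, in code (a sketch; `statistics.harmonic_mean` is from the Python standard library):

```python
# Feasibility check via the harmonic-mean criterion described in the text:
# the harmonic mean of the posteriors' maximal movements must not exceed
# the number of signal realizations (here 2).
from statistics import harmonic_mean

mu0 = 0.07
posteriors = [0.1014, 0.1014]  # posterior on H after G and after B

def max_move(mu):
    """Maximal movement of a posterior, taken across the two states H and L."""
    return max(mu / mu0, (1 - mu) / (1 - mu0))

moves = [max_move(mu) for mu in posteriors]
hm = harmonic_mean(moves)
print(round(hm, 2), hm <= 2)  # 1.45 True
```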

What if the $t=1$ me still remembers the true model?

If the $t=0$ me anticipates that the $t=1$ me might still remember the true model, then as long as what the $t=1$ me remembers is just that there exists an additional model with $\pi(G\mid H)=\pi(B\mid L)=0.75$, without being sure that it is THE TRUE MODEL, this Kotoamatsukami trick still works; the $t=0$ me simply has to adjust the two models $m_{1}$ and $m_{2}$ to meet an additional constraint: the fit of the model adopted at the target signal realization must also exceed the fit of the true model. Here, the fit of the true model is $0.07\times 0.75+0.93\times 0.25=0.285$ for the Good realization and $0.07\times 0.25+0.93\times 0.75=0.715$ for the Bad realization. So at least $m_{1}$ has to be adjusted, since $m_{1}$ is the model targeting the Bad realization (i.e. the $t=0$ me wishes the $t=1$ me to adopt $m_{1}$ after observing $B$) and its fit of $0.621$ falls short of $0.715$, whereas $m_{2}$ needs no adjustment, its fit for $G$ of $0.621$ already exceeding $0.285$. The adjustment can be achieved, for example, by increasing $\pi_{m_{1}}(B\mid L)$ and correspondingly reducing $\pi_{m_{1}}(G\mid L)$.
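The extra fit comparisons against the true model can be verified quickly; a sketch reusing the numbers above:

```python
# If the t=1 self also entertains the true model, each tailored model must
# beat the true model's fit on its target realization.
mu0 = 0.07
pi_true = {('G', 'H'): 0.75, ('B', 'H'): 0.25, ('G', 'L'): 0.25, ('B', 'L'): 0.75}
m1      = {('G', 'H'): 0.10, ('B', 'H'): 0.90, ('G', 'L'): 0.40, ('B', 'L'): 0.60}
m2      = {('G', 'H'): 0.90, ('B', 'H'): 0.10, ('G', 'L'): 0.60, ('B', 'L'): 0.40}

def fit(model, s):
    """Marginal probability of realization s under the model."""
    return mu0 * model[(s, 'H')] + (1 - mu0) * model[(s, 'L')]

print(round(fit(pi_true, 'B'), 3), round(fit(m1, 'B'), 3))  # 0.715 0.621 -> m1 needs adjusting
print(round(fit(pi_true, 'G'), 3), round(fit(m2, 'G'), 3))  # 0.285 0.621 -> m2 is already fine
```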

However, if what the $t=1$ me remembers is not only that there is an additional model, beyond $m_{1}$ and $m_{2}$, with those particular likelihoods, but also that this model is THE TRUE MODEL, then additional justifications have to be made (or additional naivety of mine has to be imposed) to ensure that the $t=1$ me will still adopt the model with the highest fit, even if it is not the true model.

– current version: July 30.