The Self-Hacking Manual: Casting Kotoamatsukami on Oneself

Disclaimer: The quit‑smoking example is used solely to illustrate the methodology and does not constitute medical advice.

Because I am not Uchiha Shisui, I cannot cast the authentic Kotoamatsukami, i.e. direct belief manipulation, on myself. However, thanks to the rise of Kagaku Ningu (Scientific Ninja Tools), namely the research of Schwartzstein & Sunderam, “Using Models to Persuade” (AER 2021), and Aina’s tailored‑stories framework, I can at least mimic the effect of Kotoamatsukami.

In this post, I work through an example adapted from Section 4.2 of Aina (2025) on tailored stories.

Setting

Suppose I am a smoking addict attempting to quit. When I make this decision, I know that in the next period I am very likely to lose willpower and give in by smoking another cigarette. My prior belief that I have strong willpower with respect to abstaining from smoking is low, say, $0.07$.

Formally, let the state space be $\Omega=\{H,L\}$, where $H$ means I have strong willpower to abstain and $L$ means I am weak, with $\mu_{0}(H)=0.07$.

The sender is the $t=0$ self, who decides to quit; the receiver is the $t=1$ self, who must implement the unpleasant task of smoking cessation. There is also a future self at $t=2$ who enjoys the health benefits of quitting without suffering, whereas the $t=1$ self endures the temptation without yet enjoying a healthier body (or mind).

The set of signal realisations is $S=\{G,B\}$. A good signal $G$ could be less coughing, feeling more comfortable during my nightly run, or simply feeling less tempted to smoke. A bad signal $B$ could be severe cravings, worse coughing, and so on, i.e. the opposites of the examples in $G$.

All of these examples are observable, consistent with the notion of a signal realisation.

The true signal-generating process is $\pi_{\text{true}}(G\mid H)=\pi_{\text{true}}(B\mid L)=0.75$. Even conditional on $H$, I might occasionally still be short of breath while running (e.g. because recovery takes time or air quality is poor). Likewise, conditional on $L$, my lifestyle might improve by chance, leading to unexpectedly smooth recovery, a mere “regression” to the mean. Thus the process remains stochastic.

A modelling complication arises: the $t=0$ self knows the true process, whereas the $t=1$ self does not (the $t=2$ self is inactive and merely benefits). I gloss over the epistemic justification; call it the privilege of age and a brain deteriorated by years of smoking.

(As you might have noticed already, the $t=1$ me, or a receiver in this framework generally, is NOT fully (Bayesian) rational: the receiver is not only unaware of the true model, but does not even have a prior over possible models; the receiver uses only a heuristic rule to choose which model to adopt.)

Aina refers to $G$ or $B$ as a “signal”; I call each an individual signal realisation. The difference is terminological only.

The action space is $A=\{SM,NSM\}$, meaning “smoke” or “not smoke”.

Although I have said the lucky inactive $t=2$ self enjoys the benefits, these are ex-post gains worth, say, $v=120$. From an expected-utility perspective, both the $t=0$ and $t=1$ selves value the discounted benefits and costs. The cost of suffering at $t=1$ is $c=5$. I exhibit present bias (Laibson 1997, eq. (3), pp. 442–443). Specifically, the $t=0$ discount-weight vector is $(1,\beta\delta,\beta\delta^{2})$, and the $t=1$ vector is $(1,1,\beta\delta)$ (or $(1,\beta\delta)$ after dropping the bygone first coordinate).

Interpretation: From the $t=0$ viewpoint, current utility is weighted by 1, the cost $c$ by $\beta\delta$, and the benefit $v$ by $\beta\delta^{2}$. The common factor $\beta$ cancels when comparing the two, so the trade-off is effectively $c$ versus $\mu_{0}(H)\,\delta v$; since $\delta$ is close to 1, cost and benefit are weighed almost symmetrically.

Similarly, from the anticipated $t=1$ perspective, the suffering cost has weight 1, whereas the future benefit has weight $\beta\delta$. The $t=0$ self therefore expects the $t=1$ self to feel the pain more acutely than the benefit and perhaps relapse.

We leave the fortunate $t=2$ self aside; he never decides.

From the $t=0$ me's perspective, taking $a=NSM$ rather than $a=SM$ earns the increment

$$U_{0}(NSM)-U_{0}(SM)=-\beta\delta c+\mu_{0}(H)\beta\delta^{2} v,$$

so the $t=0$ me will choose to quit if $U_{0}(NSM)-U_{0}(SM)>0$; the $\beta$ cancels, which is why the threshold looks like this: $\bar{\mu}_{0}:=\frac{c}{\delta v}$.

Suppose $\delta=0.97$ and $\beta=0.7$. Then the threshold for the $t=0$ me is

$$\frac{c}{\delta v}=\frac{5}{0.97\times 120}\approx 0.043.$$

From the $t=1$ me's perspective, taking $a=NSM$ rather than $a=SM$ earns the increment

$$U_{1}(NSM)-U_{1}(SM)=-c+\mu_{1}(H)\beta\delta v,$$

so the $t=1$ me will actually implement quitting if $U_{1}(NSM)-U_{1}(SM)>0$, which is why the threshold looks like this: $\bar{\mu}_{1}:=\frac{c}{\beta\delta v}$.

So for the $t=1$ me the threshold is

$$\frac{c}{\beta\delta v}=\frac{5}{0.7\times 0.97\times 120}\approx 0.061.$$

Reading the numbers, the latter threshold is harder to clear, but it is the one that matters, because it belongs to the poor boy who actually has to implement the plan: the $t=1$ me.
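The two thresholds can be verified numerically; here is a minimal sketch in Python (the variable names are mine, not from Aina):

```python
# Parameters from the example above
c, v = 5, 120            # suffering cost at t=1, health benefit at t=2
beta, delta = 0.7, 0.97  # present-bias factor and exponential discount factor

# t=0 self wants to quit if mu_0(H) > c/(delta*v);
# t=1 self implements the quit if mu_1(H) > c/(beta*delta*v).
threshold_t0 = c / (delta * v)
threshold_t1 = c / (beta * delta * v)

print(round(threshold_t0, 3))  # 0.043
print(round(threshold_t1, 3))  # 0.061
```

Since $\beta<1$, the $t=1$ threshold is strictly higher, formalising why the implementing self is harder to convince.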

Tailored Stories: the Kagaku Ningu version of Kotoamatsukami

This part is the reason I named the post the way I did; those not in the mood for Naruto references may simply focus on the methodology itself. I call this trick “Kotoamatsukami on myself”, in the spirit of Aina's tailored models.

At time 0, I prepare two narratives/models for the later me-s (me, in the plural). Why just “two” models? You can check xxx in Aina if interested.

$$m_{1}:\quad \pi_{m_{1}}(B\mid H)=0.9,\ \pi_{m_{1}}(G\mid H)=0.1;\quad \pi_{m_{1}}(B\mid L)=0.6,\ \pi_{m_{1}}(G\mid L)=0.4$$

$$m_{2}:\quad \pi_{m_{2}}(G\mid H)=0.9,\ \pi_{m_{2}}(B\mid H)=0.1;\quad \pi_{m_{2}}(G\mid L)=0.6,\ \pi_{m_{2}}(B\mid L)=0.4$$

So, if the $t=1$ me observes the signal realization $G$, the fit of model $m_{1}$ will be

$$\text{Pr}_{m_{1}}(G)=\mu_{0}(H)\pi_{m_{1}}(G\mid H)+\mu_{0}(L)\pi_{m_{1}}(G\mid L),$$

while that of model $m_{2}$ will be

$$\text{Pr}_{m_{2}}(G)=\mu_{0}(H)\pi_{m_{2}}(G\mid H)+\mu_{0}(L)\pi_{m_{2}}(G\mid L).$$

I omit the analogous algebra for $\text{Pr}_{m_{1}}(B)$ and $\text{Pr}_{m_{2}}(B)$.

Notice that the fit of model $m_{i}$ after observing a signal realization is the marginal probability of that realization computed with the likelihood from model $m_{i}$, which generally differs from the “objective” probability of that realization, because the latter is the marginal under the true likelihood, i.e. under the signal-generating process $m_{\text{true}}$.

If the $t=1$ me adopts model $m_{i}$, $i\in\{1,2\}$, after observing signal realization $s\in\{G,B\}$ (which happens when the fit of model $m_{i}$ is higher, i.e. $\text{Pr}_{m_{i}}(s)>\text{Pr}_{m_{3-i}}(s)$),

then Bayesian updating kicks in, and the belief (the posterior after observing the signal realization) becomes

$$\mu_{s}(H\mid G)=\frac{\mu_{0}(H)\pi_{m_{i}}(G\mid H)}{\text{Pr}_{m_{i}}(G)};$$

$\mu_{s}(H\mid B)$ is derived analogously.

According to Aina:

if $\mu_{0}(H)>\frac{c}{\delta v}\approx 0.043$, then the $t=0$ me would like to quit smoking;

(With the choice of $\mu_{0}(H)=0.07$, we have $0.07>0.043$. So okay, at least the $t=0$ me wants to quit smoking.)

if $\mu_{s}(H\mid G)>\frac{c}{\beta\delta v}\approx 0.061$, then the $t=1$ me would like to implement it after observing a good signal realization;

if $\mu_{s}(H\mid B)>\frac{c}{\beta\delta v}\approx 0.061$, then the $t=1$ me would like to implement it after observing a bad signal realization.

The essence of the tailored-stories device is this: the $t=0$ self presents the $t=1$ self with two alternative models $m_{1}$ and $m_{2}$ (maintaining the assumption that the $t=0$ me knows $m_{\text{true}}$ while the $t=1$ me does not) before any signal is realized. The $t=1$ me then chooses the model with the highest fit conditional on the realization it observes, while by then the $t=0$ me is only history, with no idea which realization actually occurred. Precisely this interpretation, that the past me is history and time travel is impossible, gives the $t=0$ me's presentation of models the necessary commitment power: I will no longer exist by then, so I cannot distort anything.

Now let us substitute out $\mu_{s}(H\mid G)$ and $\mu_{s}(H\mid B)$ using Bayesian updating, without yet committing to which model the $t=1$ me will actually choose, i.e. leaving the $i$ in $m_{i}$ generic:

if $\mu_{0}(H)>\frac{c}{\beta\delta v}\cdot\frac{\text{Pr}_{m_{i}}(G)}{\pi_{m_{i}}(G\mid H)}$, then the $t=1$ me would like to implement it after observing a good signal realization;

if $\mu_{0}(H)>\frac{c}{\beta\delta v}\cdot\frac{\text{Pr}_{m_{i}}(B)}{\pi_{m_{i}}(B\mid H)}$, then the $t=1$ me would like to implement it after observing a bad signal realization.
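These prior-space conditions can be checked directly in code. A small sketch, anticipating the model choices and fits computed just below (both relevant fits turn out to be $0.621$ with likelihood $0.9$ on $H$):

```python
# t=1 implements after realization s iff
#   mu0 > tau * Pr_mi(s) / pi_mi(s|H),
# where m_i is the model adopted after s (m2 after G, m1 after B).
mu0 = 0.07
tau = 5 / (0.7 * 0.97 * 120)  # c / (beta * delta * v)

bound_G = tau * 0.621 / 0.9   # Pr_m2(G) / pi_m2(G|H)
bound_B = tau * 0.621 / 0.9   # Pr_m1(B) / pi_m1(B|H)

print(mu0 > bound_G, mu0 > bound_B)  # True True
```

Note that the fit ratio here happens to shrink the threshold (the bound is about $0.042$), because the adopted model assigns the observed realization a high likelihood under $H$.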

Now let us turn to the model-selection problem of the $t=1$ self.

After observing a good signal realization,

$$\text{Pr}_{m_{1}}(G)=\mu_{0}(H)\pi_{m_{1}}(G\mid H)+\mu_{0}(L)\pi_{m_{1}}(G\mid L)=0.07\times 0.1+0.93\times 0.4=0.379;$$

$$\text{Pr}_{m_{2}}(G)=\mu_{0}(H)\pi_{m_{2}}(G\mid H)+\mu_{0}(L)\pi_{m_{2}}(G\mid L)=0.07\times 0.9+0.93\times 0.6=0.621.$$

Since $\text{Pr}_{m_{2}}(G)>\text{Pr}_{m_{1}}(G)$, the $t=1$ me will adopt model $m_{2}$, the model with the highest fit, after observing a good signal realization, obtaining $\mu_{s}(H\mid G)=\frac{0.07\times 0.9}{0.621}\approx 0.1014$.

Similarly, the $t=1$ me will adopt model $m_{1}$ after observing a bad signal realization, obtaining the posterior $\mu_{s}(H\mid B)=\frac{0.07\times 0.9}{0.621}\approx 0.1014$, since $\text{Pr}_{m_{1}}(B)=0.07\times 0.9+0.93\times 0.6=0.621$. (Of course, this coincidence was predictable from the values I assigned symmetrically to the two models.)
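The whole fit-then-update step can be reproduced in a few lines; a sketch (the dictionary encoding of the likelihoods is mine):

```python
# Tailored-stories step: the t=1 self picks the model with the higher fit
# (the marginal probability of the observed realization), then updates via Bayes.
mu0 = 0.07  # prior on H

# Likelihoods pi(s | state) for the two prepared models
m1 = {('B', 'H'): 0.9, ('G', 'H'): 0.1, ('B', 'L'): 0.6, ('G', 'L'): 0.4}
m2 = {('G', 'H'): 0.9, ('B', 'H'): 0.1, ('G', 'L'): 0.6, ('B', 'L'): 0.4}

def fit(model, s):
    """Marginal probability of realization s under the model (its 'fit')."""
    return mu0 * model[(s, 'H')] + (1 - mu0) * model[(s, 'L')]

def posterior(model, s):
    """Bayesian posterior on H after observing s, using the model's likelihood."""
    return mu0 * model[(s, 'H')] / fit(model, s)

for s in ('G', 'B'):
    chosen = m2 if fit(m2, s) > fit(m1, s) else m1
    name = 'm2' if chosen is m2 else 'm1'
    print(s, name, round(posterior(chosen, s), 4))
# G m2 0.1014
# B m1 0.1014
```

Either way, the adopted model pushes the posterior on $H$ above the prior.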

Note that the posterior belief $\mu_{s}(H\mid s)$ after observing either signal realization $s\in\{G,B\}$ is higher than the prior $\mu_{0}(H)$, seemingly contradicting Bayes plausibility (the martingale property, or splitting lemma) that we typically encounter in the Bayesian persuasion literature. In fact, that property is simply inapplicable here.

This is because a different signal realization may prompt the receiver to adopt a different model. In contrast, in Bayesian persuasion it is always the same model, the likelihoods $\{\pi(s\mid\omega)\}_{s\in S,\omega\in\Omega}$ chosen by the sender along with the sender's choice of $S$, that is used; and in terms of the signal-generating process, it is always the true model that produces the marginal of each realization, i.e. the fit.

Here, a different feasibility criterion applies instead, a Bayes-consistency condition based on a harmonic mean, which I set aside for now and return to in the Appendix.


Okay: no matter which signal realization the $t=1$ me observes, good or bad, the somewhat narcissistic posterior that “I have strong willpower to quit smoking” is always higher than the criterion $0.061$. Great, it works! Now we can be sure the $t=1$ me will actually implement the quit.

Notice that if the prior is so low that even the lower threshold $\mu_{0}(H)>\frac{c}{\delta v}$ cannot be cleared, then even the $t=0$ me would never contemplate quitting, so the Kotoamatsukami would not be initiated. If the prior is so high that it already passes the higher threshold $\mu_{0}(H)>\frac{c}{\beta\delta v}$, then the Kotoamatsukami is not needed to get started, though it can still matter once a signal arrives, since a bad realization interpreted under an unfavourable model could push the belief back below the threshold. So it is roughly when the prior lies between these two thresholds that the problem is interesting: without the Kotoamatsukami the quit attempt fails, but with it, it succeeds.

Appendix

Reverse-engineering $m_{1}$ and $m_{2}$

Given $\mu_{0}:=\mu_{0}(H)$, parametrise the two models as

$$\begin{aligned} m_{1}:\quad &\pi_{m_{1}}(B\mid H)=a, &&\pi_{m_{1}}(G\mid H)=1-a;\\ &\pi_{m_{1}}(B\mid L)=b, &&\pi_{m_{1}}(G\mid L)=1-b; \end{aligned}$$

$$\begin{aligned} m_{2}:\quad &\pi_{m_{2}}(G\mid H)=1-c, &&\pi_{m_{2}}(B\mid H)=c;\\ &\pi_{m_{2}}(G\mid L)=1-d, &&\pi_{m_{2}}(B\mid L)=d. \end{aligned}$$

What we need is to get the receiver to *voluntarily* choose $m_{1}$ after observing signal realization $B$, and $m_{2}$ after observing signal realization $G$.

The first requires that

$$\text{fit}(m_{1})(B)=\text{Pr}_{\text{marginal}}(B;m_{1})>\text{Pr}_{\text{marginal}}(B;m_{2})=\text{fit}(m_{2})(B),$$

which is

$$\mu_{0}a+(1-\mu_{0})b>\mu_{0}c+(1-\mu_{0})d,$$

i.e.

$$b-d>(c-a+b-d)\mu_{0}.$$

Similarly,

$$\text{fit}(m_{2})(G)=\text{Pr}_{\text{marginal}}(G;m_{2})>\text{Pr}_{\text{marginal}}(G;m_{1})=\text{fit}(m_{1})(G).$$

Then Bayesian updating gives

$$\mu_{s}(H\mid B;m_{1})=\frac{\mu_{0}a}{\mu_{0}a+(1-\mu_{0})b}$$

and

$$\mu_{s}(H\mid G;m_{2})=\frac{\mu_{0}(1-c)}{\mu_{0}(1-c)+(1-\mu_{0})(1-d)}.$$

Denote the criterion by $\tau$ (e.g. the $\frac{c}{\beta\delta v}$ we have seen before);

then we need both

$$\frac{\mu_{0}a}{\mu_{0}a+(1-\mu_{0})b}>\tau$$

and

$$\frac{\mu_{0}(1-c)}{\mu_{0}(1-c)+(1-\mu_{0})(1-d)}>\tau.$$
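As a sanity check, the concrete models of the main text satisfy all four constraints. A sketch, with $a=\pi_{m_{1}}(B\mid H)$, $b=\pi_{m_{1}}(B\mid L)$, and (renamed to avoid clashing with the cost $c$) $cc=\pi_{m_{2}}(B\mid H)$, $dd=\pi_{m_{2}}(B\mid L)$:

```python
# Verify the appendix constraints for the concrete models of the main text.
mu0 = 0.07
tau = 5 / (0.7 * 0.97 * 120)   # criterion c / (beta * delta * v)
a, b, cc, dd = 0.9, 0.6, 0.1, 0.4

# Model-selection constraints: m1 fits best after B, m2 fits best after G
fit_m1_B = mu0 * a + (1 - mu0) * b
fit_m2_B = mu0 * cc + (1 - mu0) * dd
fit_m1_G = mu0 * (1 - a) + (1 - mu0) * (1 - b)
fit_m2_G = mu0 * (1 - cc) + (1 - mu0) * (1 - dd)
print(fit_m1_B > fit_m2_B and fit_m2_G > fit_m1_G)  # True

# Posterior constraints: both posteriors must exceed the criterion tau
post_B = mu0 * a / fit_m1_B
post_G = mu0 * (1 - cc) / fit_m2_G
print(post_B > tau and post_G > tau)  # True
```

A reverse-engineering routine could just as well search over $(a, b, cc, dd)$ until all four inequalities hold.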

The set of feasible vectors of posteriors

When is a vector of posteriors implementable? In this example, two posteriors are needed, one for the good signal realization and one for the bad: each must exceed the threshold determined by $c$, $v$, $\beta$, and $\delta$. In other words, we need a vector of posteriors with every element sufficiently high. A similar requirement is prevalent in the Bayesian persuasion literature; for example, a posterior above $\frac{1}{2}$ might be needed in a binary setting such as the classical prosecutor–juror example, but there the martingale property (that the posteriors average back to the prior) forces some other posterior to be sufficiently low. As mentioned above, the martingale property is inapplicable here, because different realizations may prompt the receiver to adopt different models, and averaging with the respective likelihoods by no means guarantees that the posteriors average back to the prior. How, then, should we check whether a vector of posteriors is feasible? Aina's Theorem 1 gives a harmonic-mean criterion characterizing the set of feasible posterior vectors.

My vector of posteriors is $(\mu_{s_1}(H),\mu_{s_2}(H))$ and my prior is $\mu_{0}(H)$. Take $\mu_{s_1}(H)$: the movements of this posterior are $\left(\frac{\mu_{s_1}(H)}{\mu_{0}(H)},\frac{1-\mu_{s_1}(H)}{1-\mu_{0}(H)}\right)$, where the first coordinate relates to state $H$ and the second to state $L$. The maximal movement is thus $\max\left\{\frac{\mu_{s_1}(H)}{\mu_{0}(H)},\frac{1-\mu_{s_1}(H)}{1-\mu_{0}(H)}\right\}$; denote it $\text{MaxMove}(\mu_{s_1})$, the maximal movement (taken across states) of posterior $\mu_{s_1}$. Similarly we obtain $\text{MaxMove}(\mu_{s_2})$. We then take the harmonic mean of $\text{MaxMove}(\mu_{s_1})$ and $\text{MaxMove}(\mu_{s_2})$ and verify that it is no greater than the number of possible signal realizations (hence the maximal number of different stories that might be adopted), here $2$ (Good or Bad).

It just so happens, by my assignment of numbers, that the two posteriors coincide at $0.1014$. Given the prior $0.07$, the maximal movement is the larger of $\frac{0.1014}{0.07}$ and $\frac{1-0.1014}{1-0.07}$, approximately $1.45$; the harmonic mean of $1.45$ and $1.45$ is still $1.45$, which is smaller than $2$, ensuring that $(0.1014, 0.1014)$ is feasible.
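The harmonic-mean check, as described above, in code (a sketch; `statistics.harmonic_mean` is from the Python standard library):

```python
# Feasibility check via the harmonic-mean criterion described in the text:
# the harmonic mean of the posteriors' maximal movements must not exceed
# the number of signal realizations (here 2).
from statistics import harmonic_mean

mu0 = 0.07
posteriors = [0.1014, 0.1014]  # posterior on H after G and after B

def max_move(mu):
    """Maximal movement of a posterior, taken across the two states H and L."""
    return max(mu / mu0, (1 - mu) / (1 - mu0))

moves = [max_move(mu) for mu in posteriors]
hm = harmonic_mean(moves)
print(round(hm, 2), hm <= 2)  # 1.45 True
```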

What if the $t=1$ me still remembers the true model?

If the $t=0$ me anticipates that the $t=1$ me might still remember the true model, then as long as what the $t=1$ me remembers is just that there exists an additional model with $\pi(G\mid H)=\pi(B\mid L)=0.75$, without being sure that it is THE TRUE MODEL, this Kotoamatsukami trick still works; the $t=0$ me simply has to adjust the two models $m_{1}$ and $m_{2}$ to meet an additional constraint: the fit of the model adopted at the target signal realization must also exceed the fit of the true model. Here, the fit of the true model is $0.07\times 0.75+0.93\times 0.25=0.285$ for the Good realization and $0.07\times 0.25+0.93\times 0.75=0.715$ for the Bad realization. So at least $m_{1}$ has to be adjusted, since $m_{1}$ is the model targeting the Bad realization (i.e. the $t=0$ me wishes the $t=1$ me to adopt $m_{1}$ after observing $B$) and its fit of $0.621$ falls short of $0.715$, whereas $m_{2}$ needs no adjustment, its fit for $G$ of $0.621$ already exceeding $0.285$. The adjustment can be achieved, for example, by increasing $\pi_{m_{1}}(B\mid L)$ and correspondingly reducing $\pi_{m_{1}}(G\mid L)$.
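The extra fit comparisons against the true model can be verified quickly; a sketch reusing the numbers above:

```python
# If the t=1 self also entertains the true model, each tailored model must
# beat the true model's fit on its target realization.
mu0 = 0.07
pi_true = {('G', 'H'): 0.75, ('B', 'H'): 0.25, ('G', 'L'): 0.25, ('B', 'L'): 0.75}
m1      = {('G', 'H'): 0.10, ('B', 'H'): 0.90, ('G', 'L'): 0.40, ('B', 'L'): 0.60}
m2      = {('G', 'H'): 0.90, ('B', 'H'): 0.10, ('G', 'L'): 0.60, ('B', 'L'): 0.40}

def fit(model, s):
    """Marginal probability of realization s under the model."""
    return mu0 * model[(s, 'H')] + (1 - mu0) * model[(s, 'L')]

print(round(fit(pi_true, 'B'), 3), round(fit(m1, 'B'), 3))  # 0.715 0.621 -> m1 needs adjusting
print(round(fit(pi_true, 'G'), 3), round(fit(m2, 'G'), 3))  # 0.285 0.621 -> m2 is already fine
```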

However, if what the $t=1$ me remembers is not only that there is an additional model, beyond $m_{1}$ and $m_{2}$, with those particular likelihoods, but also that this model is THE TRUE MODEL, then additional justifications have to be made (or additional naivety of mine has to be imposed) to ensure that the $t=1$ me will still adopt the model with the highest fit, even if it is not the true model.

– current version: July 30.