Let's make the goal of this research clear.
The present NN-based poem generation is mostly based on semantic embedding. The main advantage of
this approach is that it can generate fluent sentences, since the most frequent patterns are
well represented by the model. A key problem, however, is that generation proceeds from
left to right, so a perturbation during the generation process may cause the entire generation to fail.
Such perturbations are not rare once the music rule (rhythm) is applied.
We therefore hope to do something like 'polishing'. There are several ways to do so:
1. Someone has proposed a re-iterative NN generation. I think this is fine, but it cannot really solve
the essential problem: each pass is still one-directional and the rule is always enforced,
leading to odd output. This approach therefore cannot guarantee a good generation.
2. We can also design a 'style variable' and a 'rhythm variable' for the model, so that the model knows
which style and rhythm we are generating. This keeps the generation mostly rule-compliant,
making rule-forced breakdown less likely. However, we need more data for
each style; in particular, some rhythms have little training data. Jiyuan did something in this
direction, but I'm not sure whether he did it correctly. (A conditioning sketch is given after this list.)
3. We can use a BERT model to find incorrect words. More interestingly,
BERT is trained with 'corrupted neighbors', which makes it polish well even if the
first generation is noisy. However, no rules are integrated into a full probabilistic framework, and the
polishing is not globally optimal. (A masked-LM polishing sketch is given after this list.)
4. We can use a maximum entropy (ME) model to do the above character-by-character polishing. This
model can encode the rules as features, so it can take both semantics and rules into account in a full
probabilistic framework. A BERT + ME combination is also fine. A key advantage is that we can re-sample many
rule-compliant poems; if the sampling were infinite, it would cover the entire poem space. (A re-sampling
sketch is given after this list.)
5. Another way is AE/VAE polishing: treat the poem as a picture and use a de-noising model,
as in image processing, to find the polished version. Noisy training is fine. This is like a global
BERT. Again, though, rules are not easy to add; it is fully data driven. (A de-noising autoencoder
sketch is given after this list.)
6. So we want a framework that (1) takes both semantics and music rules into account and (2) performs
global inference. One proposal is to use an NN (or another LM, but an NN makes it easier to
involve style information) for the semantic embedding, while the rules (related to music) are encoded as
knowledge (designed features). These features are then used in a chain CRF framework. Note that we probably
do not need a 2D CRF. The only concern is whether a vocabulary of about 3k characters can be learned and
inferred efficiently by a CRF. (A chain CRF decoding sketch is given after this list.)
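
Below are rough code sketches for options 2-6. They are minimal illustrations only: every model name,
size, threshold, and feature function in them is an assumption made for the example, not a decided design.

For option 2, a minimal sketch of injecting a style variable and a rhythm variable into the generator
(PyTorch, toy sizes):

import torch
import torch.nn as nn

class ConditionedDecoder(nn.Module):
    def __init__(self, vocab_size=3000, n_styles=4, n_rhythms=8,
                 emb_dim=256, hidden_dim=512):
        super().__init__()
        self.char_emb = nn.Embedding(vocab_size, emb_dim)
        self.style_emb = nn.Embedding(n_styles, emb_dim)    # which poetic style
        self.rhythm_emb = nn.Embedding(n_rhythms, emb_dim)  # which rhythm/tune pattern
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, chars, style_id, rhythm_id):
        # chars: (batch, seq); style_id, rhythm_id: (batch,)
        x = self.char_emb(chars)
        cond = (self.style_emb(style_id) + self.rhythm_emb(rhythm_id)).unsqueeze(1)
        x = x + cond                      # inject style/rhythm at every step
        h, _ = self.rnn(x)
        return self.out(h)                # next-character logits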
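
For option 3, a sketch of masked-LM polishing with BERT; it greedily replaces characters the model
finds improbable, which also shows why the result is local rather than globally optimal. The
'bert-base-chinese' checkpoint and the probability threshold are illustrative choices:

import torch
from transformers import BertTokenizer, BertForMaskedLM

tok = BertTokenizer.from_pretrained("bert-base-chinese")
mlm = BertForMaskedLM.from_pretrained("bert-base-chinese").eval()

def polish_once(line, threshold=0.01):
    ids = tok(line, return_tensors="pt")["input_ids"][0]
    polished = ids.clone()
    for i in range(1, len(ids) - 1):             # skip [CLS] / [SEP]
        masked = ids.clone()
        masked[i] = tok.mask_token_id
        with torch.no_grad():
            logits = mlm(masked.unsqueeze(0)).logits[0, i]
        probs = logits.softmax(-1)
        if probs[ids[i]] < threshold:             # original char looks 'incorrect'
            polished[i] = probs.argmax()          # greedy local fix, not globally optimal
    return tok.decode(polished[1:-1]).replace(" ", "")

print(polish_once("床前明月光"))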
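
For option 4, a sketch of a log-linear (ME-style) polisher that mixes a semantic score with rule
features and re-samples the poem position by position (Gibbs-style), so many rule-compliant variants
can be drawn. The semantic score and the rule feature below are placeholders; the real model would
query the NN/BERT and the actual rhythm rules:

import numpy as np

rng = np.random.default_rng(0)
VOCAB = list("月光山水风花雪夜霜天")               # toy character vocabulary

def semantic_score(char, poem, pos):
    # stand-in for log p_LM(char | context)
    return 0.0

def tone_feature(char, pos):
    # stand-in rule feature: 1.0 if the tone pattern at this slot is satisfied
    return 1.0 if (hash(char) + pos) % 2 == 0 else 0.0

def resample_position(poem, pos, w_sem=1.0, w_rule=2.0, temperature=1.0):
    scores = np.array([w_sem * semantic_score(c, poem, pos) +
                       w_rule * tone_feature(c, pos) for c in VOCAB])
    probs = np.exp(scores / temperature)
    probs /= probs.sum()
    poem[pos] = VOCAB[rng.choice(len(VOCAB), p=probs)]

def gibbs_polish(poem, sweeps=10):
    poem = list(poem)
    for _ in range(sweeps):                       # repeated sweeps ~ re-sampling many poems
        for pos in range(len(poem)):
            resample_position(poem, pos)
    return "".join(poem)

print(gibbs_polish("月光山水风"))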
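
For option 5, a sketch of a de-noising autoencoder that treats the whole poem as one grid and is
trained to reconstruct it from a corrupted copy; the shapes and the noise rate are assumptions:

import torch
import torch.nn as nn

VOCAB, LINES, CHARS = 3000, 4, 7                  # e.g. a 4-line, 7-character poem

class DenoisingAE(nn.Module):
    def __init__(self, emb_dim=128, hidden=512):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, emb_dim)
        self.enc = nn.Linear(LINES * CHARS * emb_dim, hidden)
        self.dec = nn.Linear(hidden, LINES * CHARS * VOCAB)

    def forward(self, poem_ids):                  # (batch, LINES*CHARS)
        x = self.emb(poem_ids).flatten(1)
        h = torch.relu(self.enc(x))
        return self.dec(h).view(-1, LINES * CHARS, VOCAB)   # per-slot character logits

def corrupt(poem_ids, rate=0.2):
    # randomly replace characters, mimicking the noisy first-pass generation
    noise = torch.randint_like(poem_ids, VOCAB)
    mask = torch.rand(poem_ids.shape) < rate
    return torch.where(mask, noise, poem_ids)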
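
For option 6, a sketch of chain-CRF decoding where node scores come from the NN semantic model and
edge scores come from designed rule features. To address the 3k-character concern, each position is
restricted to a short NN-proposed candidate list, so Viterbi runs over K candidates per slot rather
than the full vocabulary (this restriction is an assumption, not part of the note):

import numpy as np

def rule_edge_score(prev_char, char, pos):
    # stand-in for designed rule features (tone alternation, rhyme at line ends, ...)
    return 0.0

def viterbi_polish(candidates, nn_log_probs, w_rule=1.0):
    """candidates: list over positions, each a list of K candidate chars.
       nn_log_probs: same shape, semantic log-probability of each candidate."""
    T = len(candidates)
    score = [np.array(nn_log_probs[0], dtype=float)]
    back = []
    for t in range(1, T):
        s = np.full(len(candidates[t]), -np.inf)
        b = np.zeros(len(candidates[t]), dtype=int)
        for j, cj in enumerate(candidates[t]):
            trans = [score[t - 1][i] + w_rule * rule_edge_score(ci, cj, t)
                     for i, ci in enumerate(candidates[t - 1])]
            b[j] = int(np.argmax(trans))
            s[j] = trans[b[j]] + nn_log_probs[t][j]
        score.append(s)
        back.append(b)
    # backtrack the globally best rule- and semantics-aware sequence
    best = [int(np.argmax(score[-1]))]
    for b in reversed(back):
        best.append(int(b[best[-1]]))
    best.reverse()
    return [candidates[t][best[t]] for t in range(T)]

# toy usage: 2 NN-proposed candidates per position with made-up log-probabilities
cands = [["月", "日"], ["光", "色"], ["寒", "明"]]
logps = [[-0.1, -1.2], [-0.3, -0.5], [-0.9, -0.2]]
print(viterbi_polish(cands, logps))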