
If you need advice on how to train your model #5

Open
philipperemy opened this issue Jun 13, 2019 · 3 comments


@philipperemy

Ping me


Fredrum commented Feb 8, 2020

Ok! :) @philipperemy

I'm trying to replicate the web article on turning this code into an attention demonstration:
https://wanasit.github.io/attention-based-sequence-to-sequence-in-keras.html

I think I have done the right things, but when running the fit I get this error message:

ValueError: Error when checking input: expected Inp_dec to have 2 dimensions, but got array with shape (103400, 20, 65)

This is the tail end of my code:

# We are predicting the next character.
# Thus, the decoder's input is the expected output shifted by the START char
training_decoder_input = np.zeros_like(training_decoder_output)
training_decoder_input[:, 1:] = training_decoder_output[:,:-1]
training_decoder_input[:, 0] = encoding.CHAR_CODE_START

training_decoder_output = np.eye(output_dict_size)[encoded_training_output.astype('int')]
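# (np.eye(output_dict_size)[indices] one-hot encodes the integer targets,
#  giving a 3-D array of shape (num_samples, output_length, output_dict_size))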

# model.fit(x=[training_encoder_input, training_decoder_input], y=[training_decoder_output], …)
# training_decoder_input expects 2 dimensions, but got array with shape (103400, 20, 65)

print("Enc Shape: {}".format(training_encoder_input.shape))  # Enc Shape: (103400, 20)
print("Dec Shape: {}".format(training_decoder_input.shape))  # Dec Shape: (103400, 20, 65)

seq2seq_model.fit(
    x=[training_encoder_input, training_decoder_input],
    y=[training_decoder_output],
    validation_data=(
        [validation_encoder_input, validation_decoder_input], [validation_decoder_output]),
    verbose=2,
    batch_size=64,
    epochs=30)

Any ideas what I'm missing?

Cheers
Fred


wanasit commented Feb 9, 2020

Hmm. I'd need to see how you built the model to know why it expects a different input dimension.

Also, I wrote that article a while ago. Could you try following the updated version on Medium instead?
https://medium.com/@wanasit/english-to-katakana-with-sequence-to-sequence-in-tensorflow-a03a16ac19be


Fredrum commented Feb 9, 2020

Thanks so much for replying!

I modified the model.py file the way I understood it from the web page. Of course, I'm a beginner at this, so I might have missed some obvious things. This is the function that assembles the model:

(I will take a look at your updated article and see if I can spot what to do)
UPDATE: I had a look at your new article, but it didn't seem to feature the attention part, which was the specific bit I was interested in trying.

def create_model(
        input_dict_size,
        output_dict_size,
        input_length=DEFAULT_INPUT_LENGTH,
        output_length=DEFAULT_OUTPUT_LENGTH):

    encoder_input = Input(shape=(input_length,), name="Inp_enc")
    decoder_input = Input(shape=(output_length,), name="Inp_dec")

    encoder = Embedding(input_dict_size, 64, input_length=input_length, mask_zero=True)(encoder_input)
    encoder = LSTM(64, return_sequences=True)(encoder)  # WAS: False,   could use unroll=True
    encoder_last = encoder[:, -1, :]  # ATTENTION add
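    # (takes the encoder's last-timestep output to use as the decoder's initial hidden/cell state below)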

    decoder = Embedding(output_dict_size, 64, input_length=output_length, mask_zero=True)(decoder_input)
    decoder = LSTM(64, return_sequences=True)(decoder, initial_state=[encoder_last, encoder_last])  # WAS:  initial_state=[encoder, encoder]


    # Here comes Attention bits from:
    # https://wanasit.github.io/attention-based-sequence-to-sequence-in-keras.html

    attention = dot([decoder, encoder], axes=[2, 2])
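    # (dot over the hidden axis -> attention scores of shape (batch, output_length, input_length))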
    attention = Activation('softmax')(attention)

    context = dot([attention, encoder], axes=[2, 1])
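    # (attention-weighted sum of encoder outputs -> context vectors of shape (batch, output_length, 64))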
    decoder_combined_context = concatenate([context, decoder])

    # Has another weight + tanh layer as described in equation (5) of the paper
    output = TimeDistributed(Dense(64, activation="tanh"))(decoder_combined_context)  # equation (5) of the paper
    output = TimeDistributed(Dense(output_dict_size, activation="softmax"))(output)  # equation (6) of the paper


    # Final Model
    model = Model(inputs=[encoder_input, decoder_input], outputs=[output])  # Was:  outputs=[decoder]
    model.compile(optimizer='adam', loss='binary_crossentropy')

    return model
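
For reference, here is a minimal sketch of the array shapes that the Input/Embedding layers above imply (the variable names and sizes below are purely illustrative; 20 and 65 are taken from the shapes printed earlier in this thread):

import numpy as np

num_samples, output_length, output_dict_size = 8, 20, 65   # illustrative sizes

encoded_output = np.random.randint(0, output_dict_size, size=(num_samples, output_length))

# Inp_dec is Input(shape=(output_length,)) followed by an Embedding layer,
# so the decoder input stays a 2-D array of integer indices:
decoder_input = np.zeros_like(encoded_output)
decoder_input[:, 1:] = encoded_output[:, :-1]
decoder_input[:, 0] = 1  # stand-in for encoding.CHAR_CODE_START

# Only the target (y) gets one-hot encoded, which makes it 3-D:
decoder_output = np.eye(output_dict_size)[encoded_output]

print(decoder_input.shape)   # (8, 20)      <- 2-D, what Inp_dec expects
print(decoder_output.shape)  # (8, 20, 65)  <- 3-D, matches the softmax output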
