http://dirko.github.io/Bidirectional-LSTMs-with-Keras/
http://machinelearningmastery.com/understanding-stateful-lstm-recurrent-neural-networks-python-keras/
https://www.reddit.com/r/MachineLearning/comments/3upodo/using_keras_lstm_rnn_for_variable_length_sequence/

Per-timestep classification model (legacy Keras 0.x API: TimeDistributedDense, nb_epoch, show_accuracy):

from keras.models import Sequential
from keras.layers.core import Masking, TimeDistributedDense, Activation
from keras.layers.recurrent import LSTM

model = Sequential()
# Mask all-zero (padded) timesteps: 260 timesteps of 100-dim features per sample
model.add(Masking(0, input_shape=(260, 100)))
model.add(LSTM(input_dim=100, output_dim=128, return_sequences=True))
# Per-timestep dense layer + softmax over 4 output classes
model.add(TimeDistributedDense(input_dim=128, output_dim=4))
model.add(Activation('time_distributed_softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.fit(x_train, y_train, nb_epoch=40, batch_size=16)
score = model.evaluate(x_test, y_test, batch_size=16, show_accuracy=True)

batch_size: integer. Number of samples per gradient update.

Variable-length RNN: I see there was an issue filed about this last year. The author recommends either zero-padding or batches of size 1: https://github.com/fchollet/keras/issues/40

1. Zero-padding

X = keras.preprocessing.sequence.pad_sequences(sequences, maxlen=100)
model.fit(X, y, batch_size=32, nb_epoch=10)

2. Batches of size 1

for seq, label in zip(sequences, y):
    model.train(np.array([seq]), [label])
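A minimal sketch of the zero-padding approach (option 1), written against the newer Keras 2.x API rather than the legacy names used above (TimeDistributedDense became TimeDistributed(Dense(...)), nb_epoch became epochs, show_accuracy became metrics=['accuracy']). The shapes and the synthetic sequences/labels are placeholders I made up for illustration, not data from the linked sources:

import numpy as np
from keras.models import Sequential
from keras.layers import Masking, LSTM, TimeDistributed, Dense
from keras.preprocessing.sequence import pad_sequences

maxlen, n_features, n_classes = 260, 100, 4

# Placeholder variable-length inputs: a list of (timesteps_i, n_features) arrays,
# zero-padded along the time axis so they can be stacked into one batch tensor.
sequences = [np.random.rand(np.random.randint(10, maxlen), n_features) for _ in range(32)]
x = pad_sequences(sequences, maxlen=maxlen, dtype='float32', padding='post', value=0.0)

# Placeholder per-timestep labels, one-hot over n_classes, padded the same way
# (padded timesteps stay all-zero and are skipped via the Masking layer).
y = np.zeros((len(sequences), maxlen, n_classes))
for i, seq in enumerate(sequences):
    steps = min(len(seq), maxlen)
    y[i, np.arange(steps), np.random.randint(n_classes, size=steps)] = 1.0

model = Sequential()
model.add(Masking(mask_value=0.0, input_shape=(maxlen, n_features)))  # ignore padded steps
model.add(LSTM(128, return_sequences=True))                           # one output per timestep
model.add(TimeDistributed(Dense(n_classes, activation='softmax')))    # per-step class probs
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

model.fit(x, y, epochs=5, batch_size=16)

The batch-size-1 alternative (option 2) would presumably use model.train_on_batch(seq[np.newaxis], lab[np.newaxis]) on a model built with input_shape=(None, n_features), which avoids padding entirely at the cost of much slower training.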