http://dirko.github.io/Bidirectional-LSTMs-with-Keras/
http://machinelearningmastery.com/understanding-stateful-lstm-recurrent-neural-networks-python-keras/
https://www.reddit.com/r/MachineLearning/comments/3upodo/using_keras_lstm_rnn_for_variable_length_sequence/

Per-timestep classification model (legacy Keras 0.x API: TimeDistributedDense, nb_epoch, show_accuracy):

from keras.models import Sequential
from keras.layers.core import Masking, TimeDistributedDense, Activation
from keras.layers.recurrent import LSTM

model = Sequential()
# Mask all-zero (padded) timesteps: 260 timesteps of 100-dim features per sample
model.add(Masking(0, input_shape=(260, 100)))
model.add(LSTM(input_dim=100, output_dim=128, return_sequences=True))
# Per-timestep dense layer + softmax over 4 output classes
model.add(TimeDistributedDense(input_dim=128, output_dim=4))
model.add(Activation('time_distributed_softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.fit(x_train, y_train, nb_epoch=40, batch_size=16)
score = model.evaluate(x_test, y_test, batch_size=16, show_accuracy=True)

batch_size: integer. Number of samples per gradient update.

Variable-length RNN: I see there was an issue filed about this last year. The author recommends either zero-padding or batches of size 1: https://github.com/fchollet/keras/issues/40

1. Zero-padding

X = keras.preprocessing.sequence.pad_sequences(sequences, maxlen=100)
model.fit(X, y, batch_size=32, nb_epoch=10)

2. Batches of size 1

for seq, label in zip(sequences, y):
    model.train(np.array([seq]), [label])
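A minimal sketch of the zero-padding approach (option 1), written against the newer Keras 2.x API rather than the legacy names used above (TimeDistributedDense became TimeDistributed(Dense(...)), nb_epoch became epochs, show_accuracy became metrics=['accuracy']). The shapes and the synthetic sequences/labels are placeholders I made up for illustration, not data from the linked sources:

import numpy as np
from keras.models import Sequential
from keras.layers import Masking, LSTM, TimeDistributed, Dense
from keras.preprocessing.sequence import pad_sequences

maxlen, n_features, n_classes = 260, 100, 4

# Placeholder variable-length inputs: a list of (timesteps_i, n_features) arrays,
# zero-padded along the time axis so they can be stacked into one batch tensor.
sequences = [np.random.rand(np.random.randint(10, maxlen), n_features) for _ in range(32)]
x = pad_sequences(sequences, maxlen=maxlen, dtype='float32', padding='post', value=0.0)

# Placeholder per-timestep labels, one-hot over n_classes, padded the same way
# (padded timesteps stay all-zero and are skipped via the Masking layer).
y = np.zeros((len(sequences), maxlen, n_classes))
for i, seq in enumerate(sequences):
    steps = min(len(seq), maxlen)
    y[i, np.arange(steps), np.random.randint(n_classes, size=steps)] = 1.0

model = Sequential()
model.add(Masking(mask_value=0.0, input_shape=(maxlen, n_features)))  # ignore padded steps
model.add(LSTM(128, return_sequences=True))                           # one output per timestep
model.add(TimeDistributed(Dense(n_classes, activation='softmax')))    # per-step class probs
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

model.fit(x, y, epochs=5, batch_size=16)

The batch-size-1 alternative (option 2) would presumably use model.train_on_batch(seq[np.newaxis], lab[np.newaxis]) on a model built with input_shape=(None, n_features), which avoids padding entirely at the cost of much slower training.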