What is a chatbot?
A chatbot, or chatterbot, is a computer program designed to simulate a written conversation with a human user.
Why make one?
Well, first of all, because it’s fun! Ever since Alan Turing, programming chatbots has been a way to test computers’ ability to pass for human (see the Turing test).
Chatbots can also have very useful applications, such as helping users on a website, teaching a language, etc.
The question this program will answer is: given a user input sentence, which output should we produce?
For this simple example, we are not going to try to extract the meaning of the sentences written by the user. That would be a lot of work, and it is not needed for what we want to achieve.
The program will have two distinct parts:
- Learning – when the user types a message, it is treated as an answer to the previous statement made by the chatbot. The sentence typed by the human is then associated with the words present in that previous message.
- Answering – the human message is split into words. The program then tries to identify which known sentences correspond best to those words, according to its previous “experience”.
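The two phases can be illustrated without a database. What follows is a minimal in-memory sketch, not the program used in this article: the `TinyChatbot` class and its `learn`, `answer` and `respond` methods are hypothetical names, punctuation is ignored, candidate replies are ranked by raw co-occurrence counts instead of weights, and when nothing matches the bot simply echoes the user’s message (an arbitrary fallback chosen for this sketch).

```python
import re
from collections import Counter, defaultdict


class TinyChatbot:
    """Hypothetical in-memory illustration of the learning/answering split.

    Simplifications: punctuation is ignored, replies are ranked by raw
    co-occurrence counts (no weighting), and unmatched input is echoed back.
    """

    def __init__(self, greeting="Hello!"):
        # word -> Counter{sentence typed in reply to a message containing that word: count}
        self.associations = defaultdict(Counter)
        self.last_bot_message = greeting

    @staticmethod
    def words(text):
        # lowercase runs of word characters; punctuation is dropped in this sketch
        return re.findall(r"\w+", text.lower())

    def learn(self, human_message):
        # Learning: the human message answers the bot's previous message,
        # so associate it with every word of that previous message
        for word in self.words(self.last_bot_message):
            self.associations[word][human_message] += 1

    def answer(self, human_message):
        # Answering: add up, per known sentence, the counts of all matching words
        scores = Counter()
        for word in self.words(human_message):
            scores.update(self.associations.get(word, Counter()))
        reply = scores.most_common(1)[0][0] if scores else human_message
        self.last_bot_message = reply
        return reply

    def respond(self, human_message):
        # learn from the message first, then answer it
        self.learn(human_message)
        return self.answer(human_message)
```

For instance, with a fresh bot, `respond("Hi!")` and `respond("How are you?")` both echo the input (nothing relevant has been learned yet), but a subsequent `respond("Hello!")` returns `"Hi!"`, since "Hi!" was stored as an answer to the greeting "Hello!".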
```python
import re
import sqlite3
from collections import Counter
from math import sqrt
from string import punctuation

# initialize the connection to the database
connection = sqlite3.connect('chatbot.sqlite')
cursor = connection.cursor()

# create the tables needed by the program (only if they do not exist yet)
create_table_request_list = [
    'CREATE TABLE IF NOT EXISTS words(word TEXT UNIQUE)',
    'CREATE TABLE IF NOT EXISTS sentences(sentence TEXT UNIQUE, used INT NOT NULL DEFAULT 0)',
    'CREATE TABLE IF NOT EXISTS associations (word_id INT NOT NULL, sentence_id INT NOT NULL, weight REAL NOT NULL)',
]
for create_table_request in create_table_request_list:
    cursor.execute(create_table_request)


def get_id(entityName, text):
    """Retrieve an entity's unique ID from the database, given its associated text.
    If the row is not already present, it is inserted.
    The entity can either be a sentence or a word."""
    tableName = entityName + 's'
    columnName = entityName
    cursor.execute('SELECT rowid FROM ' + tableName + ' WHERE ' + columnName + ' = ?', (text,))
    row = cursor.fetchone()
    if row:
        return row[0]  # fetchone() returns a tuple; the ID is its first column
    else:
        cursor.execute('INSERT INTO ' + tableName + ' (' + columnName + ') VALUES (?)', (text,))
        return cursor.lastrowid


def get_words(text):
    """Retrieve the words present in a given string of text.
    The return value is a list of tuples where the first member is a lowercase word,
    and the second member the number of times it is present in the text."""
    wordsRegexpString = r'(?:\w+|[' + re.escape(punctuation) + r']+)'
    wordsRegexp = re.compile(wordsRegexpString)
    wordsList = wordsRegexp.findall(text.lower())
    return list(Counter(wordsList).items())


B = 'Hello!'
while True:
    # output bot's message
    print('B: ' + B)
    # ask for user input; if blank line, exit the loop
    H = input('H: ').strip()
    if H == '':
        break
    # store the association between the bot's message words and the user's response
    words = get_words(B)
    words_length = sum(n * len(word) for word, n in words)
    sentence_id = get_id('sentence', H)
    for word, n in words:
        word_id = get_id('word', word)
        weight = sqrt(n / words_length)
        cursor.execute('INSERT INTO associations VALUES (?, ?, ?)', (word_id, sentence_id, weight))
    connection.commit()
    # retrieve the most likely answer from the database
    cursor.execute('CREATE TEMPORARY TABLE results(sentence_id INT, sentence TEXT, weight REAL)')
    words = get_words(H)
    words_length = sum(n * len(word) for word, n in words)
    for word, n in words:
        weight = sqrt(n / words_length)
        cursor.execute('INSERT INTO results SELECT associations.sentence_id, sentences.sentence, ?*associations.weight/(4+sentences.used) FROM words INNER JOIN associations ON associations.word_id=words.rowid INNER JOIN sentences ON sentences.rowid=associations.sentence_id WHERE words.word=?', (weight, word,))
    # if matches were found, give the best one
    cursor.execute('SELECT sentence_id, sentence, SUM(weight) AS sum_weight FROM results GROUP BY sentence_id ORDER BY sum_weight DESC LIMIT 1')
    row = cursor.fetchone()
    cursor.execute('DROP TABLE results')
    # otherwise, just randomly pick one of the least used sentences
    if row is None:
        cursor.execute('SELECT rowid, sentence FROM sentences WHERE used = (SELECT MIN(used) FROM sentences) ORDER BY RANDOM() LIMIT 1')
        row = cursor.fetchone()
    # tell the database the sentence has been used once more, and prepare the sentence
    # (row is (sentence_id, sentence, ...) in both queries above)
    B = row[1]
    cursor.execute('UPDATE sentences SET used=used+1 WHERE rowid=?', (row[0],))
    connection.commit()  # persist the usage counter even if the user quits next turn
```
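The ranking performed by the SQL queries in the loop can be restated in plain Python, which makes the formula easier to see. Each token w of a message gets the weight sqrt(n_w / L), where n_w is its number of occurrences and L is the total length of all token occurrences (the `words_length` variable). A candidate sentence’s score is the sum, over every stored association matching one of the user’s tokens, of the user-side weight times the stored weight, divided by 4 plus the number of times that sentence has already been used. In this sketch, `word_weights`, `learn` and `score_candidates` are hypothetical helpers, and plain Python lists and dicts stand in for the `associations` and `sentences` tables.

```python
import re
from collections import Counter
from math import sqrt
from string import punctuation

# same token pattern as get_words(): runs of word characters or of punctuation
TOKEN_RE = re.compile(r'(?:\w+|[' + re.escape(punctuation) + r']+)')


def word_weights(text):
    """Map each lowercase token to sqrt(n / total_length), as in the main loop."""
    counts = Counter(TOKEN_RE.findall(text.lower()))
    total_length = sum(n * len(token) for token, n in counts.items())
    return {token: sqrt(n / total_length) for token, n in counts.items()}


def learn(bot_text, user_text, associations):
    """Append one (token, sentence, weight) row per token of the bot's message,
    mirroring the INSERT INTO associations step."""
    for token, weight in word_weights(bot_text).items():
        associations.append((token, user_text, weight))


def score_candidates(user_text, associations, used):
    """Sum weight * stored_weight / (4 + used) per sentence, like the results table.

    associations: list of (token, sentence, weight) rows
    used: {sentence: times already used}; missing sentences count as 0
    """
    user_weights = word_weights(user_text)
    scores = Counter()
    for token, sentence, stored in associations:
        if token in user_weights:
            scores[sentence] += user_weights[token] * stored / (4 + used.get(sentence, 0))
    return scores


# usage: learn two exchanges, then rank the candidates for a new message
associations = []
learn("Hello!", "Hi!", associations)
learn("Hi!", "How are you?", associations)
print(score_candidates("Hello there!", associations, {}).most_common(1)[0][0])  # Hi!
print(score_candidates("Hello there!", associations, {"Hi!": 10}).most_common(1)[0][0])  # How are you?
```

The `4 + used` divisor is what nudges the bot away from sentences it has already repeated: once "Hi!" has been used ten times, the same input picks "How are you?" instead.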
Starting with a “naïve” bot (i.e., on an empty database), here is the first “conversation” I had with Mr. Chatterbot:
```
B: Hello!
H: Hi!
B: Hi!
H: How are you?
B: How are you?
H: Very well, thank you.
B: Very well, thank you.
H: Do you like music?
B: Very well, thank you.
H: Good.
B: Do you like music?
H: Yes, a lot!
B: How are you?
H: In a very good mood.
B: Good.
H: Indeed.
B: Indeed.
H: What is the weather like?
B: Yes, a lot!
H: Okay...
B: What is the weather like?
H: Quite rainy today.
B: Indeed.
H: I have to go. Goodbye!
B: How are you?
H: In a good mood.
B: Indeed.
H: See you later!
B: Okay...
H: Bye, I'll talk to you later :)
B: Do you like music?
H: I like it.
B: I have to go. Goodbye!
H: Bye!
B: How are you?
H:
```
Okay, this is not very exciting. But right after this short learning phase, here is what we can get:
```
B: Hello!
H: Hi!
B: How are you?
H: Very well, thank you. How are you?
B: In a very good mood.
H: Glad to hear that.
B: Bye!
H: Goodbye!
B: Hi!
H:
```
```
B: Hello!
H: Hi! How are you?
B: In a good mood.
H: Great! How do like chocolate?
B: I like it.
H: Really? Me too! What is the weather like at your place?
B: Quite rainy today.
H: Oh.
B: See you later!
H: Bye!
B: Goodbye!
H:
```