To open type sqlite3
in the terminal.
If you already have a db file, you can type sqlite3 mydatabase.db
or .open mydatabase.db
after typing the sqlite3
command.
Create a table with
CREATE TABLE users (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
email TEXT UNIQUE
);
INSERT INTO users (name, email) VALUES ('John Doe', 'john@example.com');
Remember that a database can have multiple tables.
Verify your database with commands like:
.tables -- List all tables
.schema users -- Show the schema for the users table
SELECT * FROM users; -- Query your data
Use .exit
or .quit
to properly exit the SQLite shell when you're done.
Load a csv into a database
- First, create the
clean_quiz_text
table with two text columns:
CREATE TABLE clean_quiz_text (
quiz_id TEXT,
clean_quiz_text TEXT
);
- Then import your CSV file into this table:
.mode csv
.import csvs/cleanQuizText_feb_2025_clean_2L_v2.csv clean_quiz_text
If your CSV has headers (column names in the first row), you should use:
.mode csv
.headers on
.import csvs/cleanQuizText_feb_2025_clean_2L_v2.csv clean_quiz_text
- You can verify the import was successful by checking the first few rows:
SELECT * FROM clean_quiz_text LIMIT 5;
If you encounter any path-related issues, you might need to use the full path to your CSV file. Also, if you're not already in the directory that contains the "csvs" folder, you'll need to adjust the path accordingly.
Python script to connect to the database
Usng this code , you can expose a server on top of this database.
import sqlite3
import pandas as pd
def connect_to_database(db_path):
"""
Connect to the SQLite database and return a connection object.
Args:
db_path (str): Path to the SQLite database file
Returns:
sqlite3.Connection: A connection object to the database
"""
try:
# Connect to the database
conn = sqlite3.connect(db_path)
print(f"Successfully connected to database: {db_path}")
return conn
except sqlite3.Error as e:
print(f"Error connecting to database: {e}")
return None
def query_data(conn, query):
"""
Execute a query and return the results.
Args:
conn (sqlite3.Connection): Connection to the database
query (str): SQL query to execute
Returns:
pandas.DataFrame: Results of the query
"""
try:
# Execute query and return results as a DataFrame
df = pd.read_sql_query(query, conn)
return df
except sqlite3.Error as e:
print(f"Error executing query: {e}")
return None
def main():
# Path to your database file
db_path = "mydatabase.db" # Replace with your actual database path
# Connect to the database
conn = connect_to_database(db_path)
if conn is None:
return
# Example: Query the first 5 rows from the clean_quiz_text table
query = "SELECT * FROM clean_quiz_text LIMIT 5"
results = query_data(conn, query)
if results is not None:
print("\nSample data from clean_quiz_text table:")
print(results)
# Close the connection
conn.close()
print("\nDatabase connection closed.")
if __name__ == "__main__":
main()
Primary Key vs. Index in SQLite
Primary Key
A primary key is a special kind of index with these characteristics:
- Uniqueness: Forces all values in the column to be unique
- NOT NULL: Cannot contain NULL values
- Single Identity: Only one primary key allowed per table
- Automatic Indexing: Creates an index automatically
- Row Identity: Serves as the unique identifier for each row
Index
An index is a data structure that improves query performance:
- No Constraints: Doesn't enforce uniqueness or NOT NULL
- Multiple Allowed: Can create multiple indexes on different columns
- Optional: Not required for table functionality
Performance Impact
Primary Key Performance Benefits:
- Extremely fast lookups when querying by the primary key (O(log n))
- Efficient for JOIN operations when joining on primary key columns
- Automatic enforcement of data integrity (no duplicates)
Index Performance Benefits:
- Faster SELECT queries when filtering or sorting by indexed columns
- Improved JOIN performance on indexed columns
- Enhanced performance for ORDER BY and GROUP BY operations
Performance Costs (Both):
- Slower INSERT, UPDATE, DELETE operations (index maintenance overhead)
- Additional disk space requirements
- Index maintenance during write operations
When to Use What:
- Primary Key: When you need a unique identifier for rows with enforced constraints
- Index: When you frequently query/filter/sort by specific columns but don't need uniqueness
For your specific case, if quiz_id values are unique and identify each quiz, a primary key makes sense. If they might contain duplicates but you still want faster lookups, a regular index is more appropriate.
Method 1: Add an index to the existing table
If you've already created your table and loaded your data, you can add an index with:
CREATE INDEX idx_quiz_id ON clean_quiz_text(quiz_id);
This SQL command creates an index named "idx_quiz_id" on the "quiz_id" column of your "clean_quiz_text" table. This will improve query performance when searching or filtering by quiz_id.
The command CREATE INDEX idx_quiz_id ON clean_quiz_text(quiz_id);
does not create a new column or a new table.
Instead, it creates a separate database object called an index. This index is associated with the existing clean_quiz_text
table but exists as its own structure in the database. It's essentially a lookup table that helps SQLite find rows quickly when you query by the quiz_id
column.
When you create an index:
- SQLite builds a data structure (typically a B-tree) that contains the indexed column values and pointers to the actual table rows
- This structure is stored separately from your table data
- The original table remains unchanged - no new columns are added to it
- The index is automatically maintained by SQLite whenever you insert, update, or delete rows in the table
You can think of it like the index at the back of a book - it doesn't change the content of the book (your table data), but it gives you a faster way to find specific information.
The index is invisible to your SQL queries - you'll still write queries exactly the same way, but SQLite will automatically use the index when appropriate to speed up your queries.
Create a primary key in the table
-- Create a new table with PRIMARY KEY
CREATE TABLE clean_quiz_text_new (
quiz_id TEXT PRIMARY KEY,
clean_quiz_text TEXT
);
-- Copy data from old table to new table
INSERT INTO clean_quiz_text_new SELECT * FROM clean_quiz_text;
-- Drop the old table
DROP TABLE clean_quiz_text;
-- Rename the new table to the original name
ALTER TABLE clean_quiz_text_new RENAME TO clean_quiz_text;
Does your table reside in the RAM
SQLite can operate in both in-memory and disk-based modes, depending on how you configure it:
Disk-based Operation (Default)
When you create a database with a filename (like sqlite3 mydatabase.db
), SQLite operates in disk-based mode:
- The database is stored as a file on your disk
- Data persists between sessions
- Subject to disk I/O speed limitations
- Can handle databases larger than available RAM
In-memory Operation
When you create a database without specifying a file or using special syntax:
- Temporary in-memory database:
sqlite3 :memory:
Or when you just type sqlite3
without parameters (as in your first message), you get a "transient in-memory database"
- Characteristics:
- Database exists entirely in RAM
- Much faster operations (no disk I/O)
- Data is lost when the connection is closed
- Limited by available RAM
From your first message:
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
This indicates you were initially connected to an in-memory database. However, when you created your clean_quiz_text
table and imported data from a CSV file, you would have needed to switch to a disk-based database (using .open FILENAME
) for that data to persist.
If you want to keep your data between sessions, make sure you're using a disk-based database by specifying a filename when opening SQLite.