To open type sqlite3 in the terminal.

If you already have a db file, you can type sqlite3 mydatabase.db or .open mydatabase.db after typing the sqlite3 command.

Create a table with

CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    email TEXT UNIQUE
);

INSERT INTO users (name, email) VALUES ('John Doe', 'john@example.com');

Remember that a database can have multiple tables.

Verify your database with commands like:

.tables            -- List all tables
.schema users      -- Show the schema for the users table
SELECT * FROM users;  -- Query your data

Use .exit or .quit to properly exit the SQLite shell when you're done.

 

Load a csv into a database

  1. First, create the clean_quiz_text table with two text columns:
CREATE TABLE clean_quiz_text (
    quiz_id TEXT,
    clean_quiz_text TEXT
);
  1. Then import your CSV file into this table:
.mode csv
.import csvs/cleanQuizText_feb_2025_clean_2L_v2.csv clean_quiz_text

If your CSV has headers (column names in the first row), you should use:

.mode csv
.headers on
.import csvs/cleanQuizText_feb_2025_clean_2L_v2.csv clean_quiz_text
  1. You can verify the import was successful by checking the first few rows:
SELECT * FROM clean_quiz_text LIMIT 5;

If you encounter any path-related issues, you might need to use the full path to your CSV file. Also, if you're not already in the directory that contains the "csvs" folder, you'll need to adjust the path accordingly.

 

Python script to connect to the database

Usng this code , you can expose a server on top of this database.

import sqlite3
import pandas as pd

def connect_to_database(db_path):
    """
    Connect to the SQLite database and return a connection object.
    
    Args:
        db_path (str): Path to the SQLite database file
        
    Returns:
        sqlite3.Connection: A connection object to the database
    """
    try:
        # Connect to the database
        conn = sqlite3.connect(db_path)
        print(f"Successfully connected to database: {db_path}")
        return conn
    except sqlite3.Error as e:
        print(f"Error connecting to database: {e}")
        return None

def query_data(conn, query):
    """
    Execute a query and return the results.
    
    Args:
        conn (sqlite3.Connection): Connection to the database
        query (str): SQL query to execute
        
    Returns:
        pandas.DataFrame: Results of the query
    """
    try:
        # Execute query and return results as a DataFrame
        df = pd.read_sql_query(query, conn)
        return df
    except sqlite3.Error as e:
        print(f"Error executing query: {e}")
        return None

def main():
    # Path to your database file
    db_path = "mydatabase.db"  # Replace with your actual database path
    
    # Connect to the database
    conn = connect_to_database(db_path)
    if conn is None:
        return
    
    # Example: Query the first 5 rows from the clean_quiz_text table
    query = "SELECT * FROM clean_quiz_text LIMIT 5"
    results = query_data(conn, query)
    
    if results is not None:
        print("\nSample data from clean_quiz_text table:")
        print(results)
    
    # Close the connection
    conn.close()
    print("\nDatabase connection closed.")

if __name__ == "__main__":
    main()

Primary Key vs. Index in SQLite

Primary Key

A primary key is a special kind of index with these characteristics:

  1. Uniqueness: Forces all values in the column to be unique
  1. NOT NULL: Cannot contain NULL values
  1. Single Identity: Only one primary key allowed per table
  1. Automatic Indexing: Creates an index automatically
  1. Row Identity: Serves as the unique identifier for each row

Index

An index is a data structure that improves query performance:

  1. No Constraints: Doesn't enforce uniqueness or NOT NULL
  1. Multiple Allowed: Can create multiple indexes on different columns
  1. Optional: Not required for table functionality

Performance Impact

Primary Key Performance Benefits:

  • Extremely fast lookups when querying by the primary key (O(log n))
  • Efficient for JOIN operations when joining on primary key columns
  • Automatic enforcement of data integrity (no duplicates)

Index Performance Benefits:

  • Faster SELECT queries when filtering or sorting by indexed columns
  • Improved JOIN performance on indexed columns
  • Enhanced performance for ORDER BY and GROUP BY operations

Performance Costs (Both):

  • Slower INSERT, UPDATE, DELETE operations (index maintenance overhead)
  • Additional disk space requirements
  • Index maintenance during write operations

When to Use What:

  • Primary Key: When you need a unique identifier for rows with enforced constraints
  • Index: When you frequently query/filter/sort by specific columns but don't need uniqueness

For your specific case, if quiz_id values are unique and identify each quiz, a primary key makes sense. If they might contain duplicates but you still want faster lookups, a regular index is more appropriate.

Method 1: Add an index to the existing table

If you've already created your table and loaded your data, you can add an index with:

CREATE INDEX idx_quiz_id ON clean_quiz_text(quiz_id);

This SQL command creates an index named "idx_quiz_id" on the "quiz_id" column of your "clean_quiz_text" table. This will improve query performance when searching or filtering by quiz_id.

The command CREATE INDEX idx_quiz_id ON clean_quiz_text(quiz_id); does not create a new column or a new table.

Instead, it creates a separate database object called an index. This index is associated with the existing clean_quiz_text table but exists as its own structure in the database. It's essentially a lookup table that helps SQLite find rows quickly when you query by the quiz_id column.

When you create an index:

  1. SQLite builds a data structure (typically a B-tree) that contains the indexed column values and pointers to the actual table rows
  1. This structure is stored separately from your table data
  1. The original table remains unchanged - no new columns are added to it
  1. The index is automatically maintained by SQLite whenever you insert, update, or delete rows in the table

You can think of it like the index at the back of a book - it doesn't change the content of the book (your table data), but it gives you a faster way to find specific information.

The index is invisible to your SQL queries - you'll still write queries exactly the same way, but SQLite will automatically use the index when appropriate to speed up your queries.

Create a primary key in the table

-- Create a new table with PRIMARY KEY
CREATE TABLE clean_quiz_text_new (
    quiz_id TEXT PRIMARY KEY,
    clean_quiz_text TEXT
);

-- Copy data from old table to new table
INSERT INTO clean_quiz_text_new SELECT * FROM clean_quiz_text;

-- Drop the old table
DROP TABLE clean_quiz_text;

-- Rename the new table to the original name
ALTER TABLE clean_quiz_text_new RENAME TO clean_quiz_text;
 

Does your table reside in the RAM

SQLite can operate in both in-memory and disk-based modes, depending on how you configure it:

Disk-based Operation (Default)

When you create a database with a filename (like sqlite3 mydatabase.db), SQLite operates in disk-based mode:

  • The database is stored as a file on your disk
  • Data persists between sessions
  • Subject to disk I/O speed limitations
  • Can handle databases larger than available RAM

In-memory Operation

When you create a database without specifying a file or using special syntax:

  1. Temporary in-memory database:
    1. sqlite3 :memory:
      

      Or when you just type sqlite3 without parameters (as in your first message), you get a "transient in-memory database"

  1. Characteristics:
      • Database exists entirely in RAM
      • Much faster operations (no disk I/O)
      • Data is lost when the connection is closed
      • Limited by available RAM

From your first message:

Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.

This indicates you were initially connected to an in-memory database. However, when you created your clean_quiz_text table and imported data from a CSV file, you would have needed to switch to a disk-based database (using .open FILENAME) for that data to persist.

If you want to keep your data between sessions, make sure you're using a disk-based database by specifying a filename when opening SQLite.