Danialebrat's picture
Adding Learning path page and improving HelpScout dashboard
599973c

A newer version of the Streamlit SDK is available: 1.58.0

Upgrade

Musora Sentiment Analysis Dashboard

A Streamlit dashboard for visualising sentiment analysis results from social media comments (Facebook, Instagram, YouTube, Twitter), the Musora internal app, HelpScout customer support conversations, and Learning Paths lesson engagement data across brands (Drumeo, Pianote, Guitareo, Singeo, Playbass).


Table of Contents

  1. Project Structure
  2. How Data Flows
  3. Data Loading Strategy
  4. Pages
  5. Global Filters & Session State
  6. Snowflake Queries
  7. Authentication
  8. PDF Reports
  9. AI Agents
  10. Adding or Changing Things
  11. Running the App
  12. Configuration Reference

Project Structure

visualization/
β”œβ”€β”€ app.py                              # Entry point β€” routing, sidebar, session state
β”œβ”€β”€ config/
β”‚   └── viz_config.json                 # Colors, query strings, dashboard settings
β”œβ”€β”€ data/
β”‚   β”œβ”€β”€ data_loader.py                  # Comment Snowflake queries and caching
β”‚   └── helpscout_data_loader.py        # HelpScout Snowflake queries and caching
β”œβ”€β”€ utils/
β”‚   β”œβ”€β”€ auth.py                         # Login page, authentication helpers
β”‚   β”œβ”€β”€ data_processor.py               # Pandas aggregations (intent dist, content summary, etc.)
β”‚   β”œβ”€β”€ metrics.py                      # KPI calculations (sentiment score, urgency, etc.)
β”‚   β”œβ”€β”€ pdf_exporter.py                 # DashboardPDFExporter (comment dashboard PDF)
β”‚   β”œβ”€β”€ helpscout_utils.py              # Pure helpers: parse_topics, explode_topics, boolean_flag_counts
β”‚   └── helpscout_pdf.py                # HelpScoutDashboardPDF + HelpScoutAnalysisPDF
β”œβ”€β”€ components/
β”‚   β”œβ”€β”€ dashboard.py                    # Comment Dashboard page renderer
β”‚   β”œβ”€β”€ sentiment_analysis.py           # Sentiment Analysis page renderer
β”‚   β”œβ”€β”€ reply_required.py               # Reply Required page renderer
β”‚   β”œβ”€β”€ helpscout_dashboard.py          # HelpScout Dashboard page + compact summary widget
β”‚   └── helpscout_analysis.py           # HelpScout Analysis page (filterβ†’fetchβ†’chartsβ†’LLMβ†’PDF)
β”œβ”€β”€ visualizations/
β”‚   β”œβ”€β”€ sentiment_charts.py             # Plotly sentiment chart functions
β”‚   β”œβ”€β”€ distribution_charts.py          # Plotly distribution / heatmap / scatter functions
β”‚   β”œβ”€β”€ demographic_charts.py           # Plotly demographic chart functions
β”‚   β”œβ”€β”€ content_cards.py                # Streamlit card components (comment + content cards)
β”‚   └── helpscout_charts.py             # HelpScoutCharts Plotly factory (16 chart types)
β”œβ”€β”€ agents/
β”‚   β”œβ”€β”€ base_agent.py                   # BaseVisualizationAgent (shared interface)
β”‚   β”œβ”€β”€ content_summary_agent.py        # AI analysis for comment content summarisation
β”‚   └── helpscout_summary_agent.py      # HelpScoutSummaryAgent β€” page-level LLM summary from SUMMARY fields
β”œβ”€β”€ img/
β”‚   └── musora.png                      # Sidebar logo
└── SnowFlakeConnection.py              # Snowflake connection wrapper (Snowpark session)

How Data Flows

Snowflake
    β”‚
    β”œβ”€β”€ data_loader.py (SentimentDataLoader)
    β”‚       β”œβ”€β”€ load_dashboard_data()        ──► st.session_state['dashboard_df']
    β”‚       β”‚                                       └─► sidebar (filter options, counts)
    β”‚       β”‚                                       └─► dashboard.py (all charts)
    β”‚       β”œβ”€β”€ load_sa_data()               ──► st.session_state['sa_contents', 'sa_comments']
    β”‚       β”‚   (on-demand, Fetch button)           └─► sentiment_analysis.py
    β”‚       └── load_reply_required_data()   ──► st.session_state['rr_df']
    β”‚           (on-demand, Fetch button)           └─► reply_required.py
    β”‚
    └── helpscout_data_loader.py (HelpScoutDataLoader)
            β”œβ”€β”€ load_dashboard_data()        ──► st.session_state['helpscout_df']
            β”‚                                       └─► helpscout_dashboard.py
            β”‚                                       └─► dashboard.py (compact summary)
            └── load_analysis_data()         ──► st.session_state['hs_analysis_df']
                (on-demand, Fetch button)           └─► helpscout_analysis.py

Key principle: Data is loaded as little as possible, as late as possible.

  • Dashboard queries are lightweight (no text columns, no content join) and cached 24 hours.
  • Sentiment Analysis, Reply Required, and HelpScout Analysis pages wait for the user to click Fetch Data.
  • All data lives in st.session_state so page navigation and widget interactions never re-trigger Snowflake queries.

Data Loading Strategy

Comment data (data/data_loader.py β€” SentimentDataLoader)

load_dashboard_data()

  • Fetches only: comment_sk, content_sk, platform, brand, sentiment_polarity, intent, requires_reply, detected_language, comment_timestamp, processed_at, author_id.
  • No text columns, no DIM_CONTENT join.
  • Merges demographics data if demographics_query is configured.
  • Cached 24 hours. Called once at startup; stored in st.session_state['dashboard_df'].

load_sa_data(platform, brand, top_n, min_comments, sort_by, sentiments, intents, emotions, date_range)

  • Runs two Snowflake queries:
    1. Content aggregation β€” groups by content_sk, counts per sentiment, computes severity score, returns top N.
    2. Sampled comments β€” up to 50 per sentiment group per content (QUALIFY ROW_NUMBER() <= 50). display_text computed in SQL.
  • Returns (contents_df, comments_df). Cached 24 hours.

load_reply_required_data(platforms, brands, date_range)

  • Filters REQUIRES_REPLY = TRUE. Conditionally includes the social media table and/or musora table. Cached 24 hours.

SQL column qualification note

The social media table and DIM_CONTENT share column names. Any WHERE clause inside a query that joins them must use the table alias prefix (e.g. s.PLATFORM, s.COMMENT_TIMESTAMP) to avoid Snowflake ambiguous column name errors.


HelpScout data (data/helpscout_data_loader.py β€” HelpScoutDataLoader)

load_dashboard_data()

  • Lightweight query from SOCIAL_MEDIA_DB.ML_FEATURES.HELPSCOUT_CONVERSATION_FEATURES.
  • Columns: conversation_id, status, source, created_at, updated_at, duration_hours, sentiment_polarity, topics, is_refund_request, is_cancellation, is_membership, customer_email.
  • Merges demographics (age/timezone/experience) via email join (LOWER(customer_email) = LOWER(usora_users.email)).
  • After the demographics merge, adds is_member boolean: True when the customer email matched a Musora user record, False otherwise.
  • Cached 24 hours. Stored in st.session_state['helpscout_df'].

load_analysis_data(date_start, date_end, topics, sentiments, statuses, sources, is_refund, is_cancellation, is_membership)

  • Adds summary, sentiment_notes, topic_notes, customer_first_name, customer_last_name columns.
  • SQL WHERE pushdown for all filters; multi-label topic filter uses ARRAY_CONTAINS('topic_id'::VARIANT, SPLIT(TOPICS, ',')).
  • Cached 24 hours keyed on filter tuple. Stored in st.session_state['hs_analysis_df'].

get_filter_options(df)

  • Returns sentiments, topics (exploded and label-mapped from taxonomy), statuses, states, sources.

Pages

The app has 5 pages navigated via the sidebar radio:

1. Sentiment Dashboard (components/dashboard.py)

Receives: filtered_df β€” lightweight comment dataframe (after optional global filter from app.py).

Key sections:

  • Summary stats + health indicator
  • Sentiment distribution (pie + gauge)
  • Sentiment by brand and platform (stacked + percentage bar charts)
  • Intent analysis (bar + pie)
  • Emotion analysis (bar + pie) β€” only when emotions column is non-null
  • Brand–Platform heatmap
  • Reply requirements + urgency breakdown
  • Demographics (age, timezone, experience) β€” only when demographics were merged
  • HelpScout compact summary β€” appended at bottom; reads st.session_state['helpscout_df'] directly (guarded by try/except so failures never break the main dashboard)

2. Custom Sentiment Queries (components/sentiment_analysis.py)

Receives: data_loader instance only.

Flow:

  1. Reads st.session_state['dashboard_df'] for filter option lists.
  2. Pre-populates platform/brand dropdowns from st.session_state['global_filters'].
  3. On Fetch Data: calls data_loader.load_sa_data(...), stores results in st.session_state['sa_contents'] and ['sa_comments'].
  4. Renders content cards, per-content sentiment + intent + emotion charts, AI analysis buttons, sampled comment expanders.

Pagination: st.session_state['sentiment_page'] (5 contents per page). Reset on new fetch.


3. Reply Required (components/reply_required.py)

Receives: data_loader instance only.

Flow:

  1. Pre-populates platform/brand/date from st.session_state['global_filters'].
  2. On Fetch Data: calls data_loader.load_reply_required_data(...), stores result in st.session_state['rr_df'].
  3. Shows urgency breakdown, in-page filters (applied in Python, no extra query), paginated comment cards, and "Reply by Content" summary.

Pagination: st.session_state['reply_page'] (10 comments per page). Reset on new fetch.


4. HelpScout Dashboard (components/helpscout_dashboard.py)

Receives: helpscout_loader instance.

Reads from: st.session_state['helpscout_df'] (loaded at app startup).

Key sections:

  • Member status filter (radio at top): "All Customers / Members Only / Non-Members Only" β€” filters the entire dashboard view before any section renders
  • PDF export button (HelpScout Dashboard PDF)
  • 6 KPI metrics: total conversations, average duration, refund requests, cancellations, negative rate, membership joins
  • Sentiment distribution (pie + bar)
  • Topic distribution and sentiment heatmap (from process_helpscout/config_files/topics.json taxonomy)
  • Boolean flags (refund, cancellation, membership) breakdown
  • Status and source breakdown
  • Timelines expander (daily conversation volume, refund/cancel trend)
  • Depth expander (topic co-occurrence, escalation funnel)
  • Member vs Non-Member section: KPI metrics (member count, non-member count, email match rate) + member share pie chart + sentiment by member status stacked bar + top topics by member status grouped bar
  • Demographics (age, timezone, experience)

Note: Global sidebar filters (brand, platform, sentiment, date) do not apply to HelpScout pages β€” HelpScout is brand-agnostic and uses its own filter panel.


5. HelpScout Analysis (components/helpscout_analysis.py)

Receives: helpscout_loader instance.

Flow:

  1. Filter panel β€” date range, top_n, topics (multi-select with human-readable labels), sentiments, statuses, sources, 3 boolean checkboxes (refund / cancellation / membership), and a "Customer Type" selectbox (All / Members Only / Non-Members Only).
  2. Fetch Data button β€” calls helpscout_loader.load_analysis_data(...), stale-checked via fetch_key tuple. The Customer Type filter is not part of the Snowflake query β€” it is applied in Python after fetching, using the member email set derived from st.session_state['helpscout_df'].
  3. KPI row + distribution charts (sentiment, topics, flags, status).
  4. Member vs Non-Member section β€” always rendered when member data is available; shows share pie, sentiment stacked bar, and top-topics grouped bar split by member status.
  5. AI Summary section:
    • "Generate AI Summary" button β†’ calls HelpScoutSummaryAgent, stores result in st.session_state['hs_analysis_summary'].
    • Renders: executive summary, top themes, top complaints, unexpected insights, notable quotes.
    • "Export Analysis PDF" button β†’ generates HelpScoutAnalysisPDF.
  6. Paginated conversation cards β€” 10 per page; each card shows customer name, status, topics (label-mapped), summary, sentiment/topic notes.
  7. CSV export button.

Pagination: st.session_state['hs_analysis_page']. Reset on new fetch.

Date range default: Clamps to max(min_date, max_date βˆ’ default_date_range_days) so the default is always within the available data window.


Global Filters & Session State

Global filters apply only to comment pages (Dashboard, Sentiment Analysis, Reply Required). They have no effect on HelpScout pages.

st.session_state['global_filters'] = {
    'platforms':  ['facebook', 'instagram'],
    'brands':     ['drumeo'],
    'sentiments': [],
    'date_range': (date(2025, 1, 1), date(2025, 12, 31)),  # or None
}

Full session state key reference

Key Set by Used by
dashboard_df app.py startup sidebar, dashboard.py, SA + RR filter lists
global_filters sidebar "Apply Filters" app.py (dashboard filter), SA + RR pre-populate
filters_applied sidebar buttons app.py
sa_contents SA fetch button sentiment_analysis.py
sa_comments SA fetch button sentiment_analysis.py
sa_fetch_key SA fetch button SA stale-check
rr_df RR fetch button reply_required.py
rr_fetch_key RR fetch button RR stale-check
sentiment_page SA page / fetch SA pagination
reply_page RR page / fetch RR pagination
content_summaries SA AI buttons SA AI analysis display
helpscout_df app.py startup helpscout_dashboard.py (includes is_member), dashboard.py compact summary, helpscout_analysis.py member filter
hs_analysis_df HS Analysis fetch helpscout_analysis.py charts + cards
hs_analysis_fetch_key HS Analysis fetch HS Analysis stale-check
hs_analysis_filter_desc HS Analysis fetch human-readable filter string for PDF + agent
hs_analysis_summary "Generate AI Summary" HS Analysis summary renderer
hs_analysis_summary_key "Generate AI Summary" invalidated on re-fetch
hs_analysis_page HS Analysis page / fetch HS Analysis pagination

Snowflake Queries

Comment tables

Table Platform Notes
SOCIAL_MEDIA_DB.ML_FEATURES.COMMENT_SENTIMENT_FEATURES facebook, instagram, youtube, twitter Needs LEFT JOIN DIM_CONTENT for PERMALINK_URL
SOCIAL_MEDIA_DB.ML_FEATURES.MUSORA_COMMENT_SENTIMENT_FEATURES musora_app Has PERMALINK_URL and THUMBNAIL_URL natively

HelpScout table

Table Notes
SOCIAL_MEDIA_DB.ML_FEATURES.HELPSCOUT_CONVERSATION_FEATURES One row per conversation; multi-label topics in comma-separated TOPICS column

Static queries (in viz_config.json)

Key Purpose
dashboard_query Lightweight comment query β€” no text, no DIM_CONTENT join
demographics_query Joins usora_users + preprocessed.users for age/timezone/experience
helpscout.dashboard_query Lightweight HelpScout query (no SUMMARY/notes)
helpscout.demographics_query Same demographics join, keyed on customer_email

Dynamic queries (built in helpscout_data_loader.py)

Method Description
_build_analysis_query() Full HelpScout query including SUMMARY/notes; multi-label topic filter via ARRAY_CONTAINS

Authentication

Module: utils/auth.py

  • AUTHORIZED_EMAILS allowlist + APP_TOKEN env var.
  • render_login_page() renders the login form and calls st.stop() when not authenticated.
  • Gate is placed at the top of app.py (after st.set_page_config, before data loaders).
  • Current user and logout button are shown in the sidebar.

Required env vars:

APP_TOKEN=<shared token>

PDF Reports

Comment Dashboard PDF (utils/pdf_exporter.py β€” DashboardPDFExporter)

Generated from the "Export PDF Report" expander at the top of the Dashboard page.

Sections: cover, executive summary, sentiment, brand, platform, intent, cross-dimensional, volume, reply requirements, demographics (optional), language (optional), HelpScout summary (if data loaded), data summary.

HelpScout Dashboard PDF (utils/helpscout_pdf.py β€” HelpScoutDashboardPDF)

Generated from the HelpScout Dashboard page. Sections: cover, KPI summary, sentiment, topics, flags & escalation, status & source, timelines, depth, member vs non-member (metrics + pie + sentiment bar + topic grouped bar), demographics.

HelpScout Analysis PDF (utils/helpscout_pdf.py β€” HelpScoutAnalysisPDF)

Generated from the "Export Analysis PDF" button on the HelpScout Analysis page (only available after an AI Summary has been generated).

Sections: cover, filter summary, KPI summary (including member/non-member counts when available), chart snapshots, member vs non-member breakdown (pie + sentiment bar + topic grouped bar), AI summary (executive summary, top themes, top complaints, unexpected insights, notable quotes), conversation cards sample, metadata.

Dependencies: fpdf2, kaleido (for Plotly PNG rendering at 3Γ— scale).


AI Agents

ContentSummaryAgent (agents/content_summary_agent.py)

Summarises sampled comments for a single content item on the Sentiment Analysis page. Called per-content when the user clicks the AI analysis button. Results cached in st.session_state['content_summaries'].

HelpScoutSummaryAgent (agents/helpscout_summary_agent.py)

Produces a page-level executive report from the filtered HelpScout conversations by reading their pre-extracted SUMMARY fields through an LLM.

  • Stratified sample by sentiment_polarity β€” capped at max_conversations (default 300).
  • Builds aggregate context: sentiment breakdown, top topics, flag counts, average duration, then per-conversation summaries (capped at 250 chars each).
  • Prompt asks the LLM to surface patterns beyond the pre-tagged topics/sentiments.
  • Output structure:
{
    "executive_summary": "...",
    "top_themes": [{"theme": "...", "description": "...", "prevalence": "..."}],
    "top_complaints": ["..."],
    "unexpected_insights": ["..."],
    "notable_quotes": ["..."]
}
  • Uses LLMHelper.get_structured_completion() with up to 3 retries.

Adding or Changing Things

Add a new chart to the Comment Dashboard

  1. Write the chart function in the appropriate visualizations/ file.
  2. Call it from render_dashboard() in components/dashboard.py.

Add a new chart to the HelpScout Dashboard

  1. Add the chart method to HelpScoutCharts in visualizations/helpscout_charts.py.
  2. Call it from render_helpscout_dashboard() in components/helpscout_dashboard.py.

Add a new HelpScout filter

  1. Add the widget to the filter panel in helpscout_analysis.py.
  2. Include the new value in the fetch_key tuple.
  3. Add the corresponding WHERE clause condition to _build_analysis_query() in helpscout_data_loader.py.

Python-side filters (those whose data is not in the Snowflake HelpScout table) are applied after fetching rather than in SQL. The member/non-member filter is the canonical example: is_member is derived from st.session_state['helpscout_df'] after the Snowflake fetch. Such filters should not be included in the fetch_key tuple.

Add a new HelpScout topic

  • Edit process_helpscout/config_files/topics.json (the taxonomy file).
  • helpscout_utils.load_topic_taxonomy() reloads it on each app start; no other changes needed.

Change the cache duration

@st.cache_data(ttl=86400) appears on load_dashboard_data, _fetch_sa_data, _fetch_rr_data, load_demographics_data, and their HelpScout equivalents. Change 86400 to the desired TTL. Users can always force a refresh with "Reload Data" in the sidebar.

Add a new page

  1. Create components/new_page.py with a render_new_page(...) function.
  2. Import and add a radio option in app.py.
  3. Add data loading to the appropriate loader class.
  4. If the page should be excluded from global comment filters, extend the _hs_page guard in app.py.

Change what the Sentiment Analysis page queries

  • Edit _build_sa_content_query() and/or _build_sa_comments_query() in data_loader.py.
  • Update _process_sa_content_stats() and/or _process_sa_comments() for new columns.

Running the App

# From the project root
streamlit run visualization/app.py

Required environment variables (in .env at project root):

SNOWFLAKE_USER
SNOWFLAKE_PASSWORD
SNOWFLAKE_ACCOUNT
SNOWFLAKE_ROLE
SNOWFLAKE_DATABASE
SNOWFLAKE_WAREHOUSE
SNOWFLAKE_SCHEMA
OPENAI_API_KEY
APP_TOKEN

Configuration Reference

config/viz_config.json controls:

Section What it configures
color_schemes.sentiment_polarity Hex colors for each sentiment level
color_schemes.intent Hex colors per intent label
color_schemes.emotion Hex colors per emotion label
color_schemes.platform Hex colors per platform
color_schemes.brand Hex colors per brand
color_schemes_helpscout.topics Hex colors for HelpScout topic bars
color_schemes_helpscout.status Hex colors for conversation status values
color_schemes_helpscout.boolean_flags Hex colors for refund/cancellation/membership flags
sentiment_order Display order for sentiment categories
intent_order Display order for intent categories
emotion_order Display order for emotion categories
negative_sentiments Which sentiment values count as "negative"
dashboard.default_date_range_days Default date filter window for comment pages
helpscout.default_date_range_days Default date filter window for HelpScout Analysis
helpscout.max_summary_conversations Cap on conversations sent to LLM summary agent
helpscout.escalation_sentiments Sentiment values that count as escalation
snowflake.dashboard_query Lightweight comment dashboard query
snowflake.demographics_query Demographics join query (comment pages)
helpscout.dashboard_query Lightweight HelpScout dashboard query
helpscout.demographics_query Demographics join query (HelpScout, keyed on email)
demographics.age_groups Age bucket definitions (label β†’ [min, max])
demographics.experience_groups Experience bucket definitions
demographics.top_timezones_count How many timezones to show in the geographic chart