A newer version of the Streamlit SDK is available: 1.58.0
Musora Sentiment Analysis Dashboard
A Streamlit dashboard for visualising sentiment analysis results from social media comments (Facebook, Instagram, YouTube, Twitter), the Musora internal app, HelpScout customer support conversations, and Learning Paths lesson engagement data across brands (Drumeo, Pianote, Guitareo, Singeo, Playbass).
Table of Contents
- Project Structure
- How Data Flows
- Data Loading Strategy
- Pages
- Global Filters & Session State
- Snowflake Queries
- Authentication
- PDF Reports
- AI Agents
- Adding or Changing Things
- Running the App
- Configuration Reference
Project Structure
visualization/
βββ app.py # Entry point β routing, sidebar, session state
βββ config/
β βββ viz_config.json # Colors, query strings, dashboard settings
βββ data/
β βββ data_loader.py # Comment Snowflake queries and caching
β βββ helpscout_data_loader.py # HelpScout Snowflake queries and caching
βββ utils/
β βββ auth.py # Login page, authentication helpers
β βββ data_processor.py # Pandas aggregations (intent dist, content summary, etc.)
β βββ metrics.py # KPI calculations (sentiment score, urgency, etc.)
β βββ pdf_exporter.py # DashboardPDFExporter (comment dashboard PDF)
β βββ helpscout_utils.py # Pure helpers: parse_topics, explode_topics, boolean_flag_counts
β βββ helpscout_pdf.py # HelpScoutDashboardPDF + HelpScoutAnalysisPDF
βββ components/
β βββ dashboard.py # Comment Dashboard page renderer
β βββ sentiment_analysis.py # Sentiment Analysis page renderer
β βββ reply_required.py # Reply Required page renderer
β βββ helpscout_dashboard.py # HelpScout Dashboard page + compact summary widget
β βββ helpscout_analysis.py # HelpScout Analysis page (filterβfetchβchartsβLLMβPDF)
βββ visualizations/
β βββ sentiment_charts.py # Plotly sentiment chart functions
β βββ distribution_charts.py # Plotly distribution / heatmap / scatter functions
β βββ demographic_charts.py # Plotly demographic chart functions
β βββ content_cards.py # Streamlit card components (comment + content cards)
β βββ helpscout_charts.py # HelpScoutCharts Plotly factory (16 chart types)
βββ agents/
β βββ base_agent.py # BaseVisualizationAgent (shared interface)
β βββ content_summary_agent.py # AI analysis for comment content summarisation
β βββ helpscout_summary_agent.py # HelpScoutSummaryAgent β page-level LLM summary from SUMMARY fields
βββ img/
β βββ musora.png # Sidebar logo
βββ SnowFlakeConnection.py # Snowflake connection wrapper (Snowpark session)
How Data Flows
Snowflake
β
βββ data_loader.py (SentimentDataLoader)
β βββ load_dashboard_data() βββΊ st.session_state['dashboard_df']
β β βββΊ sidebar (filter options, counts)
β β βββΊ dashboard.py (all charts)
β βββ load_sa_data() βββΊ st.session_state['sa_contents', 'sa_comments']
β β (on-demand, Fetch button) βββΊ sentiment_analysis.py
β βββ load_reply_required_data() βββΊ st.session_state['rr_df']
β (on-demand, Fetch button) βββΊ reply_required.py
β
βββ helpscout_data_loader.py (HelpScoutDataLoader)
βββ load_dashboard_data() βββΊ st.session_state['helpscout_df']
β βββΊ helpscout_dashboard.py
β βββΊ dashboard.py (compact summary)
βββ load_analysis_data() βββΊ st.session_state['hs_analysis_df']
(on-demand, Fetch button) βββΊ helpscout_analysis.py
Key principle: Data is loaded as little as possible, as late as possible.
- Dashboard queries are lightweight (no text columns, no content join) and cached 24 hours.
- Sentiment Analysis, Reply Required, and HelpScout Analysis pages wait for the user to click Fetch Data.
- All data lives in
st.session_stateso page navigation and widget interactions never re-trigger Snowflake queries.
Data Loading Strategy
Comment data (data/data_loader.py β SentimentDataLoader)
load_dashboard_data()
- Fetches only:
comment_sk, content_sk, platform, brand, sentiment_polarity, intent, requires_reply, detected_language, comment_timestamp, processed_at, author_id. - No text columns, no
DIM_CONTENTjoin. - Merges demographics data if
demographics_queryis configured. - Cached 24 hours. Called once at startup; stored in
st.session_state['dashboard_df'].
load_sa_data(platform, brand, top_n, min_comments, sort_by, sentiments, intents, emotions, date_range)
- Runs two Snowflake queries:
- Content aggregation β groups by
content_sk, counts per sentiment, computes severity score, returns top N. - Sampled comments β up to 50 per sentiment group per content (
QUALIFY ROW_NUMBER() <= 50).display_textcomputed in SQL.
- Content aggregation β groups by
- Returns
(contents_df, comments_df). Cached 24 hours.
load_reply_required_data(platforms, brands, date_range)
- Filters
REQUIRES_REPLY = TRUE. Conditionally includes the social media table and/or musora table. Cached 24 hours.
SQL column qualification note
The social media table and DIM_CONTENT share column names. Any WHERE clause inside a query that joins them must use the table alias prefix (e.g. s.PLATFORM, s.COMMENT_TIMESTAMP) to avoid Snowflake ambiguous column name errors.
HelpScout data (data/helpscout_data_loader.py β HelpScoutDataLoader)
load_dashboard_data()
- Lightweight query from
SOCIAL_MEDIA_DB.ML_FEATURES.HELPSCOUT_CONVERSATION_FEATURES. - Columns:
conversation_id, status, source, created_at, updated_at, duration_hours, sentiment_polarity, topics, is_refund_request, is_cancellation, is_membership, customer_email. - Merges demographics (age/timezone/experience) via email join (
LOWER(customer_email) = LOWER(usora_users.email)). - After the demographics merge, adds
is_memberboolean:Truewhen the customer email matched a Musora user record,Falseotherwise. - Cached 24 hours. Stored in
st.session_state['helpscout_df'].
load_analysis_data(date_start, date_end, topics, sentiments, statuses, sources, is_refund, is_cancellation, is_membership)
- Adds
summary, sentiment_notes, topic_notes, customer_first_name, customer_last_namecolumns. - SQL
WHEREpushdown for all filters; multi-label topic filter usesARRAY_CONTAINS('topic_id'::VARIANT, SPLIT(TOPICS, ',')). - Cached 24 hours keyed on filter tuple. Stored in
st.session_state['hs_analysis_df'].
get_filter_options(df)
- Returns
sentiments,topics(exploded and label-mapped from taxonomy),statuses,states,sources.
Pages
The app has 5 pages navigated via the sidebar radio:
1. Sentiment Dashboard (components/dashboard.py)
Receives: filtered_df β lightweight comment dataframe (after optional global filter from app.py).
Key sections:
- Summary stats + health indicator
- Sentiment distribution (pie + gauge)
- Sentiment by brand and platform (stacked + percentage bar charts)
- Intent analysis (bar + pie)
- Emotion analysis (bar + pie) β only when
emotionscolumn is non-null - BrandβPlatform heatmap
- Reply requirements + urgency breakdown
- Demographics (age, timezone, experience) β only when demographics were merged
- HelpScout compact summary β appended at bottom; reads
st.session_state['helpscout_df']directly (guarded bytry/exceptso failures never break the main dashboard)
2. Custom Sentiment Queries (components/sentiment_analysis.py)
Receives: data_loader instance only.
Flow:
- Reads
st.session_state['dashboard_df']for filter option lists. - Pre-populates platform/brand dropdowns from
st.session_state['global_filters']. - On Fetch Data: calls
data_loader.load_sa_data(...), stores results inst.session_state['sa_contents']and['sa_comments']. - Renders content cards, per-content sentiment + intent + emotion charts, AI analysis buttons, sampled comment expanders.
Pagination: st.session_state['sentiment_page'] (5 contents per page). Reset on new fetch.
3. Reply Required (components/reply_required.py)
Receives: data_loader instance only.
Flow:
- Pre-populates platform/brand/date from
st.session_state['global_filters']. - On Fetch Data: calls
data_loader.load_reply_required_data(...), stores result inst.session_state['rr_df']. - Shows urgency breakdown, in-page filters (applied in Python, no extra query), paginated comment cards, and "Reply by Content" summary.
Pagination: st.session_state['reply_page'] (10 comments per page). Reset on new fetch.
4. HelpScout Dashboard (components/helpscout_dashboard.py)
Receives: helpscout_loader instance.
Reads from: st.session_state['helpscout_df'] (loaded at app startup).
Key sections:
- Member status filter (radio at top): "All Customers / Members Only / Non-Members Only" β filters the entire dashboard view before any section renders
- PDF export button (HelpScout Dashboard PDF)
- 6 KPI metrics: total conversations, average duration, refund requests, cancellations, negative rate, membership joins
- Sentiment distribution (pie + bar)
- Topic distribution and sentiment heatmap (from
process_helpscout/config_files/topics.jsontaxonomy) - Boolean flags (refund, cancellation, membership) breakdown
- Status and source breakdown
- Timelines expander (daily conversation volume, refund/cancel trend)
- Depth expander (topic co-occurrence, escalation funnel)
- Member vs Non-Member section: KPI metrics (member count, non-member count, email match rate) + member share pie chart + sentiment by member status stacked bar + top topics by member status grouped bar
- Demographics (age, timezone, experience)
Note: Global sidebar filters (brand, platform, sentiment, date) do not apply to HelpScout pages β HelpScout is brand-agnostic and uses its own filter panel.
5. HelpScout Analysis (components/helpscout_analysis.py)
Receives: helpscout_loader instance.
Flow:
- Filter panel β date range, top_n, topics (multi-select with human-readable labels), sentiments, statuses, sources, 3 boolean checkboxes (refund / cancellation / membership), and a "Customer Type" selectbox (All / Members Only / Non-Members Only).
- Fetch Data button β calls
helpscout_loader.load_analysis_data(...), stale-checked viafetch_keytuple. The Customer Type filter is not part of the Snowflake query β it is applied in Python after fetching, using the member email set derived fromst.session_state['helpscout_df']. - KPI row + distribution charts (sentiment, topics, flags, status).
- Member vs Non-Member section β always rendered when member data is available; shows share pie, sentiment stacked bar, and top-topics grouped bar split by member status.
- AI Summary section:
- "Generate AI Summary" button β calls
HelpScoutSummaryAgent, stores result inst.session_state['hs_analysis_summary']. - Renders: executive summary, top themes, top complaints, unexpected insights, notable quotes.
- "Export Analysis PDF" button β generates
HelpScoutAnalysisPDF.
- "Generate AI Summary" button β calls
- Paginated conversation cards β 10 per page; each card shows customer name, status, topics (label-mapped), summary, sentiment/topic notes.
- CSV export button.
Pagination: st.session_state['hs_analysis_page']. Reset on new fetch.
Date range default: Clamps to max(min_date, max_date β default_date_range_days) so the default is always within the available data window.
Global Filters & Session State
Global filters apply only to comment pages (Dashboard, Sentiment Analysis, Reply Required). They have no effect on HelpScout pages.
st.session_state['global_filters'] = {
'platforms': ['facebook', 'instagram'],
'brands': ['drumeo'],
'sentiments': [],
'date_range': (date(2025, 1, 1), date(2025, 12, 31)), # or None
}
Full session state key reference
| Key | Set by | Used by |
|---|---|---|
dashboard_df |
app.py startup |
sidebar, dashboard.py, SA + RR filter lists |
global_filters |
sidebar "Apply Filters" | app.py (dashboard filter), SA + RR pre-populate |
filters_applied |
sidebar buttons | app.py |
sa_contents |
SA fetch button | sentiment_analysis.py |
sa_comments |
SA fetch button | sentiment_analysis.py |
sa_fetch_key |
SA fetch button | SA stale-check |
rr_df |
RR fetch button | reply_required.py |
rr_fetch_key |
RR fetch button | RR stale-check |
sentiment_page |
SA page / fetch | SA pagination |
reply_page |
RR page / fetch | RR pagination |
content_summaries |
SA AI buttons | SA AI analysis display |
helpscout_df |
app.py startup |
helpscout_dashboard.py (includes is_member), dashboard.py compact summary, helpscout_analysis.py member filter |
hs_analysis_df |
HS Analysis fetch | helpscout_analysis.py charts + cards |
hs_analysis_fetch_key |
HS Analysis fetch | HS Analysis stale-check |
hs_analysis_filter_desc |
HS Analysis fetch | human-readable filter string for PDF + agent |
hs_analysis_summary |
"Generate AI Summary" | HS Analysis summary renderer |
hs_analysis_summary_key |
"Generate AI Summary" | invalidated on re-fetch |
hs_analysis_page |
HS Analysis page / fetch | HS Analysis pagination |
Snowflake Queries
Comment tables
| Table | Platform | Notes |
|---|---|---|
SOCIAL_MEDIA_DB.ML_FEATURES.COMMENT_SENTIMENT_FEATURES |
facebook, instagram, youtube, twitter | Needs LEFT JOIN DIM_CONTENT for PERMALINK_URL |
SOCIAL_MEDIA_DB.ML_FEATURES.MUSORA_COMMENT_SENTIMENT_FEATURES |
musora_app | Has PERMALINK_URL and THUMBNAIL_URL natively |
HelpScout table
| Table | Notes |
|---|---|
SOCIAL_MEDIA_DB.ML_FEATURES.HELPSCOUT_CONVERSATION_FEATURES |
One row per conversation; multi-label topics in comma-separated TOPICS column |
Static queries (in viz_config.json)
| Key | Purpose |
|---|---|
dashboard_query |
Lightweight comment query β no text, no DIM_CONTENT join |
demographics_query |
Joins usora_users + preprocessed.users for age/timezone/experience |
helpscout.dashboard_query |
Lightweight HelpScout query (no SUMMARY/notes) |
helpscout.demographics_query |
Same demographics join, keyed on customer_email |
Dynamic queries (built in helpscout_data_loader.py)
| Method | Description |
|---|---|
_build_analysis_query() |
Full HelpScout query including SUMMARY/notes; multi-label topic filter via ARRAY_CONTAINS |
Authentication
Module: utils/auth.py
AUTHORIZED_EMAILSallowlist +APP_TOKENenv var.render_login_page()renders the login form and callsst.stop()when not authenticated.- Gate is placed at the top of
app.py(afterst.set_page_config, before data loaders). - Current user and logout button are shown in the sidebar.
Required env vars:
APP_TOKEN=<shared token>
PDF Reports
Comment Dashboard PDF (utils/pdf_exporter.py β DashboardPDFExporter)
Generated from the "Export PDF Report" expander at the top of the Dashboard page.
Sections: cover, executive summary, sentiment, brand, platform, intent, cross-dimensional, volume, reply requirements, demographics (optional), language (optional), HelpScout summary (if data loaded), data summary.
HelpScout Dashboard PDF (utils/helpscout_pdf.py β HelpScoutDashboardPDF)
Generated from the HelpScout Dashboard page. Sections: cover, KPI summary, sentiment, topics, flags & escalation, status & source, timelines, depth, member vs non-member (metrics + pie + sentiment bar + topic grouped bar), demographics.
HelpScout Analysis PDF (utils/helpscout_pdf.py β HelpScoutAnalysisPDF)
Generated from the "Export Analysis PDF" button on the HelpScout Analysis page (only available after an AI Summary has been generated).
Sections: cover, filter summary, KPI summary (including member/non-member counts when available), chart snapshots, member vs non-member breakdown (pie + sentiment bar + topic grouped bar), AI summary (executive summary, top themes, top complaints, unexpected insights, notable quotes), conversation cards sample, metadata.
Dependencies: fpdf2, kaleido (for Plotly PNG rendering at 3Γ scale).
AI Agents
ContentSummaryAgent (agents/content_summary_agent.py)
Summarises sampled comments for a single content item on the Sentiment Analysis page. Called per-content when the user clicks the AI analysis button. Results cached in st.session_state['content_summaries'].
HelpScoutSummaryAgent (agents/helpscout_summary_agent.py)
Produces a page-level executive report from the filtered HelpScout conversations by reading their pre-extracted SUMMARY fields through an LLM.
- Stratified sample by
sentiment_polarityβ capped atmax_conversations(default 300). - Builds aggregate context: sentiment breakdown, top topics, flag counts, average duration, then per-conversation summaries (capped at 250 chars each).
- Prompt asks the LLM to surface patterns beyond the pre-tagged topics/sentiments.
- Output structure:
{
"executive_summary": "...",
"top_themes": [{"theme": "...", "description": "...", "prevalence": "..."}],
"top_complaints": ["..."],
"unexpected_insights": ["..."],
"notable_quotes": ["..."]
}
- Uses
LLMHelper.get_structured_completion()with up to 3 retries.
Adding or Changing Things
Add a new chart to the Comment Dashboard
- Write the chart function in the appropriate
visualizations/file. - Call it from
render_dashboard()incomponents/dashboard.py.
Add a new chart to the HelpScout Dashboard
- Add the chart method to
HelpScoutChartsinvisualizations/helpscout_charts.py. - Call it from
render_helpscout_dashboard()incomponents/helpscout_dashboard.py.
Add a new HelpScout filter
- Add the widget to the filter panel in
helpscout_analysis.py. - Include the new value in the
fetch_keytuple. - Add the corresponding
WHEREclause condition to_build_analysis_query()inhelpscout_data_loader.py.
Python-side filters (those whose data is not in the Snowflake HelpScout table) are applied after fetching rather than in SQL. The member/non-member filter is the canonical example:
is_memberis derived fromst.session_state['helpscout_df']after the Snowflake fetch. Such filters should not be included in thefetch_keytuple.
Add a new HelpScout topic
- Edit
process_helpscout/config_files/topics.json(the taxonomy file). helpscout_utils.load_topic_taxonomy()reloads it on each app start; no other changes needed.
Change the cache duration
@st.cache_data(ttl=86400) appears on load_dashboard_data, _fetch_sa_data, _fetch_rr_data, load_demographics_data, and their HelpScout equivalents. Change 86400 to the desired TTL. Users can always force a refresh with "Reload Data" in the sidebar.
Add a new page
- Create
components/new_page.pywith arender_new_page(...)function. - Import and add a radio option in
app.py. - Add data loading to the appropriate loader class.
- If the page should be excluded from global comment filters, extend the
_hs_pageguard inapp.py.
Change what the Sentiment Analysis page queries
- Edit
_build_sa_content_query()and/or_build_sa_comments_query()indata_loader.py. - Update
_process_sa_content_stats()and/or_process_sa_comments()for new columns.
Running the App
# From the project root
streamlit run visualization/app.py
Required environment variables (in .env at project root):
SNOWFLAKE_USER
SNOWFLAKE_PASSWORD
SNOWFLAKE_ACCOUNT
SNOWFLAKE_ROLE
SNOWFLAKE_DATABASE
SNOWFLAKE_WAREHOUSE
SNOWFLAKE_SCHEMA
OPENAI_API_KEY
APP_TOKEN
Configuration Reference
config/viz_config.json controls:
| Section | What it configures |
|---|---|
color_schemes.sentiment_polarity |
Hex colors for each sentiment level |
color_schemes.intent |
Hex colors per intent label |
color_schemes.emotion |
Hex colors per emotion label |
color_schemes.platform |
Hex colors per platform |
color_schemes.brand |
Hex colors per brand |
color_schemes_helpscout.topics |
Hex colors for HelpScout topic bars |
color_schemes_helpscout.status |
Hex colors for conversation status values |
color_schemes_helpscout.boolean_flags |
Hex colors for refund/cancellation/membership flags |
sentiment_order |
Display order for sentiment categories |
intent_order |
Display order for intent categories |
emotion_order |
Display order for emotion categories |
negative_sentiments |
Which sentiment values count as "negative" |
dashboard.default_date_range_days |
Default date filter window for comment pages |
helpscout.default_date_range_days |
Default date filter window for HelpScout Analysis |
helpscout.max_summary_conversations |
Cap on conversations sent to LLM summary agent |
helpscout.escalation_sentiments |
Sentiment values that count as escalation |
snowflake.dashboard_query |
Lightweight comment dashboard query |
snowflake.demographics_query |
Demographics join query (comment pages) |
helpscout.dashboard_query |
Lightweight HelpScout dashboard query |
helpscout.demographics_query |
Demographics join query (HelpScout, keyed on email) |
demographics.age_groups |
Age bucket definitions (label β [min, max]) |
demographics.experience_groups |
Experience bucket definitions |
demographics.top_timezones_count |
How many timezones to show in the geographic chart |