bbstrader.models package
Module contents
Overview
The Models Module provides a collection of quantitative models for financial analysis and decision-making. It includes tools for portfolio optimization and natural language processing (NLP) to extract insights from financial text data. This module is designed to support quantitative trading strategies by providing a robust framework for financial modeling.
Features
Portfolio Optimization: Implements techniques to optimize portfolio allocation, helping to maximize returns and manage risk.
Natural Language Processing (NLP): Provides tools for analyzing financial news and other text-based data to gauge market sentiment.
Extensible Design: Structured to allow for the easy addition of new quantitative models and algorithms.
Components
Optimization: Contains portfolio optimization models and related utilities.
NLP: Includes tools and models for natural language processing tailored for financial applications.
Examples
>>> from bbstrader.models import optimized_weights
>>> # Assuming 'returns' is a DataFrame of asset returns
>>> optimal_weights = optimized_weights(returns=returns)
>>> print(optimal_weights)
Notes
This module is focused on providing the analytical tools for quantitative analysis. The models can be integrated into trading strategies to provide data-driven signals.
Submodules
bbstrader.models.nlp module
- class bbstrader.models.nlp.SentimentAnalyzer
Bases: object
A financial sentiment analysis tool that processes and analyzes sentiment from news articles, social media posts, and financial reports.
This class uses NLP techniques to preprocess text and applies sentiment analysis with VADER (SentimentIntensityAnalyzer), optionally using TextBlob for enhanced polarity scoring.
- analyze_sentiment(texts, lexicon=None, textblob=False) → float
Analyzes the sentiment of a list of texts using VADER or TextBlob.
Steps:
1. If a custom lexicon is provided, updates the VADER lexicon.
2. If textblob is True, computes sentiment using TextBlob.
3. Otherwise, preprocesses the text and computes sentiment using VADER.
4. Returns the average sentiment score of all input texts.
- Parameters:
texts (list of str) – A list of text inputs to analyze.
lexicon (dict, optional) – A custom sentiment lexicon to update VADER’s default lexicon.
textblob (bool, optional) – If True, uses TextBlob for sentiment analysis instead of VADER.
- Returns:
- The average sentiment score across all input texts.
Positive values indicate positive sentiment.
Negative values indicate negative sentiment.
Zero indicates neutral sentiment.
- Return type:
float
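The lexicon update and averaging steps can be sketched with a toy scorer. This is a simplified stand-in, not the class's implementation: the hypothetical BASE_LEXICON replaces VADER's built-in lexicon, tokenization is a plain whitespace split, and the TextBlob branch (step 2) and the preprocessing in step 3 are left out. The squashing step follows the normalization VADER applies to its compound score, x / sqrt(x² + 15), which keeps each per-text score inside (-1, 1).

```python
import math
from statistics import mean

# Hypothetical toy lexicon standing in for VADER's much larger built-in one.
BASE_LEXICON = {"gain": 2.0, "strong": 1.5, "loss": -2.0, "weak": -1.5}


def toy_compound(text, lexicon):
    """Sum word valences, then squash the total into (-1, 1) like VADER's compound score."""
    raw = sum(lexicon.get(word, 0.0) for word in text.lower().split())
    return raw / math.sqrt(raw * raw + 15)  # alpha = 15, as in VADER's normalization


def average_sentiment(texts, lexicon=None):
    """Step 1: merge a custom lexicon; steps 3-4: score each text and average."""
    merged = dict(BASE_LEXICON)
    if lexicon:
        merged.update(lexicon)  # custom entries extend or override the defaults
    if not texts:
        return 0.0  # this sketch treats an empty input as neutral
    return mean(toy_compound(text, merged) for text in texts)


print(average_sentiment(["Strong gain this quarter", "Weak guidance"]))
```

Passing lexicon={"moon": 3.0} makes a previously neutral phrase such as "to the moon" score positive, which is the effect the lexicon parameter is meant to have.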
- display_sentiment_dashboard(tickers, asset_type='stock', lexicon=None, interval=100000, top_n=20, **kwargs)
Creates and runs a real-time sentiment analysis dashboard for financial assets.
The dashboard visualizes sentiment scores for given tickers using interactive bar and scatter plots. It fetches new sentiment data at specified intervals.
- Parameters:
tickers (list of str or list of tuple) – A list of financial asset tickers to analyze.
* If using tuples, the first element is the ticker and the second is the asset type.
* If using strings, the asset type is taken from asset_type, which defaults to “stock”.
asset_type (str, optional) – The type of financial asset (“stock”, “forex”, “crypto”). Default is “stock”.
lexicon (dict, optional) – A custom sentiment lexicon. Default is None.
interval (int, optional) – The refresh interval (in milliseconds) for sentiment data updates. Default is 100000.
top_n (int, optional) – The number of top and bottom assets to display in the sentiment bar chart. Default is 20.
**kwargs (dict) – Additional arguments required for fetching sentiment data. Must include:
* client_id (str): Reddit API client ID.
* client_secret (str): Reddit API client secret.
* user_agent (str): User agent for the Reddit API.
* fmp_api (str): Financial Modeling Prep (FMP) API key.
- Returns:
Starts a real-time interactive dashboard. Does not return any value.
- Return type:
None
Example
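from bbstrader.models.nlp import SentimentAnalyzer

# Example custom VADER lexicon entries (word -> valence); values are illustrative.
my_lexicon = {"bullish": 2.0, "bearish": -2.0}

sa = SentimentAnalyzer()
sa.display_sentiment_dashboard(
    tickers=["AAPL", "TSLA", "GOOGL"],
    asset_type="stock",
    lexicon=my_lexicon,
    display=True,
    interval=5000,  # refresh every 5 seconds (interval is in milliseconds)
    top_n=10,
    client_id="your_reddit_id",
    client_secret="your_reddit_secret",
    user_agent="your_user_agent",
    fmp_api="your_fmp_api_key",
)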
Notes
Sentiment analysis is performed using financial news and social media discussions.
The dashboard updates in real-time at the specified interval.
The dashboard will keep running unless manually stopped (Ctrl+C).
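The tickers parameter accepts either plain strings, whose asset type comes from asset_type, or (ticker, asset_type) tuples. A minimal sketch of how both forms can be normalized into uniform pairs before fetching data; normalize_tickers is a hypothetical helper for illustration, not part of bbstrader.

```python
def normalize_tickers(tickers, asset_type="stock"):
    """Turn plain tickers or (ticker, asset_type) tuples into uniform (ticker, type) pairs."""
    pairs = []
    for item in tickers:
        if isinstance(item, tuple):
            ticker, kind = item  # tuple form carries its own asset type
        else:
            ticker, kind = item, asset_type  # plain string falls back to asset_type
        pairs.append((ticker, kind))
    return pairs


print(normalize_tickers(["AAPL", "MSFT"]))
# [('AAPL', 'stock'), ('MSFT', 'stock')]
print(normalize_tickers([("BTC-USD", "crypto"), ("EURUSD=X", "forex")]))
# [('BTC-USD', 'crypto'), ('EURUSD=X', 'forex')]
```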
- get_sentiment_for_tickers(tickers: List[str] | List[Tuple[str, str]], lexicon=None, asset_type='stock', top_news=10, **kwargs) → Dict[str, float]
Compute sentiment scores for a list of financial tickers based on news and social media data.
Process:
1. Collect news articles and posts related to each ticker from various sources:
* Yahoo Finance News
* Google Finance News
* Reddit posts
* Financial Modeling Prep (FMP) news
2. Analyze sentiment from each source:
* Uses VADER for Yahoo and Google Finance news.
* Uses TextBlob for Reddit and FMP news.
3. Compute an overall sentiment score using a weighted average approach.
- Parameters:
tickers (list of str or list of tuple) – A list of asset tickers to analyze.
* If using tuples, the first element is the ticker and the second is the asset type.
* If using strings, the asset type is taken from asset_type, which defaults to “stock”.
lexicon (dict, optional) – A custom sentiment lexicon to update VADER’s default lexicon. Default is None.
asset_type (str, optional) – The type of asset. Default is “stock”. Supported types include:
* “stock”: Stock symbols (e.g., AAPL, MSFT)
* “etf”: Exchange-traded funds (e.g., SPY, QQQ)
* “future”: Futures contracts (e.g., CL=F for crude oil)
* “forex”: Forex pairs (e.g., EURUSD=X, USDJPY=X)
* “crypto”: Cryptocurrency pairs (e.g., BTC-USD, ETH-USD)
* “index”: Stock market indices (e.g., ^GSPC for S&P 500)
top_news (int, optional) – Number of news articles/posts to fetch per source. Default is 10.
**kwargs (dict) – Additional parameters for API authentication and data retrieval. Must include:
* fmp_api (str): API key for Financial Modeling Prep.
* client_id, client_secret, user_agent (str): Credentials for the Reddit API.
- Returns:
A dictionary mapping each ticker to its overall sentiment score.
* Positive values indicate positive sentiment.
* Negative values indicate negative sentiment.
* Zero indicates neutral sentiment.
- Return type:
dict of str to float
Notes
Ticker names must follow Yahoo Finance conventions.
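Step 3 of the process combines the per-source scores into a single number. The documentation does not state the weights, so the SOURCE_WEIGHTS values below are hypothetical placeholders; the sketch also assumes that a source returning no articles (None) is dropped and the remaining weights are renormalized.

```python
# Hypothetical weights; the real ones are not specified in this documentation.
SOURCE_WEIGHTS = {"yahoo": 0.3, "google": 0.3, "reddit": 0.2, "fmp": 0.2}


def combine_sources(scores, weights=SOURCE_WEIGHTS):
    """Weighted average of per-source sentiment, skipping sources with no data (None)."""
    available = {src: s for src, s in scores.items() if s is not None}
    total_weight = sum(weights[src] for src in available)
    if total_weight == 0:
        return 0.0  # no usable source: report neutral sentiment
    return sum(weights[src] * s for src, s in available.items()) / total_weight


scores = {"yahoo": 0.4, "google": 0.2, "reddit": None, "fmp": -0.1}
print(round(combine_sources(scores), 4))  # 0.2
```

Renormalizing over the available sources keeps a missing feed from pulling the overall score toward zero.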
- get_topn_sentiments(sentiments, topn=10)
Retrieves the top and bottom N assets based on sentiment scores.
- Parameters:
sentiments (dict) – A dictionary mapping asset tickers to their sentiment scores.
topn (int, optional) – The number of top and bottom assets to return. Defaults to 10.
- Returns:
- A tuple containing two lists:
bottom (list of tuples): The topn assets with the lowest sentiment scores, sorted in ascending order.
top (list of tuples): The topn assets with the highest sentiment scores, sorted in descending order.
- Return type:
tuple
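The ranking described above amounts to two sorts over the dictionary items. A minimal sketch under the same (bottom, top) return convention; topn_sentiments is an illustrative name rather than the library method, and ties keep their insertion order because Python's sort is stable.

```python
def topn_sentiments(sentiments, topn=10):
    """Return (bottom, top): lowest scores ascending, highest scores descending."""
    bottom = sorted(sentiments.items(), key=lambda item: item[1])[:topn]
    top = sorted(sentiments.items(), key=lambda item: item[1], reverse=True)[:topn]
    return bottom, top


scores = {"AAPL": 0.6, "TSLA": -0.4, "GOOGL": 0.1, "MSFT": 0.3}
bottom, top = topn_sentiments(scores, topn=2)
print(bottom)  # [('TSLA', -0.4), ('GOOGL', 0.1)]
print(top)     # [('AAPL', 0.6), ('MSFT', 0.3)]
```

When there are fewer than 2 × topn assets, the two lists overlap, since each is taken independently from the full set.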
- preprocess_text(text: str)
Preprocesses the input text by performing the following steps:
1. Converts text to lowercase.
2. Removes URLs.
3. Removes all non-alphabetic characters (punctuation, numbers, special symbols).
4. Tokenizes the text into words.
5. Removes stop words.
6. Lemmatizes the words using spaCy, excluding pronouns.
- Parameters:
text (str) – The input text to preprocess.
- Returns:
The cleaned and lemmatized text.
- Return type:
str
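Steps 1 to 5 can be approximated with the standard library alone. This sketch is an assumption-laden stand-in: STOP_WORDS is a tiny hypothetical list, tokenization is a whitespace split, the regex keeps only ASCII letters, and step 6 (spaCy lemmatization) is omitted because it needs a loaded spaCy model.

```python
import re

# Tiny illustrative stop-word list (an assumption, far shorter than a real one).
STOP_WORDS = {"the", "a", "an", "is", "are", "and", "of", "to", "in", "on", "for"}


def simple_preprocess(text):
    """Approximate steps 1-5 of preprocess_text; step 6 (lemmatization) is skipped."""
    text = text.lower()                                 # 1. lowercase
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # 2. strip URLs
    text = re.sub(r"[^a-z\s]", " ", text)               # 3. keep letters and whitespace only
    tokens = text.split()                               # 4. whitespace tokenization
    return " ".join(t for t in tokens if t not in STOP_WORDS)  # 5. drop stop words


print(simple_preprocess("AAPL is up 5% today! Read more: https://example.com/news"))
# aapl up today read more
```

URLs are removed before the non-letter filter; in the opposite order, the letters of a URL would survive as stray tokens such as "https" and "example".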
- visualize_sentiments(sentiment_dict, mode='bar', top_n=10)
Visualizes sentiment scores for financial assets using different chart types.
Visualization modes:
- “bar”: Displays a bar chart of the top N assets by sentiment score.
- “scatter”: Displays a scatter plot of sentiment scores.
- Parameters:
sentiment_dict (dict) – A dictionary mapping asset tickers to their sentiment scores.
mode (str, optional) – The type of visualization to generate. Options: “bar” (default), “scatter”.
top_n (int, optional) – The number of top tickers to display in the bar chart. Only applicable when mode is “bar”.
- Returns:
Displays the sentiment visualization.
- Return type:
None
bbstrader.models.optimization module
- bbstrader.models.optimization.equal_weighted(prices=None, returns=None, round_digits=5)
Generates an equal-weighted portfolio by assigning an equal proportion to each asset.
- Parameters:
prices (pd.DataFrame, optional) – Price data for assets, where each column represents an asset.
returns (pd.DataFrame, optional) – Return data for assets. One of prices or returns must be provided.
round_digits (int, optional) – Number of decimal places to round each weight to (default is 5).
- Returns:
Dictionary with equal weights assigned to each asset, summing to 1.
- Return type:
dict
- Raises:
ValueError – If neither prices nor returns are provided.
Notes
Equal weighting is a simple allocation method that assumes equal importance across all assets, useful as a baseline model and when no strong views exist on asset return expectations or risk.
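The 1/n rule itself fits in a few lines. This sketch takes a list of asset names (in equal_weighted these would come from the DataFrame columns); equal_weights is an illustrative name. It also shows a side effect of rounding in this sketch: with three assets each weight rounds to 0.33333, so the rounded weights add up to slightly less than 1.

```python
def equal_weights(assets, round_digits=5):
    """Assign 1/n to each asset, rounded to round_digits decimal places."""
    if not assets:
        raise ValueError("at least one asset is required")
    weight = round(1.0 / len(assets), round_digits)
    return {asset: weight for asset in assets}


w = equal_weights(["AAPL", "MSFT", "GOOGL"])
print(w)  # {'AAPL': 0.33333, 'MSFT': 0.33333, 'GOOGL': 0.33333}
print(sum(w.values()))
```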
- bbstrader.models.optimization.hierarchical_risk_parity(prices=None, returns=None, freq=252)
Computes asset weights using Hierarchical Risk Parity (HRP) for risk-averse portfolio allocation.
- Parameters:
prices (pd.DataFrame, optional) – Price data for assets; if provided, daily returns will be calculated.
returns (pd.DataFrame, optional) – Daily returns for assets. One of prices or returns must be provided.
freq (int, optional) – Number of days to consider in calculating portfolio weights (default is 252).
- Returns:
Optimized asset weights using the HRP method, with asset weights summing to 1.
- Return type:
dict
- Raises:
ValueError – If neither prices nor returns are provided.
Notes
Hierarchical Risk Parity is particularly useful for portfolios with a large number of assets, as it mitigates issues of multicollinearity and estimation errors in covariance matrices by using hierarchical clustering.
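HRP has three stages: hierarchical clustering on a correlation-based distance, quasi-diagonal reordering of the covariance matrix, and recursive bisection. The sketch below covers only the last stage, assuming the asset ordering produced by the first two is already known (the order argument) and the covariance matrix is a nested list. It follows the textbook recursive-bisection procedure and is not taken from bbstrader's implementation.

```python
def cluster_variance(cov, idx):
    """Variance of a cluster when its members are weighted by inverse variance."""
    inv = [1.0 / cov[i][i] for i in idx]
    total = sum(inv)
    w = [x / total for x in inv]
    return sum(
        w[a] * w[b] * cov[i][j]
        for a, i in enumerate(idx)
        for b, j in enumerate(idx)
    )


def recursive_bisection(cov, order):
    """HRP allocation step: halve the ordered assets repeatedly and split risk top-down."""
    weights = {i: 1.0 for i in order}
    clusters = [list(order)]
    while clusters:
        # Split every cluster that has more than one asset into two halves.
        clusters = [
            half
            for c in clusters
            if len(c) > 1
            for half in (c[: len(c) // 2], c[len(c) // 2 :])
        ]
        # Halves come in consecutive pairs; share the parent's weight between them.
        for k in range(0, len(clusters), 2):
            left, right = clusters[k], clusters[k + 1]
            v_left, v_right = cluster_variance(cov, left), cluster_variance(cov, right)
            alpha = 1.0 - v_left / (v_left + v_right)  # lower-variance half gets more weight
            for i in left:
                weights[i] *= alpha
            for i in right:
                weights[i] *= 1.0 - alpha
    return weights


cov = [[0.04, 0.0], [0.0, 0.01]]
w = recursive_bisection(cov, [0, 1])
print({k: round(v, 4) for k, v in w.items()})  # {0: 0.2, 1: 0.8}
```

Because each split only compares two cluster variances, the method never inverts the full covariance matrix, which is where its robustness to estimation error comes from.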
- bbstrader.models.optimization.markowitz_weights(prices=None, rfr=0.0, freq=252)
Calculates optimal portfolio weights using Markowitz’s mean-variance optimization (Max Sharpe Ratio) with multiple solvers.
- Parameters:
prices (pd.DataFrame, optional) – Price data for assets, where rows represent time periods and columns represent assets.
rfr (float, optional) – Risk-free rate used in the Sharpe ratio calculation (default is 0.0).
freq (int, optional) – Frequency of the data, such as 252 for daily returns in a year (default is 252).
- Returns:
Dictionary containing the optimal asset weights for maximizing the Sharpe ratio, normalized to sum to 1.
- Return type:
dict
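Max-Sharpe optimization searches for the weights w that maximize (w·μ − r_f) / sqrt(wᵀΣw), where r_f corresponds to the rfr argument. The sketch below only evaluates that ratio for a few fixed weight vectors in a toy two-asset case (hypothetical numbers); it does not perform the solver-based search the function runs, but it shows why a mix can beat either asset alone.

```python
import math


def sharpe_ratio(weights, mean_returns, cov, rfr=0.0):
    """(w . mu - rfr) / sqrt(w' cov w): the quantity max-Sharpe optimization maximizes."""
    n = len(weights)
    port_return = sum(weights[i] * mean_returns[i] for i in range(n))
    port_var = sum(weights[i] * weights[j] * cov[i][j] for i in range(n) for j in range(n))
    return (port_return - rfr) / math.sqrt(port_var)


mu = [0.10, 0.20]                 # hypothetical annualized expected returns
cov = [[0.04, 0.0], [0.0, 0.09]]  # hypothetical annualized covariance matrix
for w in ([1.0, 0.0], [0.5, 0.5], [0.0, 1.0]):
    print(w, round(sharpe_ratio(w, mu, cov, rfr=0.02), 3))
```

With these inputs the 50/50 mix scores about 0.72, above both single-asset portfolios (0.4 and 0.6), because the uncorrelated assets partly diversify each other's risk.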
Notes
This function attempts to maximize the Sharpe ratio by trying several solvers (‘SCS’, ‘ECOS’, ‘OSQP’) from the PyPortfolioOpt library in turn. If a solver fails, an error message is printed and the next solver is tried.
This function is best suited to portfolios with a small number of assets, as it may not scale well to large portfolios.
- Raises:
Exception – If all solvers fail. An error message is printed for each solver that fails during runtime.
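The solver fallback described in the Notes follows a common try-in-order pattern. A minimal sketch with stand-in solver callables: failing_solver and working_solver are hypothetical and call neither PyPortfolioOpt nor any real solver, and raising RuntimeError once every solver has failed is this sketch's own choice.

```python
def solve_with_fallback(solvers):
    """Try each (name, solve) pair in order and return the first successful result."""
    for name, solve in solvers:
        try:
            return solve()
        except Exception as exc:  # a failing solver should not stop the search
            print(f"Solver {name} failed: {exc}")
    raise RuntimeError("all solvers failed")


def failing_solver():
    """Hypothetical solver that never converges."""
    raise ValueError("did not converge")


def working_solver():
    """Hypothetical solver that returns a fixed weight dictionary."""
    return {"AAPL": 0.6, "MSFT": 0.4}


result = solve_with_fallback([("SCS", failing_solver), ("ECOS", working_solver)])
print(result)
# Solver SCS failed: did not converge
# {'AAPL': 0.6, 'MSFT': 0.4}
```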
- bbstrader.models.optimization.optimized_weights(prices=None, returns=None, rfr=0.0, freq=252, method='equal')
Selects an optimization method to calculate portfolio weights based on user preference.
- Parameters:
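prices (pd.DataFrame, optional) – Price data for assets; required for ‘markowitz’ and also accepted by ‘hrp’ and ‘equal’.
returns (pd.DataFrame, optional) – Return data for assets; an alternative to prices for ‘hrp’ and ‘equal’.
rfr (float, optional) – Risk-free rate passed to the ‘markowitz’ method (default is 0.0).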
freq (int, optional) – Number of days for calculating portfolio weights, such as 252 for a year’s worth of daily returns (default is 252).
method (str, optional) – Optimization method to use (‘markowitz’, ‘hrp’, or ‘equal’) (default is ‘equal’).
- Returns:
Dictionary containing optimized asset weights based on the chosen method.
- Return type:
dict
- Raises:
ValueError – If an unknown optimization method is specified.
Notes
This function integrates different optimization methods:
- ‘markowitz’: mean-variance optimization with max Sharpe ratio
- ‘hrp’: Hierarchical Risk Parity, for risk-based clustering of assets
- ‘equal’: Equal weighting across all assets
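Selecting a method by name and rejecting unknown names maps naturally onto a dictionary lookup. A minimal sketch of that dispatch pattern: METHODS and select_weights are hypothetical names, and only an equal-weight stand-in is registered here; in a full version the ‘markowitz’ and ‘hrp’ entries would point to their own functions.

```python
def equal_weights(assets):
    """Stand-in for the 'equal' method: 1/n per asset."""
    return {a: 1.0 / len(assets) for a in assets}


# Hypothetical registry; only the equal-weight stand-in is implemented in this sketch.
METHODS = {"equal": equal_weights}


def select_weights(method="equal", **inputs):
    """Look up the requested method and forward the inputs; unknown names raise ValueError."""
    if method not in METHODS:
        raise ValueError(f"Unknown optimization method: {method!r}")
    return METHODS[method](**inputs)


print(select_weights(assets=["AAPL", "MSFT"]))  # {'AAPL': 0.5, 'MSFT': 0.5}
```

Keeping the valid names in one registry means adding a method is a single new entry, and the ValueError check stays in sync automatically.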