Finance Model Agent

🤖 Automated Quantitative Trading & Iterative Model Evolution

📖 Background

In the realm of quantitative finance, both factor discovery and model development play crucial roles in driving performance. While much attention is often given to the discovery of new financial factors, the models that leverage these factors are equally important. The effectiveness of a quantitative strategy depends not only on the factors used but also on how well these factors are integrated into robust, predictive models.

However, developing and optimizing these models is labor-intensive and complex, and it requires continuous refinement as market conditions change. This is where the Finance Model Agent steps in.

🌟 Introduction

In this scenario, our automated system proposes hypotheses, constructs models, implements code, runs backtests, and uses the resulting feedback in a continuous, iterative process.

The goal is to automatically optimize performance metrics within the Qlib library, ultimately discovering the most efficient code through autonomous research and development.

The steps are outlined below:

Step 1 : Hypothesis Generation 🔍

Generate and propose initial hypotheses based on previous experiment analysis and domain expertise, with thorough reasoning and financial justification.

Step 2 : Model Creation ✨

Transform the hypothesis into a task.
Develop, define, and implement a quantitative model, including its name, description, and formulation.
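
A minimal sketch of how the output of this step could be represented, assuming a plain dataclass. The class name `ModelTaskSketch` and the example values are hypothetical; they are not RD-Agent's actual types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTaskSketch:
    """Hypothetical container for the output of Step 2 (not an RD-Agent class)."""

    name: str         # short identifier for the model
    description: str  # what the model does and why it fits the hypothesis
    formulation: str  # mathematical form, here kept as a LaTeX string

    def summary(self) -> str:
        """Render the task as a one-line, human-readable summary."""
        return f"{self.name}: {self.description} [{self.formulation}]"


# Illustrative values only; they do not come from a real experiment.
task = ModelTaskSketch(
    name="GRUBaseline",
    description="Recurrent model over a window of daily factor values",
    formulation=r"h_t = \mathrm{GRU}(x_t, h_{t-1}),\ \hat{y}_t = W h_t + b",
)
print(task.summary())
```

Keeping the three fields together makes it straightforward to hand the same object to the implementation step that follows.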

Step 3 : Model Implementation 👨‍💻

Implement the model code based on the detailed description.
Evolve the model iteratively as a developer would, ensuring accuracy and efficiency.

Step 4 : Backtesting with Qlib 📉

Conduct backtesting using the newly developed model and 20 factors extracted from Alpha158 in Qlib.
Evaluate the model’s effectiveness and performance.

| Dataset | Model | Factors | Data Split |
| --- | --- | --- | --- |
| CSI300 | RDAgent-dev | 20 factors (Alpha158) | Train: 2008-01-01 to 2014-12-31; Valid: 2015-01-01 to 2016-12-31; Test: 2017-01-01 to 2020-08-01 |
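
The split above is strictly chronological, which keeps test-period information out of training. The following sketch checks that property using only the standard library; the helper name `check_segments` is an assumption made for illustration and is not part of Qlib or RD-Agent.

```python
from datetime import date

# Segment boundaries copied from the data split above (inclusive dates).
SEGMENTS = {
    "train": (date(2008, 1, 1), date(2014, 12, 31)),
    "valid": (date(2015, 1, 1), date(2016, 12, 31)),
    "test": (date(2017, 1, 1), date(2020, 8, 1)),
}


def check_segments(segments):
    """Return True if every segment is well-formed and starts after the previous one ends."""
    previous_end = None
    for name in ("train", "valid", "test"):
        start, end = segments[name]
        if start > end:
            return False  # segment ends before it starts
        if previous_end is not None and start <= previous_end:
            return False  # segment overlaps the one before it
        previous_end = end
    return True


print(check_segments(SEGMENTS))
```

A check like this is cheap to run whenever the dates are overridden through configuration.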

Step 5 : Feedback Analysis 🔍

Analyze backtest results to assess performance.
Incorporate feedback to refine hypotheses and improve the model.

Step 6 : Hypothesis Refinement ♻️

Refine hypotheses based on feedback from backtesting.
Repeat the process to continuously improve the model.
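
The six steps can be read as one loop. The sketch below shows only the control flow: every function and class name is hypothetical, the backtest score is a placeholder, and a real run would rely on an LLM for proposals and on Qlib for backtesting.

```python
from dataclasses import dataclass, field


@dataclass
class LoopState:
    """Hypothetical record of what the loop has learned so far."""

    history: list = field(default_factory=list)  # (hypothesis, score) pairs


def propose_hypothesis(state):
    # Steps 1 and 6: a real agent would reason over past feedback here.
    return f"hypothesis-{len(state.history) + 1}"


def build_and_implement(hypothesis):
    # Steps 2 and 3: turn the hypothesis into a task, then into model code.
    return {"hypothesis": hypothesis, "code": f"# model for {hypothesis}"}


def backtest(model, round_index):
    # Step 4: placeholder score; a real run would execute the model in Qlib.
    return 0.01 * round_index


def run_loop(n_rounds):
    """Run the propose, implement, backtest, feedback cycle n_rounds times."""
    state = LoopState()
    for round_index in range(1, n_rounds + 1):
        hypothesis = propose_hypothesis(state)
        model = build_and_implement(hypothesis)
        score = backtest(model, round_index)
        # Step 5: record feedback so the next proposal can build on it.
        state.history.append((hypothesis, score))
    return state


state = run_loop(3)
print([hypothesis for hypothesis, _ in state.history])
```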

⚡ Quick Start

Please refer to the installation part in Installation and Configuration to prepare your system dependencies.

You can try our demo by running the following commands:

🐍 Create a Conda Environment
- Create a new conda environment with Python (3.10 and 3.11 are well tested in our CI):
```
conda create -n rdagent python=3.10
```
- Activate the environment:
```
conda activate rdagent
```
📦 Install the RDAgent
- You can install the RDAgent package from PyPI:
```
pip install rdagent
```
🚀 Run the Application
- You can directly run the application by using the following command:
```
rdagent fin_model
```

🛠️ Usage of modules

Env Config

The following environment variables can be set in the .env file to customize the application’s behavior:

pydantic settings rdagent.app.qlib_rd_loop.conf.ModelBasePropSetting

JSON schema:

{
   "title": "ModelBasePropSetting",
   "type": "object",
   "properties": {
      "scen": {
         "default": "rdagent.scenarios.qlib.experiment.model_experiment.QlibModelScenario",
         "title": "Scen",
         "type": "string"
      },
      "knowledge_base": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Knowledge Base"
      },
      "knowledge_base_path": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Knowledge Base Path"
      },
      "hypothesis_gen": {
         "default": "rdagent.scenarios.qlib.proposal.model_proposal.QlibModelHypothesisGen",
         "title": "Hypothesis Gen",
         "type": "string"
      },
      "interactor": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Interactor"
      },
      "hypothesis2experiment": {
         "default": "rdagent.scenarios.qlib.proposal.model_proposal.QlibModelHypothesis2Experiment",
         "title": "Hypothesis2Experiment",
         "type": "string"
      },
      "coder": {
         "default": "rdagent.scenarios.qlib.developer.model_coder.QlibModelCoSTEER",
         "title": "Coder",
         "type": "string"
      },
      "runner": {
         "default": "rdagent.scenarios.qlib.developer.model_runner.QlibModelRunner",
         "title": "Runner",
         "type": "string"
      },
      "summarizer": {
         "default": "rdagent.scenarios.qlib.developer.feedback.QlibModelExperiment2Feedback",
         "title": "Summarizer",
         "type": "string"
      },
      "evolving_n": {
         "default": 10,
         "title": "Evolving N",
         "type": "integer"
      },
      "train_start": {
         "default": "2008-01-01",
         "title": "Train Start",
         "type": "string"
      },
      "train_end": {
         "default": "2014-12-31",
         "title": "Train End",
         "type": "string"
      },
      "valid_start": {
         "default": "2015-01-01",
         "title": "Valid Start",
         "type": "string"
      },
      "valid_end": {
         "default": "2016-12-31",
         "title": "Valid End",
         "type": "string"
      },
      "test_start": {
         "default": "2017-01-01",
         "title": "Test Start",
         "type": "string"
      },
      "test_end": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": "2020-08-01",
         "title": "Test End"
      }
   },
   "additionalProperties": false
}

Config:

env_prefix: str = QLIB_MODEL_
protected_namespaces: tuple = ()
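
With `env_prefix` set to `QLIB_MODEL_`, each field is read from an environment variable named after the prefix plus the field name, so `evolving_n` corresponds to `QLIB_MODEL_EVOLVING_N`. The sketch below emulates that mapping with the standard library only; it is not RD-Agent's implementation, and the helper `load_settings` is hypothetical.

```python
import os

ENV_PREFIX = "QLIB_MODEL_"  # prefix shown in the Config block above

# Defaults copied from the schema above (only a subset is shown).
DEFAULTS = {
    "evolving_n": 10,
    "train_start": "2008-01-01",
    "test_end": "2020-08-01",
}


def load_settings(environ=None):
    """Overlay prefixed environment variables on the defaults (stdlib emulation)."""
    environ = os.environ if environ is None else environ
    settings = dict(DEFAULTS)
    for name, default in DEFAULTS.items():
        raw = environ.get(ENV_PREFIX + name.upper())
        if raw is not None:
            # Cast to the type of the default so that "5" becomes the integer 5.
            settings[name] = type(default)(raw)
    return settings


# Equivalent to putting QLIB_MODEL_EVOLVING_N=5 in the .env file.
print(load_settings({"QLIB_MODEL_EVOLVING_N": "5"}))
```

Variables without the prefix are ignored, which keeps these settings separate from those of other scenarios.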

field coder: str = 'rdagent.scenarios.qlib.developer.model_coder.QlibModelCoSTEER'
    Coder class

field evolving_n: int = 10
    Number of evolutions

field hypothesis2experiment: str = 'rdagent.scenarios.qlib.proposal.model_proposal.QlibModelHypothesis2Experiment'
    Hypothesis to experiment class

field hypothesis_gen: str = 'rdagent.scenarios.qlib.proposal.model_proposal.QlibModelHypothesisGen'
    Hypothesis generation class

field runner: str = 'rdagent.scenarios.qlib.developer.model_runner.QlibModelRunner'
    Runner class

field scen: str = 'rdagent.scenarios.qlib.experiment.model_experiment.QlibModelScenario'
    Scenario class for Qlib Model

field summarizer: str = 'rdagent.scenarios.qlib.developer.feedback.QlibModelExperiment2Feedback'
    Summarizer class

field test_end: str | None = '2020-08-01'
    End date of the test / backtest segment

field test_start: str = '2017-01-01'
    Start date of the test / backtest segment

field train_end: str = '2014-12-31'
    End date of the training segment

field train_start: str = '2008-01-01'
    Start date of the training segment

field valid_end: str = '2016-12-31'
    End date of the validation segment

field valid_start: str = '2015-01-01'
    Start date of the validation segment
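
Several of these fields hold a class as a dotted path string, which means a component can be swapped by changing a setting rather than editing code. A minimal sketch of how such a path can be resolved with `importlib` follows; `resolve_class` is a hypothetical helper, and RD-Agent may resolve these strings differently.

```python
import importlib


def resolve_class(dotted_path):
    """Import 'package.module.ClassName' and return the class object."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.ClassName', got {dotted_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Demonstrated with a standard-library class so the snippet runs without RD-Agent installed.
ordered_dict_cls = resolve_class("collections.OrderedDict")
print(ordered_dict_cls.__name__)
```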

Qlib Config
- The config.yaml file located in the model_template folder contains the relevant configurations for running the developed model in Qlib. The default settings include key information such as:
  
  market: Specifies the market, which is set to csi300.
  
  fields_group: Defines the fields group, with the value feature.
  
  col_list: A list of columns used, including various indicators such as RESI5, WVMA5, RSQR5, and others.
  
  start_time: The start date for the data, set to 2008-01-01.
  
  end_time: The end date for the data, set to 2020-08-01.
  
  fit_start_time: The start date for fitting the model, set to 2008-01-01.
  
  fit_end_time: The end date for fitting the model, set to 2014-12-31.
- The default hyperparameters used in the configuration are as follows:
  
  n_epochs: The number of epochs, set to 100.
  
  lr: The learning rate, set to 1e-3.
  
  early_stop: The early stopping criterion, set to 10.
  
  batch_size: The batch size, set to 2000.
  
  metric: The evaluation metric, set to loss.
  
  loss: The loss function, set to mse.
  
  n_jobs: The number of parallel jobs, set to 20.
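
For reference, the defaults listed above can be collected in one place. In the sketch below the values come from the list, while the two-level nesting and the helper `fit_window_inside_data` are illustrative assumptions and do not reflect the actual layout of config.yaml.

```python
from datetime import date

# Values as described in the list above; the nesting is an assumption for illustration.
QLIB_DEFAULTS = {
    "data": {
        "market": "csi300",
        "fields_group": "feature",
        "start_time": "2008-01-01",
        "end_time": "2020-08-01",
        "fit_start_time": "2008-01-01",
        "fit_end_time": "2014-12-31",
    },
    "hyperparameters": {
        "n_epochs": 100,
        "lr": 1e-3,
        "early_stop": 10,
        "batch_size": 2000,
        "metric": "loss",
        "loss": "mse",
        "n_jobs": 20,
    },
}


def fit_window_inside_data(data):
    """Check that the fitting window lies within the overall data window."""
    start = date.fromisoformat(data["start_time"])
    end = date.fromisoformat(data["end_time"])
    fit_start = date.fromisoformat(data["fit_start_time"])
    fit_end = date.fromisoformat(data["fit_end_time"])
    return start <= fit_start <= fit_end <= end


print(fit_window_inside_data(QLIB_DEFAULTS["data"]))
```

Note that the fitting window matches the training segment (2008-01-01 to 2014-12-31), so anything fitted on that window sees only training data.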