Version: 2.0.x

Data loading FAQ

1. What do I do if the "close index channel failed" or "too many tablet versions" error occurs?

You ran load jobs too frequently, and the data was not compacted in a timely manner. As a result, the number of data versions generated during loading exceeds the maximum number of data versions allowed (1000 by default). Use one of the following methods to resolve this issue:

Increase the amount of data loaded in each individual job, thereby reducing the load frequency.
Modify the following configuration items in the BE configuration file be.conf of each BE to accelerate compaction:
- For Duplicate Key tables, Aggregate tables, and Unique Key tables, you can appropriately increase the values of cumulative_compaction_num_threads_per_disk, base_compaction_num_threads_per_disk, and cumulative_compaction_check_interval_seconds. Example:
```
cumulative_compaction_num_threads_per_disk = 4
base_compaction_num_threads_per_disk = 2
cumulative_compaction_check_interval_seconds = 2
```
- For Primary Key tables, you can appropriately increase the value of update_compaction_num_threads_per_disk and decrease the value of update_compaction_per_tablet_min_interval_seconds.
After you modify these configuration items, monitor memory usage and I/O to make sure that they remain normal.
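
The first method, loading more data per job, can be approximated on the client side by buffering rows and submitting a job only when the buffer reaches a size threshold. The following is a minimal Python sketch of that idea. `LoadBatcher`, `max_bytes`, and the `submit` callback are hypothetical names used for illustration and are not part of Selena; the actual submission of the load job is left to the callback.

```python
from typing import Callable, List


class LoadBatcher:
    """Buffers rows and hands them to `submit` in larger batches (hypothetical helper)."""

    def __init__(self, submit: Callable[[List[str]], None], max_bytes: int) -> None:
        self._submit = submit        # caller-supplied function that runs one load job
        self._max_bytes = max_bytes  # flush threshold chosen by the caller
        self._rows: List[str] = []
        self._size = 0

    def add(self, row: str) -> None:
        """Buffer one row and flush once the buffered size reaches the threshold."""
        self._rows.append(row)
        self._size += len(row.encode("utf-8"))
        if self._size >= self._max_bytes:
            self.flush()

    def flush(self) -> None:
        """Submit all buffered rows as a single job; do nothing if the buffer is empty."""
        if self._rows:
            self._submit(self._rows)
            self._rows = []
            self._size = 0
```

Call `flush()` once more when the input ends so that the last partial batch is also loaded. A suitable threshold depends on your data and cluster; the sketch does not suggest one.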

2. What do I do if the "Label Already Exists" error occurs?

This error occurs because the load job has the same label as another load job, which has been successfully run or is being run, within the same Selena database.

Stream Load jobs are submitted over HTTP. HTTP clients in most programming languages have built-in request retry logic. When the Selena cluster receives a load job request from an HTTP client, it starts processing the request immediately, but it may not return the job result before the client times out. The client then sends the same load job request again. Because the cluster is still processing the first request, it returns the Label Already Exists error for the second request.

Do as follows to check that load jobs submitted by using different loading methods do not have the same label and are not repeatedly submitted:

- View the FE log and check whether the label of the failed load job is recorded twice. If the label is recorded twice, the client has submitted the load job request twice.

  NOTE

  The Selena cluster does not distinguish between the labels of load jobs based on loading methods. Therefore, load jobs submitted by using different loading methods may have the same label.

- Run SHOW LOAD WHERE LABEL = "xxx" to check for load jobs that have the same label and are in the FINISHED state.

  NOTE

  xxx is the label that you want to check.

Before you submit a load job, we recommend that you calculate the approximate amount of time required to load the data and then adjust the client-side request timeout period accordingly. This way, you can prevent the client from submitting the load job request multiple times.
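
The recommendation above can be sketched in Python as follows. This is an illustration under stated assumptions, not a Selena API: `estimate_timeout_seconds`, `new_label`, the `safety_factor` of 2, and the 60-second floor are all hypothetical choices, and the throughput figure must come from your own measurements.

```python
import math
import uuid


def estimate_timeout_seconds(
    data_bytes: int,
    bytes_per_second: float,
    safety_factor: float = 2.0,  # assumed headroom, not a documented value
    minimum: int = 60,           # assumed lower bound, not a documented value
) -> int:
    """Estimate a client-side request timeout from data size and measured throughput."""
    if bytes_per_second <= 0:
        raise ValueError("bytes_per_second must be positive")
    estimated = data_bytes / bytes_per_second * safety_factor
    return max(minimum, math.ceil(estimated))


def new_label(prefix: str) -> str:
    """Build a label that is unique per logical load job."""
    return f"{prefix}_{uuid.uuid4().hex}"
```

For example, `estimate_timeout_seconds(1_000_000_000, 10_000_000)` allows 200 seconds for 1 GB at an assumed 10 MB/s. Generating the label once per logical job, rather than once per HTTP request, means a retried request carries the same label and is rejected instead of loading the data twice.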

3. What do I do if the "ETL_QUALITY_UNSATISFIED; msg:quality not good enough to cancel" error occurs?

Execute SHOW LOAD, and use the error URL in the returned execution result to view the error details.

Common data quality errors are as follows:

- "convert csv string to INT failed."

  Strings from a source column failed to be converted to the data type of the matching destination column. For example, abc failed to be converted to a numeric value.

- "the length of input is too long than schema."

  Values from a source column are longer than the matching destination column supports. For example, source values of the CHAR data type exceed the destination column's maximum length specified at table creation, or source values of the INT data type exceed 4 bytes.

- "actual column number is less than schema column number."

  After a source row is parsed based on the specified column separator, the number of columns obtained is smaller than the number of columns in the destination table. A possible reason is that the column separator specified in the load command or statement differs from the column separator that is actually used in that row.

- "actual column number is more than schema column number."

  After a source row is parsed based on the specified column separator, the number of columns obtained is greater than the number of columns in the destination table. A possible reason is that the column separator specified in the load command or statement differs from the column separator that is actually used in that row.

- "the frac part length longer than schema scale."

  The fractional parts of values from a DECIMAL-type source column exceed the specified scale.

- "the int part length longer than schema precision."

  The integer parts of values from a DECIMAL-type source column exceed the specified precision.

- "there is no corresponding partition for this key."

  The value in the partition column for a source row is not within the partition range.
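
Several of these errors can be caught before loading by checking source rows on the client. The following Python sketch covers only the column count, INT conversion, INT range, and CHAR length cases. The `Schema` layout and `check_row` are hypothetical, the row is split naively without handling quoted fields, and measuring CHAR length in UTF-8 bytes is an assumption you should verify against your table definition.

```python
from typing import List, Optional, Tuple

# Hypothetical schema description: (column name, type, max length for CHAR columns)
Schema = List[Tuple[str, str, Optional[int]]]

INT_MIN, INT_MAX = -2**31, 2**31 - 1  # range of a 4-byte signed integer


def check_row(line: str, separator: str, schema: Schema) -> List[str]:
    """Return the problems found in one source row (an empty list if none)."""
    fields = line.split(separator)  # naive split; quoted separators are not handled
    if len(fields) < len(schema):
        return ["fewer columns than schema"]
    if len(fields) > len(schema):
        return ["more columns than schema"]
    problems = []
    for value, (name, col_type, max_len) in zip(fields, schema):
        if col_type == "INT":
            try:
                number = int(value)
            except ValueError:
                problems.append(f"{name}: not an integer")
                continue
            if not INT_MIN <= number <= INT_MAX:
                problems.append(f"{name}: exceeds 4 bytes")
        elif col_type == "CHAR" and max_len is not None:
            # Assumption: length is counted in UTF-8 bytes
            if len(value.encode("utf-8")) > max_len:
                problems.append(f"{name}: longer than {max_len} bytes")
    return problems
```

With `schema = [("id", "INT", None), ("name", "CHAR", 3)]`, the row `abc,abc` is reported as `["id: not an integer"]`, which corresponds to the first error in the list above.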

4. What do I do if RPC times out?

Check the setting of the write_buffer_size configuration item in the BE configuration file be.conf of each BE. This configuration item controls the maximum size per memory block on the BE. The default maximum size is 100 MB. If this value is too large, Remote Procedure Calls (RPCs) may time out. To resolve this issue, adjust the settings of the write_buffer_size and tablet_writer_rpc_timeout_sec configuration items in the BE configuration file. For more information, see BE configurations.

5. What do I do if the "Value count does not match column count" error occurs?

After my load job failed, I used the error URL returned in the job result to retrieve the error details and found the "Value count does not match column count" error, which indicates a mismatch between the number of columns in the source data file and the number of columns in the destination Selena table:

Error: Value count does not match column count. Expect 3, but got 1. Row: 2023-01-01T18:29:00Z,cpu0,80.99
Error: Value count does not match column count. Expect 3, but got 1. Row: 2023-01-01T18:29:10Z,cpu1,75.23
Error: Value count does not match column count. Expect 3, but got 1. Row: 2023-01-01T18:29:20Z,cpu2,59.44

The reason for this issue is as follows:

The column separator specified in the load command or statement differs from the column separator that is actually used in the source data file. In the preceding example, the CSV-formatted data file consists of three columns, which are separated with commas (,). However, \t is specified as the column separator in the load command or statement. As a result, the three columns from the source data file are incorrectly parsed into one column.

Specify commas (,) as the column separator in the load command or statement. Then, submit the load job again.
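
The effect of a wrong separator is easy to reproduce outside the cluster. The following Python snippet is only an illustration: it splits the first row from the error output above with each separator, and `count_columns` is a hypothetical helper, not part of any load tool.

```python
row = "2023-01-01T18:29:00Z,cpu0,80.99"


def count_columns(line: str, separator: str) -> int:
    """Count the columns obtained when the line is split on the separator."""
    return len(line.split(separator))


tab_count = count_columns(row, "\t")   # the row contains no tab, so it stays one column
comma_count = count_columns(row, ",")  # splitting on commas yields the three columns

print(tab_count, comma_count)
```

The two counts match the "Expect 3, but got 1" message: the tab separator yields one column, the comma yields three.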

6. What do I do if the "current running txns on db XXX is 100, larger than limit 100" error occurs?

Increase the value of the FE configuration max_running_txn_num_per_db.

7. Why do I get a curl ERRORURL saying `be/storage/error_log` does not exist during data import?

BE error logs are kept for 48 hours by default and are cleaned up afterward. You can adjust the retention time using load_error_log_reserve_hours.

8. How do I troubleshoot the error “Tablet is in error state … prepare_segment_writer meet invalid rssid” during import?

This issue is usually caused by version lag. Compare tablet versions at the partition level to check whether publish is stuck. Use the following SQL to compare versions:

SELECT * FROM information_schema.be_tablets;
SELECT * FROM information_schema.partitions_meta;

If only a few tablets are inconsistent, mark the lagging replicas as bad so they can be cloned from healthy ones.

If it's caused by an ongoing large table update or schema change, locate the affected partition based on the error and consider deleting and reloading it.

If the issue persists, try restarting FE and the problematic BE; if still ineffective, restart all BEs.

9. Why does DELETE fail with “failed to execute delete, transaction id xxx, timeout(ms) 30000”?

Increase the value of the FE configuration load_straggler_wait_second to 600 (Default: 300).

10. How to handle the error “Selena planner use long time 3000 ms …”?

The SQL may be too complex. Increase the value of the session variable new_planner_optimize_timeout.

11. How to fix the error “Primary-key index exceeds the limit.”?

This error occurs because the Primary Key index exceeds the memory limit. You can enable the persistent index by setting the table property enable_persistent_index to true.

12. How to resolve “current running txns on db XXX is 100, larger than limit 100”?

Increase the value of the FE configuration max_running_txn_num_per_db.
