R cannot cope with null characters (\0) in character strings. Does anyone know how to handle this? More concretely, I want to store complex R objects in a database over an ODBC or JDBC connection. Since complex R objects cannot easily be mapped to data frames, I need a different way to store them. An example of such an object:
library(kernlab)
data(iris)
model <- ksvm(Species ~ ., data=iris, type="C-bsvc", kernel="rbfdot", kpar="automatic", C=10)
Because model cannot be stored directly in a database, I use the serialize() function to obtain a binary representation of the object (in order to store it in a BLOB column):
serialModel <- serialize(model, NULL)
Now I would like to store this via ODBC/JDBC. To do so, I need a string representation of the object so that I can send a query to the database, e.g. INSERT INTO. Since the result of serialize() is a raw vector, I need to convert it:
stringModel <- rawToChar(serialModel)
And here is the problem:
Error in rawToChar(serialModel) :
embedded nul in string: 'X\n\0\0\0\002\0\002\v\0......
R is not able to deal with \0 in strings. Does anyone have an idea how to bypass this restriction? Or is there perhaps a completely different approach to achieve this goal?
Thanks in advance
2 solutions
#1
10
You need
stringModel <- as.character(serialModel)
for a character representation of the raw bytes (one two-digit hex code per byte). rawToChar() tries to interpret the bytes as characters, which is not what you want in this case.
The resulting stringModel can later be converted back to the original model with:
newSerialModel <- as.raw(as.hexmode(stringModel))
newModel <- unserialize(newSerialModel)
all.equal(model,newModel)
[1] TRUE
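Note that as.character() on a raw vector gives one string per byte, not a single string. If a single string is needed for a query, the same idea can be sketched as below. This is only a sketch: object_to_hex() and hex_to_object() are hypothetical helper names of my own, and a small list stands in for the ksvm model so the example runs without kernlab.

```r
# Hypothetical helpers (names are illustrative, not from any package).

# Serialize any R object and collapse its bytes into one hex string.
object_to_hex <- function(obj) {
  bytes <- serialize(obj, NULL)              # raw vector
  paste(as.character(bytes), collapse = "")  # two hex digits per byte
}

# Split the hex string back into byte pairs and unserialize them.
hex_to_object <- function(hex) {
  starts <- seq(1, nchar(hex), by = 2)
  pairs <- substring(hex, starts, starts + 1)
  unserialize(as.raw(strtoi(pairs, base = 16L)))
}

# A small stand-in object (assumption: any serializable object works the same way).
obj <- list(a = 1:3, b = "text", c = c(x = 0.5))
hex <- object_to_hex(obj)
restored <- hex_to_object(hex)
identical(obj, restored)
```

The resulting string contains only the characters 0-9 and a-f, so there is no embedded nul to trip over. Whether to store it in a text column or pass it as a hex literal for a BLOB column depends on the database, so check what your driver accepts.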
Regarding writing binary types to a database through RODBC: as of today, the RODBC vignette reads (p. 11):
Binary types can currently only be read as such, and they are returned as column of class "ODBC binary" which is a list of raw vectors.
#2
4
A completely different approach would be to store the output of capture.output(dput(model)) along with a descriptive name, and later reconstitute the object and bind it to that name with <- or assign(). See the comments below regarding the need for capture.output().
> dput(Mat1)
structure(list(Weight = c(7.6, 8.4, 8.6, 8.6, 1.4), Date = c("04/28/11",
"04/29/11", "04/29/11", "04/29/11", "05/01/11"), Time = c("09:30 ",
"03:11", "05:32", "09:53", "19:52")), .Names = c("Weight", "Date",
"Time"), row.names = c(NA, -5L), class = "data.frame")
> y <- capture.output(dput(Mat1))
> y <- paste(y, collapse="", sep="") # Needed because capture output breaks into multiple lines
> dget(textConnection(y))
  Weight     Date   Time
1    7.6 04/28/11 09:30
2    8.4 04/29/11  03:11
3    8.6 04/29/11  05:32
4    8.6 04/29/11  09:53
5    1.4 05/01/11  19:52
> new.Mat <- dget(textConnection(y))
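A variant of the same round trip without capture.output() or a text connection might look like the sketch below. It assumes deparse() yields essentially the same text that dput() prints, and it uses a shortened copy of Mat1. I have only tried this kind of round trip on plain objects such as data frames, so I cannot say whether it reconstructs an S4 object like the ksvm model.

```r
# A shortened stand-in for Mat1 from the session above.
Mat1 <- data.frame(
  Weight = c(7.6, 8.4, 8.6),
  Date = c("04/28/11", "04/29/11", "04/29/11"),
  stringsAsFactors = FALSE
)

# deparse() returns the text form as a character vector (possibly several
# elements), so collapse it into a single string before storing it.
txt <- paste(deparse(Mat1), collapse = "")

# Parse and evaluate the stored text to rebuild the object.
new.Mat <- eval(parse(text = txt))
all.equal(Mat1, new.Mat)
```

Keep in mind that eval(parse(text = ...)) runs whatever code the string contains, so only use it on text you stored yourself.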