Does anyone have a ready-made pipeline that writes webmagic's ResultItems to an Excel file? Thanks! #683

Open
1BOB opened this issue Nov 5, 2017 · 11 comments

Comments

@1BOB

1BOB commented Nov 5, 2017

No description provided.

@zyfxgo
Contributor

zyfxgo commented Nov 5, 2017

You can output it in CSV format.
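Since the original request was for Excel, one practical detail: Excel opens CSV files directly, but it generally recognizes UTF-8 only when the file starts with a byte-order mark (BOM); without one, Chinese text such as product names may show up garbled. A minimal sketch using only the JDK; `BomCsv.open` is a hypothetical helper, not part of webmagic:

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: opens a CSV file for writing as UTF-8 with a leading BOM,
// so that Excel picks the right encoding when the file is double-clicked.
class BomCsv {
    static PrintWriter open(Path file) throws IOException {
        Path parent = file.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent); // make sure the target folder exists
        }
        PrintWriter writer = new PrintWriter(Files.newBufferedWriter(file, StandardCharsets.UTF_8));
        writer.print('\uFEFF'); // U+FEFF is encoded in UTF-8 as the bytes EF BB BF
        return writer;
    }
}
```

Note that `PrintWriter` swallows I/O errors; `checkError()` reports whether any occurred.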

@1BOB
Author

1BOB commented Nov 6, 2017

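I'm a beginner. Can it be exported directly? In Scrapy I could export directly; does webmagic require writing code for this? Could you show me the code you wrote?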

@1BOB
Author

1BOB commented Nov 6, 2017

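I looked at it, but I still don't know how to handle the Map-to-CSV conversion. Forgive me, I'm still a newbie. Could one of you experts please post some code? Please!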
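The conversion mostly comes down to fixing a column order, writing the keys once as a header, and then writing one line of values per page. A minimal sketch in plain Java, assuming the fields come from `ResultItems.getAll()` (a `Map<String, Object>`); `MapToCsv` and its methods are hypothetical names, and the quoting follows the common RFC 4180 convention:

```java
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: turns a field map into CSV header and data lines.
class MapToCsv {
    // Quote a value if it contains a comma, a quote or a line break;
    // embedded quotes are doubled.
    static String escape(Object value) {
        if (value == null) {
            return "";
        }
        String s = value.toString();
        if (s.contains(",") || s.contains("\"") || s.contains("\n") || s.contains("\r")) {
            return "\"" + s.replace("\"", "\"\"") + "\"";
        }
        return s;
    }

    // Header line: the column names in the given order.
    static String header(List<String> names) {
        StringJoiner line = new StringJoiner(",");
        for (String name : names) {
            line.add(escape(name));
        }
        return line.toString();
    }

    // Data line: one cell per column name, empty when the field is missing.
    static String row(List<String> names, Map<String, Object> fields) {
        StringJoiner line = new StringJoiner(",");
        for (String name : names) {
            line.add(escape(fields.get(name)));
        }
        return line.toString();
    }
}
```

Inside a pipeline's `process`, `names` would typically be taken once from the first page's keys and reused, so every row has the same columns as the header.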

@1BOB
Author

1BOB commented Nov 7, 2017

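In my process function I have:
page.putField("goods name", title);
page.putField("star level", star);
In other words, these entries end up in the private Map<String, Object> fields = new LinkedHashMap<String, Object>(); inside the ResultItems class.
Now I need to put goods name and star level into a CSV, with a header row at the top listing the columns goods name and star level (that's the usual way to do it, right?). The link you gave doesn't seem to open. Could you please write it for me? Thanks a lot.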
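For the "one header row at the top" requirement, the key point is to write all pages into a single file and emit the header only on the first write. A minimal sketch with `java.nio.file`; the class name `SingleCsvFile` is hypothetical, it takes already formatted lines, and the `synchronized` is there on the assumption that the spider runs with several threads and may call the pipeline concurrently:

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical writer: one CSV file for the whole crawl, header written once.
class SingleCsvFile {
    private final Path file;
    private final String header;
    private boolean started = false;

    SingleCsvFile(Path file, String header) {
        this.file = file;
        this.header = header;
    }

    // First call: create or truncate the file and write the header.
    // Later calls: append one row at the end.
    synchronized void appendRow(String row) throws IOException {
        StandardOpenOption mode = started ? StandardOpenOption.APPEND : StandardOpenOption.TRUNCATE_EXISTING;
        try (Writer w = Files.newBufferedWriter(file, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, mode)) {
            if (!started) {
                w.write(header);
                w.write("\r\n"); // CRLF line endings, as RFC 4180 describes
            }
            w.write(row);
            w.write("\r\n");
        }
        started = true; // only marked after the header was actually written
    }
}
```

Opening the file per row keeps the sketch simple; a long crawl might instead keep one writer open and close it when the spider finishes.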

@1BOB
Author

1BOB commented Nov 7, 2017

Huh? I didn't really follow. [smiley]

@1BOB
Author

1BOB commented Nov 7, 2017

I'd really appreciate your guidance; could you kindly share some code? Here is one of the pipelines that ships with the project:
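import org.apache.commons.codec.digest.DigestUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import us.codecraft.webmagic.ResultItems;
import us.codecraft.webmagic.Task;
import us.codecraft.webmagic.pipeline.Pipeline;
import us.codecraft.webmagic.utils.FilePersistentBase;

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.util.Map;

public class FilePipeline extends FilePersistentBase implements Pipeline {

    private Logger logger = LoggerFactory.getLogger(getClass());

    /**
     * create a FilePipeline with default path "/data/webmagic/"
     */
    public FilePipeline() {
        setPath("/data/webmagic/");
    }

    public FilePipeline(String path) {
        setPath(path);
    }

    @Override
    public void process(ResultItems resultItems, Task task) {
        String path = this.path + PATH_SEPERATOR + task.getUUID() + PATH_SEPERATOR;
        try {
            // one .html file per page, named after the MD5 of the page URL
            PrintWriter printWriter = new PrintWriter(new OutputStreamWriter(new FileOutputStream(getFile(path + DigestUtils.md5Hex(resultItems.getRequest().getUrl()) + ".html")), "UTF-8"));
            printWriter.println("url:\t" + resultItems.getRequest().getUrl());
            for (Map.Entry<String, Object> entry : resultItems.getAll().entrySet()) {
                if (entry.getValue() instanceof Iterable) {
                    // list values: key on its own line, then one element per line
                    Iterable<?> value = (Iterable<?>) entry.getValue();
                    printWriter.println(entry.getKey() + ":");
                    for (Object o : value) {
                        printWriter.println(o);
                    }
                } else {
                    printWriter.println(entry.getKey() + ":\t" + entry.getValue());
                }
            }
            printWriter.close();
        } catch (IOException e) {
            logger.warn("write file error", e);
        }
    }

}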
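The `FilePipeline` above already shows the one wrinkle a CSV version has to handle: a field may hold a list (for example the result of `.all()` on a selector), which it prints one element per line. In a CSV such a value has to fit into a single cell. A minimal sketch mirroring the `instanceof Iterable` branch; the `CellValue` name and the `;` separator are assumptions:

```java
import java.util.StringJoiner;

// Hypothetical helper: flattens one field value into a single CSV cell.
class CellValue {
    static String of(Object value) {
        if (value == null) {
            return ""; // missing field -> empty cell
        }
        if (value instanceof Iterable) {
            // list values are joined into one cell instead of one line per element
            StringJoiner joined = new StringJoiner(";");
            for (Object o : (Iterable<?>) value) {
                joined.add(String.valueOf(o));
            }
            return joined.toString();
        }
        return value.toString();
    }
}
```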

@1BOB
Author

1BOB commented Nov 7, 2017

I opened the link you gave me and I'm working through it. Thank you.

@1BOB
Author

1BOB commented Nov 7, 2017

How is the resultItems.keySet() function written? Could you post it?
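As far as I can tell, `ResultItems` has no `keySet()` of its own: `getAll()` returns the underlying `Map<String, Object>`, and `keySet()` is the standard `java.util.Map` method. Because that map is a `LinkedHashMap`, the keys come back in the order they were added with `putField`. The sketch below (plain JDK; `KeyOrderDemo` is a hypothetical name) also shows how `toArray` behaves with a pre-sized array, which matters for the `new String[6]` in the pipeline posted next: with two fields, the result has six slots, four of them `null`, so each row gets four extra empty columns.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class KeyOrderDemo {
    static String[][] demo() {
        // The same kind of map that ResultItems keeps internally.
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("goods name", "A");
        fields.put("star level", 5);

        // Zero-length argument: toArray allocates an array of exactly the right size.
        String[] exact = fields.keySet().toArray(new String[0]);
        // Oversized argument: the array is reused and the slots after the keys stay null.
        String[] padded = fields.keySet().toArray(new String[6]);
        return new String[][] {exact, padded};
    }
}
```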

@1BOB
Author

1BOB commented Nov 8, 2017

import org.apache.commons.codec.digest.DigestUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import us.codecraft.webmagic.ResultItems;
import us.codecraft.webmagic.Task;
import us.codecraft.webmagic.pipeline.Pipeline;
import us.codecraft.webmagic.utils.FilePersistentBase;

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;

public class CSVFilePipeline extends FilePersistentBase implements Pipeline {

    private Logger logger = LoggerFactory.getLogger(getClass());

    /**
     * create a CSVFilePipeline with default path "/data/csv/"
     */
    public CSVFilePipeline() {
        setPath("/data/csv/");
    }

    public CSVFilePipeline(String path) {
        setPath(path);
    }

    private final static String[] UN_INIT = new String[0];

    private String CSV_SEPERATOR = ",";

    private String[] names = UN_INIT;

    public synchronized void init(String[] objects) {
        System.out.println("objects is " + objects.toString() + ", UN_INIT is " + UN_INIT.toString());

        // if (objects == UN_INIT) {
        if (true) { // the two objects above never seem to be equal; I'll sort it out later, for now just let it through once
            this.names = objects;
        } else {
            logger.warn("names is already init", new UnsupportedOperationException());
        }
    }

    public String[] getCacheArray(int x) {
        String[] cacheArray = new String[x];
        for (int i = 0; i < x; i++) {
            cacheArray[i] = new String("");
        }
        return cacheArray;
    }

    @Override
    public void process(ResultItems resultItems, Task task) {
        if (names == UN_INIT) {
            String[] strArray = new String[6];
            init(resultItems.getAll().keySet().toArray(strArray));
        }

        // String path = this.path + PATH_SEPERATOR + task.getUUID() + PATH_SEPERATOR;
        String path = "C:\\Users\\Administrator\\Desktop";
        System.out.println("current path: " + path);
        try {
            System.out.println("about to create the file");
            PrintWriter printWriter = new PrintWriter(new OutputStreamWriter(new FileOutputStream(getFile(path + DigestUtils.md5Hex(resultItems.getRequest().getUrl()) + ".csv")), "UTF-8"));
            String[] cache = getCacheArray(names.length);
            for (int i = 0; i < names.length; i++) {
                Object value = resultItems.get(names[i]);
                if (value != null) {
                    cache[i] = value.toString();
                }
            }
            for (int i = 0; i < names.length; i++) {
                if (i != 0) {
                    printWriter.print(CSV_SEPERATOR);
                }
                printWriter.print(cache[i]);
            }
            printWriter.close();
        } catch (IOException e) {
            logger.warn("write file error", e);
        }
    }

}

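I changed it a little and switched the path to the desktop, but no .csv file shows up on the desktop. I stepped through it in the debugger and still can't tell what's wrong. Any pointers?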
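A likely reason nothing appears on the desktop: `path` ends in `Desktop` with no trailing separator, so `path + md5 + ".csv"` names a file like `Desktop<md5 hash>.csv` in `C:\Users\Administrator` rather than a file inside the `Desktop` folder. It may be worth checking that folder. (In Java source, each backslash in that path must also be written as `\\`.) The sketch below contrasts plain concatenation with `java.nio.file.Path.resolve`, which inserts the platform separator; `CsvPath` is a hypothetical name, and `desktop()` assumes the desktop is the `Desktop` folder under the user's home directory, which is the default on Windows but not guaranteed:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class CsvPath {
    // What the posted code effectively does: concatenation with no separator.
    static String glued(String dir, String baseName) {
        return dir + baseName + ".csv";
    }

    // Safer: let Path put the separator between folder and file name.
    static Path joined(String dir, String baseName) {
        return Paths.get(dir).resolve(baseName + ".csv");
    }

    // Assumption: the desktop is "<user.home>/Desktop".
    static Path desktop() {
        return Paths.get(System.getProperty("user.home"), "Desktop");
    }
}
```

Two further things in the posted version: each page is written to its own file (one per URL hash), and no header row or line break is ever printed, so even once the files appear they will not form one table.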

@1BOB
Author

1BOB commented Nov 8, 2017

Thank you, thank you.

@wanfengsky

How was this resolved?
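The thread does not record the final fix. One piece the posted `CSVFilePipeline` leaves open is the one-time setup of the column names: `init` compares its parameter `objects` with `UN_INIT`, which can never be true for a freshly built key array, whereas the intended check was presumably on the `names` field (as `process` already does before calling `init`). A minimal, webmagic-independent sketch of that lazy, thread-safe initialization; `ColumnNames` and `resolve` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper: captures the column order from the first page's fields
// and keeps it fixed for every later page.
class ColumnNames {
    private List<String> names; // null until the first page has been seen

    synchronized List<String> resolve(Map<String, Object> fields) {
        if (names == null) { // check the field, not the argument
            names = Collections.unmodifiableList(new ArrayList<>(fields.keySet()));
        }
        return names;
    }
}
```

In a pipeline, this would be called with `resultItems.getAll()`, and the returned list used both for the header line and for picking each row's values.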


3 participants